MO® Compliance Chat

The NVIDIA Keynote: Agents, Factories and the Move to Streamlined, Continuous AI Deployment

Written by Chris Davies | Mar 20, 2026 11:50:53 AM

The recent NVIDIA keynote set out a clear direction for AI. The key message is simple: the industry is moving from building models to running them continuously in real-world use.

At the centre of this is what can be described as the inference inflection.

Inference inflection is the point where the main cost, value and competitive advantage in AI shifts away from training models and towards running them at scale in live environments. Training happens occasionally. Inference happens every time a system is used. Over time, that becomes the dominant activity.

This matters because it changes how firms should think about AI. It is no longer a project or a capability sitting on the side. It becomes part of core infrastructure, embedded into day-to-day operations.

The keynote reinforced that AI is now a new computing model. Traditional software follows fixed rules. AI systems work on probabilities and patterns. That requires different architecture. NVIDIA’s position is that accelerated computing, built around GPUs and high-speed data movement, is now the foundation for this shift.

A second theme is the move to full-stack platforms. The focus is not just on chips, but on the whole environment needed to build, deploy and run AI. This includes hardware, software and integrated systems designed to handle both training and inference. The direction of travel is clear: reducing friction between development and deployment so AI can operate continuously.

The idea of “AI factories” was also central. These are systems where data goes in and decisions or outputs come out on an ongoing basis. This is a production model, not a one-off build. It reflects how AI is being used in practice, where models are constantly processing new inputs and generating outputs at scale.

The keynote also emphasised efficiency. As inference grows, the cost per interaction becomes critical. The focus is shifting to how many outputs can be generated per unit of compute, power and cost. This is less about building the largest model and more about running models efficiently in live environments.
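To make the cost-per-interaction framing concrete, here is a minimal, illustrative calculation. All figures and variable names are hypothetical assumptions for the sake of the example, not vendor benchmarks or NVIDIA numbers.

```python
# Illustrative only: hypothetical figures, not vendor benchmarks.
# Cost per interaction = hourly cost of serving infrastructure / interactions served per hour.

gpu_hour_cost = 2.50          # assumed cost of one GPU-hour (hardware + power), in GBP
requests_per_second = 40      # assumed sustained throughput of the deployed model
requests_per_hour = requests_per_second * 3600

cost_per_interaction = gpu_hour_cost / requests_per_hour
print(f"Cost per interaction: £{cost_per_interaction:.5f}")

# Doubling throughput per GPU (better batching, quantisation, faster data movement)
# halves the cost per interaction without changing the model itself.
improved = gpu_hour_cost / (requests_per_hour * 2)
print(f"With 2x throughput:   £{improved:.5f}")
```

The point of the arithmetic is simple: once inference is the dominant activity, efficiency gains in serving translate directly into lower cost per customer interaction.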

Another point was the importance of data movement. As systems scale, the constraint is not just compute power but how quickly data can be moved between components. This brings networking and system design into focus. AI performance is now dependent on the whole system, not just individual processors.

There was also a clear shift towards practical use cases. AI is moving into everyday business processes, from customer interaction to operational workflows. This includes more specialised models trained for specific domains, rather than relying only on general-purpose systems.

For UK financial services, the implications are direct. The inference inflection aligns with how regulation is evolving. Requirements such as Consumer Duty, systems and controls, and operational resilience all point towards continuous monitoring and evidence, not periodic reviews.

This creates a move from:

  • periodic compliance reviews to continuous oversight, e.g. ongoing client file reviews that run in the background rather than exception reporting on small samples of client files
  • static training to ongoing assessment of competence, e.g. longitudinal, data-driven assessment based on how AI is used within the business
  • manual reporting to data-driven, real-time dashboards, e.g. moving away from checklists and spreadsheets to streamlined, data-driven regulatory reporting

In practical terms, this means compliance and risk functions become part of live operations. Systems need to monitor advice quality, customer outcomes and risks as they happen, not after the event. This is at the heart of Model Office AI RegTech.
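As a rough sketch of what a continuous oversight check might look like in practice, consider the example below. Every identifier, score and threshold is a hypothetical illustration, assumed for the example only; it is not a description of Model Office's implementation.

```python
# Minimal, illustrative sketch of continuous oversight.
# All names and thresholds are hypothetical assumptions.
from dataclasses import dataclass

@dataclass
class ClientFileReview:
    file_id: str
    suitability_score: float   # 0.0-1.0, e.g. the output of an automated advice-quality check
    vulnerable_customer: bool

def flag_for_oversight(review: ClientFileReview, threshold: float = 0.8) -> bool:
    """Return True if this file should be escalated to a human reviewer."""
    # Apply a stricter bar where a vulnerable customer is involved.
    effective_threshold = 0.9 if review.vulnerable_customer else threshold
    return review.suitability_score < effective_threshold

# In a continuous model, every file is scored as it is produced,
# rather than a small sample being checked at quarter end.
incoming = [
    ClientFileReview("CF-1001", 0.95, False),
    ClientFileReview("CF-1002", 0.72, False),
    ClientFileReview("CF-1003", 0.85, True),
]
escalations = [r.file_id for r in incoming if flag_for_oversight(r)]
print(escalations)   # ['CF-1002', 'CF-1003']
```

The design choice worth noting is that the check runs on every file as it arrives, producing an evidence trail continuously rather than a retrospective sample.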

The NVIDIA keynote does not introduce a new concept of AI. It confirms a change in how it is used. The focus is now on running AI reliably, at scale, and at a sustainable cost.

The firms that adapt to this model will build systems around continuous decision-making and monitoring. Those that do not will remain dependent on retrospective processes that are harder to scale and harder to evidence.

Please click the icon below to learn more about MO RegTech today.