Why traditional cloud serverless architectures can’t keep up with AI agents… and what comes next

By Mathis Joffre

Image generated with Imagen 4

As AI agents grow in complexity and capability, organizations are racing to deploy them across real-world environments. But many hit an unexpected bottleneck: the limitations of today’s serverless cloud infrastructure. 

Traditional cloud platforms, designed for stateless web services, short-lived compute, and microservices operating in fixed environments, are poorly suited to the needs of autonomous, long-running, memory-aware agents that continuously adapt and interact with the outside world. 

Unlike batch jobs or ephemeral functions, AI agents resemble decision-makers more than software scripts. They require consistent context, smart routing to models and APIs, and lifecycles that defy the on/off logic of classic serverless. Supporting them demands infrastructure that goes far beyond spinning up containers or scaling replicas.

Modern cloud providers offer flexibility, but with foundational assumptions that start to break when AI agents enter the picture: 

● Workloads are stateless 

● Compute is short-lived 

● Scaling is horizontal, not adaptive 

● Usage is either bursty (serverless) or persistent (VMs/containers) 

These assumptions work for microservices and ML training jobs, but fail when applied to agentic AI systems. Here are a few reasons why: 

1. Agents Need Persistent Memory 

AI agents must preserve context across tasks and sessions, sometimes indefinitely. Stateless environments require external storage for memory, which adds latency, complexity, and cost. In-process memory offers better performance, but sacrifices durability and scalability. Neither option fits neatly into current serverless paradigms.
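To make the trade-off concrete, here is a minimal Python sketch of the same memory interface backed either by in-process state or by an external store such as Redis. The interface and class names are illustrative, not from any particular platform:

```python
import json
from abc import ABC, abstractmethod

class AgentMemory(ABC):
    @abstractmethod
    def load(self, agent_id: str) -> dict: ...
    @abstractmethod
    def save(self, agent_id: str, context: dict) -> None: ...

class InProcessMemory(AgentMemory):
    """Fast, but lost whenever the runtime is recycled."""
    def __init__(self):
        self._store: dict[str, dict] = {}
    def load(self, agent_id):
        return self._store.get(agent_id, {})
    def save(self, agent_id, context):
        self._store[agent_id] = context

class ExternalMemory(AgentMemory):
    """Durable, but every call pays a network round trip."""
    def __init__(self, redis_client):
        self._redis = redis_client  # e.g. redis.Redis(...)
    def load(self, agent_id):
        raw = self._redis.get(f"agent:{agent_id}:context")
        return json.loads(raw) if raw else {}
    def save(self, agent_id, context):
        self._redis.set(f"agent:{agent_id}:context", json.dumps(context))
```

The agent code is identical either way; what differs is whether a runtime restart wipes the context or merely adds latency to recover it.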

2. Cold Starts Kill UX 

Cold starts are tolerable for event-driven APIs, but fatal for real-time AI agents. Whether the agent is helping a user in conversation or managing a complex task, multi-second boot times break continuity and degrade trust. 

3. Usage Patterns Defy Traditional Billing 

AI agents often idle, awaiting input, then burst into activity. Keeping them fully provisioned is expensive. Suspending and resuming them should be seamless, but current platforms aren’t designed for this kind of fine-grained lifecycle. The result is either inefficiency or poor performance.

4. Observability Doesn’t Go Deep Enough 

Typical observability stacks give you logs, traces, and metrics, but not insight into how an agent makes decisions. Developers need to understand prompt chains, memory operations, reasoning steps, and model outputs. These layers are invisible in most cloud-native observability tools. 
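As a rough illustration, here is the kind of reasoning-level trace an agent developer needs alongside ordinary logs. The event schema and names here are hypothetical, not any vendor’s format:

```python
import json, time, uuid

class AgentTracer:
    """Records reasoning-level events that logs/traces/metrics miss."""
    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self.events: list[dict] = []

    def record(self, kind: str, **payload):
        # kind: "prompt", "model_output", "memory_read", "memory_write", ...
        self.events.append({
            "trace_id": str(uuid.uuid4()),
            "agent_id": self.agent_id,
            "ts": time.time(),
            "kind": kind,
            **payload,
        })

tracer = AgentTracer("support-agent-42")
tracer.record("prompt", step=1, text="Summarize the user's last ticket")
tracer.record("memory_read", key="ticket_history", items=3)
tracer.record("model_output", model="gpt-4o", confidence=0.87)
print(json.dumps(tracer.events, indent=2))
```

The point is not the schema itself but that prompt chains, memory operations, and model outputs become first-class, queryable events rather than opaque lines in a log file.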

5. Model Access and Switching Is Clunky 

AI agents increasingly rely on multiple models and services, from LLMs to vector stores to speech-to-text APIs. But integrating and orchestrating these tools across providers is manual, rigid, and error-prone. Cloud platforms don’t yet provide unified, dynamic access to AI models.

What Agent-First Infrastructure Looks Like 

Meeting the needs of intelligent agents requires a rethink of cloud design. A new generation of infrastructure, sometimes called agent-native or neo-cloud, is emerging to support this shift. It has several defining characteristics: 

1. Native Agent Lifecycle Management 

Instead of stateless functions, platforms must support persistent runtimes that allow agents to pause, hibernate, and resume while maintaining identity and memory. This lifecycle support allows for efficiency, continuity, and more human-like interactions. 
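As a sketch, such a lifecycle might look like the following state machine, where identity and memory survive every transition. The states and allowed transitions are assumptions for illustration, not any provider’s actual API:

```python
from enum import Enum, auto

class AgentState(Enum):
    RUNNING = auto()
    PAUSED = auto()      # in memory, ready to resume instantly
    HIBERNATED = auto()  # state snapshotted to durable storage
    TERMINATED = auto()

# Assumed valid transitions for a persistent agent runtime.
TRANSITIONS = {
    AgentState.RUNNING: {AgentState.PAUSED, AgentState.HIBERNATED, AgentState.TERMINATED},
    AgentState.PAUSED: {AgentState.RUNNING, AgentState.HIBERNATED, AgentState.TERMINATED},
    AgentState.HIBERNATED: {AgentState.RUNNING, AgentState.TERMINATED},
    AgentState.TERMINATED: set(),
}

class AgentRuntime:
    def __init__(self, agent_id: str):
        self.agent_id = agent_id   # identity survives every transition
        self.memory: dict = {}     # context survives pause/hibernate
        self.state = AgentState.RUNNING

    def transition(self, target: AgentState) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.name} -> {target.name} not allowed")
        self.state = target

rt = AgentRuntime("agent-7")
rt.memory["goal"] = "monitor inventory"
rt.transition(AgentState.HIBERNATED)  # snapshot; resource billing can drop
rt.transition(AgentState.RUNNING)     # same identity, same memory
```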

2. Model-Agnostic Routing 

Agents should be able to dynamically choose and switch between models based on cost, performance, or fallback logic. Next-gen infrastructures abstract the model layer, allowing developers to build flexible, resilient agents without hardcoding model dependencies. 
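A minimal sketch of this idea, with placeholder provider names and invented prices, might look like the following cheapest-first router with fallback:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelRoute:
    name: str
    cost_per_1k_tokens: float
    call: Callable[[str], str]  # the provider SDK call goes here

def route(prompt: str, routes: list[ModelRoute]) -> str:
    """Try the cheapest model first; fall back on failure."""
    for r in sorted(routes, key=lambda r: r.cost_per_1k_tokens):
        try:
            return r.call(prompt)
        except Exception:
            continue  # provider error or timeout: try the next model
    raise RuntimeError("all model routes failed")

routes = [
    ModelRoute("small-local", 0.0002, lambda p: f"[small] {p[:40]}"),
    ModelRoute("frontier-api", 0.01, lambda p: f"[frontier] {p[:40]}"),
]
print(route("Plan tomorrow's standup agenda", routes))
```

Because the agent only sees the `route` abstraction, swapping providers or adding a fallback tier is a configuration change, not a code rewrite.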

3. Cost Models Aligned with Usage 

Since agents are not always active, pricing models need to account for idle time, memory persistence, and on-demand activation. Smarter metering, down to the agent or session level, enables both scale and sustainability. 
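For illustration, a per-session meter that charges differently for active and suspended time might look like this; the rates and granularity are invented for the example:

```python
from dataclasses import dataclass

ACTIVE_RATE = 0.000050  # $ per active second (hypothetical)
IDLE_RATE = 0.000001    # $ per suspended second, memory retained

@dataclass
class SessionMeter:
    active_seconds: float = 0.0
    idle_seconds: float = 0.0

    def bill(self) -> float:
        return self.active_seconds * ACTIVE_RATE + self.idle_seconds * IDLE_RATE

# An agent that bursts for 90 seconds, then sits suspended for an hour:
m = SessionMeter(active_seconds=90, idle_seconds=3600)
print(f"${m.bill():.4f}")  # ~$0.0081, vs ~$0.18 if billed as fully active
```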

4. Deep Observability for Reasoning Systems 

To debug and trust agents, developers need access to: 

● Prompt and chain-of-thought logs 

● Memory access and updates 

● Model choices and confidence levels

This visibility is critical not only for debugging but for safety, performance tuning, and user trust.

5. Smarter, Suspendable Serverless

Rather than keeping runtimes hot or starting from scratch, next-gen platforms suspend agents quickly and resume them in milliseconds. This makes it practical and affordable to run thousands of concurrent, semi-idle agents that stay context-aware and responsive. 
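A simplified sketch of the idea follows. Real platforms may checkpoint the entire process; here we snapshot only application-level state, which assumes the agent’s context is serializable, and the path is purely illustrative:

```python
import json, pathlib

SNAPSHOT_DIR = pathlib.Path("/tmp/agent-snapshots")  # illustrative path

def suspend(agent_id: str, context: dict) -> None:
    SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)
    (SNAPSHOT_DIR / f"{agent_id}.json").write_text(json.dumps(context))
    # After this point the runtime can be reclaimed: no CPU or RAM billed.

def resume(agent_id: str) -> dict:
    # Resuming is a read plus deserialization: milliseconds, not a
    # multi-second cold start that rebuilds the whole environment.
    return json.loads((SNAPSHOT_DIR / f"{agent_id}.json").read_text())

suspend("agent-7", {"conversation": ["hi"], "step": 12})
print(resume("agent-7"))
```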

AI agents represent a fundamental shift in how we think about computation. They’re not just code snippets; they are autonomous, memory-driven systems capable of learning and evolving over time. Supporting them requires cloud infrastructure designed with this new reality in mind. 

From real-time strategy bots to customer support agents and research assistants, many AI use cases simply don’t scale or perform well on legacy serverless. But agent-native infrastructure removes those barriers, enabling applications that once seemed out of reach.

The future of cloud isn’t just faster; it’s smarter, more adaptive, and built to support intelligence, not just compute. 

About the Author 

Mathis Joffre is co‑founder and engineering lead at Blaxel, a next‑generation “neo‑cloud” platform designed to support AI agents. He previously spent three years at OVHcloud, Europe’s largest cloud provider, driving AI and data infrastructure R&D and enabling critical transformation initiatives. Since co-founding Blaxel in September 2024, Mathis has helped build its core infrastructure, including serverless agent runtimes, global model gateways, and developer observability tools. He is passionate about scaling agentic workloads and bridging the gap between research prototypes and production‑grade systems.
