This post was originally published on The Data Source, my monthly newsletter covering the top investment themes across cloud-infrastructure, developer tools, and data. Subscribe here!

🚓 Where Agent Compute is Going and Who’s Building It

If the last decade of infrastructure was about running code, the next one is about running cognition.

AI agents are moving from prototypes to production systems that reason, plan, and act. They don’t just execute commands; they decide what to do next. That shift is forcing a rethink of how compute, memory, and coordination work at every level of the stack.

Parts One and Two showed why agent workloads don’t fit today’s infrastructure and explored early architectures that might. This final part looks outward at the new companies being built, the primitives they’re defining, and where the market is starting to converge.

The race is on to build the operating system of the agent era.

😶‍🌫️ Memory and State Become Core Infrastructure

Agents need to remember what they’ve done, learn from it, and share that knowledge with others.

That makes state and memory new building blocks for infrastructure.

On the hardware side, Compute Express Link (CXL) is enabling pooled, shared memory that no longer sits inside a single server. Companies like Astera Labs, Enfabrica, and Panmnesia are building controllers and fabrics that connect tens of terabytes of memory across racks. Liqid and GigaIO package this into “composable” systems that assemble GPUs and memory on demand, while MemVerge orchestrates those resources as a software service.

On the software side, memory is becoming a first-class system primitive. Pinecone, Weaviate, and Qdrant are evolving from search indexes into memory substrates, separating compute from storage so memory can scale elastically. Mem0 and Zep focus directly on agent recall, maintaining versioned and time-aware context for long-running sessions.
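To make that concrete, here is a toy sketch of what a time-aware memory primitive looks like. The AgentMemory class, its keyword-overlap scoring, and the recency decay are my own illustration, not the API of any product named above; production systems back recall with a vector index rather than string matching.

```python
# A toy, time-aware memory store. All names here are hypothetical illustrations;
# real systems back recall with vector indexes, not keyword overlap.
import time
from dataclasses import dataclass, field

@dataclass
class MemoryRecord:
    text: str
    created_at: float = field(default_factory=time.time)
    version: int = 1  # bumped on revision, so an agent can audit how context changed

class AgentMemory:
    def __init__(self, half_life_s: float = 3600.0):
        self.half_life_s = half_life_s  # how fast old memories lose relevance
        self.records: list[MemoryRecord] = []

    def add(self, text: str) -> None:
        self.records.append(MemoryRecord(text))

    def recall(self, query: str, k: int = 3) -> list[str]:
        """Rank by keyword overlap, decayed by age so stale context sinks."""
        now = time.time()

        def score(r: MemoryRecord) -> float:
            overlap = len(set(query.lower().split()) & set(r.text.lower().split()))
            return overlap * 0.5 ** ((now - r.created_at) / self.half_life_s)

        return [r.text for r in sorted(self.records, key=score, reverse=True)[:k]]
```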

Newer entrants are expanding what “memory” means altogether. Letta provides a full framework for stateful agents with persistent memory and visual debugging. Cognee combines graph and vector memory to capture relationships and semantic context. Chroma continues to anchor many open-source stacks as a lightweight memory backend. And early open-source projects like Memoripy and Memary are experimenting with durable agent recall and time-based retrieval, showing how grassroots development is pushing this layer forward.

The open question is whether these pooled and programmable memory systems will remain internal to hyperscalers or become shared primitives across clouds. And, most importantly, how they’ll handle synchronization when multiple agents update the same state in parallel.
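One plausible answer, borrowed from databases, is optimistic concurrency: every write carries the version it read, and stale writes are rejected and retried rather than silently overwriting a peer's update. A minimal sketch, with a hypothetical versioned store:

```python
class StaleWriteError(Exception):
    pass

class SharedState:
    """Hypothetical versioned key-value store shared by multiple agents."""

    def __init__(self):
        self._data: dict[str, tuple[int, object]] = {}  # key -> (version, value)

    def read(self, key: str) -> tuple[int, object]:
        return self._data.get(key, (0, None))

    def write(self, key: str, value: object, expected_version: int) -> int:
        version, _ = self._data.get(key, (0, None))
        if version != expected_version:
            # Another agent updated this key since we read it; caller must retry.
            raise StaleWriteError(f"{key}: expected v{expected_version}, found v{version}")
        self._data[key] = (version + 1, value)
        return version + 1

def append_step(state: SharedState, key: str, step: str) -> None:
    # Retry loop: re-read on conflict instead of clobbering a peer agent's update.
    while True:
        version, value = state.read(key)
        try:
            state.write(key, (value or []) + [step], expected_version=version)
            return
        except StaleWriteError:
            continue

state = SharedState()
append_step(state, "plan", "fetch quarterly numbers")
```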

🎒 Compute Goes Portable

Agent workloads no longer look like standard inference jobs. They chain model calls, run tools, and adjust on the fly. That shift is pushing compute toward portability, where workloads can run anywhere, on any accelerator.

Modular is building a cross-hardware runtime and its Mojo language to unify performance across chips. Work-Bench portfolio company Runhouse makes it easy for developers to run long-lived, stateful jobs such as reinforcement learning and continuous model training without managing complex infrastructure. Ray Serve is standardizing multi-model serving with predictable latency and autoscaling.
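To ground the Ray Serve point, here is a minimal deployment as I understand its public API; the class name, route, and autoscaling bounds are arbitrary, and the handler is a stub where a real service would load and call a model.

```python
# A minimal Ray Serve sketch (pip install "ray[serve]"). Names and bounds are arbitrary.
from ray import serve
from starlette.requests import Request

@serve.deployment(autoscaling_config={"min_replicas": 1, "max_replicas": 4})
class Responder:
    async def __call__(self, request: Request) -> dict:
        body = await request.json()
        # A real deployment would load a model in __init__ and run inference here.
        return {"output": f"handled: {body.get('prompt', '')}"}

# Exposes an HTTP endpoint at http://localhost:8000/respond
serve.run(Responder.bind(), route_prefix="/respond")
```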

At the serving layer, Together AI, Baseten, and Fireworks AI are building platforms that make deployment portable and hardware-agnostic. They allow teams to serve and scale models across clouds, abstracting away GPU management and inference optimization.

The next generation of compute will make moving between CPUs, GPUs, and custom accelerators as seamless as shifting between cloud regions today.

🦑 From Single Agents to Systems of Agents

Once agents begin handling real workflows, they must evolve from single chatbots into systems of collaborating agents. That requires orchestration: coordination, handoff logic, error recovery, and shared state management.

LangGraph Platform enables designing and debugging complex agent workflows with checkpointing, CrewAI supports deploying collaborative agent fleets in cloud or self-hosted environments, and Dust leans into the Model Context Protocol (MCP) for standardized tool and data connectivity. At the same time, open frameworks like Haystack provide pipeline logic and branching to build agentic systems at scale, and AutoGen offers a core orchestration model for agent networks, mixing orchestrator and worker agents. On the tool integration side, Composio helps agents connect to external systems and application triggers, enabling orchestrated agents to interact with real-world services.
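Strip away the frameworks and the core pattern is checkpointed handoff: persist shared state after every step so a failed run resumes where it stopped instead of replaying finished work. A toy orchestrator, not any vendor's API, makes the idea concrete:

```python
# A toy checkpointed orchestrator: save state after each step so a crashed run
# picks up from the last checkpoint rather than re-executing completed steps.
import json
import pathlib

def run_workflow(steps, state, checkpoint="workflow.json"):
    path = pathlib.Path(checkpoint)
    done = 0
    if path.exists():  # resume from the last completed step
        saved = json.loads(path.read_text())
        state, done = saved["state"], saved["done"]
    for i, step in enumerate(steps):
        if i < done:
            continue  # already finished in a previous run
        state = step(state)  # each step is one agent action: plan, call a tool, hand off
        path.write_text(json.dumps({"state": state, "done": i + 1}))
    return state

# Handoff between a "research" agent and a "writer" agent sharing one state dict.
result = run_workflow(
    steps=[
        lambda s: {**s, "notes": f"findings about {s['topic']}"},
        lambda s: {**s, "draft": f"report based on {s['notes']}"},
    ],
    state={"topic": "agent infrastructure"},
)
```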

This multi-agent orchestration layer is what distinguishes toy demos from production-grade systems that enterprises can trust and run reliably.

👞 Model Gateways Become Control Planes

As organizations rely on dozens of model providers, routing and governance become critical. OpenRouter unifies access and billing through a single API. Portkey adds enterprise-grade policy and early MCP compatibility so teams can route both models and tools in one place. LiteLLM offers an open-source gateway teams can run privately to manage spend, rate limiting, and model failover.

Martian offers model routing that dynamically directs each prompt to the best model based on cost and correctness. Nexos AI operates as a unified AI gateway, aggregating MCP servers and LLMs while enforcing governance and security.
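The failover piece is simple to sketch. Using LiteLLM's OpenAI-compatible completion call, a gateway falls through a preference-ordered list whenever a provider errors; the model names below are placeholders and assume the matching API keys are set. Real gateways layer rate limits, spend caps, and policy on top.

```python
# A minimal failover sketch with LiteLLM (pip install litellm). Model names are
# placeholders and assume provider API keys are set in the environment.
from litellm import completion

PREFERRED_MODELS = ["gpt-4o-mini", "anthropic/claude-3-haiku-20240307"]

def ask(prompt: str) -> str:
    for model in PREFERRED_MODELS:  # fall through the list on any provider error
        try:
            resp = completion(model=model, messages=[{"role": "user", "content": prompt}])
            return resp.choices[0].message.content
        except Exception:
            continue  # rate limit, outage, auth failure: try the next provider
    raise RuntimeError("all providers failed")
```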

Over time, these gateways will optimize not just for latency or price but for correctness, safety, and trust, deciding which model an agent should use for a given task.

šŸ² Hybrid and Edge Execution Take Hold

Not every agent step belongs in the cloud. Some need to run near the user or data for privacy and speed.

Vercel’s Edge Runtime allows functions to execute right at the edge, reducing round-trip delay. Fluid Compute extends that idea by combining the flexibility of serverless with the concurrency of traditional servers so workloads stay active and responsive under changing load. Cloudflare Workers AI and Modal provide serverless GPUs that spin up instantly and scale to zero when idle. Fermyon and Akamai deliver WebAssembly-based execution globally for ultra-low-latency inference.
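As a minimal sketch of that scale-to-zero model, here is a GPU function on Modal's current Python SDK as I understand it; the app name and stub body are mine, and it runs via `modal run this_file.py`.

```python
# A minimal scale-to-zero GPU function on Modal (pip install modal).
import modal

app = modal.App("agent-infer-sketch")  # app name is illustrative

@app.function(gpu="any")  # container spins up on demand and bills only while running
def infer(prompt: str) -> str:
    # A real function would load a model here; this stub keeps the sketch self-contained.
    return f"processed: {prompt}"

@app.local_entrypoint()
def main():
    print(infer.remote("hello from a serverless GPU"))
```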

On the hardware side, Hailo, SiMa.ai, and EdgeCortix are building processors optimized for on-premise and embedded AI.

Expect early adoption in sectors such as healthcare, finance, and industrial systems where low latency and data control both matter.

šŸˆā€ā¬› AgentOps and Identity Define Trust

Once agents run in production, visibility and security become paramount. LangSmith, Traceloop, and Arize bring observability, evaluation, and tracing into AI pipelines so teams can debug, monitor drift, and root out failure modes. Helicone supplements with logging, routing, and cost tracking as a proxy layer. For identity and permissions, Astrix and Aembit ensure each agent receives scoped credentials and audit trails, while Descope builds an agentic identity control plane to govern lifecycle and policy across human and AI identities. On the security side, Lasso Security layers in runtime guardrails, red-teaming, and threat detection to keep agent behavior safe and controlled in production.
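Scoped, short-lived credentials are easy to picture with a sketch. This one mints an agent token with PyJWT; the claim names, scopes, and HS256 signing scheme are illustrative, not any vendor's schema.

```python
# A hypothetical scoped-credential mint, sketched with PyJWT (pip install pyjwt).
import time
import jwt

SIGNING_KEY = "replace-with-a-real-secret"

def mint_agent_token(agent_id: str, scopes: list[str], ttl_s: int = 300) -> str:
    now = int(time.time())
    claims = {
        "sub": agent_id,            # which agent is acting
        "scope": " ".join(scopes),  # least privilege: only the tools this task needs
        "iat": now,
        "exp": now + ttl_s,         # short-lived, so a leaked token ages out fast
    }
    return jwt.encode(claims, SIGNING_KEY, algorithm="HS256")

token = mint_agent_token("billing-agent-7", ["crm:read", "invoices:write"])
print(jwt.decode(token, SIGNING_KEY, algorithms=["HS256"]))
```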

AgentOps is becoming what DevOps was a decade ago: the backbone between experimentation and enterprise reliability.

🔮 The Bottom Line

Every shift in compute begins with inefficiency and ends with new platforms.

Virtualization, containers, and serverless followed that pattern. Agents are doing it again, exposing gaps in how today’s systems handle state, coordination, and control.

Company formation is clustering around these pressure points. Teams are rebuilding core primitives across the stack: memory that can persist and be shared, runtimes that move seamlessly across hardware, orchestration layers that manage systems of agents, gateways that route models with governance, and security layers that give every agent an identity.

This is not a tooling cycle. It is an infrastructure reset. The winners will make state portable, compute flexible, orchestration dependable, and identity verifiable. These capabilities will define the next generation of enterprise platforms.

💌 Call for Startups

If you are building in this direction, rethinking how intelligence runs, scales, and secures itself, I want to hear from you.

Reach me at priyanka@work-bench.com.

👋 I’m a Principal at Work-Bench, a Seed-stage, enterprise-focused VC fund based in New York City. Our sweet spot at Seed is helping startups build out their early go-to-market motions. In the cloud-native infrastructure and developer tool ecosystem, we’ve invested in companies like Cockroach Labs, Run.house, Prequel.dev, Autokitteh, and others.