This post was originally published on The Data Source, my monthly newsletter covering the top investment themes across cloud-infrastructure, developer tools, and data. Subscribe here!
Where Agent Compute is Going and Who's Building It
If the last decade of infrastructure was about running code, the next one is about running cognition.
AI agents are moving from prototypes to production systems that reason, plan, and act. They don't just execute commands; they decide what to do next. That shift is forcing a rethink of how compute, memory, and coordination work at every level of the stack.
Parts One and Two showed why agent workloads don't fit today's infrastructure and explored early architectures that might. This final part looks outward at the new companies being built, the primitives they're defining, and where the market is starting to converge.
The race is on to build the operating system of the agent era.
Memory and State Become Core Infrastructure
Agents need to remember what they've done, learn from it, and share that knowledge with others.
That makes state and memory new building blocks for infrastructure.
On the hardware side, Compute Express Link (CXL) is enabling pooled, shared memory that no longer sits inside a single server. Companies like Astera Labs, Enfabrica, and Panmnesia are building controllers and fabrics that connect tens of terabytes of memory across racks. Liqid and GigaIO package this into "composable" systems that assemble GPUs and memory on demand, while MemVerge orchestrates those resources as a software service.
On the software side, memory is becoming a first-class system primitive. Pinecone, Weaviate, and Qdrant are evolving from search indexes into memory substrates, separating compute from storage so memory can scale elastically. Mem0 and Zep focus directly on agent recall, maintaining versioned and time-aware context for long-running sessions.
Newer entrants are expanding what "memory" means altogether. Letta provides a full framework for stateful agents with persistent memory and visual debugging. Cognee combines graph and vector memory to capture relationships and semantic context. Chroma continues to anchor many open-source stacks as a lightweight memory backend. And early open-source projects like Memoripy and Memary are experimenting with durable agent recall and time-based retrieval, showing how grassroots development is pushing this layer forward.
The open question is whether these pooled and programmable memory systems will remain internal to hyperscalers or become shared primitives across clouds. And, most importantly, how they'll handle synchronization when multiple agents update the same state in parallel.
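One common answer to that synchronization problem is optimistic concurrency: every read returns a version number, and a write only succeeds if the state hasn't changed underneath it. The sketch below is purely illustrative (it is not any vendor's API) and shows two agents contending on the same key, with the stale writer re-reading and retrying.

```python
# Illustrative sketch, not a specific product's API: a versioned memory
# store using optimistic concurrency so parallel agents cannot silently
# overwrite each other's updates.

class StaleWriteError(Exception):
    """Raised when a writer's snapshot is out of date."""

class VersionedMemory:
    def __init__(self):
        self._store = {}  # key -> (version, value)

    def read(self, key):
        return self._store.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        current_version, _ = self._store.get(key, (0, None))
        if current_version != expected_version:
            raise StaleWriteError(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._store[key] = (current_version + 1, value)
        return current_version + 1

mem = VersionedMemory()
v, _ = mem.read("plan")
mem.compare_and_set("plan", v, "step 1: gather data")  # succeeds, now v1

# A second agent still holding the stale version must re-read and retry.
try:
    mem.compare_and_set("plan", v, "step 1: call tools")
except StaleWriteError:
    v2, current = mem.read("plan")
    mem.compare_and_set("plan", v2, current + "; step 2: call tools")
```

The same compare-and-set idea underlies most shared-state designs, whether the backing store is a vector database, a CXL memory pool, or a plain key-value service.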
Compute Goes Portable
Agent workloads no longer look like standard inference jobs. They chain model calls, run tools, and adjust on the fly. That shift is pushing compute toward portability, where workloads can run anywhere, on any accelerator.
Modular is building a cross-hardware runtime and its Mojo language to unify performance across chips. Work-Bench portfolio company Runhouse makes it easy for developers to run long-lived, stateful jobs such as reinforcement learning and continuous model training without managing complex infrastructure. Ray Serve is standardizing multi-model serving with predictable latency and autoscaling.
At the serving layer, Together AI, Baseten, and Fireworks AI are building platforms that make deployment portable and hardware-agnostic. They allow teams to serve and scale models across clouds, abstracting away GPU management and inference optimization.
The next generation of compute will make moving between CPUs, GPUs, and custom accelerators as seamless as shifting between cloud regions today.
From Single Agents to Systems of Agents
Once agents begin handling real workflows, they must evolve from single chatbots into systems of collaborating agents. That requires orchestration: coordination, handoff logic, error recovery, and shared state management.
LangGraph Platform enables designing and debugging complex agent workflows with checkpointing, CrewAI supports deploying collaborative agent fleets in cloud or self-hosted environments, and Dust leans into the Model Context Protocol (MCP) for standardized tool and data connectivity. At the same time, open frameworks like Haystack provide pipeline logic and branching to build agentic systems at scale, and AutoGen offers a core orchestration model for agent networks, mixing orchestrator and worker agents. On the tool integration side, Composio helps agents connect to external systems and application triggers, enabling orchestrated agents to interact with real-world services.
This multi-agent orchestration layer is what distinguishes toy demos from production-grade systems that enterprises can trust and run reliably.
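The pattern these platforms share can be shown in a few lines: an orchestrator walks a sequence of worker steps, retries transient failures, and checkpoints shared state after each step so a run can be resumed. This is a hedged sketch of the pattern, with invented function names, not the API of LangGraph, CrewAI, or any other framework named above.

```python
# Minimal orchestration sketch: sequential handoff between worker
# "agents", retry-based error recovery, and a checkpoint after each step.

def research(state):
    state["notes"] = "found 3 sources"
    return state

def flaky_summarize(state):
    # Fails on its first attempt to demonstrate error recovery.
    state["attempts"] = state.get("attempts", 0) + 1
    if state["attempts"] < 2:
        raise RuntimeError("transient model error")
    state["summary"] = f"summary of: {state['notes']}"
    return state

def run_workflow(steps, state, max_retries=2):
    checkpoints = []
    for name, step in steps:
        for attempt in range(max_retries + 1):
            try:
                state = step(state)
                break
            except RuntimeError:
                if attempt == max_retries:
                    raise
        checkpoints.append((name, dict(state)))  # snapshot shared state
    return state, checkpoints

state, checkpoints = run_workflow(
    [("research", research), ("summarize", flaky_summarize)], {}
)
```

Production frameworks add durable checkpoint storage, parallel branches, and human-in-the-loop interrupts on top of this core loop.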
Model Gateways Become Control Planes
As organizations rely on dozens of model providers, routing and governance become critical. OpenRouter unifies access and billing through a single API. Portkey adds enterprise-grade policy and early MCP compatibility so teams can route both models and tools in one place. LiteLLM offers an open-source gateway teams can run privately to manage spend, rate limiting, and model failover.
Martian offers model routing that dynamically directs each prompt to the best model based on cost and correctness. Nexos AI operates as a unified AI gateway, aggregating MCP servers and LLMs while enforcing governance and security.
Over time, these gateways will optimize not just for latency or price but for correctness, safety, and trust, deciding which model an agent should use for a given task.
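At its core, a gateway's routing decision is a constrained optimization: pick the cheapest healthy model that meets the task's capability bar, and fail over when a provider is down. The sketch below is illustrative only; the model names, fields, and `route` function are invented, not the actual API of OpenRouter, Portkey, or LiteLLM.

```python
# Illustrative gateway routing sketch: choose the cheapest healthy model
# that satisfies a task's required capability tier, with automatic
# failover past unhealthy providers.

MODELS = [
    {"name": "small-fast",   "cost": 0.1, "capability": 1, "healthy": True},
    {"name": "mid-general",  "cost": 1.0, "capability": 2, "healthy": False},
    {"name": "big-reasoner", "cost": 5.0, "capability": 3, "healthy": True},
]

def route(required_capability):
    candidates = [
        m for m in MODELS
        if m["capability"] >= required_capability and m["healthy"]
    ]
    if not candidates:
        raise RuntimeError("no healthy model meets the requirement")
    return min(candidates, key=lambda m: m["cost"])["name"]

route(1)  # simple task: cheapest healthy model wins
route(2)  # mid-general is down, so traffic fails over upward
```

Adding correctness, safety, and trust to this decision means replacing the static `capability` field with per-task evaluation scores, which is exactly the direction the gateways above are heading.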
Hybrid and Edge Execution Take Hold
Not every agent step belongs in the cloud. Some need to run near the user or data for privacy and speed.
Vercelās Edge Runtime allows functions to execute right at the edge, reducing round-trip delay. Fluid Compute extends that idea by combining the flexibility of serverless with the concurrency of traditional servers so workloads stay active and responsive under changing load. Cloudflare Workers AI and Modal provide serverless GPUs that spin up instantly and scale to zero when idle. Fermyon and Akamai deliver WebAssembly-based execution globally for ultra-low latency inference.
On the hardware side, Hailo, SiMa.ai, and EdgeCortix are building processors optimized for on-premise and embedded AI.
Expect early adoption in sectors such as healthcare, finance, and industrial systems where low latency and data control both matter.
AgentOps and Identity Define Trust
Once agents run in production, visibility and security become paramount. LangSmith, Traceloop, and Arize bring observability, evaluation, and tracing into AI pipelines so teams can debug, monitor drift, and root out failure modes. Helicone supplements with logging, routing, and cost tracking as a proxy layer. For identity and permissions, Astrix and Aembit ensure each agent receives scoped credentials and audit trails, while Descope builds an agentic identity control plane to govern lifecycle and policy across human and AI identities. On the security side, Lasso Security layers in runtime guardrails, red-teaming, and threat detection to control the behavior and safety of agents in production.
AgentOps is becoming what DevOps was a decade ago: the backbone between experimentation and enterprise reliability.
The Bottom Line
Every shift in compute begins with inefficiency and ends with new platforms.
Virtualization, containers, and serverless followed that pattern. Agents are doing it again, exposing gaps in how today's systems handle state, coordination, and control.
Company formation is clustering around these pressure points. Teams are rebuilding core primitives across the stack: memory that can persist and be shared, runtimes that move seamlessly across hardware, orchestration layers that manage systems of agents, gateways that route models with governance, and security layers that give every agent an identity.
This is not a tooling cycle. It is an infrastructure reset. The winners will make state portable, compute flexible, orchestration dependable, and identity verifiable. These capabilities will define the next generation of enterprise platforms.
Call for Startups
If you are building in this direction, rethinking how intelligence runs, scales, and secures itself, I want to hear from you.
Reach me at priyanka@work-bench.com.
I'm a Principal at Work-Bench, a Seed-stage, enterprise-focused VC fund based in New York City. Our sweet spot for investment at Seed correlates with building out a startup's early go-to-market motions. In the cloud-native infrastructure and developer tool ecosystem, we've invested in companies like Cockroach Labs, Run.house, Prequel.dev, Autokitteh, and others.