The intelligence of a frontier LLM is off the charts. These models know something about quite literally every subject, can read 100 pages in less than a minute, and have a far larger working memory than any human. What’s the constraint on full effectiveness? Context. Just as the smartest human can’t perform effectively without full context on what they’re being asked to do, neither can these models. If prompt engineering was about optimizing the input for chatbots, context engineering takes it to the next level: programming and shaping what’s in an LLM’s context window so that the agent operates well in the wild.

In this new world, every human essentially becomes a context engineer. With these intelligent models at our fingertips giving us a near-zero marginal cost of cognition, what matters is how we build a world around that unit of cognition (especially because the agent can’t natively recreate this itself). As Mitchell Troyanovsky from Basis has recently talked about, building stable, coherent context ontologies for agents will be a lot of what matters en route to ramping them into full production. It means equipping them with a shared source of truth for their specific task and a base of background knowledge, so that as few agentic decisions as possible are made on the back of guessing.

In the last few years, the industry converged on a buzzword that aimed to articulate a best practice for optimizing an LLM’s context: RAG (retrieval-augmented generation). While retrieval is still undoubtedly important and vector databases as a technology aren’t going anywhere (though Claude Code has shown that retrieval using tool calls and file search is fairly effective!), a broader space of best practices around context engineering is beginning to emerge (shoutout Lance Martin from LangChain).

Context offloading, the act of storing information outside the LLM’s context, has proven a good way to preserve context coherence while preventing context overflow. Context reduction involves the delicate act of boiling down some set of context into a shorter, high-signal summary; Cognition does this with their context compression LLM. Both matter because of context rot: a study from Chroma found that as the number of tokens in the context window increases, the model’s ability to accurately recall information from that context decreases. Lastly, with multi-agent systems rising as an effective way to build agentic applications, context isolation, distributing context across sub-agents such that each has what’s most relevant to it, is also a growing area of work. My teammate Jonathan Lehr aptly thinks of this as Hadoop for agentic context, and it will only become more important as the scale of agentic workloads requires parallelization and distribution of resources.
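
To make context offloading concrete, here’s a minimal sketch: the agent writes bulky intermediate results to a scratchpad on disk and keeps only a one-line pointer in its context window. The `Scratchpad` class and its method names are illustrative inventions for this post, not from any existing library.

```python
# A minimal sketch of context offloading: bulky content lives on disk,
# and only a short pointer stays in the LLM's context window.
# (Scratchpad and its methods are illustrative, not a real library API.)
from pathlib import Path

class Scratchpad:
    def __init__(self, root: str = "scratchpad") -> None:
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def offload(self, key: str, content: str) -> str:
        """Store bulky content outside the context; return a short pointer."""
        path = self.root / f"{key}.txt"
        path.write_text(content)
        # This one-line reference is all the agent keeps in context.
        return f"[offloaded: {key} -> {path} ({len(content)} chars)]"

    def recall(self, key: str) -> str:
        """Pull offloaded content back into context only when it's needed."""
        return (self.root / f"{key}.txt").read_text()

pad = Scratchpad()
pointer = pad.offload("search_results", "...imagine thousands of tokens of raw tool output...")
print(pointer)  # the pointer, not the payload, is what the LLM sees
```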
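
And a similarly minimal sketch of context reduction: once a chat transcript blows past a token budget, older turns get collapsed into a single summary message. The `summarize` stub stands in for a real LLM call (Cognition’s actual compression setup isn’t public as far as I know), and the budget and chars-per-token numbers are arbitrary assumptions.

```python
# A minimal sketch of context reduction over a chat-style message list.
def summarize(messages: list[dict]) -> str:
    # Stand-in for a real LLM call (e.g. a dedicated compression model);
    # here it just truncates and joins so the sketch stays runnable.
    return " | ".join(m["content"][:80] for m in messages)

def rough_tokens(messages: list[dict]) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def compress_context(messages: list[dict],
                     budget: int = 2000,
                     keep_recent: int = 4) -> list[dict]:
    """Collapse all but the most recent turns into one summary message
    once the transcript exceeds the token budget."""
    if rough_tokens(messages) <= budget or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system",
               "content": f"Summary of earlier turns: {summarize(old)}"}
    return [summary] + recent
```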

Still very early but looking to spend a lot more time here!

👋 I’m a Researcher at Work-Bench, a Seed-stage, enterprise-focused VC fund based in New York City. Our sweet spot for investment at Seed aligns with building out a startup’s early go-to-market motions.