OpenAI Dev Day was one to remember. A team with a seemingly infinite product roadmap and the shipping velocity to turn that potential energy kinetic in compressed periods of time, OpenAI once again wowed us with announcements and launches. Three main things stood out to me: the Apps SDK, AgentKit, and Codex.
Apps SDK: built on MCP, the Apps SDK lets applications be discovered and used within ChatGPT, in the flow of conversation itself. It’s a win-win for OpenAI and developers alike: ChatGPT gets stronger and more powerful as more applications become “callable” via the Apps SDK, while those apps get distribution on one of the most ubiquitous applications out there today. One question on my mind: will other application-layer companies with powerful distribution (Cursor, Decagon, etc.) follow suit and build their own version of an Apps SDK? This would establish a bat signal for MCP servers to “register” themselves as invokable within a given application, increasing their distribution with as few lines of code as possible (while expanding the application’s backend with zero marginal work), as the sketch below suggests.
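To make the “few lines of code” claim concrete, here’s roughly what exposing a tool from an MCP server looks like with the official Python SDK’s FastMCP helper. This is a minimal sketch only: the Apps SDK layers ChatGPT-specific discovery and UI rendering on top of plain MCP, which isn’t shown here, and the `order_lookup` tool is a hypothetical example.

```python
# Minimal MCP server sketch using the official Python SDK (`pip install mcp`).
# The "order_lookup" tool is a made-up example, not any real app's API.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("acme-store")  # the server name a client (e.g. ChatGPT) sees

@mcp.tool()
def order_lookup(order_id: str) -> str:
    """Look up the status of an order by its ID."""
    # A real server would call your backend here; hardcoded for the sketch.
    return f"Order {order_id}: shipped"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```

Once a server like this is registered with a host application, the model can discover and invoke the tool mid-conversation, which is the distribution unlock described above.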
AgentKit: OpenAI launched this as a complete set of building blocks to help teams build, deploy, and optimize agentic workflows. It includes a visual canvas for creating these flows (it feels like a no-code workflow builder), a simple, embeddable chat interface, and evals. While Anthropic’s recent Agent SDK release takes the shape of an orchestration platform and feels more geared towards developers, AgentKit feels positioned for the mass market, collapsing the technical threshold required to create agents (a code-first point of comparison is sketched below).
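For a sense of what that canvas abstracts away, here is the code-first equivalent using OpenAI’s Agents SDK (the `openai-agents` Python package). A minimal sketch under assumptions: the agent’s name and instructions are illustrative, and an `OPENAI_API_KEY` is assumed to be set in the environment.

```python
# Minimal agent sketch with OpenAI's Agents SDK (`pip install openai-agents`).
# The agent definition below is illustrative, not from any real product.
from agents import Agent, Runner

support_agent = Agent(
    name="Support Agent",
    instructions="Answer customer questions concisely and politely.",
)

# Run one turn synchronously and print the agent's final answer.
result = Runner.run_sync(support_agent, "Where is my order?")
print(result.final_output)
```

Even this trivial example involves package installs, API keys, and a runtime, which is exactly the threshold a visual canvas collapses for non-developers.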
Codex: OpenAI announced the general availability of Codex, their software engineering agent. Codex not only works in the IDE/terminal, but can also be invoked via Slack thanks to a new integration. What’s most astounding is how much OpenAI uses Codex internally: nearly every OpenAI PR goes through a Codex review, and engineers complete 70% more pull requests per week thanks to Codex. AI being used to build more AI, at both the infrastructure level (LLMs used for memory compression, LLM-as-a-judge in evals, etc.) and the team level (agent engineers using AI), is fascinating to see, and the self-reinforcing flywheel only speeds up over time.
How will these hit the market? What does competition look like? The coding agent market in particular has two of the preeminent model labs attacking it in full force, in co-opetition with other players enabled by their very models. AgentKit feels simpler than a LangChain, but that abstraction trades off robustness; it will be interesting to see which use cases lend themselves to AgentKit and which to competing platforms. The interesting question now isn’t whether these tools work, but who captures the downstream value when they do.
👋 I’m a Researcher at Work-Bench, a Seed-stage, enterprise-focused VC fund based in New York City. Our sweet spot for investment at Seed is when startups are building out their early go-to-market motions.