Work-Bench | Work-Bench AI Snapshot H1'26

In the first half of 2026, software has started to build itself. Below is our mid-year view on where enterprise AI stands right now and where it's headed.

Join us July 1 at 1pm ET as we walk through our Work-Bench AI Snapshot report live. Sign up here. Full deck here.

Four years ago, ChatGPT felt like magic. Today, software is starting to build itself. The pace of change has been staggering: AI went from answering questions, to writing code, to completing entire workflows with minimal human involvement. The gap between thinking and doing is shrinking. Agents are escaping the chatbox and becoming coworkers, operators, and increasingly, users of software themselves.

For founders, operators, and investors alike, many of the assumptions that defined the SaaS era are being rewritten in real time. As early-stage investors in enterprise software, we're constantly studying these shifts. We don't pretend to know what AI will look like six months from now, but we can be students of the present. This report is our attempt to capture the state of AI in June 2026: what's happening at the frontier, what patterns are emerging, and where we believe the most important opportunities and challenges are beginning to take shape.

Here are a few themes:

The Author of Code Has Shifted from Human to Machine

When early AI coding products like GitHub Copilot came to market, it was obvious that this would be something big, yet impossible to know how fast the change would occur. In the early days of AI coding, the LLM was an accelerant to a developer's workflow; now the LLM is the developer.

This has also changed the form factor of AI products. We now have "software factories": sets of coding agents working asynchronously in the background to create software. The behavior change this has inspired is extraordinary! Engineers will send a Slack message to the coding agent describing a change they want while cooking dinner, and by the time they sit down to eat, the change has been made.

While it's still very early days, the amount of engineering activity already routed through these software factories is astounding: 57% of Ramp's PRs were written by its background agent Inspect, while Stripe's Minions and Spotify's Honk both merge over 1,000 PRs every week.

That said, shipping faster has created real reliability questions. An AI agent destroyed one company's production data and admitted to it. The industry is still figuring out how to have both speed and trust.

The Harness

The last few years have shown us that while the model is important, the model alone isn't enough for great applications. The harness, which is everything outside of the model, is critical too.

One can think of the relationship between the model and the harness as similar to that between the semiconductor and the operating system. LLMs are very much like semiconductors: horizontal pieces of technology that can be used for a wide variety of purposes. It's the harness that makes them production-usable and translates their capabilities into specific tasks.

What this implies for the longer arc: when frontier models commoditize, the harness is what matters. The moat moves from the intelligence to the scaffolding around it.
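To make "everything outside of the model" concrete, here is a minimal sketch of a harness as a control loop wrapped around an interchangeable model. Everything in it is an illustrative assumption rather than any vendor's design: the `Harness` class, the `CALL <tool> <argument>` convention, and the scripted `toy_model` that stands in for a real LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a model: reads the transcript so far, returns text.
Model = Callable[[List[str]], str]


@dataclass
class Harness:
    """Everything around the model: tools, memory, and the control loop."""
    model: Model
    tools: Dict[str, Callable[[str], str]]
    max_steps: int = 5
    history: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.history.append(f"TASK: {task}")
        for _ in range(self.max_steps):
            reply = self.model(self.history)
            self.history.append(f"MODEL: {reply}")
            # Assumed convention: "CALL <tool> <argument>" requests a tool.
            if reply.startswith("CALL "):
                _, name, arg = reply.split(" ", 2)
                tool = self.tools.get(name)
                result = tool(arg) if tool else f"unknown tool {name}"
                self.history.append(f"TOOL {name}: {result}")
                continue
            return reply  # anything else is treated as the final answer
        return "stopped: step budget exhausted"


def toy_model(history: List[str]) -> str:
    """Scripted fake model: requests a tool once, then reports its result."""
    last = history[-1]
    if last.startswith("TOOL "):
        return "ANSWER: " + last.split(": ", 1)[1]
    return "CALL add 2 3"


def add(arg: str) -> str:
    a, b = arg.split()
    return str(int(a) + int(b))


harness = Harness(model=toy_model, tools={"add": add})
result = harness.run("What is 2 + 3?")
print(result)
```

In this sketch the `model` argument can be swapped without touching the tools, the step budget, or the transcript, which is the sense in which the scaffolding, not the model, carries the product-specific work.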

The Token-Maxxing Party

There's a really good mental framework called Goodhart's Law: when a measure becomes a target, it ceases to be a good measure. We've seen this play out with the token-maxxing phenomenon in AI. People who experiment with new technology early in a paradigm shift tend to be better off than those who wait on the sidelines, but letting a single number (tokens) define effective AI use, and hyper-optimizing for it across an entire ecosystem, is bound to distort behavior.

What comes next looks a lot like what happened with cloud spend. It took a decade for cloud FinOps to fully mature: telemetry, attribution, cost governance, unit economics per workload. AI is speedrunning that same cycle.

The instrumentation stack for managing AI spend is starting to form!
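As a rough illustration of what attribution and unit economics per workload could look like for AI spend, the sketch below rolls token usage up into cost per completed task. The prices, model names, team names, and the `UsageEvent` shape are all made-up assumptions for the example, not real vendor pricing or any particular tool's schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Illustrative prices in USD per million tokens; NOT real vendor pricing.
PRICE_PER_MTOK = {
    "small-model": {"input": 1.0, "output": 4.0},
    "large-model": {"input": 10.0, "output": 40.0},
}


@dataclass(frozen=True)
class UsageEvent:
    """One hypothetical telemetry record emitted by an AI workload."""
    team: str
    workload: str
    model: str
    input_tokens: int
    output_tokens: int
    tasks_completed: int


def event_cost(e: UsageEvent) -> float:
    """Dollar cost of one event under the assumed price table."""
    p = PRICE_PER_MTOK[e.model]
    return (e.input_tokens * p["input"] + e.output_tokens * p["output"]) / 1_000_000


def cost_per_task(events: Iterable[UsageEvent]) -> Dict[Tuple[str, str], float]:
    """Attribute spend to (team, workload) and divide by tasks completed."""
    cost: Dict[Tuple[str, str], float] = defaultdict(float)
    tasks: Dict[Tuple[str, str], int] = defaultdict(int)
    for e in events:
        key = (e.team, e.workload)
        cost[key] += event_cost(e)
        tasks[key] += e.tasks_completed
    return {k: cost[k] / tasks[k] for k in cost if tasks[k] > 0}


events = [
    UsageEvent("support", "ticket-triage", "small-model", 2_000_000, 500_000, 400),
    UsageEvent("support", "ticket-triage", "small-model", 1_000_000, 500_000, 200),
    UsageEvent("platform", "code-review", "large-model", 1_000_000, 250_000, 10),
]
unit_costs = cost_per_task(events)
for key, value in sorted(unit_costs.items()):
    print(key, round(value, 4))
```

In this toy data the support workload burns more than three times as many tokens as the platform workload yet costs far less per task, which is why a raw token count makes a poor target on its own.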

OpenClaw and CUA

OpenClaw was a watershed moment for AI. It was always on, lived wherever the user wanted to reach it (WhatsApp, Slack, iMessage, etc.), and had access to the user's machine and files, which made it a complete expression of what's possible when agents break out of the chatbox. OpenClaw became the fastest growing open source project in the history of software, now sitting at over 370K GitHub stars.

It wasn't just a product launch, it was a movement. Lobster suits on the YC podcast. OpenClaw meetups throughout the country. Mac minis sold out for months. ClawCon filled a room in New York. When a piece of software generates that kind of energy, something real is happening.

But as our friends in the cybersecurity world know, every expansion in technological capability brings an expansion in the attack surface. Computer use agents can click on a screen and drive actions like a human would, without needing an API, which leaves IT with a visibility gap. It rhymes with the early days of SaaS, when business users could adopt cloud software without direct IT approval. In the cloud era, that gap led to the rise of CASB, with great companies such as Netskope and Zscaler, and we expect this one to produce some great AI security companies as well.

SaaS Was Built for Humans and Agents Change All of It

Every layer of the SaaS model assumes a human on the other end. Features get designed for screens, business models get built on seats, and distribution works because a person discovers, evaluates, and buys.

Agents break all of it. The user is now a human, an agent, or both. The moat is no longer stored data but accumulated context: the memory of workflows, preferences, and behavior the agent builds over time. Distribution changes too: an agent, not a person clicking through a website, discovers or composes the product.

Companies that haven't thought about what their product looks like to an agent (whether the agent can find it, transact without a human, and operate it as infrastructure) are going to be caught flat-footed.

Exciting Tech at The Frontier

One theme we're fascinated by is the emergence of continual learning: the ability of an AI system to improve from its own experience after training ends, the way an employee gets better at a job by doing it. While continual learning has traditionally been thought of as a property of the model, we're beginning to see the early innings of it being applied at the harness layer as well.

Case in point: Harvey has started running an autoresearch loop that enables its harness to self-improve. After each run, an LLM-as-a-judge system grades performance, and hypotheses about why failures happened drive edits to the harness. This led the harness to improve in emergent ways that the Harvey team couldn't have predicted.
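The general shape of such a loop (run, grade, hypothesize, edit, keep the edit only if the score improves) can be sketched as follows. This is a toy under loud assumptions and not Harvey's implementation: the harness is reduced to a set of enabled rules, and the judge and the hypothesis step are deterministic stand-ins for what would be LLM calls.

```python
from typing import List, Set, Tuple

HarnessConfig = Set[str]  # hypothetical: a harness reduced to a set of enabled rules
Case = Tuple[str, str]    # (input, rule the task needs in order to pass)


def run_and_grade(config: HarnessConfig, cases: List[Case]) -> Tuple[float, List[Case]]:
    """Stand-in for 'run the agent, then have a judge grade it'.
    A case passes only if the rule it needs is enabled."""
    failures = [c for c in cases if c[1] not in config]
    score = 1 - len(failures) / len(cases)
    return score, failures


def propose_edit(failures: List[Case]) -> str:
    """Stand-in for the hypothesis step: blame the most common missing rule."""
    needed = [rule for _, rule in failures]
    return max(sorted(set(needed)), key=needed.count)


def improve(config: HarnessConfig, cases: List[Case],
            max_rounds: int = 5) -> Tuple[HarnessConfig, List[float]]:
    config = set(config)
    score, failures = run_and_grade(config, cases)
    history = [score]
    for _ in range(max_rounds):
        if not failures:
            break
        candidate = config | {propose_edit(failures)}
        new_score, new_failures = run_and_grade(candidate, cases)
        if new_score <= score:
            break  # reject edits that do not measurably help
        config, score, failures = candidate, new_score, new_failures
        history.append(score)
    return config, history


cases = [
    ("contract A", "cite-sources"),
    ("contract B", "cite-sources"),
    ("memo C", "check-dates"),
    ("memo D", "use-plain-language"),
]
final_config, scores = improve({"use-plain-language"}, cases)
print(scores)
print(sorted(final_config))
```

The design choice worth noting is the acceptance gate: an edit survives only if the graded score goes up, so the eval set and the record of accepted edits accumulate no matter which model sits underneath.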

This points toward what we think is the most interesting question in enterprise AI right now. What does it mean to own your loop? The workflows your team runs, the evals you build, the institutional memory you accumulate. That context is yours. When models commoditize, the companies that have been quietly compounding their loop will have built something that travels with them regardless of which model they use.