Everyone knows this pain with LLMs: change one word in a prompt and the output shifts entirely.

The non-deterministic nature of LLMs is both a feature and a bug. Monolithic prompting systems are fragile not only to micro-changes in prompt wording, but also to changes in the underlying models (what worked for GPT-4 breaks for Claude 4.5). Each new model release means rebuilding your entire system prompt from scratch.

What if prompting worked like containers (build once, run anywhere) and prompts could improve themselves over time?

That’s the promise of DSPy (Declarative Self-Improving Python). It turns prompting into programming. Two key concepts anchor DSPy’s language: signatures and modules.

– Signatures declare the intended input→output behavior (e.g. question→answer)

– Modules define reusable strategies (Chain of Thought, ReAct, Multi-Chain Comparison, etc.)

But the real magic is the self-improving loop. DSPy uses evals and optimizers to measure what “good” looks like, then automatically adjusts prompts to maximize performance, without needing to touch model weights.
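A toy illustration of that loop’s shape (this is plain Python, not DSPy’s actual optimizer machinery — DSPy’s optimizers like BootstrapFewShot and MIPROv2 do something far richer — and `fake_model` is a stand-in so the sketch runs offline):

```python
def exact_match(prediction: str, gold: str) -> float:
    """Metric: defines what 'good' looks like for one example."""
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0

def evaluate(model, prompt_template: str, devset) -> float:
    """Average metric score of one prompt variant over a labeled dev set."""
    scores = [exact_match(model(prompt_template.format(q=q)), gold)
              for q, gold in devset]
    return sum(scores) / len(scores)

def optimize(model, candidates, devset) -> str:
    """Pick the best-scoring prompt variant -- no weight updates involved."""
    return max(candidates, key=lambda p: evaluate(model, p, devset))

# Stand-in "model": answers cleanly unless the prompt asks it to hedge.
def fake_model(prompt: str) -> str:
    if "verbose" in prompt:
        return "I think maybe Paris?"
    return "Paris" if "France" in prompt else "Unknown"

devset = [("What is the capital of France?", "Paris")]
candidates = [
    "Answer concisely. {q}",
    "Be verbose and hedge a lot. {q}",
]
best = optimize(fake_model, candidates, devset)
```

The same feedback structure scales up: a metric scores outputs, an optimizer searches over prompt variants (and few-shot demonstrations), and the winning program is what you ship.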

With DSPy, prompts become a first-class building block. Compound AI systems are here to stay, and these reasoning programs are essentially the application logic of those systems in this new tech era.

If you’re in NYC and working with DSPy (or building novel ways to harness LLMs), I’d love to connect.

👋 I’m a Researcher at Work-Bench, a Seed-stage, enterprise-focused VC fund based in New York City. Our sweet spot at Seed is helping startups build out their early go-to-market motions.