Reinforcement Learning’s Last Mile Problem

We are thrilled to announce our latest investment as we lead Runhouse’s $5M Seed round, with participation from Hetz Ventures and Fathom Capital.

As AI labs increasingly rely on synthetic data to improve their models for high-value enterprise tasks, leading Fortune 500s are recognizing the immense value of their proprietary data for training powerful, domain-adapted AI agents (“post-training”). However, while reinforcement learning (RL) methods for agentic training have matured, the biggest barrier to mobilizing that data is lagging ML infrastructure. Many organizations still rely on outdated tools like AWS SageMaker and Kubeflow, which cannot deliver the speed, customization, and fault tolerance that modern machine learning (ML) and RL workloads require.

Runhouse is bringing AI post-training out of the lab and into production for enterprises and AI-native teams. Their flagship product, Kubetorch, lets ML teams build and scale dramatically better models on their proprietary data to power the next generation of AI use cases.

Why Now

Enterprises are reaching a turning point in how they build and adopt AI. Frontier labs like OpenAI still dominate large-scale pretraining, creating foundation models with hundreds of thousands of GPUs. However, real differentiation is shifting to post-training, where models are adapted with proprietary data and simulation environments to excel at high-value agentic tasks.

DeepSeek released its groundbreaking R1 model in January 2025, built with novel RL training methods. Subsequently, the accuracy of AI coding agents more than doubled. Now, post-training is being applied to power the next wave of innovation across domains like law, finance, healthcare, life sciences, and mechanical engineering, where even the largest models still struggle to reach production-level accuracy.

However, unlike coding models or general-purpose language tasks, the data and workflows in specialized enterprise domains are entirely proprietary. Frontier AI labs try to bridge this gap by buying synthetic datasets from providers like Scale AI or Mercor, but this approach is costly and limited. Meanwhile, enterprises already sit on the data needed to bring agentic AI to production quality. Investment banks, for example, have decades of ground-truth financial models, portfolios, and back-office tasks to train “Cursor for Finance.” As RL methods become simpler and more reliable, the best path forward is training first-party models, not praying for an AI lab miracle.

The Problem

Even as reinforcement learning shows enormous promise, a critical barrier remains: there is no developer platform robust enough to support the post-training methods ML and data science teams want to use. RL, for instance, is a heterogeneous workload that combines distributed training, inference, and evaluation in a single loop; most existing ML platforms, whether cloud-based like SageMaker and Vertex AI or open-source like Kubeflow, were not designed to handle it.
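To make that heterogeneity concrete, below is a deliberately toy, self-contained sketch (plain NumPy on a bandit problem, not any particular RL library) of the three phases a single post-training iteration interleaves. The numbers and reward function are made up for illustration; in production, each phase typically runs on different distributed infrastructure.

```python
# Toy sketch of why RL post-training is heterogeneous: every iteration
# interleaves inference (generating rollouts), evaluation (scoring them),
# and training (a policy-gradient update on the policy parameters).
import numpy as np

rng = np.random.default_rng(0)
num_actions = 4
logits = np.zeros(num_actions)                 # the "policy" being post-trained
true_reward = np.array([0.1, 0.3, 0.9, 0.2])   # stand-in for a reward model / environment

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(200):
    probs = softmax(logits)

    # 1) Inference: sample a batch of rollouts from the current policy.
    actions = rng.choice(num_actions, size=32, p=probs)

    # 2) Evaluation: score each rollout (in practice, a reward model or env).
    rewards = true_reward[actions] + rng.normal(0, 0.05, size=32)

    # 3) Training: REINFORCE-style gradient step on the policy parameters.
    baseline = rewards.mean()
    grad = np.zeros(num_actions)
    for a, r in zip(actions, rewards):
        grad += (r - baseline) * (np.eye(num_actions)[a] - probs)
    logits += 0.1 * grad / len(actions)

print("learned action probabilities:", softmax(logits).round(3))
```

Even in this toy, the three phases want different resources: generation is throughput-bound inference, scoring may call out to a separate model or environment, and the update step is a distributed training job. Stitching those together is exactly what most ML platforms were not built to do.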

Teams also need a rapid research feedback loop, and today’s ML infrastructure slows that loop to a crawl. Every small code change can require a 30-minute wait to test, thanks to slow CI pipelines and long worker restarts. Notebooks accelerate iteration, but they cannot handle scale or workload heterogeneity, and they open a gap between research and production. No single system currently enables both interactive experimentation and scalable, reproducible execution.

As a result, even as RL research methods become battle-tested and new libraries simplify implementation, infrastructure remains a stubborn bottleneck. A proof of concept might emerge in a day, but it can take an entire quarter for that same training workload to reach production and prove itself at scale.

The Product

Runhouse solves this bottleneck with Kubetorch, a powerful distributed framework for Kubernetes, purpose-built for modern enterprise ML and reinforcement learning. Kubetorch makes building and running complex distributed workloads on cloud hardware effortless. Regular Python code runs with instant iteration, perfect reproducibility, and unlimited scale.

Built for enterprise security, Kubetorch can be deployed in just 15 minutes entirely within a VPC, whether on an on-prem cluster, on cloud GPUs, or across multiple Kubernetes clusters. For ML researchers, complex Kubernetes interactions are fully abstracted behind intuitive Pythonic APIs. For platform teams, Kubetorch’s Kubernetes-native architecture and familiar primitives make it plug-and-play with existing observability and management stacks.
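To give a sense of what “Kubernetes abstracted behind Pythonic APIs” can look like in practice, here is a minimal, purely illustrative sketch of the dispatch pattern: describe compute requirements in Python, send an ordinary function to that compute, and call it as if it were local. The module name, the `Compute` class, and the `fn(...).to(...)` calls below are assumptions made for this sketch, not a reference for Kubetorch’s actual interface (see run.house for the real API).

```python
# Purely illustrative sketch: the names below (kubetorch, Compute, fn, .to)
# are hypothetical stand-ins for the kind of Pythonic dispatch described
# above, not Kubetorch's documented API.
import kubetorch as kt  # hypothetical import for illustration

def fine_tune(model_name: str, dataset_uri: str) -> str:
    # Ordinary Python training code; nothing Kubernetes-specific lives here.
    # ... load the model, run RL post-training on the proprietary dataset ...
    return "s3://example-bucket/checkpoints/latest"  # illustrative path

if __name__ == "__main__":
    # Describe the compute the job needs in Python; the platform maps this
    # onto pods in the cluster instead of the researcher writing YAML.
    gpus = kt.Compute(gpus=4, memory="64Gi", image="pytorch/pytorch:latest")

    # Send the local function to that compute and call it like a normal
    # Python function: iteration stays interactive, execution stays remote.
    remote_fine_tune = kt.fn(fine_tune).to(gpus)
    checkpoint = remote_fine_tune("llama-3-8b", "s3://example-bucket/rl-data")
    print(checkpoint)
```

The point of the pattern is that the same code a researcher iterates on interactively can be dispatched, unchanged, to production-scale compute.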

Kubetorch is already powering production deployments at Fortune 500 companies, frontier AI labs, and AI-native startups. These teams are leveraging both proprietary data and valuable user interaction data from live agentic AI deployments, creating a continuous learning loop that drives ever more performant first-party models.

The Team

Runhouse was founded by Donny Greenberg and Josh Lewittes, two engineers who have built their careers tackling the practical challenges of machine learning systems.

Donny led product for PyTorch at Meta, where he supported hundreds of engineers building on the framework. Before that, he worked at IBM as a tech lead in quantum computing research and at Google as a product manager in large-scale ad infrastructure. Across these roles, he saw a common problem: breakthrough models and platforms were exciting, but engineers struggled when iteration slowed down at the infrastructure layer.

Josh’s path has been rooted in applied machine learning. He built fraud detection models at SecuredTouch, developed backend ML systems at KPMG, and most recently worked as a machine learning engineer at Gloat. In each of these roles he ran into the same bottleneck from a practitioner’s perspective: ideas and data were plentiful, but getting models into production was slow and cumbersome.

Together, Donny and Josh combine the platform and infrastructure expertise with the hands-on ML engineering experience needed to solve reinforcement learning’s last mile. Runhouse reflects their conviction that enterprise teams should be able to iterate on RL models with the same ease as running code locally.

Congratulations to the Runhouse Team

The Runhouse team enjoying a fun dinner of chicken poppers and Mountain Dew Baja Blasts ;)

We are proud to back Donny, Josh, and the Runhouse team as they tackle one of the most important infrastructure challenges for reinforcement learning. By removing the friction between code and compute, they are making it possible for enterprises to move from experimentation to production at the pace that modern software demands.

Runhouse is currently in open beta, and you can sign up for access at run.house. Read more about their launch on the Runhouse blog.