This post was originally published on February 15th, 2023 in The Data Source, my monthly newsletter covering the top innovations in data infrastructure, engineering, and developer-first tooling. Subscribe here and never miss an issue!
One of the investment themes I’ve been researching is the applicability of foundation models to a wide variety of tasks and the opportunity this represents for the enterprise.
Foundation Models In < 1 Minute
In recent years, we’ve seen the emergence of foundation models such as BERT, GPT-3, CLIP, and Stable Diffusion, as well as early attempts to commercialize them to power the next generation of applications. By definition, foundation models are trained with self-supervision on broad data, often spanning multiple modalities, so that they can be adapted to a wide range of downstream applications. Because a foundation model centralizes information from data across modalities, a single model can be useful for many domain-specific tasks. In fact, research has shown that foundation models incentivize homogenization (i.e., the same models are reused as the basis for many different applications). This is an important characteristic, since any improvement in one foundation model can immediately benefit every AI system built on top of it. Given this, I think there’s an opportunity to build unified toolsets for developing foundation models across a wide range of modalities.
Ongoing community efforts by Hugging Face and EleutherAI aim to simplify training foundation models at scale, but we are still a long way from widespread adoption of foundation models for domain-specific applications. While digging into this research theme, I spoke to many data, AI, and machine learning practitioners to get their perspective on what's going on with foundation models. What I uncovered is that, with the rise of foundation models and the ongoing research around them, more sophisticated solutions will emerge to deliver high-quality training data for AI applications and accelerate their development. Today, solutions such as Scale AI, MLflow, and Snorkel exist to simplify the development of AI applications, but I expect to see newer tooling emerge to support the creation and distribution of foundation models.
While there's generally been an uptick in ML talent across the enterprise, there is still a dearth of people with deep domain-specific expertise. As a result, adoption of foundation models has lagged, especially for enterprise use cases that tend to be complex and performance-critical and that require fine-tuning.
Even though the market for foundation models is still nascent, I foresee it growing. In fact, there are a few angles I've been exploring lately that could be compelling opportunities for startups. These include:
- Technology that makes foundation models applicable to enterprise use cases, closing the gap in ML adoption across the enterprise
- Technology that empowers non-ML developers to leverage sophisticated ML technology and apply it to a wide variety of enterprise-wide use cases
A Look Into Existing Challenges with Foundation Models
Amid growing market interest and a push to narrow the gap between foundation models and their adoption in the enterprise, there are important challenges that organizations will need to overcome to move the ball forward:
- Creating high-quality training data: Training data has always been a bottleneck in the ML development lifecycle. Fine-tuning a foundation model for a domain-specific task requires large volumes of domain- and task-specific labeled training data. This remains an important requirement, especially for complex and nuanced use cases. Because of the work it takes to source and label data and then build a robust end model that can be used in production, the entire process is often time-consuming and costly.
- Deploying foundation models in production: This is arguably one of the biggest technical challenges for the enterprise today, given that foundation models are large, slow, and expensive to train and run. Existing solutions such as Hugging Face, AI21 Labs, and others enable customers to host their own models, but when comparing the cost of deploying foundation models in production against the benefits a business would see from such a deployment, the net ROI is hard to justify. Newer tools in this space will need to drive down the cost of these models to make them more appealing to enterprises.
- Foundation models are risky: As is the case with many ML models, there are legal risks around security, compliance, and model risk management. Risk and governance are important mandates in the enterprise, and given that foundation models can be complex and may behave unpredictably, running them for business-critical use cases and within regulated industries is not feasible for many organizations. Furthermore, the reinforcement learning from human feedback (RLHF) loop used by ChatGPT, Bard, and others allows the model to be updated continually (potentially live in production, or via periodic pull-and-replace updates). TL;DR: the model that worked well yesterday may not work well today, which necessitates measuring and monitoring changes in risk and other metrics, a service offered by MLOps monitoring solutions such as Arthur.
What Does the Opportunity for Foundation Models Look Like In the Enterprise?
One observation from looking at the broader ML infrastructure and Ops categories has been that most existing tools cater to technical ML personas rather than non-ML personas. But that is starting to change, especially as it's getting increasingly expensive to hire for the more technical roles. Given the shifting priorities and budget considerations for the enterprise in today’s economic landscape, many decision makers are driving forward a "do more with what we have" mentality when it comes to investing in their tool stack. To this end, there's been increasing demand for automation tooling that enables product owners, business line managers, and analysts to be self-sufficient. In particular, I'm seeing renewed appetite for automation tools built for non-technical personas to help scale ML adoption across enterprises. These tools are becoming more prominent, especially as the value of AI/ML in real business use cases is starting to be proven out.
Given the existing challenges around creating and deploying foundation models cost-effectively, I see interesting opportunities at the infrastructure layer for startups that can make the compute required to train and run foundation models cheaper. Horizontal platforms that centralize compute, surface a model marketplace, and provide extensible templates for users could be an innovative way to open up foundation models to the enterprise. Hugging Face has a head start here, but I find that there’s still ample opportunity in this problem space for new startups to tackle.
Throughout my conversations with practitioners in the enterprise world, interesting use cases are surfacing in the compliance space, where foundation models could eventually serve as the basis for KYC, sanctions screening, adverse media search, and fraud detection applications. The idea is that, for organizations that don't have the infrastructure and expertise to build, train, fine-tune, and deploy sophisticated models in house, horizontal platforms would act as black boxes that produce task-specific foundation models, allowing each of these tasks to be automated. Instead of throwing massive datasets at building AI models and paying more for the required processing power, I’m interested in newer technologies that would make it possible to spin up foundation models trained on low volumes of data, essentially enabling the enterprise to do more with less data and less infrastructure. That could be game-changing for larger organizations.
There's also interest in low-code ML products that abstract away the deep expertise required to ship models. These would address a major gap between ML teams and their business counterparts, as well as unlock enterprise-wide adoption of sophisticated ML models. As ML usage in the enterprise moves beyond the traditional scope of ML problems (product recommendation, language translation, personalization, etc.) to solve broader business problems, non-technical users sitting on compliance, finance, sales, and marketing teams will be able to leverage sophisticated ML models under the hood for their own tasks.
While the enterprise is admittedly still in the early stages of ML adoption, it’s interesting to see some momentum around the applicability of foundation models for non-ML personas across the enterprise. As I keep digging into this topic, I’m looking to speak with more practitioners and builders in this category. If you’re an early-stage startup building in this space or an ML developer looking into foundation models, I’d love to hear from you!