The start of the artificial-intelligence arms race was all about going big: giant models trained on mountains of data, attempting to mimic human-level intelligence.
Now, tech giants and startups are thinking smaller as they slim down AI software to make it cheaper, faster and more specialized.
This category of AI software—called small or medium language models—is trained on less data and often designed for specific tasks.
The largest models, like OpenAI’s GPT-4, cost more than $100 million to develop and use more than one trillion parameters, a measurement of their size. Smaller models are often trained on narrower data sets—just on legal issues, for example—and can cost less than $10 million to train, using fewer than 10 billion parameters. The smaller models also use less computing power, and thus cost less, to respond to each query.
Microsoft has played up its family of small models named Phi, which Chief Executive Satya Nadella said are 1/100th the size of the free model behind OpenAI’s ChatGPT and perform many tasks nearly as well.
“I think we increasingly believe it’s going to be a world of different models,” said Yusuf Mehdi, Microsoft’s chief commercial officer.
Microsoft was one of the first big tech companies to bet billions of dollars on generative AI, and it quickly realized the technology was becoming more expensive to operate than it had initially anticipated, Mehdi said.
The company also recently launched AI laptops that use dozens of AI models for search and image generation. The models require so little data that they can be run on a device and don’t require access to massive cloud-based supercomputers as ChatGPT does.
Google, as well as AI startups Mistral, Anthropic and Cohere, has also released smaller models this year. Apple unveiled its own AI road map in June with plans to use small models so that it could run the software entirely on phones, making it faster and more secure.
Even OpenAI, which has been at the vanguard of the large-model movement, recently released a version of its flagship model it says is cheaper to operate. A spokeswoman said the company is open to releasing smaller models in the future.
For many tasks, like summarizing documents or generating images, large models can be overkill—the equivalent of driving a tank to pick up groceries.
“It shouldn’t take quadrillions of operations to compute 2 + 2,” said Illia Polosukhin, who currently works on blockchain technology and was one of the authors of a seminal 2017 Google paper that laid the foundation for the current generative AI boom.
Businesses and consumers have also been looking for ways to run generative AI-based technology more cheaply at a time when its returns are still unclear.
Because they use less computing power, small models can answer questions for as little as one-sixth the cost of large language models in many cases, said Yoav Shoham, co-founder of AI21 Labs, a Tel Aviv-based AI company. “If you’re doing hundreds of thousands or millions of answers, the economics don’t work” to use a large model, Shoham said.
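Shoham’s point can be made concrete with some back-of-envelope arithmetic. The per-query prices below are purely illustrative assumptions; the only figure taken from his comments is the roughly one-sixth cost ratio between small and large models.

```python
# Illustrative per-query cost comparison. The large-model price is an
# assumed placeholder, not a quoted figure; the small model is priced
# at one-sixth of it, per the ratio cited in the article.
LARGE_COST_PER_QUERY = 0.006              # dollars; assumed for illustration
SMALL_COST_PER_QUERY = LARGE_COST_PER_QUERY / 6

def total_cost(queries: int, cost_per_query: float) -> float:
    """Total cost of answering a given number of queries."""
    return queries * cost_per_query

# "Hundreds of thousands or millions of answers": try one million queries.
queries = 1_000_000
large = total_cost(queries, LARGE_COST_PER_QUERY)
small = total_cost(queries, SMALL_COST_PER_QUERY)
print(f"Large model: ${large:,.0f}  Small model: ${small:,.0f}")
# → Large model: $6,000  Small model: $1,000
```

At a million queries, the gap between the two bills is what Shoham means by the economics not working for a large model: the absolute dollar difference grows linearly with query volume.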
The key is focusing these smaller models on a set of data like internal communications, legal documents or sales numbers to perform specific tasks like writing emails—a process known as fine-tuning. That process allows small models to perform as effectively as a large model on those tasks at a fraction of the cost.
“Getting these smaller, specialized models to work in these more boring but important areas” is the frontier of AI right now, said Alex Ratner, co-founder of Snorkel AI, a startup that helps companies customize AI models.
The credit-rating company Experian shifted from large models to small ones for the AI chatbots it uses for financial advice and customer service.
Once trained on the company’s internal data, the smaller models performed as well as large ones at a fraction of the cost, said Ali Khan, Experian’s chief data officer.
The models “train on a well-defined problem area and set of tasks, as opposed to giving me a recipe for flan,” he said.
The smaller models also are faster, said Clara Shih, head of AI at Salesforce.
“You end up overpaying and have latency issues” with large models, Shih said. “It’s overkill.”
The move to smaller models comes as progress on publicly released large models is slowing. Since OpenAI last year released GPT-4, a significant advance in capabilities from the prior model GPT-3.5, no new models have been released that make an equivalent jump forward. Researchers attribute this to factors including a scarcity of high-quality, new data for training.
That trend has turned attention to the smaller models.
“There is this little moment of lull where everybody is waiting,” said Sébastien Bubeck, the Microsoft executive who is leading the Phi model project. “It makes sense that your attention gets diverted to, ‘OK, can you actually make this stuff more efficient?’”
Whether this lull is temporary or a broader technological issue isn’t yet known. But the small-model moment speaks to the evolution of AI from science-fiction-like demos to the less exciting reality of making it a business.
Companies aren’t giving up on large models, though. Apple announced it was incorporating ChatGPT into its Siri assistant to carry out more sophisticated tasks like composing emails. Microsoft said its newest version of Windows would integrate the most recent model from OpenAI.
Still, both companies made the OpenAI integrations a minor part of their overall AI package. Apple discussed it for only two minutes in a nearly two-hour-long presentation.
Write to Tom Dotan at tom.dotan@wsj.com and Deepa Seetharaman at deepa.seetharaman@wsj.com