Compact Language Models Take Center Stage as Big AI Systems Battle High Energy Demands
Large language models owe much of their broad capability to the enormous number of parameters, the adjustable values tuned during training, that they contain. Leading organizations such as OpenAI, Meta, and DeepSeek have built systems with hundreds of billions of parameters, which help the models capture subtle patterns and relationships in data and improve their performance and accuracy across a wide variety of applications.
Training such massive systems carries a significant price tag, demanding computational resources and financial investments that few organizations can manage. Reports indicate, for example, that Google spent $191 million to train its Gemini 1.0 Ultra model. Large models are also costly to run: each time one processes a request, it consumes a considerable amount of energy. The Electric Power Research Institute estimates that a single ChatGPT query uses about ten times as much energy as a standard Google search.
Researchers have begun to explore more compact language models in response to these costs. Companies including IBM, Google, Microsoft, and OpenAI have recently released systems that use only a few billion parameters. Small models are not meant to be general-purpose tools like their larger counterparts; instead, they excel at narrower tasks such as summarizing lengthy conversations, answering patient questions on health care platforms, and gathering data from smart devices. “For a lot of tasks, an 8 billion–parameter model is actually pretty good,” said Zico Kolter, a computer scientist at Carnegie Mellon University.
An additional advantage is that these smaller systems can run on everyday hardware such as laptops and cell phones. The definition of “small” is not fixed, but recent designs usually top out at around 10 billion parameters. This reduced size makes them far easier to deploy in settings without access to large data centers.
Methods for training these smaller models have also been refined to improve efficiency and output quality. Instead of relying on messy, raw Internet data, researchers have large models generate refined training sets of much higher quality for the smaller ones. The approach, known as knowledge distillation, transfers what a large model has learned to a smaller one, much as a teacher instructs a student. As Kolter explained, “The reason [SLMs] get so good with such small models and such little data is that they use high-quality data instead of the messy stuff.”
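To make the teacher-student idea concrete, here is a minimal sketch of the classic soft-label form of knowledge distillation, written in PyTorch: the student is trained to match the teacher's softened output distribution. The article describes a data-level variant, in which the large model generates a curated training set, so this is only an illustration of the general principle rather than any particular company's recipe; the tensor shapes and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target loss: push the student's output distribution toward the
    teacher's softened distribution (the classic knowledge-distillation setup).
    The temperature of 2.0 is an arbitrary choice for illustration."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the teacher and student distributions, scaled by
    # T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, soft_targets,
                    reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 examples over a 10-token vocabulary. In practice the
# logits would come from a large teacher model and a small student model.
teacher_logits = torch.randn(4, 10)
student_logits = torch.randn(4, 10, requires_grad=True)

loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student side
print(f"distillation loss: {loss.item():.4f}")
```

In a real training loop, a loss like this is typically mixed with the ordinary next-token prediction loss on the high-quality text the teacher produced.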
Another way to produce a smaller model is to start with a large one and trim it down by eliminating redundant components, a technique called pruning. Pruning removes the parts of a neural network that contribute little to its performance, streamlining its dense web of interconnected weights. The idea is inspired by the human brain, which becomes more efficient over time by cutting back unneeded synaptic connections. Yann LeCun, now at Meta, introduced the concept in a 1989 paper arguing that up to 90 percent of the parameters in a trained network could be removed without harming its performance, an approach he called “optimal brain damage.” Pruning lets researchers tailor a language model to a specific task or operating environment.
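As a rough sketch of what pruning looks like in practice, the example below applies simple magnitude pruning from PyTorch's torch.nn.utils.prune utilities to a toy network, zeroing out the weights with the smallest absolute values. LeCun's “optimal brain damage” ranks parameters using second-order (curvature) information rather than raw magnitude, so read this as the simplest relative of that idea, not a reproduction of it; the layer sizes and the 60 percent pruning ratio are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A tiny stand-in network; a real language model would be vastly larger.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))

# Magnitude pruning: zero out the 60% of weights with the smallest absolute
# value in each Linear layer. This is a cruder criterion than second-order
# "optimal brain damage" scoring, but the effect is similar: connections that
# contribute little are removed.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.6)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"zeroed {zeros} of {total} parameters ({100 * zeros / total:.1f}%)")
```

After pruning, a network is usually fine-tuned briefly so the remaining weights can compensate for the ones that were removed.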
Compact language models are also attractive to researchers who want to experiment with new ideas in artificial intelligence. Because they are less complex, their individual reasoning steps can be easier to trace, offering clearer insight into how these systems operate. “If you want to make a new model, you need to try things,” said Leshem Choshen, a research scientist at the MIT-IBM Watson AI Lab. “Small models allow researchers to experiment with lower stakes.” That lowers the cost of testing new theories and techniques, making research in language modeling more accessible.
Massive models, with their ever-growing parameter counts, will continue to play a critical role in wide-ranging applications such as conversational agents, image generators, and pharmaceutical research. But in many cases, a compact model built for a specific task can deliver comparable results at a fraction of the computational and financial cost. “These efficient models can save money, time, and compute,” Choshen added. For users and researchers aiming to optimize performance without incurring excessive costs, such systems offer a practical alternative.