This year, we saw remarkable advancement in the large language model (LLM) marketplace: developmental breakthroughs in model size, complexity, and application, alongside a massive increase in market penetration. Consider the state of LLMs two years ago. When ChatGPT launched in 2022, the term “large language model” was not part of the common vernacular, much less something the average person used in day-to-day life.
In addition to the sheer growth in training data, models' complexity and capabilities have increased dramatically since they became publicly available. Complementing this rapid expansion are innovations in model structure and efficiency, like the emergence of Mixture-of-Experts and “mini” models, that enable new capabilities and applications. Let's explore the growth trends, multimodality, and open-source models shaping the LLM industry, as well as the critical role of cloud providers in supporting them.
A New Era of Model Training
LLMs have recently reached unprecedented scale in parameter count and context length. A trillion-parameter (1T) model was unheard of just a few years ago. To contextualize this growth, GPT-1 had 117 million parameters, while GPT-4 was reportedly trained with 1.8 trillion, a roughly 15,000x increase in about five years. Today, many models meet or exceed the 1T-parameter threshold. In February of this year, SambaNova released Samba-1, a one-trillion-parameter model for the enterprise, setting the tone for 2024 and putting the “large” in large language model. The increased parameter count was not the only paradigm shift this year; context length also got a massive boost with seemingly every successive model launch.
Increasing context length while decreasing model size is an interesting trend within the LLM marketplace today. Magic’s LTM-2-Mini, for example, offers a 100-million-token context window, the equivalent of analyzing roughly 750 novels simultaneously, in a model compact enough for desktop use. Other examples of these smaller LLMs are the recently released GPT-4o mini, Llama 7B, and Mistral 7B.
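For a rough sense of these numbers, here is the back-of-envelope math behind the figures above, a quick sketch in Python; the novel length and tokens-per-word ratio are assumed averages for illustration, not official specifications:

```python
# Illustrative math only; per-novel token counts are assumed averages.
tokens_per_novel = 100_000 * 1.3            # ~100k words per novel at ~1.3 tokens/word
context_window = 100_000_000                # LTM-2-Mini's reported 100M-token context
print(f"Novels per context window: {context_window / tokens_per_novel:,.0f}")  # ~769

gpt1_params, gpt4_params = 117e6, 1.8e12    # GPT-1 vs. GPT-4's reported parameter count
print(f"Parameter growth: {gpt4_params / gpt1_params:,.0f}x")                  # ~15,385x
```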
Model training became quite an arms race in 2024, with AI labs releasing ever more capable models as the race to artificial general intelligence (AGI) heats up. More than raw power, though, efficiency in both training and inference has become paramount. With bigger price tags for compute and energy in large-scale AI cluster training, companies are paying more attention to node lifecycle management to maximize performance. The models themselves must also become more efficient, which is why there has been a trend toward compute-efficient architectures like Mixture of Experts.
Large Language Model Structure Is Diversifying
A Mixture of Experts (MoE) LLM is a significant technological breakthrough because it allows models to be pre-trained using far less compute than a traditional dense model. For each token, a routing network activates only a small subset of the “experts,” so only a fraction of the total parameter count is used to compute an answer, leading to lower latency, lower costs, and ultimately a much more efficient training process.
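To make the routing idea concrete, here is a minimal, illustrative PyTorch sketch of a top-2 MoE layer; the TinyMoE name and dimensions are hypothetical, and production MoE layers such as Mixtral's add load balancing, capacity limits, and expert parallelism:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy Mixture-of-Experts feed-forward layer with top-k routing."""
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, dim)
        weights = F.softmax(self.router(x), dim=-1)        # routing probabilities
        top_w, top_idx = weights.topk(self.top_k, dim=-1)  # keep only the top-k experts
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)    # renormalize their weights
        out = torch.zeros_like(x)
        # Only each token's chosen experts run, so most parameters stay idle per token.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e
                if mask.any():
                    out[mask] += top_w[mask, k:k + 1] * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(16, 64)
print(moe(tokens).shape)  # torch.Size([16, 64])
```

This selective activation is where the efficiency gains come from: Mixtral 8x7B, for instance, holds roughly 47B parameters in total but activates only about 13B per token.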
Mistral is one of the pioneers of this architecture, releasing their Mixture of Experts model Mixtral 8x7B almost a year ago. They also released Pixtral, their first multimodal model, this year, followed by Pixtral Large just a few days ago. These architectures signal a change in how LLMs process data efficiently, and many companies are likely to adopt similar approaches.
Multimodal models, such as OpenAI’s GPT-4o, Mistral’s Pixtral, and Runway’s Gen-3 Alpha, have also gained popularity recently. LLMs are no longer limited to text-to-text; we are seeing more text-to-image, image-to-image, and even image-to-video and video-to-video capabilities. These developments are rapidly changing the state of LLMs and producing some truly remarkable content.
The Open Source Revolution
This year, we have seen more open-source models hit the market, with huge success across the board. CoreWeave has always been a big supporter of the open-source ecosystem. In February 2022, we collaborated with EleutherAI, a grassroots collective of researchers passionate about open-sourcing research, and created GPT-NeoX-20B, the largest publicly available LLM at the time.
Since then, we have continued to work with companies that push the community to new heights. Another partner, Chai Research, operates in an open-source framework and is a major proponent of crowdsourcing and collaborative progress. Their recent release, Chai-1, is a multimodal foundation model that predicts molecular structures and promises to be a valuable asset to the field of drug discovery.
Seeing how far the open-source LLM mission has come over the last few years is amazing. Just this summer, Meta released Llama 3.1 405B, a 405-billion-parameter model that was the world’s largest publicly available model at release, a title held by a 20B model a mere two years ago. It is exciting to see how much this industry has progressed, and there are no signs of it slowing down any time soon; we expect the models of late 2025 to be more powerful and intelligent still.
LLMs Are Revolutionizing The AI Infrastructure Landscape
With LLM development progressing at light speed, it is on the cloud providers to up their game as well. Just a few years ago, a cluster size of 10,000 GPUs, all working concurrently, seemed immensely large. The scale of operations has increased at a mind-bending pace in 2024, and cloud providers have had to adjust to serve this insatiable demand for GPUs and reliable infrastructure.
In a Q3 earnings call, Meta announced that they are building a cluster of over 100,000 GPUs, and other large AI companies plan to do the same to serve their respective chatbots. For context, a cluster of that size would have represented about 20% of the entire cloud provider GPU stockpile in 2022.
These large-scale AI clusters require evolutions across all layers of cloud computing. Data centers must be equipped with the latest liquid cooling technology and networking to accommodate huge increases in server rack density. Additionally, the hardware and infrastructure must be highly performant and meticulously maintained so enterprises and AI labs can focus on what they care about: building and training models.
What’s Next for LLMs?
The marketplace for LLMs is changing rapidly, with each successive year far outpacing the previous one. Clusters of tens of thousands of GPUs have already become commonplace this year, and 100,000+ GPU clusters are on the menu for 2025. AI labs are revolutionizing what’s possible and pushing the boundaries of LLM capabilities every day.
If you thought this year was crazy, get ready for 2025. AI labs and enterprises have been the engine of massive change in the LLM space, and this progress will only accelerate in the coming year. As GPU demand grows ever larger and infrastructure requirements become increasingly complex, it will be on the cloud providers to keep up with the pace of innovation.
No one could’ve imagined how much the industry would change in 2024, and 2025 will be even more exciting. Explore the CoreWeave blog to learn more about model training and the infrastructure that powers it.