H100 clusters can be an essential stepping stone for AI startups looking to accelerate growth. That’s exactly why Trillion Labs, a startup creating foundational models for Asia, was searching for a hyperscaler partner that could provide both the computing and infrastructure it needed to expand its existing AI model.
While Trillion Labs could go to legacy providers for access to compute, its teams knew they needed more than just power to transform its solution. As a lean startup with streamlined engineering crews, Trillion Labs wanted a deep technical partnership to help support fleet maintenance and management tasks—freeing its experts to focus more deeply on model training, building, and experimentation.
That’s where our teams at CoreWeave came in. With CoreWeave’s purpose-built AI infrastructure, Trillion Labs can scale beyond its initial reach without adding excessive maintenance tasks to its plate, transforming its Korean-focused AI model into a multilingual, international LLM that can serve global users.
The challenge
As Trillion Labs set out to create cutting-edge AI models, it faced several key challenges.
Enabling speed to market was paramount. Its teams needed a cloud provider to support rapid iteration and deployment while adapting to shifting compute demands in real time. With the AI landscape evolving at breakneck speed, any delay in accessing high-performance infrastructure could slow its innovation.
Scalability was another pressing concern. What started as a Korean-focused AI model was set to expand into multiple languages, requiring a cloud platform that could seamlessly scale alongside the company's growing ambitions. The ability to ramp up compute resources quickly and efficiently would be essential for maintaining its competitive edge.
For an agile startup like Trillion Labs, operational efficiency was a critical foundation for success. The company needed to free up time and resources so its teams could focus on AI model development rather than getting bogged down by infrastructure maintenance. Managing complex cloud infrastructure in-house would divert valuable attention away from its core mission, making it essential to find a solution that could handle these demands seamlessly.
With these challenges in mind, Trillion Labs sought a cloud partner that could provide the speed, scalability, and efficiency needed to power its next-generation AI breakthroughs.
The solution
Trillion Labs first heard about CoreWeave through a strategic partnership with NVIDIA. Its teams were intrigued by CoreWeave’s status as an agile, purpose-built AI hyperscaler. They were also interested in the depth of technical knowledge evident even within CoreWeave’s sales team, a significant value add that Trillion Labs found lacking in other providers.
We were thoroughly impressed by the sales team at CoreWeave. Even in our initial discussions, they displayed an impressive level of technical knowledge. They knew the fine-tuned details of their solution and could answer our technical questions with ease.
– Jay Shin, CEO and Co-Founder
As a result, CoreWeave and Trillion Labs embarked on a technical partnership to accelerate growth and expand the AI startup's scale.
With CoreWeave, Trillion Labs received:
- GPU compute: Access to high-performance power
- CoreWeave Distributed File Storage: Highly performant storage with speed and scale
- FleetOps and CloudOps: High-quality support
- CoreWeave Mission Control: Automated health monitoring and reduced downtime
Trillion Labs deployed 40 custom-configured nodes, running a total of 320 H100 GPUs, along with CoreWeave Distributed File Storage and CPUs on demand. Its teams used 100TB of Distributed File Storage, a highly performant, horizontally scalable, disaggregated NFS storage layer that scales up to 76 per cluster and sees speeds up to 1GB/s per GPU.
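The figures quoted above can be sanity-checked with some simple arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, it treats "up to 1GB/s per GPU" as a best-case ceiling rather than a measured rate, and it assumes decimal units (1 TB = 1,000 GB).

```python
# Figures quoted in the case study.
NODES = 40
TOTAL_GPUS = 320
STORAGE_TB = 100
PEAK_GB_S_PER_GPU = 1.0  # "up to 1GB/s per GPU": an upper bound, not a measurement


def cluster_summary(nodes, total_gpus, storage_tb, peak_gb_s_per_gpu):
    """Derive per-node GPU count and best-case storage throughput (hypothetical helper)."""
    gpus_per_node, remainder = divmod(total_gpus, nodes)
    if remainder:
        raise ValueError("GPUs are not evenly divided across nodes")

    # Best case: every GPU reads at the quoted peak rate at the same time.
    aggregate_peak_gb_s = total_gpus * peak_gb_s_per_gpu

    # Assumption: decimal units, so 1 TB = 1,000 GB.
    full_read_seconds = storage_tb * 1000 / aggregate_peak_gb_s

    return {
        "gpus_per_node": gpus_per_node,
        "aggregate_peak_gb_s": aggregate_peak_gb_s,
        "full_read_seconds": full_read_seconds,
    }


summary = cluster_summary(NODES, TOTAL_GPUS, STORAGE_TB, PEAK_GB_S_PER_GPU)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions, the deployment works out to 8 GPUs per node, and the full 100TB could in principle be read once in a little over five minutes at peak rate. Real-world throughput depends on the workload and is not stated in this case study.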
With CoreWeave, Trillion Labs also had access to 24/7 MLOps and engineering support from our seasoned FleetOps and CloudOps teams via an active, responsive Slack channel. This channel also provided a line of communication between Trillion Labs’ teams and CoreWeave’s MLOps/Support Engineers—who could provide hands-on resolutions and insights into critical issues.
Trillion Labs also gained access to CoreWeave Mission Control, our solution that offloads cluster health management with robust monitoring and remediation practices. With proactive health monitoring, Trillion Labs did not have to manually oversee infrastructure health or expend critical time and energy on maintenance tasks. As a result, its teams could offload critical infrastructure maintenance to CoreWeave engineers and focus on improving its AI model.
Additionally, Mission Control allowed Trillion Labs to experience reduced downtime and greater resilience and reliability on CoreWeave clusters. CoreWeave’s offering included a significant pool of spare nodes for immediate hardware replacement, enabling Trillion Labs to leverage more reliable clusters and get more work done with fewer interruptions.
With our previous provider, we had to replace nodes ourselves within our own cluster. With CoreWeave, we had access to a pool of spare nodes for the cluster. I thought the proposed amount was just an exaggeration, but it was genuine and accurate.
– Jay Shin, CEO and Co-Founder
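To make the idea of a spare-node pool concrete, here is a minimal sketch of swapping unhealthy nodes for spares. It is not CoreWeave Mission Control's implementation or API: the `Cluster` class, the `remediate` method, and the node names are all hypothetical and serve only to illustrate the general pattern.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model of a cluster with a pool of spare nodes."""
    active: list[str]
    spares: list[str] = field(default_factory=list)

    def remediate(self, unhealthy: set[str]) -> list[tuple[str, str]]:
        """Swap each unhealthy active node for a spare.

        Returns (failed_node, replacement) pairs. Unhealthy nodes stay in
        place if the spare pool runs out.
        """
        swaps = []
        for i, node in enumerate(self.active):
            if node in unhealthy and self.spares:
                replacement = self.spares.pop(0)
                self.active[i] = replacement
                swaps.append((node, replacement))
        return swaps


# Illustrative node names only.
cluster = Cluster(
    active=["node-a", "node-b", "node-c"],
    spares=["spare-1", "spare-2"],
)
swaps = cluster.remediate({"node-b"})
print(swaps)
print(cluster.active)
```

The point of the pattern is that replacement capacity is already provisioned, so a failed node can be swapped out without the customer sourcing or repairing hardware.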
The results
With CoreWeave, Trillion Labs achieved the following results:
- Operational simplicity and support
- Cost efficiency
- Greater resilience
- Faster time-to-market
CoreWeave’s infrastructure addressed major roadblocks for Trillion Labs and reduced the need for manual monitoring, allowing its engineering teams to focus on innovation rather than maintenance. Additionally, access to CoreWeave experts instilled a greater sense of transparency, confidence, and accountability than Trillion Labs had previously experienced.
Having access to a highly responsive Slack channel gives Trillion Labs a strong impression of CoreWeave’s collaborative nature. We truly feel that we are in a technical partnership with CoreWeave and can rely on its experts for solutions and support.
– Jay Shin, CEO and Co-Founder
Trillion Labs also estimated an 8x cost reduction when training its 7B models. These cost savings came from Trillion Labs’ own innovation, which its teams could pursue because CoreWeave’s infrastructure let them focus solely on AI. As a result, Trillion Labs achieved the price efficiency AI startups need to grow responsibly. In addition, spare nodes and automated lifecycling made Trillion Labs’ clusters more resilient, allowing for quick recovery from job interruptions.
With reduced overhead and supportive infrastructure, Trillion Labs was able to focus its efforts and resources on improving its 7B models and offload infrastructure maintenance and management tasks to our teams at CoreWeave. As a result, Trillion Labs was able to greatly accelerate its time to market and reliably scale its model to serve global users.
Greater scale and faster time-to-market—all at an affordable price
Trillion Labs is committed to building the most advanced AI models for Korea and beyond. Its goal is to bring state-of-the-art large language models to Korean users while expanding into additional languages to build a Korea-based ChatGPT.
Trillion Labs’ partnership with CoreWeave showcases how AI startups can achieve scalability, resilience, and cost-efficiency by choosing the right cloud provider built for their specific needs. With CoreWeave, Trillion Labs can continue to scale its model to greater heights and expand its international reach.
At CoreWeave, we’re dedicated to tailoring our solutions directly to our clients’ needs. Learn more about what our platform and partnership can do for you here.
About Trillion Labs
Trillion Labs makes foundational models for Asia, with an initial focus on a Korean-language model. Founder Jay Shin was inspired to establish Trillion Labs after experiencing first-hand the roadblocks and inefficiencies that larger enterprises can face when building their own international LLMs.
About CoreWeave
CoreWeave is the AI Hyperscaler™, delivering a cloud platform of cutting-edge software powering the next wave of AI. The company's technology provides enterprises and leading AI labs with the most performant and efficient cloud solutions for accelerated computing. Since 2017, CoreWeave has operated a growing footprint of data centers covering every region of the US and across Europe. CoreWeave was ranked as one of the TIME100 most influential companies of 2024.