Client Case Study

How AI Dungeon’s Cloud Adventure with CoreWeave Improved 3 Critical KPIs

who

AI Dungeon uses AI and Machine Learning to generate unlimited, endlessly customizable text adventures for millions of users around the world.

what

AI Dungeon is on fire, expanding from hundreds of users at launch to millions of users and climbing fast. Exponential growth led to soaring costs, frequent outages and slowdowns for users.

wow

With a seamless migration to CoreWeave, AI Dungeon was able to cut spin-up time and inference latency by 50% and lower computing costs by 76%.

Introduction: Meet the Dungeon Master

“If you've played Dungeons and Dragons, AI Dungeon is a little bit like having a Dungeon Master in your pocket,” says Alan Walton, co-founder and chief technology officer of Latitude, the maker of the AI-powered text adventure game AI Dungeon.

Alan and his brother, co-creator Nick Walton, grew up in the perfect household for incubating a project like AI Dungeon. Their father was children’s book author Rick Walton, their mother was a programmer and their grandfather was one of the early managers of Bell Labs, back when they were building Unix and the C programming language.

The merger of technology and storytelling is right there in their DNA.

Dungeons and Dragons wasn’t a game that Alan and Nick played as kids, but they were always curious to try it, so in 2019 they gathered their brothers, sisters and spouses for a yearlong D&D adventure. They convinced a friend to sit in as Dungeon Master, and it was during this time, at a hackathon hosted by Google at BYU’s Deep Learning Lab, that lightning struck.

“I wonder what it would be like if GPT-2 was the dungeon master?” mused Nick. That night, he built a prototype – AI Dungeon Classic. He tinkered with it for a couple of months until a new, more robust version of GPT-2 arrived. Then he and Alan plugged it in and officially released AI Dungeon as Python code on the internet.

The outpouring of interest was immediate, and the game took off like wildfire.

“We had a hundred thousand players in the first week. We actually had so many people trying to play the game that just the download costs for the game alone were hitting $10,000 a day,” said Walton.

After three days of soaring costs, BYU couldn't afford to host the game any longer and had to take it down. Within 12 hours, the AI Dungeon community built a peer-to-peer network, seeded with their own bandwidth and got the game up and running again.

An incredibly passionate fanbase is what Alan and Nick are most proud of so far, and that fanbase has used AI Dungeon to create more than 60 million unique text adventures… and climbing.


Problems Arise

“AI Dungeon, by our benchmarks, is something like a hundred times more compute intensive than the most intensive AAA game on the market today. That's why there's no offline mode, right? If you want it to play an offline mode, you'd have to buy graphics cards that cost as much as like two Teslas.”

Walton’s assessment is a nod to the frustrations, planning and costs associated with running large language models like GPT-2 and GPT-3. These models demand orders of magnitude more compute resources and infrastructure than machine learning has typically required, and they are growing at an astounding rate – tenfold every six months.

Originally, Alan and Nick ran AI Dungeon with Cortex, an open source framework for rapidly spinning up ML clusters on AWS. They managed their own clusters and handled all of the auto-scaling themselves.

“We were probably down once or twice a week in some form, just because managing those workloads as they would sometimes spike a thousand percent in an hour when a streamer or YouTuber released a video, was really challenging,” said Walton.

The team had constant trouble setting their resource limits for the number of EC2 instances. They kept hitting the limits, needing to request more and anxiously waiting for those requests to go through.


CoreWeave Solution: Sharpened Focus

Nick and Alan saw a huge opportunity to have CoreWeave take on their compute infrastructure needs, with the industry's most responsive auto-scaling and access to GPUs via Kubernetes, which gives them the advantages of bare metal without the headache of managing infrastructure.

“When we saw this opportunity with CoreWeave to just take that entire body of work off our plate so that we could focus on application of these brand new, large language models, that was really attractive, especially with a small team,” said Alan.

Using a best-in-class service like CoreWeave’s fully managed Kubernetes infrastructure frees up Nick, Alan and the Latitude team to focus on building the next generation of AI games. With CoreWeave, they can easily spin up additional models to test new features – and instead of worrying about infrastructure, they can simply focus on the model itself and on the player experience.


CoreWeave Solution: Engagement + Retention

AI Dungeon has a little bit of latency built into the gameplay experience. The game is entirely open-ended, and the AI takes some time to digest user responses and harness the creativity and imagination needed to propel each story forward.

Average latency when playing AI Dungeon hovers around 10 seconds, as opposed to a typical game with latency in the milliseconds.

CoreWeave cut AI Dungeon’s latency in half, from nine or ten seconds with AWS to between three and six seconds with CoreWeave.

“This was a really nice improvement in terms of player experience,” said Walton.

As latency dropped, customers played longer and more often.

“It was really great to see that engagement went up and retention went up as a result,” said Walton.

CoreWeave Solution: The Free Tier

We began with a note about Nick and Alan’s proudest achievement – their passionate and highly engaged AI Dungeon fanbase. CoreWeave has helped that fanbase continue to connect and thrive on AI Dungeon’s free tier, something that would have been financially impossible for Alan and his team on AWS.


The Future: A Phase Change

There is a phase change right now in how entertainment is being made. Personalized experiences are a gateway to the future.

“People have been dreaming about these types of games for decades,” says Walton. “If you look at the Holodeck or Sword Art Online, or Ready Player One, there's this vision that in the future, there will be games that feel more real, where you can interact with people in realistic ways, where the actions that you take matter.”

So far, AI Dungeon has spun more than 60 million unique adventure stories, each generated dynamically for its player in real time.

CoreWeave is honored to be their partner, helping in ways large and small to deliver the future of gaming… today.

 
