Streaming Twitch Microservices Live (Previously a Monolith)
Chat went down, so now it's its own service
TL;DR
Twitch put everything on one server (video, chat, search, everything). Then they got so big that the monolith couldn't keep up.
So they migrated to microservices.
What’s wrong with using a monolith?
It's too big and slow. When everything depends on everything else, breaking one thing breaks everything.
Scaling Struggles: Monoliths can't always grow parts separately, which can be a bummer when your app's blowin' up.
Slow-Mo Changes: At first, making moves in a monolith's like speed dating, but when it gets big, making changes can feel like slogging through molasses. One tweak can wreck the entire app.
Code Chaos: In a huge monolith, it's like tryna clean up a messy room with no storage. Things get tangled, and it's hard to keep the code in check.
Twitch went from Live to Offline. What broke?
Twitch got super popular, and so did the outages.
Particularly in chat.
Here’s what happened:
The Chat Problem
When Twitch shipped their chat feature, they had 8 machines distributing the traffic. This was back in 2010, and NONE of those machines were really beefy.
That meant when someone got wildly popular and hit 20,000 viewers, all of them were being served from a SINGLE machine. The sad part is that other, smaller Twitch streamers might also be hosted on that machine and have their streams compromised.
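The article doesn't say exactly how channels were assigned to those 8 machines, but a minimal sketch of the usual static approach (hashing the channel name to pick a box) shows the problem: every viewer of a channel lands on the same machine, and any small streamer who hashes to the same box shares its fate. All names here are made up.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// machineFor statically maps a channel to one of n machines by hashing its
// name. This is deterministic: a channel with 20,000 viewers still lands on
// exactly one box, no matter how hot it gets.
func machineFor(channel string, machines int) int {
	h := fnv.New32a()
	h.Write([]byte(channel))
	return int(h.Sum32()) % machines
}

func main() {
	// Hypothetical hot channel: all of its viewers hit the same machine.
	hot := machineFor("big_streamer", 8)
	fmt.Printf("big_streamer -> machine %d (serves every one of its viewers)\n", hot)

	// Any small channel that hashes to the same machine gets degraded too.
	for _, ch := range []string{"tiny_stream_a", "tiny_stream_b", "tiny_stream_c"} {
		if machineFor(ch, 8) == hot {
			fmt.Printf("%s -> machine %d as well; collateral damage\n", ch, hot)
		}
	}
}
```

The fix isn't a smarter hash: as long as one channel's load is pinned to one machine, a single popular streamer can saturate it. That's the scaling wall that pushed chat toward its own horizontally scaled service.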
So Twitch was getting popular.
It was time to migrate to Microservices.
Project Wexit: The Microservice Migration
In 2015, Twitch started migrating their Ruby on Rails monolith to microservices written in Go.
This was great and everything, except they had a whole list of pain points pushing them to leave the monolith behind:
Using a single deploy pipeline for everyone
Using spreadsheets to coordinate staging environments
Running Puppet on one of the production boxes to test infra changes
Waiting for increasingly slow builds
Struggling to trace errors back to their owners
Slow routing due to too many API endpoints
Slow database queries
Constantly dealing with merge conflicts
(All straight from the official article)
Keeping it all LIVE
Being a big company doesn’t mean doing crazy hard things. The simple way to migrate was to replicate yourself.
Clone that service and let the whole world know.
To make the migration pain-free, Twitch set up an NGINX reverse proxy to route traffic between the old Ruby on Rails controllers and the new Go API edge services.
This meant all they needed to do to migrate was configure a single proxy in front of every request until the entire monolith was killed off.
It took a few years, but it worked.
If it works, it works.
(Links to official article and sources are available to paid subscribers. They help maintain and support this newsletter!)