“Wait this isn’t the normal newsletter format!”
Sadly, I hit the subscriber limit on the free tier of the newsletter platform I was using (ConvertKit). Byte-Sized Design reached an amazing 1,000+ subscribers! Consider this a simple example of migrating to something that scales better.
To keep this newsletter up and running, we’re now on Substack! The format is a little different but the content is just as informative. Thank you all for understanding.
TL;DR
Discord used to keep its messages in Cassandra, which scaled to billions of messages. That then became a trillion messages, with tons of latency issues when many users read the same messages from the same nodes. To fix this, Discord migrated to ScyllaDB, mainly because it has no garbage collector (it’s written in C++) and all of Discord’s other services were already on ScyllaDB.
What’s the Problem?
Every message on Discord is stored on a specific Cassandra node. Too many users reading the same message can overload that node and increase latency for every other request to it, especially when everyone is reading the same message (and hitting the same node) at the same time.
That doesn’t sound like an uncommon problem, but when you’re as big as Discord, the concurrency issues get much larger.
It gets worse. With so many messages, Cassandra has to regularly “compact” its database tables. That doesn’t work so well when users are reading from a node that’s in the middle of compacting. It’s super slow.
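To make the hot-node effect concrete, here is a minimal Python sketch. It is not Discord's or Cassandra's actual placement logic: the node names, the key format, and the hash-modulo placement in `node_for` are all assumptions made for illustration. It only shows that reads of one popular message pile onto a single node, while reads of many different messages spread out.

```python
import hashlib
from collections import Counter

# Hypothetical four-node cluster (names are made up for this sketch).
NODES = ["node-a", "node-b", "node-c", "node-d"]


def node_for(partition_key: str) -> str:
    """Map a key to exactly one node using a stable hash (simplified placement)."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def simulate_reads(keys) -> Counter:
    """Count how many reads land on each node."""
    return Counter(node_for(key) for key in keys)


# 10,000 users all read the same popular message: one node takes every read.
hot = simulate_reads(["announcements:msg-1"] * 10_000)

# 10,000 reads of different messages: the load is shared across the nodes.
spread = simulate_reads(f"channel-{i}:msg-{i}" for i in range(10_000))

print("same message:      ", dict(hot))
print("different messages:", dict(spread))
```

In the first case a single node serves all 10,000 reads, so any other request routed to that node has to wait behind them.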
Give me the Requirements!
Migrate off of Cassandra.
Migrate off of Cassandra Fast.
Migrate off of Cassandra without downtime.
What are we doing, boss?
Discord decided to move to ScyllaDB for the following reasons:
ScyllaDB has no garbage collector. It’s written in C++, which means no “stop-the-world” garbage collection pauses.
Every other service at Discord had already migrated to ScyllaDB; only Discord messages were left. It’s not a random database they would have to get used to.
It supports reverse queries, meaning you can look up messages from most recent to least recent. If you know SQL, it means “ORDER BY ASC|DESC” are both reasonably efficient.
Simplify the Design!
Discord fixed this issue by spinning up a ScyllaDB cluster and sending a copy of every write that went to Cassandra to ScyllaDB as well.
Discord took its internal database library, rewrote it in Rust, and used it to copy and migrate the existing data from Cassandra to ScyllaDB in 9 days.
Once the migration was done, ScyllaDB became the primary database and Discord no longer needed to duplicate writes.
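The three steps above (duplicate writes, backfill the old data, cut over) can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: plain dictionaries stand in for Cassandra and ScyllaDB, and the `DualWriteStore` class and its method names are hypothetical. Discord's real migrator was written in Rust and is not shown in this post.

```python
class DualWriteStore:
    """Toy dual-write migration: dicts stand in for the old and new databases."""

    def __init__(self, old: dict, new: dict):
        self.old = old          # stand-in for Cassandra
        self.new = new          # stand-in for ScyllaDB
        self.primary = "old"    # which store serves reads

    def write(self, key, value):
        # Before cutover every write goes to both stores; afterwards only the new one.
        if self.primary == "old":
            self.old[key] = value
        self.new[key] = value

    def read(self, key):
        source = self.old if self.primary == "old" else self.new
        return source[key]

    def backfill(self) -> int:
        """Copy historical rows the new store does not have yet; return how many."""
        copied = 0
        for key, value in list(self.old.items()):
            if key not in self.new:  # never overwrite a row that was dual-written
                self.new[key] = value
                copied += 1
        return copied

    def cut_over(self):
        """Make the new store primary once it holds every key from the old one."""
        missing = [key for key in self.old if key not in self.new]
        if missing:
            raise RuntimeError(f"{len(missing)} rows not migrated yet")
        self.primary = "new"


old_db = {"m1": "hello", "m2": "world"}   # data that existed before the migration
store = DualWriteStore(old_db, {})

store.write("m3", "sent during migration")  # lands in both stores
copied = store.backfill()                   # copies m1 and m2
store.cut_over()                            # new store is now primary
store.write("m4", "sent after cutover")     # lands in the new store only
```

The design point is that dual writes keep new data in sync while the backfill catches up on history, so the switch itself needs no downtime.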
(Full article is available to paid subscribers to support and build this newsletter!)