tldr;
Problem: Instagram has replaced all of their Redis database with Apache Cassandra in some of their system (such as fraud detection, feed, inbox etc). Good for scale but bad for latency.
Solution: Apache Cassandra uses Java as its backend which is undergoing latency spikes too often because of garbage collection. So Instagram just stopped garbage collection to solve the latency spikes.
💰 HELP WANTED
This newsletter has grown to 21,500 → 22,501+ AMAZING READERS. It’s grown to a scale that a single person can’t maintain all of it on their own.
If you’re interested in being a byte-sized design writer, apply here!
Flow
The adventure with the database began in 2012 at Instagram, one of the largest Apache Cassandra deployments worldwide, where it replaced Redis for vital features like fraud detection and the Direct inbox.
The clusters were relocated to Facebook's infrastructure from AWS initially. It was reliable but still struggled with latency.
Finding the Root cause with metrics
There were problems with redis but here’s what they found.
Instagram relies heavily on Apache Cassandra for key-value storage, maintaining a stringent 5–9s reliability SLA.
Active monitoring of P99 read latency ensures performance optimization across different Cassandra clusters.
Client-side latency analysis reveals an average read latency of 5ms and variable P99 read latency ranging from 25ms to 60ms.
Optimization efforts aim to minimize latency spikes and ensure consistent responsiveness for Instagram's massive user base.
Real cause found
There were spikes in JVM garbage collector (GC) delay. They found this analyzing a metric called GC stall percentage that indicated server outages that affected Cassandra's capacity to fulfill client requests.
These interruptions ranged from 1.25 to 2.5 percent during periods of low and high traffic.
The Frankenstein Solution
Instagram mitigated Apache Cassandra's JVM garbage collection problems by developing their own in-house C++ storage engine. Their storage engine replaced components like memtable and compaction to reduce Java heap object creation and JVM overhead.
Using the existing storage engine was a more feasible and time-efficient solution, the uglier option would be to use a separate storage engine.
So they decided to use the storage engine from Rocks DB to reduce garbage collection time.
RocksDB is written in C++ and is a high-performance embedded database optimized for key-value data.
It’s particularly suited for fast storage devices like SSD and is widely used as a storage engine for industry-standard databases such as MySQL and MongoDB.
Giving you the Results
Rocksandra's first version underwent extensive development and testing before successfully integrating into multiple production Cassandra clusters at Instagram.
In a notable production cluster, Rocksandra reduced P99 read latency from 60ms to 20ms and decreased GC stalls from 2.5% to 0.3%, marking a 10X reduction.
Instagram evaluated Rocksandra's performance in a public cloud setting, deploying a Cassandra cluster on AWS with three i3.8 xlarge EC2 instances, each boasting 32 cores CPU, 244GB memory, and raid0 with 4 NVMe flash disks.
References
Cassandra pluggable storage engine
🗞️ Official Article and References
Link to the official in depth article and slack community:
Subscribing and joining the community is the best way to support this newsletter!
Keep reading with a 7-day free trial
Subscribe to Byte-Sized Design to keep reading this post and get 7 days of free access to the full post archives.