Byte-Sized Design

Byte-Sized Design

System Design for Everyone: A Primer On Sharding (Ft. MongoDB)

If a database had more databases.

Byte-Sized Design's avatar
Byte-Sized Design
Sep 02, 2024
∙ Paid

Target: Pre-senior engineer

📚 tldr;

Sharding just means spliting up a database into smaller databases. You only have to do this if you can’t put everything on one database server.

🙏 Please explain “Sharding“

Sharding is a fancy word. What it actually is isn’t fancy.

Sharding

Break down a single dataset across multiple databases.

Example

Instead of putting 10,000,000 users in one database, you can break that down to 5,000,000 million users across two databases.

Yes, this is a form of horizontal scaling.

💬 When Would I Even Use This?

If your database has a ton of data and has too many requests, here are your options:

  1. Vertical Scaling - Just make the database stronger by adding more RAM, CPU, Disk.

    Except this costs $$$ to implement.

  2. Horizontal Scaling - Just add more machines to share the dataset and the work load.

    Except this gets complicated to implement.

So if you’re working on HUGE applications, knowing how to shard the database can keep you up and running.

📖 Ok, Teach me How to Shard

Distribute the Data

If sharding means breaking up a dataset, then we need to talk about how we would do that.

🏹 Ranged Based Sharding

Define how many shards you have and give each a range. If a database record (like a user id) is in that range, send it to that shard.

Shard A: User IDs [0, 20]

Shard B: User IDs [21, 40]

Shard C: User IDs [41, 50]

You can see how the key space is broken down into 4 chunks (shards) above. If a data record is in one of those ranges, it’ll be sent to that shard.

🚨 Watch Out: Make sure the ranges you’re using is equally distributed. Otherwise everything will end up on one shard and slow down performance.

🥔 Hashed Based Sharding

Just use the database record mod number of shards (% total shards)

Shard to Send To = some ID % total shards

It’s almost like range-based sharding except it makes it easier to distribute everything evenly.

🚨 Watch Out: If a record is huge and needs to be split, it can end up on a different shard. Finding that other shard can be difficult.

The other problem is that if the total number of shards needs to be added or removed, then you’ll have to rebalance all your database records.

That’s going to cause a system outage and that’s not a fun time.

🌎 Geography Based Sharding

Keep reading with a 7-day free trial

Subscribe to Byte-Sized Design to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 Byte-Sized Design · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture