Byte-Sized Design


GitHub’s Elasticsearch Problem Was Seven Years in the Making. Here’s How They Finally Fixed It

Why the right fix wasn't available until now, and what they did in the meantime.

Byte-Sized Design
Mar 16, 2026

TL;DR

GitHub Enterprise Server (GHES) runs search on Elasticsearch. It also offers High Availability through a primary/replica model. For years, those two things could not coexist cleanly. Elasticsearch would move a primary shard onto the read-only replica node. If you then took that replica down for maintenance, the system deadlocked: the replica waited for Elasticsearch to recover before it could start, and Elasticsearch couldn’t recover until the replica rejoined.
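The deadlock above is a circular wait, and it can help to see it as a tiny dependency graph. The following is a minimal sketch, not GitHub's code: the step names `replica_node_start` and `elasticsearch_recovery` and the helper `find_wait_cycle` are hypothetical labels chosen for illustration, and the model assumes each step waits on at most one other step.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_on: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow 'waits on' edges from start; return the cycle if a step repeats."""
    path: List[str] = []
    seen = set()
    node: Optional[str] = start
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        node = waits_on.get(node)  # None means this step waits on nothing
    if node is None:
        return None  # the chain ended, so nothing is stuck
    # We came back to a step already on the path: that loop is the deadlock.
    return path[path.index(node):] + [node]


# Hypothetical model of the situation described in the TL;DR:
# the replica will not start until Elasticsearch recovers, and
# Elasticsearch cannot recover until the replica has rejoined.
waits_on = {
    "replica_node_start": "elasticsearch_recovery",
    "elasticsearch_recovery": "replica_node_start",
}

cycle = find_wait_cycle(waits_on, "replica_node_start")
print(cycle)
```

Neither step is broken on its own. Each wait is reasonable in isolation, and the failure only appears once the two edges close into a loop. Removing either edge lets the walk terminate, so the helper returns `None`.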

GitHub engineers knew this was broken. They spent years trying to patch around it. A real fix only became possible once Elasticsearch shipped Cross Cluster Replication.

The fix is live in GHES 3.19.1. The lesson underneath it is older than GitHub.


The Original Sin Was a Reasonable Decision

Let’s be precise about what went wrong here, because it’s easy to read this story as “Elasticsearch bad” when the real issue is more interesting.
