TLDR;
Twitter struggled with a single Elasticsearch cluster handling everything: serving heavy search traffic while also running large backfill operations.
The fix was simple in concept: offload traffic handling and backfill requests into separate services. Problem solved.
What’s the Problem?
Twitter’s search is backed by Elasticsearch, and every request went straight to the cluster. This led to failing clusters, failing requests, and data not being backfilled correctly. The Elasticsearch cluster was overloaded, and Twitter needed a new way to handle the load.
In short, backfills are large data-change operations, like updating old database records with new fields.
Give me the Requirements!
The solution must roll out unobtrusively, with no downtime.
Backfills and search requests need to be handled during both low and peak traffic hours.
These new services must be decoupled from Elasticsearch entirely.
What are we doing?
To take this load off Elasticsearch, Twitter built the following services:
A proxy that can read and route requests to appropriate services.
A backfill service whose workers listen for new work requests and process backfills at their own pace.
An ingestion service to batch requests, throttle load, and retry failures when handling general search traffic.
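The proxy's job in this list is the simplest: inspect a request and decide which downstream service should own it. A minimal sketch of that routing decision might look like the following (request shapes and service names are my own illustration, not Twitter's actual code):

```python
def route(request: dict) -> str:
    """Decide which service should handle an incoming request."""
    if request.get("type") == "backfill":
        return "backfill-service"   # large, deferrable data rewrites
    if request.get("type") == "write":
        return "ingestion-service"  # batched, throttled writes
    return "elasticsearch"          # plain search reads pass straight through


# Usage:
print(route({"type": "backfill", "index": "tweets"}))  # -> backfill-service
print(route({"type": "search", "q": "byte-sized"}))    # -> elasticsearch
```

The key design point is that the cluster no longer sees raw traffic; everything expensive is diverted before it reaches Elasticsearch.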
Simplify the Design!
Twitter built a backfill service that saves backfill requests in temporary storage; workers pick up that work whenever they have capacity.
For the ingestion service, the proxy routes requests into a Kafka topic, and consumer workers then move the requests from the topic into Elasticsearch in batches.
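The produce-then-batch-consume flow can be sketched without any real infrastructure; here a plain list stands in for the Kafka topic and a dict for the Elasticsearch index, and every name is an assumption for illustration:

```python
topic: list = []   # stand-in for the Kafka topic
index: dict = {}   # stand-in for the Elasticsearch index


def produce(request: dict) -> None:
    """Proxy side: publish a search-traffic write onto the topic."""
    topic.append(request)


def consume_batch(batch_size: int = 100) -> int:
    """Consumer side: move up to batch_size requests from the topic into
    the index in one bulk write. Returns how many were indexed."""
    batch, remaining = topic[:batch_size], topic[batch_size:]
    for req in batch:
        index[req["id"]] = req["doc"]
    topic[:] = remaining
    return len(batch)


# Usage:
produce({"id": "tweet:1", "doc": {"text": "hello"}})
produce({"id": "tweet:2", "doc": {"text": "world"}})
print(consume_batch(batch_size=10))  # -> 2
```

Batching on the consumer side is what enables throttling and retries: a failed batch can simply stay on the topic and be retried later, without the proxy or the client ever noticing.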
See you in the next edition!
(Link to the full design article is available to paid subscribers! They help support this newsletter and maintain service costs.)