TL;DR
Quora is a huge crowdsourced question-and-answer platform.
At that size, their database query performance took a beating, so they had to come up with better caching strategies, including:
Range based caching
Making cache keys more broad
Simple strategies for big performance gains
⚡ What’s There to Optimize?
Quora has tons of data, and plenty of queries against it to optimize.
Here are two real examples we’ll go over:
Problem 1: Querying which languages a user uses. Quora supports 25 languages, so let’s query whether a user uses a specific language.
Problem 2: Querying whether a question redirects to another question (it usually doesn’t).
#1 Querying Which Languages a User Uses
Most people speak one language (myself included). This means queries like usesLanguage(user, language) will usually return false.
Even people who are multilingual probably don’t speak all 25 languages.
🐢 The Inefficient Cache Key
For this service, Quora was using the cache key signature (User_Id, Language_Id).
This means their cache size could grow to total_users * 25_languages entries. That’s a HUGE cache, especially since Quora has 300 million users.
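To put a rough number on that, here is a back-of-the-envelope sketch in Python. It only multiplies the two figures quoted in this post; the variable names are made up for illustration, and the result is a worst-case key count, not a measured cache size.

```python
# Figures quoted in this post; the names are illustrative only.
TOTAL_USERS = 300_000_000
SUPPORTED_LANGUAGES = 25

# Worst case: one cache entry for every (user_id, language_id) pair.
pair_keys = TOTAL_USERS * SUPPORTED_LANGUAGES

print(f"possible (user_id, language_id) keys: {pair_keys:,}")
```

That is up to 7.5 billion possible keys, and most of them would only ever store "false".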
Making it NOT a HUGE cache size
Most people speak one language, so querying all 25 languages for a user usually returns “false” (“user does not speak this language”) for 24 of them. With one key per (user, language) pair, those 24 out of 25 queries would usually result in a cache miss and hammer the database.
To optimize, Quora changed the cache key to just (User_Id), which returns the list of languages the user uses.
Simple yet elegant solution, good job Quora.
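Here is a minimal sketch of the per-user key idea in Python. It is not Quora's code: the class name, the in-memory dict standing in for the cache, and the fake database are all assumptions made for illustration. The point it shows is that one database query per user can answer every later "does this user use language X?" question, including the negative ones.

```python
from typing import Callable, Dict, FrozenSet


class UserLanguageCache:
    """Caches the full set of languages per user, keyed only by user_id."""

    def __init__(self, load_languages: Callable[[int], FrozenSet[str]]) -> None:
        self._load_languages = load_languages  # hypothetical database lookup
        self._cache: Dict[int, FrozenSet[str]] = {}
        self.db_queries = 0  # counts how often we fall through to the database

    def uses_language(self, user_id: int, language: str) -> bool:
        if user_id not in self._cache:
            # Cache miss: fetch ALL of the user's languages in one query.
            self.db_queries += 1
            self._cache[user_id] = frozenset(self._load_languages(user_id))
        # Both "true" and "false" answers now come from the cached set.
        return language in self._cache[user_id]


# Stand-in for the real database table (made-up data).
FAKE_DB = {1: frozenset({"en"}), 2: frozenset({"en", "es"})}


def load_from_fake_db(user_id: int) -> FrozenSet[str]:
    return FAKE_DB.get(user_id, frozenset())


cache = UserLanguageCache(load_from_fake_db)
answers = [cache.uses_language(1, lang) for lang in ("en", "es", "fr", "de")]
print(answers, cache.db_queries)
```

In this sketch, four language checks for the same user cost a single database query, because the three "false" answers are read from the cached set instead of missing on three separate keys.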
#2 Querying if a Question Redirects to Another Question
At Quora, a question can redirect to another question. Maybe the question has been answered already or maybe all the answers should be located in one place for centralized knowledge.
Either way, a Quora question can be marked as redirecting to another question.
The thing about querying whether a question redirects to another question is that the vast majority of questions don’t. So how does Quora manage this better?
🧠 Query Smarter, Not Harder
Using the cache key (question_id) to check if a question redirects to another will usually return false. That adds up to a huge number of mostly empty cache entries across all the questions Quora has on their platform.
Since the vast majority of questions don’t redirect, Quora decided to cache ranges of question IDs instead of individual keys. Querying this cache returns a list of the questions in that range that redirect to another question.
First find the cache range, then check the list of redirects.
(range_start, range_end) = [{question_Id: redirected_Question_Id}]
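The two steps above can be sketched in Python like this. Everything here is an assumption for illustration: the fixed bucket width of 1000 IDs, the class and function names, and the fake redirect table. The source does not say how Quora picks its ranges. The sketch also stores each range as a single question_id to redirected_question_id mapping rather than a list of one-entry mappings, which holds the same information.

```python
from typing import Callable, Dict, Optional, Tuple

RANGE_SIZE = 1000  # assumed bucket width; the real value is not given in the source


class RedirectRangeCache:
    """Caches redirects per ID range instead of per question."""

    def __init__(self, load_range: Callable[[int, int], Dict[int, int]]) -> None:
        self._load_range = load_range  # hypothetical database range query
        self._cache: Dict[Tuple[int, int], Dict[int, int]] = {}
        self.db_queries = 0

    def _range_for(self, question_id: int) -> Tuple[int, int]:
        # Step 1: find the (range_start, range_end) bucket for this question.
        start = (question_id // RANGE_SIZE) * RANGE_SIZE
        return (start, start + RANGE_SIZE - 1)

    def redirect_target(self, question_id: int) -> Optional[int]:
        key = self._range_for(question_id)
        if key not in self._cache:
            # Cache miss: load every redirect in the range with one query.
            self.db_queries += 1
            self._cache[key] = self._load_range(*key)
        # Step 2: check the range's redirects; None means "no redirect".
        return self._cache[key].get(question_id)


# Stand-in for the real redirects table (made-up data).
FAKE_REDIRECTS = {42: 7, 1500: 1200}


def load_range_from_fake_db(start: int, end: int) -> Dict[int, int]:
    return {q: t for q, t in FAKE_REDIRECTS.items() if start <= q <= end}


cache = RedirectRangeCache(load_range_from_fake_db)
results = [cache.redirect_target(q) for q in (1, 42, 999, 1500)]
print(results, cache.db_queries)
```

In this sketch, the three lookups that fall in the first range share one database query, and most of them are the common "no redirect" case. The trade-off is that each cached value is larger, so the range width would need tuning against how sparse the redirects really are.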
📈 Speak to me in Numbers
Optimizing the language-use queries dropped queries-per-second by 90%.
Optimizing the redirected-question queries also dropped queries-per-second by 90%.
Don’t drop the database. Don’t overload the database.
There’s more to caching than just (cache_key): {cache_value}