📚 tldr;
Canva needs to accurately record usage metrics for its templates and other resources.
⚙️ The Usage Problem
“How do you accurately record usage metrics for Canva templates?”
Here’s a list of requirements from the official article:
Accuracy. The usage count should never be wrong, and we want to minimize issues such as data loss and overcounting because the income and trust of content creators are at stake.
Scalability. We need to store and process usage data with this large volume and exponential growth over time.
Operability. As usage data volume grows, the operational complexity of regular maintenance, incident handling, and recovery also increases.
🚩 Why is there a problem?
“Because we’re making a database trip for each and every single data point”
The deduplication process went through the data one record at a time. That made problems easy to track and fix, but it was inefficient because every record meant another database trip.
Batching doesn’t fundamentally fix this: with a fixed batch size B, the job still makes N/B database trips, and O(N/B) is still O(N). That isn’t a scalable solution when you have a billion data points.
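To make the round-trip argument concrete, here is a minimal Python sketch. `FakeUsageStore`, `check_and_insert`, `dedup_one_by_one`, and `dedup_batched` are hypothetical names, not Canva’s code, and the “database” is just an in-memory set that counts round trips instead of doing real I/O. It shows that batching divides the number of trips by the batch size, but the count still grows linearly with the number of events.

```python
class FakeUsageStore:
    """Hypothetical stand-in for a database: counts round trips instead of doing I/O."""

    def __init__(self):
        self.seen_ids = set()
        self.round_trips = 0

    def check_and_insert(self, event_ids):
        """One simulated round trip: record unseen IDs and return only those."""
        self.round_trips += 1
        new_ids = []
        for event_id in event_ids:
            if event_id not in self.seen_ids:  # also catches duplicates inside one batch
                self.seen_ids.add(event_id)
                new_ids.append(event_id)
        return new_ids


def dedup_one_by_one(store, events):
    """Per-record deduplication: one round trip per event."""
    unique = 0
    for event_id in events:
        unique += len(store.check_and_insert([event_id]))
    return unique


def dedup_batched(store, events, batch_size):
    """Batched deduplication: one round trip per batch, i.e. ceil(N / batch_size) trips."""
    unique = 0
    for start in range(0, len(events), batch_size):
        unique += len(store.check_and_insert(events[start:start + batch_size]))
    return unique


if __name__ == "__main__":
    batch_size = 100
    for n in (1_000, 10_000, 100_000):
        events = [i // 2 for i in range(n)]  # every event ID appears twice
        per_record, batched = FakeUsageStore(), FakeUsageStore()
        assert dedup_one_by_one(per_record, events) == n // 2
        assert dedup_batched(batched, events, batch_size) == n // 2
        print(f"{n} events: {per_record.round_trips} per-record trips, "
              f"{batched.round_trips} batched trips")
```

With a batch size of 100, the per-record version makes 1,000, 10,000, and 100,000 trips, while the batched version makes 10, 100, and 1,000. Ten times the events still means ten times the trips either way, which is the O(N) problem described above.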
Using multiple threads would just complicate maintenance without significantly improving scalability.
📈 Explaining the Big Diagram