How Canva Optimized 230 Petabytes of Data and Saved $3.6 Million While Supporting 100 Million Users
The smart storage strategy that kept costs low, performance high, and 100M users happy
🚀 TL;DR
Canva stores over 230 petabytes of user-generated content on AWS, using Amazon S3 for scalable, durable storage. As Canva scaled to 100M monthly users, cost-efficient data management became crucial. By leveraging S3 Glacier Instant Retrieval, they saved $3.6M annually while maintaining fast access for users.
📌 The Challenge: Storing and Managing Billions of Objects
Canva’s rapid growth demanded smarter storage solutions. Their previous strategy used:
✅ S3 Standard for frequently accessed content.
✅ S3 Standard-IA for user-generated uploads.
✅ S3 Glacier Flexible Retrieval for archival and backups.
But as data volume grew, most of it sat in a storage class priced for more frequent access than it actually received, prompting a re-evaluation.
📊 What We’ll Cover
1️⃣ 🛠️ What Happened? (The shift to optimize storage costs)
2️⃣ 🛑 Root Causes (Understanding access patterns and storage classes)
3️⃣ 🤔 Lessons Learned (How Canva balanced cost vs. performance)
4️⃣ 🏗️ Mastering S3 Storage: Cost-Saving Tricks, Tools, Links, and Best Practices (Paid)
🔍 Background: Canva’s Data Challenge
With 15B+ designs created, Canva’s storage needs exploded. They needed a way to reduce costs without impacting user experience. Most user-generated content is accessed soon after creation, then rarely touched. AWS’s S3 Glacier Instant Retrieval provided a cost-effective solution with fast access.
💡 Key Stats:
📌 230PB+ total storage in Amazon S3
📌 80B+ objects migrated to S3 Glacier IR
📌 $3.6M in annual savings
📌 Break-even on transition cost in about 6 months
🛠️ What Happened?
Canva used S3 Storage Class Analysis to assess their data:
🚨 90% of data was in S3 Standard-IA but contributed only 30-40% of access volume.
🚨 Most data retrieval occurred within the first 15 days.
🚨 Transitioning every object without filtering would cost about $6M in transition fees.
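For readers who want to run a similar assessment, here is a minimal sketch of how S3 Storage Class Analysis might be switched on for a bucket with boto3. The post does not say how Canva configured it, so the configuration ID, report bucket ARN, and prefix below are all hypothetical placeholders.

```python
def build_analytics_config(config_id: str, report_bucket_arn: str, prefix: str) -> dict:
    """Build an AnalyticsConfiguration payload for S3 Storage Class Analysis.

    All three arguments are illustrative; pick names that fit your account.
    """
    return {
        "Id": config_id,
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",              # daily CSV export of access stats
                        "Bucket": report_bucket_arn,  # ARN of the bucket receiving reports
                        "Prefix": prefix,
                    }
                },
            }
        },
    }


def enable_storage_class_analysis(bucket: str, config: dict) -> None:
    """Apply the configuration to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without AWS access

    s3 = boto3.client("s3")
    s3.put_bucket_analytics_configuration(
        Bucket=bucket,
        Id=config["Id"],
        AnalyticsConfiguration=config,
    )


if __name__ == "__main__":
    # Hypothetical names, shown only to illustrate the payload shape.
    cfg = build_analytics_config("whole-bucket", "arn:aws:s3:::example-reports", "sca/")
    print(cfg)
```

With no filter in the payload, the analysis covers the whole bucket. The exported reports are what let you see how much data is read at each object age before committing to a storage class change.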
🛑 Root Causes
1️⃣ Over-Reliance on S3 Standard-IA
S3 Standard-IA is cheaper than S3 Standard, but it was still more expensive than necessary for content that is rarely read after its first few weeks.
2️⃣ High Object Transition Costs
Moving billions of objects incurs transition fees, requiring strategic migration.
3️⃣ Lack of Visibility into Access Patterns
Before S3 Storage Class Analysis, Canva had limited insights into retrieval behaviors.
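To see why transition fees matter at this scale, here is a rough back-of-the-envelope sketch. The fee of $0.02 per 1,000 transitions into S3 Glacier Instant Retrieval is an assumed list price, not a figure from Canva; check the current S3 pricing page for your region before relying on it.

```python
# Assumed list price in USD; verify against current S3 pricing for your region.
TRANSITION_FEE_PER_1000 = 0.02  # per 1,000 transitions into Glacier IR (assumption)


def transition_cost(num_objects: float, fee_per_1000: float = TRANSITION_FEE_PER_1000) -> float:
    """One-time fee for moving num_objects into a new storage class."""
    return num_objects / 1000 * fee_per_1000


if __name__ == "__main__":
    migrated_objects = 80e9  # the 80B+ objects mentioned in the key stats
    print(f"Estimated transition fee: ${transition_cost(migrated_objects):,.0f}")
```

At that assumed rate, 80 billion objects come out at roughly $1.6M, which lines up with the upfront cost quoted in this post. The fee depends on the object count alone, so a bucket full of tiny objects can cost far more to move than it ever saves.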
🤔 Lessons Learned
1️⃣ Data Visibility is Critical
Use S3 Storage Class Analysis to track access trends before making storage decisions.
2️⃣ Optimize for Object Size
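Canva prioritized buckets with 400KB+ objects for migration. Transition fees are charged per object while storage savings scale with size, so larger objects recoup the fee sooner and deliver a faster ROI.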
3️⃣ Plan for Transition Costs
The migration carried a $1.6M upfront transition cost, which paid for itself in about 6 months.
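The object-size and payback lessons can be checked with a little arithmetic. The sketch below assumes list prices of $0.0125 per GB-month for S3 Standard-IA, $0.004 per GB-month for S3 Glacier Instant Retrieval, and $0.02 per 1,000 transitions. These are assumptions, not Canva's actual rates, and the model ignores retrieval charges and minimum billable object sizes.

```python
# Assumed list prices in USD; verify against current S3 pricing for your region.
STANDARD_IA_PER_GB_MONTH = 0.0125       # assumption
GLACIER_IR_PER_GB_MONTH = 0.004         # assumption
TRANSITION_FEE_PER_OBJECT = 0.02 / 1000  # assumption: $0.02 per 1,000 transitions


def breakeven_months_per_object(object_size_kb: float) -> float:
    """Months of storage savings needed to recoup one object's transition fee."""
    size_gb = object_size_kb / 1_000_000  # decimal KB -> GB, close enough for a sketch
    monthly_saving = size_gb * (STANDARD_IA_PER_GB_MONTH - GLACIER_IR_PER_GB_MONTH)
    return TRANSITION_FEE_PER_OBJECT / monthly_saving


def payback_months(upfront_cost: float, annual_saving: float) -> float:
    """Months until cumulative savings cover the upfront cost."""
    return upfront_cost / (annual_saving / 12)


if __name__ == "__main__":
    for size_kb in (100, 400, 1600):
        months = breakeven_months_per_object(size_kb)
        print(f"{size_kb:>5} KB object: break-even after {months:.1f} months")
    # Figures quoted in this post: $1.6M upfront, $3.6M saved per year.
    print(f"Overall payback: {payback_months(1.6e6, 3.6e6):.1f} months")
```

Under these assumptions a 400KB object breaks even in a little under six months, while a 100KB object takes about four times as long, which is one way to read the 400KB cutoff. The overall figures ($1.6M upfront against $3.6M a year) work out to a payback of a bit over five months.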
🚀 Mastering S3 Storage: Cost-Saving Tricks, Tools, Links, and Best Practices
Optimizing S3 storage at scale requires a deep understanding of access patterns, cost structures, and retrieval needs. Whether you're managing petabytes of data or just starting with cloud optimization, these best practices will help you reduce costs, improve performance, and ensure long-term efficiency.
📌 1️⃣ Analyze Storage Patterns with AWS Tools
Before making any changes, use AWS tools to gain insight into your storage footprint.