70% Latency Decrease: How Squarespace Sped Up Their Media Uploads
How a Simple Write-Back Cache Saved Squarespace from a GCS Bottleneck
A warm hello to our 223 new subscribers! 👋
Today we’re diving into a clever rewrite from Squarespace, one that slashed latencies, removed brittle code, and made their asset library a lot faster for end users.
Let’s talk about Alexandria. No, not the Egyptian library, the one that powers every photo and video you upload while building a website on Squarespace.
The Write Path That Hit a Wall
It started like many performance-focused backend stories do: with reads. Fast reads, smart caching, only load what's needed, and keep things cheap with object storage like GCS. For Squarespace's Alexandria service (the engine behind users uploading, organizing, and reusing their media assets) it worked well.
The backend stored metadata-rich segment files, manifest headers, and trash bins for soft-deleted assets, all in Google Cloud Storage. When a user interacted with a library, Alexandria pulled it into memory. The read flow was lean: memory-speed fast, minimal cost.
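To make the read flow concrete, here is a minimal sketch of a lazy, per-library in-memory cache in the spirit of what's described above. All names (`LibraryCache`, the `fetch` callable standing in for a GCS download) are hypothetical illustrations, not Squarespace's actual code.

```python
class LibraryCache:
    """Loads a library's metadata into memory on first read,
    then serves subsequent reads at memory speed."""

    def __init__(self, fetch):
        self._fetch = fetch   # e.g. a call that downloads the library's
                              # header/segment objects from object storage
        self._cache = {}      # library_id -> in-memory library

    def get(self, library_id):
        if library_id not in self._cache:
            # Cache miss: hit object storage once, then keep it warm.
            self._cache[library_id] = self._fetch(library_id)
        return self._cache[library_id]
```

The appeal of this shape is that storage is touched only on a cold start; every later read of the same library is a dictionary lookup, which is what keeps the read path "memory-speed fast, minimal cost."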
But writing? That’s where things got complicated.
What Happens When You Click "Upload"
Users expect that uploading a file, or deleting one, just works. But every time someone updates their asset library, Alexandria had to write to one or more GCS objects. Header, segment, trash. All had to be updated immediately to ensure consistency.
GCS, as reliable as it is, has a constraint: you can only write to the same object once per second. That’s more than enough for archival use, but not ideal when someone drags in 30 product images in one go or when a bulk import kicks off.
At first, Alexandria tried to stay within the lines. They built write-coalescing logic to group changes and delay writes, essentially buffering the updates until it was safe to commit them.
It solved the surface problem but created new ones: code that was hard to understand, fragile under test, and deeply coupled to timing assumptions. Write latency spiked unpredictably, and uploads sometimes simply stalled out.
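The buffering idea itself is simple; the fragility comes from the timing logic wrapped around it. Here is a stripped-down sketch of write coalescing under the assumption of a one-write-per-second-per-object limit. The names (`CoalescingWriter`, the `write` callable standing in for a GCS upload) are hypothetical, not the actual implementation.

```python
import time

class CoalescingWriter:
    """Buffers changes to a single object and flushes them in one write,
    at most once per min_interval seconds."""

    def __init__(self, write, min_interval=1.0):
        self._write = write              # e.g. a GCS object upload
        self._min_interval = min_interval
        self._pending = []
        self._last_flush = 0.0

    def update(self, change):
        self._pending.append(change)
        self._maybe_flush()

    def _maybe_flush(self):
        now = time.monotonic()
        if self._pending and now - self._last_flush >= self._min_interval:
            batch, self._pending = self._pending, []
            self._write(batch)           # one storage write for many changes
            self._last_flush = now
        # Otherwise the changes sit in memory until the interval elapses --
        # exactly the kind of timing dependency that made the real code
        # hard to test and prone to latency spikes.
```

Note what the sketch leaves out: a background timer to flush buffered changes that never see another `update`, crash-safety for in-memory pending writes, and coordination across the header, segment, and trash objects. Each of those gaps is where the complexity the article describes creeps in.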
So What About a Queue?
A common pattern might be to push all changes into a job queue for async processing. But Alexandria didn't operate on global state; it worked at the library level. Each user library was effectively a small, isolated database, and changes were scoped locally.
Using a queue would mean either building one global queue (and wasting time filtering by library) or creating one queue per library (millions of them). Neither made sense.
They needed something more targeted.
The Fix