🚀 Jetflow: Supercharging Cloudflare's Data Ingestion at Petabyte Scale
How Cloudflare's Business Intelligence team built a custom data ingestion framework to handle their growing data needs
As Cloudflare's business has grown, so has the volume and complexity of the data they need to ingest and process. Their existing Extract, Load, Transform (ELT) solution could no longer keep up, leading the Business Intelligence team to build their own custom data ingestion framework, Jetflow, which now ingests about 141 billion rows every day.
TLDR
Jetflow has delivered:
Over 100x efficiency improvement in GB-s: A 19-billion-row job that took 48 hours and 300 GB of memory now completes in 5.5 hours using just 4 GB (worked out below).
>10x performance improvement: Ingestion rates per database connection have risen from 60,000-80,000 rows/sec to 2-5 million rows/sec.
Extensibility: Jetflow's modular design makes it easy to add support for new data sources such as ClickHouse, Postgres, and Kafka (see the sketch below).
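To put the GB-s (gigabyte-seconds: memory footprint multiplied by runtime) figure in perspective, the numbers above work out to

$$
\frac{48\,\mathrm{h} \times 300\,\mathrm{GB}}{5.5\,\mathrm{h} \times 4\,\mathrm{GB}} = \frac{14{,}400\ \mathrm{GB \cdot h}}{22\ \mathrm{GB \cdot h}} \approx 654,
$$

roughly a 650x reduction (assuming memory stays near those figures for the whole run), comfortably clearing the 100x headline.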
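The post is cut off before the design details, so the following is only a sketch of how this kind of modular extensibility is commonly achieved in Go: a registry of connectors behind a small shared interface. `Source`, `Register`, `Open`, and `fakeSource` are hypothetical names for illustration, not Jetflow's actual API.

```go
package main

import (
	"context"
	"fmt"
)

// Row is a generic record flowing through the pipeline.
type Row map[string]any

// Source is the kind of small, pluggable interface a modular ingestion
// framework can expose. Illustrative only; not Jetflow's real API.
type Source interface {
	// Read streams rows into out until the source is exhausted,
	// then closes out.
	Read(ctx context.Context, out chan<- Row) error
}

// registry maps a source name ("postgres", "clickhouse", "kafka", ...)
// to a constructor, so a new connector is added by registering one
// function rather than touching the pipeline core.
var registry = map[string]func(dsn string) Source{}

// Register makes a connector available under the given name.
func Register(name string, ctor func(dsn string) Source) {
	registry[name] = ctor
}

// Open looks up a registered connector and instantiates it.
func Open(name, dsn string) (Source, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown source %q", name)
	}
	return ctor(dsn), nil
}

// fakeSource stands in for a real database connector in this sketch.
type fakeSource struct{ dsn string }

func (s fakeSource) Read(ctx context.Context, out chan<- Row) error {
	defer close(out)
	for i := 0; i < 3; i++ {
		select {
		case out <- Row{"id": i, "src": s.dsn}:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

func main() {
	Register("postgres", func(dsn string) Source { return fakeSource{dsn} })

	src, err := Open("postgres", "postgres://analytics")
	if err != nil {
		panic(err)
	}

	rows := make(chan Row, 8)
	go src.Read(context.Background(), rows) // error handling elided in this sketch
	for r := range rows {
		fmt.Println(r)
	}
}
```

Under this design, supporting a new source such as ClickHouse or Kafka amounts to implementing `Read` and calling `Register`, leaving the rest of the pipeline untouched.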
Why this matters
Cloudflare's data is business-critical, powering product decisions, growth planning, and internal monitoring. As that data has grown to petabyte scale, with thousands of tables ingested daily, their previous ELT solution hit its limits, and Jetflow was born out of the need for a more performant, flexible, and extensible data ingestion framework.
Designing a flexible framework