🍿 Inside Netflix's Radical Shift to a Single Foundation Model
Scaling personalized recommendations from siloed models to a single foundation engine
👋 Welcome to the 297 new Byte-Sized Design subscribers since our last edition. Glad to have you here!
This week, we've got something special: a guest post from the State of AI newsletter (give them a follow and subscribe if you're interested in frontier AI research). This edition dives into how Netflix made a bold shift to a foundation model for its recommendation system, and the massive impact it had on performance, personalization, and architecture.
You won't want to miss it. Let's dive in!
🚨 TL;DR
Netflix was juggling a swarm of specialized models to recommend content: one for your homepage, another for notifications, another for the "Because You Watched" row. Each was trained separately, each optimized in isolation. This worked until the costs, complexity, and inconsistencies became impossible to manage.
They rebuilt from the ground up: a single foundation model trained on the full timeline of user interaction across the platform. Instead of learning short-term behavior, this system learns long-term intent. It can generate predictions in milliseconds, adapt to new titles without training data, and serve as a shared source of embeddings for downstream teams.
This edition breaks down how Netflix structured the system, where they hit limits, and what challenges you'd face applying foundation models to recommendation at scale.
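To make that last point concrete, here's a rough sketch of what "a shared source of embeddings for downstream teams" can look like in practice. Everything below (class names, vector sizes, the scoring function) is invented for illustration and isn't Netflix's actual API; the point is simply that many surfaces rank against the same vectors instead of each training their own model.

```python
import numpy as np

# Hypothetical sketch: one shared embedding source, many downstream consumers.
# Names and shapes are illustrative, not Netflix internals.

class FoundationEmbeddings:
    """Serves user and title embeddings produced by a single foundation model."""
    def __init__(self, user_vecs: dict, title_vecs: dict):
        self.user_vecs = user_vecs      # user_id -> np.ndarray
        self.title_vecs = title_vecs    # title_id -> np.ndarray

    def user(self, user_id: str) -> np.ndarray:
        return self.user_vecs[user_id]

    def title(self, title_id: str) -> np.ndarray:
        return self.title_vecs[title_id]

def score_candidates(store: FoundationEmbeddings, user_id: str, title_ids: list) -> list:
    """Any downstream surface (homepage row, notification, search) reuses the same vectors."""
    u = store.user(user_id)
    scored = [(t, float(u @ store.title(t))) for t in title_ids]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Two different surfaces consume the same embeddings instead of owning separate models.
store = FoundationEmbeddings(
    user_vecs={"u1": np.array([0.2, 0.9, -0.1])},
    title_vecs={"t_drama": np.array([0.1, 0.8, 0.0]),
                "t_comedy": np.array([0.7, -0.2, 0.3])},
)
homepage_row  = score_candidates(store, "u1", ["t_drama", "t_comedy"])
notifications = score_candidates(store, "u1", ["t_comedy", "t_drama"])
print(homepage_row, notifications)
```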
🚧 The Old Architecture: A Model for Every Use Case
For years, Netflix operated what you'd expect from a mature recommender system: many models, each designed for a narrow goal (a simplified sketch of this pattern follows the list below).
Ranking notifications: Personalized based on recent watch history
Homepage rows: Top Picks, Continue Watching, and Trending, each ranked by a separate pipeline
Search re-ranking: Suggestions fine-tuned by intent
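Here is that sketch: a deliberately simplified caricature of the per-surface setup, with invented names rather than Netflix internals. Each pipeline re-implements nearly the same recency feature for its own separately trained model.

```python
from dataclasses import dataclass

# Illustrative caricature of the "one model per use case" era; names are invented.

@dataclass
class WatchEvent:
    genre: str
    hours_ago: float

def notification_features(history: list[WatchEvent]) -> set[str]:
    """Notifications pipeline: genres watched in the last 24 hours."""
    return {e.genre for e in history if e.hours_ago <= 24}

def homepage_features(history: list[WatchEvent]) -> set[str]:
    """Homepage pipeline: nearly the same feature, re-implemented with a 72-hour window."""
    return {e.genre for e in history if e.hours_ago <= 72}

def search_features(history: list[WatchEvent]) -> set[str]:
    """Search re-ranking pipeline: yet another copy, tuned to a 6-hour window."""
    return {e.genre for e in history if e.hours_ago <= 6}

# Each feature set feeds its own separately trained ranking model, so a better
# feature or objective in one pipeline rarely reaches the other two.
history = [WatchEvent("drama", 2.0), WatchEvent("comedy", 30.0), WatchEvent("thriller", 100.0)]
print(notification_features(history), homepage_features(history), search_features(history))
```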
While the models were performant in isolation, the cracks showed over time:
Inconsistent personalization: Two parts of the UI might recommend completely different genres
Repeated feature engineering: Same features rebuilt in multiple training pipelines
Costly innovation: Improvements in one model rarely transferred to others
Short-term bias: Most models only used recent activity due to latency limits
In short, the personalization system didn't scale with the platform or its audience.
🧱 The Foundation Model Approach
Instead of patching the existing system, Netflix moved to a foundation model paradigm: training a single, large model on the entirety of each user's interaction history.
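Here's a minimal sketch of what that paradigm can look like, assuming an LLM-style setup in which a member's chronological interactions are treated as tokens and the model learns to predict the next one. The architecture, names, and sizes below are illustrative assumptions, not the production system.

```python
import torch
import torch.nn as nn

class InteractionFoundationModel(nn.Module):
    """Toy stand-in for a foundation model over a member's full interaction history."""
    def __init__(self, n_titles: int, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.title_emb = nn.Embedding(n_titles, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.next_title = nn.Linear(d_model, n_titles)

    def forward(self, title_ids: torch.Tensor):
        # title_ids: (batch, seq_len), a member's interactions in chronological order.
        seq_len = title_ids.size(1)
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        hidden = self.encoder(self.title_emb(title_ids), mask=causal)
        # Returns next-interaction logits at every step plus a long-horizon user vector.
        return self.next_title(hidden), hidden[:, -1, :]

# Training objective: predict each next interaction across the whole timeline.
model = InteractionFoundationModel(n_titles=10_000)
history = torch.randint(0, 10_000, (8, 128))            # 8 members, 128 interactions each
logits, user_vecs = model(history[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 10_000), history[:, 1:].reshape(-1))
loss.backward()                                          # user_vecs can be shared downstream
```

The same encoder that produces next-interaction logits also produces a long-horizon user representation, which is what would let one model act as a shared embedding source for every downstream ranking surface.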