🍿 Inside Netflix's Radical Shift to a Single Foundation Model
Scaling personalized recommendations from siloed models to a single foundation engine
👋 Welcome to the 297 new Byte-Sized Design subscribers since our last edition. Glad to have you here!
This week, we've got something special: a guest post from the State of AI newsletter (give them a follow and subscribe if you're interested in frontier AI research). This edition dives into how Netflix made a bold shift to a foundation model for its recommendation system, and the massive impact it had on performance, personalization, and architecture.
You won't want to miss it. Let's dive in!
🚨 TL;DR
Netflix was juggling a swarm of specialized models to recommend content: one for your homepage, another for notifications, another for the "Because You Watched" row. Each was trained separately, each optimized in isolation. This worked until the costs, complexity, and inconsistencies became impossible to manage.
They rebuilt from the ground up: a single foundation model trained on the full timeline of user interaction across the platform. Instead of learning short-term behavior, this system learns long-term intent. It can generate predictions in milliseconds, adapt to new titles without training data, and serve as a shared source of embeddings for downstream teams.
This edition breaks down how Netflix structured the system, where they hit limits, and what challenges you'd face applying foundation models to recommendation at scale.
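To make that last point concrete, here's a rough sketch of what "a shared source of embeddings for downstream teams" can look like in practice. Everything below (class names, vector sizes, the scoring function) is invented for illustration and isn't Netflix's actual API; the point is simply that many surfaces rank against the same vectors instead of each training their own model.

```python
import numpy as np

# Hypothetical sketch: one shared embedding source, many downstream consumers.
# Names and shapes are illustrative, not Netflix internals.

class FoundationEmbeddings:
    """Serves user and title embeddings produced by a single foundation model."""
    def __init__(self, user_vecs: dict, title_vecs: dict):
        self.user_vecs = user_vecs      # user_id -> np.ndarray
        self.title_vecs = title_vecs    # title_id -> np.ndarray

    def user(self, user_id: str) -> np.ndarray:
        return self.user_vecs[user_id]

    def title(self, title_id: str) -> np.ndarray:
        return self.title_vecs[title_id]

def score_candidates(store: FoundationEmbeddings, user_id: str, title_ids: list) -> list:
    """Any downstream surface (homepage row, notification, search) reuses the same vectors."""
    u = store.user(user_id)
    scored = [(t, float(u @ store.title(t))) for t in title_ids]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Two different surfaces consume the same embeddings instead of owning separate models.
store = FoundationEmbeddings(
    user_vecs={"u1": np.array([0.2, 0.9, -0.1])},
    title_vecs={"t_drama": np.array([0.1, 0.8, 0.0]),
                "t_comedy": np.array([0.7, -0.2, 0.3])},
)
homepage_row  = score_candidates(store, "u1", ["t_drama", "t_comedy"])
notifications = score_candidates(store, "u1", ["t_comedy", "t_drama"])
print(homepage_row, notifications)
```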
🚧 The Old Architecture: A Model for Every Use Case
For years, Netflix operated what you'd expect from a mature recommender system: many models, each designed for a narrow goal (a simplified sketch of this pattern follows the list below).
Ranking notifications: Personalized based on recent watch history
Homepage rows: Top Picks, Continue Watching, and Trending, each ranked by a separate pipeline
Search re-ranking: Suggestions fine-tuned by intent
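Here is that sketch: a deliberately simplified caricature of the per-surface setup, with invented names rather than Netflix internals. Each pipeline re-implements nearly the same recency feature for its own separately trained model.

```python
from dataclasses import dataclass

# Illustrative caricature of the "one model per use case" era; names are invented.

@dataclass
class WatchEvent:
    genre: str
    hours_ago: float

def notification_features(history: list[WatchEvent]) -> set[str]:
    """Notifications pipeline: genres watched in the last 24 hours."""
    return {e.genre for e in history if e.hours_ago <= 24}

def homepage_features(history: list[WatchEvent]) -> set[str]:
    """Homepage pipeline: nearly the same feature, re-implemented with a 72-hour window."""
    return {e.genre for e in history if e.hours_ago <= 72}

def search_features(history: list[WatchEvent]) -> set[str]:
    """Search re-ranking pipeline: yet another copy, tuned to a 6-hour window."""
    return {e.genre for e in history if e.hours_ago <= 6}

# Each feature set feeds its own separately trained ranking model, so a better
# feature or objective in one pipeline rarely reaches the other two.
history = [WatchEvent("drama", 2.0), WatchEvent("comedy", 30.0), WatchEvent("thriller", 100.0)]
print(notification_features(history), homepage_features(history), search_features(history))
```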
While the models were performant in isolation, the cracks showed over time:
Inconsistent personalization: Two parts of the UI might recommend completely different genres
Repeated feature engineering: Same features rebuilt in multiple training pipelines
Costly innovation: Improvements in one model rarely transferred to others
Short-term bias: Most models only used recent activity due to latency limits
In short, the personalization system didn't scale with the platform or its audience.
🧱 The Foundation Model Approach
Instead of patching the existing system, Netflix moved to a foundation model paradigm: training a single, large model on the entirety of each user's interaction history.
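Here's a minimal sketch of what that paradigm can look like, assuming an LLM-style setup in which a member's chronological interactions are treated as tokens and the model learns to predict the next one. The architecture, names, and sizes below are illustrative assumptions, not the production system.

```python
import torch
import torch.nn as nn

class InteractionFoundationModel(nn.Module):
    """Toy stand-in for a foundation model over a member's full interaction history."""
    def __init__(self, n_titles: int, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.title_emb = nn.Embedding(n_titles, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.next_title = nn.Linear(d_model, n_titles)

    def forward(self, title_ids: torch.Tensor):
        # title_ids: (batch, seq_len), a member's interactions in chronological order.
        seq_len = title_ids.size(1)
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        hidden = self.encoder(self.title_emb(title_ids), mask=causal)
        # Returns next-interaction logits at every step plus a long-horizon user vector.
        return self.next_title(hidden), hidden[:, -1, :]

# Training objective: predict each next interaction across the whole timeline.
model = InteractionFoundationModel(n_titles=10_000)
history = torch.randint(0, 10_000, (8, 128))            # 8 members, 128 interactions each
logits, user_vecs = model(history[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 10_000), history[:, 1:].reshape(-1))
loss.backward()                                          # user_vecs can be shared downstream
```

The same encoder that produces next-interaction logits also produces a long-horizon user representation, which is what would let one model act as a shared embedding source for every downstream ranking surface.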