Byte-Sized Design

Lyft Tests In Production 🚗💨

The best way to know if it works in production is to put it in production.

Jul 10, 2023
∙ Paid

TL;DR

Lyft load tests in production. Scaling a staging environment up to production capacity was too expensive, so Lyft chose to test in production instead.

It works! They made it work by simulating rides in their production environment, with the simulation driven by a config.

So what’s wrong with testing in staging?

Lyft needs to load test, especially ahead of the huge traffic spikes that come with real-world events like the Super Bowl, New Year’s Eve parties, and graduations.

Most companies simulate traffic in a staging environment and check whether staging handles it correctly. Lyft doesn’t do that here.

Here are the problems with testing in staging for Lyft:

1. It’s Expensive

Staging environments usually aren’t expected to handle the same load as production. Lyft wanted realistic results, and that would mean scaling staging up to the same capacity as production.

That’s expensive, and Lyft decided it made more sense to test directly in production.

2. Replaying Traffic Is Risky

Nobody wants to get double charged, ever. Replaying production traffic in staging could replay customer charges (and driver payouts), so Lyft chose not to replay real traffic.

3. Accurate Results

Staging environments contain test data and other hacky data configurations, so errors in staging load tests can be false positives or false negatives. The most accurate results come from prod.

OK, they’re in prod. How’d they do it?

Lyft tested their services by simulating traffic.

The flow looks like this.

  • Have a test bot make a ton of calls to the SimulatedRides API.

  • Have the SimulatedRides API handle requests from the SimulationTable.

  • Output the responses on the SimulatedRides Web UI.

  • Let the resource management service clean up resources such as drivers, riders, and scooters when the test is done.
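The four steps above can be sketched as one driver loop. This is a minimal, in-process illustration under stated assumptions, not Lyft’s implementation: `ResourceManager`, `simulated_rides_api`, and `run_load_test` are hypothetical stand-ins for the resource management service, the SimulatedRides API, and the test bot, and the SimulationTable and Web UI are left out. What it shows is that cleanup sits in a `finally` block, so simulated riders and drivers are released even if the test fails partway through.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ResourceManager:
    """Hypothetical stand-in for the resource management service."""
    live: set = field(default_factory=set)

    def create(self, kind: str, idx: int) -> str:
        # Track every simulated rider/driver so it can be released later.
        resource_id = f"sim-{kind}-{idx}"
        self.live.add(resource_id)
        return resource_id

    def cleanup(self) -> int:
        # Release all simulated resources and report how many there were.
        released = len(self.live)
        self.live.clear()
        return released


def simulated_rides_api(request: dict) -> dict:
    """Hypothetical stand-in for the SimulatedRides API."""
    return {"rider": request["rider"], "status": "completed"}


def run_load_test(num_requests: int,
                  resources: ResourceManager,
                  api: Callable[[dict], dict] = simulated_rides_api) -> list:
    """Hypothetical test bot: fire requests at the API, then always clean up."""
    results = []
    try:
        for i in range(num_requests):
            rider = resources.create("rider", i)
            driver = resources.create("driver", i)
            results.append(api({"rider": rider, "driver": driver}))
        return results  # in the real flow these responses feed the Web UI
    finally:
        # Runs on success and on failure, mirroring the cleanup step.
        resources.cleanup()
```

For example, `run_load_test(3, ResourceManager())` returns three `completed` responses and leaves no simulated resources behind.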

Does It Run the Same Test Every Time?

No. There’s a config for the SimulatedRides service that defines random actions a user can take, and together those actions form a decision tree. For example, there could be a 50% chance that the simulation closes the app after opening it and checking prices.
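A decision tree like this can be sketched as a chain of weighted coin flips. This is a hypothetical two-step rider flow, assuming each branch probability is a percentage like the `rider_close_app_after_price_check_percent` and `rider_cancel_after_accepting_ride_percent` fields in the example config; `simulate_rider_session` is an illustrative name, not part of SimulatedRides. Seeding the random generator makes a run reproducible, while different seeds produce different traffic.

```python
import random
from collections import Counter


def simulate_rider_session(rng: random.Random,
                           close_after_price_check_percent: float,
                           cancel_after_accept_percent: float) -> str:
    """Walk one hypothetical rider through a two-step decision tree."""
    # Branch 1: after opening the app and checking prices, maybe close it.
    if rng.random() * 100 < close_after_price_check_percent:
        return "closed_app_after_price_check"
    # Branch 2: after a driver accepts the ride, maybe cancel.
    if rng.random() * 100 < cancel_after_accept_percent:
        return "cancelled_after_accept"
    return "completed_ride"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so this run can be reproduced
    outcomes = Counter(simulate_rider_session(rng, 50, 10) for _ in range(10_000))
    print(outcomes)  # roughly half the sessions close after the price check
```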

The config also defines how many riders and drivers are available and the odds that each will select a specific product.


```json
{
    "name": "chicago",
    "client_configurations": {
        "region": "chicago",
        "rider_close_app_after_price_check_percent": 1,
        "rider_cancel_after_accepting_ride_percent": 10,
        "driver_cancel_after_accepting_ride_percent": 5
    },
    "client_composition": [
        {
            "client_type": "rider",
            "number": 50,
            "behaviors": {
                "shared_ride": 25,
                "standard_ride": 65,
                "luxury_ride": 5,
                "luxury_ride_suv": 5
            }
        },
        {
            "client_type": "driver",
            "number": 50,
            "behaviors": {
                "standard_ride": 100
            }
        }
    ]
}
```

Just change the config and you have a new load test.
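To show how a config like the `chicago` example turns into concrete simulated clients, here is a hedged sketch. It assumes each `behaviors` map holds percentage weights that must sum to 100 and that every client picks one product by weighted random choice; `build_clients` and the validation rule are illustrative assumptions, not SimulatedRides behavior confirmed by the source.

```python
import json
import random

# The example "chicago" config, embedded so this sketch runs on its own.
CHICAGO_CONFIG = """
{
    "name": "chicago",
    "client_configurations": {
        "region": "chicago",
        "rider_close_app_after_price_check_percent": 1,
        "rider_cancel_after_accepting_ride_percent": 10,
        "driver_cancel_after_accepting_ride_percent": 5
    },
    "client_composition": [
        {"client_type": "rider", "number": 50,
         "behaviors": {"shared_ride": 25, "standard_ride": 65,
                       "luxury_ride": 5, "luxury_ride_suv": 5}},
        {"client_type": "driver", "number": 50,
         "behaviors": {"standard_ride": 100}}
    ]
}
"""


def build_clients(config: dict, seed: int = 0) -> list:
    """Hypothetical: expand client_composition into individual simulated clients."""
    rng = random.Random(seed)
    clients = []
    for group in config["client_composition"]:
        behaviors = group["behaviors"]
        # Assumption: behavior weights are percentages and must total 100.
        total = sum(behaviors.values())
        if total != 100:
            raise ValueError(f"{group['client_type']} behaviors sum to {total}, expected 100")
        products = list(behaviors)
        weights = list(behaviors.values())
        for i in range(group["number"]):
            clients.append({
                "client_type": group["client_type"],
                "id": i,
                # Weighted pick: e.g. ~65% of riders choose standard_ride.
                "product": rng.choices(products, weights=weights, k=1)[0],
            })
    return clients


if __name__ == "__main__":
    from collections import Counter
    clients = build_clients(json.loads(CHICAGO_CONFIG), seed=7)
    print(Counter((c["client_type"], c["product"]) for c in clients))
```

Editing `number` or the `behaviors` weights in the JSON changes the generated population without touching the code, which is the appeal of a config-driven load test.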

But What Are the Drawbacks?

The thing about testing in production is that it’s pretty dangerous. Lyft knows this, so here’s how they make it safer.

There’s always a human watching and managing the load tests. The tooling isn’t automated to the point where alerts alone could be trusted to let production load tests run unattended.
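One way to keep a human in the loop is a manual stop switch that the test bot checks before every request. This is a hypothetical sketch, not a documented SimulatedRides feature: `run_until_stopped` and its parameters are illustrative, and `threading.Event` stands in for whatever control the operator actually uses.

```python
import threading
from typing import Callable


def run_until_stopped(stop: threading.Event,
                      send_request: Callable[[int], None],
                      max_requests: int) -> int:
    """Send simulated requests until an operator sets `stop` or the budget is spent.

    Returns how many requests were actually sent.
    """
    sent = 0
    # Check the operator's stop flag before every request, so a human can
    # halt the test between any two calls.
    while sent < max_requests and not stop.is_set():
        send_request(sent)
        sent += 1
    return sent
```

An operator-facing tool would hold the same `Event` and call `stop.set()` from another thread; `threading.Event` is safe to share across threads, so the bot stops before its next request.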

It’s an internal tool. Nobody outside Lyft has experience with it, so everyone who joins Lyft has to onboard onto it with zero background knowledge.

Quiz questions, answers, and Official Article!

(Links to the official article and sources are available to paid subscribers. They help maintain and support this newsletter!)

5 questions the official article answers:

  1. What are clients in the context of SimulatedRides, and what role do they play?

  2. What are behaviors in SimulatedRides, and how do they influence the actions of clients?

  3. How do actions contribute to the functionality of SimulatedRides, and what can engineers configure with them?

© 2025 Byte-Sized Design