TL;DR
Problem: Slack's CI/CD workflows suffered from Git errors, queuing delays, and dependency issues, which hurt the efficiency and reliability of its software development process.
Solution: Slack added circuit breakers at the orchestration level of its CI/CD system to control the flow of requests, stop failures from cascading, and make better use of resources.
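The post does not show Slack's implementation. As a rough illustration of the pattern itself, here is a minimal, generic circuit breaker, assuming a consecutive-failure threshold and a fixed cool-down period. The names `CircuitBreaker`, `failure_threshold`, and `reset_timeout` are hypothetical and not taken from Slack's code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Generic circuit breaker (hypothetical sketch, not Slack's code).

    CLOSED:    calls pass through; consecutive failures are counted.
    OPEN:      calls are rejected immediately until reset_timeout elapses.
    HALF_OPEN: a trial call is allowed; success closes, failure re-opens.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the sketch can be tested deterministically
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # cool-down over: let a trial request through
            else:
                raise CircuitOpenError("circuit open; request rejected")
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial, or too many consecutive failures, opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def _record_success(self) -> None:
        self.failures = 0
        self.state = "CLOSED"
```

The value of the pattern is that once a downstream dependency (for example, a Git host or a test runner) starts failing, callers fail fast instead of piling more requests onto it, which is how a breaker helps contain cascading failures.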
Context
Before 2020, Slack, like most software companies, used CI during development and CD for deployment.
Slack uses an internal platform called Checkpoint to coordinate code builds, tests, deployments, and releases.
However, as Slack grew, so did the number of developers and feature releases. The resulting increase in load caused service failures that cascaded into further problems.
Flow
Here’s a breakdown of the flow:
The Git application submits test requests (labelled "Test Requests") to Checkpoint.
The test requests are queued up for processing by Checkpoint.
After processing them, Checkpoint places the test requests in a queue for Jenkins.
Jenkins pulls the test requests from its queue and runs them.
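The four steps above amount to a two-stage queue pipeline. The sketch below models them under stated assumptions: the classes `Request`, `Checkpoint`, and `Jenkins` and the `jenkins_capacity` limit are hypothetical stand-ins, not Slack's interfaces, and the "processing" step is left as a placeholder.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """A test request; `commit` is a hypothetical field naming the code under test."""
    commit: str


class Checkpoint:
    """Hypothetical model of Checkpoint's role in the flow (not Slack's code)."""

    def __init__(self, jenkins_capacity: int) -> None:
        self.incoming: deque = deque()       # step 2: Checkpoint's own queue
        self.jenkins_queue: deque = deque()  # step 3: queue consumed by Jenkins
        self.jenkins_capacity = jenkins_capacity  # hypothetical limit

    def submit(self, req: Request) -> None:
        """Step 1: the Git application submits a test request."""
        self.incoming.append(req)

    def dispatch(self) -> None:
        """Step 3: forward processed requests while Jenkins's queue has room."""
        while self.incoming and len(self.jenkins_queue) < self.jenkins_capacity:
            req = self.incoming.popleft()
            # Placeholder: validation or other processing would happen here.
            self.jenkins_queue.append(req)


class Jenkins:
    def __init__(self, checkpoint: Checkpoint) -> None:
        self.checkpoint = checkpoint
        self.ran: list = []  # commits whose tests have been "run"

    def run_next(self) -> bool:
        """Step 4: pull one request from the queue and run it (simulated)."""
        if not self.checkpoint.jenkins_queue:
            return False
        req = self.checkpoint.jenkins_queue.popleft()
        self.ran.append(req.commit)
        return True
```

When Jenkins falls behind, requests accumulate in Checkpoint's `incoming` queue rather than overloading Jenkins. This capacity check is simple backpressure, not a circuit breaker, but it marks one place where an orchestration-level gate could sit; the post does not say exactly where Slack placed its breakers.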
Subscribe to Byte-Sized Design to keep reading this post and get 7 days of free access to the full post archives.