Byte-Sized Design

AI-Powered Test Automation: 30% Faster Fixes at Salesforce

AI-powered Test Failure Triage Agents

Byte-Sized Design
Aug 25, 2025

TL;DR


Salesforce’s Platform Quality Engineering team built an AI-powered Test Failure (TF) Triage Agent to handle 150K+ monthly test failures across 6M daily tests. By pairing FAISS-based semantic search with LLM reasoning and grounding AI insights in historical fix data, they reduced failure resolution time by 30% and scaled the tool from a 20-person pilot to 500+ engineers.


150K Failures and Developer Burnout

Imagine shipping code in an environment with 6M tests running daily across 78B test combinations.

Before this project, engineers burned hours sifting through logs, changelogs, and failure patterns to answer basic questions:

  • Is this a flaky test?

  • Did another team’s change break it?

  • Should I wait, retry, or fix?

With 30K engineers pushing code, failures piled up faster than teams could triage them. Average resolution time? Seven days. Developer trust? Low. Burnout? Rising.


AI With Context, Not Guesswork

The team didn’t just drop in a generic LLM and hope for the best.

They built asynchronous pipelines to process test failure data in real time without slowing CI/CD. At the core:

  • FAISS-based semantic search over historical test failures for sub-30s lookups

  • Contextual embeddings of stack traces, code snippets, and changelists

  • LLM reasoning layered on top to narrow down the most likely fix
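A minimal sketch of the retrieval step follows. It assumes each historical failure is stored alongside the fix that resolved it. FAISS itself is not used here. A brute-force cosine-similarity search over L2-normalized vectors stands in for it. This is the same scoring a flat inner-product FAISS index performs on normalized vectors, without FAISS's speed at scale. A toy bag-of-words vector also stands in for a learned embedding model. The names `embed`, `cosine`, `FailureIndex` and the sample records are hypothetical. The post does not describe Salesforce's embedding model or index layout.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: brute-force cosine search standing in for FAISS,
# and a toy bag-of-words vector standing in for a learned embedding model.

def embed(text: str) -> dict[str, float]:
    """Toy L2-normalized bag-of-words vector (sparse: token -> weight)."""
    counts = Counter(re.findall(r"[a-z0-9_.]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())

class FailureIndex:
    """Exact nearest-neighbor search over past failures and the fixes that resolved them."""

    def __init__(self) -> None:
        self.entries: list[tuple[dict[str, float], dict]] = []

    def add(self, failure_text: str, record: dict) -> None:
        self.entries.append((embed(failure_text), record))

    def search(self, query: str, k: int = 3) -> list[tuple[float, dict]]:
        q = embed(query)
        scored = [(cosine(q, vec), record) for vec, record in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)  # best match first
        return scored[:k]

index = FailureIndex()
index.add("NullPointerException in OrderService.checkout at OrderService.java:88",
          {"fix": "guard null cart before checkout"})
index.add("TimeoutException waiting for db connection pool",
          {"fix": "raise pool size in test config"})

best_score, best = index.search("NullPointerException OrderService.checkout", k=1)[0]
print(best["fix"], round(best_score, 3))  # guard null cart before checkout 0.577
```

Swapping in a real embedding model and a FAISS index (for example `faiss.IndexFlatIP` over normalized vectors, or an approximate index for larger corpora) would keep the same add-then-search flow.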

Instead of leaving developers to ask “Why did this fail?”, the system pointed them to the exact file, feature, and recent changes involved. Developers saw context-driven suggestions tied directly to past fixes, which helped avoid the usual AI “hallucination” problem.
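The sketch below shows one way "context, not guesswork" could work in practice. It assumes Java-style stack frames and pulls the suspect file from the top frame. It then packs that file, together with recent changes and fixes retrieved for similar failures, into a prompt that tells the model to answer only from that evidence. `FailureContext`, `top_frame_file`, `build_triage_prompt`, the prompt wording and the changelist ID are all hypothetical. The post does not show Salesforce's prompt or data schema, and no LLM is called here.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch: assumes Java-style frames such as
# "at com.example.Foo.bar(Foo.java:42)".
JAVA_FRAME = re.compile(r"at ([\w.$]+)\(([\w$]+\.java):(\d+)\)")

def top_frame_file(stack_trace: str) -> Optional[str]:
    """Return "File.java:line" for the first Java frame in the trace, or None."""
    match = JAVA_FRAME.search(stack_trace)
    return f"{match.group(2)}:{match.group(3)}" if match else None

@dataclass
class FailureContext:
    test_name: str
    stack_trace: str
    suspect_file: str
    feature: str
    recent_changes: list[str] = field(default_factory=list)  # changelist summaries
    similar_fixes: list[str] = field(default_factory=list)   # from semantic search

def build_triage_prompt(ctx: FailureContext, max_items: int = 3) -> str:
    """Pack retrieved evidence into the prompt so the LLM reasons over it, not over memory."""
    changes = "\n".join(f"- {c}" for c in ctx.recent_changes[:max_items]) or "- (none found)"
    fixes = "\n".join(f"- {fix}" for fix in ctx.similar_fixes[:max_items]) or "- (none found)"
    return (
        f"Failing test: {ctx.test_name}\n"
        f"Feature: {ctx.feature}\n"
        f"Suspect file: {ctx.suspect_file}\n"
        f"Stack trace:\n{ctx.stack_trace}\n\n"
        f"Recent changes near this code:\n{changes}\n\n"
        f"Fixes for similar past failures:\n{fixes}\n\n"
        "Using only the evidence above, name the most likely cause and a fix. "
        "If the evidence is insufficient, say so rather than guessing."
    )

trace = ("java.lang.NullPointerException\n"
         "\tat com.example.OrderService.checkout(OrderService.java:88)")
ctx = FailureContext(
    test_name="OrderServiceTest.testCheckout",
    stack_trace=trace,
    suspect_file=top_frame_file(trace) or "unknown",
    feature="Checkout",
    recent_changes=["CL-101: refactor cart loading in OrderService"],  # hypothetical CL
    similar_fixes=["guard null cart before checkout"],
)
print(build_triage_prompt(ctx))
```

The explicit "say so rather than guessing" instruction and the empty-list placeholders are design choices. They keep the model from filling gaps when retrieval comes back empty.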


Building Trust: Incremental Rollout & Human-in-the-Loop

AI tools live or die by developer trust. Too many false positives, and people revert to manual debugging.

Salesforce started small:

  1. 20-person pilot → measured accuracy & adoption

  2. Focused scrum teams → validated improvements

  3. 500+ engineers → full AI Application Development Cloud rollout
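A staged rollout like this implies a gate between stages. The hedged sketch below implements one. It computes how often the confirmed culprit appears in the agent's top-k suggestions and how often developers acted on the output. It then advances to the next stage only when both numbers clear a bar. The metric definitions, the `k=2` cutoff and the 0.6/0.5 thresholds are hypothetical. The post only says the pilot "measured accuracy & adoption."

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: metric definitions and thresholds are assumptions.
STAGES = ["20-person pilot", "focused scrum teams", "500+ engineers"]

@dataclass
class TriageOutcome:
    suggested: list[str]   # ranked culprit changelists the agent showed
    actual: Optional[str]  # culprit the developer confirmed (None if never confirmed)
    acted_on: bool         # whether the developer used the agent's output

def top_k_accuracy(outcomes: list[TriageOutcome], k: int = 2) -> float:
    """Share of confirmed failures whose true culprit appeared in the top-k suggestions."""
    confirmed = [o for o in outcomes if o.actual is not None]
    if not confirmed:
        return 0.0
    hits = sum(1 for o in confirmed if o.actual in o.suggested[:k])
    return hits / len(confirmed)

def adoption_rate(outcomes: list[TriageOutcome]) -> float:
    """Share of triaged failures where the developer acted on the agent's output."""
    return sum(o.acted_on for o in outcomes) / len(outcomes) if outcomes else 0.0

def next_stage(current: int, outcomes: list[TriageOutcome],
               min_accuracy: float = 0.6, min_adoption: float = 0.5) -> int:
    """Advance one stage only if both accuracy and adoption clear their (hypothetical) bars."""
    ready = (top_k_accuracy(outcomes) >= min_accuracy
             and adoption_rate(outcomes) >= min_adoption)
    return min(current + 1, len(STAGES) - 1) if ready else current

pilot = [
    TriageOutcome(["CL-1", "CL-2"], "CL-1", True),
    TriageOutcome(["CL-3", "CL-4"], "CL-4", True),
    TriageOutcome(["CL-5"], "CL-9", False),
    TriageOutcome(["CL-6"], None, False),
]
stage = next_stage(0, pilot)
print(STAGES[stage])  # focused scrum teams
```

Gating on adoption as well as accuracy reflects the point above. A correct tool that developers ignore has not earned the next stage.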

Developers reported the highest confidence in features that surfaced the most likely changelist causing the break, letting them skip log-hunting entirely.
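Below is one simple way to surface a "most likely changelist." It assumes each changelist records the files it touched and when it landed. Each changelist is scored by how many of its files appear in the failing stack trace, discounted by age with an exponential half-life. This heuristic, the `Changelist` type and the 120-minute half-life are hypothetical. The post does not say how Salesforce ranks suspects, and its system combines retrieval with LLM reasoning rather than a fixed formula.

```python
from dataclasses import dataclass

# Hypothetical sketch: overlap-times-recency scoring is an assumed heuristic.

@dataclass
class Changelist:
    cl_id: str
    files: set[str]                # files the change touched
    minutes_before_failure: float  # how long before the failing run it landed

def rank_suspects(stack_files: set[str], changes: list[Changelist],
                  half_life_minutes: float = 120.0) -> list[tuple[float, str]]:
    """Rank changelists by stack-trace file overlap, halving the weight every half-life."""
    scored = []
    for cl in changes:
        overlap = len(stack_files & cl.files)
        if overlap == 0:
            continue  # touched none of the failing files, so not a suspect here
        recency = 0.5 ** (cl.minutes_before_failure / half_life_minutes)
        scored.append((overlap * recency, cl.cl_id))
    scored.sort(key=lambda item: item[0], reverse=True)  # most likely culprit first
    return scored

ranked = rank_suspects(
    {"OrderService.java", "CartLoader.java"},
    [
        Changelist("CL-101", {"OrderService.java"}, 30),
        Changelist("CL-102", {"OrderService.java", "CartLoader.java"}, 240),
        Changelist("CL-103", {"Billing.java"}, 5),
    ],
)
for score, cl_id in ranked:
    print(cl_id, round(score, 3))  # CL-101 0.841, then CL-102 0.5
```

Here the older CL-102 touches both failing files but still ranks below the fresher CL-101. The half-life parameter controls that trade-off.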


The Results: 30% Faster Resolution Times

© 2025 Byte-Sized Design