Managing detected flaky tests
A step-by-step guide for building an automated process to manage detected flaky tests.
It is important to have a follow-up process in place to manage detected flaky tests. A good process ensures that flaky tests do not slow down CI for your development team and prevents flakes from accumulating over time.
This guide walks through Trunk's recommended best practices for building a process around detected flaky tests in your organization.
Trunk detects flaky tests automatically after you:
Set up your test framework to produce test reports.
Integrate with your CI provider to upload those reports on CI runs.
Go through these guides first to start detecting flaky tests.
Step 1: Create tickets for flaky tests
Creating Linear or Jira tickets for detected flaky tests helps integrate flaky test fixes into your existing workflows.
Start by connecting to Linear or Jira. You can also set default labels or teams for flaky test tickets.
Once connected, you can click Create Ticket on a test detail page in Trunk. Trunk will create the ticket with context, including the test ID, flake rate, and the last failure stack trace and reason.
The ticket status and assignee will be visible on the test details page in Trunk, and these details will stay in sync with changes to the ticket.
Step 2: Broadcast flakes
It is important to keep the team informed of all status changes for flaky tests. This allows for fast follow-up when a test is marked as flaky.
Use the built-in Slack or Microsoft Teams webhook integrations to transform webhook payloads into messages.
Trunk's built-in templates help you get started and test the connection.
You can then customize the transformation to update the message format and content, including @-mentioning test owners so they can follow up right away.
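As a rough illustration of the kind of transformation described above, the following Python sketch turns a status-change payload into a Slack-style message body. The payload field names (test_name, new_status, owners) and the function name are hypothetical, chosen only for this example; they are not Trunk's actual webhook schema, so inspect a real payload before adapting it. The `<@USER_ID>` mention syntax and the `{"text": ...}` body are standard Slack message conventions.

```python
def build_slack_message(payload: dict) -> dict:
    """Turn a (hypothetical) flaky-test status-change payload into a Slack message body."""
    # Field names below are assumptions for illustration, not Trunk's schema.
    test_name = payload.get("test_name", "unknown test")
    new_status = payload.get("new_status", "unknown")
    owners = payload.get("owners", [])  # assumed: a list of Slack user IDs

    # Slack renders <@USER_ID> as an @-mention of that user.
    mentions = " ".join(f"<@{user_id}>" for user_id in owners)

    text = f"Test `{test_name}` is now *{new_status}*."
    if mentions:
        text += f" cc {mentions}"
    return {"text": text}


example = {
    "test_name": "checkout_spec::applies_discount",
    "new_status": "flaky",
    "owners": ["U012AB3CD"],
}
print(build_slack_message(example)["text"])
```

Keeping the transformation a pure function of the payload makes it easy to test the message format before pointing it at a real channel.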
Step 3: Mute monitors
If a flaky test has a known issue or a fix in progress, you can mute the monitor that flagged it. A muted monitor continues to run and record detections, but it does not contribute to the test's flaky status until the mute expires or is manually removed.
To mute a monitor:
Navigate to the test case detail page in the Trunk app.
Find the monitor that flagged the test.
Click Mute and select a duration.
1 hour: Quick suppression for transient issues
4 hours: Short-term suppression
24 hours: Suppress for a full day
7 days: Suppress for a week
30 days: Suppress for a month
Forever: Mute indefinitely until manually unmuted
The Forever option mutes the monitor with no expiration. The monitor stays muted until you explicitly unmute it from the test case detail page. This is useful when a test has a known flake that your team has accepted, or when a fix is planned but not yet scheduled.
For timed durations, the monitor automatically unmutes when the period expires. If the monitor is still detecting flaky behavior at that point, the test will be flagged as flaky again.
You can optionally provide a reason when muting a monitor. This helps your team understand why the monitor was muted and makes it easier to review muted monitors later.
You can unmute a monitor at any time from the test case detail page, regardless of the selected duration.
Muting suppresses the monitor's contribution to the test's status. If the muted monitor was the only active monitor for a test, the test transitions from flaky to healthy for the duration of the mute.
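The muting rules above can be summarized in a small Python model. This is a simplified sketch of the described behavior, not Trunk's implementation: the Monitor class, the effective_status function, and the monitor name are all hypothetical. A test counts as flaky only if at least one unmuted monitor is detecting flaky behavior; a timed mute stops applying once it expires, and a mute with no expiry models the Forever option.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Monitor:
    name: str
    detecting_flaky: bool                    # is the monitor currently detecting flaky behavior?
    muted: bool = False
    muted_until: Optional[datetime] = None   # None while muted models "Forever"

    def is_muted(self, now: datetime) -> bool:
        if not self.muted:
            return False
        # A timed mute lapses once `now` reaches the expiry; "Forever" never lapses.
        return self.muted_until is None or now < self.muted_until


def effective_status(monitors: List[Monitor], now: datetime) -> str:
    # Muted monitors keep detecting but do not contribute to the status.
    contributing = [m for m in monitors if m.detecting_flaky and not m.is_muted(now)]
    return "flaky" if contributing else "healthy"


start = datetime(2024, 1, 1, 12, 0)
timed = Monitor("example-monitor", detecting_flaky=True, muted=True,
                muted_until=start + timedelta(hours=4))
print(effective_status([timed], start))                       # healthy
print(effective_status([timed], start + timedelta(hours=5)))  # flaky
```

The second call shows the case described earlier: the 4-hour mute has expired while the monitor is still detecting flaky behavior, so the test is flagged as flaky again.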
Step 4: Flag flaky tests
If automated detection hasn't caught a test you know is flaky, you can manually flag it as flaky from the test detail page. Flagged tests are treated as flaky regardless of automated detection state, and the flag can be removed at any time.
Step 5: Quarantine flaky tests
Flaky tests slow down CI and have a high negative impact on merge queue throughput. You can minimize or eliminate this CI slowdown by quarantining flaky tests at runtime.
Enable quarantining for your repo at Settings > your repo > Enable Test Quarantining.
Manually quarantine flaky tests by going to the test details page, clicking Quarantine, and setting the status to Always. Leave a comment detailing why you are quarantining this test to keep your team informed. The comment and quarantine status change will appear in the timeline on the test details page.
After quarantining a test, Trunk will ignore the test result (pass/fail) on CI runs, preventing this flaky test from failing CI.
Broken tests are not quarantine candidates. Only tests with a Flaky status are eligible for quarantine. If a test is marked as Broken (consistently failing at a high rate), it represents a real regression that should be investigated and fixed rather than hidden. See detection to understand the difference between flaky and broken tests.
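To make the effect of quarantining concrete, here is a minimal Python sketch of the two rules described in this step: only tests with a Flaky status are eligible, and the results of quarantined tests are ignored when deciding whether a CI run passes. The function names, status strings, and test names are hypothetical, and the model is deliberately simplified; it is not how Trunk evaluates runs internally.

```python
from typing import Dict, Set


def can_quarantine(status: str) -> bool:
    # Broken tests represent real regressions, so only flaky tests are eligible.
    return status == "flaky"


def ci_passes(results: Dict[str, bool], quarantined: Set[str]) -> bool:
    # A run passes if every test either passed or is quarantined.
    return all(passed or name in quarantined for name, passed in results.items())


results = {"test_login": True, "test_checkout": False}
print(ci_passes(results, quarantined=set()))              # False
print(ci_passes(results, quarantined={"test_checkout"}))  # True
```

In this model, a failing test that is not quarantined still fails the run, which is why Broken tests keep blocking CI until they are fixed.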
Step 6: Automation
Trunk has webhooks and Flaky Tests APIs that can be used to build custom workflows around ticket creation, linking existing tickets to Trunk, sending notifications, and dealing with quarantined tests.
There is also built-in automation support that handles tasks such as assigning flaky test ownership, ticket creation, and quarantining (so that unblocking CI is not a manual process).
CODEOWNERS files can automatically assign ownership of test flakes.
Tickets can be auto-created using webhooks as triggers, similar to Slack or MS Teams notifications.
Automatically quarantine flaky tests by enabling Settings > your repo > Auto-Quarantine Flaky Tests.
You can customize how flaky and quarantined tests are handled to suit your team and organization best.
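For intuition about how ownership can be derived from a CODEOWNERS file, the Python sketch below resolves the owners of a test file, using the convention that the last matching rule wins. It is an approximation under stated assumptions: the matcher only handles simple globs and directory prefixes rather than the full CODEOWNERS pattern syntax, the team names and paths are made up, and the function names are hypothetical. It does not describe how Trunk resolves owners.

```python
import fnmatch
from typing import List, Tuple

Rule = Tuple[str, List[str]]


def parse_codeowners(text: str) -> List[Rule]:
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def matches(pattern: str, path: str) -> bool:
    # Simplified matching: directory prefixes ("/tests/payments/") and plain globs.
    if pattern.endswith("/"):
        return path.startswith(pattern.lstrip("/"))
    return fnmatch.fnmatchcase(path, pattern.lstrip("/"))


def owners_for(path: str, rules: List[Rule]) -> List[str]:
    owners: List[str] = []
    for pattern, rule_owners in rules:
        if matches(pattern, path):
            owners = rule_owners  # later rules override earlier ones
    return owners


CODEOWNERS = """
# Hypothetical example file
*                 @org/platform
*.py              @org/backend
/tests/payments/  @org/payments
"""

rules = parse_codeowners(CODEOWNERS)
print(owners_for("tests/payments/test_refund.py", rules))  # ['@org/payments']
```

Because later rules take precedence, broad fallbacks go at the top of the file and more specific paths go at the bottom.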
Step 7: Review existing flakes and broken tests
It is important to track and triage existing flaky and broken tests over time. Trunk collects historical failure logs and stack traces, giving developers as much information as possible for debugging high-impact test failures.
Review all new flaky and broken tests to determine their impact and the urgency of a fix. Broken tests (consistently failing at a high rate) should typically be prioritized over flaky tests as they represent real regressions.
Review existing quarantined tests regularly to decide which tests should be fixed and which tests should be deleted from your test suite.
Trunk can send weekly email reports with information such as your total number of flaky tests and the number of PRs blocked, and how those numbers have changed week over week. Frequently failing tests will also be highlighted in the report. Reach out on Slack to ask about enabling weekly reports for your organization.
In summary: Build a process around managing flaky tests
Building processes for dealing with flaky tests helps decrease or eliminate their impact on CI and reduce the amount of developer time lost to debugging flakes and CI reruns.
Trunk allows you to customize this process to fit into your existing tooling and workflows, and automates manual tasks such as notifications and ticket creation.
Reach out to us on our community Slack to chat about how to structure a process for managing flaky tests across your team or organization.

