Trunk Flaky Tests detects flaky tests by analyzing test results. The health of your tests is displayed in the Flaky Tests dashboard.
Press Cmd+K (macOS) or Ctrl+K (Windows and Linux) anywhere in the Trunk app to open the command palette. Start typing to jump to Merge Queue, Flaky Tests, your account settings, or any connected repository by name.

Repositories overview

When you navigate to /<your-org>/flaky-tests, you land on a repositories overview showing all monitored repositories at a glance. Each repository row displays:
Column | Description
Tests | Total tracked test cases in the repository (60-day window)
Flaky | Number of currently flaky test cases, with a 10-day trend sparkline
Broken | Number of currently broken test cases, with a 10-day trend sparkline
Runs / Day | Bar chart of test run volume over the last 10 days, with per-day tooltips
A quarantine status icon appears next to each repository name when quarantining is configured:
Icon | Meaning
Shield | Quarantining is enabled for this repository; auto-quarantine is off
Shield with checkmark | Auto-quarantine is enabled; flaky tests are quarantined automatically
See Quarantining to learn how to configure quarantine settings.
Active repositories (those with test data in the last 30 days) appear at the top of the list. Repositories with no recent data are collapsed under an Inactive Repositories section that you can expand to view.
Selecting a repository opens its detailed dashboard. If your organization has no repositories connected yet, the page redirects to onboarding.

Key repository metrics

Trunk Flaky Tests provides key repository metrics based on the detected health status of your tests. The following metrics appear at the top of the Flaky Tests dashboard.
Metric | Description
Flaky tests | Number of flaky test cases in your repo.
PRs blocked by failed tests | PRs that have been blocked by failed tests in CI.
These numbers are important for understanding the overall health of your repo’s tests, how flaky tests impact your developer productivity, and the developer hours saved from quarantining tests. You can also view the trends in these numbers in the trend charts.
The trend charts display the New Test Cases added by day, as well as Test Transitions and Quarantined Runs.
Test Transitions represent the number of tests that have transitioned to a particular status on a particular day, excluding new test cases (which default to a status of Healthy). If a bar shows 5 Healthy, 10 Flaky, and 2 Broken on a single day, that indicates 5 tests transitioned to Healthy, 10 to Flaky, and 2 to Broken on that day.
Quarantined Runs represents the number of runs of quarantined tests by day.
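To make the Test Transitions definition concrete, the following is a minimal Python sketch that counts transitions per day from a list of status-change records. The record shape, the sample data, and the `count_transitions` name are assumptions made for illustration; they do not describe Trunk's data model or API.

```python
from collections import Counter, defaultdict


def count_transitions(events):
    """Count status transitions per day, skipping new test cases.

    `events` is an iterable of (day, test_id, previous_status, new_status)
    tuples. previous_status is None for a newly added test case.
    This record shape is hypothetical.
    """
    per_day = defaultdict(Counter)
    for day, _test_id, previous, new in events:
        if previous is None:
            # New test cases start as Healthy and are charted separately,
            # so they are not counted as transitions.
            continue
        if previous != new:
            per_day[day][new] += 1
    return {day: dict(counts) for day, counts in per_day.items()}


# Made-up sample data for the sketch.
events = [
    ("day-1", "t1", "Healthy", "Flaky"),
    ("day-1", "t2", "Flaky", "Healthy"),
    ("day-1", "t3", None, "Healthy"),  # new test case, not a transition
    ("day-1", "t4", "Healthy", "Flaky"),
    ("day-2", "t1", "Flaky", "Broken"),
]
print(count_transitions(events))
```

Each per-day dictionary corresponds to one stacked bar in the Test Transitions chart: the key is the status a test moved to, and the value is how many tests moved to it that day.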

Test cases overview

You can view a table of all your test cases and their current status in Trunk Flaky Tests. You can filter the table by test status, quarantine setting, ticket status, or by the name, file, or suite name of the test case. By default, the table is sorted by the number of PRs impacted by each test case, which is the best way to measure the impact of a flaky test. Click a test case to view its details.
Column | Description
Tests | The variant, file path, and name of the test case.
Status | The health status of the test case: Healthy, Flaky, or Broken. Broken indicates consistent high-rate failures; Flaky indicates intermittent failures.
Failure Rate | The percentage of CI runs that failed because of this test case.
PRs Impacted | The number of PRs that have been affected by this test case failing in CI.
Last Run | The timestamp of the most recent uploaded test run.
Test Deletion & History
  • Inactive tests disappear from the dashboard automatically after 30 days and are fully removed after 45 days. Tests cannot be manually deleted.
  • Changing test identifiers (e.g., adding file paths) creates new test entries — merging with old history isn’t supported.

Test case details

You can click on any of the test cases listed on the Flaky Tests dashboard to access the test case’s details. The test details page uses a tabbed layout:
  • Summary: Run result charts and failure types grouped by unique failure reason.
  • Test History: A searchable, paginated table of every individual test run with filtering and a detail panel.
  • Monitors: Detection monitors configured for this test (visible when the detection engine is enabled).
  • Events: A timeline of detection events, quarantine actions, ticketing events, and status transitions (Healthy, Flaky, Broken) for this test (visible when the detection engine is enabled). Use the category filter to scope to Flake Detection events to see which monitor triggered each transition.
In addition to the tabbed content, the test details page shows the test’s current status (Healthy, Flaky, or Broken), ticket status, and codeowner information.
The Monitors tab opens on a monitor history swimlane: one row per monitor, each showing a 30-day timeline of the classifications it produced. Bars are colored by status (or by the label’s color for label monitors). By default the swimlane shows only monitors with recent activity; the Show inactive and disabled monitors toggle reveals rows for disabled monitors and monitors with no events in the last 30 days.
Figure: A swimlane timeline on the Monitors tab with rows for Test Status, a failure-rate monitor, a pass-on-retry monitor, and a second failure-rate monitor, spanning May 27 to June 26.

Code owners

If you have a CODEOWNERS file configured in your repos, the test details view shows who owns each flaky test. Code owners are supported for GitHub and GitLab repos.
This information will also be provided when creating a ticket with the Jira integration or webhooks.

Summary tab

The Summary tab shows an overview of the test’s recent run results and groups past failures by unique failure type.

Failure types

The Failure Types table shows the history of past test runs grouped by unique failure types. The Failure Type is a summary of the stack trace of the test run. You can click on the failure type to see a list of test runs labeled by branch, PR, author, CI job link, duration, and time.

Failure details

You can click on any of these test runs to see the detailed stack trace.
You can flip through the stack traces of similar failures across different test runs by clicking the left and right arrow buttons. You can also see other similar failures on this and other tests.
Go to the CI job logs
If you want to see the full logs of the original CI job for an individual test failure, click Logs in the expanded failure details panel to go to the job’s page in your CI provider.

Test History tab

The Test History tab gives you full visibility into every individual run of a test. Use it to investigate patterns across branches, find specific failing runs, and drill into error details.
Figure: The Test History tab with a result and quarantine filter bar, a daily runs chart, and a paginated table of individual test runs.

Daily runs chart

A stacked bar chart at the top of the tab shows daily test run counts. The legend identifies four categories:
  • Green: Pass
  • Red: Fail
  • Blue: Quarantined
  • Gray: Skipped
Click and drag on the chart to select a date range, which scopes the table below to runs from the selected days. The selected range appears next to the legend with an X button to clear just the range. The Reset button on the filter bar clears all filters at once, including the date range.
The Result and Quarantined filters from the filter bar also apply to the chart bars. When you filter to only passing runs, for example, the chart shows only green (Pass) bars. The chart and table always reflect the same set of runs.

Filters

A filter bar below the chart provides four independent controls:
Filter | Description
Result | Segmented control with All, Pass, and Fail to scope the table to a specific outcome.
Quarantined | Segmented control with Include (default), Exclude, and Only to control whether quarantined runs are mixed in, hidden, or shown exclusively.
SHA | Filter by commit hash. Matches runs whose SHA starts with the entered text.
Branch | Filter by branch name. Accepts exact names or glob patterns. Use * to match any sequence of characters and ? to match a single character.
Branch filter examples:
Pattern | Matches
main | The branch named main exactly
release/* | All release branches, e.g. release/1.0, release/2.3
feature-?? | Feature branches with a two-character suffix, e.g. feature-v2
trunk-merge/* | All merge queue branches
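If you want to check a branch pattern before typing it into the filter, the glob rules described here (* for any sequence of characters, ? for a single character) can be approximated with Python's standard `fnmatch` module. This is only a sketch: the `branch_matches` helper is hypothetical, and `fnmatch` also gives special meaning to `[...]` character classes, which the Branch filter may not support.

```python
from fnmatch import fnmatchcase


def branch_matches(pattern: str, branch: str) -> bool:
    """Return True if `branch` matches the glob `pattern`.

    fnmatchcase is case-sensitive and requires the whole name to match,
    so a pattern without wildcards behaves as an exact comparison.
    """
    return fnmatchcase(branch, pattern)


for pattern, branch in [
    ("main", "main"),
    ("release/*", "release/1.0"),
    ("feature-??", "feature-v2"),
    ("feature-??", "feature-v10"),  # three-character suffix, no match
    ("trunk-merge/*", "trunk-merge/pr-1234/abc"),
]:
    print(pattern, branch, branch_matches(pattern, branch))
```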
All filters combine using AND logic, so you can use them together. For example, set Result to Fail and Quarantined to Only to surface only quarantined failures. The Reset button clears every filter at once, including the chart date range.
Filter state is saved in the URL, so you can share or bookmark a filtered view. The Result filter accepts result=pass or result=fail. The Quarantined filter accepts quarantined=include, quarantined=exclude, or quarantined=only.
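Because filter state lives in the URL, a shareable link can be built programmatically. The sketch below uses only the two query parameters documented here (result and quarantined). The `with_filters` helper and the example URL are hypothetical, and the parameter names for the SHA and Branch filters are not documented in this section, so they are left out.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

RESULT_VALUES = {"pass", "fail"}
QUARANTINED_VALUES = {"include", "exclude", "only"}


def with_filters(url, result=None, quarantined=None):
    """Return `url` with the given Test History filters set in its query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    if result is not None:
        if result not in RESULT_VALUES:
            raise ValueError(f"unsupported result filter: {result!r}")
        query["result"] = result
    if quarantined is not None:
        if quarantined not in QUARANTINED_VALUES:
            raise ValueError(f"unsupported quarantined filter: {quarantined!r}")
        query["quarantined"] = quarantined
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL; substitute the address of a real test details page.
print(with_filters("https://example.invalid/test-page", result="fail", quarantined="only"))
```

The example call corresponds to the "quarantined failures only" view described above.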

Runs table

The runs table displays a paginated list of individual test runs (25 per page) with the following columns:
Column | Description
Timestamp | When the test ran, displayed in your local time zone.
Duration | How long the test took to execute.
PR | The pull request number associated with the run, e.g. #1234. Empty for runs that aren’t tied to a PR.
Branch | The branch the test ran against, e.g. main, feature/x, or trunk-merge/pr-1234/... for merge queue branches.
Commit | The first 7 characters of the commit SHA.
Each row has a colored left border indicating the run’s outcome. Quarantined runs always show blue, regardless of whether the run passed or failed. For non-quarantined runs, the border is green for pass, red for fail, orange for error, and a neutral gray for any other state.
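The precedence in the border rule (quarantine status first, then outcome) can be restated in a few lines of Python. The function name and the lowercase result strings are hypothetical; this only mirrors the rule as written and is not Trunk's rendering code.

```python
def border_color(result: str, quarantined: bool) -> str:
    """Return the row border color for a run, following the rule above."""
    if quarantined:
        # Quarantined runs are blue whether they passed or failed.
        return "blue"
    colors = {"pass": "green", "fail": "red", "error": "orange"}
    # Any other state falls back to neutral gray.
    return colors.get(result, "gray")


print(border_color("fail", quarantined=True))
print(border_color("fail", quarantined=False))
```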

Run detail panel

Click any row in the runs table to open a detail panel on the right side of the page. The panel shows:
  • Run header: Timestamp, a result badge (Pass, Fail, Error, or Quarantined), and run duration.
  • Source control: A CI job link (with the provider’s icon, the job name, and the CI duration), the linked pull request, branch, and commit. Merge queue runs also include a View in Merge Queue link.
  • Error details: For failed, errored, or quarantined runs, an optional AI summary of the failure followed by the raw error text or stack trace.

Debugging a flaky test from the UI

The Summary tab, failure details, and Test History tab give you most of what you need to investigate a flaky test. A few gaps come up often enough to call out, along with the workarounds that exist today.

Drilling into the right parallel worker

The CI job link in the run detail panel points to the parent build, not the specific worker that produced the failure. If your CI runs a single job, this is fine. If you fan out across many parallel workers (some customers run 40+), you’ll have to click through workers in the CI provider to find the one whose log contains the failure. To shortcut this, capture the per-worker URL at test run time and include it in your JUnit output so it surfaces in the failure detail. Most CI providers expose an environment variable for the running job’s URL:
CI provider | Environment variable
CircleCI | CIRCLE_BUILD_URL
GitHub Actions | ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}
Buildkite | BUILDKITE_BUILD_URL
GitLab CI | CI_JOB_URL
Read the value at the start of the test job and append it to a test property, log line, or system-out block in your JUnit XML. The link then appears alongside the failure in Trunk instead of routing to the parent build.
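As one possible way to do this, the Python sketch below reads the URL from the environment variables listed above and appends it to a system-out block on every failed or errored test case in a JUnit XML report. The helper names and the "CI worker URL:" label are assumptions, and JUnit layouts vary between test frameworks, so treat this as a starting point rather than a definitive implementation.

```python
import os
import xml.etree.ElementTree as ET


def worker_url(env=os.environ):
    """Return the running job's URL from known CI variables, or None."""
    for name in ("CIRCLE_BUILD_URL", "BUILDKITE_BUILD_URL", "CI_JOB_URL"):
        if env.get(name):
            return env[name]
    # GitHub Actions has no single URL variable, so compose it from three.
    parts = [env.get(k) for k in
             ("GITHUB_SERVER_URL", "GITHUB_REPOSITORY", "GITHUB_RUN_ID")]
    if all(parts):
        return "{}/{}/actions/runs/{}".format(*parts)
    return None


def annotate_junit(xml_text, url):
    """Append `url` to <system-out> of each failed or errored <testcase>."""
    root = ET.fromstring(xml_text)
    for case in root.iter("testcase"):
        if case.find("failure") is None and case.find("error") is None:
            continue  # leave passing tests untouched
        out = case.find("system-out")
        if out is None:
            out = ET.SubElement(case, "system-out")
        out.text = ((out.text or "") + "\nCI worker URL: " + url).strip()
    return ET.tostring(root, encoding="unicode")


sample = (
    '<testsuite><testcase name="a"><failure message="boom"/></testcase>'
    '<testcase name="b"/></testsuite>'
)
url = worker_url() or "https://ci.example.invalid/job/1"  # placeholder fallback
print(annotate_junit(sample, url))
```

In a real pipeline you would run this as a step between the test run and the upload, reading and rewriting the report file instead of the inline sample string.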

Bundle Upload ID lookups

When the trunk-analytics-cli uploads a bundle, it prints a Bundle Upload ID to the job log. This ID does not currently map to a URL in the web app — there’s no search field for it in the dashboard. If you need to trace a specific upload back to its run data, contact support with the Bundle Upload ID.
Uploads are processed periodically, not in real time, so a run you just uploaded may not appear immediately. If it isn’t visible yet, wait and refresh before assuming the upload failed.

CI artifact retention

CI providers typically retain build artifacts (screenshots, videos, traces) for one to two days. Flaky test tickets often take longer than that to investigate and resolve, which means the artifacts that would have helped explain the failure may already be gone by the time you open the ticket. If artifacts matter for your debugging flow, store them outside the CI provider’s retention window:
  • Upload screenshots, videos, and traces to S3 (or another long-lived store) as a CI step, and include the object URL in the JUnit output alongside the per-worker URL described above.
  • For especially noisy tests, attach the artifact URL to the Jira/Linear ticket Trunk opens via the ticketing integration.

Known limitations

A few framework-specific quirks are worth knowing about up front.
RSpec MultipleExceptionError
When an example raises multiple exceptions (for example, an error in the test body plus a separate error in an after hook), RSpec wraps them in RSpec::Core::MultipleExceptionError. The rspec_trunk_flaky_tests gem already uploads the full set of captured exceptions, but the failure detail view currently renders only one of them. To see all of them, check the CI job logs for the full exception list when you encounter this error type.
Go subtests with 0ms duration
go test reports the top-level test duration but does not always emit per-subtest durations. When a subtest’s duration is reported as 0ms, timing-based signals (including the AI failure analysis) have less to work with for that case. If timing context matters for your investigation, post-process your JUnit XML to reflect real subtest durations before uploading, or rely on the raw stack trace rather than the AI summary.
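Before post-processing Go results, it helps to know which subtests are affected. The Python sketch below lists test cases in a JUnit XML report whose name looks like a Go subtest (contains a "/") and whose time attribute is zero. It assumes your JUnit converter names subtests as Parent/child and writes durations to the time attribute; both are assumptions about your tooling, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def zero_duration_subtests(xml_text):
    """Return names of subtest-style test cases reported with a zero duration."""
    root = ET.fromstring(xml_text)
    flagged = []
    for case in root.iter("testcase"):
        name = case.get("name", "")
        try:
            seconds = float(case.get("time", "0"))
        except ValueError:
            seconds = 0.0  # treat an unparsable duration as missing
        if "/" in name and seconds == 0.0:
            flagged.append(name)
    return flagged


sample = (
    '<testsuite>'
    '<testcase name="TestA" time="1.200"/>'
    '<testcase name="TestA/x" time="0.000"/>'
    '<testcase name="TestA/y" time="0.500"/>'
    '</testsuite>'
)
print(zero_duration_subtests(sample))
```

How you then fill in real durations depends on where you can measure them, which is outside the scope of this sketch.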