Getting Started

Trunk Flaky Tests detects flaky tests by analyzing test results from your CI runs. Setup has two parts: configuring your tests to output results in JUnit XML, and uploading those results from CI.

Prerequisites

  • Ability to modify repository CI configuration and add secrets

  • Tests running in CI on both PRs and stable branches (e.g., main)

Step 1: Ensure JUnit XML output

Trunk ingests test results in JUnit XML format. If your CI already generates JUnit XML, note the file paths and skip to Step 2.

If not, configure your test frameworks to output JUnit XML:

  • See Test Frameworks for framework-specific configuration

  • Multiple frameworks can be used at once; configure each to emit JUnit XML
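For reference, JUnit XML is a simple report format that most test frameworks can emit natively (for example, pytest via its `--junitxml=<path>` flag). A minimal report looks like this; the suite and test names below are illustrative:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
  <testsuite name="auth" tests="2" failures="1" time="0.42">
    <!-- A passing test case -->
    <testcase classname="auth.test_login" name="test_valid_login" time="0.21"/>
    <!-- A failing test case, with the failure recorded inline -->
    <testcase classname="auth.test_login" name="test_expired_token" time="0.21">
      <failure message="AssertionError: token should be rejected"/>
    </testcase>
  </testsuite>
</testsuites>
```

Each `<testcase>` carries the test's identity (class and name), which is what allows results for the same test to be correlated across CI runs.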

Step 2: Configure CI uploads

Add test result uploads to all CI jobs that run tests.

  1. See CI Providers for integration instructions

  2. Configure uploads in jobs that run on:

    • Pull request branches

    • Stable branches (main, master, develop, etc.)

    • Merge queue branches (if applicable)

Uploads from both PRs and stable branches are required for accurate flaky test detection.
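As a sketch of what an upload step can look like, here is a hypothetical GitHub Actions fragment. The action name, input names, and secret name are assumptions for illustration; follow the CI Providers instructions for your provider's exact configuration:

```yaml
# Hypothetical GitHub Actions step — consult the CI Providers docs
# for the exact uploader and inputs for your CI system.
- name: Upload test results to Trunk
  if: always()  # upload even when tests fail — the failures are the signal
  uses: trunk-io/analytics-uploader@v1
  with:
    junit-paths: test-results/**/*.xml      # wherever Step 1 writes JUnit XML
    org-slug: my-org                        # assumption: your Trunk org slug
    token: ${{ secrets.TRUNK_API_TOKEN }}   # assumption: secret added in Prerequisites
```

Note the `if: always()` condition: without it, the upload step is skipped whenever tests fail, which is exactly when you need the results.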

Step 3: Verify integration

  1. Push your changes and trigger a CI run

  2. Check CI logs for successful upload confirmation

  3. Check the Uploads tab: go to app.trunk.io → your repo → Flaky Tests > Uploads. Results typically appear within a few minutes


Step 4: Configure flake detection

After uploads are flowing, navigate to your repo → Flaky Tests > Monitors to set up detection.

Pass-on-retry is enabled by default and is the recommended baseline for everyone. It catches the most common flakiness pattern — a test that fails and then passes on retry within the same commit — without any configuration needed.
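The pass-on-retry pattern can be sketched in a few lines. This is an illustrative model of the detection idea, not Trunk's implementation:

```python
def is_pass_on_retry(attempts):
    """Return True when a test failed and later passed on the same commit —
    the pass-on-retry flakiness pattern (illustrative sketch, not Trunk's code).

    attempts: ordered pass/fail results for one test on one commit,
    e.g. [False, True] for fail-then-pass-on-retry.
    """
    failed_once = False
    for passed in attempts:
        if failed_once and passed:
            return True  # deterministic code can't fail then pass unchanged
        if not passed:
            failed_once = True
    return False
```

A test that fails on every attempt is not flagged by this pattern; it is simply broken, which is why retry-based detection alone misses some flakiness and threshold monitors (below) complement it.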

Threshold monitors let you detect flakiness based on failure rate over a rolling time window. How you configure them depends on your CI setup:

  • If tests must pass before merging to main, set up a threshold monitor scoped to main to catch an elevated failure rate. For example, if you run tests 5 times per day on main, a 24-hour rolling window with a minimum of 4 runs and a failure threshold of 25% is a reasonable starting point. This ensures the monitor has enough data before flagging anything.

  • If you use a merge queue, consider a dedicated monitor scoped to your merge queue branches (e.g., trunk-merge/* or gh-readonly-queue/*). Failures here are especially suspicious since the code has already passed PR checks, so a low threshold is appropriate.

How threshold monitors work →

Quarantining

Quarantining suppresses failures from known flaky tests, preventing them from forcing CI re-runs or blocking your merge queue. Flaky tests continue to run and report results — they just don't cause pipeline failures while your team works on fixes. This is especially valuable for unblocking merge queues and keeping development velocity high.

Configure Quarantining →
