Anti-Flake Protection

Using optimistic merging and pending failure depth to protect your Merge Queue from flaky failures

Some CI jobs fail for reasons unrelated to a PR's code change, such as flaky tests or a CI runner disconnecting. Rerunning the CI job usually clears these failures. If a second PR that depends on the first passes, the first PR was very likely good and simply hit a transient failure. Trunk Merge Queue can combine Optimistic Merging and Pending Failure Depth to merge pull requests that would otherwise be rejected from the queue.

If you have a lot of flaky tests in your projects, you should track and fix them with Trunk Flaky Tests. Anti-flake protection helps reduce the impact of flaky tests but doesn't help you detect, track, and eliminate them.

In the video below, you can see an example of this anti-flake protection:

Video: anti-flake protection with optimistic merging + pending failure depth
What's happening?                                                 Queue
A, B, C begin predictive testing                                  main <- A <- B+a <- C+ba
B fails testing                                                   main <- A <- B+a <- C+ba
Pending failure depth keeps B from being evicted while C tests    main <- A <- B+a (hold) <- C+ba
C passes                                                          main <- A <- B+a <- C+ba
Optimistic merging allows A, B, C to merge                        merge A B C

Optimistic Merging only works when the Pending Failure Depth is set to a value greater than zero. When it is set to zero or disabled, Merge will not hold any failed PRs in the queue.
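The interaction between the two settings can be sketched as a small decision function. This is a toy model for illustration only, not Trunk's implementation: the `Entry` type, the `resolve_failure` function, and its parameter names are all hypothetical. It also assumes that Pending Failure Depth counts the PRs directly behind the failed one, and that a failure is evicted whenever either setting is off.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """One PR in the toy queue (hypothetical type, not a Trunk API)."""
    name: str
    passed: Optional[bool]  # True = passed, False = failed, None = still testing


def resolve_failure(queue: List[Entry], index: int,
                    optimistic_merging: bool,
                    pending_failure_depth: int) -> str:
    """Return "merge", "hold", or "evict" for the failed PR at queue[index]."""
    if not optimistic_merging or pending_failure_depth <= 0:
        # Simplifying assumption: without both settings, a failure is not held.
        return "evict"
    # Assumption: only the PRs directly behind the failure, up to the depth, count.
    behind = queue[index + 1 : index + 1 + pending_failure_depth]
    if any(e.passed is True for e in behind):
        # A later PR that was tested on top of this one passed.
        return "merge"
    if any(e.passed is None for e in behind):
        # A later PR is still testing, so keep the failed PR in the queue.
        return "hold"
    # Nothing behind it passed or is still running (or nothing is behind it).
    return "evict"


queue = [Entry("A", True), Entry("B", False), Entry("C", None)]
print(resolve_failure(queue, 1, True, 1))  # hold
queue[2].passed = True
print(resolve_failure(queue, 1, True, 1))  # merge
print(resolve_failure(queue, 1, True, 0))  # evict
```

The three calls mirror the walkthrough: B is held while C is still testing, B merges once C passes, and with a depth of zero B is evicted as soon as it fails.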

Enabling anti-flake protection

To achieve anti-flake protection, enable Optimistic Merge and set Pending Failure Depth to a value greater than 0 in the Merge UI settings:

Image: setting to enable anti-flake protection

The Fine Print

There is a small tradeoff when optimistic merging is used. A test that is actually broken in, say, change 'B' can be corrected by a change in 'C'. In that case, B merges on the strength of C's passing run, and if you later revert 'C', your build will be broken.
