Anti-Flake Protection

Using Optimistic Merging and Pending Failure Depth to protect your merge queue from flaky failures

Some CI jobs fail for reasons unrelated to a PR's code change, such as flaky tests or a CI runner disconnecting. These failures usually clear when the CI job is rerun. If a second PR that depends on the first passes, it is very likely that the first PR was good and simply hit a transient failure. Trunk Merge Queue can combine Optimistic Merging and Pending Failure Depth to merge pull requests that would otherwise be rejected from the queue.

If you have a lot of flaky tests in your projects, you should track and fix them with Trunk Flaky Tests. Anti-flake protection helps reduce the impact of flaky tests but doesn't help you detect, track, and eliminate them.

The walkthrough below shows an example of this anti-flake protection:

| What's happening? | Queue |
| --- | --- |
| A, B, C begin predictive testing | main <- A <- B+a <- C+ba |
| B fails testing | main <- A <- B+a <- C+ba |
| Pending Failure Depth keeps B from being evicted while C tests | main <- A <- B+a (hold) <- C+ba |
| C passes | main <- A <- B+a <- C+ba |
| Optimistic Merging allows A, B, C to merge | merge A B C |
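The walkthrough above can be approximated with a toy model. This is only a sketch and not Trunk's implementation: `QueuedPR`, `resolve`, and the rule that a failed PR waits on up to `pending_failure_depth` PRs behind it are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class QueuedPR:
    name: str
    passed: Optional[bool] = None  # None means the PR is still testing


def resolve(
    queue: List[QueuedPR],
    optimistic_merging: bool,
    pending_failure_depth: int,
) -> Tuple[List[str], List[str], List[str]]:
    """Toy model (assumed semantics): return (merged, waiting, evicted) names."""
    merged: List[str] = []
    waiting: List[str] = []
    evicted: List[str] = []
    for i, pr in enumerate(queue):
        if pr.passed is None:
            # Still testing: this PR and everything behind it must wait.
            waiting = [p.name for p in queue[i:]]
            break
        if pr.passed:
            merged.append(pr.name)
            continue
        # The PR failed. Look at the PRs behind it, up to the configured depth.
        behind = queue[i + 1 : i + 1 + pending_failure_depth]
        if optimistic_merging and any(p.passed for p in behind):
            # A dependent PR passed, so treat the failure as transient.
            merged.append(pr.name)
        elif any(p.passed is None for p in behind):
            # Hold the failed PR while a dependent PR is still testing.
            waiting = [p.name for p in queue[i:]]
            break
        else:
            # Nothing left to wait for: evict; later PRs need retesting without it.
            evicted.append(pr.name)
            waiting = [p.name for p in queue[i + 1 :]]
            break
    return merged, waiting, evicted


queue = [QueuedPR("A", True), QueuedPR("B", False), QueuedPR("C", None)]
print(resolve(queue, True, 1))  # (['A'], ['B', 'C'], [])  B is held while C tests
queue[2].passed = True
print(resolve(queue, True, 1))  # (['A', 'B', 'C'], [], [])  C's pass lets B merge
print(resolve(queue, True, 0))  # (['A'], ['C'], ['B'])  depth 0: B is evicted
```

In this model, a depth of 0 leaves no PRs behind the failure to wait for, so B is evicted even though C later passes.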

Optimistic Merging only works when Pending Failure Depth is set to a value greater than zero. When it is zero or disabled, Merge will not hold any failed PRs in the queue.

Enabling anti-flake protection

To protect your merge queue from flakes, enable Optimistic Merging and set Pending Failure Depth to an integer greater than 0 in the Merge UI settings.
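The two settings only provide anti-flake protection together. As a minimal sketch of that rule, the snippet below uses a hypothetical `MergeSettings` class; the class and field names are illustrative assumptions, not a Trunk API, and the real values are set in the Merge UI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeSettings:
    # Hypothetical mirror of the two UI settings; not a real Trunk API.
    optimistic_merging: bool
    pending_failure_depth: int

    def anti_flake_enabled(self) -> bool:
        # Both conditions are required: the toggle is on and the depth is above 0.
        return self.optimistic_merging and self.pending_failure_depth > 0


print(MergeSettings(True, 1).anti_flake_enabled())   # True
print(MergeSettings(True, 0).anti_flake_enabled())   # False
print(MergeSettings(False, 2).anti_flake_enabled())  # False
```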

What are the tradeoffs?

Optimistic merging comes with a small tradeoff. A genuinely broken test introduced by, say, change 'B' can be masked because change 'C' happens to fix it. 'C' passes, so 'B' merges with it. If you later reverted 'C', leaving 'B' in place, your build would be broken.
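The scenario can be made concrete with a toy build function. This is a sketch under stated assumptions: `build_passes` is a hypothetical stand-in for a CI run in which 'B' breaks a test and 'C' happens to fix it.

```python
from typing import Iterable


def build_passes(changes: Iterable[str]) -> bool:
    """Hypothetical CI result: B breaks a test, and C happens to fix it."""
    applied = set(changes)
    return not ("B" in applied and "C" not in applied)


history = ["A", "B", "C"]
print(build_passes(["A"]))       # True:  A alone is fine
print(build_passes(["A", "B"]))  # False: B really is broken
print(build_passes(history))     # True:  C masks B's failure, so all three merge

# Reverting C later leaves B's breakage on the main branch.
after_revert = [c for c in history if c != "C"]
print(build_passes(after_revert))  # False
```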
