Anti-Flake Protection
Using Optimistic Merging and Pending Failure Depth to protect your Merge Queue from flaky failures
Some CI jobs fail for reasons unrelated to a PR's code change, such as flaky tests or a CI runner disconnecting. These failures usually clear when the CI job is rerun. If a second PR that depends on the first passes, it is very likely that the first PR was good and simply hit a transient failure. Trunk Merge Queue can combine Optimistic Merging and Pending Failure Depth to merge pull requests that would otherwise be rejected from the queue.
If you have a lot of flaky tests in your projects, you should track and fix them with Trunk Flaky Tests. Anti-flake protection helps reduce the impact of flaky tests but doesn't help you detect, track, and eliminate them.
In the video below, you can see an example of this anti-flake protection:
A, B, and C begin predictive testing: main <- A <- B+a <- C+ba
B fails testing: main <- A <- B+a <- C+ba
Pending Failure Depth keeps B from being evicted while C tests: main <- A <- B+a (hold) <- C+ba
C passes: main <- A <- B+a <- C+ba
Optimistic Merging allows A, B, and C to merge: merge A B C
Optimistic Merging only works when Pending Failure Depth is set to a value greater than zero. When Pending Failure Depth is zero or disabled, Merge does not hold any failed pull requests in the queue.
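The walkthrough above can be sketched as a small decision function. This is only an illustrative model, not Trunk's implementation: the `Entry` class, the `decide` function, the status strings, and the exact way the depth limits which later PRs are considered are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    name: str
    status: str  # "testing", "passed", or "failed" (hypothetical status names)


def decide(queue: List[Entry], i: int,
           optimistic_merging: bool, pending_failure_depth: int) -> str:
    """Return "wait", "merge", "hold", or "evict" for queue[i].

    Simplified model: it ignores the state of PRs ahead of queue[i].
    """
    entry = queue[i]
    if entry.status == "testing":
        return "wait"
    if entry.status == "passed":
        return "merge"

    # The PR failed. With a depth of zero nothing is held.
    if pending_failure_depth <= 0:
        return "evict"

    # Assumption: only the next `pending_failure_depth` PRs are considered.
    followers = queue[i + 1 : i + 1 + pending_failure_depth]

    # A later PR that was tested on top of this one passed, so the failure
    # is treated as transient.
    if optimistic_merging and any(f.status == "passed" for f in followers):
        return "merge"

    # A later PR is still testing, so keep the failed PR in the queue.
    if any(f.status == "testing" for f in followers):
        return "hold"

    return "evict"


if __name__ == "__main__":
    queue = [Entry("A", "passed"), Entry("B", "failed"), Entry("C", "testing")]
    print(decide(queue, 1, True, 1))  # hold: C is still testing
    queue[2].status = "passed"
    print(decide(queue, 1, True, 1))  # merge: C passed with B's changes included
```

In this model, setting the depth to 0 makes `decide` return "evict" for B as soon as it fails, which mirrors the note above that Optimistic Merging has no effect without a Pending Failure Depth greater than zero.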
Enabling anti-flake protection
Achieve anti-flake protection by enabling Optimistic Merge and setting Pending Failure Depth to a value greater than 0 in the Merge UI settings.
The Fine Print
There is a small tradeoff when Optimistic Merging is used. You can get into a situation where a genuinely broken test in, say, change 'B' is corrected by change 'C'. In this case, if you later revert 'C', your build will be broken.
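The tradeoff can be made concrete with a toy model. The `build_passes` function below is hypothetical and simply encodes the scenario described: change 'B' introduces a real breakage that only change 'C' repairs.

```python
from typing import Set


def build_passes(changes: Set[str]) -> bool:
    # Hypothetical rule: B breaks the build unless C is also present.
    return "B" not in changes or "C" in changes


merged = {"A", "B", "C"}            # all three merged optimistically
after_revert = merged - {"C"}       # C is reverted later

print(build_passes(merged))         # True: C masks B's breakage
print(build_passes(after_revert))   # False: B's breakage resurfaces
```

Here C's passing run was taken as evidence that B's failure was transient, when C was actually fixing it.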