Pending failure depth
What it is
Pending failure depth allows a pull request that fails testing to remain in the queue until pull requests behind it finish testing, rather than being removed immediately.
By default, a PR that fails testing is evicted from the queue right away. With a pending failure depth configured, a failed PR stays in the queue while the pull requests behind it finish their test runs, so that in-flight testing is not discarded before the eviction happens. The depth is configurable and is the number of pull requests behind the failed one that must complete testing before eviction is assessed.
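Here is a minimal sketch of that eviction rule, assuming a simplified in-memory queue model; QueueEntry and should_evict are illustrative names, not the merge queue's actual API:

```python
from dataclasses import dataclass

# Illustrative model only: the names and fields here are assumptions, not the product's API.
@dataclass
class QueueEntry:
    pr_number: int
    failed: bool = False        # did this PR's test run fail?
    testing_done: bool = False  # has this PR finished its current test run?

def should_evict(queue: list[QueueEntry], index: int, pending_failure_depth: int) -> bool:
    """A failed PR is evicted only after the PRs in the window of size
    `pending_failure_depth` behind it have finished their test runs."""
    entry = queue[index]
    if not entry.failed:
        return False
    behind = queue[index + 1 : index + 1 + pending_failure_depth]
    # Wait until every PR in the window behind the failure has completed testing.
    return all(e.testing_done for e in behind)

# Example: PR 101 failed, but PR 104 in its window is still testing, so 101 stays queued.
queue = [
    QueueEntry(101, failed=True, testing_done=True),
    QueueEntry(102, testing_done=True),
    QueueEntry(103, testing_done=True),
    QueueEntry(104, testing_done=False),
]
print(should_evict(queue, 0, pending_failure_depth=3))  # False: still waiting on PR 104
```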
Why use it
Prevent queue stalls - When a PR fails, the queue doesn't grind to a halt. Other PRs continue testing, assuming the failure was isolated. This keeps merge velocity high even when failures occur.
Faster failure recovery - If PR #3 fails but PR #4 fixes the issue, both can be processed quickly because they tested in parallel. Without pending failure depth, you'd wait for #3 to fail, then wait for #4 to test sequentially.
Optimize for your team size - Small teams benefit from lower values (fewer wasted tests), large teams benefit from higher values (maintain throughput despite occasional failures).
Balance risk vs. throughput - Tune the setting to match your team's tolerance for wasted CI resources against its need for high queue velocity.
How to enable
Configure Pending Failure Depth under Settings > Repositories > your repository > Merge Queue, then select a value from the Pending Failure Depth dropdown.
Configuration options
Start with a small value and observe (a tuning sketch follows this list):
If your queue frequently stalls when PRs fail → Increase value
If you see lots of wasted test runs (many PRs test then all fail) → Decrease value
If your CI infrastructure is constrained → Use a lower value (3-4)
If you have abundant CI capacity → Use a higher value (7-10)
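These rules of thumb can be expressed as a small tuning helper. The sketch below is only illustrative: the function name, inputs, and the stall-rate threshold are assumptions, while the 15% retest threshold mirrors the troubleshooting guidance later on this page.

```python
def suggest_pending_failure_depth(current_depth: int,
                                  stall_rate: float,          # fraction of failures that stall the queue
                                  wasted_retest_rate: float,  # fraction of test runs that are wasted retests
                                  ci_constrained: bool) -> int:
    """Illustrative heuristic only; the thresholds are assumptions, not product guidance."""
    if wasted_retest_rate > 0.15 or ci_constrained:
        # Lots of wasted runs, or limited CI capacity: move toward the lower 3-4 range.
        return max(3, current_depth - 1)
    if stall_rate > 0.5:
        # The queue frequently stalls when PRs fail: move toward the higher 7-10 range.
        return min(10, current_depth + 1)
    return current_depth

# Example: plenty of CI capacity, frequent stalls, few wasted retests -> nudge the depth up.
print(suggest_pending_failure_depth(5, stall_rate=0.7, wasted_retest_rate=0.05, ci_constrained=False))  # 6
```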
Verify it's working
When a PR fails, watch for:
✅ Multiple other PRs continue testing (up to your configured depth)
✅ Queue doesn't stop entirely
✅ Failed PR is removed, but others keep going
Tradeoffs and considerations
What you gain
Queue never fully stops - Failures don't block all subsequent PRs
Faster recovery - Independent PRs can merge while others fail
Tunable throughput - Adjust for your team's needs
Better CI utilization - Tests keep running instead of stopping
What you give up or risk
Wasted CI resources - PRs may test against a state that includes failing PRs, then need to retest
Cascading failures - If one PR breaks something, multiple subsequent PRs might fail before the issue is caught
Complexity - With more PRs testing simultaneously, the queue state becomes harder to reason about
When to decrease pending failure depth
Lower the value (3-4) if:
Your tests are flaky (>5% flake rate) - Flaky tests cause false failures, leading to wasted retests
CI resources are expensive/limited - Lower parallelism reduces waste
PRs frequently conflict - Related changes often fail together, so testing them in parallel wastes resources
You're seeing excessive retests - Many PRs testing, failing, retesting pattern
When to increase pending failure depth
Raise the value (7-10) if:
Your queue stalls frequently when PRs fail - Low depth is blocking throughput
PRs are mostly independent - Failures are isolated, not cascading
You have abundant CI capacity - Waste isn't a concern
Large team, high PR volume - Need parallelism to maintain velocity
Understanding the cost
Example cost calculation:
Scenario: Pending failure depth = 5, PR #101 fails testing
PRs #102, #103, #104, #105, #106 all test against a state including #101
All 5 fail because #101 broke something
All 5 retest after #101 is removed
Wasted: 5 test runs
But consider:
Without pending failure depth, PRs would test sequentially (much slower)
In most cases, failures ARE independent, so PRs merge successfully
Occasional waste is preferable to frequent queue stalls
Typical waste rate: 5-10% of test runs are wasted retests in well-configured queues
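A back-of-envelope version of this arithmetic, under the simplifying assumption (matching the scenario above) that test runs are wasted only when a failure cascades to the PRs testing behind it; the function and its inputs are illustrative, not a product metric:

```python
def expected_wasted_runs(depth: int, failure_rate: float, cascade_prob: float) -> float:
    """Expected wasted test runs per PR reaching the head of the queue, assuming
    waste only occurs when a head failure also breaks the `depth` PRs behind it."""
    return failure_rate * cascade_prob * depth

# The scenario above: a failure that cascades to everything behind it wastes `depth` runs.
print(expected_wasted_runs(depth=5, failure_rate=1.0, cascade_prob=1.0))             # 5.0
# With a 10% failure rate and failures that cascade 20% of the time, the expected
# waste at depth 5 is about 0.1 test runs per PR.
print(f"{expected_wasted_runs(depth=5, failure_rate=0.10, cascade_prob=0.20):.2f}")  # 0.10
```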
Common misconceptions
Misconception: "Higher pending failure depth always means faster queue"
Reality: Too high = wasted CI resources and cascading failures. Too low = queue stalls. The sweet spot depends on your team size and test stability.
Misconception: "Pending failure depth should be set to 1 to avoid waste"
Reality: A value of 1 means the queue stops on every failure, which defeats the purpose of predictive testing. Start at 5 and adjust.
Misconception: "This setting isn't important"
Reality: Poorly tuned pending failure depth can either waste significant CI resources or cause frequent queue stalls. It's worth monitoring and adjusting.
Next Steps
Initial setup:
Start with a small value (2)
Monitor queue behavior
Check metrics for wasted test runs
Adjust based on observations
Optimize the value:
Queue stalls frequently? → Increase depth
Excessive retests (>15%)? → Decrease depth
Make small adjustments and observe impact
Monitor performance:
Metrics and monitoring - Track retest rate and queue throughput (a retest-rate sketch follows this list)
Watch for patterns: Do failures cascade? Are they independent?
Adjust pending failure depth based on data
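A sketch of the retest-rate calculation, assuming you can export per-run records from your CI; the record fields used here are hypothetical, not part of any metrics API:

```python
def wasted_retest_rate(test_runs: list[dict]) -> float:
    """Fraction of all test runs that were retests triggered by a failed PR ahead in the queue."""
    if not test_runs:
        return 0.0
    retests = sum(1 for run in test_runs if run.get("retest_after_eviction"))
    return retests / len(test_runs)

# Hypothetical per-run records: one of the four runs was a retest caused by an eviction.
runs = [
    {"pr": 102, "retest_after_eviction": False},
    {"pr": 103, "retest_after_eviction": True},
    {"pr": 104, "retest_after_eviction": False},
    {"pr": 105, "retest_after_eviction": False},
]
print(f"wasted retest rate: {wasted_retest_rate(runs):.0%}")  # 25%: above ~15%, consider lowering the depth
```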
Combine with other optimizations:
Anti-flake protection - Reduce false failures first
Batching - Understand how pending failure depth affects batch splitting
Predictive testing - Read the full explanation of how these work together
Troubleshooting:
Too many wasted tests → Lower pending failure depth
Queue stops on every failure → Increase pending failure depth
Unclear which value to use → Start at 2, monitor for a week