The Failure Count Monitor in Trunk Flaky Tests

The failure count monitor flags a test the moment it accumulates a configured number of failures on monitored branches within a rolling time window. Unlike the failure rate monitor, which computes a failure percentage over many runs, the failure count monitor reacts to individual failures and needs no minimum sample size. This makes it well-suited for stable branches like main, where any test failure is unexpected and worth investigating immediately.

When to Use This Monitor

Use the failure count monitor when you want immediate visibility into test failures on branches that should be green. Common scenarios:

Stable branch alerting: Flag any test that fails on main, even once. On a branch where all tests should pass, a single failure is a meaningful signal.
Post-merge regression detection: Catch tests that start failing after a merge, before enough runs have accumulated for a failure rate monitor to trigger.
High-confidence branches: Monitor merge queue or release branches where failures are suspicious by definition.

If you need to detect patterns of intermittent failure over time (e.g., a test that fails 20% of the time), use a failure rate monitor instead. If you want to catch tests that fail and then pass on retry within a single commit, pass-on-retry handles that automatically.

How It Works

The monitor counts the number of test failures on configured branches within a rolling time window. When a test reaches the configured failure count, the monitor activates and runs its configured action — by default, flagging the test as flaky or broken.
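The counting rule above can be sketched in a few lines. This is a minimal illustration, not Trunk's implementation: the function names, the inclusive window boundaries, and the in-memory list of timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def failures_in_window(failure_times, now, window):
    """Count failures that fall inside the rolling window ending at `now`.

    Assumption: both window boundaries are inclusive.
    """
    cutoff = now - window
    return sum(1 for t in failure_times if cutoff <= t <= now)

def is_triggered(failure_times, now, window, failure_count):
    """The monitor activates once the in-window count reaches the threshold."""
    return failures_in_window(failure_times, now, window) >= failure_count

now = datetime(2024, 1, 1, 12, 0)
failures = [
    now - timedelta(minutes=50),  # outside a 30-minute window
    now - timedelta(minutes=20),
    now - timedelta(minutes=5),
]
window = timedelta(minutes=30)

print(is_triggered(failures, now, window, failure_count=2))  # True
print(is_triggered(failures, now, window, failure_count=3))  # False
```

The failure from 50 minutes ago has rolled off, so only two failures count toward the threshold.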

Example

You configure a failure count monitor with:

Setting	Value
Detection type	Broken
Failure count	1
Window	30 minutes
Resolution timeout	2 hours
Branches	`main`

A developer merges a change that breaks test_checkout. Here is what happens:

test_checkout fails on the next CI run on main.
The monitor sees 1 failure within the 30-minute window, which meets the configured failure count of 1.
test_checkout is immediately flagged as broken.
The developer identifies the issue and merges a correction.
Two hours pass with no new failures for test_checkout.
The monitor automatically resolves the test back to healthy.

If another test, test_signup, also failed during that window, it would be flagged independently. Each test is evaluated on its own.
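To make the per-test evaluation concrete, here is a small sketch that uses the settings from the example table (failure count 1, 30-minute window, branch `main`). The tuple layout, the `flagged_tests` helper, and the extra `test_search` failure on a feature branch are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)
FAILURE_COUNT = 1
BRANCH = "main"

def flagged_tests(failures, now):
    """Return the set of tests whose in-window failures on BRANCH meet the count.

    `failures` is a list of (test_name, branch, timestamp) tuples.
    Each test is counted separately, so one test never affects another.
    """
    counts = defaultdict(int)
    for test, branch, ts in failures:
        if branch == BRANCH and now - WINDOW <= ts <= now:
            counts[test] += 1
    return {test for test, n in counts.items() if n >= FAILURE_COUNT}

now = datetime(2024, 1, 1, 12, 0)
failures = [
    ("test_checkout", "main", now - timedelta(minutes=2)),
    ("test_signup", "main", now - timedelta(minutes=10)),
    ("test_search", "feature/x", now - timedelta(minutes=1)),  # branch not monitored
]

print(sorted(flagged_tests(failures, now)))  # ['test_checkout', 'test_signup']
```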

Configuration

Failure Count

The number of failures required to trigger detection. The default is 1, meaning any single failure on a monitored branch flags the test. Setting this higher (e.g., 3) requires multiple failures before the monitor reacts. This is useful if you want to filter out one-off infrastructure blips while still catching tests that fail repeatedly in a short window.

Window Duration

The rolling time window over which failures are counted. Only test failures within this window contribute to the failure count. A shorter window (e.g., 30 minutes) limits detection to very recent failures. A longer window (e.g., 6 hours) catches failures that are spread out over time but still accumulating. The window should be long enough to capture the failures you care about but short enough that old failures roll off naturally.

For a monitor with a failure count of 1, the window mainly controls how quickly a detection event is created after a failure. In practice, the pipeline evaluates frequently, so detection is near-immediate regardless of window size.

Resolution Timeout

How long a flagged test must go without any new failures before it is automatically resolved. This is the only way a failure count monitor resolves: there is no “recovery rate” or sample-based resolution like the failure rate monitor, and no stale timeout. If a test stops running entirely (e.g., it was deleted or renamed), it stays flagged until the resolution timeout elapses from its last observed failure.

For example, with a resolution timeout of 2 hours, a test that was flagged at 3:00 PM will resolve at 5:00 PM if no new failures occur. If a new failure arrives at 4:30 PM, the clock resets, and the test will not resolve until 6:30 PM.

The resolution timeout must be at least as long as the detection window. If the window is 30 minutes, the resolution timeout must be 30 minutes or longer.

Choose a resolution timeout that gives your team enough time to verify a fix has landed. A short timeout (e.g., 30 minutes) resolves quickly but may prematurely clear tests that fail intermittently. A longer timeout (e.g., 24 hours) is more conservative and ensures the test stays flagged until it has been clean for a full day.
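The clock-reset arithmetic can be checked with a short sketch. It assumes the resolution time is simply the last observed failure plus the timeout, and that the test was flagged at the moment of its first failure, as it would be with a failure count of 1. The helper names `resolves_at` and `validate_timeout` are hypothetical.

```python
from datetime import datetime, timedelta

def resolves_at(failure_times, resolution_timeout):
    """Earliest time a flagged test can auto-resolve: last failure + timeout."""
    return max(failure_times) + resolution_timeout

def validate_timeout(window, resolution_timeout):
    """Enforce the rule that the timeout is at least as long as the window."""
    if resolution_timeout < window:
        raise ValueError("resolution timeout must be at least as long as the window")

timeout = timedelta(hours=2)
first_failure = datetime(2024, 1, 1, 15, 0)   # flagged at 3:00 PM
later_failure = datetime(2024, 1, 1, 16, 30)  # new failure at 4:30 PM

validate_timeout(timedelta(minutes=30), timeout)  # passes: 2 hours >= 30 minutes

print(resolves_at([first_failure], timeout))                 # 2024-01-01 17:00:00
print(resolves_at([first_failure, later_failure], timeout))  # 2024-01-01 18:30:00
```

Because only the most recent failure matters, every new failure pushes the resolution time out by the full timeout.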

Branch Scope

Which branches the monitor evaluates. You can specify branch names or glob patterns. Only test failures on matching branches count toward the failure count. Branch patterns work the same way as failure rate monitor branch patterns, including glob syntax and merge queue patterns. Refer to that section for pattern syntax, examples, and tips.
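As a rough illustration of branch scoping, the sketch below matches branch names against glob patterns with Python's standard `fnmatch` module. This is only a stand-in: the monitor's actual glob semantics are defined in the failure rate monitor documentation and may differ (for example, in how `*` treats `/`). The patterns and the `branch_in_scope` helper are hypothetical.

```python
from fnmatch import fnmatchcase

def branch_in_scope(branch, patterns):
    """Return True if the branch matches any configured name or glob pattern.

    Uses case-sensitive shell-style matching as an approximation.
    """
    return any(fnmatchcase(branch, pattern) for pattern in patterns)

# Hypothetical scope: an exact branch name plus one glob pattern.
patterns = ["main", "release/*"]

print(branch_in_scope("main", patterns))           # True
print(branch_in_scope("release/1.2", patterns))    # True
print(branch_in_scope("feature/login", patterns))  # False
```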

Action

What happens when the monitor activates on a test. You pick the action at creation and can switch it at any time.

Classify test status (default). The test’s status is set according to the monitor’s detection type, and restored to healthy when the monitor resolves. The detection type is either:

Flaky — appropriate when failures on the monitored branch are likely non-deterministic. A test that fails once on main but passes on retry is probably flaky.
Broken — appropriate when failures indicate a real regression. If a test fails on main and you expect it to keep failing until someone fixes it, broken is the right classification.

Apply labels. The configured labels are added to the test while the monitor is active. The test’s health status is not changed by this monitor. See Automatic labeling from monitors for how to configure and what to expect.

Preview Panel

When you create or edit a failure count monitor, a Preview panel appears on the right side of the dialog on larger screens. The preview updates as you adjust the monitor’s settings, giving you a live look at what the monitor would detect against your current branch data.

Once the monitor configuration produces detections, the panel shows a Failing tests list. Each row displays the test name as a link to its detail page, along with its failure count. Counts that meet or exceed your configured failure count are highlighted in red; counts below appear in muted text.

You can search the list by test name or parent test name. The search is case-insensitive and filters as you type. If no tests match your search term, the list shows a “No tests match” message. When more than 100 tests are detected, only the first 100 are shown with a notice to narrow your search.
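The search behavior can be approximated as follows. This is a sketch under assumptions, not the panel's actual code: it assumes substring matching, and the row fields (`name`, `parent`, `failures`) and the `search_preview` helper are hypothetical.

```python
MAX_ROWS = 100  # the panel shows at most the first 100 tests

def search_preview(rows, term):
    """Filter rows by test name or parent test name, ignoring case.

    Returns the visible rows plus a flag saying whether the list was cut off.
    Assumption: the search is a plain substring match.
    """
    needle = term.casefold()
    matches = [
        row for row in rows
        if needle in row["name"].casefold() or needle in row["parent"].casefold()
    ]
    return matches[:MAX_ROWS], len(matches) > MAX_ROWS

rows = [
    {"name": "test_checkout", "parent": "CartSuite", "failures": 3},
    {"name": "test_signup", "parent": "AuthSuite", "failures": 1},
]

visible, truncated = search_preview(rows, "CART")
print([row["name"] for row in visible], truncated)  # ['test_checkout'] False
```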

Status Filter

A status filter dropdown in the preview panel lets you filter the test list to any combination of statuses: Healthy, Flaky, and Broken. By default, all statuses are shown.

Filtering to Healthy is the most useful view: it shows tests that are currently healthy but would be flagged by this monitor if created with the current settings. This lets you see the new coverage the monitor adds without noise from tests already detected by other monitors. Selecting multiple statuses (for example, Healthy and Flaky) shows tests matching any of the selected statuses.

When a status filter is active, the info tooltip in the panel header shows “X of Y tests” to indicate how many tests are visible relative to the total that match the monitor configuration. If no tests match the active filter, the empty state includes a hint to clear the filter.
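The "any of the selected statuses" rule and the “X of Y tests” summary can be sketched like this. The data shape and the `filter_by_status` helper are hypothetical and only illustrate the described behavior.

```python
def filter_by_status(tests, selected):
    """Keep tests whose status is in the selected set and build the summary text."""
    visible = [t for t in tests if t["status"] in selected]
    return visible, f"{len(visible)} of {len(tests)} tests"

tests = [
    {"name": "test_checkout", "status": "Healthy"},
    {"name": "test_signup", "status": "Flaky"},
    {"name": "test_search", "status": "Broken"},
]

visible, summary = filter_by_status(tests, {"Healthy", "Flaky"})
print([t["name"] for t in visible])  # ['test_checkout', 'test_signup']
print(summary)                       # 2 of 3 tests
```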

Large Repo Truncation

For repositories with a large number of matching tests, preview results may be truncated. When this happens, an amber warning appears in the panel. The truncation applies to the list of tests shown, not to the underlying detection logic — the monitor evaluates all matching tests when active.

Muting

You can temporarily mute a failure count monitor for a specific test case. See Muting monitors for details.

Choosing Between Monitors

Scenario	Recommended monitor
Any failure on `main` should be flagged immediately	Failure count with count = 1
Tests failing at an elevated rate over many runs	Failure rate with appropriate activation percentage
A test fails then passes on retry in the same commit	Pass-on-retry (enabled by default)
Consistently failing tests (80%+ failure rate)	Failure rate with broken detection type
Quick alerting on merge queue failures	Failure count scoped to merge queue branches
