
The Stability Test Your Forecasts Are Failing

22 Mar 2026 · 8 min

Six weeks of confident P85 forecasts. Wrong five out of six times. Not by a small margin — items that the P85 said would complete by week eight were finishing in week twelve. Items forecast for week twelve finished in week six.

The percentile calculations were correct. The methodology was sound. The data they were calculated from was not stable. Every confidence interval was a precise description of nothing.

What stability means here

A stable process is not a fast one, or a smooth one, or a well-managed one. It is a process whose natural variation falls within predictable limits — one that is in statistical control.

The distinction matters because stability is the precondition for forecasting, not a measure of performance. A slow stable process produces reliable forecasts. A fast unstable process produces forecasts that look precise and are systematically wrong.

When cycle time data is stable, the historical percentiles describe a real distribution that future items will likely be drawn from. The P85 date has genuine meaning: 85 items out of 100 similar ones have completed by this point. When data is unstable, the historical percentiles describe a distribution that is in the process of changing. The P85 date is calculated correctly from data that does not describe the current system.
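Concretely, those percentiles are just empirical quantiles of completed items. A minimal sketch in Python, with made-up cycle times purely for illustration:

import numpy as np

# Cycle times in days for recently completed items (illustrative numbers only)
cycle_times = np.array([4, 6, 7, 7, 9, 10, 11, 12, 14, 15, 15, 18, 21, 25, 30])
p50 = np.percentile(cycle_times, 50)  # half of similar items finished within this long
p85 = np.percentile(cycle_times, 85)  # 85 out of 100 similar items finished within this long
print(f"P50 = {p50:.0f} days, P85 = {p85:.0f} days")

Those numbers only carry the meaning described above if future items are drawn from the same distribution as the historical ones — which is exactly what the stability test checks.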

The four-question stability test

These four questions — drawn from process behaviour chart rules — are a sufficient pre-forecast gate. If any answer is yes, the data may not be stable enough for reliable forecasting.

1. Any point outside the control limits in the last 20 data points? A point outside limits is the strongest signal that something has changed in the system. One such point warrants investigation before using the data for commitment-level forecasts.

2. Eight or more consecutive points on the same side of the mean? This indicates a shift in the process level — the system is running consistently faster or slower than the historical mean. The distribution has moved.

3. A trend of six or more consecutive increases or decreases? A consistent directional trend means the system is not in a steady state. Cycle times that are consistently rising or falling week over week will produce forecasts that are calibrated to the middle of the trend, not its current position.

4. Any population break events in the data window? Team composition changes, tooling changes, process redesigns — any of these can make the pre-event data invalid as a predictor of post-event behaviour, even if no single data point triggers the other three rules.
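The four questions translate directly into code. Here is a minimal sketch in Python of how the gate might be automated, assuming XmR-style limits (mean plus or minus 2.66 times the average moving range); the function names are illustrative, and rule 4 is passed in as a flag because a population break is known from the team's history, not read off the chart:

import numpy as np

def xmr_limits(values):
    # Individuals (XmR) chart limits: mean +/- 2.66 * average moving range
    values = np.asarray(values, dtype=float)
    moving_range = np.abs(np.diff(values))
    centre = values.mean()
    spread = 2.66 * moving_range.mean()
    lcl = max(0.0, centre - spread)  # cycle times cannot go below zero
    return centre, lcl, centre + spread

def longest_run_of_equal(seq):
    # Length of the longest run of identical non-zero values in seq
    best = current = 0
    previous = None
    for x in seq:
        current = current + 1 if (x != 0 and x == previous) else (1 if x != 0 else 0)
        best = max(best, current)
        previous = x
    return best

def stability_test(values, population_break=False):
    # Returns one flag per rule; any True means the data fails the gate
    values = np.asarray(values, dtype=float)
    centre, lcl, ucl = xmr_limits(values)
    recent = values[-20:]  # Rule 1 window: the last 20 data points
    outside_limits = bool(np.any((recent < lcl) | (recent > ucl)))
    same_side = longest_run_of_equal(np.sign(values - centre)) >= 8   # Rule 2
    trending = longest_run_of_equal(np.sign(np.diff(values))) >= 6    # Rule 3
    return {
        "point_outside_limits": outside_limits,
        "shift_in_level": same_side,
        "sustained_trend": trending,
        "population_break": population_break,  # Rule 4: supplied, not detected
    }

If any entry in the result comes back True, the data fails the gate.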

[Control chart: cycle time in days (8d–28d) across weeks 0–19, with mean, UCL, and LCL lines]
Two stability violations in the same chart: an upward trend (Rule 3, amber) and a point outside the upper control limit (Rule 1, red). Either alone fails the stability test.

What forecasting from unstable data produces

The failure modes are specific and consistent.

Wide bands that look precise. A wide confidence interval communicates uncertainty honestly — but only if it is wide because of genuine uncertainty in a stable system. An interval that is wide because the data is unstable looks identical. The two situations are indistinguishable from the output.

Systematic bias from trending data. If cycle times have been rising for six weeks, the historical P50 is lower than the current operating level. Forecasts generated from the distribution will consistently underestimate completion time — not randomly wrong, but wrong in a specific direction.

Overconfident P50s. When the process has shifted recently, the handful of post-shift data points is not yet enough to move the P50 of the percentile distribution. The median looks stable when the system has already moved.
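The trending-data bias is easy to see numerically. A small sketch, again with made-up numbers, comparing the historical P50 of a drifting series against the level the system is actually operating at:

import numpy as np

# Cycle times that have been drifting upward week over week (illustrative)
history = np.array([6, 7, 8, 8, 9, 10, 11, 12, 13, 14, 15, 16])
historical_p50 = np.percentile(history, 50)  # median of the full window: 10.5 days
recent_level = history[-4:].mean()           # where the system is running now: 14.5 days
print(f"historical P50 = {historical_p50:.1f} days, recent level = {recent_level:.1f} days")
# Forecasts built from the full window sit well below the current level,
# so they underestimate completion time in one consistent direction.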

[Forecast comparison: stable data gives a 4-week range with credible bands; unstable data gives a 10-week, meaninglessly wide range]
Same forecast format, same confidence levels. Stable data produces tight, credible bands. Unstable data produces a spread so wide the forecast conveys almost no useful information.

The correct sequence

Run the stability test before generating the forecast. This is not a bureaucratic step — it is the check that determines whether the forecast output is meaningful.

If the data passes the test: generate the forecast normally. Use it for commitments.

If the data fails the test: two options. Stabilise the process first — address the assignable cause, wait for the system to return to control, then forecast. Or, if stabilisation is not possible in the required timeframe, generate the forecast with explicit disclosure: the data shows instability, the confidence intervals are wider than they appear, commitments should be treated as indicative rather than reliable.
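As a sketch of how that sequence could be wired into a forecasting script, reusing the stability_test function from earlier; forecast_fn stands in for whatever percentile or Monte Carlo forecast the team already runs, and is an assumption rather than a prescribed API:

def gated_forecast(cycle_times, population_break, forecast_fn):
    # Gate first: the output is only commitment-grade if the data is stable
    checks = stability_test(cycle_times, population_break=population_break)
    violations = [name for name, failed in checks.items() if failed]
    if not violations:
        return {"forecast": forecast_fn(cycle_times), "status": "stable", "use": "commitments"}
    # Unstable: prefer stabilising and retesting; if that is not possible in time,
    # return an indicative forecast with the instability disclosed alongside it
    return {
        "forecast": forecast_fn(cycle_times),
        "status": "unstable (" + ", ".join(violations) + ")",
        "use": "indicative only, disclose instability",
    }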

[Decision flow: run the 4-question stability test → stable? Yes: generate forecast, commitments valid. No: stabilisation possible? Yes: stabilise and retest. No: disclose instability and produce an indicative forecast]
The correct sequence: stability test before forecast generation. Unstable data leads to either stabilisation first, or explicit disclosure before generating an indicative forecast.

Warning

A precise confidence interval built from unstable data is not a conservative estimate. It is a precise estimate of the wrong thing — and it will be treated as reliable by everyone who receives it.

The discomfort of telling leadership that the data is too unstable to forecast from is real. It requires explaining a limitation that was invisible before. The alternative — producing confident forecasts from invalid data — builds credibility in the short term and destroys it in the medium term, when the forecasts are consistently wrong and no one can explain why.

The stability test is not a bureaucratic gate. It is the question of whether your data is describing a system or a random walk.