Flow & Measurement

When Your Historical Data Stops Describing Your Current System

21 Dec 2025 · 7 min

For six months after the AI tooling rollout, the team's P85 forecasts were consistently wrong. Not by a little. Items that the data said would complete in 14 days were taking 7. Items forecast to take 7 days were taking 12.

The methodology was correct. The percentile calculations were right. The data they were calculated from described a system that no longer existed.

What makes historical data valid for forecasting

A percentile forecast is a statement about a population: given items drawn from this system, X% of them will complete within Y days. The statement is only valid when the items being forecast are drawn from the same system as the items that generated the data.

This property is called population homogeneity. It is the unstated assumption underneath every cycle time forecast — and it fails silently.

When teams forecast delivery using six months of historical data, they are asserting that the system operating today produces items that behave like the items from six months ago. If that is true, the forecast is valid. If the system has changed significantly, the forecast is measuring the past and calling it the future.
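To make the population assumption concrete, here is a minimal sketch of the percentile calculation itself, in Python with invented cycle time figures. The forecast is nothing more than an empirical percentile of completed items' cycle times, which is exactly why it inherits whatever system produced those items.

```python
import numpy as np

# Cycle times (days) of the last twenty completed items -- hypothetical figures.
cycle_times = np.array([3, 5, 6, 6, 7, 8, 8, 9, 10, 11,
                        11, 12, 13, 14, 15, 16, 18, 21, 24, 30])

# The forecast statement: "X% of items drawn from this system complete within Y days."
p50, p85, p95 = np.percentile(cycle_times, [50, 85, 95])
print(f"P50: {p50:.0f}d  P85: {p85:.0f}d  P95: {p95:.0f}d")

# Valid only while new items are drawn from the same system
# that produced these twenty observations.
```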

Three events that break homogeneity

Not every change breaks the measurement population. Small team shifts, minor process tweaks, and individual tool changes typically produce variation within the existing distribution. Three categories of change reliably break it.

Significant team composition change. When a large proportion of the team changes — through hiring, departure, or reorganisation — the system's capability profile changes with it. Cycle time distributions reflect team skills, review depth, testing approach. A different team is a different system.

Tooling shifts that change how work is done. AI coding tools are the clearest current example, but any tool that meaningfully changes the time distribution of a major stage creates a new population. The old data describes the pre-tool system. Items generated in the new system will behave differently.

Process redesign that changes flow structure. Adding a stage, removing a gate, changing the definition of done, or restructuring how work moves through the system all change what the data is measuring. A cycle time calculated under the old workflow is not comparable to one calculated under the new one.

[Figure: run chart of cycle time (4d to 24d) over weeks 0 to 19, showing P50, P85, and P95 forecast bands, actual cycle time, the gap between them, and the system change marked.]
After the system change, actual cycle time dropped while old-calibrated forecast bands stayed high. The gap widened every week — accurate methodology, invalid data.

How to detect the break

Population breaks are detectable before they become damaging — if you know what to look for.

Process behaviour chart signals. A point outside control limits is the clearest indicator that something has changed. A run of eight consecutive points on the same side of the mean suggests a shift in the process level. These signals appear in cycle time data within weeks of a significant change if you're watching.

Percentile tiers that start separating. P50 and P85 moving in different directions is a sign of distributional shape change, not just level change. The spread is widening or narrowing in ways the old model does not predict.

Forecasts that miss consistently in one direction. If your P85 forecast is consistently too high (items finishing faster than predicted), you are forecasting from an old, slower system. If it is consistently too low, the system has slowed. Either way, the data has lost its validity as a predictor.
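A rough sketch of the first two checks, assuming a series of cycle time observations plotted on an individuals (XmR) chart. The 2.66 factor is the standard XmR scaling constant for natural process limits; the function and its return fields are illustrative, not a library API.

```python
import numpy as np

def pbc_signals(cycle_times):
    """Flag the two process behaviour chart signals described above on an
    individuals (XmR) chart: a point outside the natural process limits,
    and a run of eight consecutive points on the same side of the mean."""
    x = np.asarray(cycle_times, dtype=float)
    mean = x.mean()
    avg_moving_range = np.abs(np.diff(x)).mean()
    upper = mean + 2.66 * avg_moving_range  # standard XmR limit factor
    lower = mean - 2.66 * avg_moving_range

    points_outside = np.where((x > upper) | (x < lower))[0].tolist()

    # Longest run of consecutive points strictly above or strictly below the mean.
    longest_run, current_run, current_side = 0, 0, 0
    for v in x:
        side = 1 if v > mean else -1 if v < mean else 0
        current_run = current_run + 1 if side == current_side and side != 0 else (1 if side != 0 else 0)
        current_side = side
        longest_run = max(longest_run, current_run)

    return {
        "limits": (round(lower, 1), round(upper, 1)),
        "points_outside_limits": points_outside,
        "run_of_eight": longest_run >= 8,
    }
```

On a series where the most recent eight observations all sit below the historical mean, the run rule fires even if no single point has crossed a limit, which is how a gradual level shift tends to show up first.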

[Figure: cycle time distributions (days) for the old system (pre-change) and the new system (post-change), with the change event marked and a note that applying the old forecast to the new system is wrong.]
The old and new system have different distributions. Forecasting the new system using old data applies the left distribution to items drawn from the right — systematic error.

The reset protocol

Once a population break is identified, the old data is no longer usable as a forecast input. The correct response is to declare a new baseline and accept a period of reduced forecast confidence while new data accumulates.

Declare the break explicitly. Note the date, the cause, and the implication: forecasts from before this date are calibrated to a different system. Do not mix pre- and post-break data without modelling the discontinuity explicitly.

Accumulate new data before committing to forecasts. A reliable percentile distribution requires a minimum of twenty to twenty-five data points from the new system. With fewer points, the percentile estimates are noisy and should be treated as rough guides, not commitments.
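A minimal sketch of that split, under the assumptions above: only items completed after the declared break date feed the new percentiles, and the result is flagged as provisional until roughly twenty post-break data points exist. The break date, threshold, and field layout are hypothetical.

```python
from datetime import date
import numpy as np

BREAK_DATE = date(2025, 6, 1)   # hypothetical declared break date
MIN_SAMPLE = 20                 # rough point at which percentile estimates settle

def new_baseline_forecast(completed_items):
    """completed_items: iterable of (completion_date, cycle_time_days) pairs.
    Only post-break items feed the forecast; pre-break data describes a
    system that no longer exists."""
    post_break = [days for done, days in completed_items if done >= BREAK_DATE]
    if not post_break:
        return {"status": "no post-break data yet", "forecast_days": None}
    p50, p85 = np.percentile(post_break, [50, 85])
    return {
        "status": "reliable" if len(post_break) >= MIN_SAMPLE else "provisional",
        "sample_size": len(post_break),
        "forecast_days": {"P50": float(p50), "P85": float(p85)},
    }
```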

What to tell stakeholders during the gap. Teams often avoid declaring the break because they fear losing the credibility that comes from having a forecast. The alternative — producing confident forecasts from invalid data — is worse. An honest framing: "We changed the system significantly. Our historical forecasts were calibrated to the old system. We are accumulating new data and will have reliable forecasts again in approximately [N] weeks."

This is uncomfortable. It is also the only response that doesn't compound the problem.

[Figure: timeline showing the old baseline (reliable), the change event, the sparse data gap (low confidence), and the new baseline becoming reliable again after roughly 20 data points.]
The transition period after a system change: old baseline is invalid, new data accumulates slowly. Forecasts during the gap carry explicit uncertainty until ~20 new data points exist.

Warning

A forecast is only as valid as its population assumption. Old data from a changed system doesn't give you a conservative estimate. It gives you a wrong one with the appearance of rigour.

The teams that maintain forecast credibility over time are not the ones with the most data. They are the ones who know when their data has stopped describing their system and act accordingly. The measurement gap is not a failure. Using invalid data to avoid it is.

The data wasn't lying. It was describing a system that no longer existed.