LOBUS.WORKS
Flow & Measurement

Percentiles, Not Averages: Why Your Forecasts Are Wrong

1 Oct 2025 · 7 min

Your average cycle time is 14 days.

That sounds useful until you ask a basic question: which piece of work took 14 days?

Usually, none of them.

One item took 3. Another took 8. Another took 19. Another sat in a queue for a month. The average compresses all of that into a number that feels precise while describing nobody's actual experience.

That is why averages are such a bad foundation for delivery forecasting.

The average tells a story the system did not live

Delivery data is not usually neat and symmetrical. It has long tails, awkward outliers, and enough variation that one tidy average can hide the thing leaders most need to know: how predictable the system really is.

If the median item completes in 8 days but the 85th percentile takes 22 and the 95th takes 47, your delivery problem is not "14 days." Your problem is that the system is highly variable and your commitments will be wrong if you pretend otherwise.

The average smooths that reality away.
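To make the gap concrete, here is a quick sketch using Python's standard library. The cycle-time sample is invented, chosen so its summary numbers echo the ones above; it is illustrative, not data from a real team:

```python
import statistics

# Illustrative cycle times in days for 20 finished items (invented data,
# chosen so the summary numbers match the figures in the text).
cycle_times = [2, 3, 3, 4, 5, 6, 7, 7, 8, 8,
               8, 9, 11, 13, 16, 19, 22, 22, 47, 47]

mean = statistics.mean(cycle_times)  # ~13 days: the comforting "average"
# quantiles(n=100) returns the 1st..99th percentile cut points.
cuts = statistics.quantiles(cycle_times, n=100, method="inclusive")
p50, p85, p95 = cuts[49], cuts[84], cuts[94]  # 8, 22 and 47 for this sample

print(f"mean={mean:.1f}  p50={p50}  p85={p85}  p95={p95}")
```

The mean lands on a value almost no individual item experienced, while the three percentiles describe what actually happens: typical, commitment-worthy, and tail.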

Warning

An average cycle time is comforting because it hides uncertainty. That is exactly why it is dangerous.

What percentiles give you instead

Percentiles shift the conversation from false certainty to probability.

The 50th percentile tells you what typical performance looks like. Half the items finished faster, half slower.

The 85th percentile tells you what happens often enough to use for commitments. If you say you have 85% confidence an item like this will finish within 22 days, that is a statement a stakeholder can work with.

The 95th percentile is the outer edge. It is useful for planning under caution, setting expectations in risk-heavy contexts, or seeing how ugly the tail has become.

Now the conversation starts sounding like the real system.
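As a tiny illustration of the commitment case, assuming the P85 of 22 days from the example above (the start date is arbitrary):

```python
from datetime import date, timedelta

# Hypothetical: commit at the 85th percentile instead of the average.
p85_days = 22                # the example P85 from the text
start = date(2025, 10, 1)    # arbitrary start date for illustration
commit_by = start + timedelta(days=p85_days)

print(f"~85% confidence of finishing by {commit_by.isoformat()}")
```

The sentence the code prints is the kind of statement a stakeholder can actually plan around: a date plus an explicit confidence, not a vague "about two weeks."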

How this changes forecasting

With averages, teams say things like, "This should take about two weeks."

That phrase is a trap. Nobody knows whether it means best case, typical case, or commitment case. It sounds reasonable because it is vague.

With percentiles, the same conversation gets sharper. "Typical is about 8 days. If you want a stronger commitment number, use 22. If this lands in the ugly tail, it can run much longer."

That is not more complicated. It is more honest.

And honesty is useful. Product can make better trade-offs. Stakeholders can choose whether they want speed, confidence, or optionality. Leaders can stop mistaking optimism for planning.

Why teams resist the shift

Because averages feel simpler.

They fit nicely on slides. They make summaries look clean. They let people talk as if the system is more stable than it is.

Percentiles do the opposite. They expose spread. They force the organisation to look at variation instead of hiding inside one blended number. For teams and leaders who are used to confident storytelling, that can feel uncomfortable at first.

It should.

The discomfort is the point. It is what happens when measurement starts describing reality instead of protecting a narrative.

The practical move

If you want better forecasts, stop putting averages at the centre of the conversation.

Show P50, P85, and P95. Teach people what those numbers mean in plain language. Use P85 when the question is "When can we commit?" Use P50 when the question is "What is typical?" Use P95 when the question is "How bad can the tail get?"

Then look at the spread between them. A narrow spread means the system is predictable. A wide spread means you do not have one cycle time. You have a distribution, and the distribution is what you must govern.
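A minimal sketch of that governance check, again with standard-library percentiles. The P95/P50 ratio and the 3× threshold below are illustrative heuristics for "narrow" versus "wide", not standard metrics:

```python
import statistics

def flow_report(cycle_times):
    """Summarise a cycle-time sample as P50/P85/P95 plus a spread ratio."""
    cuts = statistics.quantiles(cycle_times, n=100, method="inclusive")
    p50, p85, p95 = cuts[49], cuts[84], cuts[94]
    return {"p50": p50, "p85": p85, "p95": p95, "spread": p95 / p50}

# Two invented samples: one tight, one with a long tail.
stable   = [7, 8, 8, 9, 9, 10, 10, 11]
volatile = [2, 3, 5, 8, 8, 9, 13, 47]

for name, data in [("stable", stable), ("volatile", volatile)]:
    r = flow_report(data)
    flag = "predictable" if r["spread"] < 3 else "govern the tail"
    print(name, r, flag)
```

The stable sample's P95 sits close to its P50, so one number would not mislead anyone. The volatile sample has the same rough "typical" value but a tail several times wider, and that spread is what needs managing.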

Forecasting gets better the moment you stop pretending the system has one true number.

Averages are not evil. They are just too blunt for the job.

If leaders want a delivery conversation they can actually use, the move is simple: stop asking for the average and start asking for the probabilities.