Five Stages Your Delivery Process Passes Through Before It Breaks

24 Feb 2026 · 8 min read

The team's metrics looked acceptable for eight months. Cycle time was within range. Throughput was holding. Then three consecutive sprints missed badly, a key release slipped, and leadership called an emergency review.

The instability had been building for six of those eight months. It was visible in the data. Nobody looked at the right signals.

Delivery systems do not break suddenly. They move through stages. Each stage has a recognisable data signature and a right action. The teams that maintain consistent delivery are not the ones who react fastest at the crisis point. They are the ones who recognise stage two and act there.

The five stages

Figure: The five delivery system stages and their cycle time distribution signatures. Each stage has a distinct shape and a distinct right action.

Stable. Cycle time distribution is narrow and consistent. Throughput is predictable week over week. The constraint is identifiable — there is one place where items spend the most time, and it is understood. Forecasts based on historical data are reliable. The right action at stable is to maintain what you have, make small deliberate improvements, and protect WIP discipline aggressively. Stability is not a resting state — it requires active management. Teams that stop watching it at stable tend to slide toward migrating without noticing.

Migrating. The constraint has shifted. A recent improvement made one stage faster, and now a downstream stage is absorbing the load. Cycle time is temporarily wider than usual — items that hit the new constraint age longer, while items that avoid it flow quickly. The distribution is bimodal or has a heavier tail than before. The right action at migrating is to recognise this as progress. The previous constraint is genuinely resolved. Apply the intervention hierarchy to the new constraint location. Do not mistake migration for regression.
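One way to check for the heavier tail described above is to compare the ratio of a high percentile to the median before and after the improvement. The sketch below is only an illustration: the function name `tail_ratio`, the choice of the 85th percentile, and the sample cycle times are all assumptions, not part of any standard method.

```python
from statistics import median, quantiles

def tail_ratio(cycle_times):
    """Ratio of the 85th percentile to the median; larger means a heavier tail."""
    # quantiles(..., n=100) returns 99 cut points; index 84 is the 85th percentile.
    p85 = quantiles(cycle_times, n=100, method="inclusive")[84]
    return p85 / median(cycle_times)

# Hypothetical cycle times in days, before and after an upstream improvement.
before = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8]
after = [2, 2, 3, 3, 3, 4, 4, 11, 13, 15]

print(f"before: {tail_ratio(before):.2f}  after: {tail_ratio(after):.2f}")
```

In this made-up sample the median falls while the ratio rises. That matches the migrating signature: most items flow faster, and the ones that hit the new constraint age longer.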

Volatile. Multiple constraints are active simultaneously. The cycle time distribution is wide and irregular — items complete at very different speeds with no consistent pattern. Forecasts are unreliable. The data shows high WIP, multiple stages with elevated age, and throughput that varies significantly week to week. The right action at volatile is aggressive WIP reduction — stop new starts, finish what's in flight, allow the queues to drain. Adding new work to a volatile system makes all the queues worse. This is the stage where the instinct to add capacity is strongest and most counterproductive.
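The arithmetic behind WIP reduction can be sketched with Little's Law, which relates long-run averages: average cycle time equals average WIP divided by average throughput. A volatile system is not in steady state, so treat this as a rough indication of direction rather than a forecast. The throughput and WIP figures below are hypothetical.

```python
def average_cycle_time(wip, throughput_per_week):
    """Little's Law: average cycle time = average WIP / average throughput."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_week

throughput = 5  # hypothetical: items finished per week
for wip in (30, 20, 10):
    weeks = average_cycle_time(wip, throughput)
    print(f"WIP {wip}: about {weeks:.1f} weeks per item")
```

With throughput unchanged, every item added to WIP lengthens the average wait for everything already in flight. Stopping new starts shortens it.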

Metastable. Surface metrics look acceptable. Average cycle time is within historical range. Throughput is holding. RAG status is amber, not red. But the system is fragile — it is operating near its stability boundary. One disruption (a person leaving, a dependency delaying, a scope change) will tip it into volatile. The data signals are subtle: a slight upward trend in cycle time, a few items aging longer than usual, WIP creeping up quietly. The right action at metastable is preemptive WIP reduction and investigation of the hidden fragility. This is the hardest action to take because it requires acting on weak signals before the crisis materialises.

Figure: A metastable system. The mean looks acceptable, but a consistent upward trend and points approaching the control limit signal fragility before the crisis materialises.

Unstable. Items are aging indefinitely. Some have been in flight for weeks or months. Throughput has collapsed. The team is in constant firefighting mode — every day is about the most urgent thing rather than the most valuable thing. The right action at unstable is crisis intervention: freeze intake completely, drain existing WIP systematically, and rebuild the system from a stable state. This is disruptive. It requires explaining to stakeholders why new work is not starting. It is also the only path back.

The most dangerous stage

Metastable is more dangerous than unstable, because unstable is visible and forces action. Metastable produces false confidence.

Velocity holds. Burndown charts look approximately right. The team is busy. The numbers, the comfort metrics, appear to confirm that things are fine. Underneath, the system has less margin than it appears to. The small signals that a metastable system produces are present but easy to dismiss individually: this item took longer than expected; that week was slower than usual; these two engineers are stretched a bit thin.

Warning

Metastable is the most dangerous stage because it feels like stable. The data is technically acceptable. The system is one incident away from collapse.

The distinguishing feature of metastable is not any single data point but a pattern: small deteriorations that are individually explainable but collectively form a trend. A process behaviour chart running near its control limits without triggering them. WIP that is high but not obviously excessive. Forecasts that are slightly less reliable than they used to be.
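The "near the limits without triggering them" pattern can be checked with a short script. This is a minimal sketch that assumes weekly average cycle times and an XmR (individuals) process behaviour chart, whose limits are conventionally the mean plus or minus 2.66 times the average moving range. The function names, the `near_fraction` threshold, and the data are illustrative assumptions.

```python
from statistics import mean

def xmr_limits(values):
    """Natural process limits for an XmR (individuals) chart."""
    centre = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)
    return centre - spread, centre, centre + spread

def slope(values):
    """Least-squares slope of the values against their index (0, 1, 2, ...)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den

def fragility_signals(baseline, recent, near_fraction=0.67):
    """Count recent points beyond or close to the upper limit, plus the trend."""
    _, centre, upper = xmr_limits(baseline)
    # Assumed threshold: "near" means the outer third of the centre-to-limit band.
    near_line = centre + near_fraction * (upper - centre)
    return {
        "upper_limit": upper,
        "breaches": sum(v > upper for v in recent),
        "near_upper": sum(near_line < v <= upper for v in recent),
        "trend_per_week": slope(recent),
    }

# Hypothetical weekly average cycle times in days.
baseline = [6, 7, 5, 6, 8, 6, 7, 5, 6, 7, 6, 8]
recent = [7, 8, 8, 9, 9, 10, 10, 10]
signals = fragility_signals(baseline, recent)
print(signals)
```

No recent point in this sample breaches the upper limit, so a chart read point by point raises no alarm. The count of points near the limit and the positive slope are the collective pattern.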

The right action at each stage

Acting on the wrong problem is worse than doing nothing. Adding capacity to a volatile system increases WIP. Treating a metastable system as stable accelerates the slide. Declaring crisis at migrating undermines a genuine improvement.

The common instinct at each stage and the correct action. Acting on the wrong problem is worse than doing nothing — it consumes effort while the actual cause goes unaddressed.

The stage determines the correct response. Identifying the stage requires looking at more than the headline metric. Average cycle time and velocity tell you very little about which stage you are in. Stage-level age distribution, WIP trend, and throughput variability are the signals that locate you on the spectrum.
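The three signals named above can be computed from data most teams already have. The sketch below assumes a simple export of in-flight items as (stage, age in days) pairs and weekly counts of WIP and throughput. The data shapes, function names, and figures are hypothetical. It deliberately sets no thresholds for naming a stage, since those would have to come from a team's own history.

```python
from collections import defaultdict
from statistics import mean, median, pstdev

def age_by_stage(items):
    """Median and maximum age in days of in-flight items, per workflow stage."""
    ages = defaultdict(list)
    for stage, age_days in items:
        ages[stage].append(age_days)
    return {stage: (median(a), max(a)) for stage, a in ages.items()}

def wip_trend(weekly_wip):
    """Average week-over-week change in WIP; positive means WIP is creeping up."""
    changes = [b - a for a, b in zip(weekly_wip, weekly_wip[1:])]
    return mean(changes)

def throughput_variability(weekly_throughput):
    """Coefficient of variation: population standard deviation over the mean."""
    return pstdev(weekly_throughput) / mean(weekly_throughput)

# Hypothetical snapshot and history.
in_flight = [("dev", 3), ("dev", 5), ("review", 9), ("review", 14), ("test", 2)]
weekly_wip = [12, 12, 13, 14, 14, 16]
weekly_throughput = [6, 5, 7, 4, 6, 5]

print(age_by_stage(in_flight))
print(f"WIP trend: {wip_trend(weekly_wip):+.1f} items per week")
print(f"Throughput variability: {throughput_variability(weekly_throughput):.2f}")
```

In this sample the headline numbers would look fine, but items are ageing in review and WIP is rising by just under one item a week.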

Every unstable system was metastable first. Most teams only notice at unstable.

The delivery systems that remain stable over time are not lucky. They are the ones whose teams know what metastable looks like and have the discipline to act on it before the crisis makes acting unavoidable.