LOBUS.WORKS
GenAI / AI Governance

The Review Queue Problem

28 Dec 2025·7 min

Three AI-assisted developers are generating pull requests at four times their previous rate. They have three reviewers. The maths doesn't work.

This is the review queue problem, and it's structural. It doesn't arise from bad process or insufficient effort. It arises from a fundamental asymmetry between how generation and review scale — an asymmetry that AI adoption has made acute.

The asymmetry nobody planned for

Code generation, especially with AI assistance, is effectively parallelisable. Three developers working simultaneously produce three times the output of one. With AI tools, each of those developers may produce three or four times what they generated before. The combined output rate of a small team can increase dramatically in a short time.

Code review is not parallelisable in the same way. A reviewer must read the code, understand its intent, consider its implications for the surrounding system, and form a judgment. This is a fundamentally serial cognitive task. Adding reviewers increases total review capacity, but each reviewer operates independently — and each requires context that doesn't transfer easily.

More importantly, the reviews that matter most — the ones involving architecture, correctness, security, or non-obvious system interactions — can only be done by senior engineers who already hold the relevant context. You can add junior reviewers. You cannot quickly add senior ones. The expertise-gated portion of the review workload serialises through a small group regardless of headcount.

This is Amdahl's Law applied to software delivery. When the parallel portion of a pipeline speeds up, the serial portion becomes the binding constraint. The faster generation gets, the more review dominates total cycle time.
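The arithmetic is worth seeing directly. The sketch below applies Amdahl's Law with illustrative fractions (the 30% review share matches the figure later in this piece; it is an assumption, not a measurement):

```python
def amdahl_speedup(serial_fraction: float, parallel_speedup: float) -> float:
    """System-level speedup when only the parallel portion accelerates."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)

# Illustrative: review is 30% of cycle time and cannot be parallelised.
print(amdahl_speedup(0.30, 4.0))    # ~2.1x end to end, despite 4x faster coding
print(amdahl_speedup(0.30, 100.0))  # ceiling: can never exceed 1/0.30 ≈ 3.3x
```

However fast coding gets, delivery speed is capped at one over the serial fraction — which is why the review stage, not the coding stage, sets the ceiling.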

[Figure: pipeline diagram — Coding (3× faster with AI, high throughput) → Review (serial & expertise-gated, the bottleneck, work waiting) → Test/Deploy (constrained).]
Code generation is parallelisable. Review is serial and expertise-gated. When generation speeds up, review becomes the binding constraint.

How review queues saturate

The failure mode is self-reinforcing, and it moves through a predictable sequence.

PRs arrive faster than they can be reviewed. The queue depth increases. Each PR waits longer before anyone looks at it. While waiting, the author has moved on to the next piece of work — context-switching away from the original change. When the review finally comes back with feedback, returning to the work is expensive. The author's mental model of the change has faded. Responding to review comments takes longer than it would have if the review had arrived the same day.

Meanwhile, reviewers are facing a larger queue. More items mean more context-switching between unrelated changes. Review quality degrades under load — not from carelessness but from cognitive saturation. Items that would have received careful attention are approved quickly to clear the backlog. Or the opposite: reviewers become increasingly conservative, requesting more changes to compensate for uncertainty, extending the cycle further.

Warning

Review latency compounds. A PR that waits three days before review carries a higher integration cost than one reviewed in three hours — not because the code changed, but because the author's context did. The queue doesn't just slow delivery. It adds rework.

[Chart: review queue depth over weeks 1–8, pre-AI vs post-AI demand, against a fixed review-capacity line.]
Review queue depth before and after AI adoption. Pre-AI demand stays within capacity. Post-AI demand exceeds it from week two onward.
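That dynamic is easy to reproduce with a toy model. All rates below are illustrative, in PRs per week:

```python
def queue_depth(arrival_rate: float, review_capacity: float, weeks: int) -> list[float]:
    """Track review queue depth week by week: whatever isn't reviewed carries over."""
    depth, history = 0.0, []
    for _ in range(weeks):
        depth = max(0.0, depth + arrival_rate - review_capacity)
        history.append(depth)
    return history

# Pre-AI: demand within capacity — the queue stays clear.
print(queue_depth(arrival_rate=8, review_capacity=10, weeks=8))
# Post-AI: 3x generation against the same review capacity — the queue grows every week.
print(queue_depth(arrival_rate=24, review_capacity=10, weeks=8))
```

The instructive part is the asymmetry: below capacity the queue is indistinguishable from zero; above it, depth grows linearly and without bound, because nothing in the model — or in an unmanaged team — pushes it back down.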

A saturated review queue is not a stable equilibrium, and it does not resolve itself. The faster generation rate keeps feeding it. Without intervention, the queue grows until it causes a quality crisis, a delivery slowdown visible enough to force attention, or both.

Why adding reviewers doesn't solve it

The instinct is to hire. If the review bottleneck is a headcount problem, more headcount should fix it.

It's not primarily a headcount problem.

Junior reviewers can absorb some of the load — style issues, obvious errors, documentation gaps. But the expensive reviews, the ones that take forty-five minutes of careful reading and carry real risk if done poorly, belong to a small group of engineers with deep system knowledge. Doubling headcount doesn't double the capacity of that group. It may not increase it at all.

Even for the reviews that juniors can handle, the volume may require a level of hiring that's unrealistic in the short term. And by the time those hires are productive, the generation rate may have increased further.

Hiring at the constrained stage makes sense as part of a solution. It doesn't work as the whole solution.

[Chart: Amdahl's Law — system speedup (1×–5×) against parallelisable fraction (0–100%), showing the ceiling with fixed review at 30% and the 4× coding-only speedup for comparison.]
Amdahl's Law applied to delivery: if review represents 30% of the pipeline and cannot be parallelised, a 4× coding speedup yields only ~2× system-level improvement.

The three interventions, in order of leverage

Throttle generation to match review capacity. Set an explicit WIP limit on items in the review state. When that limit is reached, developers stop starting new work and direct their attention to clearing the review queue instead — reading, commenting, helping resolve review feedback. This is not a comfortable ask. It means productive developers holding less work in flight, and it can feel like wasted capacity.

The alternative is worse: a growing pile of items in review, each accumulating integration debt, making future merges harder, and adding hidden cost to work that looks nearly complete. WIP control at the review stage makes the constraint visible and manageable. Without it, the constraint is invisible and growing.

Invest in review infrastructure. Once WIP is controlled and the queue is stable, the capacity investment becomes legible. Now you can see what you need: how much senior review time, what tooling could reduce the cognitive load per review (AI-assisted review summaries, better diff tooling, smarter assignment routing), and what conventions would make individual PRs faster to review.

Change the structure of what gets reviewed. Smaller PRs reduce the serial load per review unit more reliably than adding reviewers does. A pull request covering one behaviour change with a clear description can be reviewed in fifteen minutes. A sprawling multi-feature PR takes an hour and yields less confident approval. Trunk-based development, feature flags, and explicit small-PR conventions all address the same root problem: the unit of review is too large and too infrequent, which is what makes each one expensive.
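One way to see why splitting pays off: review effort tends to grow superlinearly with diff size, because a larger change carries more interactions to hold in mind at once. The exponent and coefficient below are assumptions chosen for illustration, not empirical constants:

```python
def review_minutes(lines_changed: int, exponent: float = 1.5) -> float:
    """Illustrative model: review effort grows superlinearly with diff size.
    Coefficient tuned so a 200-line PR takes roughly 15 minutes."""
    return 0.005 * lines_changed ** exponent

one_big = review_minutes(800)        # one sprawling PR
four_small = 4 * review_minutes(200) # the same change in four pieces
print(round(one_big), round(four_small))  # 113 vs 57 minutes under this model
```

Under any superlinear cost model the conclusion is the same: the total diff is identical, but four small reviews cost roughly half of one large one — and each yields a more confident approval.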

These three interventions compound. WIP control reduces queue depth and review latency. Review infrastructure reduces time-per-review and increases senior capacity. Smaller PRs reduce the cognitive load on both. Applied together, they change the economics of the constrained stage rather than just adding resources to it.

Where to invest when AI changes the cost equation

Before AI adoption, the highest-leverage investment in a development team was usually in the coding phase itself — better tooling, better practices, reducing the time it took to write and refactor code. That was the dominant cost.

That's no longer true for most teams.

The coding phase is no longer where the time goes. Review is where the time goes — in queue latency, in context-switching cost, in rework from rushed approvals. The investment case for improving review is now what the investment case for improving coding was five years ago: it's where the constraint lives, and addressing it is how delivery speed actually improves.

The team didn't get slower. Review was always this slow. The difference is that now it's the constraint.

Teams that recognise this and invest in review capacity, review hygiene, and WIP discipline at the review stage will be the ones where AI adoption actually improves delivery speed. The teams that don't will keep adding coding throughput into a saturated queue and wonder why all this new tooling didn't help.

The review queue is the last bottleneck between AI-assisted coding and faster delivery. It's also the most tractable one, because it responds to the same tools that have always worked on constrained stages: reduce WIP, invest in capacity, reduce the unit size. The answer is the same. The stage is different.