Specification-Driven Delivery: Governing What AI Builds
AI changed the economics of software delivery almost overnight.
Code used to be the expensive part. Now candidate code is cheap.
That sounds like an unqualified win until you look at where the pressure moved. Requirements still have to be interpreted. Boundaries still have to be respected. Tests still have to prove the right thing. Review still has to determine whether the change is safe, not just plausible. Accountability still has to sit somewhere real.
That is why the next question in software delivery is not "How do we prompt better?"
It is "How do we govern what gets built when generation is no longer the bottleneck?"
The bottleneck moved upstream
Most teams adopted AI tools as if faster production would simply compress the existing workflow.
Instead, the extra speed exposed the weak parts of the system.
If product intent is vague, AI makes the vagueness operational faster. If engineering standards exist mostly in tribal memory, AI runs into them too late. If review is the only serious control point, review becomes overloaded because it is now expected to absorb a much larger volume of output.
The result is predictable: more code, more movement, and a growing gap between how quickly the organisation can produce changes and how confidently it can accept them.
Insight
When code gets cheap, the scarce resource becomes trustworthy change.
Why specification matters again
Not because we need heavier paperwork.
And not because the answer is to turn software delivery into a requirements factory.
Specification matters because organisations need more durable ways to carry intent through the workflow than chat history and reviewer memory can provide. Teams need a clearer statement of what the change is for, what it must not violate, what counts as acceptable evidence, and who is responsible for judging whether the result is good enough.
That does not require one universal template. It does require treating intent as something the delivery system can work from repeatedly instead of something reconstructed after the fact from prompts, diffs, and good luck.
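To make the idea concrete, here is a minimal sketch, not a proposed standard, of intent carried as data rather than as chat history. The class name, field names, and check are all illustrative assumptions; the point is only that purpose, constraints, evidence, and accountability become something the delivery system can inspect repeatedly.

```python
from dataclasses import dataclass, field

# Hypothetical record of intent travelling with a change.
# Every name here is an illustration, not a template to adopt.
@dataclass
class ChangeSpec:
    purpose: str                                            # what the change is for
    invariants: list[str] = field(default_factory=list)     # what it must not violate
    acceptance_evidence: list[str] = field(default_factory=list)  # what counts as proof
    accountable_owner: str = ""                             # who judges "good enough"

    def is_workable(self) -> bool:
        """True only when every part of the intent is actually stated."""
        return bool(self.purpose and self.invariants
                    and self.acceptance_evidence and self.accountable_owner)

spec = ChangeSpec(
    purpose="Allow bulk export of invoices",
    invariants=["No personal data leaves the EU region"],
    acceptance_evidence=["contract tests pass", "export writes an audit log entry"],
    accountable_owner="payments-team-lead",
)
print(spec.is_workable())  # True
print(ChangeSpec(purpose="Refactor").is_workable())  # False: intent is incomplete
```

A team could use something this small, or something much richer; the durable part is that the statement of intent exists before generation begins and survives after it.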
This is a governance problem
That phrase matters.
If you frame the whole conversation as a prompting problem, leaders will optimise prompts. If you frame it as a tooling problem, they will buy more tooling. If you frame it as a developer productivity problem, they will count output.
None of those frames is wrong. They are just too small.
The harder problem is organisational. Where should constraints live? What should be checked automatically? What must be made explicit before generation begins? What evidence should travel with a change? How much human judgment belongs before merge, and how much after? Which work deserves tight control, and which work does not?
Those are governance questions, even when the answers show up in engineering practice.
What public-facing good looks like
At a practical level, well-governed AI-assisted delivery usually has a few characteristics in common.
Teams make intent legible before the implementation starts. The important boundaries are visible rather than implied. There are credible checks in the workflow that do not rely on a reviewer remembering every local rule. Review focuses on whether the change matches what the organisation actually wanted, not only whether the code looks tidy. Monitoring is good enough to tell when a plausible change was still wrong.
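One of those characteristics, checks that do not depend on a reviewer's memory, can be sketched in a few lines. This is an illustrative pre-merge gate under assumed rule names; real teams would wire something like it into their CI system rather than run it by hand.

```python
# Illustrative only: evidence every change must carry before merge.
# The rule names are assumptions, not a recommended checklist.
REQUIRED_EVIDENCE = {"tests_passed", "boundary_check", "owner_signoff"}

def gate(change: dict) -> tuple[bool, set[str]]:
    """Return (accepted, missing evidence) for a candidate change.

    The check fails loudly when evidence is absent, so acceptance
    never relies on a reviewer remembering every local rule.
    """
    missing = REQUIRED_EVIDENCE - set(change.get("evidence", []))
    return (not missing, missing)

ok, missing = gate({"evidence": ["tests_passed", "boundary_check"]})
print(ok, sorted(missing))  # False ['owner_signoff']
```

The value is not the three lines of logic; it is that the organisation's boundary becomes executable, so review can spend its attention on whether the change matches intent.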
None of that is glamorous. It is also the difference between impressive demos and repeatable delivery.
What it is not
It is not a claim that every change needs a heavyweight specification.
It is not a claim that AI can be made safe by process alone.
It is not a claim that templates are more important than engineering judgment.
And it is definitely not a return to old-school bureaucracy dressed up in current language.
The point is narrower and more practical. Once AI becomes a real participant in how software is built, organisations need better ways to connect intent, constraints, evidence, and accountability than most of them use today.
The shift leaders need to make
Leaders should stop asking whether AI can produce code and start asking whether the surrounding system can absorb the consequences.
Can the team explain what the change is meant to do? Can it show what was checked? Can reviewers judge the outcome without rebuilding the whole story from scratch? Can the organisation trace responsibility when something goes wrong? Can it tell the difference between fast output and validated progress?
Those questions are less exciting than the demo. They are also the ones that determine whether AI improves delivery or simply accelerates confusion.
The real unit of progress in the AI era is not code produced. It is accepted, trustworthy change.
This article is the public version of the argument.
The fuller operating model goes deeper into how teams can govern AI-assisted delivery without collapsing into ceremony. That is the work behind the book.