Prototype driven development — an SDLC for the AI age

Gareth Williams

Lead Engineer

March 18, 2026

Most teams I talk to are doing the same thing: bolting Claude Code, Copilot or Cursor onto their existing Scrum workflow and calling it AI transformation. It’s the path of least resistance. It’s also a trap.

You end up with agents churning out code against vague acceptance criteria, PRs stacking up faster than anyone can review them, and a codebase that nobody, least of all humans, can reason about. Without specs, agents go wild. Like some sci-fi Lord of the Flies, except instead of kids with sharpened sticks it’s LLMs hallucinating architecture decisions at 3am.

There’s a better model. But it requires rethinking something sacred.

The Agile Manifesto’s blind spot

The Agile Manifesto was written in 2001 by humans, for humans. The twelve principles are sound: customer focus, frequent delivery, sustainable pace, reflection. I’ve got no beef with any of them. But two of the four core values are starting to crack under the weight of what AI actually changes about software delivery.

“Individuals and interactions over processes and tools.”

“Working software over comprehensive documentation.”

When humans write all the code, these make sense. Talented people collaborating beats rigid process. A working demo beats a 200-page spec. But when agents deliver the code? The process, the tools, the harness. That IS the craft now. And documentation isn’t overhead, it’s the input that determines output quality.

McKinsey’s Martin Harrysson and Natasha Maniar make a similar case: teams stuck on marginal gains are the ones layering AI onto legacy Agile workflows. The teams breaking through are redesigning the workflow entirely: continuous planning, spec-driven development, smaller pods of 3–5 people orchestrating agents end-to-end. The structural shift, not the tooling upgrade, is where the gains live.

Agents need specs the way humans needed whiteboards

Here’s the thing about “working software over comprehensive documentation”: it was a reaction to a real problem. Waterfall-era teams spent months writing specs that were outdated before anyone wrote a line of code. The manifesto was right to push back on that.

But new tools have new needs. We now have teams giving agents a Jira ticket with two sentences of acceptance criteria and a link to a Confluence page nobody’s read since 2024, then wondering why the output needs heavy rework. Loose specs don’t just slow agents down. They actively mislead them. An agent working from a vague requirement will confidently produce something that looks right, passes a basic sanity check, and is architecturally wrong in ways you won’t catch until integration.

The same applies to “individuals and interactions.” If the future of engineering is about long-running autonomous agents, then the success of those agents is determined by the harness: the process and tools that govern how the agent breaks down problems, gathers context, selects tools, and knows when it’s done. The craft isn’t writing code anymore. It’s engineering the system that writes the code.

So if documentation is now the primary input to production, and process is now the primary determinant of quality, why are we still treating them as the lesser half of two value statements?

The shift left nobody’s talking about

Shifting left isn’t new. We’ve been moving static analysis, security scanning, and performance testing earlier in the pipeline for years. But why stop there?

Imagine your design team hands over not just mockups, but working code with passing end-to-end tests. End-to-end flows, mocked up with hardcoded data. Not a Figma file that says “final_v3_FINAL” in the filename with some red-lines and a prayer. Actual running pages, built using the prod UI tech stack, tested with real users, automated, and signed off before a single line of backend code exists.

That’s prototype-driven development. And it’s a more radical shift-left than anything we’ve done with linting or SAST.

Here’s how it works. A creative technologist (a frontend developer embedded in the design team) builds the prototype as code. Not wireframes. Not clickable PNGs. Real components, real pages, real flows. They iterate on those with users, write end-to-end tests against the happy and sad paths, add animations, refine interactions. The prototype IS the user testing artefact and the test suite simultaneously.
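As a sketch of what “prototype as code” can mean in practice (all names here are illustrative, not from the post): the prototype’s pages read from a small data-source interface with a hardcoded implementation, so the design team can iterate without a backend, and the engineering team can later swap in a real API client without touching the tested UI logic.

```typescript
// Hypothetical sketch. The prototype's data layer is an interface;
// the creative technologist ships a hardcoded implementation.
interface OrderSource {
  listOrders(): Array<{ id: string; status: string }>;
}

// Prototype implementation: hardcoded data, no backend required.
const mockOrders: OrderSource = {
  listOrders: () => [
    { id: "ord-1", status: "shipped" },
    { id: "ord-2", status: "pending" },
  ],
};

// Page logic is written once against the interface. At integration
// time the engineering team substitutes a real API-backed source,
// and the prototype's e2e tests keep running unchanged.
function renderOrderSummary(source: OrderSource): string {
  const orders = source.listOrders();
  const shipped = orders.filter((o) => o.status === "shipped").length;
  return `${shipped} of ${orders.length} orders shipped`;
}
```

The interface boundary is the design choice that makes the later handoff cheap: the mock and the production client are interchangeable from the tests’ point of view.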

Once user-tested, automated, and signed off, the handoff happens. The engineering team receives the coded prototype, which becomes the TDD baseline.

In parallel, the tech team has developed:

  • A low-level design: data models, sequence diagrams, class diagrams, state diagrams, flow diagrams, pseudocode
  • An OpenAPI spec for the API contract
  • Prototypes of their own for the complex or experimental unknowns in their stack
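To make the API-contract artefact concrete, here is a hedged illustration (the `Order` schema and names are mine, not from the post) of how an OpenAPI schema like `GET /orders → { orders: Order[] }` might project into shared TypeScript types. Both the prototype’s hardcoded fixtures and the real backend responses have to satisfy them, so contract drift surfaces as a compile error or a failed check rather than an integration surprise.

```typescript
// Illustrative types mirroring a hypothetical OpenAPI schema.
type OrderStatus = "pending" | "shipped" | "cancelled";

interface Order {
  id: string;
  status: OrderStatus;
  totalPence: number;
}

interface ListOrdersResponse {
  orders: Order[];
}

// A small runtime guard the e2e suite can reuse against live
// responses, where compile-time types alone can't help.
function isListOrdersResponse(value: unknown): value is ListOrdersResponse {
  if (typeof value !== "object" || value === null) return false;
  const orders = (value as { orders?: unknown }).orders;
  return (
    Array.isArray(orders) &&
    orders.every(
      (o) =>
        typeof o === "object" &&
        o !== null &&
        typeof (o as Order).id === "string" &&
        ["pending", "shipped", "cancelled"].includes((o as Order).status) &&
        typeof (o as Order).totalPence === "number"
    )
  );
}
```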

The engineering team and their agents then integrate the prototype’s pages and templates from an NPM package into the production codebase, hook up the API integrations, and build the backend. The agents work from the LLD and build against the prototype’s test suite. TDD, but where the tests were written by designers and validated by users, not reverse-engineered from a Figma prototype.

The manifesto says we should value working software over comprehensive documentation. Fine. Let’s use TDD to make working software the specification.


flowchart TD
    A[Discovery & Ideation] --> B[Design Team + Creative Technologist]
    B --> C[Coded Prototype]
    C --> D[User Testing & Iteration]
    D --> C
    D --> E{Signed Off?}
    E -->|No| C
    E -->|Yes| F[Handoff Artefacts]
    F --> G[Coded Prototype\ne2e Test Suite]
    F --> H[OpenAPI Spec]
    F --> I[Low-Level Design\nSequence · Class · State\nFlow · Data Models]
    G --> J[Engineering Team + Agents]
    H --> J
    I --> J
    J --> K[Integration\nBackend · APIs · Infrastructure]
    K --> L[Agent builds against\nTDD from prototype tests]
    style A fill:#f0f0f0,stroke:#333
    style B fill:#e8d5f5,stroke:#333
    style C fill:#e8d5f5,stroke:#333
    style D fill:#e8d5f5,stroke:#333
    style F fill:#ffeaa7,stroke:#333
    style G fill:#ffeaa7,stroke:#333
    style H fill:#ffeaa7,stroke:#333
    style I fill:#ffeaa7,stroke:#333
    style J fill:#dfe6e9,stroke:#333
    style K fill:#dfe6e9,stroke:#333
    style L fill:#55efc4,stroke:#333

Why this actually works

The prototype’s test suite is the killer feature here. When an agent builds autonomously, it isn’t working from a vague ticket. It has a spec to follow, a low-level design to reference, and, critically, a set of end-to-end tests that were written during discovery and validated by real users. The agent’s acceptance criteria aren’t scrawled in a Jira comment. They’re executable.
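A minimal sketch of what “executable acceptance criteria” might look like, under assumed names (the `Checkout` interface and checks are hypothetical): the criteria authored at prototype stage become a list of named checks, and the agent’s definition of done is simply that every check passes against the real integration.

```typescript
// Hypothetical acceptance criteria as code, not ticket prose.
interface Checkout {
  addItem(pricePence: number): void;
  totalPence(): number;
}

type Check = { name: string; passes: (impl: Checkout) => boolean };

const acceptanceChecks: Check[] = [
  {
    name: "empty basket totals zero",
    passes: (impl) => impl.totalPence() === 0,
  },
  {
    name: "total is the sum of item prices",
    passes: (impl) => {
      impl.addItem(250);
      impl.addItem(750);
      return impl.totalPence() === 1000;
    },
  },
];

// Takes a factory so each check gets a fresh instance, as a real
// e2e runner would. Returns the names of failing checks.
function runChecks(makeImpl: () => Checkout): string[] {
  return acceptanceChecks
    .filter((check) => !check.passes(makeImpl()))
    .map((check) => check.name);
}

// A reference implementation the checks should accept.
function makeBasket(): Checkout {
  const items: number[] = [];
  return {
    addItem: (p) => items.push(p),
    totalPence: () => items.reduce((a, b) => a + b, 0),
  };
}
```

The point isn’t this toy harness; it’s that pass/fail replaces interpretation. An agent either satisfies the signed-off checks or it doesn’t.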

That’s a fundamentally different proposition. Deterministic-ish output. Faster feedback. Higher confidence. The agent runs the tests, the tests fail or pass, and you know immediately whether the integration works against the signed-off prototype. No more “it compiles and the unit tests pass but nobody’s checked whether the actual user flow still makes sense.”

This only holds if the spec is right, obviously. Bad specs produce confidently wrong code whether a human or an agent is holding the keyboard. That’s why the design and engineering teams aren’t working in isolation. They’re demoing to each other throughout, collaborating on feasibility, signing off that the prototype’s flows are achievable given the backend architecture. The OpenAPI spec and LLD aren’t afterthoughts, they’re the contract both sides agree on before handoff. You’d rather catch a disconnect between frontend assumptions and backend reality in a prototype review than in the last week of delivery.

It also changes what review looks like. Instead of a senior engineer spending hours reviewing AI-generated boilerplate, they’re reviewing architecture decisions and API contracts, the stuff that actually matters. The prototype already validated that the UI works. The tests already validated that the flows are correct. What’s left is the engineering judgement that agents genuinely can’t do yet. It’s a great outcome for teams struggling to manage cognitive burden. The final hurdle remains comprehension debt (the next post in the series).

The manifesto said it first

Here’s the bit that most people forget. The twelfth principle of the Agile Manifesto reads: “At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.”

That’s what this is.

We’re not abandoning Agile. We’re honouring its actual intent, the bit about adapting when the world changes, by updating what “working software” and “individuals” mean when the individuals include agents and the working software starts as a tested prototype.

The manifesto was never meant to be permanent. It was meant to be a starting point for teams willing to keep improving. Twenty-five years later, the world it was written for has changed. The teams that recognise this and redesign their SDLC accordingly won’t just ship faster. They’ll ship things that actually work.

The ones still running 2015 Scrum with a Copilot plugin bolted on? They’ll plateau. And they’ll blame the AI.
