Comprehension debt: the silent risk of AI-accelerated delivery

Gareth Williams

Lead Engineer

March 30, 2026

The gap between what your systems do and what your team actually understands

You built an app last week. It works. The tests pass, the CI is green, the demo went well. Now someone asks you to draw the sequence diagram on a whiteboard, and a cold shock runs up your spine.

You can’t do it. Not because you’re a bad engineer, but because you didn’t write most of it. An agent did. You reviewed the pull requests, you ran the tests, you eyeballed the output. But somewhere between the velocity and the green ticks, you lost the thread. You can’t trace a request from the API gateway through to the database and back. You know it works. You just don’t know how.

That’s comprehension debt.

Naming the thing

We’re all familiar with technical debt: the shortcuts you take today that cost you tomorrow. Comprehension debt is its quieter, more dangerous cousin. It’s the gap between what your systems do and what you, the human accountable for those systems, actually understand about how they do it.

This isn’t new. Every engineer has inherited a codebase they couldn’t fully reason about. But AI has changed the rate of accumulation. Writing code is easy in the age of LLMs. “Hey AI, build me an agent.” “Remake Theme Park for 2026.” Fancy stuff. The agents churn out features, the PRs stack up, and tools like CodeRabbit and LinearB help you manage the barrage. It’s wonderful. But the downside is that we see, remember, and understand less and less of what’s being produced.

And here’s the uncomfortable bit: this isn’t limited to code. With tools like Claude’s CoWork and the advent of AI-powered plugins and workflows, our comprehension of what’s happening across entire organisations risks deteriorating to dangerous levels. Strategy documents authored by AI. Architecture decisions generated from prompts. Compliance frameworks assembled from templates. When something goes wrong, do you want the only available resolution to be “ask the AI”? Would that fill you with confidence?

Code is just where we notice it first, because we have tests, linters, and CI pipelines that make the gap visible. In other domains, the debt accumulates silently.

The chasm

There is a real gap between vibe coding and what I’d call agentic engineering. No small part of that gap is how you handle comprehension debt.

Vibe coding is the developer who prompts an agent, gets working code, ships it, and moves on. It works until it doesn’t. Agentic engineering is the practice of turning the AI in on itself repeatedly until it can diagnose its own problems. If the login UI isn’t working, you give the agent the Playwright MCP server to run the flow and debug. If CI failed, you hand it the GitHub CLI and have it check the most recent run. If integration tests failed in dev, you pass it an AWS profile (read only, unless you’re feeling like YOLOing it) and have it inspect the logs and diagnose.
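The turn-the-AI-on-itself loop above can be sketched with off-the-shelf CLIs. This is a sketch, not a prescription: it assumes an authenticated `gh` and `aws` setup, and the log group name and profile name are placeholders.

```shell
# CI failed: fetch the most recent run on main and dump only the failing
# steps' logs, so the agent debugs from evidence rather than guesswork.
run_id=$(gh run list --branch main --limit 1 --json databaseId -q '.[0].databaseId')
gh run view "$run_id" --log-failed

# Integration tests failed in dev: let the agent inspect recent CloudWatch
# logs through a read-only profile. Log group and profile are placeholders.
aws logs tail /aws/lambda/checkout-dev --since 1h --profile readonly
```

The read-only profile is the important detail: the agent gets enough access to diagnose, and no more.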

The difference isn’t just about autonomy. It’s about building systems where comprehension is maintained by design, not assumed by default.

Specs help, but they don’t solve it

I wrote about prototype-driven development as an SDLC that gives agents the specs, tests, and design artefacts they need to produce reliable output. Spec-driven development helps with comprehension debt to a degree. You should have the broad strokes: a high-level design, maybe some sequence diagrams, an OpenAPI contract. That’s the starting point.

But it’s not enough. When you get an unhelpful 500 response on a request, do you know where to look? When a dependency update breaks an integration that was working yesterday, do you have any idea where to start? The spec tells you what was intended. It doesn’t tell you what the agent actually did to get there.

Breaking work into well-scoped specs also helps manage cognitive load, the related but distinct problem of how much complexity you can hold in your head at any given moment. Smaller, well-specified chunks mean less to manage during delivery. But cognitive load is about the moment. Comprehension debt is about the accumulation. You can manage your cognitive load perfectly and still end up three months later unable to explain how half your system works.

Plan, Delegate, Assess, Codify

So what do we do about it?

At its core, the answer is a willingness to slow down when we need to. The instinct in AI-accelerated delivery is to keep pushing, keep prompting, keep shipping. But speed without comprehension is just building faster on swampy ground.

A workflow I’ve been using across all generative AI activity, not just code, is Plan, Delegate, Assess, Codify. PDAC. It’s a loop:

  1. Plan — Map the task and decide how to equip the AI. What context does it need? What files, skills, MCP servers, or agents should be available? What does “done” look like?
  2. Delegate — Hand off to the right tool. Claude Desktop for research and documents. Claude Code for implementation. CoWork for automation. Nested agents for complex multi-step work.
  3. Assess — AI-assisted review of the output against criteria. Analyse the claims, the assumptions, the prose, the quality. Check accuracy, completeness, consistency, maintainability. This is where comprehension debt gets paid down, if you’re disciplined about it.
  4. Codify — Extract what you learned and encode it so the next cycle is faster and safer. Create a skill. Update a prompt. Add a test. Refine the spec.

PDAC applies to everything. Writing a document, building a presentation, reviewing an architecture, shipping a feature. The loop is the same. The discipline is the same.

What this looks like on the ground

For a lead or principal engineer working across architecture and delivery, PDAC might translate to a daily workflow something like this:

  1. Review agent deliverables — Start the day assessing what the agents produced overnight. Not just “does it pass the tests” but “do I understand what it did and why.” Tools like CodeRabbit are genuinely useful here. They generate sequence diagrams and change explanations, they can be customised with prompting instructions to flag specific patterns, and they help you step through complexity rather than wave it through.
  2. Refine and fix — Use AI with a human in the loop. If something’s wrong, don’t just re-prompt blindly. Understand the failure, then direct the fix. This is where comprehension debt either gets paid down or compounded.
  3. Codify improvements — Create skills, add nuance and detail to existing ones, add tools to the agentic system. Every fix is an opportunity to make the next cycle better.
  4. Plan and spec tonight’s autonomous activities — Research, high and low-level design, prototyping. Use enterprise search and deep research tools to ground the specs in reality. Write the OpenAPI contracts, the sequence diagrams, the pseudocode. This is the investment that determines tomorrow’s output quality.
  5. Delegate to the coding agent — Hand off the well-specified work for overnight execution.

This isn’t glamorous. It’s not “10x developer prompts AI and ships a startup in a weekend.” It’s a disciplined loop of understanding, directing, and encoding. But it’s the loop that compounds.

Tactics that help

Beyond the workflow, a few specific practices have helped me manage comprehension debt during delivery.

Having the agent chunk its output logically makes a real difference. Each PR should be a reasonable size, with multiple useful commits you can step through to see how the feature was built. I’ve also experimented with waterfall PRs, where PR #3 merges into PR #2, which merges into PR #1, which merges into main. It lets you crush multiple tickets in sequence while still having sufficient history to step through the code as it builds.
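A waterfall (stacked) PR chain can be set up with plain `git` and the GitHub CLI. This is a minimal sketch; the branch names and titles are hypothetical, and it assumes an authenticated `gh` in a repo whose default branch is `main`.

```shell
# PR #1 targets main.
git checkout -b feature-part-1 main
# ...commit the first slice of work...
gh pr create --base main --head feature-part-1 --title "Part 1: data model"

# PR #2 targets PR #1's branch, so its diff shows only the new slice.
git checkout -b feature-part-2 feature-part-1
gh pr create --base feature-part-1 --head feature-part-2 --title "Part 2: API"

# PR #3 targets PR #2's branch.
git checkout -b feature-part-3 feature-part-2
gh pr create --base feature-part-2 --head feature-part-3 --title "Part 3: UI"

# Merge bottom-up: PR #1 into main first, then retarget and merge PR #2, then PR #3.
```

Each PR's diff stays small and reviewable, while the chain preserves the history of how the feature was built.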

Rainbow releases are another big help: each PR gets a built, running environment you can test against, with both automated and manual checks. You can catch integration issues in isolation rather than discovering them when everything converges.

There’s also a mechanical defence worth setting up early. Cyclomatic complexity, cognitive complexity, and maintainability index aren’t new metrics, but they become critical when AI is generating the bulk of your code. Tools like ESLint (with plugins like eslint-plugin-sonarjs) and SonarQube can enforce thresholds on all three. Configure your CI pipeline to fail when a contribution, human or AI, pushes any of these past acceptable limits. The effect is straightforward: the agent is forced to simplify. It can’t ship a 300-line function with fifteen branches and call it done. It has to decompose, refactor, extract. The code it produces ends up closer to something a human can actually read and reason about, which is the entire point. You’re not just measuring complexity; you’re using the build pipeline as a forcing function for comprehensible output.
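In practice you would enforce this with ESLint's `complexity` rule or SonarQube quality gates, but the forcing-function idea is simple enough to sketch directly. Here is a toy cyclomatic-complexity gate in Python, using only the standard library; the counting is simplified (branch constructs and boolean operators add paths) and the threshold is illustrative, not a recommendation.

```python
import ast

def cyclomatic_complexity(func: ast.AST) -> int:
    """Simplified McCabe score: 1 base path, +1 per branch construct,
    +1 per extra operand in a boolean expression (and/or)."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.ExceptHandler)):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1
    return score

def check(source: str, limit: int = 10) -> list[str]:
    """Return the names of functions whose complexity exceeds the limit.
    A CI step would fail the build if this list is non-empty."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and cyclomatic_complexity(node) > limit
    ]
```

Wired into CI with a hard limit, a gate like this is what forces the agent to decompose rather than ship the 300-line, fifteen-branch function.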

On the documentation side, tools like Google Code Wiki are worth watching. It pulls useful information from a repo, highlights patterns, generates component views and sequence diagrams, and, in a nice touch, creates a podcast that talks you through the codebase. It’s not available for private repos yet, but the direction is clear. If you can’t wait, you could achieve something similar with AI-generated markdown, Mermaid diagrams, and open-notebook for audio walkthroughs.

The most valuable person in this workflow is not the one writing thousands of lines of code per day. It’s not even the one reviewing the pull requests, though that’s critical. It’s the person who wrote the spec that prevented the code from being wrong in the first place, and who then reviews the output with enough understanding to catch the things tests can’t. That combination of upstream specification and downstream comprehension is the skill set that separates agentic engineering from vibe coding.

The debt always comes due

AI is going to let some people YOLO it for a while. Ship fast, skip the specs, wave through the PRs, trust the green ticks. They’re building on swampy ground, and for a time it’ll hold.

But comprehension debt always comes due. Knight Capital lost $440 million in 45 minutes because a dormant trading algorithm from 2003 was accidentally reactivated during a deployment. Nobody understood what was still lurking in the codebase. Theranos built a $9 billion valuation on technology that didn’t work, because the people overseeing it couldn’t challenge what they couldn’t understand. The 2008 financial crisis was, at its core, a comprehension debt crisis: financial instruments so complex that the people selling them couldn’t explain how they worked, and when the debt came due, it took the global economy with it.

Different domains, same pattern. Velocity outran understanding, and when the gap was finally exposed, the consequences were catastrophic.

Comprehension debt will be the primary determinant of AI success. Not capability, not cost, not speed. The teams that succeed will be the ones disciplined enough to know when to slow down, when to tighten the leash, and when to invest in understanding over output. The ones that treat comprehension as a first-class concern, not an afterthought.

The ones still shipping code they can’t explain? They’ll plateau. And they’ll blame the AI.
