Agentic engineering – the shift nobody’s ready for
Gareth Williams
Lead Engineer
Last month, I sat in two meetings at the same client. In the first, someone described how they aim to view their AI agents as “employees”, clones to work alongside and under the guidance of their existing employees – similar capability expectations, similar RBAC permissions. In the second, an engineering manager told me flatly that they didn’t trust spec-driven development, didn’t want engineers managing autonomous agents, and would rather keep “in the loop” prompting Claude with the syntactic sugar of agents, skills and MCP for additional convenience. Same company. Same week. Two entirely different futures being imagined in adjacent rooms.
This is the gap that “agentic engineering” exists to close. Not a tooling upgrade. A fundamentally different way of thinking about who does the work, how it gets governed, and what the human’s job actually is when the AI is holding the keyboard.
What we’re actually talking about
Andrej Karpathy coined the term on February 4th, exactly one year after his “vibe coding” tweet went viral. His framing is sharp: “agentic” because you’re not writing the code directly 99% of the time, you’re orchestrating agents who do. “Engineering” because there’s an art, science and expertise to it. It’s a discipline you can learn and get better at.
He’s right, but his definition stops at the IDE. The real shift is bigger than software development. Code just happens to be where we noticed it first, because we have tests, linters and CI pipelines that make the gap between “what the AI produced” and “what we actually needed” painfully visible. In enterprise work – strategy documents, architecture decisions, compliance frameworks, procurement assessments – the same gap exists. It’s just silent. No test suite screams at you when an AI-generated risk assessment misses a critical assumption.
Software engineering is the canary in the coal mine. Everything happening in the SDLC right now is coming for every knowledge-work function in the enterprise. The question isn’t whether. It’s whether you’re paying attention.
The wave you’re riding (or drowning in)
Steve Yegge’s Revenge of the Junior Developer maps six overlapping waves of AI-assisted programming, from traditional coding through completions, chat-based coding, coding agents, agent clusters, and agent fleets. It seems spot on to me. I feel we’re in the era of coding agents, seeing glimpses of agent clusters in Anthropic’s agent teams.
To be clear, agentic engineering is not vibe coding. Vibe coding isn’t a step on this journey. It’s a scenic detour into a ravine. The “turn your brain off and let the model cook” approach works brilliantly for throwaway prototypes and side projects nobody will maintain. For anything with users, compliance requirements, or a production environment, it’s the engineering equivalent of texting while driving – exhilarating right up until the moment it isn’t.
As Addy Osmani puts it: vibe coding equals YOLO. Agentic engineering means the AI does the implementation while humans craft the harness and own the architecture, quality and correctness. The terminology itself enforces the distinction: it hints at a sprint zero spent designing a coding agent as well as a project. Agentic engineers stay in the loop, operating as the principal engineer who reviews, comprehends and evaluates the code of an army of AI engineers and, in effect, trains them – refining the harness and iterating on agent context.
The progression that actually matters is AI-assisted engineering to agentic engineering. It’s a shift in who holds the pen. And it has consequences that ripple well beyond the codebase. The AI giants will be launching their skills marketplaces for enterprise users soon – the AI app store moment is imminent.
What it actually looks like
Here’s a concrete example. We built PixieOps, an AI agent for workforce planning, at Versent. The first version took months of AI-assisted delivery: small features, task-level human direction, reading every generated line, code review, and constant steering toward our patterns, approaches and integrations. I’d call this 1.5x productivity, 2 at a push.
When we rebuilt it using an agentic approach, that changed. I used a long-running agent harness, similar to the one Anthropic describe in their November blog on autonomous coding, and refined it with human-in-the-loop gateways, richer context injection, deeper specification, and test expectations set both strategically and at the task level. Test-driven development drove the execution phase, enriched with a definition of ready and done, while a validator agent reviewed code, ran tests and checked completeness against the DoD. The agent generated the spec from the existing application, built the CI pipeline, wrote the Terraform, produced the application code, deployed to the dev environment in AWS and tested it against a set of evaluations – effectively, it migrated us to Agentcore and S3 Vector buckets. Greenfield. Overnight.
I didn’t write the application. I architected the system that wrote the application. The harness, the HITL gateways, the context, the specs, the task list, the guardrails. That’s the job now. I’ve gone on to do similar with the likes of Ralph Wiggum, which blew up in January, adding similar attributes to my own experimental creation, Professor Frink.
Yegge claims junior developers will adapt faster than seniors because they will enter the workforce as AI natives with less to unlearn. He’s right; I’ve seen that trend going way back to the years of Internet Explorer 6 (even now, just mentioning it sends a shock of dread down my spine).
The pattern extends beyond code. Every engineer is about to be promoted to a manager, ready or not. Your agents are the team. Fast, cheap, tireless, and about as reliable as a keen but chaotic graduate who needs constant supervision. Treat them accordingly – with clear briefs, defined acceptance criteria, and regular review – and they’ll outperform expectations. Let them loose without structure and you’ll be debugging at 3am wondering how the IaC ended up provisioning resources in a region you’ve never heard of.
The craft nobody’s teaching
“Working software over comprehensive documentation.” The Agile manifesto assumed humans write the code. When the AI writes it, documentation isn’t overhead – it’s the primary interface between human intent and machine execution. Specifications aren’t bureaucratic tax anymore. They’re the means of production. The game changed. The manifesto didn’t.
This is where the skill shift from syntax to systems thinking gets concrete. The craft of agentic engineering is specification depth. Detailed low-level designs. Rich markdown files with PlantUML and Mermaid diagrams: event models, state diagrams, sequence diagrams, flow charts. Tech stack decisions, design patterns, pseudo-code. OpenAPI specifications. Conceptual, logical, and physical data models. Documented semantic layers, ontologies, class diagrams. The artefacts that Agile told us to deprioritise are now the primary input to autonomous agents. Collectively, they give agents enough context to execute between human checkpoints – and give humans enough understanding to debug at 3am when something breaks.
Then there’s the orchestration layer. MCP servers that give agents structured access to your systems. Skills, hooks, and commands that encode repeatable behaviours. Agent configurations that define roles, scoped context, and handoff points. Harnesses that sequence agents through tasks with a validator agent reviewing output, running tests, and checking completeness against a definition of done. This is the engineering in “agentic engineering.” It’s not prompting. It’s architecture.
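The shape of that harness can be sketched in a few lines. This is a minimal illustration, not any specific framework: the agent calls are stubbed, and the names (runWorker, runValidator, executeTask) are my own for the example. In a real harness they would invoke an LLM API with scoped context, run the test suite, and walk the definition-of-done checklist.

```typescript
// Sketch of a harness loop: a worker agent executes a task, a validator
// agent checks the output against a definition of done, and the task only
// closes when validation passes or a human gateway is triggered.

type Task = { id: string; spec: string };
type Validation = { passed: boolean; feedback: string };

// Stub: a real worker would call a coding agent with scoped context.
function runWorker(task: Task, feedback?: string): string {
  return `output for ${task.id}${feedback ? " (revised)" : ""}`;
}

// Stub: a real validator would run tests, lint, and check the DoD checklist.
function runValidator(task: Task, output: string): Validation {
  return { passed: output.includes("revised"), feedback: "tests failed: revise" };
}

// Escalate to a human-in-the-loop gateway after N failed attempts.
const MAX_ATTEMPTS = 3;

function executeTask(task: Task): { output: string; escalated: boolean } {
  let feedback: string | undefined;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    const output = runWorker(task, feedback);
    const result = runValidator(task, output);
    if (result.passed) return { output, escalated: false };
    feedback = result.feedback; // feed validator findings back into the context
  }
  return { output: "", escalated: true }; // a human reviews at the gateway
}
```

The design choice that matters is the escalation path: the loop is allowed to iterate, but it cannot iterate forever, and a failed task lands at a human checkpoint rather than silently merging.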
Context management is the differentiator most people miss. Agent context windows rot – irrelevant information accumulates, signal degrades, output quality decays. Good agentic engineering uses context isolation to scope each agent’s view to exactly what it needs. Progressive disclosure feeds context in stages rather than dumping everything upfront. Get this wrong and your agent starts hallucinating connections between unrelated system components. Get it right and you can run long autonomous loops without the quality cliff.
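Both techniques reduce to filtering what each agent is allowed to see, and when. A toy sketch, with an illustrative ContextItem shape that isn’t from any particular framework:

```typescript
// Sketch of context isolation and progressive disclosure.

type ContextItem = { tag: string; content: string; stage: number };

const repoContext: ContextItem[] = [
  { tag: "backend", content: "OpenAPI spec for orders service", stage: 1 },
  { tag: "backend", content: "DB schema for orders tables", stage: 2 },
  { tag: "frontend", content: "Component library conventions", stage: 1 },
  { tag: "infra", content: "Terraform module layout", stage: 1 },
];

// Context isolation: an agent only sees items scoped to its own role.
function isolate(items: ContextItem[], role: string): ContextItem[] {
  return items.filter((i) => i.tag === role);
}

// Progressive disclosure: release context in stages as the task advances,
// rather than dumping everything into the window upfront.
function disclose(items: ContextItem[], upToStage: number): ContextItem[] {
  return items.filter((i) => i.stage <= upToStage);
}

// A backend agent at stage 1 gets the OpenAPI spec; the schema arrives later.
const backendStage1 = disclose(isolate(repoContext, "backend"), 1);
```

Trivial as code, but the discipline is the point: every item in the window is there because a rule put it there, not because it happened to be lying around.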
Code quality guardrails deserve their own mention here. Cyclomatic complexity, cognitive complexity, and maintainability index are metrics most teams already have access to but rarely enforce with any teeth. When agents are generating code autonomously, these become non-negotiable gates. Configure ESLint with plugins like eslint-plugin-sonarjs, or wire SonarQube into your CI pipeline, and set thresholds that fail the build when complexity crosses the line. The agent doesn’t get to produce a function with twenty conditional branches and move on to the next task. It gets bounced, forced to decompose, forced to write something a human reviewer can actually parse. This isn’t about punishing the AI for writing complex code. It’s about encoding your comprehension requirements into the pipeline itself. The agent can’t accumulate comprehension debt if the build won’t let it through the door.
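As a sketch of what those gates can look like, here is an ESLint flat config using the core complexity rule plus eslint-plugin-sonarjs. The thresholds are illustrative; tune them to your codebase:

```javascript
// eslint.config.js – complexity gates for agent-generated code.
// Thresholds here are examples, not recommendations.
import sonarjs from "eslint-plugin-sonarjs";

export default [
  {
    plugins: { sonarjs },
    rules: {
      // Cyclomatic complexity: fail the build past 10 branches.
      complexity: ["error", 10],
      // Cognitive complexity (SonarJS): weighted for human readability.
      "sonarjs/cognitive-complexity": ["error", 15],
      // Keep functions small enough to review at a HITL gateway.
      "max-lines-per-function": ["error", { max: 60, skipBlankLines: true }],
      "max-depth": ["error", 4],
    },
  },
];
```

Wire this into CI as a blocking step and the bounce-and-decompose behaviour comes for free: the agent’s next loop iteration sees the lint failure as feedback, the same way it sees a failing test.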
The human’s role threads through all of this. At each HITL gateway, the engineer reviews output against task lists and specs, refines context, and redirects. Between gateways, the agent operates autonomously – running its own tests, diagnosing failures, iterating against the definition of done. The differentiator from vibe coding is that you designed the checkpoints, defined what “done” looks like at each one, and blocked certain autonomous actions – destructive operations, production access, external API calls – when running longer loops. Autonomy with architecture, not autonomy with vibes.
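The blocked-actions idea is worth making concrete. A minimal sketch, with illustrative action categories and a prod/ path convention of my own invention: during autonomous runs, anything destructive or production-facing is refused and left for a human checkpoint.

```typescript
// Sketch of an action guard for longer autonomous loops: destructive
// operations, deploys and external calls are blocked while the agent
// runs unattended, and production targets are never touched.

type Action = {
  kind: "read" | "write" | "deploy" | "delete" | "external_api";
  target: string;
};

const BLOCKED_IN_AUTONOMOUS_MODE = new Set(["delete", "deploy", "external_api"]);

function isAllowed(action: Action, autonomous: boolean): boolean {
  if (!autonomous) return true; // a human is already in the loop
  if (action.target.startsWith("prod/")) return false; // never touch production unattended
  return !BLOCKED_IN_AUTONOMOUS_MODE.has(action.kind);
}
```

The guard lives in the harness, not in the prompt – an agent can be talked out of an instruction, but it can’t be talked out of a function that refuses to execute.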
This extends to how you structure the work itself. Prototype-driven development delegates the frontend to the design team. A frontend engineer builds full user flows as clickable prototypes for testing and refinement with real users. UI is the hardest thing to specify for agents: animations, micro-interactions, responsive behaviour, the feel of a product. AI still needs a human in the loop for that, for now. But by separating it, you’ve removed the hardest-to-spec surface from the agent’s build. The prototype becomes a visual spec. When integrating the UI with the backend, you lean on the E2E tests developed against the prototype UI. It’s TDD where the prototype becomes the specification – when the agent comes to build, the test suite tells it what “correct” looks like. Less context for human teams to wrestle with. More autonomy for agents to operate within.
The debt that compounds
As agents produce more output than humans can reason about – across code, documentation, architecture decisions, and operational runbooks – the gap between “what the system does” and “what anyone understands about what the system does” widens silently. Comprehension debt is the quiet systemic risk: everything works, nobody can explain why, and the knowledge to maintain it exists only in chat logs and agent context that expired three sprints ago.
Then it’s 3am. Something breaks. Your on-call engineer stares at a terminal, scrolling through code they didn’t write, referencing services they’ve never seen, calling APIs documented in a markdown file generated by an agent. That’s what comprehension debt looks like when it comes due. Agentic engineering without guardrails doesn’t just accelerate delivery. It accelerates the moment nobody can fix what was delivered.
The antidote is problem decomposition. In part it’s everything described above: the rich specifications, the HITL gateways, the test suites as living documentation. But it’s also about knowing when to slow down. We all need to become that person in the organisation who is always available to review a PR: thoughtfully stepping through a contribution multiple times, reasoning about the approach, comprehending the impact and adding comments.
While on the subject of git, hygiene in this space is important. Commit and pull request descriptions are documentation now more than ever – I’ve had agents stepping through these artefacts to build context. Set rules around commit messages and PR descriptions: ensure they are human-parsable, with details of the parent task and associated documentation. Make the agent leave breadcrumbs – multiple small commits that are easy to follow, bubbling up into a larger pull request. If the agent is working autonomously on five tasks in sequence, have it raise cascading pull requests: one into main, another into that, another into that, and so on, letting you step through the five tasks one PR at a time, one commit at a time, building up your comprehension.
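Those rules are enforceable, not just aspirational. A toy commit-message gate, suitable for a commit-msg hook or a CI check – the conventional-commit-style format and the TASK-123 ticket convention here are illustrative, not a standard your tooling will recognise out of the box:

```typescript
// Sketch of a commit-message gate: every commit subject must name a type,
// reference a parent task ID, and carry a real description, so agents
// (and humans) can rebuild context from the history later.

const COMMIT_RULE = /^(feat|fix|chore|docs|refactor|test)\([A-Z]+-\d+\): .{10,}/;

function isValidCommitMessage(msg: string): boolean {
  // Only the subject line is checked; the body can carry doc links freely.
  return COMMIT_RULE.test(msg.split("\n")[0]);
}
```

Reject the commit in the hook and the agent gets the same treatment as the complexity gates: it can’t leave an unparsable trail because the trail won’t accept the footprint.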
The bit that matters
Agentic engineering isn’t about whether you use AI. Everyone uses AI. It’s about whether you’re designing the system or just typing into it.
Here’s the compounding upside – the codified skills, the reusable agent configurations, the composable workflows aren’t just good engineering practice. They’re tradeable assets. Skills marketplaces are coming. Tools like Cowork are already pointing toward an enterprise “app store” moment where the expertise you’ve encoded into repeatable workflows becomes something other teams, other organisations, can leverage. The individuals who’ve codified their craft will compound. The ones who kept it in their heads will watch it depreciate.
If you’re a leader reading this and thinking “this is a developer problem,” you’re the person in my second meeting. The one who’ll be blindsided when the same shift hits procurement, legal, finance, and operations. Because it will. The models are getting smarter. The agents are getting more autonomous. And the organisations that figure out how to govern, structure, and scale this – not just in the IDE but across the enterprise – are the ones that will compound their advantage.
The ones still treating AI as a fancy autocomplete? They’ll plateau. And they’ll blame the AI.