The most honest sentence written about AI-assisted engineering this year did not come from a productivity study or a vendor keynote. It came from a developer describing his own work day: "AI agents create a weird new kind of burnout … you might get 4-5 extremely intense hours before your brain is fully cooked." The quote went viral because thousands of engineers recognized themselves in it instantly. They are shipping more than they ever have. They are also done — genuinely, neurologically done — by early afternoon, in a way that eight hours of ordinary coding never produced. Evil Martians put the question directly in the title of the blog post that crystallized the conversation: "AI-assisted engineers are burning out, is this fine?" The answer the industry is converging on is no, and the reasons are more interesting than "people are working too much."
Supersonic Output, Subsonic Recovery
The Evil Martians essay starts from a premise nobody disputes: AI lets developers ship at supersonic speed. A single engineer running two or three coding agents in parallel can produce in a day what a small team produced in a sprint two years ago. The essay's contribution was to name the hidden cost. The work that fills those hours is not coding in any traditional sense. It is continuous high-stakes evaluation: reading diffs you did not write, reconstructing intent you did not form, deciding — dozens of times per hour — whether confident-looking output is actually correct.
This is a different cognitive activity from writing code, and it is heavier. Writing code has a rhythm that includes its own recovery: the easy refactor after the hard algorithm, the mindless test scaffolding after the gnarly race condition. Flow states are real, and they are restorative in a way that sustained vigilance is not. Agent supervision strips out every low-intensity beat and replaces it with an unbroken sequence of judgment calls. The nearest occupational analogues are air traffic control and simultaneous interpretation — professions that discovered decades ago that the sustainable daily dose of continuous evaluative attention is measured in single-digit hours, and that built mandatory rotation into their working rules because no amount of professionalism extends it.
"AI agents create a weird new kind of burnout … you might get 4-5 extremely intense hours before your brain is fully cooked."
Software engineering has stumbled into the same physiology without the working rules. The four-to-five-hour ceiling that developers keep reporting is not a confession of laziness; it is roughly what the vigilance literature would predict for sustained supervisory attention. What has changed is that before agents, almost nobody's job consisted of pure supervision. Now, for engineers who have fully adopted agentic workflows, it is most of the job — and the day is still sized for the old one.
The Context Inversion: As the Agent Learns, You Forget
The exhaustion has a second, subtler driver that the Evil Martians piece identifies precisely: a context inversion. In traditional development, knowledge of the system and effort invested in it grow together. Every hour you spend in the codebase deposits something — the shape of the architecture, the edge case that broke last quarter, the reason the retry logic looks weird. That accumulated context is what makes the thousandth hour easier than the hundredth.
Agentic workflows invert the deposit. As the agent gets more context — bigger windows, persistent memory, deeper integration with your repo — you lose it. You stop holding the architecture in your head because the agent navigates it for you. You stop remembering the edge cases because you never personally hit them. You stop carrying past decisions because you were not the one who made them; you only approved them, quickly, at 4:40pm, on the eleventh diff of the afternoon. The knowledge lives in the transcript now, and the transcript is not in your head.
The context inversion explains the otherwise puzzling shape of the fatigue. Developers consistently describe the first hours of agent work as exhilarating and the late hours as a slog — not because the tasks got harder, but because each successive review draws on a context reservoir that delegation has been quietly draining. By hour four you are doing archaeology on your own codebase. That is the "fully cooked" feeling: not tiredness exactly, but the specific depletion of making consequential judgments without grounding.
It is also the burnout-shaped twin of a dependency we have written about before. The February 2026 METR follow-up work found that most developers now refuse to work without AI assistance — to the point that the study itself struggled to staff a control group. Dependence and depletion are the same loop seen from two angles: the less context you hold, the more you need the agent; the more you use the agent, the less context you hold, and the more each supervised hour costs you.
Every Saved Minute Became More Work
None of this would be a crisis if organizations banked the productivity gains as slack. They have done the opposite. Engineer and writer Marco Kotrotsos put it bluntly: organizations treat every minute saved as a minute available for more work. The result, he argues, is not less burnout but a different kind — and it is hitting hardest exactly the people who embraced AI most eagerly. The early adopters demonstrated what was possible, and what was possible became the quota.
"Organizations treat every minute saved as a minute available for more work — not less burnout, a different kind, hitting the people who embraced AI hardest."
The ratchet only turns one way. An engineer who uses agents to finish Tuesday's work by noon does not get Tuesday afternoon back; they get Wednesday's work moved up. Sprint capacities quietly absorb the new velocity, roadmaps re-baseline around it, and within two quarters the supersonic pace is simply the pace — except that the four good supervision hours it depends on have not grown at all. The engineers caught in this are not victims of bad managers so much as of an accounting error: throughput is measured where it is easy (output shipped) and not where it binds (judgment expended), so the system keeps loading the invisible resource until it fails.
The clearest artifact of this dynamic comes from inside Amazon, which built an internal leaderboard — nicknamed "Kirorank" — ranking employees by AI token consumption. The premise was that token usage proxied for AI adoption, and adoption was a behavior leadership wanted to encourage. Employees responded the way employees always respond to a metric: they gamed it, launching excessive agent runs to climb the rankings. Amazon shut the leaderboard down. The episode is funny until you read it as evidence of what is being measured and what is not: an organization sophisticated enough to instrument every token an employee consumes had no instrument at all for the supervisory attention those tokens demand.
This measurement gap is the institutional core of the problem. Output per engineer is visible and rising; cognitive cost per engineer is invisible and rising faster. Any system that measures one side of a trade will maximize it at the expense of the unmeasured side. The same blindness shows up downstream in the codebase itself — we documented in our analysis of the AI maintenance bill how the same ship-fast incentives produced an 8x jump in duplicate code and a collapse in refactoring. Fatigued reviewers and decaying codebases are not two problems. They are one problem, observed in the humans and in the git history respectively.
Why This Burnout Is Different — and Why That Matters
Classic engineering burnout had a recognizable etiology: long hours, crunch deadlines, on-call trauma, meaningless work. The AI variant violates the pattern, which is why it keeps slipping past managers and past the sufferers themselves. The hours are often normal. The deadlines are often met early. The work ships. By every signal a manager is trained to watch, the heavily-agentic engineer is thriving — right up until the quality of their judgment quietly degrades, or they leave.
Exhaustion of effort
- Driven by hours, crunch, and on-call load
- Visible: late nights, missed deadlines
- Correlates with low output
- Recovery: time off, lighter sprint
- Hits the overworked and under-resourced
Exhaustion of judgment
- Driven by continuous high-stakes evaluation
- Invisible: output stays high while it builds
- Correlates with record productivity
- Recovery requires rebuilding lost context, not just rest
- Hits the most enthusiastic AI adopters first
The recovery asymmetry in that comparison deserves emphasis. Rest fixes effort exhaustion. It does not fix judgment exhaustion, because the depleted resource is not energy — it is context. An engineer who returns from two weeks of vacation to a codebase that agents modified the entire time comes back to more reconstruction work, not less. This is why teams report that their most AI-leveraged engineers seem unable to recharge: the thing they need to recharge is a mental model of the system, and the workflow that exhausted them is structurally incapable of rebuilding it.
For individual engineers, the early symptoms are worth naming because they masquerade as personality changes rather than occupational ones. Approvals start drifting toward "looks fine" on diffs you would have interrogated six months ago. You notice you can no longer sketch your own system's architecture on a whiteboard without checking. Small decisions — naming, error handling, where a function belongs — start feeling disproportionately heavy, because each one now requires reconstructing context you used to carry. And the evening exhaustion arrives without the compensating satisfaction that hard creative work used to bring, because evaluation, unlike creation, produces no artifact you can point at. Engineers who recognize two or more of these patterns are not failing at AI adoption. They are succeeding at it, on a schedule designed for a different job.
What Actually Works: Team Practices for the Four-Hour Reality
The good news is that supervision fatigue is an operations problem, and operations problems have operational fixes. The teams handling this well — including our own — have stopped treating the four-hour ceiling as a personal failing to be coached away and started treating it as a design constraint, the way aviation treats duty-hour limits. Three practices do most of the work.
1. Review Budgets
Cap the volume of agent output any engineer is expected to evaluate per day, explicitly, in planning. If an engineer has four good supervision hours, the team plans against four — not against whatever volume three parallel agents can generate. This sounds like a productivity sacrifice; in practice it is the opposite, because approvals made past the ceiling are where the expensive defects come from. A review budget is to supervision what a WIP limit is to a kanban board: an admission that throughput beyond a threshold is fake.
The budget also gives managers the signal they currently lack. When an engineer reports being over budget — more agent output queued than supervised hours available — the team has a concrete, blameless trigger to redistribute work or slow generation, instead of waiting for the lagging indicators of degraded review quality. Some teams make the budget visible in the same dashboards that track agent throughput, which has the useful side effect of reminding everyone that the two numbers are coupled: generation capacity is effectively infinite now, and supervision capacity is the entire constraint.
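The budget arithmetic is simple enough to sketch. The Python below is a minimal illustration under stated assumptions, not a description of any team's tooling: `ReviewItem`, its `estimated_review_minutes` field, `plan_review_day`, and the 240-minute default (four supervision hours) are hypothetical names and numbers chosen for the example, and the per-item estimates would have to come from the team's own judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewItem:
    """One unit of agent output awaiting human review (hypothetical shape)."""
    name: str
    estimated_review_minutes: int


def plan_review_day(queue, budget_minutes=240):
    """Fill one day's review budget in queue order.

    An item that does not fit in the remaining budget is deferred;
    later, smaller items may still be scheduled if they fit.
    Returns (scheduled, deferred, over_budget_minutes), where
    over_budget_minutes is the total estimate of the deferred items.
    """
    scheduled, deferred = [], []
    used = 0
    for item in queue:
        if used + item.estimated_review_minutes <= budget_minutes:
            scheduled.append(item)
            used += item.estimated_review_minutes
        else:
            deferred.append(item)
    over_budget_minutes = sum(i.estimated_review_minutes for i in deferred)
    return scheduled, deferred, over_budget_minutes
```

With a 240-minute budget and a queue estimated at 120, 90, 60, and 30 minutes, the sketch schedules the first, second, and fourth items and reports 60 minutes over budget. A nonzero third value is the blameless trigger in numeric form: redistribute the deferred work or slow generation.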
2. Context Rotation
Counter the context inversion directly by ensuring every system has a human who still builds in it by hand. Rotate engineers through periods of direct ownership — writing code themselves, handling the incidents, making the architectural calls — so that the team's reservoir of internal models is continuously refilled. The rotation also breaks the dependency loop: an engineer who spent last month inside the payment service by hand reviews this month's agent diffs against it at a fraction of the cognitive cost.
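One way to make the rotation concrete is a plain round-robin. The sketch below is an assumption about how a team might schedule it, not a prescribed method: `rotation_schedule`, the engineer names, and the system names are all hypothetical, and a "period" stands for whatever rotation length the team picks.

```python
def rotation_schedule(engineers, systems, periods):
    """Assign one hands-on owner per system for each period, round-robin.

    In period p, the system at index i is owned by
    engineers[(p + i) % len(engineers)], so ownership shifts by one
    engineer every period and every system always has a human owner.
    """
    if not engineers:
        raise ValueError("at least one engineer is required")
    return [
        {
            system: engineers[(p + i) % len(engineers)]
            for i, system in enumerate(systems)
        }
        for p in range(periods)
    ]


# Hypothetical team and systems, for illustration only.
schedule = rotation_schedule(["ana", "ben", "chi"], ["payments", "search"], 3)
```

Over as many periods as there are engineers, each engineer owns each system exactly once. If there are more systems than engineers, someone will own more than one system in the same period, which a real schedule would probably want to cap.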
3. Agent-Free Blocks
Protect scheduled time — half-days, not stolen minutes — in which engineers work without agents entirely. Not as digital wellness theater, but because unassisted work is when mental models form. Reading code slowly, debugging without a hypothesis generator, designing on a whiteboard: these are the activities that deposit the context that supervision withdraws. Teams that adopted agent-free blocks report the same counterintuitive result: total throughput holds or rises, because the supervised hours that remain are grounded in fresher models of the system.
Conclusion: Is This Fine?
The Evil Martians question, "is this fine?", has a precise answer. The speed is fine. The agents are fine. What is not fine is an industry that re-priced the production of code to near zero and then assumed the evaluation of code was free too. It is not free. It is among the most expensive cognitive work software engineering has asked of its practitioners, it has a daily ceiling of roughly four to five hours, and the field reports cited here all say the people doing the most of it are burning down their reserves of context and judgment to keep the dashboards green.
The four-hour ceiling is not a bug in your discipline or a gap in your stamina. It is the actual shape of the new job. Teams that redesign their workflows around that shape — review budgets, context rotation, agent-free blocks — get to keep both the supersonic output and the engineers. Teams that keep converting every saved minute into more supervision will discover what air traffic control learned the hard way: you can schedule a human past the limits of sustained attention, but the errors that follow were on the schedule too.