For two years, the productivity narrative around AI coding tools was relentlessly positive. Ship faster. Close more tickets. Merge more PRs. The benchmarks were real: developers using AI assistants genuinely produced more lines of code per day. But a quieter story was accumulating underneath those commit counts. The code being generated at speed was simultaneously building a maintenance liability that most engineering teams had no plan to repay. By mid-2026, that bill has arrived, in the form of mysterious regressions, sprawling duplication, and codebases that nobody fully understands anymore. What AI borrowed from tomorrow's maintainability to pay today's velocity is coming due.
The Number Nobody Wanted to Publish
GitClear analyzed 211 million lines of code written between 2020 and 2024 to produce their "AI Copilot Code Quality 2025" report, one of the most rigorous longitudinal studies of what AI-assisted development actually produces at scale. The headline finding lands like cold water: code blocks of five or more lines that duplicate adjacent code increased eight-fold in 2024 alone, reaching a prevalence roughly ten times higher than just two years prior.
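GitClear's exact detection method isn't reproduced here. As a rough illustration of what counting "duplicated blocks of five or more lines" involves, the sketch below hashes every window of five consecutive non-blank lines, after stripping indentation, and reports windows that occur more than once. The names `duplicated_blocks` and `normalize` and the normalization rules are assumptions made for illustration. The sketch also does not model the "adjacent code" aspect of the finding.

```python
from collections import defaultdict

WINDOW = 5  # five-line minimum, matching the threshold in the GitClear finding

def normalize(line: str) -> str:
    # Strip indentation and trailing whitespace so re-indented copies still match.
    return line.strip()

def duplicated_blocks(lines: list[str], window: int = WINDOW) -> dict[tuple[str, ...], list[int]]:
    """Return each repeated run of `window` consecutive non-blank lines,
    mapped to the 0-based line indices where that run starts."""
    normalized = [normalize(line) for line in lines]
    starts_by_block: dict[tuple[str, ...], list[int]] = defaultdict(list)
    for start in range(len(normalized) - window + 1):
        block = tuple(normalized[start:start + window])
        if all(block):  # skip any window that spans a blank line
            starts_by_block[block].append(start)
    # Keep only blocks seen at two or more positions.
    return {block: starts for block, starts in starts_by_block.items() if len(starts) > 1}
```

A duplicated run longer than five lines shows up here as several overlapping windows, so a real tool would merge adjacent hits. GitClear's own counting rules may differ.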
Let that sink in. Not a modest uptick. Not a trend worth monitoring. An eight-fold jump in a single calendar year. This is the statistical signature of a workflow that has normalized copy-paste as a first resort, one where the cognitive cost of writing new code has been reduced to near zero, and with it the friction that once made duplication feel like the wrong choice.
Copy-Paste Overtook Refactoring for the First Time Ever
The duplication numbers are alarming enough on their own. But the GitClear data reveals a second, arguably more telling shift: for the first time in the study's multi-year history, copy-pasted lines of code now outnumber moved and refactored lines. Copy/pasted code rose from 8.3% of all written lines in 2020 to 12.3% in 2024. Over that same window, moved or refactored lines fell from 24.1% to 9.5%.
This crossover matters because refactoring is how codebases stay healthy. When developers move, consolidate, and reshape existing logic, they reduce surface area, enforce single points of truth, and ensure that the next engineer can reason about the system. Copy-pasting does the opposite: it multiplies surface area, forks logic invisibly, and guarantees that a future bug fix will need to be applied in N places instead of one.
What the AI Workflow Is Actually Optimizing For
AI coding assistants are prompt-response systems. They optimize for "give me something that works right now," and they are extraordinarily good at that task. The problem is that "works right now" and "is maintainable over 18 months" are different objectives, and the tool has no stake in the second one.
When a developer asks Copilot or Claude to write a function that parses a specific JSON shape, the model generates correct, runnable code. It does not ask whether an existing utility already handles that shape. It does not suggest refactoring the caller to reuse an adjacent abstraction. It completes the immediate task as specified. The developer accepts the suggestion, moves on, and the duplication accumulates invisibly across hundreds of similar micro-decisions throughout the day.
This isn't a criticism of the models. It's a description of the incentive structure. The developer measures productivity in tickets closed. The tool is judged by its suggestion acceptance rate. Neither metric tracks codebase entropy. And as the GitClear data shows, that unmeasured entropy has been compounding since 2023.
Code Churn: The Canary That's Already Dead
There's another number from the GitClear study that should be on every engineering leader's dashboard. Code churn, defined as lines revised within two weeks of their original commit, grew from 3.1% of all written code in 2020 to 5.7% in 2024. That near-doubling is the signature of code that wasn't right the first time and needed immediate correction.
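The churn definition above is easy to operationalize once you have per-line history. The sketch below is a minimal version. It assumes a hypothetical `LineRecord` that stores when a line was first committed and when it was next changed or deleted. Extracting those records from git history is left out, and GitClear's actual pipeline is not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

CHURN_WINDOW = timedelta(days=14)  # "revised within two weeks of the original commit"

@dataclass
class LineRecord:
    # Hypothetical record: when a line was first committed and, if ever,
    # when it was next modified or deleted.
    authored_at: datetime
    revised_at: Optional[datetime] = None

def churn_rate(lines: list[LineRecord], window: timedelta = CHURN_WINDOW) -> float:
    """Fraction of written lines that were revised within `window` of being authored."""
    if not lines:
        return 0.0
    churned = sum(
        1 for rec in lines
        if rec.revised_at is not None and rec.revised_at - rec.authored_at <= window
    )
    return churned / len(lines)
```

At GitClear's figures, 5.7 / 3.1 is about 1.84, which is the "near-doubling" described above.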
Confidence Without Comprehension
- Confidence without comprehension: AI output looks syntactically correct, so it gets merged, only to reveal semantic errors under real usage.
- Incomplete context in prompts: The model generates code for the stated requirement without knowledge of adjacent constraints, leading to integration failures.
- Reduced review scrutiny: Reviewers treat AI-generated code with the same trust they'd apply to a senior colleague, often more, since it "looks clean."
- Test-after mentality: Speed pressure pushes test writing until after the code is written rather than alongside it, so breakage is discovered late.
Compounding Friction
- Context-switch tax: Returning to recently merged code within days costs more than getting it right initially, because the developer has already discarded the mental model.
- CI/CD pipeline pressure: High-churn codebases generate more build-fail cycles, slowing the entire team's merge cadence.
- Blame graph noise: Dense churn makes bisecting regressions harder. Git blame becomes a thicket of two-day-old "fix" commits.
- Morale erosion: Developers who constantly revisit recent work feel perpetually behind, not faster.
DORA Noticed Too: Delivery Stability Is Falling
The GitClear findings don't stand alone. Google's 2024 DORA (DevOps Research and Assessment) report added an uncomfortable organizational dimension to the same phenomenon: a 25% rise in AI usage across teams correlated with a 7.2% drop in delivery stability. The teams shipping more AI-assisted code were simultaneously experiencing more instability in what they shipped.
DORA's delivery stability measure captures how often deployments fail in ways that degrade service or require remediation, such as a hotfix or rollback. Its core input is change failure rate, one of DORA's four key metrics alongside deployment frequency, lead time for changes, and time to restore service. A 7.2% drop is not catastrophic, but it is directionally wrong, and it inverts the expected payoff from developer productivity tooling. If AI tooling genuinely made engineering teams more capable, stability should have held or improved as output rose. Instead, individual developers reported productivity gains while delivery stability fell, and the same report estimated a small (1.5%) decline in delivery throughput as well.
"The data suggests that AI tools are helping teams move faster through the development phase while simultaneously introducing quality patterns that make the operational phase more brittle. Speed and stability are diverging when the theory says they should converge."
This divergence makes intuitive sense once you understand the mechanism. AI tools help developers write and review code faster, which compresses the time from idea to merge. But the debt accumulated through that process (duplication, shallow tests, underdocumented edge cases) doesn't vanish at merge time. It surfaces in production, in the on-call queue, in the incident retros. The velocity gain is real. The stability cost is also real. DORA 2024 captured both.
For broader context on why AI-generated systems struggle to maintain reliability at scale, our analysis of why AI agents never reach production reliability traces the same pattern across autonomous coding pipelines.
"AI Slop": The Industry's Shorthand for a Real Problem
By late 2025, engineering communities had settled on a term for the output of AI tools operating without sufficient human review: "AI slop." It's an unkind label, but it captures something precise. Slop isn't wrong in the way that a typo is wrong. It compiles, it runs, it might even pass tests. Slop is wrong in the way that a structurally unsound building is wrong: the defect isn't visible until load is applied.
The slop problem is inseparable from the speed incentive. When the goal is to close more tickets per sprint, the question "should I refactor this before adding my feature?" loses to "can I ship this by Friday?" AI tools make the second path trivially easy. Scaffold the feature, accept the suggestions, run the tests, open the PR. The fact that you've added a third nearly-identical implementation of a data transformation that already exists in the codebase doesn't register until some future engineer stares at all three wondering which one to trust.
We covered the developer behavior side of this pattern in detail, including how AI tools are making developers lazy in ways that compound over time. The GitClear numbers are the aggregate statistical evidence of those individual daily choices.
The Compounding Mechanism: Why 2026 Is When It Hurts
Technical debt has a compounding structure. A small amount of duplication in sprint one is manageable. The team knows the codebase well enough to reason around it. The same duplication in sprint fifty exists in a codebase that has accumulated forty-nine more sprints of similar decisions. The ratio of "code I understand" to "code that exists" has shifted. The mental overhead of any change increases. Refactoring becomes risky because the blast radius of touching shared logic is no longer obvious.
Teams that adopted AI tools in late 2023 and 2024 (when the "10x faster" narrative was at peak volume) have now been running those workflows for 18 to 30 months. Their codebases have absorbed 18 to 30 months of accelerated duplication, reduced refactoring, and elevated churn. The bill doesn't arrive as a single catastrophic event. It arrives as a gradual thickening of every engineering interaction: PRs take longer to review, incidents take longer to diagnose, new features require more archaeology. The sprint velocity that AI tools unlocked starts to erode, not because AI stops working, but because the foundation it built on is unstable.
Warning Signals
- PR reviews take significantly longer than they did 18 months ago for changes of similar scope
- Incident post-mortems keep citing "didn't know that code existed" as a contributing factor
- Grep searches return 3–6 near-identical function implementations for the same operation
- New engineers take longer to become productive than they did two years ago
- Test suite passes locally and fails in staging for environmental reasons that aren't well understood
- Hotfix commits appear frequently in the git log against code that is less than a month old
Sustainable Patterns
- Generated code is treated as a draft, reviewed for duplication against the existing codebase before merge
- A refactor ticket is opened whenever AI output reveals an abstraction that should be consolidated
- Tests are written alongside or before AI generation, not retroactively
- DORA metrics (especially change failure rate) are tracked per-sprint and reviewed in retros
- Code owners receive AI-generated changes with the same scrutiny they'd apply to unfamiliar third-party code
- The team has a documented definition of "ready to merge" that AI output must pass, not just CI
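Tracking change failure rate per sprint, as the list above suggests, needs little more than a deployment log. This is a minimal sketch. It assumes a hypothetical `Deployment` record that is already labeled with its sprint and with whether it caused a failure needing remediation. In practice that label would come from your CD system or incident tracker.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Deployment:
    # Hypothetical deployment record.
    sprint: str
    caused_failure: bool  # True if the deploy degraded service and needed a hotfix, rollback, or patch

def change_failure_rate_by_sprint(deploys: list[Deployment]) -> dict[str, float]:
    """Share of deployments in each sprint that caused a failure requiring remediation."""
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for d in deploys:
        totals[d.sprint] += 1
        if d.caused_failure:
            failures[d.sprint] += 1
    return {sprint: failures[sprint] / totals[sprint] for sprint in totals}
```

Reviewing this number in retros next to throughput makes the speed and stability divergence visible at the team level instead of only in an annual report.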
The Language Ecosystem Isn't Immune
One pattern worth noting: the debt accumulation isn't uniformly distributed across stacks. The GitClear data doesn't break results out by language, but anecdotal engineering evidence and code review patterns suggest that dynamically typed languages and those with large LLM training corpora (JavaScript, Python, TypeScript) see the highest rates of AI-generated duplication. This is partly because the models are most confident (and most prolific) in those languages, and partly because type checking, which could catch some inconsistent variants, is either absent or ignored at generation time.
TypeScript's rise in the AI era is plausibly related to this dynamic. As we tracked in our piece on TypeScript dethroning Python on GitHub in the AI coding era, typed codebases provide structural guardrails that surface some classes of AI-generated inconsistency at compile time rather than runtime. That's a meaningful mitigation, but it doesn't touch the semantic duplication problem, where two functions share a type signature and do the same thing but are written differently in separate files.
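Here is a hypothetical Python illustration of that semantic-duplication case. Two functions share the same type-annotated signature and would sit in different modules (the module names in the comments are invented). A type checker sees nothing wrong with either. The example also shows why "which one to trust" matters, because the two functions quietly disagree on some inputs.

```python
def total_cents(prices: list[float]) -> int:
    # Variant A (imagined in billing/totals.py): rounds each price to the nearest cent.
    return sum(round(p * 100) for p in prices)

def order_total_cents(prices: list[float]) -> int:
    # Variant B (imagined in orders/summary.py): same signature, different body.
    cents = 0
    for price in prices:
        # int() truncates: 0.29 * 100 evaluates to 28.999999999999996, so this adds 28.
        cents += int(price * 100)
    return cents
```

Tests that only use round-dollar or half-dollar amounts pass for both variants. The divergence surfaces only on inputs like 0.29.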
The Refactor-First Discipline: What Teams Are Getting Right
The teams navigating this best aren't the ones who stopped using AI tools. They're the ones who restructured their engineering culture around the assumption that AI output is a first draft, not a finished artifact. That reframing changes downstream behavior significantly.
When a developer treats AI output as a draft, they read it the way a staff engineer reads a junior's PR: looking for correctness, but also for opportunities to simplify, consolidate, and harden. They ask: "Does something that already does this exist? Should this be a shared utility? What happens to this code in six months when the shape of the data changes?" These are questions the AI cannot ask because it has no context for the codebase's future, only a snapshot of its present.
The Consolidation Sprint: Paying Down What AI Generated
For teams already sitting on 18+ months of AI-accelerated accumulation, the path forward requires a deliberate repayment strategy. This isn't about blaming the tools. It's about recognizing that the productivity gain was partly a loan, and loans mature.
The highest-leverage consolidation work follows a predictable pattern: identify the domains with the most duplication (typically data transformation, API client code, and UI utilities); map the variants; extract a canonical implementation; and migrate call sites. This work is unsexy, it doesn't close user-facing tickets, and it's hard to justify to product managers who want features. But the alternative is a codebase whose maintenance cost grows faster than the team's capacity to absorb it, which is the trajectory the GitClear data describes.
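The map, extract, and migrate steps can be sketched on a toy case. Everything in this example is hypothetical: the three email-normalizing variants, the canonical `normalize_email`, and the helpers `disagreements` and `deprecated_alias`. The idea is to diff each variant against the chosen canonical version over sample inputs before migrating, then keep the old names alive as thin wrappers that delegate and warn, so call sites can move over incrementally.

```python
import warnings
from typing import Callable

# Three variants found in different modules during the "map the variants" step.
def clean_email(s: str) -> str:
    return s.strip().lower()

def email_lower(s: str) -> str:
    return s.lower()  # never strips surrounding whitespace

def fmt_email(s: str) -> str:
    return s.lower().strip()

def normalize_email(s: str) -> str:
    """Canonical implementation chosen after comparing the variants."""
    return s.strip().lower()

def disagreements(variants: dict[str, Callable[[str], str]],
                  canonical: Callable[[str], str],
                  samples: list[str]) -> dict[str, list[str]]:
    """For each variant, list the sample inputs where it differs from the canonical version."""
    return {name: [s for s in samples if fn(s) != canonical(s)] for name, fn in variants.items()}

def deprecated_alias(old_name: str, target: Callable[[str], str]) -> Callable[[str], str]:
    """Expose the canonical function under an old name so call sites can migrate gradually."""
    def wrapper(s: str) -> str:
        warnings.warn(f"{old_name} is deprecated; use {target.__name__}",
                      DeprecationWarning, stacklevel=2)
        return target(s)
    wrapper.__name__ = old_name
    return wrapper
```

Any non-empty entry in the `disagreements` output marks call sites whose behavior will change after migration. Those are the places to review by hand before swapping in the alias.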
The good news is that AI tools can help with the consolidation work too, if pointed at the right problem. "Here are three functions that do similar things. Write me a unified implementation and a migration plan" is a task current models handle well. The discipline is in recognizing when that's the right prompt, rather than "write me another function that does X."
What the Responsible AI Engineering Stance Looks Like
The data from GitClear and DORA together describe a specific failure mode: organizations that adopted AI tools as pure accelerators without updating their engineering culture around the changed risk profile. The tools are not the problem. The absent review layer is the problem.
Responsible AI-assisted engineering treats generated code as external input, the same way a thoughtful team treats a third-party library or a contractor's contribution. You verify it. You ask whether it duplicates something you already own. You test it at the edges. You hold it to the same structural standards as hand-written code, even though it arrived faster. The speed benefit accrues from faster first-draft generation, not from skipping the engineering judgment that turns drafts into durable software.
Conclusion: Velocity Is Not Free
The "10x faster" narrative was never wrong. It was incomplete. AI coding tools do accelerate individual task completion in measurable ways. What the narrative omitted was the maintenance coefficient: code that is generated without consolidation discipline accumulates debt faster than code written with the friction of deliberate authorship. GitClear quantified this with 211 million lines of evidence. DORA corroborated it from the delivery stability direction. The story the data tells is consistent: AI speed is real, and so is the compound interest on the shortcuts it enables.
The teams that will win in the next 18 months are not the ones with the most aggressive AI adoption. They're the ones who adopted AI tools and built the review culture to match, treating generated code as a draft, running consolidation sprints before debt becomes load-bearing, and tracking DORA stability alongside throughput metrics. The bill for 2023–2025's AI-accelerated sprints is arriving now. How engineering organizations respond will define the codebases they're maintaining in 2028.