The Model Context Protocol won. There is no longer a serious argument about which integration standard the agent ecosystem rallied behind — MCP crossed 97 million installs and moved under neutral governance, and the protocol war is, by any reasonable measure, over. But underneath that settled standard a quieter counter-trend has been building in production engineering teams: more and more of them are ripping MCP out of their hot paths and going back to plain API calls and command-line tools. Not because MCP failed. Because winning the standard and being the right runtime for every job are two different things, and the token bill makes the difference impossible to ignore.
MCP Won the Standard. That Was Never the Whole Question.
Let's be precise about what MCP won, because the nuance is the whole article. MCP won the question of "how should a tool describe itself to an agent so that any agent can discover and call it?" That is a real and valuable problem to solve, and the ecosystem genuinely needed one answer instead of forty. We made that case in full in our analysis of how MCP hit 97 million installs and ended the protocol war. Nothing here walks that back. A settled discovery and interoperability standard is a structural good for the entire industry.
What MCP did not win — because it was never even competing for it — is the question of "how should this specific, already-known tool be invoked on the path that runs ten thousand times a day?" That is a runtime efficiency question, and it has a different answer depending on context. Conflating the two is the mistake that gets teams a surprising token bill, and untangling them is what production engineers have started doing deliberately.
"MCP winning the standard does not mean MCP is the right runtime for every call. The standard is about discovery. The hot path is about cost. Those are different problems with different answers."
The Hidden Tax: Schema Loading on Every Run
Here is the mechanism that drives the counter-trend. When an agent uses MCP, the available tools are presented to the model as tool definitions — names, descriptions, JSON schemas for every parameter, and usually some natural-language guidance on when to use each one. The model has to see those definitions to call the tools correctly. That means the tool schemas occupy space in the context window, and in most setups they are loaded at the start of every run.
For an interactive agent with a broad, shifting tool set, that cost is justified: the model genuinely needs to know what is available because the available set changes from session to session. But for a production pipeline that calls the same three tools every single time, you are paying to reload schemas the model effectively already "knows" the shape of — schemas that never change between runs. A connected MCP server can easily contribute thousands of tokens of tool definitions before the agent has done any work at all.
Multiply that by call volume and the picture sharpens. A pipeline running 10,000 agent invocations a day, each carrying a few thousand tokens of static tool schema it does not need re-described, is burning input tokens on pure overhead — context that does no reasoning, answers no question, and changes nothing between runs. On a fixed, known tool set, that overhead is not buying you anything. It is the per-call equivalent of reciting the entire API reference manual at the top of every function call. This is the same structural pressure we traced in our report on how token pricing is quietly breaking enterprise AI coding budgets — when you pay per token, anything you reload by default becomes a recurring line item.
What "Going Back to Plain API Calls" Actually Means
The phrase makes it sound like a regression, and that framing is worth resisting. "Going back" here does not mean abandoning agents or ripping out structure. It means that for the hot path, teams hard-wire the tool contract into the agent harness itself rather than discovering it at runtime through a protocol layer. The model is told, directly and statically, "you can call `create_invoice(customer_id, amount)` and `send_email(to, subject, body)`" — and those definitions live in the prompt template or the harness code, compiled once, not negotiated over a protocol on every run.
The CLI variant is even leaner. Instead of wrapping a tool in an MCP server, the agent is given a shell and a small set of command-line tools it already understands. A coding agent that can run git, grep, and curl directly does not need an MCP server in front of any of them — the CLI is the interface, the model has seen millions of examples of these commands in training, and the "schema" is a one-line usage string instead of a JSON document. For a large class of file-system and developer-tooling tasks, a CLI is both cheaper and more flexible than a protocol wrapper.
When MCP Is the Right Call — and When It Isn't
The decision is not ideological. It is a function of two variables: how fixed your tool set is, and how often the path runs. High variability and low volume favor MCP — the discovery cost is small relative to the flexibility you gain. Low variability and high volume favor direct integration — the flexibility is wasted and the per-run cost compounds. Most of the confusion in the discourse comes from people arguing as if there is one answer, when the honest answer is "it depends, and here is exactly on what."
Discovery and breadth
- • Interactive agents with an open-ended tool surface
- • Connecting to third-party servers you don't control
- • Tool set that changes per session or per user
- • Prototyping, where flexibility beats efficiency
- • Low call volume where overhead barely registers
The repeatable hot path
- • Fixed, known set of tools that never changes
- • High-volume production pipelines
- • Latency-sensitive paths where overhead hurts
- • Tools the model already knows (git, curl, SQL, shell)
- • Cost-controlled batch jobs measured per-token
Notice that nothing in the left column is "wrong" and nothing in the right column is "better." They are different operating points. A team that runs an interactive internal assistant on top of forty MCP servers is making a sound choice. The same team running a nightly batch job that touches three of those tools ten thousand times would be making an unsound choice to route that job through the full MCP surface. The maturity signal is using both, deliberately, for the workloads each one fits.
The Decision Table
Strip away the philosophy and the choice reduces to a short lookup. Ask three questions about the path in front of you, and the runtime falls out almost mechanically.
Two yeses on questions one and two is the clearest signal you have found a hot path worth compiling down. One no on question one and you almost certainly want to keep the discovery layer. The table is intentionally boring — that is the point. The teams getting this right are not running a debate; they are running a checklist per integration.
The Mixed Architecture: MCP for Discovery, Direct for the Hot Path
The most sophisticated pattern emerging in 2026 is not "MCP" or "no MCP." It is a layered architecture where MCP handles the exploratory, interactive, discovery-heavy work, and a compiled set of direct calls handles the repeatable production path. The interactive agent discovers a useful tool through MCP. The team observes that this particular tool gets called constantly on a fixed path. They then "graduate" that tool out of the discovery layer and bake it into the harness as a direct call. MCP remains connected for everything that genuinely varies; the hot path runs lean.
This mirrors a pattern that is decades old in systems engineering: you profile, you find the hot loop, and you optimize the hot loop specifically while leaving the flexible-but-slow general path in place for everything else. Nobody optimizes the cold path. The mistake is not using a flexible protocol — it is using the flexible protocol on the one loop where flexibility buys nothing and costs on every iteration. The same orchestration discipline that defines the new high-leverage engineer, which we covered in our piece on agent orchestration as the new 10x engineer, applies here: knowing which path deserves which runtime is the leverage.
"Discover with MCP. Profile the path. Graduate the hot tools to direct calls. Leave the protocol in place for everything that still varies. That is not a rejection of MCP — it is what mature use of MCP looks like."
Why This Looks Like a Reversal but Isn't
When an observer sees a team "removing MCP," it is easy to read it as a vote against the protocol — proof that the standard was hype. That reading is wrong, and it matters that it is wrong. The teams stripping MCP from their hot paths are, in almost every case, still running MCP elsewhere in their stack. They have not decided MCP is bad. They have decided that a discovery protocol belongs on discovery-shaped problems, and that their production batch path is not a discovery-shaped problem.
The deeper truth is that MCP winning the standard is exactly what makes this optimization safe. Because the protocol is settled and stable, a team can prototype against MCP, learn which tools matter, and confidently compile those down — knowing the underlying tool contracts are well-defined and won't shift under them. A settled standard is the foundation that makes the hot-path optimization low-risk. The counter-trend is not in tension with MCP's victory. It is downstream of it.
MCP failed / was overhyped
- • Teams remove MCP, so the standard is bad
- • Direct calls prove protocols are unnecessary
- • The 97M installs were a bubble
- • Pick a side: MCP or no MCP
MCP won; runtime is a separate choice
- • Discovery standard settled — that is the win
- • Hot paths get optimized; that's normal engineering
- • A stable standard makes compiling-down safe
- • Use both, matched to workload shape
A Practical Audit for Your Own Stack
If you run agents in production, the action item is concrete. Instrument your token usage and attribute it. Find out how many input tokens per run are going to tool schemas versus actual task context. If a meaningful fraction of every run is static tool definitions for a tool set that never changes, you have found a hot path that is paying the discovery tax for no discovery benefit. That is your candidate to compile down.
Then resist the urge to over-correct. Do not rip MCP out wholesale to chase token savings on paths that run a dozen times a day — you will trade real flexibility for trivial savings and make your system more brittle. The whole point is precision: optimize the loop that runs ten thousand times, leave the rest on the flexible standard, and re-profile as your usage patterns shift. The right architecture is not a position. It is a measurement.
Conclusion: One Standard, Multiple Runtimes
MCP won the integration-standard war, and that victory is good for everyone — a settled, neutral, widely-adopted discovery protocol is exactly what the agent ecosystem needed. But a standard winning does not make it the right tool for every job, and the production teams quietly stripping MCP from their hot paths are not contradicting that victory. They are demonstrating what it means to use a mature standard maturely: MCP for discovery and breadth, direct API calls and CLIs for the fixed, high-volume, cost-sensitive path. The reversal is not a reversal. It is the normal arc of any winning abstraction — first you adopt it everywhere, then you learn where it earns its keep and where it doesn't, and you optimize accordingly.
The teams that win in 2026 are the ones that hold both ideas at once: the standard is settled, and the runtime is still a choice. Anyone telling you it's MCP-or-nothing in either direction is selling a simpler story than the one production demands.
Tags
Share
Building something like this? See how we ship it or start a project.