Twelve days. Four open-weight model releases. MiniMax M2.7, GLM-5.1, Kimi K2.6, DeepSeek V4. The agentic-coding inference price floor moved roughly two orders of magnitude versus the closed frontier, at functionally near-parity on the benchmarks people quote.
A lot of writeups this week are about what that means for the cost of building. I want to write about what it doesn't mean.
Cost was never the bottleneck
For most of the things I am actually shipping, coherence was.
The work I run looks like this: an org of about twelve agents, each with their own home directory, their own rules, their own memory layer. The top-of-org agents (planning, dispatch, synthesis) run on Opus. Workers that do focused execution run on Sonnet. Cheap ops (parsing, log scraping, log audits, simple lookups) run on Haiku or smaller. Each agent reads and writes its own memory layer. Memory doesn't move between models. Agents do.
What changes if you swap Sonnet for open-weight
The question the open-weight wave is forcing is whether I should swap Sonnet for an open-weight model at the worker tier. The cost case is real. A worker that runs at 1/100th the inference price unlocks substantially more parallel work for the same compute budget.
The reason I haven't done it yet, and probably won't for the worker tier, is coherence.
Workers in my setup are not independent function calls. They're agents. They read their own memory between turns. They write hand-backs that another agent reads to authorise GREEN/AMBER/RED. They hold a tier 2 authority bound that constrains what they can decide without notifying their parent. They participate in a hand-off protocol where context is structured, not free-form. Drift on any of those protocols costs me more than the inference savings.
What I've found, running this for months, is that drift is the dominant cost. Drift in voice. Drift in hand-back format. Drift in authority interpretation. Drift in what a tier 3 escalation looks like. Most of these don't show up on a coding benchmark. They show up two weeks later when I notice a worker has been silently flipping its own status fields, or summarising hand-backs in the CEO's voice, or skipping a step in the verification protocol that the prompt said to follow.
Open-weight models at the cost floor are improving fast and the gap is narrowing. But the gap that matters for me isn't the coding gap. It's the instruction-following gap, the context-stability gap, the "does this agent stay in role across a 30,000-token session" gap. On those, the frontier is still where the durable wins live.
The real cost-floor moment
I'm not arguing nobody should switch. I'm arguing the switch decision shouldn't be made off the cost-per-token number on the front page of the model release. The right question is: how much do you depend on the model staying in role, in protocol, in voice, in authority bound, across multi-turn work that lives behind a hand-off?
If the answer is "a lot," the cost floor isn't your bottleneck. Coherence is. And coherence is bought at the top of the stack.
If the answer is "not much," go and run the cheap thing. You'll be fine.
The real cost-floor moment, when it lands, will be when an open-weight model holds protocol across a long session at frontier-tier reliability. That's the unlock. The price gap is the easy part. The protocol gap is what the open-weight wave still has to close.
Until then, my tier policy doesn't change. Cheaper at the bottom for cheap ops. Frontier at the top for the work that has to remember who it is.