What DeepSeek V4 changes about the frontier

Article Writer · Marketing
April 28, 2026 · 6 min read

DeepSeek released the preview of V4 on April 24, almost exactly one year after R1 took roughly a trillion dollars off the US AI trade in a single day. The release this time is quieter on the markets and louder among people who actually run inference. The shape of the news has three parts that only matter together: open weights at the frontier, a million-token context window, and a model that runs on Huawei silicon instead of Nvidia. None of those is new on its own. The combination is.

The numbers DeepSeek published frame the moment. V4-Pro is 1.6T total parameters with 49B active. V4-Flash is 284B total with 13B active. Both run dual Thinking and Non-Thinking modes and both reach 1M tokens of context. List pricing is $1.74 per million input tokens and $3.48 per million output tokens for V4-Pro, which is close to an order of magnitude under the listed price of GPT-5.5 and Claude Opus 4.7. The launch reads as a coordinated bet, not a benchmark drop.
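To make those numbers concrete, here is a minimal sketch of the arithmetic, using only the list prices and parameter counts quoted above. The workload (a full 1M-token prompt with a 4,000-token answer) is a hypothetical example, and the function names are ours. Any discounts or tiering are outside what the figures above cover, so this is list price only.

```python
# List prices quoted above for V4-Pro, in USD per million tokens.
V4_PRO_INPUT_PER_M = 1.74
V4_PRO_OUTPUT_PER_M = 3.48


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD of one V4-Pro request."""
    return (
        input_tokens / 1_000_000 * V4_PRO_INPUT_PER_M
        + output_tokens / 1_000_000 * V4_PRO_OUTPUT_PER_M
    )


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of parameters active per token, given counts in billions."""
    return active_b / total_b


if __name__ == "__main__":
    # Hypothetical workload: a full 1M-token prompt with a 4,000-token answer.
    print(f"full-context request:  ${request_cost(1_000_000, 4_000):.2f}")  # $1.75
    print(f"V4-Pro active share:   {active_fraction(49, 1600):.1%}")        # 3.1%
    print(f"V4-Flash active share: {active_fraction(13, 284):.1%}")         # 4.6%
```

At list price, filling the entire context window once costs under two dollars, and both models activate only a few percent of their weights per token.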

What “almost frontier” actually means

The benchmark story is more careful than the headline. V4 leads every other open model on world knowledge and, across the full field of open and closed models, trails only Gemini 3.1-Pro. It beats every open model on math, STEM, and coding. On general reasoning, it trails GPT-5.4 and Gemini 3.1-Pro by roughly three to six months, which is meaningfully behind on a curve that bends fast and still meaningfully ahead of every prior-generation Western model. Simon Willison’s hands-on read describes V4 as “almost on the frontier, a fraction of the price,” which is the right framing if you are choosing a model rather than collecting trophies.

The honest interpretation is that V4 is not a leapfrog. It is a compression. The open ecosystem has caught up with the closed frontier of about half a year ago and is now charging an order of magnitude less for it. For the kinds of tasks that do not require the absolute top of the curve, that compression is the entire story. A meaningful share of agent work, document understanding, classification, structured extraction, and routine code generation has been overserved by frontier closed models for a while. V4 lets that work be priced honestly.

The 1M context window is the part that changes the architecture math, not just the cost math. Long-context retrieval at frontier quality has been a closed-model feature with closed-model pricing. V4 brings it into a tier where running an agent over a full repository, a full case file, or a full year of customer conversations stops being a budget conversation. That is a different decision than swapping a chat endpoint.
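The first architectural question is simply whether a corpus fits in one window. A rough sketch of that check follows. The four-characters-per-token ratio is a crude assumption of ours, not V4's tokenizer, and the 8,000-token output reserve is an arbitrary illustrative default. A real check would count tokens with the model's own tokenizer.

```python
CONTEXT_LIMIT_TOKENS = 1_000_000  # V4 context window quoted above
CHARS_PER_TOKEN = 4               # ASSUMPTION: crude heuristic, not V4's tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count / CHARS_PER_TOKEN, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], reserved_output_tokens: int = 8_000) -> bool:
    """True if all documents plus an output reserve fit in one context window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_output_tokens <= CONTEXT_LIMIT_TOKENS


if __name__ == "__main__":
    # Synthetic stand-ins for real files: each is ~100k estimated tokens.
    small_repo = ["x" * 400_000] * 5    # ~500k estimated tokens
    large_repo = ["x" * 400_000] * 12   # ~1.2M estimated tokens
    print(fits_in_context(small_repo))  # True
    print(fits_in_context(large_repo))  # False
```

When the check passes, chunking and retrieval become optional rather than mandatory. When it fails, the older architecture still applies.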

The hardware story is the structural one

The most underread part of the launch is the chip stack. V4-Pro and V4-Flash are trained and served on Huawei Ascend 950PR rather than Nvidia. DeepSeek did not bury that detail. It is the public bet that Chinese frontier AI can be built without US hardware, and the model is the proof.

Two things follow from that. The first is that the export-control thesis, which assumed that gating Nvidia silicon would gate Chinese frontier capability, is now harder to defend with a straight face. V4 is not a 2024 model retrofitted onto domestic chips. It is a 2026 model trained on them, shipped within striking distance of the closed Western frontier. Whether the Ascend 950PR is competitive with H200 or B200 in absolute terms is almost beside the point. It is competitive enough to produce a model that pulls down Western API pricing, which is the only test that mattered.

The second is that the structural Nvidia trade is a different question than it was a year ago. R1’s market-cap event was about whether China could close the algorithmic gap. V4 is about whether China needs Nvidia to do it. The first question reset prices for two weeks. The second one resets the long-term assumption underneath the trade. Markets were oddly calm about this on the day of the launch. The reason most often given was that this outcome was already priced in. That may be true. It also may be an artifact of the launch landing in the same week as Google’s announcement of up to $40B into Anthropic and OpenAI’s $122B Q1 raise. Three pieces of news that point in different directions can cancel out on the screen and still mean what they each separately mean.

What we would actually do with V4

For an agent stack, the V4 release shifts a small number of decisions and leaves most others in place. The cheap, high-volume steps in any agent loop (retrieval over long inputs, classification, summarization, structured extraction) become a different conversation. At V4-Pro pricing, you can afford to throw long context at problems that previously needed aggressive chunking and retrieval. Open weights also mean self-hosting becomes a real option for workloads where data residency or latency matters more than the last few benchmark points. That is a meaningful unlock for European, healthcare, and regulated work that has been carrying closed-model pricing as a tax on geography.

The frontier decision does not change. For the hardest reasoning steps (the planner that picks tools, the verifier that catches the agent doing the wrong thing, the long-horizon synthesis at the end of a chain), GPT-5.5 and Claude Opus 4.7 are still ahead. The right architecture is dispatching, not switching. V4 takes the long tail. The frontier closed models take the head. The savings come from the dispatch logic actually being right about which step is which.
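A minimal sketch of that dispatch, assuming each step in the agent loop is already labeled by kind. The step names mirror the examples in this post. The tier strings are placeholder labels of ours, not real API model identifiers, and a production router would likely need a fallback path when a cheap step turns out to be hard.

```python
from enum import Enum


class StepKind(Enum):
    PLAN = "plan"
    VERIFY = "verify"
    SYNTHESIZE = "synthesize"
    RETRIEVE = "retrieve"
    CLASSIFY = "classify"
    SUMMARIZE = "summarize"
    EXTRACT = "extract"


# Placeholder tier labels (hypothetical), not real API model identifiers.
FRONTIER_TIER = "frontier-closed"
OPEN_TIER = "v4-open"

# The "head": the hardest reasoning steps named above.
HEAD_STEPS = {StepKind.PLAN, StepKind.VERIFY, StepKind.SYNTHESIZE}


def dispatch(step: StepKind) -> str:
    """Route head steps to the frontier tier and everything else to V4."""
    return FRONTIER_TIER if step in HEAD_STEPS else OPEN_TIER


if __name__ == "__main__":
    pipeline = [StepKind.PLAN, StepKind.RETRIEVE, StepKind.EXTRACT, StepKind.VERIFY]
    for step in pipeline:
        print(f"{step.value:>10} -> {dispatch(step)}")
```

The lookup is the easy part. Labeling each step correctly is where the savings are won or lost.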

The piece worth watching for the next two months is whether V4-Pro becomes the indie default the way DeepSeek-R1 briefly did last year. The 1M context, the open weights, and the price together are uniquely well-suited to the workflows where one developer is building something that a team would have been building eighteen months ago. If V4 turns into that default, the pricing of the closed frontier will not be the only thing that gets pulled down. The implicit assumption that the best model in the room is the closed one will get pulled down with it.

The thing the launch quietly settles

The framing of the AI race for the last two years has been “frontier capability versus everything else,” with the unstated assumption that frontier means closed and Western. That framing has been wrong for at least six months and the V4 release is what makes it visibly wrong. The actual frontier now has multiple shapes. Closed and Western at the absolute top. Open and Chinese, on Huawei chips, within striking distance and an order of magnitude cheaper. Closed and American, very cheap and very fast in narrow domains. Local and small enough to run on a laptop, surprisingly competent for a thin slice of work.

What changed on April 24 is not who has the best model. It is that the chart now has more than one axis worth caring about. For anyone making model decisions instead of writing about them, that is a more useful map than the one we had a week ago.