What the Erdős disproof actually settles

On May 20, OpenAI announced that one of its internal general-purpose reasoning models had autonomously disproved Paul Erdős’s planar unit-distance conjecture. The problem had been open since 1946. For most of those eighty years, the best constructions looked roughly like square grids, and most working geometers expected the optimal answer to live in that neighborhood. The model produced an infinite family of counterexamples that beats the grid by a polynomial factor, and it did so by recasting the question in algebraic number theory rather than combinatorial geometry. The proof has been independently verified by external mathematicians and is being submitted to a top journal. Tim Gowers called it “a milestone in AI mathematics,” which is the description we would have written if we were trying to be careful.

The shape of the news is what makes it interesting, not the headline. Three things were true at once, and only the combination matters.

What the conjecture actually was

Erdős’s unit-distance problem asks a question that is easy to state and unreasonably hard to answer. Take n points in the plane. How many pairs of those points can sit at distance exactly one? The trivial upper bound is the total number of pairs, on the order of n squared. The interesting question is the gap between that and what is actually achievable.

For decades the best constructions were grid-based. A square grid of n points, roughly √n on a side and scaled carefully, produces a count of unit-distance pairs roughly equal to n times a slowly growing factor that depends on how many ways an integer can be written as a sum of two squares. The community’s working assumption, encoded in a specific Erdős conjecture, was that the optimal construction was essentially this. The grid was supposed to be tight, up to a small additive term in the exponent that nobody had moved in a long time.
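To make the grid construction concrete, here is a minimal sketch, not taken from the announcement or the proof. It takes an m-by-m integer grid (so n = m * m points), counts how many point pairs sit at each squared distance d, and reports the d with the most pairs. Rescaling the grid by 1/sqrt(d) turns exactly those pairs into unit-distance pairs. The function names are hypothetical, and the brute-force helper is there only as a cross-check.

```python
from collections import Counter
from itertools import combinations, product


def grid_distance_counts(m):
    """Map each squared distance d to the number of point pairs in the
    m x m integer grid that are exactly sqrt(d) apart."""
    counts = Counter()
    # Visit each difference vector (dx, dy) once per unordered pair:
    # either dx > 0, or dx == 0 and dy > 0.
    for dx in range(m):
        for dy in range(-(m - 1), m):
            if dx == 0 and dy <= 0:
                continue
            # A vector (dx, dy) fits in the grid in (m - dx) * (m - |dy|) positions.
            counts[dx * dx + dy * dy] += (m - dx) * (m - abs(dy))
    return counts


def best_grid_distance(m):
    """Return (d, pairs) for the most frequent squared distance (requires m >= 2).
    Ties are broken toward the smaller d."""
    counts = grid_distance_counts(m)
    return max(counts.items(), key=lambda kv: (kv[1], -kv[0]))


def count_pairs_bruteforce(m, d):
    """Slow cross-check: count pairs at squared distance d by direct enumeration."""
    points = list(product(range(m), repeat=2))
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d
    )


if __name__ == "__main__":
    for m in (2, 5, 20):
        d, pairs = best_grid_distance(m)
        print(f"m={m}: n={m * m} points, best d={d}, pairs={pairs}")
```

Already at m = 5 the best choice is d = 5 rather than d = 1 (48 pairs against 40), because 5 can be written as a sum of two squares in more ways. That is the "slowly growing factor" in miniature: larger grids favor distances with many such representations.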

What the model showed is that the assumption is wrong. It produced point sets that beat the grid by a polynomial factor in n. The construction does not look like a grid. It looks like a slice of a higher-dimensional algebraic structure, projected back down to the plane in a way that creates more unit pairs than any grid arrangement can. The proof itself is short enough that math communicators are already walking through the core idea, which is part of why the story has legs.

Why this is a different shape of result

There have been AI-in-math headlines before. FunSearch produced new constructions in extremal combinatorics. AlphaGeometry solved Olympiad problems at the level of a strong human. Various proof-assistant projects have closed individual lemmas in active research programs. Each of those was real, and each was structured: a domain-specific system, often with handwritten scaffolding, pointed at problems with a known shape.

The Erdős result has a different structure on three axes. The model was a general-purpose reasoning model, not a math-specialized one. It had no scaffolding aimed at this specific problem. And it chose the bridge to algebraic number theory on its own, which is the part that working mathematicians keep singling out. Picking the right field of mathematics for a problem is usually the slowest step in a research program. It is the step that takes a PhD and a few years of taste to do well, and it is the step where most of the work happens before any theorem gets written down.

That is what Gowers’s “milestone” framing actually points at. The new fact is not that an AI proved something. The new fact is that an AI picked the right neighborhood of mathematics to prove it in. The proof, once you are in the right neighborhood, is the kind of object a strong graduate student could write up. Getting to the right neighborhood is the part the field would have estimated as hardest, and the part that did not require any human in the loop.

The downstream question for anyone building on these models

We have spent the last year treating reasoning models the way we used to treat compilers. They produce output we check; they do not produce decisions we trust. That framing has been correct for almost every workload we run. A general model proposes; a human, a test suite, or a more careful verifier disposes. The Erdős result is not, on its own, a reason to throw that framing out. One proof is not a pattern, and disproving a conjecture is a narrower kind of work than advancing a research program over a year. We notice both of those things.

The result does move the question, though. The thing that matters for an agent stack is not whether a model can write a proof. It is which planning steps a model can hold without supervision. Choosing the right field of mathematics for an open problem is a high-stakes planning step, made on weak evidence, with most of the value sitting in the decision rather than the execution. That step working unsupervised once is not a guarantee that it works unsupervised in general. It is enough to make us treat that class of step differently when we are deciding what to delegate.

The other piece worth saying out loud is that the proof was externally verified before the announcement landed. That gate is the thing that makes this a result rather than a press release. The model produced output, and a small number of human mathematicians read it carefully, and the output held up. The verification step is the part most likely to be skipped, summarized, or quietly relaxed in future announcements, and it is the part that distinguishes mathematical progress from plausible-sounding text.

What it actually settles

The line between an AI producing plausible mathematics and an AI producing new mathematics that working mathematicians accept was visibly crossed on May 20. That line had been blurred for a while, and the Erdős result is what makes the crossing legible to people who do not read journal submissions.

What it does not settle is whether this is the first of many or a one-off. There is a version of the next twelve months where the same model family produces a second autonomous result on an unrelated problem in a different field, and the shape of mathematical research starts to change in measurable ways. There is another version where this proof sits alone for a year and gets reread as a fortunate alignment between a specific model run and a specific open problem that happened to yield to a specific kind of move. Both versions are consistent with what we know today.

For anyone making decisions about which steps in a pipeline a general reasoning model can hold, the question to watch is not whether the next announcement is bigger. It is whether the next announcement is on a problem that does not look like the Erdős one. A second result, on a different kind of conjecture, in a different field, is the thing that would shift expectations about model capability rather than expectations about this particular proof. Until that arrives, the responsible read is that we have one data point, and that one data point is a real one.
