Why we re-fetch the page after a publish call succeeds

Most CMS APIs we work with expose a /publish endpoint that takes a draft and flips it to public. The endpoint returns 200, the body usually contains a published_at timestamp, and the pipeline moves on. For a long time, that was the end of our publishing work.

It turns out that “published” in the CMS sense and “live for a reader” are two different states. The gap between them is where we have spent the most time fixing problems that did not look like our problems.

What the publish call actually does

The publish endpoints we hit are, in most CMS implementations, transactional database writes. The endpoint updates a status column from draft to published, sets a timestamp, and returns. That transaction is what 200 OK confirms.

What it does not confirm:

The post is reachable at its expected URL. If the slug clashes with an existing one, some CMS engines silently append a numeric suffix. The API still returns success, but the URL we are told about is not the URL we asked for.
The HTML body renders without errors. Many CMSs will accept and store malformed markup as long as the create or update payload is syntactically valid JSON. The rendered page may look fine in some places and broken in others.
Cached fragments such as sitemaps, category indexes, and RSS feeds have refreshed. Many deployments rebuild those on a schedule rather than on publish.
The page has propagated through any CDN or edge caching layer in front of the origin.
Media references in the body resolve. If we uploaded an image and the upload was silently rejected, the rendered page shows a broken image even though the API call accepted the reference.

A 200 from the publish endpoint means the database accepted the state change. That is a narrower claim than we used to take it for.
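The first gap, a silently rewritten slug, can be detected without fetching anything. Here is a minimal Python sketch that compares the slug we requested with the last path segment of the URL the CMS returned. It assumes the slug is the final path segment and that collisions are resolved with a `-N` suffix. Both assumptions vary by CMS, and `classify_slug` is a hypothetical helper, not part of any CMS API.

```python
import re
from urllib.parse import urlparse


def classify_slug(requested_slug: str, returned_url: str) -> str:
    """Compare the slug we asked for with the one in the URL the CMS returned.

    Assumes (hypothetically) that the slug is the last path segment and that
    the CMS resolves collisions by appending "-<number>".
    """
    path = urlparse(returned_url).path.rstrip("/")
    actual_slug = path.rsplit("/", 1)[-1]
    if actual_slug == requested_slug:
        return "exact"
    # A numeric suffix on our slug usually means a collision was resolved silently.
    if re.fullmatch(re.escape(requested_slug) + r"-\d+", actual_slug):
        return "suffixed"
    return "different"
```

A `"suffixed"` result does not mean the publish failed. It means the URL we were told about is not the URL we asked for, which is worth recording before anyone links to the post.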

What we check after the publish call returns

After every publish, our pipeline runs a separate verification step against the public URL. The check is short, but it has caught real problems.

The steps are roughly:

Fetch the canonical URL the CMS reported back.
Confirm the response is HTTP 200.
Confirm the rendered HTML contains the title we sent.
Confirm the category link in the rendered page resolves to the category we asked for.
Confirm at least one referenced asset, usually the lead image, returns 200.
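The five steps above can be sketched in Python as a single function. This is a minimal illustration, not our pipeline's code. `verify_published`, the injected `fetch` callable, and the plain substring checks for the title and the category link are all assumptions made to keep the example short. A real check would probably parse the HTML rather than search it as text.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass, field
from html import escape
from typing import Callable, List, Tuple

# A fetcher returns (HTTP status, body text). Injected so the checks can be tested offline.
Fetch = Callable[[str], Tuple[int, str]]


def http_fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Default fetcher using the standard library. Follows redirects."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except urllib.error.URLError:
        return 0, ""  # DNS failure, refused connection, and similar


@dataclass
class VerificationResult:
    failures: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.failures


def verify_published(public_url: str, title: str, category_url: str,
                     lead_image_url: str, fetch: Fetch = http_fetch) -> VerificationResult:
    result = VerificationResult()

    # Steps 1 and 2: fetch the canonical URL the CMS reported and require HTTP 200.
    status, html = fetch(public_url)
    if status != 200:
        result.failures.append(f"public URL returned HTTP {status}")
        return result  # the remaining checks need the rendered page

    # Step 3: the rendered page should contain the title we sent (raw or HTML-escaped).
    if title not in html and escape(title) not in html:
        result.failures.append("the rendered page does not contain the title")

    # Step 4: the page should link to the expected category, and that link should resolve.
    if f'href="{category_url}"' not in html:
        result.failures.append("the rendered page does not link to the expected category")
    elif fetch(category_url)[0] != 200:
        result.failures.append("the category link does not resolve")

    # Step 5: at least one referenced asset, here the lead image, should return 200.
    if fetch(lead_image_url)[0] != 200:
        result.failures.append("the lead image does not return HTTP 200")

    return result
```

Keeping `fetch` injectable lets the same checks run against a fake site in tests, while the default `http_fetch` requests the public URL from outside the CMS, the way a reader would.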

Each check covers a different part of the publish flow. The fetch confirms the URL is reachable, which is a different question from whether the database row is public. The title check confirms the rendered page contains the content we wrote, not an empty shell or a stale cached version. The category check confirms that the category lookup we did during draft creation actually pointed at a category the CMS recognises. The asset check catches media that was rejected silently while the reference to it was accepted.

When any check fails, the pipeline does not roll back. It records the failure, leaves the post in place, and surfaces a clear message: “Published, but the rendered page does not contain the title.” The post is technically live. The verification is what makes that statement honest, or raises the alarm when it is not.

Why the verification is its own step

We considered baking verification into the publish step itself. Two things pushed us away from that.

The publish API does not own the rendering. The same CMS often serves the rendered page from a separate process, sometimes from a cache, sometimes from a static export job that runs minutes later. Putting verification inside the publish call would require the publish endpoint to fetch its own rendered output, which is not what those endpoints are designed to do.

Treating verification as a separate step also lets us fail without losing the draft state. If we coupled verification to publish and the verification failed, we would have to decide whether to roll back the publish, returning the post to draft, or leave it in a half-finished state. Neither option is satisfying. Keeping the steps separate gives us three honest states: draft, published-but-unverified, and published-and-verified. The pipeline reports each one differently, so whoever reads the report can tell at a glance which kind of completion they are looking at.
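One way to make the three states explicit is a small enum plus a function that derives the state from two facts: whether the publish call succeeded, and what verification found. The names below are hypothetical; the summary strings follow the wording used earlier in this post.

```python
from enum import Enum
from typing import Optional, Sequence


class PublishState(Enum):
    DRAFT = "draft"
    PUBLISHED_UNVERIFIED = "published-but-unverified"
    PUBLISHED_VERIFIED = "published-and-verified"


def classify(published: bool, failures: Optional[Sequence[str]]) -> PublishState:
    """Derive the state. `failures` is None if verification has not run yet,
    and an empty sequence if every check passed."""
    if not published:
        return PublishState.DRAFT
    if failures is None or len(failures) > 0:
        return PublishState.PUBLISHED_UNVERIFIED
    return PublishState.PUBLISHED_VERIFIED


def summarise(state: PublishState, failures: Sequence[str] = ()) -> str:
    """Human-readable report line. There is deliberately no rollback path:
    a failed verification changes the message, not the post."""
    if state is PublishState.DRAFT:
        return "Draft, not published."
    if state is PublishState.PUBLISHED_VERIFIED:
        return "Published and verified."
    if failures:
        return "Published, but " + "; ".join(failures) + "."
    return "Published, verification not yet run."
```

Distinguishing "verification not yet run" from "verification failed" inside the unverified state is our own addition; it keeps a crashed run from looking like a clean one.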

A failed verification almost always means one of three things. The slug collided and the CMS returned a different URL than we expected. The HTML body had a problem the CMS did not reject. Or a cache or sitemap is stale, and a retry a few minutes later succeeds. We do not try to make the verification step heal the post automatically. Automatic recovery on failed verification would produce silent rewrites of public content, and that is the kind of behaviour we want to avoid in a system where multiple agents share a publishing surface.
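Because a stale cache or sitemap usually clears on its own, the only automatic response we would sketch is to re-run the read-only checks after a delay. The function below is an illustration under assumed values: the attempt count and the two-minute default delay are placeholders, and `check` stands for any callable that returns a list of failure messages. It never writes to the CMS; if the failures persist, they are returned for a person to act on.

```python
import time
from typing import Callable, List


def verify_with_retries(check: Callable[[], List[str]], attempts: int = 3,
                        delay_seconds: float = 120.0,
                        sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Re-run a read-only verification until it passes or attempts run out.

    Returns the failures from the last attempt (an empty list means it passed).
    Nothing here modifies the published post.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    failures: List[str] = []
    for attempt in range(attempts):
        failures = check()
        if not failures:
            return []
        if attempt < attempts - 1:
            sleep(delay_seconds)  # give caches, sitemaps, and export jobs time to catch up
    return failures
```

Retries only help with the third cause. A slug collision or a malformed body will fail the same way on every attempt, so those failures should be reported rather than retried.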

What we record afterwards

For every successful publish, we keep a small record: the source URL, the draft ID, the published ID, the public URL, the verification result, and the run that produced it. The record is the audit trail we use to answer questions like “did the translated version of this article actually go live, and where” months later, after the relevant logs have rotated out.
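A minimal shape for that record, assuming one JSON object per line in an append-only log file. The field names mirror the list above; the JSON Lines storage and the helper names are our illustration, not a description of how the pipeline actually stores it.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PublishRecord:
    source_url: str
    draft_id: str
    published_id: str
    public_url: str
    verification: str  # e.g. "published-and-verified" or "published-but-unverified"
    run_id: str        # identifies the pipeline run that produced this publish


def to_json_line(record: PublishRecord) -> str:
    return json.dumps(asdict(record), sort_keys=True)


def from_json_line(line: str) -> PublishRecord:
    return PublishRecord(**json.loads(line))


def append_record(path: str, record: PublishRecord) -> None:
    """Append-only, so earlier records survive long after run logs rotate out."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_json_line(record) + "\n")
```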

The record is also what makes a republish safe. If a future run is asked to publish the same source again, it can look up the existing post and decide whether this is an update, a retraction, or a duplicate it should refuse. Without the record, the pipeline has no memory of itself, and every run is a fresh attempt at first contact with the CMS.
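The republish decision can be sketched as a lookup against those records. This sketch assumes something the record described above does not include: a hash of the content that was last published, which is what lets the pipeline tell an update from a duplicate. It also assumes retraction is requested explicitly by the caller. `Action` and `decide` are hypothetical names.

```python
import hashlib
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    CREATE = "create"    # no record: first contact with the CMS for this source
    UPDATE = "update"    # record exists and the content has changed
    RETRACT = "retract"  # caller asked to take the existing post down
    REFUSE = "refuse"    # duplicate, or a retraction of something never published


def content_hash(body: str) -> str:
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def decide(source_url: str, body: str, retract: bool,
           published_hashes: Dict[str, str]) -> Action:
    """`published_hashes` maps source URL -> hash of the last published body
    (a hypothetical extension of the publish record)."""
    previous: Optional[str] = published_hashes.get(source_url)
    if previous is None:
        return Action.REFUSE if retract else Action.CREATE
    if retract:
        return Action.RETRACT
    if previous == content_hash(body):
        return Action.REFUSE
    return Action.UPDATE
```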

Publishing is a more nuanced problem than we first treated it as. The 200 OK from a publish endpoint is a useful signal, but it is not the end of the work. The end of the work is a reader being able to load the page, and that is a separate question we have learned to ask out loud.
