Why handoffs are documents, not function calls

The translation pipeline we run moves an article through six specialist agents before it lands on the target site. None of them call each other. Each one reads a document off a shared parent task, does its work, writes a document back, and exits. A coordinator watches for the right document to appear and kicks off the next step.

This is slower than a function chain. It is also the only shape we have found that survives a pipeline where any step can fail, restart, or be swapped out without breaking the rest.

The chain shape we rejected

The obvious design is a chain. Fetcher returns the article text and hands it to Categorizer, which passes the labeled result to Translator, and so on. In a single process with shared memory, that works cleanly. In our setting it doesn’t, for three reasons.

Each agent runs in its own execution window. When a function in one window wants to “call” another, we need a real mechanism, not a language feature. That mechanism will be some kind of task or message, and once we have that, the “function call” is already an artifact in our system.

Each agent can fail independently. A chain assumes that if step three throws, the caller catches. In our setup, the caller might not be around anymore. The agent that started step two may have already exited. There is no stack frame waiting for the return value.

Each step’s output is something a human might want to inspect. A categorization decision, a translation draft, a reviewer score. If those values only live inside a chain of function returns, they never become observable. By the time we want to ask “why did this article get this category,” the data is gone.

So we do the opposite. Every step writes its output to the parent task as a named document, and every later step reads what it needs from that task.

What a handoff looks like

A parent task grows documents as the pipeline progresses. fetch-output contains the extracted article text. categorize-output contains the category decision and the reasoning. translate-output contains the translated article. review-output contains a score and a list of concerns. publish-output contains the published URL.
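Here is a minimal sketch of such a parent task, assuming an in-memory store. `ParentTask`, `write`, `read`, and the task id are hypothetical names, not our real task API. The sketch only shows that each named document keeps its revisions and that a later step reads the newest one.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ParentTask:
    """Hypothetical in-memory stand-in for the shared parent task.

    Each named document keeps every revision it has had, newest last.
    """
    task_id: str
    documents: dict[str, list[dict[str, Any]]] = field(default_factory=dict)

    def write(self, name: str, body: dict[str, Any]) -> int:
        """Append a new revision of `name`; return its 1-based revision number."""
        revisions = self.documents.setdefault(name, [])
        revisions.append(body)
        return len(revisions)

    def read(self, name: str) -> Optional[dict[str, Any]]:
        """Return the latest revision of `name`, or None if nothing wrote it yet."""
        revisions = self.documents.get(name)
        return revisions[-1] if revisions else None

    def present(self) -> set[str]:
        """Names of the documents that exist on this task so far."""
        return set(self.documents)


task = ParentTask("article-123")
task.write("fetch-output", {"text": "Original article text"})
task.write("categorize-output", {"category": "science", "reasoning": "Covers a new telescope."})
print(sorted(task.present()))  # ['categorize-output', 'fetch-output']
```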

When the translator wakes up, it does not receive any arguments. It opens the parent task, reads fetch-output, does its work, and writes translate-output. It does not know or care which agent produced fetch-output, or when, or how many times it was tried before success.
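A sketch of a translator shaped this way follows. It assumes the parent task can be modeled as a plain dict of document name to body. `run_translator`, `translate_text`, and `TARGET_LANG` are hypothetical, and the model call is a placeholder. The step takes nothing from a caller except the parent task it was pointed at.

```python
from typing import Any

# Hypothetical model: the parent task is a dict of document name -> document body.
Documents = dict[str, dict[str, Any]]

TARGET_LANG = "es"  # hypothetical; a real run would configure this per target site


def translate_text(text: str, target_lang: str) -> str:
    # Placeholder for the actual model call; here it only tags the text.
    return f"[{target_lang}] {text}"


def run_translator(parent: Documents) -> None:
    """Translator step. Its only input is the parent task it was pointed at.

    It has no idea which agent wrote fetch-output, when, or after how many tries.
    """
    source = parent.get("fetch-output")
    if source is None:
        raise LookupError("fetch-output is not on the parent task yet")
    parent["translate-output"] = {
        "text": translate_text(source["text"], TARGET_LANG),
        "target_lang": TARGET_LANG,
    }


parent: Documents = {"fetch-output": {"text": "Hello"}}
run_translator(parent)
print(parent["translate-output"]["text"])  # [es] Hello
```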

When the reviewer wakes up, it reads the original article and the translation, scores the translation, and writes review-output. The coordinator reads the score and decides what to do next. If the score is high enough, it schedules publishing. If not, it schedules another pass through the polisher, which reads the article, the translation, and the review, then updates the translation in place.
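The review branch could be sketched as follows, assuming documents are plain dicts and the review carries a numeric `score` and a `concerns` list. The threshold value and the function names are hypothetical, since the post only says "high enough". The `polish` function is a placeholder for the real model pass.

```python
from typing import Any

Documents = dict[str, dict[str, Any]]

# Hypothetical cutoff; the post only says "high enough".
PUBLISH_THRESHOLD = 0.8


def next_step_after_review(parent: Documents) -> str:
    """Coordinator decision once review-output is on the parent task."""
    score = parent["review-output"]["score"]
    return "publish" if score >= PUBLISH_THRESHOLD else "polish"


def polish(article: str, translation: str, concerns: list[str]) -> str:
    # Placeholder for the real model call, which would use all three inputs.
    return translation + " (polished)"


def run_polisher(parent: Documents) -> None:
    """Reads the article, translation, and review; overwrites translate-output."""
    article = parent["fetch-output"]["text"]
    translation = parent["translate-output"]
    concerns = parent["review-output"]["concerns"]
    parent["translate-output"] = {
        **translation,
        "text": polish(article, translation["text"], concerns),
    }


parent: Documents = {
    "fetch-output": {"text": "Hello"},
    "translate-output": {"text": "Hola", "target_lang": "es"},
    "review-output": {"score": 0.6, "concerns": ["tone"]},
}
if next_step_after_review(parent) == "polish":
    run_polisher(parent)
print(parent["translate-output"]["text"])  # Hola (polished)
```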

The coordinator does not hold any of this in its own memory. It does not pass anything between steps. It reads what is on the parent task and creates the next subtask based on what it sees.

What this buys us

Restartability is the first thing we noticed. When a step fails mid-pipeline, we do not have to start over. The earlier documents are still there. We create a new subtask for the failed step and point it at the same parent. The agent that picks it up reads exactly what it needs, does its work, and moves on. The steps that already succeeded do not run again.
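To see why a restart is cheap, here is a sketch that folds the coordinator and the agents into one loop for brevity. It assumes each step is identified by the document it writes. `run_pending` and the step functions are hypothetical. The retry rereads the same parent, skips every step whose output already exists, and reruns only the step that failed.

```python
from typing import Any, Callable

Documents = dict[str, dict[str, Any]]
Step = tuple[str, Callable[[Documents], dict[str, Any]]]


def run_pending(parent: Documents, steps: list[Step], runs: list[str]) -> None:
    """Run, in order, each step whose output document is not on the parent yet.

    `runs` records which steps actually executed, so the restart is visible.
    """
    for output_name, step in steps:
        if output_name in parent:
            continue  # earlier work survives on the parent task
        runs.append(output_name)
        parent[output_name] = step(parent)  # nothing is written if step raises


attempts = {"translate": 0}


def flaky_translate(parent: Documents) -> dict[str, Any]:
    # Simulates an execution window dying on the first attempt only.
    attempts["translate"] += 1
    if attempts["translate"] == 1:
        raise RuntimeError("execution window died")
    return {"text": "[es] " + parent["fetch-output"]["text"]}


steps: list[Step] = [
    ("fetch-output", lambda p: {"text": "Hello"}),
    ("categorize-output", lambda p: {"category": "news"}),
    ("translate-output", flaky_translate),
]
parent: Documents = {}
runs: list[str] = []
try:
    run_pending(parent, steps, runs)
except RuntimeError:
    pass  # the failed step left no output behind
run_pending(parent, steps, runs)  # new attempt, same parent
print(runs)
# ['fetch-output', 'categorize-output', 'translate-output', 'translate-output']
```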

Observability follows naturally. Every intermediate artifact is a document with a revision history. When a translation comes out wrong, we can look at the categorization that fed into it, the extraction that fed into that, the scraped HTML the extraction started from. The pipeline’s state is not a diagram we draw after the fact. It is a folder of documents we read.
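One way to make that backward walk mechanical is to have each revision record which input revisions it was built from. This is a sketch under that assumption. The `inputs` field, `DocumentStore`, `trace`, and the `scraped-html` document name are all hypothetical, and the chain mirrors the order described above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Revision:
    body: dict[str, Any]
    # (document name, revision number) pairs this revision was built from.
    inputs: list[tuple[str, int]]


@dataclass
class DocumentStore:
    docs: dict[str, list[Revision]] = field(default_factory=dict)

    def write(self, name: str, body: dict[str, Any],
              inputs: Optional[list[tuple[str, int]]] = None) -> int:
        """Append a revision; return its 1-based revision number."""
        revisions = self.docs.setdefault(name, [])
        revisions.append(Revision(body, list(inputs or [])))
        return len(revisions)

    def trace(self, name: str, rev: int) -> list[tuple[str, int]]:
        """Walk back from (name, rev) through everything that fed into it."""
        chain = [(name, rev)]
        for input_name, input_rev in self.docs[name][rev - 1].inputs:
            chain.extend(self.trace(input_name, input_rev))
        return chain


store = DocumentStore()
html = store.write("scraped-html", {"html": "<p>Hello</p>"})
fetch = store.write("fetch-output", {"text": "Hello"}, [("scraped-html", html)])
cat = store.write("categorize-output", {"category": "news"}, [("fetch-output", fetch)])
tr = store.write("translate-output", {"text": "Hola"}, [("categorize-output", cat)])
print(store.trace("translate-output", tr))
# [('translate-output', 1), ('categorize-output', 1), ('fetch-output', 1), ('scraped-html', 1)]
```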

Swappability is the quieter benefit. The contract between steps is the shape of the document, not the identity of the producer. When we swap the translation model, or try a different reviewer prompt, or add a second categorizer for a class of articles, we do not need to update anyone downstream. The downstream steps read a document. As long as the document has the same shape, they do not notice the change.
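A small sketch of "the contract is the shape" follows, assuming translate-output carries `text` and `target_lang`. The `TranslateOutput` type, both translator functions, and the word-count consumer are hypothetical stand-ins. Two different producers return the same shape, and the downstream function works with either without changes.

```python
from typing import TypedDict


class TranslateOutput(TypedDict):
    # Hypothetical shape of translate-output: the contract downstream relies on.
    text: str
    target_lang: str


def translator_model_a(article: str) -> TranslateOutput:
    return {"text": f"A:{article}", "target_lang": "es"}


def translator_model_b(article: str) -> TranslateOutput:
    # A swapped-in model with different internals but the same output shape.
    return {"text": f"B:{article}", "target_lang": "es"}


def downstream_word_count(doc: TranslateOutput) -> int:
    # Stand-in for a downstream step: it reads fields, never the producer's identity.
    return len(doc["text"].split())


for produce in (translator_model_a, translator_model_b):
    print(downstream_word_count(produce("hello world")))  # 2, for either producer
```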

Parallelism, when we want it, comes for free. If two steps are genuinely independent, we can run them as two subtasks that read the same inputs and write different outputs. We did not plan for this, but it has come in handy.
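Here is a sketch of two independent steps running side by side, assuming threads stand in for separate subtasks. `run_independent` and the `keywords-output` document are hypothetical. Both steps see the same snapshot of inputs, and because their output names differ, writing the results back cannot clobber anything.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Documents = dict[str, dict[str, Any]]


def run_independent(parent: Documents,
                    steps: dict[str, Callable[[Documents], dict[str, Any]]]) -> None:
    """Run steps that read the same inputs and write different outputs, concurrently."""
    if not steps:
        return
    snapshot = dict(parent)  # every step sees the same inputs
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = {name: pool.submit(step, snapshot) for name, step in steps.items()}
    # Leaving the `with` block waits for all steps; write results back afterwards.
    for name, future in futures.items():
        parent[name] = future.result()  # distinct output names, so no overwrites


parent: Documents = {"fetch-output": {"text": "Election results are in"}}
run_independent(parent, {
    "categorize-output": lambda p: {
        "category": "news" if "election" in p["fetch-output"]["text"].lower() else "other"
    },
    "keywords-output": lambda p: {
        "keywords": sorted(set(p["fetch-output"]["text"].lower().split()))
    },
})
print(sorted(parent))  # ['categorize-output', 'fetch-output', 'keywords-output']
```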

What it costs

Latency. Each handoff is an agent cold-starting in a new execution window, reading a task, writing a task, exiting. For a six-step pipeline, that is six wake-ups. A function chain would complete in one.

Serialization. Every output has to be shaped into a document someone else can read. That means defining a format, writing it out, and validating it on the other side. For small outputs this is almost free. For anything substantial, it is a real chunk of work per step.
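As a sketch of that per-step work, the example assumes JSON as the format and a small table of required fields per document. The field table and the function names are hypothetical, since the post does not describe the real formats. The producer validates before writing, and the consumer validates again after reading.

```python
import json
from typing import Any

# Hypothetical field contracts; the post does not spell out the real formats.
REQUIRED_FIELDS = {
    "categorize-output": {"category": str, "reasoning": str},
    "review-output": {"score": (int, float), "concerns": list},
}


def validate(name: str, body: dict[str, Any]) -> None:
    """Raise if `body` lacks a required field or a field has the wrong type."""
    for field_name, expected in REQUIRED_FIELDS.get(name, {}).items():
        if field_name not in body:
            raise ValueError(f"{name}: missing field {field_name!r}")
        if not isinstance(body[field_name], expected):
            raise TypeError(
                f"{name}: field {field_name!r} has type {type(body[field_name]).__name__}"
            )


def serialize(name: str, body: dict[str, Any]) -> str:
    validate(name, body)  # the producer checks before writing
    return json.dumps(body, ensure_ascii=False, sort_keys=True)


def deserialize(name: str, raw: str) -> dict[str, Any]:
    body = json.loads(raw)
    validate(name, body)  # the consumer checks again after reading
    return body


raw = serialize("review-output", {"score": 0.72, "concerns": ["idiom in para 3"]})
print(raw)  # {"concerns": ["idiom in para 3"], "score": 0.72}
```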

Contract drift. When a producer changes what it writes, nothing complains until a consumer tries to read the new shape and breaks. We have been bitten by this. The fix was to version the document shapes and have consumers check the version before reading. It added ceremony, but the alternative was pipelines failing in the middle of the night with a type error no one could trace back to a schema change.
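A minimal sketch of the version check follows, assuming each document carries a `schema_version` field. That field name, the supported set, and `read_review` are hypothetical. The consumer fails at the boundary with an error that names the schema change, instead of hitting a type error somewhere deeper in the step.

```python
from typing import Any

# Hypothetical: the review-output shape versions this consumer knows how to read.
SUPPORTED_REVIEW_VERSIONS = {2}


class SchemaVersionError(Exception):
    """Raised when a document's shape version is one this consumer cannot read."""


def read_review(doc: dict[str, Any]) -> float:
    version = doc.get("schema_version")
    if version not in SUPPORTED_REVIEW_VERSIONS:
        # Fail loudly, naming the document and the version mismatch.
        raise SchemaVersionError(
            f"review-output schema_version={version!r}, "
            f"expected one of {sorted(SUPPORTED_REVIEW_VERSIONS)}"
        )
    return doc["score"]


print(read_review({"schema_version": 2, "score": 0.9, "concerns": []}))  # 0.9
```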

The shape of the coordinator

What we ended up with is a coordinator that does very little. It does not transform data. It does not judge quality. It does not hold state between heartbeats. On every wake-up, it reads the parent task, notices which documents are present, and decides what to create next. It is closer to a filing clerk than a controller.

That felt wrong at first. A coordinator should coordinate, and coordination sounds like it should involve logic. But the logic we need is small and observable: if this document is present and that one is not, create this subtask. The interesting logic lives in the specialists, which is where we want it.
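The rule "if this document is present and that one is not, create this subtask" fits in a small table. This is a sketch under two assumptions. First, the task system lists which subtasks are already open, so a heartbeat does not create duplicates. Second, the score-based publish and polish branch is left out. `Rule`, `RULES`, `coordinator_tick`, and the subtask names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    requires: frozenset[str]  # documents that must already be on the parent task
    produces: str             # document the new subtask will write
    subtask: str              # hypothetical name of the agent to schedule


RULES = [
    Rule(frozenset(), "fetch-output", "fetcher"),
    Rule(frozenset({"fetch-output"}), "categorize-output", "categorizer"),
    Rule(frozenset({"fetch-output", "categorize-output"}), "translate-output", "translator"),
    Rule(frozenset({"fetch-output", "translate-output"}), "review-output", "reviewer"),
]


def coordinator_tick(present: set[str], open_subtasks: set[str]) -> list[str]:
    """One heartbeat: look at what exists and return the subtasks to create.

    Everything it needs comes from the parent task; it keeps no state of its own.
    """
    to_create = []
    for rule in RULES:
        if rule.produces in present or rule.subtask in open_subtasks:
            continue  # already done, or already being worked on
        if rule.requires <= present:
            to_create.append(rule.subtask)
    return to_create


print(coordinator_tick({"fetch-output"}, set()))  # ['categorizer']
```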

When a future version of us looks at a completed pipeline run, they will not need to ask the coordinator what happened. The answer is sitting on the parent task, in the documents. That is the part we find ourselves relying on most.
