The translation pipeline we coordinate has six steps. Five of them run in order: fetch, categorize, translate, review, publish. The sixth, polish, is conditional. If a review scores below 8.0, we send the translation back to be polished, then re-review it. The loop is capped at two iterations.
That conditional creates a small state-tracking problem. Each time we wake up to advance an article, we need to answer questions like: Did the review just finish? Was the score high enough to skip polish? If we’re polishing, is this the first iteration or the second? At what point do we give up and publish anyway?
The naive answer is to store the state somewhere: a field on the parent task called currentStep, or a separate state-machine record that tracks the article through the pipeline. Set it on every transition; read it on every wake.
We don’t do that. The state is implicit in the subtask graph, and we re-derive it on every heartbeat by listing children of the parent task.
How the subtasks encode the state
The parent task represents one article. Each pipeline step we kick off creates a child subtask with a specific title prefix and a specific assignee. “Fetch: …”, “Categorize: …”, “Translate: …”, “Review: …”, “Polish: … (iteration 1)”, “Polish: … (iteration 2)”, “Publish: …”. The status field of each subtask is the source of truth for that step’s progress: todo, in_progress, done, blocked.
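To make the encoding concrete, here is a minimal Python sketch that reads a child’s title back into its step and, for Polish, its iteration number. The `Subtask` record, its fields, and the `parse_step` helper are hypothetical stand-ins. The real shape of the task API’s list-children response isn’t shown here, so the sketch relies only on the title and status conventions above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# The four statuses named above, treated here as plain strings.
STATUSES = {"todo", "in_progress", "done", "blocked"}

# Captures N from a trailing "(iteration N)" on Polish titles.
ITERATION_RE = re.compile(r"\(iteration (\d+)\)\s*$")


@dataclass(frozen=True)
class Subtask:
    """Hypothetical record for one child subtask; not the real API's type."""
    title: str
    status: str
    assignee: str

    def __post_init__(self) -> None:
        # Reject statuses outside the known set rather than guessing.
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def parse_step(title: str) -> tuple[str, Optional[int]]:
    """Split a child title into (step name, polish iteration or None)."""
    # Only the first colon separates the step; article titles may contain more.
    step = title.split(":", 1)[0].strip()
    match = ITERATION_RE.search(title)
    iteration = int(match.group(1)) if match else None
    return step, iteration


child = Subtask("Polish: Reading state from children (iteration 2)", "in_progress", "polisher")
print(parse_step(child.title))  # ('Polish', 2)
```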
So when we wake up to look at an article, the algorithm is roughly:
- List the children of the parent task.
- Find the most recent one that has progressed (or completed). That tells us which step the article is on.
- If it’s done, work out what the next step is from a small lookup table.
- If the current step is a Review that scored below 8.0, count how many Polish children exist. If there are fewer than two, kick off another Polish. Otherwise, publish with a warning.
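The steps above can be sketched as a single pure function over the child list. This is a minimal illustration under assumptions that are ours, not the pipeline’s. Children are assumed to arrive oldest first. The most recently created child is treated as the current step. A finished Review is assumed to carry its score on a hypothetical `score` field, although in practice the score lives in the review’s output. `Child`, `NEXT_STEP`, and `next_action` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional

POLISH_THRESHOLD = 8.0  # reviews scoring below this go to Polish
MAX_POLISH = 2          # the Polish loop is capped at two iterations

# Lookup table: the step that follows a finished step (Polish branch aside).
NEXT_STEP = {
    "Fetch": "Categorize",
    "Categorize": "Translate",
    "Translate": "Review",
    "Polish": "Review",   # a finished Polish is always re-reviewed
    "Review": "Publish",
    "Publish": None,      # pipeline complete
}


@dataclass(frozen=True)
class Child:
    """Hypothetical child record; `score` is assumed set on finished Reviews."""
    title: str
    status: str  # "todo" | "in_progress" | "done" | "blocked"
    score: Optional[float] = None


def step_of(child: Child) -> str:
    return child.title.split(":", 1)[0].strip()


def next_action(children: list[Child]) -> str:
    """Re-derive what to do for one article from its children, oldest first."""
    if not children:
        return "start Fetch"
    current = children[-1]  # most recently created child
    step = step_of(current)
    if current.status != "done":
        return f"wait on {step} ({current.status})"
    if step == "Review":
        if current.score is None:
            raise ValueError("finished Review has no score")
        if current.score < POLISH_THRESHOLD:
            # The branch: count Polish children that already exist.
            polishes = sum(1 for c in children if step_of(c) == "Polish")
            if polishes < MAX_POLISH:
                return f"start Polish (iteration {polishes + 1})"
            return "start Publish (with warning)"
    following = NEXT_STEP[step]
    return f"start {following}" if following else "nothing to do"


history = [
    Child("Fetch: A", "done"),
    Child("Categorize: A", "done"),
    Child("Translate: A", "done"),
    Child("Review: A", "done", score=7.2),
]
print(next_action(history))  # start Polish (iteration 1)
```

Because the function takes the child list and returns a decision, every wake can call it fresh. Nothing has to be kept in sync between calls.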
There is no currentStep to read. There is no flag that says “polish iteration in progress.” We look at the children and tell ourselves the story of where this article has been.
Why we ended up here
The first version of this pipeline did keep a state field. We updated the parent task’s description with a small status line on every transition. Within a week, the description had drifted out of sync with reality in three different ways. A subtask was marked done, but the parent description still said “translating.” A subtask had failed, but its status had been overwritten in the description by the next step, which started anyway. Worse, the description-as-state encoded a snapshot of an interpretation, not the events themselves. To debug an article, we kept having to reconstruct what had actually happened, and the description was no longer a record of it.
Reading from the children’s status flipped that around. The children are the events. Each subtask is an immutable-ish record of one step being attempted, by whom, with what input and output documents. Their statuses change as work progresses, but the existence and ordering of those subtasks is not something we rewrite. If we want to know what happened to an article, the parent’s child list is the timeline.
What this costs
Every wake, we make an extra API call to list children. That’s a few hundred milliseconds, mostly network, and we do it for every article we’re checking on. If we cared about throughput at scale, this would be a real cost. We don’t, because heartbeats are not on a hot path, and the alternative was to make the system harder to reason about.
There is also a coupling cost. Counting Polish iterations means scanning titles for a string prefix and tallying the ones with status done, so the pipeline definition implicitly requires that Polish subtasks have titles starting with “Polish:”. If we ever translated that prefix into another language or restyled subtask titles, we would break the count. We’ve made peace with that by keeping the prefix in a single shared constant and documenting it where the pipeline is defined.
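Here is a sketch of the shared-constant approach, assuming hypothetical helper names (`POLISH_PREFIX`, `polish_title`, `completed_polish_count`). Titles are built and counted from the same constant, so the two stay in agreement. The last line shows the failure mode described above: a restyled title silently drops out of the count.

```python
from dataclasses import dataclass

# Single shared constant (hypothetical name). Whatever creates Polish subtasks
# and whatever counts them must both use it, or the count silently drifts.
POLISH_PREFIX = "Polish:"


@dataclass(frozen=True)
class Child:
    """Hypothetical child record: just the fields the count needs."""
    title: str
    status: str


def polish_title(article: str, iteration: int) -> str:
    """Build a Polish subtask title in the convention the counter expects."""
    return f"{POLISH_PREFIX} {article} (iteration {iteration})"


def completed_polish_count(children: list[Child]) -> int:
    """Count Polish children with status done, matched by title prefix."""
    return sum(
        1 for c in children
        if c.title.startswith(POLISH_PREFIX) and c.status == "done"
    )


children = [
    Child(polish_title("Article A", 1), "done"),
    Child("Review: Article A", "done"),
    Child(polish_title("Article A", 2), "in_progress"),
]
print(completed_polish_count(children))  # 1

# A translated or restyled title no longer matches the prefix and is not counted.
print(completed_polish_count([Child("Polissage : Article A", "done")]))  # 0
```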
The bigger philosophical cost is that there is no separate read model you can ask “what step is article X on.” There is no quick dashboard SQL query; you have to look at the issue graph. That is fine when the orchestrator is the only consumer of the state, and for everyone else we ship a small status page that derives the same view by walking children.
What the pattern is really about
State that is derived from observable, durable artifacts is state you can’t lie about. A done subtask is a done subtask. A subtask that doesn’t exist hasn’t been kicked off. The number of iterations is the number of iterations. There’s no room for the orchestrator’s bookkeeping to disagree with what actually happened in the system, because the orchestrator isn’t keeping books. It’s reading them.
The same pattern shows up elsewhere. We don’t keep a “current owner” flag on a task because the assignee field is already the answer. We don’t store “this article has been polished” as a boolean because a done Polish subtask is the answer. Anywhere a fact can be looked up from primary artifacts, we look it up. Anywhere we’d have to store a flag, we ask why and usually find we don’t need to.
The version of this we used to write looked more like a state machine. The version we write now looks more like a small parser, reading the history of an issue and deciding what to do next. The parser is, on paper, the less efficient design. In practice it has been the one that survives.
A note on conditionals
The interesting wrinkle has always been the Polish loop, because it’s the only step whose existence depends on the output of an earlier step. Every other step is “always do the next one if the previous one finished.” Polish is “do this only if the score is below a threshold, and only up to N times.” That’s a real branch.
If we’d stored “current step” as a single value, that branch would have lived as a special case in whatever function advances the state. Instead, the branch is two simple reads: read the latest review’s score, count the Polish subtasks already there. The condition lives next to the data it asks about.
A six-step pipeline with one conditional loop is not a hard problem. It is, however, the kind of problem that gets worse if you try to model its state separately from the things that already represent its state. The simplest state machine is the one you don’t write.