Why we read the markup before translating the prose

A typical translation request arrives with two fields in it: the article body as plain text, and the same body as HTML. The first is easier to read. The second is the one we actually translate from.

This is a small habit, but it changes what comes out the other end.

What a stripped paragraph hides

When an article is scraped down to plain text, it loses its scaffolding. Headings collapse into ordinary lines. A pull quote from Calvin reads like another sentence in the paragraph. A nested list of confessional articles becomes prose. The author’s argument is still there, but the structure that organized it is gone.

We can still translate from that stripped version. The vocabulary will be correct. The grammar will be right. But the result tends to drift: subheadings get translated in the same register as body prose, key terms inside emphasis tags get rendered as if they were ordinary words, and lists get reflowed into paragraphs because nothing tells the translator they were a list.

The HTML carries that information cheaply. A few seconds with the raw markup show us where the author placed emphasis, where they broke an argument into steps, where they paused to quote, and where they moved to a new section. None of that survives stripping.
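The same glance at the markup can be made mechanical. What follows is a minimal sketch, assuming the article body arrives as an HTML string and using only Python's standard `html.parser` module. It lists the headings, emphasis, list items and block quotes in a passage, so the structure is in front of the translator before the prose is. The tag selection, the `StructureInventory` class and the `inventory` helper are hypothetical names chosen for illustration, not part of any tool described in this article, and the sample HTML is invented.

```python
from html.parser import HTMLParser

# Tags whose presence changes how a passage should be translated.
# This particular selection is an illustrative assumption, not a fixed rule.
STRUCTURAL_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6",
                   "em", "i", "strong", "b", "li", "blockquote"}


class StructureInventory(HTMLParser):
    """Hypothetical helper: collects (tag, text) pairs for structural elements."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open structural tags, innermost last
        self.buffers = []  # text collected for each open structural tag
        self.found = []    # finished (tag, text) pairs, in the order tags close

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL_TAGS:
            self.stack.append(tag)
            self.buffers.append([])

    def handle_data(self, data):
        # Text belongs to every open structural element, so an <em> inside
        # an <li> is recorded both on its own and as part of the list item.
        for buf in self.buffers:
            buf.append(data)

    def handle_endtag(self, tag):
        # Only close a tag that matches the innermost open one; this sketch
        # simply ignores malformed nesting.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            text = "".join(self.buffers.pop()).strip()
            if text:
                self.found.append((tag, text))


def inventory(html: str):
    """Return the structural elements of an HTML fragment as (tag, text) pairs."""
    parser = StructureInventory()
    parser.feed(html)
    parser.close()
    return parser.found


sample = (
    "<h2>What Calvin actually said</h2>"
    "<p>We are justified by faith <em>alone</em>.</p>"
    "<ol><li>Total depravity</li><li>Unconditional election</li></ol>"
)

for tag, text in inventory(sample):
    print(f"{tag}: {text}")
# h2: What Calvin actually said
# em: alone
# li: Total depravity
# li: Unconditional election
```

The list does not decide anything for the translator. It only makes sure the heading, the emphasized “alone” and the two parallel list items are seen as what they are before the prose is rendered.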

Emphasis is rarely decorative

In Reformed writing, emphasis usually means something. When an author italicizes “alone” in “by faith alone”, they are not styling a word; they are making a doctrinal claim. When they emphasize “for us” in “Christ died for us”, they are signaling a particular reading of the atonement. These are arguments compressed into typography.

If we translate from plain text, we tend to flatten these. The Ukrainian word goes in, the emphasis comes off, and the doctrinal weight of the sentence shifts. The reader still understands the sentence. They just no longer hear the author insisting on something.

Translating with the markup visible makes this harder to miss. We see the emphasis tag and ask: is this load-bearing? Most of the time it is. Once in a while it is not, and we drop the emphasis in Ukrainian, where marking the same word would feel ornamental rather than emphatic. That decision belongs to the translator, but it cannot be made once the markup is gone.

Headings set the register

A heading does two things at once: it labels a section, and it tells us what kind of section it is. “What Calvin actually said” is in a different register from “Some notes on Reformed anthropology”: the first is conversational, the second academic. The heading sets expectations for the paragraphs underneath it.

When we translate headings without seeing them as headings, we tend to render them in the same flat, careful prose register we use for body text. This produces Ukrainian articles where the headings feel weaker than the paragraphs under them, which is the opposite of how the original reads.

Looking at the markup first lets us treat headings as their own kind of sentence. They get a tighter, punchier rendering. The body underneath them then has somewhere to live.

Lists are arguments in a particular shape

The author who writes “Reformed soteriology rests on five claims” and follows it with a numbered list is making a structural argument. The shape of the list, the parallel grammar of its items, the sense that they are equal members of a set: all of that does work in the reader’s head.

When the list collapses into prose, the parallelism goes. Item three becomes a clause inside a longer sentence. Item five drifts into its own paragraph. The reader still receives the five claims but no longer experiences them as a coherent unit.

Translating from the HTML lets us preserve that shape. We render each item as its own line, with parallel grammar in Ukrainian where the English had parallel grammar in English. The argument stays in the shape the author gave it.

Why this matters past the typography

There is a more general point here, and it is not specific to translation. Source material always arrives with structure. The structure is part of the meaning. Whatever process strips it down for convenience also removes information the next step needs.

We have noticed the same pattern elsewhere: in plain-text scrapes of academic papers, where inline citations were flattened into a list of notes that no longer pointed to any particular claim; in transcripts of sermons, where the speaker’s pauses and emphasis disappeared into a wall of grey text; and in code snippets that read fine in a syntax-highlighted editor but became unreadable when pasted into a chat window without the colors.

The shape carries the meaning. When the shape is available, we use it. When it is not, we go and find it before we start work, because guessing at structure from inside the prose is much harder than reading the structure off the page.

A small habit, repeatedly

None of this is dramatic. Reading the HTML before translating the prose adds a few seconds to each article. Over thousands of articles it adds up to a noticeable improvement in how the translated version reads, and a meaningful drop in the kinds of mistakes that are hard to catch in review: a flattened emphasis here, a relabeled section there, a list that quietly turned into a paragraph.

The translator’s job is to move meaning across a language boundary without dropping any of it. The markup is part of the meaning. We try not to drop it.