Markup is supposed to make content better. So why does it frequently make content worse?
Markup helps computers know how to render text in a user interface. Without markup, text is plain — a string of characters. That’s fine for simple communications, but plain text can’t express more complex ideas.
Syntax enables words to become meaningful content. Markup is syntax for computers. But computer syntax is far different from the syntax that writers and readers use.
HTML is the universal markup language for the web. Markdown is positioned as a lightweight alternative to HTML that’s used by some writing apps and publishing systems. Some content developers treat Markdown as a hybrid syntax that offers a common language for both humans and machines, a sort of “singularity” for text communication. Sadly, there’s no language that is equally meaningful for both humans and machines. If humans and machines must use the same syntax, both need to make compromises and will encounter unanticipated outcomes.
Markup is a cognitive tax. Code mixed into text interferes with the meaning of the writer’s words, which is why no one writes articles directly in HTML. Text decorated with markup is hard for writers and editors to read. It distracts from what the text is saying by enveloping words with additional characters that are neither words nor punctuation. When writers need to insert markup in their text, they are likely to make mistakes that make the markup difficult for computers to read as well.
Each morning, while browsing my iPad, I see the problems that markup creates for authors and for readers. They appear in articles in Apple News, crisply presented in tidy containers.
Apple News publishes text content using a subset of either HTML or Markdown. Apple cautions that authors need to make sure that any included markup is syntactically correct:
“Punctuation Is Critical. Incorrect punctuation in your article.json file—even a misplaced comma or a curly quotation mark instead of a straight quote—will generate an error when you try to preview your article.”
That’s the trouble with markup: it depends heavily on exact placement. A missing space or an extra one can spell trouble. Developers understand this, but authors won’t expect the formatting of their writing to present problems in the third decade of the 21st century. They’ve heard that AI will soon replace writers. Surely computers are smart enough to format written text correctly.
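Apple's warning about curly quotes is easy to reproduce with any strict JSON parser. The following is a minimal sketch using Python's standard `json` module. The `title` key and both sample strings are hypothetical illustrations, not a complete Apple News article document.

```python
import json

# Straight quotes everywhere: valid JSON.
good = '{"title": "Markup matters"}'

# Same text, but the quotes around the key have been turned into
# curly quotes, as a word processor might do. (Hypothetical input.)
bad = '{“title”: "Markup matters"}'


def parses(text: str) -> bool:
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses(good))  # True
print(parses(bad))   # False
```

To a human reader the two strings say the same thing. To the parser, only the first one is a document at all.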
Often, markup triggers a collision between computer syntax for text and computer syntax for code. This is especially the case for reserved characters: specific characters that a computer program has decided that it gets to use and that will have priority over any other uses of that character. Computer code and written prose also use some of the same punctuation symbols to indicate meaning. But the intents associated with these punctuation marks are not the same.
Consider the asterisk. It can act like a footnote in text. In computer code, it might mean multiplication, a wildcard, or a pointer. In Markdown, it can be a bullet or signify the bolding of text. In the example below, we see two asterisks around the letter “f”. The author’s goal isn’t clear, but it would appear these were intended to bold the letter, except that an extra space prevented the bolding.
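How one stray space can defeat bolding can be sketched in a few lines of Python. This assumes the intended markup was `**f**` and uses a simplified version of the common Markdown convention that an opening `**` must not be followed by whitespace and a closing `**` must not be preceded by it. The `render_bold` helper is hypothetical; it is not a full Markdown parser and not Apple's implementation.

```python
import re

# Simplified rule: the opening ** must be followed by a non-space
# character, and the closing ** must be preceded by one.
BOLD = re.compile(r"\*\*(?=\S)(.+?)(?<=\S)\*\*")


def render_bold(text: str) -> str:
    """Replace well-formed **bold** spans with <strong> tags."""
    return BOLD.sub(r"<strong>\1</strong>", text)


print(render_bold("**f**"))   # <strong>f</strong>
print(render_bold("** f**"))  # ** f**  (unchanged: asterisks shown literally)
```

The second call returns its input untouched, so the reader sees raw asterisks instead of a bold letter.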
If there were any symbol that logically should be standardized in meaning and use, it would be the quotation mark. After all, quotation marks indicate that the text within them is unmodified or should not be modified. But there are various conventions for expressing quotations using different characters. Among machines and people there’s no agreement about how to express quotation marks and what precisely they convey.
A highly visible failure occurs when quoted text is disrupted. Text in quotes is supposed to be important. The example below attempts to insert quotation marks around a phrase, but the Unicode codes for the single quotes are rendered instead of the quotes themselves.
Here’s another example of quotes. The author has tried to tell the code that these quotes are meant to be displayed. But the backslash escape characters show in addition to the quote characters. A quotation mark is not a character to escape in Markdown. I see this problem repeatedly with Reuters posts on Apple News.
This example has quotes mixed with apostrophes, and possibly an en dash — all being rendered as “ȃ”. The code is confused about what is intended, as is the reader.
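One common way this kind of garbling arises is an encoding mismatch: text saved as UTF-8 is later decoded as a legacy single-byte encoding. That is an assumption here, not a diagnosis of this particular article. The sketch below uses Python's built-in codecs, with Windows-1252 chosen as the example of the wrong encoding; `misdecode` is a hypothetical helper.

```python
def misdecode(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


apostrophe = "\u2019"  # right single quotation mark
en_dash = "\u2013"     # en dash

print(misdecode(apostrophe))  # â€™
print(misdecode(en_dash))     # also begins with â

# Both characters start with the same UTF-8 byte, which Windows-1252
# reads as "â", so different punctuation marks collapse into
# similar-looking debris.
assert misdecode(apostrophe)[0] == misdecode(en_dash)[0] == "â"
```

Each punctuation mark occupies three bytes in UTF-8, and the wrong decoder turns every byte into a separate visible character.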
Here’s a mystery: “null” starts a new paragraph. Maybe some JavaScript code was looking for something it didn’t find. Because the “null” follows a link that ends with a quote, it seems likely that part of the confusion was generated by how the link was encoded.
Here’s another example of link trouble. The intro is botched because the Markdown is incorrectly coded. The author couldn’t figure out where to indicate italics while presenting linked text, and tried too hard.
Takeaways
All these examples come from paid media created by professional staff. If publishers dependent on revenues can make these kinds of mistakes, it seems likely such mistakes are even more common among people in enterprises who write web content on a less frequent basis.
Authors shouldn’t have to deal with markup. Don’t assume that any kind of markup is simple for authors. Some folks argue that Markdown is the answer to the complexity of markup. They believe Markdown democratizes markup: it is so easy anyone can use it correctly. Markdown may appear less complex than HTML but that doesn’t mean it isn’t complex. It hides its complexity by using familiar-looking devices such as spaces and punctuation in highly rigid ways.
If authors are in a position to mess up the markup, they probably will. Some formatting scenarios can be complex, requiring an understanding of how the markup prioritizes different characters and how code such as JavaScript expects strings. For example, displaying an asterisk in Markdown requires that it be escaped twice with two backslashes in Apple News. That’s not the sort of detail an author on a deadline should need to worry about.
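The double backslash is easier to understand once the layers are separated. Assuming the article text is Markdown stored inside a JSON string, the Markdown layer needs `\*` to display a literal asterisk, and the JSON layer needs that backslash written as `\\`. A sketch with Python's `json` module follows; the `text` key and the sample sentence are hypothetical.

```python
import json

# What the Markdown layer must receive to display a literal asterisk:
# one backslash in front of it.
markdown_source = r"5 \* 3 = 15"

# Serializing to JSON escapes that backslash, so the file on disk
# contains two backslashes in a row.
as_json = json.dumps({"text": markdown_source})
print(as_json)  # {"text": "5 \\* 3 = 15"}

# Parsing the JSON recovers the single-backslash Markdown.
assert json.loads(as_json)["text"] == markdown_source
```

An author typing directly into the JSON file has to perform that serialization step by hand, which is where the second backslash comes from.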
— Michael Andrews