Story Needle

This post is motivated by irony and frustration. AI is supposed to make work simpler. Yet what’s more common is disagreement about what AI does and how it works.

Prospective users of AI technology are bewildered as its capabilities become more elaborate and are described by ever more specialized jargon. AI tools invoke a confounding array of metaphors: contexts, arguments, commands, plans, recipes, skills, subagents, and negotiations. There’s no common information architecture defining AI capabilities. Vendors and initiatives propose their own terminologies as they introduce new features and options. I haven’t yet come across a vendor talking about AI wallets, but I won’t be surprised to encounter that term soon.

AI-managed content does represent a new paradigm, but we don’t yet have a shared mental model of how it works.

What is clear is that prior mental models for content management aren’t aligned with the new paradigm.

Evidence of a paradigm shift in content technology

Advanced web content technology has, for over 25 years, been dominated by two paradigms, both focused on automating content. The first, structured content, was introduced by IBM’s development of the XML-based DITA and updated for the API era by a wave of headless CMS vendors ten years later. The second, the Semantic Web, debuted with the development of RDF standards at W3C and was later operationalized by Google’s championing of the schema.org structured data vocabulary for web content. Only a small subset of writers — self-described content engineers — worked with these technical details.

The debut of ChatGPT in 2022 introduced a new paradigm centered on natural-language prompts with LLMs. Language, rather than code, took center stage. Overnight, every writer was directly engaged with the latest content technology.

Momentum has quickly shifted toward natural language technologies: LLMs to generate content and agentic AI to coordinate workflows. IBM long ago dropped DITA, but has lately reinvented itself as an agentic AI powerhouse with soaring stock values. Google has downsized its schema.org operations and reinvented itself as a leader in LLMs and cloud AI.

The vanguard of content technology is now LLMs, not content automation. Two venerable technology giants, pioneers of the older paradigms, have pivoted dramatically (and successfully) to the new paradigm.

How LLMs change content structure and semantics

Computers depend on instructions to do tasks. Text and code are two distinct ways of representing instructions. Because LLMs can understand plain-language instructions, they provide an alternative to computer code to direct computers.

By relying on language, rather than code, to develop content, LLMs are far more human-centric than machine-centric automation. That means that LLMs approach the structure (organization) and semantics (meaning) of a piece of content in a way that’s similar to how humans do. LLMs treat text as knowledge.

The human-centric and the machine-centric approaches to content involve distinct mental models.

A human’s mental model of content involves editorial structure and the meaning of words. Writers develop and draw on resources to help them develop new content. These include examples of other content, templates, style guides, message patterns, and so on. Because LLMs process content as words, they can use these same resources when generating content – but at scale. The foundation is what we will call the “text base.”

The mental model that engineers have when dealing with machines is vastly different, centered on the “code base.” Engineers use a range of tools (databases, code procedures, APIs, etc.) to manipulate content. Because these tools don’t understand plain language, the content is translated into machine-interpretable objects, such as a Document Object Model and entities. Engineers’ mental model is to treat content as data.

As LLM-based technologies continue to develop, we see increasing overlap between what LLMs can do and the tasks handled by code-centric technologies. LLMs can generate a document structure based on existing document examples rather than relying on code-based assembly logic. LLMs can recognize the meaning of words without needing to check a schema entity reference.

This means that LLMs are a partial substitute for traditional code. LLMs can assemble text and evaluate words, tasks previously handled by conventional code.

LLMs handle these processes very differently from traditional content automation approaches, which brings both benefits and drawbacks. And while LLMs look at text in ways that are analogous to humans, their processes are quite different.

The most significant difference between humans and LLMs, on the one hand, and content automation, on the other, lies in their scope of action. Humans and LLMs can generate novel content, while traditional content automation can’t. Humans can give LLMs open-ended instructions to create something that isn’t a variation of something past, but a distinctively new output that takes inspiration from many sources and ideas. Traditional content automation can only accept closed-ended instructions that generate routine decisions. Whether this determinism is a virtue depends on the use case.

Different ways of looking at parts and wholes

Humans, LLMs, and traditional code approach content at different levels of granularity. Humans absorb information based on prior knowledge and Gestalt. LLMs simulate these approaches through vector distance. Traditional programming, by contrast, evaluates through decomposition, where each item is assessed within a procedural routine of varying complexity; sometimes routines are short, and at other times they call external routines.
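To make “vector distance” a little more concrete, here is a minimal sketch in Python. It is an illustration only: the three-dimensional vectors are invented values, the function name cosine_similarity is my own, and real embedding models produce vectors with hundreds or thousands of dimensions. Cosine similarity is one common way to compare such vectors (the corresponding distance is one minus the similarity).

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings: invented values, not output from any real model.
embeddings = {
    "style guide": [0.9, 0.1, 0.2],
    "editorial template": [0.8, 0.2, 0.3],
    "database index": [0.1, 0.9, 0.4],
}

query = embeddings["style guide"]
for phrase, vector in embeddings.items():
    # A score closer to 1.0 means the vectors point in a more similar direction.
    print(f"{phrase}: {cosine_similarity(query, vector):.2f}")
```

With these made-up numbers, “editorial template” scores closer to “style guide” than “database index” does, which is the kind of relatedness an LLM infers without anyone declaring it.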

For many years, content professionals have used the term “content chunking,” but this metaphor can gloss over differences. Humans chunk content based on units of recognition: grouped information they perceive and recognize as related. Traditional code encodes structure into content to support computer operations. The encoded content structure may not match the cognitive structure most humans perceive. LLMs also rely on chunking to break text into segments that reveal word context. How LLMs chunk text (by sentence, paragraph, or document, for example) will influence the performance of the LLM.
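The effect of chunking granularity can be sketched with a few lines of Python. This is a simplified illustration, not how any particular LLM pipeline works: the function names are hypothetical, and the sentence splitter is deliberately naive (it would stumble on abbreviations such as “e.g.”).

```python
import re

def chunk_by_paragraph(text):
    """Split on blank lines; each paragraph becomes one chunk."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def chunk_by_sentence(text):
    """Naive split after '.', '!' or '?' when followed by whitespace."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if s]

# Invented sample text with two paragraphs of two sentences each.
sample = (
    "Agents act on instructions. They can call other agents.\n\n"
    "Writers draft prompts. Engineers build tools."
)

print(chunk_by_paragraph(sample))
print(chunk_by_sentence(sample))
```

The same text yields two chunks by paragraph and four by sentence. Smaller chunks isolate individual statements but lose the surrounding context that a paragraph preserves.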

The mental models of writers and engineers differ in whether they think in terms of narratives or data. Narratives tend to be holistic, while data is atomic. This divergence has implications for how ambiguous the context may be to different parties.

Ambiguity is not an absolute quality that’s always bad, but a relative, contextual issue. A statement could be clear to an insider with contextual knowledge but baffling to an outsider who lacks that knowledge.

Eliminating potential ambiguity can be costly when it results in redundant information that isn’t needed by consuming parties. Determining whether content is unambiguous depends on knowing the audience. Misjudging the audience results in instructions that are either overspecified or underspecified.

Insiders understand information and concepts that outsiders don’t. The relevance of text depends on a determination: Is the content intended for people with insider knowledge, or should it assume that outsiders with no prior knowledge will rely on it?

Narratives often assume prior, insider knowledge. Data also needs context, but it will often be more explicit. The table shows examples of narratives and data that presume insider or outsider knowledge.

	Narrative-focused information	Data-focused information
Insider knowledge: Ambiguous to a random reader, an LLM, or an autonomous IT system	A complex sentence with a referent to something said earlier, or assumed to be understood by the intended reader	Name-value pairs (field name and value) where the reader, LLM, or machine doesn’t understand the meaning of the field and/or the value (mystery schema)
Contractual statements: Unambiguous to outsiders	A simple declarative sentence with a clear subject, verb, and object, such as those found in many legal contracts	Structured data based on a declared, referenceable schema and resolvable entities or values
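The data column of the table can be illustrated with a short Python sketch. The field names “st” and “cd” are invented to stand in for a mystery schema, the helper declares_schema is hypothetical, and the headline and date are made up. The second record borrows the JSON-LD convention of naming a vocabulary (here schema.org) and a type.

```python
def declares_schema(record):
    """Return True if the record names a vocabulary and a type, JSON-LD style."""
    return "@context" in record and "@type" in record

# "Mystery schema": field names and values mean something only to insiders.
mystery_record = {"st": 3, "cd": "A7"}

# A comparable record expressed against a declared, referenceable vocabulary.
# The headline and date are invented for illustration.
declared_record = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How agents deal with people and machines",
    "datePublished": "2025-01-15",
}

for name, record in [("mystery", mystery_record), ("declared", declared_record)]:
    print(name, declares_schema(record))
```

This check only shows whether a record points to a schema. It says nothing about whether a reader, an LLM, or a machine can actually resolve that schema, which is the harder problem.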

LLMs, like people, may not understand a statement outside of its context. But it’s demanding for everyone, writers and readers alike, to develop and consume context-free text. Legal contracts are tedious because they attempt to define every term. Such documents must stand on their own.

Yet, contrary to some engineers’ beliefs, the root problem is not that language is inherently ambiguous, whereas data is not. Data can also be ambiguous. Machines often can’t understand the structure of content or the meaning of data without an engineer’s oversight. The API era assumed that engineers would write queries directly after reading and understanding API docs. If they couldn’t find the answer in a README file, they would post a comment or question on GitHub or Slack.

LLMs can’t reliably understand data schemas or interpret the meaning of data fields and values when they lack context about what that information represents. This problem places a greater burden on documentation for APIs, data schemas, IT systems, and protocols. Can LLMs access this information, understand its relevance, and use it to guide how they perform tasks?

Another issue with instructions is precision. Developers assume that code is more precise than language. But an instruction expressed in natural language can still be deterministic. Highly prescriptive instructions can be written as text, though they tend to be verbose.

Most recently, agents have emerged as brokers between people and machines. Can they reduce the friction in content work?

How agents deal with people and machines

AI agents have cemented the new content technology paradigm by making coding in plain language possible. Using plain words to change outcomes at scale is both exciting and problematic.

Agents solve many problems of the IT Tower of Babel, but also create new ones. They promise to act on our behalf autonomously, making decisions and taking actions. They raise the question: who are they working for?

Some agents are for writers. Yet most agents are for developers and deal with issues that writers should not need to worry about. Writers should be wary of the suggestion that everyone will now become an engineer and be responsible for debugging process glitches. It’s more likely that agents will elevate some non-writer roles into content contributors.

Even though agents rely on natural language, only a subset of agents handle content management activities such as evaluating content performance, checking quality, and preparing content for distribution. Most agents handle the arcane details of business processes and IT systems that fall outside the scope of content professionals’ responsibilities. When AI engineers talk about “context,” they aren’t necessarily talking about context that relates to content, but rather to business process and IT system context.

Agents support many kinds of goals. It’s best to break down agents by whom they interact with and by their roles.

Agents act as an intermediary between humans and machines. They can be human-directed, in which an individual specifies what the agent should or should not do, or autonomous, in which the agent makes decisions independently of explicit instructions.

Agents interact with:

humans (humans-to-agents or H2A)
other agents (agents-to-agents or A2A)
machines (agents-to-machines or A2M)

The diagram below illustrates the kinds of interactions.

Agents rely on plain language instructions, but the scope of those instructions varies widely. A key issue is how well the agent is matched with the requests and responses it receives.

H2A instructions differ from A2M ones. Writers aren’t likely to instruct agents to process files or invoke code routines, but engineers will often develop agents that do those things.

Agents interact in a chain. Writers will craft prompts that become agents and read the agents’ outputs. Agents can have conversations with other agents. They can instruct applications and backend systems to execute tasks, then evaluate and interpret the results, and create a message indicating next steps.

Writers might write a prompt telling an agent to do something general (find the best-performing blog post) and expect the agent to figure out how to do it. Alternatively, the writer might write a prompt that includes a detailed procedure, telling the agent where to access information, the order for tasks, and the criteria to use.
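The difference between the two prompting styles can be sketched in Python. Nothing here calls a real agent or vendor API. The helper build_procedural_prompt is hypothetical, and the three steps (including the 90-day window and the page-view criterion) are invented examples of the kind of detail a writer might specify.

```python
def build_procedural_prompt(goal, steps):
    """Combine a goal with numbered steps into one plain-text prompt."""
    numbered = [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return goal + "\nFollow these steps in order:\n" + "\n".join(numbered)

# General prompt: the agent must work out where to look and what "best" means.
general_prompt = "Find the best-performing blog post."

# Procedural prompt: the writer supplies the source, the order, and the criteria.
procedural_prompt = build_procedural_prompt(
    general_prompt,
    [
        "Open the analytics export for the blog.",
        "Rank posts by page views over the last 90 days.",
        "Report the title and URL of the top post.",
    ],
)

print(general_prompt)
print("---")
print(procedural_prompt)
```

The procedural version removes guesswork about what “best-performing” means, but it only works if the writer knows which sources and options the agent can actually reach.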

When writers use procedural instructions, they may need to understand the specifics of the options available. Some agents log in to SaaS applications and act as users. In such cases, the writer can base their prompt on the SaaS application’s UI, noting which options they want the agent to access.

But many agents are not acting as proxy users of SaaS applications. They are either coordinating with other agents (A2A) or accessing backend systems and data that have no UI and whose organization isn’t self-evident (A2M). These kinds of agents require developers to create them because they depend on opaque knowledge that ordinary users (such as writers) or LLMs can’t discover on their own.

The agent shown above illustrates how an agent can break down a task into subtasks, each of which depends on various data and systems.

Now let’s return to the problem of ambiguity and ignorance of context. Has the writer expressed things clearly? Have they left out important information? Have they included too many prescriptive details that might confuse?

Ambiguity can relate to word meaning, but also to systems’ responsibilities. Ambiguity about machine context is frequently a bigger problem than confusion over wording. Backend issues are the responsibility of the engineer, not the writer. On their own, agents don’t know how to talk to databases or APIs. They are unaware of protocols (assumptions) or interoperability requirements. They aren’t prepared for various situations. They can’t cope with edge cases.

These failures have little to do with how clearly or precisely writers draft prompts. Rather, they reflect inadequate engineering testing and overambitious automation. Agents are given “skills,” but those skills don’t match the environment.

Agents can fail for multiple reasons. They may crash because they are unable to complete a step. Or they may return the wrong response because their decision-making was flawed. Those decisions may be made using procedural code or LLM “reasoning.”
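These two failure modes can be told apart in a small Python sketch. It assumes a toy agent that is nothing more than a list of functions run in order. All the names (run_steps, fetch_views, pick_lowest, and so on) and the page-view numbers are invented for illustration.

```python
def run_steps(steps, check):
    """Run steps in order and classify the outcome as crashed, wrong, or ok."""
    result = None
    for step in steps:
        try:
            result = step(result)
        except Exception as error:
            # The agent could not complete a step.
            return ("crashed", f"{step.__name__}: {error}")
    if not check(result):
        # Every step completed, but the decision was flawed.
        return ("wrong", result)
    return ("ok", result)

def fetch_views(_):
    return {"post-a": 120, "post-b": 340}  # invented numbers

def pick_highest(views):
    return max(views, key=views.get)

def pick_lowest(views):
    return min(views, key=views.get)  # deliberately flawed decision

def missing_system(_):
    raise ConnectionError("backend unavailable")

def is_top_post(answer):
    return answer == "post-b"

print(run_steps([fetch_views, pick_highest], is_top_post))
print(run_steps([fetch_views, pick_lowest], is_top_post))
print(run_steps([missing_system, pick_highest], is_top_post))
```

A crash at least announces itself. A wrong answer is only caught here because the sketch includes an independent check, and someone has to think to write one.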

Agents promise to remove tedious work, but getting them to deliver that work can be stressful.

Having agents perform piecemeal tasks is more likely to succeed than complex, interrelated ones, but piecemeal tasks are less useful.

Giving agents directed tasks may offer more control over decisions but might increase the likelihood of crashes compared with giving agents autonomy to decide how to respond to a request.

What to delegate to agents

Large consultancies and systems integrators imply that AI agents are your new employees and teammates. But it is not obvious what role they have.

Are AI agents like an intern on whom you foist a backlog of non-urgent tasks? Are they like a coach or mentor who can advise you, filling the role you wish your boss would fill but never has time for?

Agents are a blank slate; organizations must decide how to use them.

Given the intricacies of agents, how much oversight do they need, and when should they be involved?

Who will be delegating work to whom? Do people always task agents, or will agents sometimes task people?

There are no simple answers to these questions, because they involve many variables and are subject to revision as people and bots learn from each other.

It’s useful to look at possibilities through various lenses:

Agents that complete tasks faster
Agents that complete tasks better
Agents that complete tasks more cheaply
Agents that complete tasks that are not immediately relevant

Agents are often faster than humans, but not always: they can be slower if they lack critical information or are poorly guided by prompts. The speed of agents is most noticeable on large procedural tasks that involve many steps or batch actions. Many such tasks are unrewarding to people, who are happy to delegate them to agents. They are considered “low value” because they don’t require special thought, even though they are important.

Many writers hope agents will handle the tedious, time-consuming procedural tasks so they can focus on the important stuff. But agents can play other roles, too.

Another possibility is to use agents to perform tasks that they are better at. A common example is proofreading: while agents can make mistakes, they often detect small errors that would otherwise go unnoticed. Yet agents can also address higher-value tasks. They can make decisions about the best information to incorporate in content or even the most relevant topics to write about. Because they can scan across high volumes of content and information, they can spot opportunities that wouldn’t be apparent to an individual writer.

Even though agents can be better at some tasks, they are not poised to replace the judgment of writers. Yet they can perform many tasks more cheaply than manual work or custom automations.

The cost-effectiveness of agents is a hot topic, as LLM use becomes a noticeable expense in organizations. This issue has brought token cost efficiency into focus.

Token costs are reduced by eliminating verbosity. Shortening prompts and limiting the scope of text an agent accesses lessen costs in many cases. But cheaper agents may be less flexible. Over-pruning (removing too much context) can be counterproductive, as agents struggle to match instructions to available resources. Token efficiency involves balancing the precision of outcomes, costs, and flexibility.
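The arithmetic behind token costs can be roughed out in Python. Every number here is an assumption for illustration. The price is hypothetical, and the ratio of 1.3 tokens per word is a crude stand-in for a real tokenizer, whose counts vary by model and language.

```python
def estimate_cost(prompt, price_per_1k_tokens, tokens_per_word=1.3):
    """Rough cost estimate: word count times a ratio stands in for a tokenizer."""
    tokens = len(prompt.split()) * tokens_per_word
    return tokens / 1000 * price_per_1k_tokens

HYPOTHETICAL_PRICE = 0.01  # invented price per 1,000 tokens, not a vendor rate

verbose_prompt = (
    "Please review every blog post we have ever published, consider all of "
    "the available analytics, and then tell me which single post has "
    "performed the best overall."
)
pruned_prompt = "Find the best-performing blog post."

for label, prompt in [("verbose", verbose_prompt), ("pruned", pruned_prompt)]:
    print(f"{label}: {estimate_cost(prompt, HYPOTHETICAL_PRICE):.6f}")
```

The pruned prompt is cheaper per call, but the estimate captures only the cost side. It can’t show whether the shorter prompt still gives the agent enough context to do the job.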

Finally, agents may take over more content-adjacent tasks. For example, agents are now participating in meetings. Many meetings are time-wasting for content professionals because most of the agenda isn’t directly relevant to their responsibilities. Agents may act as surrogates, telling content professionals what they need to know from an all-hands meeting, or providing a 30-second status update for a division-wide project check-in.

Agents don’t have a fixed role

AI agents are moving in many directions. How they will be used will vary according to the organization’s priorities and maturity.

Some will expect the agent’s output to be used to generate content, for example, by retrieving data to be incorporated into a narrative. Others will treat the agent’s output as an input to another human-directed process, such as a status message indicating what has changed. Still others seek to use agents to eliminate human involvement in content tasks as much as possible.

Given the diverging expectations for agents, it’s little surprise that content professionals have difficulty forming a clear mental model of how LLMs and agentic AI operate. I hope this discussion helps make those contours more visible.

– Michael Andrews