Relevance is linked to change. Any published content will need changes at some point to stay relevant. Content management practices similarly need to be open to change to remain relevant. The time has come to rethink these practices.
This post examines how the concept of reusing content is changing with the rise of LLMs. I’ll argue that these changes will necessitate a rethinking of practices such as content models and content structuring. That rethinking will mean abandoning long-standing “best practices” around single sourcing and content reuse, but it will ultimately simplify content development and make content more valuable.
The fragility of single-source reuse and revision
For a long time, the advice about content reuse has been simple: just do it. Reuse your content rather than creating duplicates, and update your content in one place. Approaches such as DITA embodied this philosophy.
Reuse is encapsulated by the slogan “Don’t Repeat Yourself”, known as the DRY principle. Yet its superficial simplicity belies its practical complications.
It’s hard to keep DRY. Many large organizations are unaware of how often they repeat text online because their staff routinely copy and paste previous content when creating new content. Few employees draft content in their CMS, and even fewer bother to locate and use the original version in the repository. DRY content management is more of an ideal than a norm.
Copying and pasting your own content is perfectly legal under copyright law, but it can create problems if the repeated content becomes so inconsistent that various pages present factually conflicting statements. Automation can detect trouble spots. Software can help organizations find where they repeat text within pages or screens. Multiple code libraries can uncover text reuse.
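To illustrate the kind of detection such software performs, here is a minimal sketch that flags pages sharing long runs of identical wording, using word shingles and Jaccard similarity. The page texts and the similarity threshold are illustrative assumptions, not any particular product’s method.

```python
def shingles(text: str, n: int = 5) -> set:
    """Return the set of n-word shingles (overlapping word runs) in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets (0 = disjoint, 1 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def find_reuse(pages: dict, threshold: float = 0.5) -> list:
    """Flag page pairs whose shingle overlap meets or exceeds the threshold."""
    ids = sorted(pages)
    sets = {pid: shingles(pages[pid]) for pid in ids}
    flagged = []
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            score = jaccard(sets[p], sets[q])
            if score >= threshold:
                flagged.append((p, q, round(score, 2)))
    return flagged
```

Dedicated plagiarism- and reuse-detection libraries are far more sophisticated, but the underlying idea is the same: measure how much verbatim wording two pages share.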
If you must repeat your text, then it makes sense to only have one copy of that text. The intent behind single sourcing is sound. But putting it into practice depends on rigorous planning, processes, infrastructure, and buy-in. The experience of numerous organizations shows it’s a heavy lift. And it’s hard to fault enterprises for having trouble.
In practice, the rationale for single sourcing was never as straightforward as advertised. Content repetition shouldn’t always be consistent, or else you risk forcing a fixed text to fit all circumstances (as if written by a committee) rather than allowing circumstances to guide what wording is needed. Content changes won’t necessarily occur in tandem everywhere; an update won’t be applied everywhere simultaneously.
The DRY attitude presumes you can anticipate the changes you will need to make in the future, which often isn’t realistic. Circumstances are sometimes sloshy and don’t change in an orderly manner. Sometimes you need to create new versions and variations to accommodate diversifying contexts; other times, a shift in context requires content changes so substantial that the revised content has drifted from its original intent.
The false dichotomy of single-source versus single-use content
Single-sourcing practices make an implicit assumption about content reusability. Either the content is intended to be reusable and will be continuously updated, or it will never be reused and therefore has a short shelf life.
Content is sorted into two types: either a single canonical version (single source) or a throwaway (single-use). Non-canonical content has no enduring value.
The philosophy of single-source reuse and revision assumes a logical content model in which content is composed of recurring patterns of distinct pieces. The content can be divided into chunks and strings that can be swapped out as needed. While some technical content is sufficiently formulaic to follow this behavior, most other content is not.
Not all content changes can be reduced to discrete, predictable variables. Frequently, writers seek to reshape existing content rather than merely revise a few words. But the content model, by design, doesn’t allow them to make sweeping changes. It exists to enforce consistency in content and to prevent authors from improvising with it.
The centrality of content modelling has been disrupted by the growing use of LLMs. Many writers question the value of single-sourcing when LLMs can make on-the-fly changes of most any kind.
Content models bind content, while LLMs make content elastic and adaptable. LLMs have exposed the limitations of structured content.
Structured content practices that made sense in the pre-LLM era now look antiquated. Structured content can facilitate simple, discrete changes and updates, but can’t, by itself, address the kinds of sweeping changes that LLMs are capable of.
Exact repetition can be tedious
Reusing content can be beneficial when governance is paramount, such as for legal compliance purposes. Organizations don’t want inconsistent legal disclaimers, for example.
But too often, exact text reuse is lazy and unimaginative, and it hurts the user’s experience with the content. People don’t pay attention when they encounter the same message repeatedly. Airlines struggle to enliven their safety videos because they know passengers tune out of messages they’ve already encountered.
Repeated text is evocatively described as boilerplate. It is standardized text you are not expected to read closely, if at all. It’s faceless text.
The problem arises when you repeat verbatim text that’s meant to express something original or novel, and you expect users to notice it. In effect, you treat the text like boilerplate, and the repetition can sound like a tiresome advertising jingle.
Reader attention is linked to a source’s credibility. In many situations, readers prefer content that reflects the perspective of a named author over content written by an anonymous corporate body. Readers expect more originality from individual authors.
If people don’t have to read the content, then they need to want to read it. An author who repeats the same content again and again in blog posts won’t gain much traction.
The growth of the subdiscipline of content design in enterprise applications shows that even anonymously created content should embody personality, delight users, and express empathy. Content of all kinds must hold the reader’s attention.
Academic content, like corporate content, is serious business. Academics prioritize originality in their content. Novelty attracts attention. Researchers don’t accrue influence or reputational points for republishing the same material repeatedly. They must publish new material.
There’s a taboo against what’s called “self-plagiarism”: reusing text you’ve already published elsewhere without acknowledging it.
But when is it appropriate to repeat statements made previously? Even the US government is interested in this issue. The National Science Foundation (NSF) funds substantial research and is interested in identifying which content in reports is new versus old. NSF funded the Text Recycling Project to develop guidelines on when it’s appropriate to reuse prior content.
The Text Recycling Project acknowledges that reusing text is sometimes necessary and even desirable, particularly when explaining prior work to provide background for new information. If the exact same text is reused, such as a table from an earlier study, the original source should be cited so that readers understand what’s legacy information and what’s new.
The Text Recycling Project developed a taxonomy of recycling. Its terminology is specific to academic researchers who release materials informally before formal publication, but it is suggestive nonetheless.

What’s worth noting is the distinction drawn between verbatim copying (duplication) and other kinds of reuse involving transformation:
- Adaptation (shifting intent)
- Generative recycling (reusing facts)
- Developmental recycling (releasing in new ways)
Reusing doesn’t have to result in duplication. It can entail making changes, building new items from old ones, and even using prior work as a template for substantially new work.
Exact repetition is not the only, or even the best, way to reuse content.
The emerging paradigm: repeat yourself, but with variation
LLMs have reshaped possibilities. Both writers and readers can now transform existing content in numerous ways.
Since the launch of the generative AI era, authors have had tools to modify content. Now readers are beginning to gain these capabilities, which are providing new insights into what readers want. One such tool is Elicit, which offers generative AI features to help scientists discover published research.
Elicit offers community-developed prompts for users, such as “Explain it Like I’m 14.” That prompt represents a specific intent that diverges from how the content was presented in the original source. LLMs let readers change the intent of a source to reflect their specific interests.

Elicit also offers tools that let readers convert text into other formats or discover related material.

Tools like Elicit show that readers want content in alternative ways from the default presentation, with:
- Different focus (broader, narrower, adjacent)
- Different pacing (incremental updates)
- Different media (diagram, video)
As readers increasingly use such AI tools, the role of editors is called into question. If editors are still needed, they must add value beyond what readers can achieve on their own.
Writers must assume a more editorial role, focused on developing a body of content rather than individual articles. They must actively leverage AI tools to shape the editorial perspective when developing content for readers.
As I have argued previously, content professionals shouldn’t delegate editorial control to third-party AI platforms; they need to take ownership of these tools so that outputs faithfully reflect their organization’s editorial perspective. Unless they do so, AI-generated outputs will be mind-numbingly dull and won’t be read.
AI tools, on their own, don’t create a great editorial experience. They won’t persuade readers to do something they weren’t already intending to do (sorry vendors). AI tools require human oversight by experienced writers to produce outputs that people want to consume.
AI tools for readers provide fresh evidence that existing content, even when of interest, won’t necessarily be in the desired format or rendering. They highlight a gap between what’s available and how readers want to consume the information.
Writers should identify which formats and renderings are in demand and create them for readers, without requiring readers to generate them themselves. Writers must apply their editorial expertise to deliver a superior experience compared to autogenerated outputs based on third-party or community-contributed LLM prompts.
The emerging approach to reusing existing content considers how to capitalize on content that’s proven successful in the past. If certain content is successful with certain groups in specific situations, how can that insight be extended to other groups and situations?
Human oversight remains critical. LLMs have a dubious reputation for being undisciplined. They can commit what’s known as unconscious plagiarism: they aren’t aware that they are lifting and repeating text verbatim. LLMs become a “Plagiarius,” a Latin term meaning “kidnapper” that was historically used to refer to a literary thief. Plagiarism is taking text as-is and claiming it as original.
But suppose writers use LLMs to appropriate existing content in a way that adds value, rather than steals it?
LLMs can help adapt content to be more relevant in two ways. First, they can modify content so that the level of detail and timing are more appropriate to a user’s context. Second, they can adapt discussions to users based on their motivations. Do they have the same goals, parallel goals, or divergent goals compared to an existing editorial framing of a topic? Some readers are experts and want deeper details, while other readers need convincing that a topic is worth paying attention to.

The online news website Axios, which delivers stories in brief, modular form, illustrates how to adapt content to meet users where they are. Many stories are updates of earlier ones, but users may only hazily remember the details. Axios uses certain devices to help readers get the context, such as:
- Catch up quick – what’s happened in the past (without all the details)?
- Flashback – what similar cases happened, and how did they turn out?
LLMs could easily generate “catch up quick” and “flashback” statements from past content.
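As a sketch of how this might work, the function below assembles a “Catch up quick” prompt from records of past coverage. The prompt wording, field names, and story records are illustrative assumptions on my part; the actual call to an LLM is left out.

```python
def catch_up_quick_prompt(topic: str, past_stories: list) -> str:
    """Build a prompt asking an LLM to recap prior coverage of a topic.

    Each past story is a dict with 'date' and 'summary' keys (an assumed
    shape, not Axios's actual data model).
    """
    background = "\n".join(
        f"- {story['date']}: {story['summary']}" for story in past_stories
    )
    return (
        f"Write a two-sentence 'Catch up quick' recap of prior coverage "
        f"of {topic}, omitting minor details.\n\n"
        f"Prior stories:\n{background}"
    )
```

A “flashback” prompt would work the same way, except it would retrieve analogous past cases rather than direct predecessors of the current story.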
LLMs change the context of content.
Neither dead nor alive: the value of legacy content
Old content tends to go to the proverbial digital landfill when it’s no longer exactly what users want. But now legacy content is appreciating in value, thanks to LLMs.
Managers of digital content should borrow (or steal) insights from the “circular economy” practices that are gaining adoption for physical products. A great overview is in a recent MIT Sloan Review article, “A New Method for Assessing Circular Business Cases” (paywall).
Traditionally, product manufacturers approached products linearly. Companies realized value only once, when products were made and sold. Once sold, the product was used and disposed of without the company’s involvement. The consumer decided when the product was worn out and no longer useful.
A circular approach to products considers alternative pathways whereby products can have second lives, yielding new revenue streams for manufacturers. A circular model identifies new opportunities for existing products through:
- Sharing (enabling more parties to use the product)
- Repairing (fixing a weakness in a product)
- Recycling (making a new product from old ones)
- Remanufacture (rebuilding an existing product)
- Regeneration (breaking down an old product into raw source material usable for alternative goals)
Products can have multiple lives after they are first made. But to take advantage of these opportunities, the producer must plan ahead.
Content managers can draw inspiration from these new lifecycle management techniques.
The archives as IP
Content professionals must shake off the perception that content has little or no value after publication. Executives see content as an expense, not an asset. Unlike branding assets, published content doesn’t have intangible value, say the accountants. Although copyrightable, online content isn’t considered intellectual property that lawyers consider worth legally defending. Predatory web scraping is rarely challenged in court.
Why retain old content if it has no value after its publication? Even many content professionals have been unable to answer that question satisfactorily, which is why few organizations have a serious process for archiving their content after it is taken offline.
Part of content’s lowly status relates to its quick depreciation. Content’s relevance decays with age. The typical half-life of digital content (the point at which the content loses half its value) can be anywhere from a week (for an announcement) to a year (for an “evergreen” topic). The content doesn’t necessarily become inaccurate. It simply becomes less relevant as user priorities and contextual environment change. Fewer users access and consume the content.
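The half-life framing can be made concrete with a simple exponential-decay model. The specific half-life figures and the idea of a single numeric relevance score are illustrative assumptions, not measured values:

```python
def relevance(days_old: float, half_life_days: float) -> float:
    """Fraction of original relevance remaining after a given age,
    assuming exponential decay with the stated half-life."""
    return 0.5 ** (days_old / half_life_days)

# An announcement (half-life of about a week) fades fast, while an
# "evergreen" topic (half-life of about a year) barely decays in a month:
announcement_after_month = relevance(30, 7)    # roughly 0.05
evergreen_after_month = relevance(30, 365)     # above 0.9
```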
The equivocal status of content value has created another false dichotomy: between live content (currently online) and dead content (content taken offline).
LLMs have blurred the distinction between living and dead content. LLMs can easily revise content, sometimes radically, and bring dead content back to life. Old content can now be reused in ways not originally intended. LLMs can address the chief reason old content loses value: declining relevance. Old content can gain new relevance.
Legacy content can take on new purposes by reaching new audiences, incorporating new developments, or supporting new initiatives.
LLMs change how content can be transformed compared to older tactics, such as “content repurposing”, a mundane content marketing tactic to amplify a piece’s reach. For example, a marketing team might hold a webinar, then create video clips from it to use in social media posts or embed in a blog post summary of the webinar. Such activities don’t really add value to the existing content; they simply spread existing value elsewhere.
The value of LLMs is not their ability to reduce the busywork of making variations and spraying content everywhere. Rather, LLMs are radical because they can tap the latent value from old content to produce something new.
Professionals who design physical objects, such as clothing or furniture, draw on design archives of past works for inspiration for developing new products. Disney and other film studios draw on their film archives when developing new film releases. Similarly, content strategists will draw on digital content archives to generate new content offerings.
Deciding when legacy content has value
The mission of content strategists will be to determine which legacy content may be relevant to users in the future. To do so, they need to rethink how content is valued.
Currently, content gets evaluated based on its external value. Are users reading the content? If not, the content is purged.
In the future, content will be evaluated based on its internal value to the organization. The current content may not be relevant to users as it is, but it could be used to create relevant content later. Just because the content has lost its current relevance doesn’t imply it won’t be useful later. Outdated content may retain internal value even after losing external value.
Not all old content will have future value. The legacy content must be unique. Duplicative content won’t help LLMs. Some legacy content may be irrelevant to the organization’s future mission.
The decision will center on what to purge (single-use content) versus what to archive (content with recycling potential).
Legacy content can hold different kinds of value. It may have editorial value. While the factual details are no longer relevant, the narrative framing of the content is powerful and can be applied to other topics. The content might have been highly successful at introducing a new topic to someone inclined to be skeptical of the idea. While the product featured might no longer be offered, the approach would be relevant for other products.
Another example is a complex explanatory graphic that was successful in promoting understanding of a topic, but whose details are no longer current.
In these cases, generative AI can remove irrelevant details and enable the reuse of the editorial structure.
A different situation is when the editorial content is no longer needed, but the informational details are. LLMs can extract information from legacy content, making it available to incorporate into future content.

Generative AI encompasses more than text-oriented LLMs. Visual Language Models can extract information from tables, graphics, photos, and PDFs. Tools such as LandingAI (shown above) can identify implicit editorial structure through layout and textual cues.
Generative AI can be used to modify existing content to maintain its relevance. But more significantly, it can extend the relevance of legacy content through regeneration. It allows legacy content to be adapted and repurposed.
Legacy content can serve as the organization’s institutional memory, providing examples of past efforts that can be leveraged in the future.
Rethinking the role of content models
I’ve long been an advocate of structured content, especially headless content management approaches. Yet my thinking has evolved in light of the radical changes occurring outside the parochial world of content management.
I’ve concluded that long-cherished ideas about content models must change, because the realities they are meant to address have changed. Best practices have a shelf life too.
LLMs have made “unstructured content” more valuable, and in so doing, have made structured content less valuable. Long-established distinctions between structured and unstructured content are becoming less meaningful.
Structured content can no longer be regarded as the preferred solution for managing content in the LLM era. It might still play a tactical, supporting role, but it is no longer the all-in-one solution as it was positioned before the arrival of LLMs.
Structured content has historically been sold on the promise that it would reduce authors’ work. A single author could output multiple versions and formats from a common file, a task otherwise impossible to perform without structured content. Yet that benefit is no longer compelling for two reasons.
First, authors’ experiences with structured content reveal that the approach creates extra work for writers even when it reduces other tasks, and many times, the burdens of authoring in structured content outweigh its benefits. For example, many technical communicators, who have been the primary targets of structured authoring, have abandoned it in favor of a docs-as-code approach. Structured content buyer’s remorse is a thing.
Second, LLMs have disrupted structured content’s monopoly on the complex assembly of content. Rather than relying on nested XSLT transformations or complicated GraphQL queries, LLMs can perform complex content transformations using plain-language prompts. Computer code is an awkward tool to shape narrative text. Often, written directives are more transformative than encoded software rules.
The adoption of structured content as the foundation of enterprise content management hit a wall because it overemphasized databases. Authors don’t think about creating content in terms of databases; instead, they’re bewildered when content is divided into fragments, because they can’t see how the fragments fit together. Authors think imperatively, in words, which is exactly the mode of interaction LLMs offer.
LLMs are changing how content is assembled and are decreasing reliance on a database of fragments to generate coherent output for readers. LLMs don’t draw a distinction between content and code; for them, it’s all just strings of text. They can write, format, and assemble text.
Content models have, at times, been idealized and granted superpowers they’ve never had. Content models don’t represent the real world of people, things, and actions. They are not ontologies that describe the physical world conceptually. They are merely tools to help manage content, and their importance is now diminishing.
The value of content models does not derive from simplifying content authoring. Structured content has always been challenging, even baffling, for authors, despite its advantages. Writing with a database is confusing, and the effort-saving benefits of content models can be outweighed by the time costs of learning and oversight.
At the same time, content models, by splitting content into modules with specific roles, can hard-code content intent, making it difficult to pivot to other purposes. Too much structure can inhibit transformative generation. Complex, token-heavy content models can be a barrier to LLMs performing transformations. LLMs are trained on web articles and prefer working with such outputs.
As LLMs take over more authoring tasks, the role of the CMS is likely to change. It will continue to store content for API-based delivery to websites and other channels. But CMSs won’t necessarily be where content is drafted or composed. Archived legacy content that LLMs might access could be stored separately, perhaps in a RAG database, and made available to the authoring interface. Customer-facing chatbots would also need access to a RAG database of content curated to work with customer-oriented prompts. The authoring interface could be something akin to Claude Code, where agents can pull resources from other systems as needed. Controlled content and dynamic variables that require structured data management may be stored elsewhere, such as in a graph database like Neo4J. Agents will drive the orchestration of content, negotiating between prompts and code.
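As a toy stand-in for the retrieval step in such a hybrid architecture, the sketch below scores archived documents by keyword overlap with a query and returns the best matches. A real RAG database would use vector embeddings rather than word matching; the archive entries and function shape here are my own assumptions.

```python
def retrieve(query: str, archive: dict, top_k: int = 2) -> list:
    """Return up to top_k archive IDs sharing the most words with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        ((len(query_words & set(text.lower().split())), doc_id)
         for doc_id, text in archive.items()),
        reverse=True,
    )
    # Keep only documents with at least one overlapping word.
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]
```

In the hybrid setup described above, an authoring agent would call something like this against the legacy archive, then pass the retrieved passages to an LLM as raw material for new content.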
The future of content management is likely to be a hybrid mix of systems rather than a single CMS. It’s hard to speculate with any certainty what it will look like, given the rapid changes underway in technology. The biggest unknown is how quickly LLM content generation improves in terms of speed, cost, and accuracy. There have been significant improvements in all these dimensions over the past year.
How much content can realistically be generated on demand, and how much will need to be pre-generated with author oversight? More content will be low-touch (generated without human oversight), but high-touch content (needing editorial oversight) will be important for content addressing high-stakes circumstances.
If LLMs are taking over more responsibility for generating content, how does this affect content models? LLMs can generate content flexibly, but struggle to do so consistently. When consistency is needed, LLMs perform better when combined with a database.
The new purpose of the content model will be to store variables required for LLM-generated content.
Content models will be viewed from the perspective of output delivery rather than their original purpose of authoring. Publishers will focus on which elements they must control. These might be elements with special accuracy requirements (such as numeric values like prices) or granular details that, for business reasons, must be optimized.
Content isn’t data in most cases. Content is like data only when data values are displayed within narratives (the customer name is inserted into a terms of service agreement), or when narratives are counted like data (the number of times a disclaimer appears is counted and compared to the number of times it is supposed to appear).
Similarly, composing content generally isn’t algorithmic in the sense of following formal decision trees. Compositional choices are more often based on subjective choices about what would appeal most to readers.
Only certain kinds of content need to be part of a content model. Content structuring is required under two conditions.
First, the content wording must be invariant, meaning it must be treated like data.
Second, the assembly of the content must be deterministic. The content shown depends on encoded rules rather than on instructional guidance. For example, a rule might exist that the statement “free shipping” does not appear if the order total is less than $40. The message’s assembly is guided by if-then code.
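The free-shipping rule just described can be written as deterministic code. The $40 threshold comes from the example; the function name and exact wording are my own:

```python
def shipping_message(order_total: float) -> str:
    """Deterministic assembly: fixed wording, chosen by an encoded
    if-then rule rather than by an LLM prompt."""
    FREE_SHIPPING_MINIMUM = 40.00  # threshold from the example above
    if order_total >= FREE_SHIPPING_MINIMUM:
        return "Free shipping"
    shortfall = FREE_SHIPPING_MINIMUM - order_total
    return f"Add ${shortfall:.2f} more to qualify for free shipping"
```

Because the rule is encoded, the output is identical every time for a given input, which is exactly the property an LLM cannot guarantee.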
Instead of structuring everything in one’s content, the goal is to structure only what’s necessary. These will be elements that authors rarely need to touch because they are largely fixed.
The short list of structured content elements would include:
- Data variables (allowed alternatives or dynamic values)
- Fixed phrasing (required wording or allowed alternatives)
- Templated boilerplate content (background – rather than foreground – content used to frame the essential information, such as explanations on how to understand a table of information)
- Combinational elements (chunks that could be used in different sequences)
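As a sketch, such a slimmed-down content model might be represented as a small set of controlled elements. The field names and example values are illustrative assumptions, not a proposed standard:

```python
from dataclasses import dataclass, field

@dataclass
class ControlledElement:
    """One structured element in a slimmed-down content model."""
    name: str
    kind: str    # "data_variable", "fixed_phrase", "boilerplate", or "chunk"
    value: str   # canonical wording or current value
    allowed_alternatives: list = field(default_factory=list)

# Only the invariant pieces are modeled; everything else stays free-form
# for LLM-driven drafting and transformation.
model = [
    ControlledElement("price", "data_variable", "$49"),
    ControlledElement("disclaimer", "fixed_phrase",
                      "Prices may change without notice."),
    ControlledElement("table_intro", "boilerplate",
                      "The table below compares plan features.",
                      allowed_alternatives=["The following table compares plans."]),
]

def lookup(name: str) -> str:
    """Fetch the controlled wording an LLM must reproduce exactly."""
    return next(e.value for e in model if e.name == name)
```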
Some of the information stored within the content model will be details extracted by LLMs from legacy content.
The promise of change
We can no longer divide content into single-source or single-use, live or dead, structured or unstructured.
The developments I’ve outlined are already happening, though sometimes in isolation from one another and at different speeds. I’m also aware that other parties are approaching content management differently, seeking a more all-in-one solution that fuses various AI technologies into a unified system, though I’m sceptical about the widespread adoption of this approach. What’s easiest to implement from a corporate IT and employee learning perspective will be what gets adopted. Whether it is a perfect system is far less important.
The changes underway will take years to become widespread practice, but they will nonetheless be disruptive rather than evolutionary, contrary to what some believe. It’s important for content professionals to look beyond their immediate domain, because broader technology developments will determine future content practices more than the decisions of content management vendors.
The future I’ve sketched entails an ecosystem that is more architecturally complex than a single CMS would be. But content professionals should not have to worry about where information is stored; that will be the agent’s responsibility. They gain the freedom to transform different dimensions of content, from broad ideas they previously used and now want to rework, to precise data that must be included exactly.
The emerging approach removes a major obstacle of structured content: the need to determine in advance which content will be reused. The prominent role of LLMs will give authors more flexibility and control over how to shape new content.
— Michael Andrews
