Redefining the Role of Content Models

(Note: this post was originally published in 2019 but became inaccessible due to a technical bug. I am republishing it without alteration.)

Content models have been around for a couple of decades. Until recently, only a handful of people interested in the guts of content management systems cared much about them. Lately, they are gaining more attention in the information architecture and content strategy communities, in part due to Carrie Hane and Mike Atherton’s recent book, Designing Connected Content. The growing attention to content models is also revealing how the concept can be interpreted in different ways as content models get mixed into broader discussions about structured content. And ironically, at a time when interest in content models is growing, some of the foundational ideas about content models are aging poorly. I believe the role of content models needs to be redefined so they can better serve changing needs.

This post will cover several issues:

How content models can be poorly designed
Defining content models in terms of components
Criteria for what content should be included in a content model
Editorial benefits of content models

In my previous post, I argued for the benefits of separating 1.) the domain model covering data about entities appearing within content from 2.) the content model governing expressive content. This post will discuss what content models should do. Content models have greater potential than they offer currently. To realize their full potential, content models will require redefining their purpose and construction, taking out pieces that add little value and adding in new pieces that can be useful.

Content Models in Historical Perspective

Bob Boiko’s highly influential Content Management Bible, published in 2002, provides one of the first detailed explanations of content models (p. 843):

“Database developers create data models (for database schema). These models establish how each table in the database is constructed and how it relates to other tables in the database. XML developers create DTDs (or XML Schema). DTDs establish how each element in the XML file is constructed and how it relates to other elements in the file. CMS developers create content models that serve the same function — they establish how each component is constructed and how it relates to the other components in the system.”

Content models haven’t changed much since Boiko wrote that nearly two decades ago. Indeed, many CMSs haven’t changed much either during that time. (Many of today’s popular CMSs date from around the time Boiko wrote his book.) Boiko likens the components of a content model to the tables of a database. His analogy implies that a content model is the schema of the content, because CMSs historically have served as a monolithic database for content. While content strategists may be inclined to think that all content comes from the CMS, that is no longer entirely true. Two significant developments have eroded the primacy of the CMS: first APIs, and more recently graph databases that store metadata. While very different, both APIs and graph databases allow data or text to be accessed laterally, often with a simple “GET” command, instead of requiring a traversal of a hierarchy of attributes used in XML and the HTML DOM.

APIs allow highly specific information to be pulled in from files located elsewhere, while graphs allow different combinations of attributes to be stitched together as and when they are needed. Both are flexible “just-in-time” ways of getting specific information directly from multiple sources. Even though content may now come from many sources, content models are still designed as if they were tables in a traditional database. The content model is not a picture of what’s in the CMS. The CMS is no longer a single source repository of all content resources. Content models need to evolve.

Content structure does not entirely depend on arranging material into hierarchies. Hierarchies still have a role in content models, but they are often over-emphasized. It’s not so important to represent in a content model how information gets stored, as may have been true in the past. It’s more important to represent what information is needed.

Content models have great potential to support editorial decision making. But existing forms of content models don’t really capture the right elements. They may specify chunks of content that don’t belong in a content model.

How Content Models can be poorly designed

Content models reflect the expertise and judgments of those designing them. Designers may have slightly different ideas about what a content model represents, and why elements are included in a content model. Content models may capture the wrong elements. They sometimes can include too much structure. When that’s the case, it creates unnecessary work, and sometimes makes the content less flexible.

Many discussions about content models will refer to the relationship between chunks of content or blocks of text. These relationships are likened to the fields used in a database. Such familiar terms avoid wonky jargon. But they can be misleading.

Many content strategists talk about chunks as the building blocks of content models. It’s common to refer to the structuring of content as “chunking” it. It’s accessible as a metaphor: we can think visually about a chunk, like a chunk of a chocolate bar. But the metaphor is still abstract and can evoke different ideas about when and why chunking content is desirable. People don’t agree on what a chunk actually is, and they may not even realize they are disagreeing. At least three different perspectives about chunks exist:

Chunks as cohesive units of information: they are independent of any context (the Chunks of Information perspective)
Chunks as discrete segments that audiences consume: they depend on the audience context (the Chunks of Meaning perspective)
Chunks as elements or attributes managed by the CMS: they depend on the IT context (the Chunks of a Database perspective)

Each of these perspectives is valuable, but slightly different. While it is useful to address all these dimensions, it can be hard to optimize for all three together. Each perspective assumes a different rationale for why content is structured.

“Chunks of Information” Perspective

When chunks are considered units of information, it can lead to too much structuring in the content model. Because the chunk is context-independent, it can be loaded down with lots of details just in case they are needed in a certain scenario. Unless specific rules are written for every case when the chunk is displayed, all the information in the chunk gets displayed to audiences. In many cases that’s overkill; audiences only want some of the information. Chunks get overloaded with details (often nested details): the content model is trying to manage field-level information that belongs in the domain model and that should be managed with metadata vocabularies. Metadata vocabularies allow rich data to be displayed as and when it is needed (see chapter 13 of my new book, No More Silos: Metadata Strategy for Online Publishing). Content models, in contrast, often expose all the data all the time.

Another symptom of too much detail is when chunks get broken out to add completeness. Some content models apply the MECE standard: mutually exclusive, collectively exhaustive. While logically elegant, they make assumptions about what content is actually needed. For example, on a recipe website, each recipe might indicate any allergens associated with the ingredients. That’s potentially useful content. One can filter out recipes that have offending allergens. But it doesn’t follow that each allergen deserves its own profile, indicating all the recipes that contain peanuts, for example.

Sometimes content models add such details because one can, and because it seems like it would be more complete to do so. It can lead to page templates that display content sorted according to minor attributes that deliver little audience or business value. The problem is most noticeable when the content aims to be encyclopedic, presenting all details in any combination. Some content models promote the creation of collections of lists of things that few people are interested in. Content models are most effective when they identify content to pull in where it is actually needed, rather than push out to someplace it might be needed.

“Chunks of Meaning” Perspective

Focusing on audience needs sounds like a better approach to avoid unnecessary structuring. But when editorial needs guide the chunking process, it can lead to another problem: phantom chunks in the content model. These are pieces of content that might look like chunks in a content model. But they don’t behave like those used in a content model.

The concept of chunks is also used in structured authoring, which has a different purpose than content modeling. Segmenting content is a valuable approach to content planning. Segmenting allows content to be prioritized and organized around headings. It can help improve both the authoring and audience experience. But most segmenting is local to the content type being designed. Segments won’t be reused elsewhere, and they don’t reuse specific values. Segmenting helps readers understand the whole better. But each segment still depends on the whole to be understood. It’s not truly an independent unit of meaning.

“Chunks of a Database” Perspective

Chunks are also viewed as elements managed by a CMS: the fields of a database. They may be blocks of text (such as paragraphs) or nested sets of fields (such as an address). But blocks may not be the right unit to represent in a content model. When defined as a block, data (entity values) gets locked into a specific presentation. When this data is baked into the content model as a block, the publisher can’t easily show only some of the data if that’s all that’s required.

Nesting makes sense when dependencies between information elements exist. But ideally, the model should present content elements that are independent and that can be used in flexible ways. As mentioned in my previous post, content models can become confusing when they show the properties of entities mentioned in the content as being attributes of the content.

When the focus of a content model is on blocks of text, it can be to the exclusion of other kinds of elements such as photos, links to video or audio, or message snippets. Moreover, only certain kinds of text blocks are likely to be components in a content model. Not all blocks of text get reused. And not all text that gets reused is a block.

Generally, long blocks of text are difficult to reuse. They aren’t likely to vary in regular ways as would short labels. Although it is possible to specify alternative paragraphs to present to audiences, it is not common. The opportunity to use text blocks as components mostly arises when wanting to use the same text block in different content types to support different use cases.

In summary, different people think about chunks differently because they are motivated by different goals for structuring content. While all those goals are valid, they are not all relevant to the purpose of content modeling. The purpose of a content model is not to break down content. The purpose of a content model is to enable different elements of content to be pieced together in different ways. If the model breaks the content into too many pieces, or into pieces that can’t be used widely, the model will be difficult to use.

It is easy to break content apart. It is much harder to piece together content elements into a coherent whole. But if done judiciously, content models can provide richer meaning to the content delivered to audiences.

What precisely does a Content Model represent?

Because chunks are considered in different ways, it is necessary to define the elements of a content model more precisely. Like Boiko, I will refer to these elements as components, instead of as attributes or as blocks.

Content models specify content components that can be presented in different ways in different contexts. The components must be managed programmatically by IT systems. Importantly, a content component is not a free-text field, where anything can be entered but is never reused. A content model does not present potential relationships between content items. It is not a planning or discovery tool. It should show actual choices that will be available to content creators to present to audiences.

Content components are content variables. If the chunk isn’t a variable, it’s not a content component.

Think about a content variable as a predefined variant of content. If the content is an image of a product, the variants might be images showing different views of the product, or perhaps different sizes for the images. The image of the product is a content component. It is a variable value. People conventionally think about variables as data. They should broaden their thinking. Content variables are any use of content where there’s an option about which version to use, or where to use it.

A content model is useful because it shows what content values are variable. Content values are expressive when they vary in predictable or regular ways.

A chunk is a component only if it satisfies at least one of two criteria:

The component varies in a recurring way, and/or
The component will be reused in more than one context.

Content components can be either local or global variables. Content components are local variables when used in one context only. The component presents alternative variations, or it is optional in different scenarios. Content components are global variables when they can be used in different contexts.

We can summarize whether a chunk is a content component in a matrix:

CONTEXT	VALUE IS FIXED	VALUE IS VARIABLE
Value is local to one context	Not a component	Component
Value is global: used in more than one context	Component	Component
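
The matrix can be read as a single rule: a chunk is excluded only when its value is both fixed and local to one context. Here is a minimal Python sketch of that rule. The function name and its two boolean parameters are illustrative assumptions, not part of any CMS or modeling tool.

```python
def is_component(value_is_variable: bool, used_in_multiple_contexts: bool) -> bool:
    """Return True if a chunk qualifies as a content component.

    Mirrors the matrix: only a fixed value that is local to one
    context fails both criteria.
    """
    return value_is_variable or used_in_multiple_contexts


# The four cells of the matrix (the labels are illustrative).
cells = {
    "local, fixed": is_component(False, False),
    "local, variable": is_component(True, False),
    "global, fixed": is_component(False, True),
    "global, variable": is_component(True, True),
}

for label, result in cells.items():
    print(f"{label}: {'Component' if result else 'Not a component'}")
```

Expressing the test as a plain "or" makes the point that either criterion alone is enough to qualify a chunk.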

Content components are the content model’s equivalent of a domain model’s enumerated values. Enumerated values are the list of allowed values in a domain model (sometimes called a controlled vocabulary, or colloquially, pick-list values). Enumerated values are names of attribute choices: the names of colors, sizes, geographic regions, etc. They are small bits of data that can be aggregated, filtered upon, and otherwise managed.

In the case of a content model, the goal is to manage pieces of content rather than pieces of data. Generally, the pieces of content will be larger than the data. The components can be paragraphs or images. These components behave differently from the data in a domain model. One can’t filter on content values (unlike data values). And it will be rare that one aggregates content values. The benefit of a content variable is that one creates rules for when and where to display the component.

Let’s consider the variation in content according to three levels, which I will call repetitive, expressive, and distinctive. These terms are just labels to help us think about how fixed or variable content is. They aren’t meant to be value judgments.

Repetitive content refers to content that is fixed in place. It always says the same thing in the same way in one specific context. The meaning and the style are locked down — there’s no variation. For example, the welcome announcement and jingle for a podcast may always be the same each week, even though the program that follows the intro will be different every week. The welcome announcement is specific to the podcast, and is not used in other kinds of content.

Expressive content refers to how content variation changes the meaning for audiences. It considers variation in the components chosen. Variation can happen within components, and across different content incorporating those components. Expressive content also resembles what programmers call expressions, which are evaluated to produce values. With expressive content, the key question is knowing the right value to use: choosing the right content variation.

With distinctive content, no two content items are the same. This blog post is distinctive content, because none of the material has been reused, or will be reused. The body of most articles is distinctive content, even if one can segment it into discrete parts.

It’s important to recognize that a content model is just one of the tools available to organize the structuring of content. Other tools include content templates and text editors.

Let’s focus on the “body field” — the big blob of text in much online content. It can be structured in different ways. Not all editorial structuring involves content components. An article might have a lead paragraph. That paragraph may be guided by specific criteria. It must address who the article is for, and what benefit the article offers the reader. But that lead is specific to the article. It is part of the article’s structure, but not an independent structure that’s used in other contexts.

The same article might have a summary paragraph. Unless the summary is used elsewhere, it is also not a content component. The summary may be standalone content that could be used in other contexts, although I’ve seen plenty of cases where reusing a summary that way did not produce a great user experience.

These segments of an article help readers understand the purpose of the content, and help writers plan the content. But they aren’t part of the content model. Such segmentation belongs in the text editor where the content is created.

Consider a different example of a content chunk. Many corporate press releases have an impact on the price of company shares. Companies routinely put a “forward earnings” disclaimer at the end of each press release. This disclaimer is only applicable to press releases, and the wording is fixed. The disclaimer is not a content component that varies or is used in other contexts. It should be incorporated into the content template for press releases.

KIND OF TEXT	VARIABILITY	WHERE TO SPECIFY OR MANAGE
Repetitive	Consistent text for one context only	Template — hardwired
Expressive	Same text used in multiple contexts or regularly variable text used in at least one context	Content Model
Distinctive	All content is unique: not reused in different contexts	Structured guidelines in text editor
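
The table can also be sketched as a small decision function. This Python example is only an illustration under stated assumptions: the `TextKind` enum, the `classify_text` function, and its three parameters are hypothetical names introduced here to restate the table, not an established API.

```python
from enum import Enum


class TextKind(Enum):
    # Each value records where that kind of text is specified or managed.
    REPETITIVE = "Template (hardwired)"
    EXPRESSIVE = "Content model"
    DISTINCTIVE = "Structured guidelines in text editor"


def classify_text(context_count: int, varies_regularly: bool, wording_is_fixed: bool) -> TextKind:
    """Classify a piece of text using the three levels in the table.

    context_count: number of contexts in which the same text appears.
    varies_regularly: True if predefined variants are swapped in by rule.
    wording_is_fixed: True if the wording is identical every time it appears.
    """
    if context_count > 1 or varies_regularly:
        return TextKind.EXPRESSIVE
    if wording_is_fixed:
        return TextKind.REPETITIVE
    return TextKind.DISTINCTIVE


# Examples drawn from the surrounding text.
disclaimer = classify_text(context_count=1, varies_regularly=False, wording_is_fixed=True)
article_body = classify_text(context_count=1, varies_regularly=False, wording_is_fixed=False)

print("Press release disclaimer:", disclaimer.name, "->", disclaimer.value)
print("Article body:", article_body.name, "->", article_body.value)
```

The press release disclaimer lands in the template and the article body in the text editor. Only text that is reused or regularly varied reaches the content model.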

The content model is only one tool of many available to structure content in the broader sense. The content model only addresses variable content components. The content model doesn’t define the entire structure of the content that audiences see. The content model helps support templates, but doesn’t define all the elements in a template or the organization of the wireframes. The structure of the authoring environment may draw on components available in the content model, but it will segment content characteristics that won’t be part of the content model.

What should be included in a Content Model?

Components are meaningful objects. They can change in meaning. They can create new meaning when combined in different ways. They aren’t simply empty containers or placeholders for material to present.

Content models provide guidance for two decisions:

Where can a component be used? — the available contexts
Which variation can be used? — the available variants

The components within a content model can be of three varieties:

Statements
Phrases
Media Assets

Statements

Statements are sentences, paragraphs, or sections composed of several paragraphs. Structurally, they can be sections, asides, call outs, quotes, and so on. Statements will often be long blocks of text. In some cases there will be variations of the blocks of text for different audience segments or regions. Other times, there will be no variation in the text, but it will appear in more than one context.

An example of how statements can vary in a single context would be if an explanation about customer legal rights changed depending on whether the customer was based in the US or the UK. The substance of the content component changes.

An example of how a statement can be used in multiple contexts is a disclosure about pricing and availability. A publisher may need to include the statement “pricing and availability subject to change” in many different contexts.
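
Both examples can be represented with one simple data structure: a component that lists its predefined variants and the contexts allowed to use it. The following Python sketch is hypothetical. The `Component` class, the context names, and the placeholder legal wording are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A content component: predefined variants plus the contexts that may use it."""
    name: str
    variants: dict[str, str]                         # variant key -> content
    contexts: set[str] = field(default_factory=set)  # content types allowed to use it

    def resolve(self, context: str, variant: str = "default") -> str:
        """Return the requested variant, but only for an allowed context."""
        if context not in self.contexts:
            raise ValueError(f"{self.name!r} is not available in context {context!r}")
        if variant not in self.variants:
            raise KeyError(f"{self.name!r} has no variant {variant!r}")
        return self.variants[variant]


# One context, several variants (the wording is a placeholder, not real legal text).
legal_rights = Component(
    name="customer_legal_rights",
    variants={
        "US": "Placeholder explanation of legal rights for US customers.",
        "UK": "Placeholder explanation of legal rights for UK customers.",
    },
    contexts={"returns_policy"},
)

# One variant, several contexts (the context names are hypothetical).
pricing = Component(
    name="pricing_disclosure",
    variants={"default": "Pricing and availability subject to change."},
    contexts={"product_page", "promotional_email", "catalog_listing"},
)

print(legal_rights.resolve("returns_policy", "UK"))
print(pricing.resolve("product_page"))
```

Keeping the variants and the allowed contexts on the same object reflects the two decisions a content model guides: where a component can be used, and which variation can be used.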

Phrases

A content component may be a phrase or managed fragment of text.

Phrasing has become much more important, especially with the rise of UX writing. Some wording is significant, because it relates to a big “moment of truth” for the audience: a decision they need to make, or a notification that’s important to them. Specific phrasing may be continually refined to reflect brand voice and terminology, and to ensure customer acceptance. It may be optimized through A/B testing.

In contrast to variations in statements, which generally relate to differences in meaning or substance, the variation in phrasing generally relates to wording, tone, or style.

Some examples of managed phrases include:

Taglines
Value proposition variations
Labels for forms
Terminology and wording to use in feedback or confirmation messages

The recent emergence of design systems for UX writing is promoting the reuse of text phrases. UX writing design systems can indicate a preferred messaging that conforms to editorial decisions about branding, or that performs better with audiences. Although it is not currently common to do so, such reusable text can be included in the content model.

Phrases may not be managed within a CMS. They could be in an external file that is called to insert the correct phrase. Again, the content model should not be restricted to content managed by the CMS.
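
As a rough illustration of phrases living outside the CMS, the Python sketch below reads managed phrases from a JSON document and looks one up by key. Everything in it is an assumption: the file format, the keys, the variant names, and the wording are invented for the example, and an inline string stands in for the external file.

```python
import json

# Stand-in for an external phrase file; in practice this text might be
# read from disk or fetched from another system. All keys and wording
# are hypothetical.
PHRASE_FILE = """
{
  "cta.subscribe": {
    "formal": "Subscribe to our newsletter",
    "casual": "Keep me posted"
  },
  "message.thank_you": {
    "default": "Thank you for your order."
  }
}
"""

phrases = json.loads(PHRASE_FILE)


def get_phrase(key: str, variant: str = "default") -> str:
    """Look up a managed phrase, falling back to the first listed variant."""
    variants = phrases[key]
    if variant in variants:
        return variants[variant]
    # No such variant: use the first one defined in the file.
    return next(iter(variants.values()))


print(get_phrase("cta.subscribe", "casual"))
print(get_phrase("message.thank_you"))
```

A template or application would call `get_phrase` rather than hard-coding the wording, so a refined phrase only needs to change in one place.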

Let’s consider how phrases can be content components.

PHRASE TYPE	CONTENT VARIATION?	USED IN MULTIPLE CONTEXTS?
Error messages	No	Yes
CTA	Yes	Yes
Thank you message	No	Yes

Media Assets

Media assets offer a wide range of content variation. Different media assets can present either different substance, or present the same substance using a different style. Different photos might show different entities, or they might present the same entity in different ways.

Because content models have historically been closely identified with text, they have not always represented other forms of content such as media. Media assets are often stored in other systems, and may not be viewed as content variables when authors are focused on text within a text editor. As a result, media assets can sometimes be second-class citizens in a content-first process. As is the case with phrasing, media assets don’t appear in many content models currently, although they should.

Media assets include:

Alternative images
Optional videos
Maps
Content widgets, such as calculators

Let’s consider how media assets can be content components.

ASSET TYPE	CONTENT VARIATION?	USED IN MULTIPLE CONTEXTS?
Campaign Logo	No	Yes
Product Images	Yes	Yes
Video Explainer	No	Yes
Welcome Video	Yes	No
Hero Image for Services Pages	Yes	No

Editorial Benefits of a Content Model

Content models help content creators focus on the scenarios in which specific elements of content will be used.

Content models help content designers decide what content is global content that will be used in different contexts, and what content needs to work alongside other content.

For authors, content models operationalize prior editorial decisions and save effort. Publishers may already have approved language for a statement. Certain phrasing may already have been tested and optimized, so that text can be reused instead of recreated. Content models provide authors with guidance. They gain an ability to select different options: for example, choose one of these five images that relate to the rest of the content.

Models also offer the possibility of improving the stock of content components that get widely used. Several different phrases could be shown in notifications — providing some variety to messaging around routine tasks. A new phrase could be added to the model and tested. If well received, it could be added to the roster.

Content models enable better content governance. By defining and managing variations in content, communication can be optimized across channels. Content models prevent unplanned variation. They help to unify content resources that may be stored on different systems.

It’s time to elevate the role of content models. Editorial planning involves choosing the right information and presenting it in the right way. Content models can capture variations relating to both substance and style. Content models can do much to support editorial decisions.

—Michael Andrews