Month: February 2019

Redefining the Role of Content Models

By Michael Andrews
February 28, 2019

(Note: this post was originally published in 2019 but became inaccessible due to a technical bug. I am republishing it without alteration.)

Content models have been around for a couple of decades. Until recently, only a handful of people interested in the guts of content management systems cared much about them. Lately, they are gaining more attention in the information architecture and content strategy communities, in part due to Carrie Hane’s and Mike Atherton’s recent book, Designing Connected Content. The growing attention to content models is also revealing how the concept can be interpreted in different ways as content models get mixed into broader discussions about structured content. And ironically, at a time when interest in content models is growing, some of the foundational ideas about content models are aging poorly. I believe the role of content models needs to be redefined so they can better serve changing needs.

This post will cover several issues:

How content models can be poorly designed
Defining content models in terms of components
Criteria for what content should be included in a content model
Editorial benefits of content models

In my previous post, I argued for the benefits of separating 1.) the domain model covering data about entities appearing within content from 2.) the content model governing expressive content. This post will discuss what content models should do. Content models have greater potential than they offer currently. To realize their full potential, content models will require redefining their purpose and construction, taking out pieces that add little value and adding in new pieces that can be useful.

Content Models in Historical Perspective

Bob Boiko’s highly influential Content Management Bible, published in 2002, provides one of the first detailed explanations of content models (p. 843):

“Database developers create data models (for database schema). These models establish how each table in the database is constructed and how it relates to other tables in the database. XML developers create DTDs (or XML Schema). DTDs establish how each element in the XML file is constructed and how it relates to other elements in the file. CMS developers create content models that serve the same function — they establish how each component is constructed and how it relates to the other components in the system.”

Content models haven’t changed much since Boiko wrote that nearly two decades ago. Indeed, many CMSs haven’t changed much either during that time. (Many of today’s popular CMSs date from around the time Boiko wrote his book.) Boiko likens the components of a content model to the tables of a database. Boiko implies through his analogy that a content model is the schema of the content, because CMSs historically have served as a monolithic database for content. While content strategists may be inclined to think that all content comes from the CMS, that is no longer entirely true. Two significant developments have eroded the primacy of the CMS: first APIs, and more recently graph databases that store metadata. While very different, both APIs and graph databases allow data or text to be accessed laterally, often with a simple “GET” command, instead of requiring a traversal of the hierarchy of attributes used in XML and the HTML DOM.

APIs allow highly specific information to be pulled in from files located elsewhere, while graphs allow different combinations of attributes to be stitched together as and when they are needed. Both are flexible “just-in-time” ways of getting specific information directly from multiple sources. Even though content may now come from many sources, content models are still designed as if they were tables in a traditional database. The content model is not a picture of what’s in the CMS. The CMS is no longer a single source repository of all content resources. Content models need to evolve.

Content structure does not entirely depend on arranging material into hierarchies. Hierarchies still have a role in content models, but they are often over-emphasized. It’s not so important to represent in a content model how information gets stored, as may have been true in the past. It’s more important to represent what information is needed.

Content models have great potential to support editorial decision making. But existing forms of content models don’t really capture the right elements. They may specify chunks of content that don’t belong in a content model.

How Content Models can be poorly designed

Content models reflect the expertise and judgments of those designing them. Designers may have slightly different ideas about what a content model represents, and why elements are included in a content model. Content models may capture the wrong elements. They sometimes can include too much structure. When that’s the case, it creates unnecessary work, and sometimes makes the content less flexible.

Many discussions about content models will refer to the relationship between chunks of content or blocks of text. These relationships are likened to the fields used in a database. Such familiar terms avoid wonky jargon. But they can be misleading.

Many content strategists talk about chunks as the building blocks of content models. It’s common to refer to the structuring of content as “chunking” it. It’s accessible as a metaphor: we can think visually about a chunk, like a chunk of a chocolate bar. But the metaphor is still abstract and can evoke different ideas about when and why chunking content is desirable. People don’t agree on what a chunk actually is, and they may not even realize they are disagreeing. At least three different perspectives about chunks exist:

Chunks as cohesive units of information: they are independent of any context (the Chunks of Information perspective)
Chunks as discrete segments that audiences consume: they depend on the audience context (the Chunks of Meaning perspective)
Chunks as elements or attributes managed by the CMS: they depend on the IT context (the Chunks of a Database perspective)

Each of these perspectives is valuable, but slightly different. While it is useful to address all these dimensions, it can be hard to optimize for all three together. Each perspective assumes a different rationale for why content is structured.

“Chunks of Information” Perspective

When chunks are considered units of information, it can lead to too much structuring in the content model. Because the chunk is context-independent, it can be loaded down with lots of details just in case they are needed in a certain scenario. Unless specific rules are written for every case when the chunk is displayed, all the information in the chunk gets displayed to audiences. In many cases that’s overkill; audiences only want some of the information. Chunks get overloaded with details (often nested details): the content model is trying to manage field-level information that belongs in the domain model and that should be managed with metadata vocabularies. Metadata vocabularies allow rich data to be displayed as and when it is needed (see chapter 13 of my new book, No More Silos: Metadata Strategy for Online Publishing). Content models, in contrast, often expose all the data all the time.

Another symptom of too much detail is when chunks get broken out to add completeness. Some content models apply the MECE standard: mutually exclusive, collectively exhaustive. While logically elegant, they make assumptions about what content is actually needed. For example, on a recipe website, each recipe might indicate any allergens associated with the ingredients. That’s potentially useful content. One can filter out recipes that have offending allergens. But it doesn’t follow that each allergen deserves its own profile, indicating all the recipes that contain peanuts, for example.

Sometimes content models add such details because one can, and because it seems like it would be more complete to do so. It can lead to page templates that display content sorted according to minor attributes that deliver little audience or business value. The problem is most noticeable when the content aims to be encyclopedic, presenting all details in any combination. Some content models promote the creation of collections of lists of things that few people are interested in. Content models are most effective when they identify content to pull in where it is actually needed, rather than push out to someplace it might be needed.

“Chunks of Meaning” Perspective

Focusing on audience needs sounds like a better approach to avoid unnecessary structuring. But when editorial needs guide the chunking process, it can lead to another problem: phantom chunks in the content model. These are pieces of content that might look like chunks in a content model. But they don’t behave like those used in a content model.

The concept of chunks is also used in structured authoring, which has a different purpose than content modeling. Segmenting content is a valuable approach to content planning. Segmenting allows content to be prioritized and organized around headings. It can help improve both the authoring and audience experience. But most segmenting is local to the content type being designed. Segments won’t be reused elsewhere, and they don’t reuse specific values. Segmenting helps readers understand the whole better. But each segment still depends on the whole to be understood. It’s not truly an independent unit of meaning.

“Chunks of a Database” Perspective

Chunks are also viewed as elements managed by a CMS: the fields of a database. They may be blocks of text (such as paragraphs) or nested sets of fields (such as an address). But blocks may not be the right unit to represent in a content model. When defined as a block, data (entity values) gets locked into a specific presentation. When this data is baked into the content model as a block, the publisher can’t easily show only some of the data if that’s all that’s required.
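To make the contrast concrete, here is a minimal Python sketch. The address values and the names `address_block`, `address_entity`, and `show` are hypothetical, invented only for illustration; the property names are borrowed from schema.org’s PostalAddress type as an assumption about how the domain data might be labeled.

```python
# Hypothetical example data: the same address stored two ways.

# 1. Baked into the content model as a block: the presentation is locked in.
address_block = "Acme Ltd, 12 High Street, Leeds LS1 4AB, United Kingdom"

# 2. Held as entity values (domain data): each property is addressable.
address_entity = {
    "name": "Acme Ltd",
    "streetAddress": "12 High Street",
    "addressLocality": "Leeds",
    "postalCode": "LS1 4AB",
    "addressCountry": "United Kingdom",
}


def show(entity: dict[str, str], properties: list[str]) -> str:
    """Return only the requested properties, in the order requested."""
    return ", ".join(entity[p] for p in properties)


# A listing page may need only the city and country.
print(show(address_entity, ["addressLocality", "addressCountry"]))
# Getting the same result from address_block would require parsing free text.
```

With the block, every display gets the full string or nothing; with entity values, each view can ask for just the properties it needs.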

Nesting makes sense when dependencies between information elements exist. But ideally, the model should present content elements that are independent and that can be used in flexible ways. As mentioned in my previous post, content models can become confusing when they show the properties of entities mentioned in the content as being attributes of the content.

When the focus of a content model is on blocks of text, it can be to the exclusion of other kinds of elements such as photos, links to video or audio, or message snippets. Moreover, only certain kinds of text blocks are likely to be components in a content model. Not all blocks of text get reused. And not all text that gets reused is a block.

Generally, long blocks of text are difficult to reuse. They aren’t likely to vary in regular ways as would short labels. Although it is possible to specify alternative paragraphs to present to audiences, it is not common. The opportunity to use text blocks as components mostly arises when wanting to use the same text block in different content types to support different use cases.

In summary, different people think about chunks differently because they are motivated by different goals for structuring content. While all those goals are valid, they are not all relevant to the purpose of content modeling. The purpose of a content model is not to break down content. The purpose of a content model is to enable different elements of content to be pieced together in different ways. If the model breaks the content into too many pieces, or into pieces that can’t be used widely, the model will be difficult to use.

It is easy to break content apart. It is much harder to piece together content elements into a coherent whole. But if done judiciously, content models can provide richer meaning to the content delivered to audiences.

What precisely does a Content Model represent?

Because chunks are considered in different ways, it is necessary to define the elements of a content model more precisely. Like Boiko, I will refer to these elements as components, instead of as attributes or as blocks.

Content models specify content components that can be presented in different ways in different contexts. The components must be managed programmatically by IT systems. Importantly, a content component is not a free-text field where anything can be entered but nothing will ever be reused. A content model does not present potential relationships between content items. It is not a planning or discovery tool. It should show actual choices that will be available to content creators to present to audiences.

Content components are content variables. If the chunk isn’t a variable, it’s not a content component.

Think about a content variable as a predefined variant of content. If the content is an image of a product, the variants might be images showing different views of the product, or perhaps different sizes for the images. The image of the product is a content component. It is a variable value. People conventionally think about variables as data. They should broaden their thinking. Content variables are any use of content where there’s an option about which version to use, or where to use it.

A content model is useful because it shows what content values are variable. Content values are expressive when they vary in predictable or regular ways.

A chunk is a component only if it satisfies at least one of two criteria:

The component varies in a recurring way, and/or
The component will be reused in more than one context.

Content components can be either local or global variables. Content components are local variables when used in one context only. The component presents alternative variations, or it is optional in different scenarios. Content components are global variables when they can be used in different contexts.

We can summarize whether a chunk is a content component in a matrix:

CONTEXT	VALUE IS FIXED	VALUE IS VARIABLE
Value is local to one context	Not a component	Component
Value is global: used in more than one context	Component	Component
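The matrix above reduces to a single rule, which can be sketched in a few lines of Python. The function name `is_component` and its two parameters are illustrative assumptions, not terms from any CMS.

```python
from itertools import product


def is_component(varies: bool, is_global: bool) -> bool:
    """A chunk is a component if its value varies, or if it is used in more than one context."""
    return varies or is_global


# Enumerate all four cells of the matrix.
for is_global, varies in product([False, True], repeat=2):
    scope = "global" if is_global else "local"
    value = "variable" if varies else "fixed"
    verdict = "component" if is_component(varies, is_global) else "not a component"
    print(f"{scope:6} / {value:8} -> {verdict}")
```

Only one cell fails the test: a fixed value that is local to a single context.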

Content components are the content model’s equivalent of a domain model’s enumerated values. Enumerated values are the list of allowed values in a domain model (sometimes called a controlled vocabulary, or colloquially known as pick-list values). Enumerated values are names of attribute choices: the names of colors, sizes, geographic regions, etc. They are small bits of data that can be aggregated, filtered upon, and otherwise managed.

In the case of a content model, the goal is to manage pieces of content rather than pieces of data. Generally, the pieces of content will be larger than the data. The components can be paragraphs or images. These components behave differently from the data in a domain model. One can’t filter on content values (unlike data values). And it will be rare that one aggregates content values. The benefit of a content variable is that one creates rules for when and where to display the component.

Let’s consider the variation in content according to three levels, which I will call repetitive, expressive, and distinctive. These terms are just labels to help us think about how fixed or variable content is. They aren’t meant to be value judgments.

Repetitive content refers to content that is fixed in place. It always says the same thing in the same way in one specific context. The meaning and the style are locked down — there’s no variation. For example, the welcome announcement and jingle for a podcast may always be the same each week, even though the program that follows the intro will be different every week. The welcome announcement is specific to the podcast, and is not used in other kinds of content.

Expressive content refers to how content variation changes the meaning for audiences. It considers variation in the components chosen. Variation can happen within components, and across different content incorporating those components. Expressive content also resembles the programming concept of an expression, which is evaluated to produce a value. With expressive content, the key question is which value is the right one to use: choosing the right content variation.

With distinctive content, no two content items are the same. This blog post is distinctive content, because none of the material has been reused, or will be reused. The body of most articles is distinctive content, even if one can segment it into discrete parts.

It’s important to recognize that a content model is just one of the tools available for structuring content. Other tools include content templates and text editors.

Let’s focus on the “body field” — the big blob of text in much online content. It can be structured in different ways. Not all editorial structuring involves content components. An article might have a lead paragraph. That paragraph may be guided by specific criteria. It must address who the article is for, and what benefit the article offers the reader. But that lead is specific to the article. It is part of the article’s structure, but not an independent structure that’s used in other contexts.

The same article might have a summary paragraph. Unless the summary is used elsewhere, it is also not a content component. A summary can be standalone content that is reused in other contexts, although I’ve seen plenty of cases where doing so has not produced a great user experience.

These segments of an article help readers understand the purpose of the content, and help writers plan the content. But they aren’t part of the content model. Such segmentation belongs in the text editor where the content is created.

Consider a different example of a content chunk. Many corporate press releases have an impact on the price of company shares. Companies routinely put a “forward earnings” disclaimer at the end of each press release. This disclaimer is only applicable to press releases, and the wording is fixed. The disclaimer is not a content component that varies or is used in other contexts. It should be incorporated into the content template for press releases.

KIND OF TEXT	VARIABILITY	WHERE TO SPECIFY OR MANAGE
Repetitive	Consistent text for one context only	Template — hardwired
Expressive	Same text used in multiple contexts or regularly variable text used in at least one context	Content Model
Distinctive	All content is unique: not reused in different contexts	Structured guidelines in text editor
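As a rough sketch, the table above can be read as a routing rule that sends each piece of text to the tool that should manage it. The enum `ManagedIn`, the function `where_to_manage`, and its parameters are hypothetical names chosen for this illustration.

```python
from enum import Enum


class ManagedIn(Enum):
    TEMPLATE = "template (hardwired)"
    CONTENT_MODEL = "content model"
    TEXT_EDITOR = "structured guidelines in the text editor"


def where_to_manage(fixed_wording: bool, regular_variants: bool, contexts: int) -> ManagedIn:
    """Route a piece of text to the tool that should manage it."""
    if regular_variants or contexts > 1:
        return ManagedIn.CONTENT_MODEL   # expressive: reused or regularly variable
    if fixed_wording:
        return ManagedIn.TEMPLATE        # repetitive: same text, one context
    return ManagedIn.TEXT_EDITOR         # distinctive: unique to one item


# Examples drawn from the discussion above.
disclaimer = where_to_manage(fixed_wording=True, regular_variants=False, contexts=1)
lead_paragraph = where_to_manage(fixed_wording=False, regular_variants=False, contexts=1)
reused_summary = where_to_manage(fixed_wording=True, regular_variants=False, contexts=3)

print("Press release disclaimer:", disclaimer.value)
print("Article lead paragraph:", lead_paragraph.value)
print("Summary reused elsewhere:", reused_summary.value)
```

The order of the checks matters: reuse or regular variation is tested first, so anything that qualifies as a component goes to the content model regardless of whether its wording is fixed.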

The content model is only one tool of many available to structure content in the broader sense. The content model only addresses variable content components. The content model doesn’t define the entire structure of the content that audiences see. The content model helps support templates, but doesn’t define all the elements in a template or the organization of the wireframes. The structure of the authoring environment may draw on components available in the content model, but it will segment content characteristics that won’t be part of the content model.

What should be included in a Content Model?

Components are meaningful objects. They can change in meaning. They can create new meaning when combined in different ways. They aren’t simply empty containers or placeholders for material to present.

Content models provide guidance for two decisions:

Where can a component be used? — the available contexts
Which variation can be used? — the available variants
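One way to picture these two decisions is as two lookups on a component record. The sketch below is only an assumption about how such a record might be shaped; the class `Component`, its fields, and the file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One entry in a content model (field names are illustrative)."""
    name: str
    contexts: set[str]                                      # where it may be used
    variants: dict[str, str] = field(default_factory=dict)  # variant label -> content


def can_use(component: Component, context: str) -> bool:
    """Decision 1: is the component available in this context?"""
    return context in component.contexts


def pick_variant(component: Component, label: str, fallback: str = "front") -> str:
    """Decision 2: choose a variant, falling back to a known one if the label is missing."""
    return component.variants.get(label, component.variants[fallback])


product_image = Component(
    name="product image",
    contexts={"product page", "search results"},
    variants={
        "front": "widget-front.jpg",
        "side": "widget-side.jpg",
        "thumbnail": "widget-thumb.jpg",
    },
)

print(can_use(product_image, "search results"))
print(pick_variant(product_image, "thumbnail"))
print(pick_variant(product_image, "rear"))  # no such variant, so the fallback is used
```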

The components within a content model can be of three varieties:

Statements
Phrases
Media Assets

Statements

Statements are sentences, paragraphs or sections composed of several paragraphs. Structurally, they can be sections, asides, call outs, quotes, and so on. Statements will often be long blocks of text. In some cases there will be variations of the blocks of text for different audience segments or regions. Other times, there will be no variation in the text, but it will appear in more than one context.

An example of how statements can vary in a single context would be if an explanation about customer legal rights changed depending on whether the customer was based in the US or the UK. The substance of the content component changes.

An example of how a statement can be used in multiple contexts is a disclosure about pricing and availability. A publisher may need to include the statement “pricing and availability subject to change” in many different contexts.

Phrases

A content component may be a phrase or managed fragment of text.

Phrasing has become much more important, especially with the rise of UX writing. Some wording is significant, because it relates to a big “moment of truth” for the audience: a decision they need to make, or a notification that’s important to them. Specific phrasing may be continually refined to reflect brand voice and terminology, and to ensure customer acceptance. It may be optimized through A/B testing.

In contrast to variations in statements, which generally relate to differences in meaning or substance, the variation in phrasing generally relates to wording, tone, or style.

Some examples of managed phrases include:

Taglines
Value proposition variations
Labels for forms
Terminology and wording to use in feedback or confirmation messages

The recent emergence of design systems for UX writing is promoting the reuse of text phrases. UX writing design systems can indicate a preferred messaging that conforms to editorial decisions about branding, or that performs better with audiences. Although it is not currently common to do so, such reusable text can be included in the content model.

Phrases may not be managed within a CMS. They could be in an external file that is called to insert the correct phrase. Again, the content model should not be restricted to content managed by the CMS.
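A minimal sketch of that idea follows, assuming the phrases live in a JSON document outside the CMS. The string `PHRASES_JSON` stands in for that external file, and the keys, wording, and the helper `phrase` are all hypothetical.

```python
import json

# Stand-in for an external phrase file; in practice this might be a JSON
# file or a service response. Keys and wording are hypothetical.
PHRASES_JSON = """
{
  "cta.buy": {"default": "Buy now", "variant_b": "Add to basket"},
  "message.thanks": {"default": "Thank you for your order."}
}
"""

phrases = json.loads(PHRASES_JSON)


def phrase(key: str, variant: str = "default") -> str:
    """Look up a managed phrase, falling back to its default wording."""
    options = phrases[key]
    return options.get(variant, options["default"])


print(phrase("cta.buy"))
print(phrase("cta.buy", "variant_b"))
print(phrase("message.thanks", "variant_b"))  # no variant_b, so the default is used
```

Templates then reference a phrase by key, so approved wording can be revised or tested in one place.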

Let’s consider how phrases can be content components.

PHRASE TYPE	CONTENT VARIATION?	USED IN MULTIPLE CONTEXTS?
Error messages	No	Yes
CTA	Yes	Yes
Thank you message	No	Yes

Media Assets

Media assets offer a wide range of content variation. Different media assets can present either different substance, or present the same substance using a different style. Different photos might show different entities, or they might present the same entity in different ways.

Because content models have historically been closely identified with text, they have not always represented other forms of content such as media. Media assets are often stored in other systems, and may not be viewed as content variables when authors are focused on text within a text editor. As a result, media assets can sometimes be treated as second-class citizens in a content-first process. As is the case with phrasing, media assets don’t appear in many content models currently, although they should.

Media assets include:

Alternative images
Optional videos
Maps
Content widgets, such as calculators

Let’s consider how media assets can be content components.

ASSET TYPE	CONTENT VARIATION?	USED IN MULTIPLE CONTEXTS?
Campaign Logo	No	Yes
Product Images	Yes	Yes
Video Explainer	No	Yes
Welcome Video	Yes	No
Hero Image for Services Pages	Yes	No

Editorial Benefits of a Content Model

Content models help content creators focus on the scenarios in which specific elements of content will be used.

Content models help content designers decide what content is global content that will be used in different contexts, and what content needs to work alongside other content.

For authors, content models operationalize prior editorial decisions and save effort. Publishers may already have approved language for a statement. Certain phrasing may already have been tested and optimized, so that text can be reused instead of recreated. Content models provide authors with guidance. They gain an ability to select different options: for example, choose one of these five images that relate to the rest of the content.

Models also offer the possibility of improving the stock of content components that get widely used. Several different phrases could be shown in notifications — providing some variety to messaging around routine tasks. A new phrase could be added to the model and tested. If well received, it could be added to the roster.

Content models enable better content governance. By defining and managing variations in content, communication can be optimized across channels. Content models prevent unplanned variation. They help to unify content resources that may be stored on different systems.

It’s time to elevate the role of content models. Editorial planning involves choosing the right information and presenting it in the right way. Content models can capture variations relating to both substance and style. Content models can do much to support editorial decisions.

—Michael Andrews


Content Engineering

Where Domain Models Meet Content Models

By Michael Andrews
February 14, 2019

A model is supposed to be a simplification of reality. It is meant to help people understand and act. But sometimes models do the opposite and cause confusion. If the model becomes an artifact of its own, it can be hard to see its connection to what it is supposed to represent. Over recent months, several people have raised questions on Twitter about the relationship between a domain model, a content model, and a data model. We may also encounter the terms ontology or vocabulary, which are also models. With so many models out there, it’s small wonder that people might be confused.

From what I can see, no consensus yet exists on how to blend these perspectives in a way that’s both flexible and easy to understand. I want to offer my thinking about how these different models are related.

Part of the source of confusion is that all these models were developed by different parties to solve different problems. Only recently has the content strategy community started to focus on how to integrate the different perspectives offered by these models.

The Different Purposes of Models

Domain models have a geeky pedigree. They come from a software development approach known as domain-driven design (DDD). DDD is an approach to developing applications rather than to publishing content. It’s focused on tasks that a software application must support (behavior), and maps tasks to domain objects that are centered on entities (data). The notion of a domain model was subsequently adopted by ontology engineers (people who design models of information). Again, these ontology engineers weren’t focused on the needs of web publishers: they just wanted a way to define the relationship between different kinds of information to allow the information to be queried. From these highly technical origins, domain models attracted attention in the content strategy community as a tool to model the relationships of entities that will appear in one’s content. The critical question is, so what? What value does a domain model offer to online publishers? This question can elicit different and sometimes fuzzy answers. I’ll offer my perspective in a moment.

A content model sounds similar to a domain model, but the two are different. A content model is an abstract picture of the elements that content creators must create, which are managed by a CMS. When content strategists talk about structuring content, they are generally referring to the elements that comprise a content model. Where a domain model is concerned with data or facts, a content model is concerned with expressive content — the text, images, videos and other material that audiences consume. Compared with a domain model, a content model is more focused on the experience of audiences. Unsurprisingly, content strategists talk about content models more than they talk about domain models.

Content models can serve two roles: representing what the audience is interested in consuming, and representing how that content is managed. The content model can become confusing when it tries to indicate both what the machine delivering content needs to know about, as well as what the audience needs to see.

Regrettably, the design of CMSs has trained authors to think about content elements in a certain way. Authors decompose text articles into chunks, presented as fields in a CMS. The content model can start to look like a massive form, with many fields available to address different aspects of a topic or theme. Not all fields will display in all scenarios, and fields may be shared across different views of content (hence rules are needed to direct what’s shown when). It may look like a data model. But the content model doesn’t impose strict rules about what types of values are allowed for the fields. The values of some fields are numbers, some are pick list values. Many fields are multiple paragraphs of text representing thousands of characters. Some fields are links to images, audio, or to videos. Some fields may involve values that are phrases, such as the text used on a button. While all these values are “data” in the sense of being ones and zeros, they don’t add up to a robust data model. That’s one reason that many developers consider content as unstructured — the values of content defy any uniformity.

A content model is not a solid foundation for a data model about the content. The structure represented in a content model is not semantic (machine intelligible) — contrary to the beliefs of many content strategists. Creating a content model doesn’t make the content semantic. Structured authoring helps authors plan how different pieces of content can fit together. But author-defined structures don’t mean anything to outside parties, and most machines won’t automatically understand what the chunks of content mean. A content model can inform a schematic of the content’s architecture, such as what content is needed and from where it will be sourced (it could come from other systems, or even external sources). That’s useful for internal purposes. The content model is implemented with custom code.

The primary value of content models is to guide editorial decisions. The content model defines content types — distinct profiles of content that address specific user purposes and goals. A content model can specify many details, such as a short and a long description to accommodate different kinds of devices, or alternative text for different audiences in different regions. A detailed content model can help the content adapt to different contexts.

Domain models are strong where content models are weak. Although domain models did not originally rely on metadata standards (e.g., in DDD), domain models increasingly have become synonymous with metadata vocabularies or ontologies. Domain models define data models: how factual information is stored so it can be accessed. They supply one source of truth for information, in contrast to the many expressive variations represented in a content model. Domain models represent the relationships of the data or information relating to a domain or broad subject area. Domain models can be precise about the kinds of values expected. Precise values are required in order to allow the information to be understood and reused in different contexts by different machines. Because a domain model is based on metadata standards, the information can be used by different parties. Content defined by a content model, in contrast, is primarily of use to the publisher only.

The core value of a domain model is to represent entities: the key things discussed in content. Metadata vocabularies define entity types, each with properties for the values that matter about that kind of entity. Some entity types will reference other entity types. For example, an event (entity type 1) takes place at a location (entity type 2). The relationships between different entities are already defined by the vocabulary, which reduces the need for the publisher to set up special rules defining these relationships. The domain model can suggest the kinds of information that authors need to include in content delivered to audiences. The domain model can also support non-editorial uses of the information. For example, it can provide information to a functional app on a smartphone. Or it can provide factual information to bots or to search engines.

The Boundary between Domain and Content Models

What’s the boundary between a domain model and a content model?

A common issue I’ve noticed is that model makers try to use a content type to represent an entity type. Certain CMSs aren’t too clear about the difference between content types and entity types. Be careful not to let your CMS force you to think in certain ways.

Let’s consider a common topic: events. Some content strategists treat events as a distinct content type. That would seem to imply the content model manages all the information relating to events. But an event is actually an entity type. Metadata standards already define all the common properties associated with an event. There’s little point replicating that information in the content model. The event information may need to travel to many places: to a calendar on someone’s phone, to search results, and to the publisher’s website, which has a special webpage for events. But how the publisher wants to promote the event could still be productively represented in the content model. The publisher needs to think about editorial elements associated with the event, such as images and calls-to-action.

Event content contains both structured editorial content and structured metadata

The domain model represents what something is, while the content model can represent what is said or how it is said. Let’s return to the all-important call-to-action (CTA). A CTA is a user action that is monitored in analytics. The action itself can be represented as metadata: for example, there is a “buy action” in schema.org. Publishers can use metadata to track what products are being bought according to the product’s properties, for example, color. But the text on the buy button is part of the content model. The CTA phrasing can be reused on different buttons. The value of the content model is to facilitate the reuse of expressive content rather than the reuse of information. Content models will change, as different elements gain or lose their mojo when presented to audiences. The elements in a content model can be tested. The domain model, centered on factual information, is far more stable. The values may change, but the entities and properties in the model will rarely change.
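
To make the CTA split concrete, here is a minimal sketch that keeps the action's metadata separate from the button wording. BuyAction and Product are schema.org types; the CtaText class, the render_button function, the product values, and the label variants are all hypothetical names invented for this example.

```python
from dataclasses import dataclass

# Domain layer: what the action *is*. The types come from schema.org;
# the product values are placeholders.
buy_action = {
    "@context": "https://schema.org",
    "@type": "BuyAction",
    "object": {"@type": "Product", "name": "Example Jacket", "color": "blue"},
}

@dataclass
class CtaText:
    """Content layer: how the action is *phrased* (hypothetical structure)."""
    key: str
    label: str

# Reusable phrasings that can be tested and swapped without touching the metadata.
cta_variants = [CtaText("default", "Buy now"), CtaText("variant_b", "Add to bag")]

def render_button(action, cta):
    """Pair the stable metadata with a swappable label for one button."""
    return {
        "label": cta.label,
        "tracks": action["@type"],
        "color": action["object"]["color"],
    }

buttons = [render_button(buy_action, cta) for cta in cta_variants]
```

Both buttons report the same tracked action and product color, while only the label differs. That is the part a publisher would test and revise.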

When information is structured semantically with metadata standards, a database designed around a domain model can populate information used in content. In such cases, the domain model supports the content model. But in other cases, authors will be creating loosely structured information, such as long narrative texts that discuss information. In these cases, authors can annotate the text to capture the core facts that should be included. The annotation allows these facts to be reused later for different contexts.

Over time, more editorial components are becoming formalized as structured data defined by metadata vocabulary standards. As different publishers face similar needs and borrow from each other’s approaches, an element in the content model can become a widely used design pattern, and therefore a candidate for standardization. For example, simple how-to instructions can be specified using metadata standards.
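
As one example of how-to instructions expressed with a metadata standard, the sketch below wraps plain instruction strings in the schema.org HowTo and HowToStep types. The how_to helper and the sample steps are assumptions made up for illustration.

```python
import json

def how_to(name, steps):
    """Wrap plain instruction strings as a schema.org HowTo (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [
            # Each step records its order and its text.
            {"@type": "HowToStep", "position": i, "text": text}
            for i, text in enumerate(steps, start=1)
        ],
    }

instructions = how_to(
    "Reset the example device",
    ["Hold the power button.", "Wait for the light to blink."],
)
instructions_jsonld = json.dumps(instructions, indent=2)
```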

The Layered Cake

How domain models can support content models

One simple way to think about the two models is as layers of a cake. The domain model is the base layer. It manages the factual information that’s needed by the content and by machines for applications. The content model is the layer above the domain model. It manages all the relevant content assets (thumbnails, video trailers, diagrams, etc.), all the sections of copy (introductions, call outs, quotes, sidebars, etc.) and all the messaging (button text, alternative headlines, etc.). On top of these layers is the icing on the cake: the presentation layer. The presentation layer is not about the raw ingredients, or how the ingredients are cooked. It’s about how the finished product looks.
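
The layering can be sketched in a few lines of code. This is only a toy illustration under assumed names: the domain and content dictionaries, the assemble and present functions, and every value in them are hypothetical, and a real system would keep each layer in its own store.

```python
# Base layer (domain model): factual information, one source of truth.
domain = {"event_name": "Example Workshop", "start_date": "2019-03-15"}

# Middle layer (content model): expressive assets, copy, and messaging.
content = {
    "headline": "Join us at {event_name}",
    "button_text": "Reserve a seat",
    "thumbnail": "workshop-thumb.jpg",
}

def assemble(domain, content):
    """Fill the expressive copy with facts drawn from the domain layer."""
    filled = dict(content)
    filled["headline"] = content["headline"].format(**domain)
    filled["start_date"] = domain["start_date"]
    return filled

def present(item):
    """Icing (presentation layer): decides only how the result looks."""
    return f"== {item['headline'].upper()} ==\n{item['start_date']} [{item['button_text']}]"

page = present(assemble(domain, content))
```

Changing the headline wording touches only the content layer, and changing the date touches only the domain layer. Neither change requires editing the presentation function.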

The distinctions I’ve made between the domain model and content model may not align with how your content management systems are set up. But such decoupling of data and content is becoming more common. If factual information is kept separate from expressive content, publishers can gain more flexibility when configuring how they deliver content and information to audiences.

— Michael Andrews
