Redefining the Role of Content Models

Content models have been around for a couple of decades. Until recently, only a handful of people interested in the guts of content management systems cared much about them.  Lately, they are gaining more attention in the information architecture and content strategy communities, in part due to Carrie Hane’s and Mike Atherton’s recent book, Designing Connected Content.  The growing attention to content models is also revealing how the concept can be interpreted in different ways as content models get mixed into broader discussions about structured content.  And ironically, at a time when interest in content models is growing, some of the foundational ideas about content models are aging poorly.   I believe the role of content models needs to be redefined so they can better serve changing needs.

This post will cover several issues:

  1. How content models can be poorly designed
  2. Defining content models in terms of components
  3. Criteria for what content should be included in a content model
  4. Editorial benefits of content models

In my previous post, I argued for the benefits of separating (1) the domain model, which covers data about entities appearing within content, from (2) the content model, which governs expressive content.  This post will discuss what content models should do. Content models can offer much more than they do today.  To realize their full potential, content models will need to be redefined in both purpose and construction: removing pieces that add little value and adding new pieces that can be useful.

Content Models in Historical Perspective

Bob Boiko’s highly influential Content Management Bible, published in 2002, provides one of the first detailed explanations of content models (p. 843):

“Database developers create data models (for database schema). These models establish how each table in the database is constructed  and how it relates to other tables in the database.  XML developers create DTDs (or XML Schema).  DTDs establish how each element in the XML file is constructed and how it relates to other elements in the file.  CMS developers create content models that serve the same function — they establish how each component is constructed and how it relates to the other components in the system.”

Content models haven’t changed much since Boiko wrote that nearly two decades ago.  Indeed, many CMSs haven’t changed much during that time either. (Many of today’s popular CMSs date from around the time Boiko wrote his book.)  Boiko likens the components of a content model to the tables of a database.  His analogy implies that a content model is the schema of the content, because CMSs have historically served as a monolithic database for content.  While content strategists may be inclined to think that all content comes from the CMS, that is no longer entirely true.  Two significant developments have eroded the primacy of the CMS: first APIs, and more recently graph databases that store metadata.  While very different, both APIs and graph databases allow data or text to be accessed laterally, often with a simple “GET” request, instead of requiring a traversal of the attribute hierarchy used in XML and the HTML DOM.

APIs allow highly specific information to be pulled in from files located elsewhere, while graphs allow different combinations of attributes to be stitched together as and when they are needed.  Both are flexible “just-in-time” ways of getting specific information directly from multiple sources. Even though content may now come from many sources, content models are still designed as if they were tables in a traditional database.  The content model is not a picture of what’s in the CMS.  The CMS is no longer a single source repository of all content resources.  Content models need to evolve.
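To make the just-in-time idea concrete, here is a minimal Python sketch, assuming three hypothetical in-memory sources (`CMS`, `PHRASE_API`, `GRAPH`) that stand in for a CMS, an API, and a metadata graph. The `assemble` function and the source names are illustrative only; in a real system each lookup would be a network call, such as an HTTP GET, rather than a dictionary access.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins for three separate sources. In practice each lookup
# would be a remote call (for example, an HTTP GET to an API endpoint).
CMS = {"article-42": {"headline": "Planning a spring garden"}}
PHRASE_API = {"cta.subscribe": "Get weekly tips"}
GRAPH = {"event-7": {"name": "Seed swap", "startDate": "2021-04-10"}}

# Map each source name to a function that fetches a value by key.
RESOLVERS: Dict[str, Callable[[str], object]] = {
    "cms": lambda key: CMS[key],
    "api": lambda key: PHRASE_API[key],
    "graph": lambda key: GRAPH[key],
}


def assemble(layout: Dict[str, Tuple[str, str]]) -> Dict[str, object]:
    """Fetch each component from whichever source holds it, at the moment the page is built."""
    return {slot: RESOLVERS[source](key) for slot, (source, key) in layout.items()}


page = assemble({
    "body": ("cms", "article-42"),
    "cta": ("api", "cta.subscribe"),
    "event": ("graph", "event-7"),
})
print(page["cta"])  # Get weekly tips
```

The layout records what content is needed and which source supplies it, rather than assuming everything lives in one repository.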

Content structure does not depend entirely on arranging material into hierarchies.  Hierarchies still have a role in content models, but they are often overemphasized.  Representing in a content model how information gets stored is less important than it may have been in the past. It’s more important to represent what information is needed.

Content models have great potential to support editorial decision making. But existing forms of content models don’t really capture the right elements.  They may specify chunks of content that don’t belong in a content model.  

How Content Models can be poorly designed

Content models reflect the expertise and judgments of those designing them.  Designers may have slightly different ideas about what a content model represents, and why elements are included in it.  Content models may capture the wrong elements.  Sometimes they include too much structure.  When that happens, it creates unnecessary work and can make the content less flexible.

Many discussions about content models refer to chunks of content or blocks of text, and to the relationships between them.  These chunks are likened to the fields used in a database.  Such familiar terms avoid wonky jargon.  But they can be misleading.

Photo by Neil Martin from Pexels

Many content strategists talk about chunks as the building blocks of content models.  It’s common to refer to the structuring of content as “chunking” it.  The metaphor is accessible: we can picture a chunk, like a chunk of a chocolate bar.  But the metaphor is still abstract and can evoke different ideas about when and why chunking content is desirable.  People don’t agree on what a chunk actually is, and they may not even realize they are disagreeing.  At least three different perspectives about chunks exist:

  1. Chunks as cohesive units of information: they are independent of any context (the Chunks of Information perspective)
  2. Chunks as discrete segments that audiences consume: they depend on the audience context (the Chunks of Meaning perspective)
  3. Chunks as elements or attributes managed by the CMS: they depend on the IT context (the Chunks of a Database perspective)

Each of these perspectives is valuable, but slightly different.  While it is useful to address all these dimensions, it can be hard to optimize for all three together.   Each perspective assumes a different rationale for why content is structured.  

“Chunks of Information” Perspective 

When chunks are considered units of information, it can lead to too much structuring in the content model.   Because the chunk is context-independent, it can be loaded down with lots of details just in case they are needed in a certain scenario.  Unless specific rules are written for every case in which the chunk is displayed, all the information in the chunk gets displayed to audiences.  In many cases that’s overkill; audiences only want some of the information.  Chunks get overloaded with details (often nested details), and the content model ends up trying to manage field-level information that belongs in the domain model and that should be managed with metadata vocabularies.  Metadata vocabularies allow rich data to be displayed as and when it is needed (see chapter 13 of my new book, No More Silos: Metadata Strategy for Online Publishers).  Content models, in contrast, often expose all the data all the time.

Another symptom of too much detail is when chunks get broken out to add completeness.  Some content models apply the MECE standard: mutually exclusive, collectively exhaustive.  While logically elegant, they make assumptions about what content is actually needed.  For example, on a recipe website, each recipe might indicate any allergens associated with the ingredients.  That’s potentially useful content. One can filter out recipes that have offending allergens.  But it doesn’t follow that each allergen deserves its own profile, indicating all the recipes that contain peanuts, for example.  
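A small Python sketch of the filtering the recipe example describes, using hypothetical recipe records. Allergens are treated as plain data values on each recipe, which is all that filtering needs; nothing in it calls for a separate profile for each allergen.

```python
from typing import Iterable, List

# Hypothetical recipe records; each recipe lists its allergens as simple values.
recipes = [
    {"title": "Satay noodles", "allergens": {"peanuts", "soy"}},
    {"title": "Lemon risotto", "allergens": {"dairy"}},
    {"title": "Tomato soup", "allergens": set()},
]


def safe_recipes(recipes: List[dict], avoid: Iterable[str]) -> List[str]:
    """Return titles of recipes that contain none of the allergens to avoid."""
    avoid_set = set(avoid)
    return [r["title"] for r in recipes if not (r["allergens"] & avoid_set)]


print(safe_recipes(recipes, ["peanuts"]))  # ['Lemon risotto', 'Tomato soup']
```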

Sometimes content models add such details because one can, and because it seems like it would be more complete to do so.  It can lead to page templates that display content sorted according to minor attributes that deliver little audience or business value.  The problem is most noticeable when the content aims to be encyclopedic, presenting all details in any combination.  Some content models promote the creation of collections of lists of things that few people are interested in.  Content models are most effective when they identify content to pull in where it is actually needed, rather than push out to someplace it might be needed.

“Chunks of Meaning” Perspective 

Focusing on audience needs sounds like a better approach to avoid unnecessary structuring.  But when editorial needs guide the chunking process, it can lead to another problem: phantom chunks in the content model.  These are pieces of content that might look like chunks in a content model.  But they don’t behave like those used in a content model.

The concept of chunks is also used in structured authoring, which has a different purpose than content modeling.  Segmenting content is a valuable approach to content planning.  Segmenting allows content to be prioritized and organized around headings.  It can help improve both the authoring and the audience experience.  But most segmenting is local to the content type being designed. Segments won’t be reused elsewhere, and segmenting doesn’t reuse specific values.  Segmenting helps readers understand the whole better.  But each segment still depends on the whole to be understood.  It’s not truly an independent unit of meaning.

“Chunks of a Database” Perspective

Chunks are also viewed as elements managed by a CMS: the fields of a database.  They may be blocks of text (such as paragraphs) or nested sets of fields (such as an address).  But blocks may not be the right unit to represent in a content model.  When data (entity values) is defined as a block, it gets locked into a specific presentation.  Once this data is baked into the content model as a block, the publisher can’t easily show only some of it when that’s all that’s required.
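The contrast between a text block and separate values can be sketched in Python. The address and the `city_line` helper are hypothetical; the field names borrow from the schema.org `PostalAddress` type, as an assumption, to show how values from a metadata vocabulary can be displayed selectively.

```python
from dataclasses import dataclass

# As a block, the address is one baked-in string. Showing only the city
# would require parsing prose, which is fragile.
address_block = "221 Main Street, Springfield, IL 62701"


@dataclass
class PostalAddress:
    # Field names follow schema.org PostalAddress properties.
    streetAddress: str
    addressLocality: str
    addressRegion: str
    postalCode: str


def city_line(address: PostalAddress) -> str:
    """Show only part of the data: a compact city and region label."""
    return f"{address.addressLocality}, {address.addressRegion}"


venue = PostalAddress("221 Main Street", "Springfield", "IL", "62701")
print(city_line(venue))  # Springfield, IL
```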

Nesting makes sense when dependencies between information elements exist.  But ideally, the model should present content elements that are independent and that can be used in flexible ways.  As mentioned in my previous post, content models can become confusing when they show the properties of entities mentioned in the content as being attributes of the content.  

When the focus of a content model is on blocks of text, it can be to the exclusion of other kinds of elements such as photos, links to video or audio, or message snippets.  Moreover, only certain kinds of text blocks are likely to be components in a content model.  Not all blocks of text get reused.  And not all text that gets reused is a block.  

Generally, long blocks of text are difficult to reuse.  They aren’t likely to vary in regular ways as would short labels.  Although it is possible to specify alternative paragraphs to present to audiences, it is not common.  The opportunity to use text blocks as components mostly arises when wanting to use the same text block in different content types to support different use cases.  

In summary, different people think about chunks differently because they are motivated by different goals for structuring content.  While all those goals are valid, they are not all relevant to the purpose of content modeling.  The purpose of a content model is not to break down content. The purpose of a content model is to enable different elements of content to be pieced together in different ways.  If the model breaks the content into too many pieces, or into pieces that can’t be used widely, the model will be difficult to use.

It is easy to break content apart.  It is much harder to piece together content elements into a coherent whole.  But if done judiciously, content models can provide richer meaning to the content delivered to audiences.  

What precisely does a Content Model represent?

Because chunks are considered in different ways, it is necessary to define the elements of a content model more precisely.  Like Boiko, I will refer to these elements as components, instead of as attributes or as blocks.    

Content models specify content components that can be presented in different ways in different contexts.  The components must be managed programmatically by IT systems.  Importantly, a content component is not a free-text field where anything can be entered but nothing is ever reused.  A content model does not present potential relationships between content items. It is not a planning or discovery tool.   It should show actual choices that will be available to content creators to present to audiences.

Content components are content variables.   If the chunk isn’t a variable, it’s not a content component. 

Think about a content variable as a predefined variant of content.  If the content is an image of a product, the variants might be images showing different views of the product, or perhaps different sizes for the images.  The image of the product is a content component.  It is a variable value.  People conventionally think about variables as data. They should broaden their thinking.  A content variable is any use of content where there’s an option about which version to use, or where to use it.
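One way to picture a content variable is as a component with a fixed set of predefined variants. The sketch below assumes a hypothetical `ContentVariable` class and hypothetical image filenames; variants are keyed by view and size, and `pick` falls back to the front view when the requested view does not exist at that size.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ContentVariable:
    """A component whose value is chosen from predefined variants."""
    name: str
    # Keys are (view, size); values are asset references (hypothetical filenames).
    variants: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def pick(self, view: str, size: str) -> str:
        if (view, size) in self.variants:
            return self.variants[(view, size)]
        # Fall back to the front view at the requested size.
        return self.variants[("front", size)]


product_image = ContentVariable("kettle-image", {
    ("front", "thumbnail"): "kettle-front-150.jpg",
    ("front", "large"): "kettle-front-1200.jpg",
    ("side", "large"): "kettle-side-1200.jpg",
})

print(product_image.pick("side", "large"))      # kettle-side-1200.jpg
print(product_image.pick("side", "thumbnail"))  # kettle-front-150.jpg
```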

A content model is useful because it shows what content values are variable.  Content values are expressive when they vary in predictable or regular ways.  

A chunk is a component only if it satisfies at least one of two criteria:

  1. The component varies in a recurring way, and/or
  2. The component will be reused in more than one context.

Content components can be either local or global variables.  Content components are local variables when used in one context only: the component presents alternative variations, or it is optional in different scenarios.  Content components are global variables when they can be used in different contexts.

We can summarize whether a chunk is a content component in a matrix:

Context | Value is fixed | Value is variable
Value is local to one context | Not a component | Component
Value is global (used in more than one context) | Component | Component
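The two criteria and the matrix above can be expressed as a short Python function. It is a sketch that reduces each chunk to two hypothetical inputs (whether its value varies in a recurring way, and how many contexts use it) and labels components as local or global variables following the definitions given earlier.

```python
from typing import Optional


def classify_chunk(varies: bool, contexts: int) -> Optional[str]:
    """Return the kind of content variable a chunk is, or None if it is not a component.

    varies:   the value varies in a recurring way
    contexts: the number of contexts in which the chunk is used
    """
    if contexts > 1:
        return "global variable"  # reused in more than one context
    if varies:
        return "local variable"   # one context, but with alternative variations
    return None                   # fixed value in one context: not a component


# The four cells of the matrix:
print(classify_chunk(varies=False, contexts=1))  # None
print(classify_chunk(varies=True, contexts=1))   # local variable
print(classify_chunk(varies=False, contexts=3))  # global variable
print(classify_chunk(varies=True, contexts=3))   # global variable
```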

Content components are the content model’s equivalent of a domain model’s enumerated values.  Enumerated values are the list of allowed values in a domain model (sometimes called a controlled vocabulary, or colloquially, pick-list values).  Enumerated values are names of attribute choices: the names of colors, sizes, geographic regions, etc.  They are small bits of data that can be aggregated, filtered upon, and otherwise managed.

In the case of a content model, the goal is to manage pieces of content rather than pieces of data.  Generally, the pieces of content will be larger than the data.  The components can be paragraphs or images.  These components behave differently from the data in a domain model.  One can’t filter on content values (unlike data values).   And one will rarely aggregate content values.  The benefit of a content variable is that one can create rules for when and where to display the component.

Let’s consider the variation in content according to three levels, which I will call repetitive, expressive, and distinctive.  These terms are just labels to help us think about how fixed or variable content is. They aren’t meant to be value judgments. 

Repetitive content refers to content that is fixed in place.  It always says the same thing in the same way in one specific context.  The meaning and the style are locked down — there’s no variation.  For example, the welcome announcement and jingle for a podcast may always be the same each week, even though the program that follows the intro will be different every week.  The welcome announcement is specific to the podcast, and is not used in other kinds of content.

Expressive content refers to how content variation changes the meaning for audiences.  It considers variation in the components chosen.  Variation can happen within components, and across different content incorporating those components.  Expressive content also resembles what programmers call expressions, which are evaluated to produce values.  With expressive content, the key question is which value is the right one to use: choosing the right content variation.

With distinctive content, no two content items are the same.  This blog post is distinctive content, because none of the material has been reused, or will be reused.  The body of most articles is distinctive content, even if one can segment it into discrete parts.  

It’s important to recognize that a content model is just one of the tools available to organize the structuring of content.  Other tools include content templates and text editors.

Let’s focus on the “body field” — the big blob of text in much online content.  It can be structured in different ways.  Not all editorial structuring involves content components.  An article might have a lead paragraph.  That paragraph may be guided by specific criteria. It must address who the article is for, and what benefit the article offers the reader.  But that lead is specific to the article.  It is part of the article’s structure, but not an independent structure that’s used in other contexts.

The same article might have a summary paragraph.  Unless the summary is used elsewhere, it is also not a content component.  The summary could be standalone content used in other contexts, although I’ve seen plenty of examples where that’s been done without creating a great user experience.

These segments of an article help readers understand the purpose of the content, and help writers plan the content.  But they aren’t part of the content model.  Such segmentation belongs in the text editor where the content is created.

Consider a different example of a content chunk.  Many corporate press releases have an impact on the price of company shares.  Companies routinely put a “forward earnings” disclaimer at the end of each press release.  This disclaimer is only applicable to press releases, and the wording is fixed. The disclaimer is not a content component that varies or is used in other contexts.  It should be incorporated into the content template for press releases.

Kind of text | Variability | Where to specify or manage
Repetitive | Consistent text for one context only | Template (hardwired)
Expressive | Same text used in multiple contexts, or regularly variable text used in at least one context | Content model
Distinctive | All content is unique: not reused in different contexts | Structured guidelines in the text editor
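The same decision, extended to all three kinds of text in the table, can be sketched as a routing function. The inputs are hypothetical simplifications (real editorial judgments rarely reduce to three flags), but the order of the checks mirrors the table: anything reused or regularly variable belongs in the content model first.

```python
def where_to_manage(contexts: int, varies_regularly: bool, unique: bool) -> str:
    """Route a kind of text to the tool that should specify or manage it."""
    if contexts > 1 or varies_regularly:
        return "content model"  # expressive: reused, or regularly variable
    if unique:
        return "text editor"    # distinctive: structured guidelines for authors
    return "template"           # repetitive: fixed text in one context


# Examples drawn from the discussion above:
print(where_to_manage(contexts=1, varies_regularly=False, unique=False))  # template (press release disclaimer)
print(where_to_manage(contexts=1, varies_regularly=False, unique=True))   # text editor (article lead paragraph)
print(where_to_manage(contexts=5, varies_regularly=False, unique=False))  # content model (text reused across contexts)
```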

The content model is only one tool of many available to structure content in the broader sense.  The content model only addresses variable content components.  The content model doesn’t define the entire structure of the content that audiences see.  The content model helps support templates, but doesn’t define all the elements in a template or the organization of the wireframes.  The structure of the authoring environment may draw on components available in the content model, but it will segment content characteristics that won’t be part of the content model.  

What should be included in a Content Model?

Components are meaningful objects.  They can change in meaning.  They can create new meaning when combined in different ways.  They aren’t simply empty containers or placeholders for material to present.  

Content models provide guidance for two decisions:

  • Where can a component be used? — the available contexts
  • Which variation can be used? — the available variants

The components within a content model can be of three varieties:

  1. Statements
  2. Phrases
  3. Media Assets

Statements

Statements are sentences, paragraphs, or sections composed of several paragraphs.  Structurally, they can be sections, asides, callouts, quotes, and so on.  Statements will often be long blocks of text.  In some cases there will be variations of the blocks of text for different audience segments or regions.  Other times, there will be no variation in the text, but it will appear in more than one context.

An example of how statements can vary in a single context would be if an explanation about customer legal rights changed depending on whether the customer was based in the US or the UK.  The substance of the content component changes.  

An example of how a statement can be used in multiple contexts is a disclosure about pricing and availability.  A publisher may need to include the statement “pricing and availability subject to change” in many different contexts.  
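Both kinds of statement can be sketched in Python. Everything here is hypothetical: the context names, the region codes, and the bracketed placeholder text stand in for approved wording the publisher would supply. The legal-rights statement varies within one context; the pricing disclosure stays fixed but appears in several contexts.

```python
from typing import List

# Varies within a single context, depending on the customer's region.
LEGAL_RIGHTS = {
    "US": "[Explanation of customer rights under US law]",
    "UK": "[Explanation of customer rights under UK law]",
}

# Fixed wording, reused across several (hypothetical) contexts.
PRICING_DISCLOSURE = "Pricing and availability subject to change."
DISCLOSURE_CONTEXTS = {"product-page", "catalog", "promo-email"}


def statements_for(context: str, region: str) -> List[str]:
    """Return the statement components a given context should display."""
    statements = []
    if context == "customer-terms":
        statements.append(LEGAL_RIGHTS[region])
    if context in DISCLOSURE_CONTEXTS:
        statements.append(PRICING_DISCLOSURE)
    return statements


print(statements_for("customer-terms", "UK"))  # ['[Explanation of customer rights under UK law]']
print(statements_for("catalog", "US"))         # ['Pricing and availability subject to change.']
```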

Phrases

A content component may be a phrase or managed fragment of text.    Phrasing has become much more important, especially with the rise of UX writing.  Some wording is significant, because it relates to a big “moment of truth” for the audience: a decision they need to make, or a notification that’s important to them.  Specific phrasing may be continually refined to reflect brand voice and terminology, and to ensure customer acceptance.  It may be optimized through A/B testing.  

In contrast to variations in statements, which generally relate to differences in meaning or substance, the variation in phrasing generally relates to wording, tone, or style.

Examples of managed phrases include error messages, calls-to-action (CTAs), and thank-you messages.

The recent emergence of design systems for UX writing is promoting the reuse of text phrases.  UX writing design systems can indicate preferred messaging that conforms to editorial decisions about branding, or that performs better with audiences.  Although it is not currently common to do so, such reusable text can be included in the content model.

Phrases may not be managed within a CMS.  They could be in an external file that is called to insert the correct phrase.  Again, the content model should not be restricted to content managed by the CMS.
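A minimal sketch of calling phrases from an external file, assuming a hypothetical JSON file that maps phrase keys to named variants. The keys, variant names, and wording are illustrative; the `phrase` helper falls back to the default variant when the requested one is not defined.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical contents of an external phrase file kept outside the CMS.
PHRASES_JSON = """{
  "cta.checkout": {"default": "Check out", "variant_b": "Complete purchase"},
  "error.card_declined": {"default": "Your card was declined. Please try another card."}
}"""


def load_phrases(path: Path) -> dict:
    """Read the phrase file into a dictionary."""
    return json.loads(path.read_text(encoding="utf-8"))


def phrase(phrases: dict, key: str, variant: str = "default") -> str:
    """Return the requested variant of a phrase, or its default wording."""
    options = phrases[key]
    return options.get(variant, options["default"])


# Write the sample file to a temporary directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "phrases.json"
    path.write_text(PHRASES_JSON, encoding="utf-8")
    phrases = load_phrases(path)

print(phrase(phrases, "cta.checkout", "variant_b"))         # Complete purchase
print(phrase(phrases, "error.card_declined", "variant_b"))  # Your card was declined. Please try another card.
```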

Let’s consider how phrases can be content components.

Phrase type | Content variation? | Used in multiple contexts?
Error messages | No | Yes
CTA | Yes | Yes
Thank-you message | No | Yes


Media Assets

Media assets offer a wide range of content variation.  Different media assets can present either different substance, or present the same substance using a different style.  Different photos might show different entities, or they might present the same entity in different ways.  

Because content models have historically been closely identified with text, they have not always represented other forms of content such as media.  Media assets are often stored in other systems, and may not be viewed as content variables when authors are focused on text within a text editor.  As a result, media assets can sometimes be second-class citizens in a content-first process.  As is the case with phrasing, media assets don’t appear in many content models currently, although they should.

Media assets include:

  • Alternative images 
  • Optional videos
  • Maps
  • Content widgets, such as calculators

Let’s consider how media assets can be content components.

Asset type | Content variation? | Used in multiple contexts?
Campaign logo | No | Yes
Product images | Yes | Yes
Video explainer | No | Yes
Welcome video | Yes | No
Hero image for services pages | Yes | No


Editorial Benefits of a Content Model


Content models help content creators focus on the scenarios in which specific elements of content will be used.  

Content models help content designers decide which content is global content that will be used in different contexts, and which content needs to work alongside other content.

For authors, content models operationalize prior editorial decisions and save effort.  Publishers may already have approved language for a statement.  Certain phrasing may already have been tested and optimized, so that text can be reused instead of recreated.  Content models provide authors with guidance.  They gain an ability to select different options: for example, choose one of these five images that relate to the rest of the content.  

Models also offer the possibility of improving the stock of content components that get widely used.  Several different phrases could be shown in notifications — providing some variety to messaging around routine tasks.  A new phrase could be added to the model and tested.  If well received, it could be added to the roster.  
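Testing a new phrase before adding it to the roster can be sketched with deterministic bucketing. This is one common approach, not the only one, and everything here is hypothetical: the roster, the candidate phrase, the 10% test share, and the use of a SHA-256 hash of a user ID to keep each user's assignment stable.

```python
import hashlib
from typing import List

roster: List[str] = ["Saved.", "All set.", "Your changes are saved."]
candidate = "Done! Your changes are safe."  # hypothetical new phrase under test


def pick_phrase(user_id: str, roster: List[str], candidate: str, test_share: float = 0.1) -> str:
    """Show the candidate to a fixed share of users; others get a roster phrase."""
    # Hashing the user ID means the same user always lands in the same bucket.
    digest = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16)
    if (digest % 1000) / 1000 < test_share:
        return candidate
    # Spread the remaining users across the roster for some variety.
    return roster[digest % len(roster)]


def promote(roster: List[str], candidate: str) -> List[str]:
    """After a successful test, add the candidate to the roster (if not already there)."""
    return roster if candidate in roster else roster + [candidate]


shown = [pick_phrase(f"user-{i}", roster, candidate) for i in range(5)]
roster = promote(roster, candidate)
```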

Content models enable better content governance. By defining and managing variations in content, communication can be optimized across channels.  Content models prevent unplanned variation.  They help to unify content resources that may be stored on different systems.   

It’s time to elevate the role of content models.  Editorial planning involves choosing the right information and presenting it in the right way.   Content models can capture variations relating to both substance and style.   Content models can do much to support editorial decisions.  

—Michael Andrews

Where Domain Models Meet Content Models

A model is supposed to be a simplification of reality.  It is meant to help people understand and act.  But sometimes models do the opposite and cause confusion.  If the model becomes an artifact of its own, it can be hard to see its connection to what it is supposed to represent.  Over recent months, several people have raised questions on Twitter about the relationship between a domain model, a content model, and a data model.  We may also encounter the terms ontology or vocabulary, which are also models. With so many models out there, it’s small wonder that people might be confused.

From what I can see, no consensus yet exists on how to blend these perspectives in a way that’s both flexible and easy to understand.  I want to offer my thinking about how these different models are related.  

Part of the source of confusion is that all these models were developed by different parties to solve different problems.   Only recently has the content strategy community started to focus on how to integrate the different perspectives offered by these models.

The Different Purposes of Models

Domain models have a geeky pedigree.  They come from a software development approach known as domain-driven design (DDD).  DDD is an approach to developing applications rather than to publishing content.  It’s focused on tasks that a software application must support (behavior), and maps tasks to domain objects that are centered on entities (data).  The notion of a domain model was subsequently adopted by ontology engineers (people who design models of information.)  Again, these ontology engineers weren’t focused on the needs of web publishers: they just wanted a way to define the relationship between different kinds of information to allow the information to be queried. From these highly technical origins, domain models attracted attention in the content strategy community as a tool to model the relationships of entities that will appear in one’s content.  The critical question is, so what?  What value does a domain model offer to online publishers?  This question can elicit different and sometimes fuzzy answers.  I’ll offer my perspective in a moment.

A content model sounds similar to a domain model, but the two are different.  A content model is an abstract picture of the elements that content creators must create, which are managed by a CMS.  When content strategists talk about structuring content, they are generally referring to the elements that comprise a content model.  Where a domain model is concerned with data or facts, a content model is concerned with expressive content — the text, images, videos and other material that audiences consume.  Compared with a domain model, a content model is more focused on the experience of audiences.  Unsurprisingly, content strategists talk about content models more than they talk about domain models.  

Content models can serve two roles: representing what the audience is interested in consuming, and representing how that content is managed.   The content model can become confusing when it tries to indicate both what the machine delivering content needs to know about, as well as what the audience needs to see.  

Regrettably, the design of CMSs has trained authors to think about content elements in a certain way.  Authors decompose text articles into chunks, presented as fields in a CMS.  The content model can start to look like a massive form, with many fields available to address different aspects of a topic or theme.  Not all fields will display in all scenarios, and fields may be shared across different views of content (hence rules are needed to direct what’s shown when). It may look like a data model.  But the content model doesn’t impose strict rules about what types of values are allowed for the fields.  The values of some fields are numbers, some are pick list values.  Many fields are multiple paragraphs of text representing thousands of characters.  Some fields are links to images, audio, or to videos.  Some fields may involve values that are phrases, such as the text used on a button.  While all these values are “data” in the sense of being ones and zeros, they don’t add up to a robust data model.  That’s one reason that many developers consider content as unstructured — the values of content defy any uniformity.  

A content model is not a solid foundation for a data model about the content. The structure represented in a content model is not semantic (machine intelligible) — contrary to the beliefs of many content strategists.  Creating a content model doesn’t make the content semantic.   Structured authoring helps authors plan how different pieces of content can fit together. But author-defined structures don’t mean anything to outside parties, and most machines won’t automatically understand what the chunks of content mean.  A content model can inform a schematic of the content’s architecture, such as what content is needed and from where it will be sourced (it could come from other systems, or even external sources).  That’s useful for internal purposes.  The content model is implemented with custom code.  

The primary value of content models is to guide editorial decisions.  The content model defines content types — distinct profiles of content that address specific user purposes and goals.  A content model can specify many details, such as a short and a long description to accommodate different kinds of devices, or alternative text for different audiences in different regions.   A detailed content model can help the content adapt to different contexts.  

Domain models are strong where content models are weak. Although domain models did not originally rely on metadata standards (e.g., in DDD), domain models increasingly have become synonymous with metadata vocabularies or ontologies.  Domain models define data models: how factual information is stored so it can be accessed. They supply one source of truth for information, in contrast to the many expressive variations represented in a content model.  Domain models represent the relationships of the data or information relating to a domain or broad subject area.  Domain models can be precise about the kinds of values expected.  Precise values are required in order to allow the information to be understood and reused in different contexts by different machines.  Because a domain model is based on metadata standards, the information can be used by different parties.  Content defined by a content model, in contrast, is primarily of use to the publisher only.   

The core value of a domain model is to represent entities: the key things discussed in content.  Metadata vocabularies define entity types that provide properties for all the values that carry important information.  Some entity types will reference other entity types.  For example, an event (entity type 1) takes place at a location (entity type 2).  The relationships between different entities are already defined by the vocabulary, which reduces the need for the publisher to set up special rules defining these relationships.  The domain model can suggest the kinds of information that authors need to include in content delivered to audiences.  The domain model can also support non-editorial uses of the information.  For example, it can provide information to a functional app on a smartphone.  Or it can provide factual information to bots or to search engines.

The Boundary between Domain and Content Models

What’s the boundary between a domain model and a content model?

A common issue I’ve noticed is that model makers try to use a content type to represent an entity type. Certain CMSs aren’t too clear about the difference between content types and entity types. Be careful not to let your CMS force you to think in certain ways.

Let’s consider a common topic: events. Some content strategists consider events as a distinct content type. That would seem to imply the content model manages all the information relating to events. But an event is actually an entity type. Metadata standards already define all the common properties associated with an event. There’s little point in replicating that information in the content model. The event information may need to travel to many places: to a calendar on someone’s phone, in search results, as well as on the publisher’s website, which has a special webpage for events. But how the publisher wants to promote the event could still be productively represented in the content model. The publisher needs to think about editorial elements associated with the event, such as images and calls-to-action.

Event content contains both structured editorial content and structured metadata
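One way to keep the two models apart is for the content type to hold only editorial elements plus a reference to the event entity. The sketch below assumes a hypothetical ID scheme (`event-42`) and hypothetical field names; it illustrates the separation, not any particular CMS's API.

```python
from dataclasses import dataclass

# Domain layer: factual event data keyed by a stable ID (hypothetical).
EVENTS = {
    "event-42": {"@type": "Event", "name": "Example Product Launch",
                 "startDate": "2019-05-01T18:00"},
}

@dataclass
class EventPromo:
    """Content-model type: editorial elements used to promote an event.

    It stores only a reference to the event entity, never a copy of its facts.
    """
    event_id: str
    headline: str
    hero_image: str
    cta_text: str

def render_promo(promo: EventPromo) -> str:
    # Look up the single source of truth for the factual details.
    facts = EVENTS[promo.event_id]
    return f"{promo.headline} ({facts['name']}, {facts['startDate']}) [{promo.cta_text}]"

promo = EventPromo("event-42", "Be the first to see it", "launch-hero.jpg", "Reserve a seat")
print(render_promo(promo))
```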

The domain model represents what something is, while the content model can represent what is said or how it is said. Let’s return to the all-important call-to-action (CTA). A CTA is a user action that is monitored in analytics. The action itself can be represented as metadata, for example, as a “buy action.” Publishers can use metadata to track what products are being bought according to the product’s properties, for example, color. But the text on the buy button is part of the content model. The CTA phrasing can be reused on different buttons. The value of the content model is to facilitate the reuse of expressive content rather than the reuse of information. Content models will change, as different elements gain or lose their mojo when presented to audiences. The elements in a content model can be tested. The domain model, centered on factual information, is far more stable. The values may change, but the entities and properties in the model will rarely change.
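The split between the action (metadata) and the button text (content) can be sketched as follows. It assumes schema.org's `BuyAction` and `Product` types; the CTA keys and the product are hypothetical. Two buttons with different phrasing share identical action metadata, so analytics keyed on the metadata stay stable while the wording is tested.

```python
# Reusable CTA phrasing lives in the content model (hypothetical keys).
CTA_TEXT = {
    "buy_primary": "Add to cart",
    "buy_urgent": "Get yours today",
}

def buy_button(product: dict, cta_key: str) -> dict:
    """Pair reusable button text with metadata describing the action."""
    return {
        # Expressive content: can change or be A/B tested freely.
        "label": CTA_TEXT[cta_key],
        # Factual metadata: describes what is being bought, including color.
        "action": {
            "@type": "BuyAction",
            "object": {"@type": "Product", "name": product["name"],
                       "color": product["color"]},
        },
    }

shirt = {"name": "Example T-shirt", "color": "blue"}
a = buy_button(shirt, "buy_primary")
b = buy_button(shirt, "buy_urgent")
```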

When information is structured semantically with metadata standards, a database designed around a domain model can populate information used in content.  In such cases, the domain model supports the content model.  But in other cases, authors will be creating loosely structured information, such as long narrative texts that discuss information.  In these cases, authors can annotate the text to capture the core facts that should be included.  The annotation allows these facts to be reused later for different contexts.  
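Annotation can be as simple as recording which span of narrative text expresses which fact. This is a minimal sketch using character offsets; the `Annotation` class, the property names, and the normalized values are all hypothetical, and real annotation tools use richer formats.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int   # character offset where the annotated span begins
    end: int     # offset just past the end of the span
    prop: str    # hypothetical property name from the domain model
    value: str   # normalized value that machines can reuse

def annotate(text: str, phrase: str, prop: str, value: str) -> Annotation:
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return Annotation(start, start + len(phrase), prop, value)

text = "The museum reopens next spring, with doors opening at nine each morning."
notes = [
    annotate(text, "next spring", "startDate", "2020-03-20"),
    annotate(text, "nine each morning", "openingTime", "09:00"),
]

# The narrative stays as written; the facts can be reused elsewhere.
facts = {n.prop: n.value for n in notes}
```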

Over time, more editorial components are becoming formalized as structured data defined by metadata vocabulary standards. As different publishers face similar needs and borrow from each other’s approaches, an element in the content model becomes a design pattern that’s widely used, and therefore a candidate for standardization. For example, simple how-to instructions can be specified using metadata standards.

The Layered Cake

How domain models can support content models

One simple way to think about the two models is as layers of a cake. The domain model is the base layer. It manages the factual information that’s needed by the content and by machines for applications. The content model is the layer above the domain model. It manages all the relevant content assets (thumbnails, video trailers, diagrams, etc.), all the sections of copy (introductions, call outs, quotes, sidebars, etc.) and all the messaging (button text, alternative headlines, etc.). On top of these layers is the icing on the cake: the presentation layer. The presentation layer is not about the raw ingredients, or how the ingredients are cooked. It’s about how the finished product looks.
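A toy illustration of the layers, with hypothetical data and function names: the domain layer holds facts, the content layer holds expressive elements that point to those facts, and two presentation functions render the same ingredients differently.

```python
# Base layer: domain model (facts), keyed by hypothetical IDs.
domain = {"product-7": {"name": "Example Kettle", "capacity_l": 1.7}}

# Middle layer: content model (assets, copy, messaging) referencing facts.
content = {
    "kettle-page": {
        "entity": "product-7",
        "headline": "Tea in three minutes",
        "thumbnail": "kettle-thumb.png",
        "button_text": "Buy now",
    }
}

# Top layer: presentation decides only how the finished product looks.
def present_html(content_id: str) -> str:
    c = content[content_id]
    d = domain[c["entity"]]
    return (f"<h1>{c['headline']}</h1>"
            f"<p>{d['name']}, {d['capacity_l']} L</p>"
            f"<button>{c['button_text']}</button>")

def present_voice(content_id: str) -> str:
    c = content[content_id]
    d = domain[c["entity"]]
    return f"{c['headline']}. The {d['name']} holds {d['capacity_l']} litres."
```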

The distinctions I’ve made between the domain model and content model may not align with how your content management systems are set up.  But such decoupling of data and content is becoming more common.   If factual information is kept separate from expressive content, publishers can gain more flexibility when configuring how they deliver content and information to audiences.

— Michael Andrews

Fact-checking Attitudes about FAQs

FAQs are a polarizing form of content. Some content professionals think polished FAQs can be awesome (even if many are far from it). Many other content professionals believe FAQs can never be awesome. Some publishers treat FAQs as another outlet to promote the publisher’s message. Critics view FAQs as unnecessary clutter — an admission that the design of the content is failing.

I want to try to separate the light from the heat.  FAQs may seem like a dinosaur from the earliest days of online publishing, but they are morphing into something much more intelligent than many content designers realize.  FAQs will likely be more important in the future, not less.  But they won’t act like most FAQs used today.

The Stigma of FAQs

FAQs are so loathed that slogans about them now show up on swag.  A Twitter poll revealed that a “frequently asked” request for a coffee/tea mug slogan was “No FAQs.”  The mug was duly made, and is appearing on office desks.

A cozy cuppa. (Via Twitter)

Lisa Wright, a technical communicator who has written one of the most insightful critiques of FAQs, invokes a ghoulish specter:  “Like zombies in a horror film, and with the same level of intellectual rigor, FAQs continue to pop up all over the web.”  Be afraid: FAQs can trigger nightmares!

There are plenty of valid criticisms of FAQs, including many that get scant attention. But there are also plenty of criticisms of FAQs that seem subjective, and merely reinforce pre-existing attitudes about them. Dinging FAQs can be fun and can help you bond with new friends. Mocking FAQs during a rabble-rousing conference presentation is a sure crowd pleaser, although the glib assertions in such talks are hard to fact-check and challenge. I’ve seen critiques of FAQs that say categorically that users don’t want FAQs, without providing any evidence that would allow us to evaluate how accurate, or generalizable, such a statement is. Even data about FAQs can be difficult to evaluate. Analytics may show FAQs are being viewed, while usability tests may show FAQs are frustrating to use. Evaluating the value of FAQs requires a deep understanding of both content relationships and user expectations.

To say that FAQs don’t “spark joy” would be an understatement. FAQs carry a stigma: their existence can seem to signal failure. Lots of people wish they weren’t necessary. Why do customers keep asking these same questions again and again? What’s the root cause triggering this irritation? FAQs may not be a sign that there are problems in the content. They may instead be a sign that there are problems in the products that the content must explain. Many common customer service-related questions are about low-level bugs: frequent points of friction that users encounter but that vendors decide are not serious enough to prioritize fixing. People may ask: why can’t I do something on my phone that I can do on my desktop? In an ideal world, users wouldn’t have to ask questions. Everything would work perfectly and no thinking would be necessary. We should never give up on that aspiration. But the messy reality is that vendors ship products with bugs and limitations. Users will always want to do something that was deemed an edge case or a low priority. The gap between what’s available and what’s expected results in a question.

FAQs have been around since the earliest days of the internet. They arose because they provided a simple way for publishers to address information that audiences were seeking. FAQs were the first feedback loop on websites, long before usability testing or A/B testing became common. Users indicated through emails or searches the questions they had, and publishers provided answers. FAQs aren’t the only way to reply to user questions. But credible, relevant FAQs can signal that the publisher listens to what information people want to know. Susan Farrell of the usability research consultancy Nielsen Norman Group has concluded that “FAQs Still Deliver Great Value.”

FAQs are simply a generic content type, much like a table or video can be a generic content type or format. Content professionals should resist the temptation to categorically dismiss FAQs as a bad or evil content type. Few would condemn all videos as evil, even if there are plenty of examples of bad videos. FAQs are surprisingly hard to do well. They have to deliver important information, highly anticipated by the user, in a concise and precise way.

For many web publishers, FAQs are a poorly managed content type. Why are FAQs poorly managed? FAQ pages often suffer from unclear ownership. FAQs are located on one of the few web pages in an organization that may be used by both marketing and customer support. That dual ownership can create a tension about the purpose of the FAQs: the kinds of questions presented, and the kinds of answers allowed. Without clear ownership or a clearly defined purpose, FAQ pages can become a dumping ground for random information that different parties want to publish. When that happens, FAQs are no longer about answering common customer questions; they are about exposing organizational anxieties. FAQs can share the governance problems of another high-profile web page: the home page.

FAQs can seem dishonest at times.  Not all FAQs are really “frequently asked” questions, even if they appear in a short list on a FAQ page.  True FAQs are based on real questions from real users.  Potemkin FAQs are questions that the publisher decided they wanted to talk about, or wanted to spin in a flattering light.  I’ve seen FAQs with a strong marketing focus, such as “How is your product different than company X’s product?”  Even if buyers are wondering about that, they aren’t looking for an answer on the FAQ page.  FAQs are meant to answer factual questions — not provide opinion and commentary.  FAQs are not a sales channel.  They are not a list of potential buyer objections.  Answers should answer, not sell.  

The purpose of FAQs is to prevent the user from having to dig through pages of content to get an answer to a straightforward question.  But if the user has to go digging through a long list of FAQs to find the question in order to find the answer, then the benefit of the FAQ has been nullified.  

In order to decide if FAQs are appropriate, a publisher should understand when and why customers have a question to begin with. What triggers the question, and when is it triggered?

FAQs respond to a user need for information. Either the information is new to the user, or it has been forgotten.  Several common scenarios arise.  If customers are familiar with other similar organizations and now are considering your organization, they may ask questions so they can make a comparison.  For example, they may want to know what’s your organization’s policy about an issue that’s  important to them.  If customers are already familiar with your organization, they may have questions about changes that may have occurred.  For example, annual changes in tax policies routinely generate many questions about how such changes affect specific situations.

Using FAQs especially makes sense when the web isn’t the primary channel of communicating with audiences.  If people are already reading your web content, there is little point having them find the FAQs if the question can be answered on the pages people are already visiting.  But many FAQ scenarios arise from non-web channel interactions.  People have an issue with a product they’ve bought, and are driven to the website looking for the answer.  The BBC’s audiences hear about something on radio or TV, and want follow up details, so they head to the website.  Someone is planning to visit a retail store, but has a question about the validity of a competitor coupon.  And sometimes people have general questions that aren’t related to a specific task.  Questions don’t always arise in the context of a web task.  Providing answers can sometimes be a precondition to starting a task.

Questions and Answers as a Content Form

Questions and answers are a fundamental way of structuring content.  Q&As are one of the oldest forms of content, and they can be traced to ancient times when content was oral.  The conversational nature of questions and answers aligns closely to the increasingly post-document character of online content. 

Published content should have a purpose.  Questions put a spotlight on the purpose of the content.  What question(s) does the content answer?  Does it answer the question well?  Is the question important?  

A question can be called many things.  When I lived in Britain, I discovered people made enquiries (with an “e”), which to my American ears sounded rather formal.  Living in India, I notice few people have questions, but many people have doubts.  People with doubts generally aren’t skeptical; they just want answers.  Computer geeks will speak about queries.  All these terms can be synonyms, though they can evoke subtly different connotations about intention and purpose, depending on one’s background.   Do they need a formal verdict about eligibility?  Are they confused? Are they exploring?

No matter how carefully crafted a publisher’s content is, audiences will still have questions. Publishers will need to provide answers to those questions, as they are articulated by audiences. Publishers can’t expect that audiences will stop asking questions. Publishers can’t even expect that everyone will read through all their carefully crafted content. Publishers would be presumptuous to assume that their content will answer every question audiences have, and that the information sought will always be easy to find within the text. One of the ironies of the anti-FAQ attitude is that while it claims to be audience-centric, it actually is publisher-centric. FAQ-phobia at its worst becomes an attitude of “No Questions Allowed: we’ll decide what you need to know, and will tell you when we decide you need to know about it.” Like a stern school teacher, the publisher doesn’t permit any participation.

It’s helpful to compare the characteristics of FAQs with those of Q&As found in online forums. They are similar, except that both the questions and answers in FAQs tend to be more fixed, as one party chooses both the question and the answer. Q&As in forums tend to be more fluid. In an open Q&A, it can be more transparent who raised the question, and users themselves may supply the answer. Questions in a Q&A can sometimes be duplicated (a sure sign they are frequently asked) and sometimes questions mutate: people ask variants, or request an update based on new circumstances. Answers sometimes spawn new questions. Q&As in forums can be less efficient than FAQs at directly answering common questions, but they can be effective at surfacing what issues concern audiences, and how they are thinking about these issues.

Questions themselves can be interesting.  One can see some common questions, and think: that’s a good question!  Hadn’t thought to ask that myself, but interested to know the answer.  For example, I found these questions on the Nestlé India website:  

  • “Are the natural trans fats in dairy as harmful for the body as man-made trans fats?”  
  • “Are stir fries healthy?”

The very presence of these questions provides an indication of what customers must be chatting about online and in social media.  Questions can be the voice of the customer, if the questions are genuine.

Any form of published questions and answers involves some kind of moderation. FAQs typically don’t offer an “Ask Me Anything” form of openness; the questions are chosen editorially. But many Q&A sites allow such openness. Quora, StackExchange, and other sites allow users to pose any question they want (consistent with their guidelines), and users vote on the value of both the question and the answers. The success of these sites indicates that the question-and-answer format does serve a useful role.

In the case of FAQs, publishers must decide which questions are common enough to merit an answer.  Who specifically are FAQs meant to address: everyone, or specific groups of individuals?  And what kinds of questions are appropriate to answer with FAQs?   These are editorial decisions, and they need to be supported with the right structure for the content.  Many FAQ problems arise from either not making clear editorial decisions, or not having the right structure in place to support the editorial decisions made.  

FAQs can sprawl if governance is lacking.  Some publishers use FAQs to broadcast information about things they think audiences should know about, even if audiences aren’t asking about them often.   They lack a process to evaluate the importance of a question to the audience.  

Many people think about FAQs as a single destination page. But some publishers have multiple FAQ pages. When FAQs are treated as web pages, users may never even find the questions and answers. They need to figure out two things: whether their query is a frequently asked question, and where the FAQs are located. Audiences ideally shouldn’t have to think about where the answers live.

The content marketing software firm HubSpot seems to have over 7,000 FAQ pages, covering different branded audience- and task-themed areas of their website, such as:

  • Content Marketing Certification FAQ – HubSpot Academy
  • HubSpot Developers FAQ
  • Workflows | Frequently Asked Questions – HubSpot Academy
  • Frequently Asked Questions – HubSpot Design
  • HubSpot Partner Program FAQs

What a mess — how is the user supposed to know where to get an answer?  

Such sprawl is common when marketing organizations dominate the process. Both questions and answers get framed by marketing segmentation. The supply of answers (the stuff to talk about) drives the process, instead of the supply of questions. That creates a risk that the FAQs don’t sound authentic. They can sound as if a blurb about something was written, and only then was a leading question created to become its heading. In HubSpot’s case, even the titles of blog posts use the term FAQ. While it can be appropriate to address common questions in a blog post, those posts shouldn’t be labelled as FAQs.

Some frequently asked questions  reflect customer skepticism.  For example, MeWe, a social network site, claims to be free and to respect user privacy.  They have a FAQ on how they can be free and make money.  The question seems genuine, even if the answer seems vague.

How can you be so great?

Not only should the questions be important and relevant to many people, their answers need to be concise while still covering the question’s scope. Open-ended questions fail that test: short answers will fail to satisfy everyone’s criteria. Answers should not involve “it depends…” unless the answer provides onward links to explain different dimensions relating to the question. Overly general answers can sound evasive.

FAQs prove their value when they deliver brevity. Audiences don’t want to wade through lengthy text to find an answer. A classic case of a mismatch between questions and content is terms and conditions (T&Cs). While there may be legal reasons for having a long terms and conditions document, the information is hard to access. Apple’s terms and conditions would take nine hours to read completely. Users will have specific questions, and should be able to get specific answers without having to scan or read the T&Cs.

Publishers need to clearly convey what kind of questions are covered by FAQs. Many users will assume frequently asked questions are perennial questions repeatedly asked by people over time: a sort of greatest hits of factoids. But some publishers such as the BBC introduce the notion of “most popular” FAQs, which is confusing. Many users will assume that frequent questions are popular ones. But the BBC seems to describe lots of questions as FAQs, and then scores them by popularity, which can fluctuate. Popular questions may relate to how to get tickets to a show or purchase a calendar linked to a program. Popular questions may be short-lived and of interest only to a limited subgroup of visitors. There certainly needs to be a way to address questions that become suddenly and perhaps momentarily popular, but FAQ pages are customarily static. Many users won’t expect answers to such questions on a FAQ page.

Matching Questions with Answers: The Issue of Intention

The root of question is “quest.”  Users are on a quest.  Questions arise because users need to understand something in order to do something.  It is not always clear why someone is asking a question.  Sometimes there could be more than one reason.  Some people might ask about a return policy because they want to try the product before committing.  Others ask because they are buying a gift and don’t know if the recipient will like it.  

A core design challenge for FAQs is understanding how specific or general a user intention is.  This gets into how to handle the granularity of questions and answers.  

First, let’s break down the components of the customer-publisher interaction.

The quaeritur (the question asked) needs a corresponding quaesitum (a solution). While it sounds simple to match these two parts, in the world of online information the process is slightly more complex. It’s actually a three-stage process:

  1. The question that was asked (the user query)
  2. The question that was answered (the published question or statement)
  3. The answer that was provided (or elaboration of the statement)

There are several places where the process could go wrong.  

How user queries get connected to published answers

The user query may not match the published FAQ question. If the user sees a list of questions on a FAQ page, they may not see a question that matches how they are thinking about an issue. There are many reasons why the user query may not overlap with the published question. One reason is language: the terminology in a user query could be less formal and vaguer, and the user may be unable to translate how they are thinking about the question into the publisher’s terminology. Another reason is a mismatch of scope. Users may be looking for more specific answers, and hence questions, than appear in a FAQ’s list of questions. Lisa Wright notes: “If a question appears to exclude the required information, the user may never click to see the answer, even if it is actually relevant.”

The published answer may not satisfy the user goal associated with their original query.   The answer may be too general, or it may focus on details that while of interest to many, are not relevant to the specific user.  

Because of the possibilities of mismatching, publishers need to remove extraneous steps and provide onward next steps to get users toward the answer they seek. If the user query matches an answer, it is best to show that answer directly, and not show how the publisher wrote the question, which could be broader. Specific queries are unlikely to match published questions on a FAQ page. Users need the ability to express their own questions, such as by typing a search query, which can be mapped to appropriate answers. If the user query doesn’t match an available answer, it is best to show the nearest question for which there is an answer, assuming there is one.
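The matching logic described above (show the answer when the query matches, otherwise offer the nearest question) can be sketched as below. It assumes a naive word-overlap score and an arbitrary threshold of 0.5; both are illustrative choices, and a production system would use proper search or language processing. Scoring also includes the answer text, since a query may match an answer without matching its published question. The FAQ content is hypothetical.

```python
import re

# Hypothetical FAQ content for illustration.
FAQS = [
    {"q": "What forms of payment do you accept?",
     "a": "We accept credit cards, debit cards, and gift cards."},
    {"q": "What is your return policy?",
     "a": "Unused items can be returned within 30 days."},
]

def tokens(text: str) -> set[str]:
    """Lowercase word tokens; deliberately naive (no stemming, no stopwords)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def coverage(query: set[str], candidate: set[str]) -> float:
    """Share of the query's words that also appear in the candidate."""
    return len(query & candidate) / len(query) if query else 0.0

def respond(query: str, threshold: float = 0.5):
    q = tokens(query)
    # Score each Q&A pair on its question and answer text combined.
    scored = [(coverage(q, tokens(f["q"] + " " + f["a"])), f) for f in FAQS]
    best_score, best = max(scored, key=lambda pair: pair[0])
    if best_score >= threshold:
        return ("answer", best["a"])            # show the answer directly
    if best_score > 0:
        return ("nearest_question", best["q"])  # offer the closest question
    return ("no_match", None)                   # hand off to another next step
```

For instance, "Can I return unused items?" gets the returns answer directly, while "Which payment apps work with my phone?" only overlaps weakly and is offered the nearest published question instead.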

Users need to know: “What questions can I ask?”   The BBC, which has a search interface to their FAQs, even has a FAQ on how to use their FAQ, a tacit admission that people aren’t sure what they can expect.

How to use the BBC’s FAQs

One of the major uncertainties associated with tell-me-what-you-want command-based queries is not knowing if one can ask a question.  Voice interfaces are frustrating when the agent responds that they don’t understand the question, or when they misinterpret the question.  Designers of query bots are developing ways to indicate query patterns a user can try that will yield useful answers.  

The BBC’s FAQs reveal the problem of matching a user query to a publisher question.  I have difficulty downloading BBC podcasts on my smartphone’s podcast app (rather than the BBC’s proprietary iPlayer app).  But my query doesn’t really match the questions available, which are either more general or are irrelevant.   My issue feels like it should be a frequently asked question.  I’d be surprised if I’m alone in having problems accessing the BBC’s DRM-clamped, regionally-restricted audio content.  

Is there a match?

The BBC’s FAQ explains: “If you don’t get the result you’re looking for then it’s likely that your question isn’t one of our Frequently Asked Questions and we don’t have the information available.” But because I only see the select questions that the BBC offers, I don’t really know whether they failed to match my query with their questions (likely), or whether they also searched the text of their answers and found no match (possible). A user’s query might match the content of an available answer, but not the content of the published question associated with that answer.

Most matching relies on matching specific trigger words, called hotwords in the case of voice interfaces. These trigger words are typically primitive. They don’t capture the context of the question, which can lead to a time-consuming process of clarification. Users ask: “How do I unclog my dishwasher?” And the answer could be another question (“Sounds like you need some help, tell me what model of dishwasher you have?”) Or the answer provided is too general to be useful (“Here are some tips on maintaining your dishwasher.”) As long as question-matching relies on trigger words, the interaction will be rudimentary. Over time, machines will become better at interpreting questions. Major tech companies such as Amazon, Google, and Microsoft offer natural language processing APIs that publishers can access to help understand the meaning of the user query rather than just react to a keyword. Machines will understand the type of question, recognize synonyms, and connect relationships between different concepts mentioned.
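The weakness of trigger words, and the first small step beyond them, can be shown in a few lines. The trigger table and synonym table below are hypothetical: a plain trigger match misses "blocked" because it is not in the publisher's vocabulary, while normalizing synonyms first catches it. Commercial natural language APIs go far beyond a lookup like this.

```python
import re

# Hypothetical trigger words mapped to FAQ topics.
TRIGGERS = {"unclog": "dishwasher_drain", "drain": "dishwasher_drain",
            "maintain": "dishwasher_care"}

# Hypothetical synonym table translating informal user words into the
# publisher's terms.
SYNONYMS = {"blocked": "unclog", "clogged": "unclog", "gunked": "unclog"}

def match_topic(query: str, use_synonyms: bool = False):
    """Return the topic of the first trigger word found, or None."""
    for word in re.findall(r"[a-z]+", query.lower()):
        if use_synonyms:
            word = SYNONYMS.get(word, word)  # normalize before matching
        if word in TRIGGERS:
            return TRIGGERS[word]
    return None
```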

From Preselected Questions to Permission-To-Ask

A major source of friction associated with FAQs is the limitation of presenting a list of questions to users. Listing more than one or two dozen questions results in sprawl. Long lists of questions are difficult to scan, because they lack contextual clues about what they refer to. People are looking for answers. They aren’t interested in looking for preselected questions. They aren’t looking for permission to ask a question.

But not all questions addressed need to appear in a list on a FAQ page. More and more, users expect to be able to ask questions directly, perhaps in a search query.   Publishers need to be responsive to the actual questions that audiences want to ask.  Publishers need to give audiences permission to ask.  That involves not thinking about FAQs as existing only on a FAQ page.

Publishers need to embrace a “permission to ask” mindset. Part of giving users permission to ask is not making users visit your website to get an answer. A publisher can let them use Google, Facebook Messenger, or Alexa to ask questions.

Publishers can’t assume they know all the questions that people will have that will be important. While user research is always necessary when prioritizing questions to answer, user research will never uncover all questions users will have. A widely quoted statistic is that 15% of all Google queries each day are new: they have never before been expressed. This has been true for many years. Not all high-priority questions can be anticipated in advance. FAQs can provide a way for publishers to answer questions for which no published content has yet been created. FAQs can provide a brief answer that can subsequently be elaborated upon in articles or other content. Even if user queries aren’t fully answered currently, publishers can provide a task flow that offers a next step for users to get the complete answer they need.

For FAQs to become more useful, they should be treated as a global resource. They are a body of answers, not just a list of questions. They need to be unshackled from the FAQ page. Some developments in chatbots and voice bots suggest the overall direction for how answers must work across all channels. Answers don’t live in one place: they need to be everywhere.

Lisa Wright cautions that with FAQs “information can get out of sync quickly, resulting in duplicate or even contradictory content.” Most publishers manage FAQs separately from other content. But it doesn’t need to be that way. The same question and answer can be inserted on relevant web pages as well as appearing on a FAQ page. If structured appropriately, there is only one instance of the content, available in multiple places.
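A minimal sketch of single-sourcing, assuming a hypothetical store keyed by Q&A IDs: pages hold references rather than copies, so one edit updates every page that shows the answer. The page names and policy text are placeholder content.

```python
# One canonical store of Q&A pairs, keyed by hypothetical IDs.
QA_STORE = {
    "qa-returns": {"q": "Can I return a gift?",
                   "a": "Yes, with a gift receipt, within 30 days."},
}

# Pages hold references to Q&A IDs, never copies of the text.
PAGES = {
    "faq": ["qa-returns"],
    "gift-guide": ["qa-returns"],
}

def render(page: str) -> list[str]:
    """Resolve each referenced Q&A at render time."""
    return [f"Q: {QA_STORE[i]['q']} A: {QA_STORE[i]['a']}" for i in PAGES[page]]

# Updating the single instance updates every page that embeds it.
QA_STORE["qa-returns"]["a"] = "Yes, with a gift receipt, within 60 days."
```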

The first task is to consider how questions need to be structured.   Certain question structures indicate intents, for example:

  • (Process) How do I….?
  • (Checking rules) Can I…? 
  • (Options) Which…?

These intents can be characterized according to the purpose of the question and the purpose of the related information answering the question.  The question needs to indicate what the user’s quest is about.  
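The intent patterns listed above can be detected with simple rules on how a question opens. This sketch assumes English questions and hypothetical intent labels; real questions are phrased far more loosely, so rules like these are only a starting point.

```python
import re

# Question openings mapped to intents, following the patterns listed above.
INTENT_PATTERNS = [
    (re.compile(r"^how do i\b", re.I), "process"),
    (re.compile(r"^can i\b", re.I), "checking_rules"),
    (re.compile(r"^which\b", re.I), "options"),
]

def classify_intent(question: str) -> str:
    """Return the first intent whose opening pattern matches, else 'unknown'."""
    q = question.strip()
    for pattern, intent in INTENT_PATTERNS:
        if pattern.search(q):
            return intent
    return "unknown"
```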

When questions have structure, they can better serve the needs of users. Many applications prompt users to ask in a certain way [“Ask me….”]. Such prompts narrow the gap between the query and the question, and help users understand the scope of addressable questions.

The second part of Q&A structuring is looking at the relationship between the question and answer.

  • Can specific questions be answered with general answers?
  • Can a specific query be an instance of a more general question?  

For example, “Does Walmart accept international credit cards?” is a more specific question than “What forms of payment does Walmart accept?”  The answer to the latter question may address the first question about international credit cards, but it will include many other details.  “Does Walmart accept purchase orders?” might also lead to a comprehensive answer about forms of payment.  

More specifically, questions and answers can involve three kinds of relationships:

  1. One Question, One Answer (a highly specific question-answer pair)
  2. One Question, Numerous Answers (many partial answers, or different opinions — that can be broken down into general and specific, or into follow up or related questions)
  3. Numerous Questions, One Answer (different specific questions have a single general answer)

In the cases where a question has numerous answers, the structuring may involve breaking down the answer.  Users expect answers to be short.  An answer overview could provide a link to other content that goes into more detail.
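The "numerous questions, one answer" relationship can be modeled by pointing several specific questions at a single general answer, which also gives a cheap signal for related questions. The Walmart questions come from the example above; the answer summary and the `/help/payments` link are hypothetical placeholders, not Walmart's actual policy or URLs.

```python
ANSWERS = {
    "payment-forms": {
        # Short overview first; a hypothetical link leads to the full detail.
        "summary": "An overview of the forms of payment accepted.",
        "more": "/help/payments",
    },
}

# Numerous questions, one answer: specific questions share a general answer.
QUESTIONS = {
    "What forms of payment does Walmart accept?": "payment-forms",
    "Does Walmart accept international credit cards?": "payment-forms",
    "Does Walmart accept purchase orders?": "payment-forms",
}

def answer_for(question: str):
    """Return the answer record for a known question, or None."""
    return ANSWERS.get(QUESTIONS.get(question, ""))

def related_questions(question: str) -> list[str]:
    """Other questions that share the same answer are likely related."""
    aid = QUESTIONS.get(question)
    return [q for q, a in QUESTIONS.items() if a == aid and q != question]
```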

Finally, structuring questions and answers involves looking at the relationship between questions.  Questions may not be independent.  Other questions may be related, and useful for users to know about.  Google, for example, provides related questions to user queries.

Related questions in Google search results

Improving Precision with Metadata

Even though FAQ pages may seem like dinosaurs, they are not going away.  On the contrary: FAQ pages are in the process of being recognized as a distinct content type by the W3C’s community. Metadata is reinvigorating FAQs, and making them more useful and adaptable.

Increasingly, FAQs will be accessed outside of a website.  Many users won’t go to a FAQ page to find the questions and read the answers.  Metadata will let them access FAQs from a search results page or through a chatbot or voice bot.

The utility of FAQs is limited when they lack metadata.  It is hard to know what the question involves if it isn’t tagged according to the topic and intent of the question.  It’s hard to associate related questions when such metadata doesn’t exist.  And it’s hard to deal with more than a handful of questions, because the sprawl makes finding information difficult.  Metadata can organize vast quantities of questions and answers behind the scenes, so that users only see relevant information, and can access it immediately.   
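To make the point concrete, here is a small Python sketch of questions tagged by topic and intent.  The tag values and the `find` and `related` helpers are hypothetical.  Filtering on tags, rather than on the words in the question, is what lets a large collection show users only the relevant subset and surface related questions.

```python
from dataclasses import dataclass

@dataclass
class FAQ:
    question: str
    topic: str    # hypothetical tag: what the question is about
    intent: str   # hypothetical tag: why the user is asking

FAQS = [
    FAQ("Does Walmart accept international credit cards?", "payment", "eligibility"),
    FAQ("Does Walmart accept purchase orders?", "payment", "eligibility"),
    FAQ("How do I return an online order?", "returns", "how-to"),
]

def find(faqs, topic=None, intent=None):
    """Return only the questions whose tags match the request."""
    return [f for f in faqs
            if (topic is None or f.topic == topic)
            and (intent is None or f.intent == intent)]

def related(faqs, target):
    """Questions sharing the target's topic are candidates for 'related questions'."""
    return [f for f in faqs if f is not target and f.topic == target.topic]
```

Without the `topic` and `intent` fields, neither function would have anything to work with.  That is the sprawl problem in miniature.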

Google relies on Q&A metadata to select answer snippets to display in search results.  Google states: “marking up your Q&A page helps Google generate a better snippet for your page.”  The user has a query, and metadata allows an answer without the user needing to go to a website.
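Google's Q&A markup is built on the schema.org types `QAPage`, `Question`, and `Answer`, usually embedded in the page as JSON-LD.  The Python sketch below assembles such markup as a dictionary and serializes it.  The helper name `qa_page` and the sample text are hypothetical, and the properties shown are only a subset.  Check Google's current structured-data guidelines for which properties are required.

```python
import json

def qa_page(question, detail, accepted, suggested=()):
    """Build schema.org QAPage markup as a dict; serialize it with json.dumps."""
    q = {
        "@type": "Question",
        "name": question,                  # short form of the question
        "text": detail,                    # full wording of the question
        "answerCount": 1 + len(suggested),
        "acceptedAnswer": {"@type": "Answer", "text": accepted},
    }
    if suggested:
        q["suggestedAnswer"] = [{"@type": "Answer", "text": s} for s in suggested]
    return {"@context": "https://schema.org", "@type": "QAPage", "mainEntity": q}

markup = qa_page(
    "Does Walmart accept international credit cards?",
    "Can I pay with a card issued outside the US?",
    "Hypothetical answer text goes here.",
)
print(json.dumps(markup, indent=2))
```

The printed JSON is what would go inside a `<script type="application/ld+json">` element on the Q&A page.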

As I discuss in my new book, No More Silos: Metadata Strategy for Online Publishing, metadata helps Quora keep track of the vast number of questions and answers on its site.  Quora is using Wikidata IDs, metadata identifiers from Wikidata, Wikipedia's structured-data sister project.  These IDs can bring great precision to sometimes ambiguously worded questions.

Jay Myers of Best Buy, one of the pioneers in the commercial application of semantic metadata, explored in a recent blog post how one can model the relationships among topics that users might ask about through a voice user interface such as Alexa.  He shows how a class of products such as routers can have a range of common issues, such as being slow or having connection problems.  These issues are linked to tips for resolving them.  There’s huge potential for questions to become the gateway to all kinds of very specific information.  This transition is just beginning.
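The chain of product class, common issue, and tip can be pictured as a small graph.  The Python sketch below uses a nested dictionary as a stand-in.  It is not Myers's actual model, which relies on semantic metadata.  The data and the `issues` and `tips` helpers are hypothetical.

```python
# Hypothetical troubleshooting graph in the spirit of the router example:
# product class -> common issue -> tips that resolve it.
GRAPH = {
    "router": {
        "slow": ["Restart the router", "Move it away from thick walls"],
        "connection problems": ["Check the cable", "Update the firmware"],
    },
}

def issues(product_class):
    """Issues a voice assistant could offer once the user names a product class."""
    return list(GRAPH.get(product_class, {}))

def tips(product_class, issue):
    """Tips linked to one issue; an empty list if the graph has no such path."""
    return GRAPH.get(product_class, {}).get(issue, [])
```

A voice exchange then becomes a walk through the graph.  The user names a router, the assistant offers the known issues, and the chosen issue leads to its tips.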

Already, an extensive metadata vocabulary exists for handling questions and answers.  The vocabulary gives publishers the ability to structure questions and answers to meet different user needs.

Questions and answers in the vocabulary

First, the vocabulary offers different ways to indicate what a question is about.  This goes beyond the simple tagging that exists in most content management systems.  The publisher can indicate the primary topic that the question addresses.  But the metadata can also provide more context about the question.  It can indicate what other topics are mentioned in the question.  It can also indicate whether the question is part of a broader question or series of questions, which can signal related questions.  There’s also a metadata property for keywords, which can hold taxonomy terms that classify the purpose of the question, such as diagnosis or repair (the keywords can be tags and don’t have to be the literal words appearing in the question).  Publishers can also indicate the audience segment for the question, such as “existing owners” or “beginners.”  These different properties can make the intention of the question much more precise, and transcend the limitations of matching the specific words used in the question.
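In schema.org terms, these properties map onto `about`, `mentions`, `isPartOf`, `keywords`, and `audience`, which `Question` inherits from `CreativeWork`.  The Python sketch below shows them on one question.  The product, the series name, and the tag values are hypothetical.

```python
def describe_question():
    """A schema.org Question carrying the context properties described above."""
    return {
        "@context": "https://schema.org",
        "@type": "Question",
        "name": "Why does my router keep dropping the connection?",
        # Primary topic (a hypothetical product).
        "about": {"@type": "Product", "name": "Acme X200 router"},
        # Other things the question mentions.
        "mentions": [{"@type": "Thing", "name": "Wi-Fi"}],
        # Purpose tags from a taxonomy, not words lifted from the question.
        "keywords": ["diagnosis"],
        # The audience segment the question serves.
        "audience": {"@type": "Audience", "audienceType": "existing owners"},
        # The broader series this question belongs to (hypothetical).
        "isPartOf": {"@type": "CreativeWork", "name": "Router troubleshooting"},
    }

q = describe_question()
```

Note that "diagnosis" never appears in the question's wording.  The tag carries intent that word matching alone would miss.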

The vocabulary provides a range of options to indicate what’s included in an answer.  As with the question, the answer can use properties to indicate what it is about and what it mentions (an answer might mention something that’s not mentioned in the question).  These properties allow the publisher to deliver the answer when the user asks about a specific solution instead of asking more generally about the options available.  Another available feature is the ability to publish multiple answers.  Publishers might publish provisional answers that users can vote on to determine whether the answer is helpful.  They can select one answer as the best or accepted answer.  Both Google and Bing already rely on these properties when presenting answers to certain questions that involve a range of perspectives.  When providing an answer, publishers can use the publisher property to indicate that they are the source of the information.  This allows for attribution when users get an answer back from a search, or when they ask a bot.  The metadata allows the bot to say “According to Acme Corporation…” before presenting the answer.
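The Python sketch below shows how a bot might use these answer properties.  It uses `acceptedAnswer`, `suggestedAnswer`, `upvoteCount`, and `publisher` from schema.org.  The selection rule, which takes the accepted answer first and otherwise the most upvoted one, is an assumption for illustration and not how Google or Bing choose answers.  The helper names and sample answers are hypothetical.

```python
def best_answer(question):
    """Pick the accepted answer if present, else the most upvoted suggestion.

    Assumed rule for illustration only; search engines apply their own logic.
    """
    if "acceptedAnswer" in question:
        return question["acceptedAnswer"]
    suggested = question.get("suggestedAnswer", [])
    return max(suggested, key=lambda a: a.get("upvoteCount", 0), default=None)

def bot_reply(answer):
    """Prefix the answer with its publisher when the markup names one."""
    publisher = answer.get("publisher", {}).get("name")
    if publisher:
        return f"According to {publisher}: {answer['text']}"
    return answer["text"]

question = {
    "@type": "Question",
    "name": "How do I reset the router?",
    "suggestedAnswer": [
        {"@type": "Answer", "text": "Unplug it for ten seconds.", "upvoteCount": 3},
        {"@type": "Answer", "text": "Hold the reset button for ten seconds.",
         "upvoteCount": 7,
         "publisher": {"@type": "Organization", "name": "Acme Corporation"}},
    ],
}
```

Here `bot_reply(best_answer(question))` yields "According to Acme Corporation: Hold the reset button for ten seconds."  The attribution comes straight from the markup, not from the answer text.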

Suppose users have questions that aren’t already answered?  Metadata can cover that scenario as well.  The metadata can capture the act of asking a question.  The user might fill out a form to ask a question, and might indicate the product they own if their question relates to that product.  The organization then receives the question and can develop and publish an answer.  When the answer is published, the user who requested it can be notified.  The structure allowing all this to happen is available within the metadata standard.
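One way to express this flow is the schema.org `AskAction` type, which has `agent`, `recipient`, `actionStatus`, and a `question` property.  The Python sketch below records a user's question about a product and then marks it answered.  The helper names, the people and product involved, and the idea that `publish_answer` returns the person to notify are all hypothetical.  Actually sending the notification is left to the application.

```python
def ask_action(user_name, product_name, text):
    """Record a user's unanswered question as a schema.org AskAction."""
    return {
        "@context": "https://schema.org",
        "@type": "AskAction",
        "agent": {"@type": "Person", "name": user_name},
        "recipient": {"@type": "Organization", "name": "Acme Corporation"},
        "actionStatus": "https://schema.org/ActiveActionStatus",
        "question": {
            "@type": "Question",
            "text": text,
            # The product the user says they own (hypothetical).
            "about": {"@type": "Product", "name": product_name},
        },
    }

def publish_answer(action, answer_text):
    """Attach the answer, mark the request complete, and return whom to notify."""
    action["question"]["acceptedAnswer"] = {"@type": "Answer", "text": answer_text}
    action["actionStatus"] = "https://schema.org/CompletedActionStatus"
    return action["agent"]["name"]
```

Using `actionStatus` lets the same record move from "asked" to "answered" without a separate tracking system.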


The FAQ page is no longer the only place that questions and answers are accessed.  The FAQ page could lose its role as a destination that users look for to get answers.  Yet content in the form of questions and answers is poised to become more important, especially as conversational interfaces become more sophisticated and are used to access more kinds of information.  FAQs will need to cover more questions and answers, not just the handful of Q&As shown on the FAQ page.  

While FAQs have great potential to address user needs, FAQs will be frustrating for users if content designers treat questions and answers simply as text instead of as structure.  Unless FAQs are structured appropriately, users will have trouble accessing relevant answers.   FAQs need metadata to support the user’s quest for information.  Metadata standards to describe Q&As already have a big influence on how readily audiences can discover the answers they are seeking.

— Michael Andrews