Where Domain Models Meet Content Models

A model is supposed to be a simplification of reality. It is meant to help people understand and act. But sometimes models do the opposite and cause confusion. If the model becomes an artifact of its own, it can be hard to see its connection to what it is supposed to represent. Over recent months, several people have raised questions on Twitter about the relationship between a domain model, a content model, and a data model. We may also encounter the terms ontology and vocabulary, which are models too. With so many models out there, it’s small wonder that people might be confused.

From what I can see, no consensus yet exists on how to blend these perspectives in a way that’s both flexible and easy to understand.  I want to offer my thinking about how these different models are related.  

Part of the source of confusion is that all these models were developed by different parties to solve different problems. Only recently has the content strategy community started to focus on how to integrate the different perspectives offered by these models.

The Different Purposes of Models

Domain models have a geeky pedigree. They come from a software development approach known as domain-driven design (DDD). DDD is an approach to developing applications rather than to publishing content. It’s focused on tasks that a software application must support (behavior), and maps tasks to domain objects that are centered on entities (data). The notion of a domain model was subsequently adopted by ontology engineers (people who design models of information). Again, these ontology engineers weren’t focused on the needs of web publishers: they just wanted a way to define the relationship between different kinds of information to allow the information to be queried. From these highly technical origins, domain models attracted attention in the content strategy community as a tool to model the relationships of entities that will appear in one’s content. The critical question is, so what? What value does a domain model offer to online publishers? This question can elicit different and sometimes fuzzy answers. I’ll offer my perspective in a moment.

A content model sounds similar to a domain model, but the two are different.  A content model is an abstract picture of the elements that content creators must create, which are managed by a CMS.  When content strategists talk about structuring content, they are generally referring to the elements that comprise a content model.  Where a domain model is concerned with data or facts, a content model is concerned with expressive content — the text, images, videos and other material that audiences consume.  Compared with a domain model, a content model is more focused on the experience of audiences.  Unsurprisingly, content strategists talk about content models more than they talk about domain models.  

Content models can serve two roles: representing what the audience is interested in consuming, and representing how that content is managed. The content model can become confusing when it tries to indicate both what the machine delivering content needs to know and what the audience needs to see.

Regrettably, the design of CMSs has trained authors to think about content elements in a certain way. Authors decompose text articles into chunks, presented as fields in a CMS. The content model can start to look like a massive form, with many fields available to address different aspects of a topic or theme. Not all fields will display in all scenarios, and fields may be shared across different views of content (hence rules are needed to direct what’s shown when). It may look like a data model. But the content model doesn’t impose strict rules about what types of values are allowed for the fields. Some field values are numbers; others are pick-list values. Many fields are multiple paragraphs of text running to thousands of characters. Some fields are links to images, audio, or videos. Some fields hold phrases, such as the text used on a button. While all these values are “data” in the sense of being ones and zeros, they don’t add up to a robust data model. That’s one reason that many developers consider content as unstructured: the values of content defy any uniformity.

A content model is not a solid foundation for a data model about the content. Contrary to the beliefs of many content strategists, the structure represented in a content model is not semantic (machine intelligible). Creating a content model doesn’t make the content semantic. Structured authoring helps authors plan how different pieces of content can fit together. But author-defined structures don’t mean anything to outside parties, and most machines won’t automatically understand what the chunks of content mean. A content model can inform a schematic of the content’s architecture, such as what content is needed and from where it will be sourced (it could come from other systems, or even external sources). That’s useful for internal purposes. But because the content model is implemented with custom code, its structure remains specific to the publisher’s own systems.

The primary value of content models is to guide editorial decisions.  The content model defines content types — distinct profiles of content that address specific user purposes and goals.  A content model can specify many details, such as a short and a long description to accommodate different kinds of devices, or alternative text for different audiences in different regions.   A detailed content model can help the content adapt to different contexts.  

Domain models are strong where content models are weak. Although domain models did not originally rely on metadata standards (e.g., in DDD), domain models increasingly have become synonymous with metadata vocabularies or ontologies. Domain models define data models: how factual information is stored so it can be accessed. They supply one source of truth for information, in contrast to the many expressive variations represented in a content model. Domain models represent the relationships of the data or information relating to a domain or broad subject area. Domain models can be precise about the kinds of values expected. Precise values are required in order to allow the information to be understood and reused in different contexts by different machines. Because a domain model is based on metadata standards, the information can be used by different parties. Content defined by a content model, in contrast, is primarily of use to the publisher.

The core value of a domain model is to represent entities: the key things discussed in content. Metadata vocabularies define entity types, each with properties for the values that matter about that kind of thing. Some entity types will reference other entity types. For example, an event (entity type 1) takes place at a location (entity type 2). The relationships between different entities are already defined by the vocabulary, which reduces the need for the publisher to set up special rules defining these relationships. The domain model can suggest the kinds of information that authors need to include in content delivered to audiences. The domain model can also support non-editorial uses of the information. For example, it can provide information to a functional app on a smartphone. Or it can provide factual information to bots or to search engines.
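To make the idea of one entity type referencing another more concrete, here is a minimal sketch that describes an event and its location using the schema.org types Event and Place, serialized as JSON-LD. The type and property names come from schema.org; the sample values (the festival, the venue, the date) are invented purely for illustration.

```python
import json

# Entity type 2: a location, described with schema.org's Place type.
venue = {
    "@type": "Place",
    "name": "Riverside Hall",  # invented sample value
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Springfield",  # invented sample value
    },
}

# Entity type 1: an event. Its "location" property points at the Place,
# a relationship the vocabulary already defines for us.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Spring Jazz Festival",  # invented sample value
    "startDate": "2030-04-12T19:00",  # ISO 8601 date-time, invented
    "location": venue,
}

# Serialize as JSON-LD so that other machines can read the same facts.
print(json.dumps(event, indent=2))
```

The publisher writes no special rule to say that events happen at places. The vocabulary supplies that relationship, and the publisher only fills in the values.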

The Boundary between Domain and Content Models

What’s the boundary between a domain model and a content model?

A common issue I’ve noticed is that model makers try to use a content type to represent an entity type. Certain CMSs aren’t too clear about the difference between content types and entity types. Be careful not to let your CMS force you to think in certain ways.

Let’s consider a common topic: events. Some content strategists consider events as a distinct content type. That would seem to imply the content model manages all the information relating to events. But an event is actually an entity type. Metadata standards already define all the common properties associated with an event. There’s little point replicating that information in the content model. The event information may need to travel to many places: to a calendar on someone’s phone, to search results, and to the publisher’s website, which has a dedicated webpage for events. But how the publisher wants to promote the event could still be productively represented in the content model. The publisher needs to think about editorial elements associated with the event, such as images and calls-to-action.

Event content contains both structured editorial content and structured metadata

The domain model represents what something is, while the content model can represent what is said or how it is said. Let’s return to the all-important call-to-action (CTA). A CTA is a user action that is monitored in analytics. The action itself can be represented as metadata: for example, there is a “buy action” in schema.org. Publishers can use metadata to track what products are being bought according to the product’s properties, for example, color. But the text on the buy button is part of the content model. The CTA phrasing can be reused on different buttons. The value of the content model is to facilitate the reuse of expressive content rather than the reuse of information. Content models will change, as different elements gain or lose their mojo when presented to audiences. The elements in a content model can be tested. The domain model, centered on factual information, is far more stable. The values may change, but the entities and properties in the model will rarely change.
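The split between what the action is and what the button says could be sketched roughly as follows. BuyAction, Product, and color are real schema.org names, but the ButtonCopy class, the build_button helper, and all sample values are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass

# Domain side: what the action *is*. Type and property names follow
# schema.org (BuyAction, Product, color); the values are invented.
buy_action = {
    "@context": "https://schema.org",
    "@type": "BuyAction",
    "object": {"@type": "Product", "name": "Trail Jacket", "color": "red"},
}


# Content side: what is *said*. ButtonCopy is a hypothetical content element.
@dataclass(frozen=True)
class ButtonCopy:
    copy_id: str
    text: str


# Two phrasings an editor might test against each other.
copy_variants = [
    ButtonCopy("cta-a", "Buy now"),
    ButtonCopy("cta-b", "Add to basket"),
]


def build_button(action: dict, copy: ButtonCopy) -> dict:
    """Pair reusable button text with the action metadata that analytics tracks."""
    return {
        "label": copy.text,
        "copy_id": copy.copy_id,
        "action_type": action["@type"],
        "product_color": action["object"]["color"],
    }


buttons = [build_button(buy_action, c) for c in copy_variants]
for button in buttons:
    print(button)
```

Swapping one phrasing for another changes only the content element. The action and the product properties reported to analytics stay the same, which is why the wording can be tested freely.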

When information is structured semantically with metadata standards, a database designed around a domain model can populate information used in content.  In such cases, the domain model supports the content model.  But in other cases, authors will be creating loosely structured information, such as long narrative texts that discuss information.  In these cases, authors can annotate the text to capture the core facts that should be included.  The annotation allows these facts to be reused later for different contexts.  
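One possible way to picture annotation is as labels attached to spans of a narrative text, from which the core facts can be pulled out later. The sketch below is an assumption about how that might work, not a description of any particular tool: the Annotation class, the annotate and extract_facts helpers, and the sample sentence are all hypothetical. In practice, publishers often use inline markup instead of character offsets.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    start: int      # index of the first character of the span
    end: int        # index one past the last character of the span
    prop_name: str  # which property of the entity the span expresses


# Loosely structured narrative text written by an author (invented sample).
text = "The festival opens on 12 April at Riverside Hall."


def annotate(source: str, phrase: str, prop_name: str) -> Annotation:
    """Locate a phrase in the text and label it with a property name."""
    start = source.index(phrase)  # raises ValueError if the phrase is absent
    return Annotation(start, start + len(phrase), prop_name)


annotations = [
    annotate(text, "12 April", "startDate"),
    annotate(text, "Riverside Hall", "location"),
]


def extract_facts(source: str, notes: list) -> dict:
    """Collect the annotated spans as property/value pairs for reuse."""
    return {note.prop_name: source[note.start:note.end] for note in notes}


facts = extract_facts(text, annotations)
print(facts)
```

The narrative stays as the author wrote it, while the extracted pairs can feed other contexts that need only the facts.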

Over time, more editorial components are becoming formalized as structured data defined by metadata vocabulary standards. As different publishers face similar needs and borrow from each other’s approaches, an element in the content model can become a design pattern that’s widely used, and therefore a candidate for standardization. For example, simple how-to instructions can be specified using metadata standards.

The Layered Cake

How domain models can support content models

One simple way to think about the two models is as layers of a cake. The domain model is the base layer. It manages the factual information that’s needed by the content and by machines for applications. The content model is the layer above the domain model. It manages all the relevant content assets (thumbnails, video trailers, diagrams, etc.), all the sections of copy (introductions, call outs, quotes, sidebars, etc.) and all the messaging (button text, alternative headlines, etc.). On the top of these layers is the icing on the cake: the presentation layer. The presentation layer is not about the raw ingredients, or how the ingredients are cooked. It’s about how the finished product looks.
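The three layers might be sketched as follows. This is a simplified illustration under stated assumptions, not a recommended architecture: the in-memory dictionaries stand in for whatever stores a publisher actually uses, and the identifiers (event-1, promo-1), the helper functions, and the sample values are all hypothetical.

```python
from html import escape

# Base layer (domain model): factual information, keyed by entity id.
domain = {
    "event-1": {
        "@type": "Event",
        "name": "Spring Jazz Festival",
        "startDate": "2030-04-12",
        "location": "Riverside Hall",
    }
}

# Middle layer (content model): expressive elements that point at an
# entity by id instead of copying its facts.
content = {
    "promo-1": {
        "about": "event-1",
        "headline": "A weekend of live jazz",
        "button_text": "Get tickets",
    }
}


def render_card(promo_id: str) -> str:
    """Icing (presentation): combine both layers into one HTML card."""
    promo = content[promo_id]
    facts = domain[promo["about"]]
    return (
        '<article class="card">'
        f"<h2>{escape(promo['headline'])}</h2>"
        f"<p>{escape(facts['name'])}, {escape(facts['startDate'])}, "
        f"{escape(facts['location'])}</p>"
        f"<button>{escape(promo['button_text'])}</button>"
        "</article>"
    )


def calendar_entry(event_id: str) -> dict:
    """A non-editorial consumer reads the base layer directly."""
    facts = domain[event_id]
    return {"title": facts["name"], "date": facts["startDate"]}


print(render_card("promo-1"))
print(calendar_entry("event-1"))
```

Changing the headline or the button text touches only the content layer. The calendar entry keeps reading the same facts from the base layer.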

The distinctions I’ve made between the domain model and content model may not align with how your content management systems are set up.  But such decoupling of data and content is becoming more common.   If factual information is kept separate from expressive content, publishers can gain more flexibility when configuring how they deliver content and information to audiences.

— Michael Andrews