Categories
Content Engineering

Where does Content Structuring Happen?

Many useful approaches are available to support the structuring of content.  It’s important to understand their differences, and how they complement each other.  I want to consider content structuring in terms of a spectrum of approaches that address different priorities. 

Only a few years ago most discussion about structuring content focused on the desirability of doing it.  Now we are now seeing more written about how to do it, spawning discussion around topics such as content models, design patterns, templates, message hierarchies, vocabulary lists, and other approaches.  All these approaches contribute to structuring content.  Structuring content involves a combination of human decisions, design methods, and automated systems. 

Lately I’ve been thinking about how to unlock the editorial benefits of content models, which are generally considered a technical topic. I realized that discussing this angle could be challenging because I would need to separate general ideas about structuring content from concepts that are specific to content models.  While content models provide content with structure, so do other activities and artifacts.  If the goal is to structure content, what’s unique about content models?   We need to unpack the concept of structuring content to clarify what it means in practice.

Yet structuring content is not the true end goal.  Structuring content is simply a means to an end.  It’s the benefits of structuring content that are the real goal.  The expected benefits include improved consistency, flexibility, scaleability, and efficiency. Ultimately, the goal should be to deliver more unique and distinctive content: tailored and targeted to user needs, and reflecting the brand’s strategic priorities.  

 In reality, content structuring is not a thing.  It’s an umbrella term that covers a range of things that promote benefits, such as content consistency and so on.  Many approaches contribute.  But no single can approach claim “mission accomplished.”  

What are we talking about, exactly?

People with different roles and responsibilities talk about structuring content.  It sometimes seems like they are talking about different surface features of a giant pachyderm.

The Guardian newspaper earlier this month published an article on how they use “a content model to create structured content”.  

“Structured content works well for a media company like The Guardian, but the same approach can work for any organization that publishes large amounts of data on multiple platforms. Whether you’re a government department, art gallery, retailer or university.”

The Guardian

I applaud these sentiments, and endorse them enthusiastically.  Helpfully, the article provided tangible examples of how structuring content can make publishing content easier.  But the article unintentionally highlighted how terminology on this topic can be used in different ways.  The article mentions content reuse as a benefit of content structuring.  But the examples related more to republishing finished articles with slight modification, rather than reusing discrete components of content to build new content. When the writer, a solutions architect, refers to a content type, he identifies video as an example.  Most content strategists would consider video a content format, not a content type.  Similarly, when the article illustrates the Guardian’s content model, it looks very limited in its focus (a generic article) — much more like a content type than a full content model.  

Mike Atherton commented on twitter that the article, like many discussions of content structuring, didn’t address distinctions between “presentation structure vs semantic structure, how the two are compatible or, indeed, different, and whether they can or should be captured in the same model.”  

Mike raises a fair point: we often talk about different aspects of structure, without being explicit about what aspect is being addressed.

I think about structure as a spectrum. As yet there’s no Good Housekeeping Seal Of Approval on the one right way to structure content.  Even people who are united in enthusiasm for content structure can diverge in how they discuss it — as the Guardian article shows.  I know other people use different terminology, define the same terminology in different ways, and follow slightly different processes.  That doesn’t imply others are wrong.  It merely suggests that practices are still far from settled.  How an organization uses content structuring will partly depend on the kind of content they publish, and their specific goals.  The Guardian’s approach makes sense for their needs, but may not serve the needs of other publishers.  

For me, it helps to keep the focus on the value of each distinct kind of decision offers.  For those who write simple articles, or write copy for small apps that don’t need to be coordinated with other content, some of these distinctions won’t be as important.  Structure becomes increasingly important for enterprises trying to coordinate different web-related tasks.  The essence of structure is repeatability.  

The Spectrum of content structuring

The structuring of content needs to support different decisions.   

Structure brings greater precision to content. It can influence five dimensions:

  1. How content is presented
  2. What content is presented
  3. Where content is presented
  4. What content is required
  5. What content is available

Some of these issues involve audience-facing aspects, and others involve aspects handled by backend systems.  

Different aspects of content structure

Content doesn’t acquire its structure at one position along this spectrum.  The structuring of content happens in many places.  Each decision on the spectrum has a specific activity or artifact associated with it. The issues addressed by each decision can be inter-related.  But they shouldn’t become entangled, where it is difficult to understand how each influences another.  

UI Design or Interaction Design

UI design is not just visual styling.  Interaction design shapes the experience by structuring micro-tasks and the staging of information. From a content perspective, it’s not so much about surface behaviors such as animated transitions, but how to break up the presentation of content into meaningful moments.  For example, progressive disclosure, which can be done using CSS, both paces the delivery of content and directs attention to specific elements within the content.  Increasingly, UX writers are designing content within the context of a UI design or prototype.  They need understand the cross-dependences between the behavior of content and how it is understood and perceived.  

The design of behavior involves the creation of structure.  Content needs to behave predictably to be understandable.  UI design leverages structure by utilizing design patterns and design libraries.    

Content Design

Content design encompasses the creation and arrangement of different long and short messages into meaningful experiences. It defines what is said.  

Content design is not just about styling words.  It involves all textual and visual elements that influence the understanding and perception of messages, including the interaction between different messages over time and in different scenarios.  Words are central to content design; some professionals involved with content design refer to themselves as UX writers. Terminology is finely tuned and controlled to be consistent, clear, and on-brand. 

Writers commonly break content into blocks of text.  They may use a simple tool like Dropbox paper to provide a “distraction free” view of different text elements that’s unencumbered by the visual design.  It may look a bit like a template (and is sometimes referred to as one), but it’s purpose is to help writers to plan their text, rather than to define how the text is managed.  The design of content relies heavily on the application of implicit structure.  Audiences understand better when they are comfortable knowing what they can expect.  The design may utilize a message hierarchy (identifying major and minor messages), or voice and tone guidelines that depend on the scenario in which the writing appears.  For the most part these implicit structures are managed offline through guidelines for writers, rather than through explicit formal online systems.  But some writers are looking to operationalize these guidelines into more formal design systems that are easier and more reliable to use.  

Content design involves delivering a mix of the fresh and the familiar.  The content that’s fresh, that talks about novel issues or delivers unique or distinctive messages, is unstructured — it doesn’t rely on pre-existing elements.   Messages that are familiar (recycled in some way) have the possibility of becoming structured elements.  Content design thus involves both the creation of elements that will be reused (such as feedback messaging), and ad hoc content that will be specific to given screen.  But even ad hoc elements present the opportunity reuse certain phrases and terminology so that it is consistent with the content’s tone of voice guidelines.   Some publishers are even managing strings of phrases to reuse across different content.

Page Templates

Templates provide organizational structure for the content — for example, prioritizing the order of content, and creating a hierarchy between primary and secondary content.  The template defines the elements will be consistent for any content using the template, in contrast to the interaction design, which defines the elments that will be fluid and will change and respond to users as they consume the content.  

Templates provide slots to fill with content. Page templates specify HTML structure, in contrast to the drafting templates writers use to design specific content elements.   Page templates express organizational structure, such as where an image should be placed, or where a heading is needed. The template doesn’t indicate what each heading says, which will vary according to the specifics of the content.  Templates can sometimes incorporate fixed text elements, such as copyright notice in the footer of the page, if they are specific to that page and are unlikely to change.  The critical role that templates play is that they define what’s fixed about a page that the audience will see.  Templates provides the framework for the layout of the content, allowing other aspects of the content to adjust.  

Layout has a subtle effect on how content is delivered and is accessed across different screens.  Elements that are obvious on some screen sizes may not be so on other screen sizes — for example, a list of related articles, or a cross-promotion.  Page templates must address how to make core information consistently available.  

Content Types

Content types indicate what kinds of information and messages audiences need to see to satisfy their goals.  The more specific the audience goal, the most specific the content type is likely to be. For example, many websites have an “article” content type that has only a few basic attributes, such as title, author and body.  Such types aren’t associated with any specific goal.  But a product profile on an e-commerce website will be much more specific, since different elements are important to satisfying user needs for them to decide to buy the product.  The more specific a content type, the more similar each screen of content based on it will seem, even though the specific messages and information will vary. Content types provide consistency in the kinds of information presented for a given scenario.

Content types are designed for a specific audience who has a specific goal. It specifies: to support this purpose, this information must be presented.  It answers: what elements of content needs to be delivered here for this scenario?  One of the benefits of a content type is that it can provide options to show more details, fewer details, or different details, according to the audience and scenario. 

Content types also encode business rules about the display of content. In doing so, they provide the logical structure of content.   If the content model already has defined the specifics of required information, it can pre-populate the information — enabling the reuse of content elements.  

Content Models

Content models indicate the elements of content that are available to support different audiences and scenarios.  They specify the specific kinds of messages the publisher has planned to use across different content.  They specify the semantic structure of the content — or put more simply, how different content elements are related to each other in their meaning.

Content is built from various kinds of messages associated with different topics and having different roles, such as extended descriptions, instructions, calls-to-action, value propositions, admonitions, and illustrations.  The content model provides a overview of the different kinds of essential messages that are available to build different versions and variations of content.  

In some respects, a content model is analogous to a site map.  A site map provides external audiences and systems a picture of the content published on a website.  A content model provides a map of the internal content resources that are available for publication.  But instead of representing a tree of web pages like a site map, the content model presents constellation of  “nodes”  that indicate available information resources.  A node is a basic unit of content that part of and connected to the larger structure of content.  They correspond to a content elements within published content — the units of content described within a pair of HTML tags.

Each node in a content model represents a distinct unit of content covering a discrete message or statement of information. Nodes are connected to other nodes elsewhere.  A node may be empty (authors can supply any message provided it relates to the expected meaning), or a node may be pre-populated with one or more values (indicating that the meaning will have a certain predefined message).  

Content models connect nodes by identifying the relationships between them —  how one element relates to another.  It can show how different nodes are associated, such as what role one node has to another.  For example, one node could be part of another node because is a detail relating to a larger topic.  The relationships provide pathways between different nodes of content.  

Content models are more abstract than other approaches to structuring content, and can therefore be open to wider interpretation about what they do.  The content model represents perhaps the deepest level of content structure, capturing all reusable and variable content elements. 

No single model, template or design system

No single representation of content structure can effectively depict all its different aspects.  I haven’t seen any single view representation that supports the different kinds of design decisions required.  For example, wireframes mix together fixed structures defined by templates with dynamic structures associated with UI design.  When content is embedded within screen comps, it is hard to see which elements are fixed and which are fluid.  Single views promote a tunnel focus on a specific decision, but block visibility into larger considerations that may be involved.  I’ve seen various attempts to improve wireframes to make them more interactive and content-friendly, but the basic limitations remain.

Consider a simple content element: an alert that tells a customer that their subscription is expiring and that they need to submit new payment details.  UI design needs to consider how the alert is delivered where it is noticed but not annoying.  Content design needs to decide on whether to use an existing alert, or write a new one.  The template must decide where within a  page or screen the alert appears.  The content type will specify the rules triggering delivery of the alert: who gets it, and when. And the content model may hold variations of the alert, and their mappings to different content types that use them.  You need a better alert, but what do you need to change?  What should stay the same, so you don’t mess up other things you’ve worked hard to get right?

Such decisions require coordination; different people may be responsible for different aspects. Not only must decisions and tasks be coordinated across people, they must be coordinated across time.  Those involved need to be aware of past decisions, easily reuse these when appropriate, and be able to modify them when not.  Agility is important, but so is governance.

A benefit of content structure is that it can accelerate the creation and delivery of content.  The challenge of content structure is that it’s not one thing.  There are different approaches, and each has its own value to offer.   Web publishers have more tools than ever to solve specific problems. But they still need truly integrated platforms that help web teams coordinate different kinds of decisions relating to specifying and choosing content elements. 

— Michael Andrews

Categories
Content Engineering

Tailless Content Management

There’s a approach to content management that is being used, but doesn’t seem to have a name.  Because it lacks a name, it doesn’t get much attention.  I’m calling this approach tailless content management — in contrast to headless content management.  The tailless approach, and the headless approach, are trying to solve different problems.

What Headless Doesn’t Do

Discussion of content management these days is dominated by headless CMSs.   A crop of new companies offer headless solutions, and legacy CMS vendors are also singing the praises of headless.  Sitecore says: “Headless CMSs mean marketers and developers can build amazing content today, and—importantly—future-proof their content operation to deliver consistently great content everywhere.”  

In simple terms, a headless CMS strips functionality relating to how web pages are presented and delivered to audiences.  It’s supposed to let publishers focus on what the content says, rather than what it looks like when delivered.  Headless CMS is one of several trends to unbundle functionality customarily associated with CMSs.  Another trend is moving the authoring and workflow functionality into a separate application that is friendlier to use.  CMS vendors have long touted that their products can do everything needed to manage the publication of content.  But increasingly content authors and designers are deciding that vendor choices are restricting, rather than helpful.  CMSs have been too greedy in making decisions about how content gets managed.  

“Future-proof” headless CMSs may seem like the final chapter in the evolution of the CMS.  But even headless CMSs can still be very rigid in how they handle content elements.  Many are based on the same technology stack (LAMP) that’s obliquely been causing problems for publishers over the past two decades.   In nearly every CMS, all audience-facing factual information needs to be described as a field that’s attached to a specific content type.  The CMS may allow some degree of content structuring, and the ability to mix different fragments of content in different ways.  But they don’t solve important problems that complex publishers face: the ability to select and optimize alternative content-variables, to use data-variables across different content, and to create dynamic content-variables incorporating data-variables.   To my mind, those three dimensions are the foundation for what a general-purpose approach to content engineering must offer.  Headless solutions relegate the CMS to being an administrative interface for the content.  The CMS is a destination to enter text.  But it often does a poor job supporting editorial decisions, and giving publishers true flexibility.   The CMS design imposes restrictions on how content is constructed.  

Since the CMS no longer worries about the “head”, headless solutions help publishers focus on the body.  But the solution doesn’t help publishers deal with a neglected aspect: the content’s tail.

Content’s ‘Tail’

Humans are one of the few animals without tails.  Perhaps that’s why we don’t tend to talk about the tail as it relates to content.  We sometimes talk about the “long tail” of information people are looking for.  That’s about as close as most discussions get to considering the granular details that appear within content. The long tail is a statistical metaphor, not a zoological one.  

Let’s think about content management as having three aspects: the head at the top (and which is top of mind for most content creators), the body in the middle (which has received more attention lately), and the tail at the end, which few people think much about. 

The head/body distinction in content is well-established.  The metaphor needs to be extended to include the notion of a tail.  Let’s breakdown the metaphor:

  • The head — is the face of the content, as presented to audiences.  
  • The body — are the organs (components) of the content.  Like the components of the human body (heart, lungs, stomach, etc.) the components within the body of content each should have a particular function to play.
  • The tail — are the details in the content (mnemonic: deTails).  The tail provides stability, keeping the body in balance

 In animals, tails play an important role negotiating with their surroundings.  Tails offer balance.  They swat flies.  They can grab branches to steady oneself.  Tails help the body adjust to the environment.  To do this, tails need to be flexible. 

Details can be the most important part of content, just as the tails of some animals are main event. In a park a kilometer from my home in central India, I can watch dozens of peacocks, India’s national bird.  Peacocks show us that tails are not minor details.

When the tail is treated as a secondary aspect of the body, its role gets diminished.  Publishers need to treat data as being just as important as content in the body.  Content management needs to consider both customer-facing data and narrative content as distinct but equally important dimensions.  Data should not be a mere appendage to content. Data has value in its own right.  

With tailless content management, customer-facing data is stored separately from the content using the data.  

The Body and the Details

The distinction between content and data, and between the body and the detail, can be hard to grasp.  The architecture of most CMSs doesn’t make this distinction, so the difference doesn’t seem to exist.  

CMSs typically structure content around database fields.   Each field has a label and an associated value.  Everything that the CMS application needs to know gets stored in this database.  This model emerged when developers realized that HTML pages had regular features and structures, such as having titles and so on. Databases made managing repetitive elements much easier compared to creating each HTML page individually.

The problem is that a single database is trying to many different things at once.  It can be:

  • Holding long “rich” texts that are in the body of an article
  • Holding many internally-used administrative details relating to articles, such as who last revised an article
  • Holding certain audience-facing data, such as the membership services contact telephone number and dates for events

These fields have different roles, and look and behave differently.  Throwing them together in a single database creates complexity.  Because of the complexity, developers are reluctant to add additional structure to how content is managed.  Authors and publishers are told they need to be flexible about what they want, because the central relational database can’t be flexible.  What the CMS offers should be good enough for most people.  After all, all CMSs look and behave the same, so it’s inevitable that content management works this way.

Something perverse happens in this arrangement.  Instead of the publisher structuring the content so it will meet the publisher’s needs, the CMS’s design ends up making decisions about if and how content can be structured.

Most CMSs are attached to a relational database such as mySQL.  These databases are a “kitchen sink” holding any material that the CMS may need to perform its tasks.  

To a CMS, everything is a field.  They don’t distinguish between long text fields that contain paragraphs or narrative content that has limited reuse (such as a teaser or the article body) from data fields with simple values that are relevant across different content items and even outside of the content.  CMSs mix narrative content, administrative data, and editorial data all together. 

A CMS database holds administrative profile information related to each content item (IDs, creation dates, topic tags, etc). The same database is also storing other non-customer facing information that’s more generally administrative such as roles and permission.   In addition to the narrative content and the administrative profile information, the CMS stores customer-facing data that’s not necessarily linked to specific content items. This is information about entities such as products, addresses of offices, event schedules and other details that can be used in many different content items.  Even though entity-focused data can be useful for many kinds of content, these details are often fields of specific content types.  

The design of CMSs reflects various assumptions and priorities.  While everything is a field, some fields are more important than others.  CMSs are optimized to store text, not to store data.  The backend uses a relational database, but it mostly serves as a content repository. 

Everyday Problems with the Status Quo

Content discusses entities.  Those entities involve facts, which are data.  These facts should be described with metadata, though they frequently are not.

A longstanding problem publishers face is that important facts are trapped within paragraphs of content that they create and publish.  When the facts change, they are forced to manually revise all the writing that mentions these facts.  Structuring content into chunks does not solve the problem of making changes within sentences.  Often, factual information is mentioned within unique texts written by various authors, rather than within a single module that is centrally managed.  

Most CMSs don’t support the ability to change information about an entity so that all paragraphs will update that information. 

Let’s consider an example of a scenario that can be anticipated ahead of time.  A number of paragraphs in different content items mention an application deadline date.  The procedure for applying stays the same every year, but the exact date by which someone must apply will change each year.  The application deadline is mentioned by different writers in different kinds of content: various announcement pages, blog posts, reminder emails, etc. In most CMSs today, the author will need to update each unique paragraph where the application is mentioned.  They don’t have the ability to update each mention of the application date from one place.   

Other facts can change, even if not predictably.  Your community organization has for years staged important events in the Jubilee Auditorium at your headquarters.  Lots of content talks about the Jubilee Auditorium.  But suddenly a rich donor has decided to give your organization some money.  To honor the donation, your organization decides to rename Jubilee Auditorium to the Ronald L Plutocrat Auditorium.  After the excitement dies down, you realize that more than the auditorium plaque needs to change.  All kinds of mentions of the auditorium are scattered throughout your online content.  

These examples are inspired by real-life publishing situations.   

Separating Concerns: Data and Content

Contrary to the view of some developers, I believe that content and data are different things, and need to be separated.

Content is more like computer code than it’s like data.  Like computer code, content is about language and expression.  Data is easy to compare and aggregate.  Its values are tidy and predictable.  Content is difficult to compare: it must be diff’d.  Content can’t easily be aggregated, since most items of content are unique.

Each chunk of content is code that will be read by a browser.  The body must indicate what text gets emphasis, what text has links, and what text is a list.  Content is not like data generally stored in databases. It is unpredictable. It doesn’t evaluate to standard data types. Within a database, content can look like a messy glob that happens to have a field name attached to it.

The scripts that a CMS uses must manipulate this messy glob by evaluating each letter character-by-character.  All kinds of meaning are embedded within a content chunk, and some it is hard to access.  

The notion that content is just another form of data that can be stored and managed in a relational database with other data is the original sin of content management.  

It’s considered good practice for developers to separate their data from their code.  Developers though have a habit of co-mingling the two, which is why new software releases can be difficult to upgrade, and why moving between software applications is hard to do.

The inventor of the World Wide Web, Tim Berners-Lee, has lately been talking about the importance of separating data from code, “turning the way the web works upside-down.”  He says: “It’s about separating the apps from the data.”

In a similar vain, content management needs to separate data from content.  

Data Needs Independence

We need to fix the problem with the design of most CMSs, where the tail of data is fused together to the spine of the body.  This makes the tail inflexible.  The tail is dragged along with the body, instead of wagging on its own.  

Data needs to become independent of specific content, so that it can be used flexibly.  Customer-facing data needs to be stored separately from the content that customers view.  There are many reasons why this is a good practice.   And the good news is it’s been done already.

Separating factual data from content is not a new concept.  Many large e-commerce websites have a separate database with all their product details that populates templates that are handled by a CMS.  But this kind of use of specialized backend databases is limited in what it seeks to achieve.  The external database may serve a single purpose: to populate tables within templates.  Because most publishers don’t see themselves as data-driven publishers the way big ecommerce platforms are, they may not see the value of having a separate dedicated backend database.  

Fortunately there’s a newer paradigm for storing data that is much more valuable.  What’s different in the new vision is that data is defined as entity-based information, described with metadata standards.  

The most familiar example of how an independent data store works with content is Wikipedia.  The content we view on Wikipedia is updated by data stored in a separate repository called Wikidata.  The relationship between Wikipedia and Wikidata is bidirectional.  Articles mention factual information, which gets included in Wikidata.  Other articles that mention the same information can draw on the Wikidata to populate the information within articles.

Facts are generally identified with a QID.  The identifier Q95 represents Google.  Google is a data variable.  Depending on the context, Google can be referred to by Google Inc. (as a joint-stock company until 2017) Or Google LLC (as a limited liability company beginning in 2017).  As a data value, the company name can adjust over time.  Editors can also change the value when appropriate.  Google became a subsidiary of Alphabet Inc. (Q20800404) in 2015.  Some content, such as relating to financial performance will address that entity starting in 2015.  Like many entities, companies change names and statuses over time.

How Wikipedia accesses Wikidata. Source: Wikidata

As an independent store of data, Wikidata supports a wide variety of articles, not just one content type.  But its value extends beyond its support for Wikipedia articles.  Wikidata is used by many other third party platforms to supply information.  These include Google, Amazon’s Alexa, and the websites of various museums.

While few publishers operate of the scale of Wikipedia, the benefits of separating data from content can be realized on a small scale as well.  An example is offered by the popular static website generator package called Jekyll, which is used by Github, Shopify, and other publishers.  A plug in for Jekyll lets publishers store their data in the RDF format — a standard that offers significant flexibility.  The data can be inserted into web content, but is a format where it can also be available for access by other platforms. 

Making the Tail Flexible

Data needs to be used within different types of content, and across different channels — including channels not directly controlled by the publisher.

The CMS-centric approach, tethered to a relational database, tries to solve these issues by using APIs.  Unfortunately, headless CMS vendors have interpreted the mantra of “create once, publish everywhere” to mean “enter all your digital information in our system, and the world will come to you, because we offer an API.”  

Audiences need to know simple facts, such as what’s the telephone number for member services, in the case of a membership organization.  They may need to see that information within an article discussing a topic, or they may want to ask Google to tell them while they are making online payments.  Such data doesn’t fit into comfortably into a specific structured content type.  It’s too granular.  One could put it into a larger contact details content type, but that would include lots of other information that’s not immediately relevant.  Chunks of content, unlike data, are difficult to reuse in different scenarios.  Content types, by design, are aligned with specific kinds of scenarios. But defined content structures used to build content types are clumsy supporting general purpose queries or cross-functional uses.    And it wouldn’t help much to make the phone number into an API request.  No ordinary publisher can expect the many third party platforms to read through their API documentation in the event that someone asks their voice bot service about a telephone number.

The only scaleable and flexible way to make data available is to use metadata standards that third party platforms understand.  When using metadata standards, special a API isn’t necessary.  

An independent data store (unlike a tethered database) offers two distinct advantages:

1. The data is multi-use, for both published content and to support other platforms (Google, voice bots, etc.)

2.  The data is multi-source, coming from authors who create/add new data, from other IT systems, and even from outside sources

The ability of the data store to accept new data is also important.  Publishers should grow their data so that they can offer factual information accurately and immediately, wherever it is needed.  When authors mention new facts relating to entities, this information can be added to the database.   In some cases authors will note what’s new and important to include, much like webmasters can note metadata relating to content using Google’s Data Highlighter tool.  In other cases, tools using natural language processing can spot entities, and automatically add metadata.  Metadata provides the mechanism by which data gets connected to content. 

Metadata makes it easier to revise information that’s subject to change, especially information such as prices, dates, and availability.  The latest data is stored in the database, and gets updated there.  Content that mentions such information can indicate the variable abstractly, instead of using a changeable value.  For example: “You must apply by {application date}.”  As a general rule, CMSs don’t make using data variables an easy thing to do.

A separate data store makes it simpler to pull data coming from other sources.  The data store describes information using metadata standards, making is easy to upload information from different sources.  With many CMSs, it is cumbersome to pull information from outside parties.  The CMS is like a bubble.  Everything may work fine as long you as you never want to leave the bubble.  That’s true for simple CMSs such as WordPress, and for even complex component CMSs (CCMSs) that support DITA.  These hosts are self-contained.  They don’t readily accept information from outside sources.  The information needs to be entered into their special format, using their specific conventions.  The information is not independent of the CMS.  The CMS ends of defining the information, rather than simply using it.

A growing number of companies are developing enterprise knowledge graphs — their own sort of Wikidata. These are databases of the key facts that a company needs to refer to.  Companies can use knowledge graphs to enhance the content they publish.  This innovation is possible because these companies don’t rely on their CMS to manage their data.

— Michael Andrews