Should Information be Data-Rich or Content-Rich?

One of the most challenging issues in online publishing is how to strike the right balance between content and data.  Publishers of online information, as a matter of habit, tend to favor either a content-centric or a data-centric approach.  Publishers may hold deep-seated beliefs about what form of information is most valuable.  Some believe that compelling stories will wow audiences. Others expect that new artificial intelligence agents, providing instantaneous answers, will delight them. This emphasis on information delivery can overshadow consideration of what audiences really need to know and do. How information is delivered can get in the way of what the audience needs. Instead of delight, audiences experience apathy and frustration. The information fails to deliver the right balance between facts and explanation.

The Cultural Divide

Information on the web can take different forms. Perhaps the most fundamental difference is whether online information provides a data-rich or content-rich experience. Each form of experience has its champions, who promote the virtues of data (or content).  Some go further, and dismiss the value of the approach they don’t favor, arguing that content (or data) actually gets in the way of what users want to know.

  • A (arguing for data-richness): Customers don’t want to read all that text!  They just want the facts.  
  • B (arguing for content-richness): Just showing facts and figures will lull customers to sleep!

Which is more important, offering content or data?  Do users want explanations and interpretations, or do they just want the cold hard facts?  Perhaps it depends on the situation, you think.  Think of a situation where people need information.  Do they want to read an explanation and get advice, or do they want a quick unambiguous answer that doesn’t involve reading (or listening to a talking head)?  The scenario you have in mind, and how you imagine people’s needs in that scenario, probably reveals something about your own preferences and values.  Do you like to compare data when making decisions, or do you like to consider commentary?  Do your own PowerPoint slides show words and images, or do they show numbers and graphs? Did you study a content-centric discipline such as the humanities in university, or did you study a data-centric one such as commerce or engineering? What are your own definitions of what’s helpful or boring?

Our attitudes toward content and data reflect how we value different forms of information.  Some people favor more generalized and interpreted information, and others prefer specific and concrete information.  Different people structure information in different ways, through stories for example, or by using clearly defined criteria to evaluate and categorize information.  These differences may exist within your target audience, just as they may show up within the web team trying to deliver the right information to that audience.  People vary in their preferences. Individuals may shift their personal  preferences depending on topic or situation.  What form of information audiences will find most helpful can elude simple explanations.

Content and data have an awkward relationship. Each seems to involve a distinct mode of understanding.  Each can seem to interrupt the message of the other. When relying on a single mode of information, publishers risk either over-communicating, or under-communicating.

Content and Data in Silhouette

To keep things simple (and avoid conceptual hairsplitting), let’s think about data as any values that are described with an attribute.  We can consider data as facts about something.  Data can be any kind of fact about a thing; it doesn’t need to be a number. Whether text or numeric, data are values that can be counted.

Content can involve many distinct types, but for simplicity, we’ll consider content as articles and videos — containers  where words and images combine to express ideas, stories, instructions, and arguments.

Both data and content can inform.  Content has the power to persuade, and sometimes data possesses that power as well.  So what is the essential difference between them?  Each has distinct limitations.

The Limits of Content

In certain situations content can get in the way of solving user problems.  Many times people are in a hurry, and want to get a fact as quickly as possible.  Presenting data directly to audiences doesn’t always mean people get their questions answered instantly, of course.  Some databases are lousy at answering questions for ordinary people who don’t use databases often.  But a growing range of applications now provide “instant answers” to user queries by relying on data and computational power.  Whereas content is a linear experience, requiring time to read, view or listen, data promises an instant experience that can gratify immediately.  After all, who wants to waste their customers’ time?  Content strategy has long advocated solving audience problems as quickly as possible.  Can data obviate the need for linear content?

“When you think about something and don’t really know much about it, you will automatically get information.  Eventually you’ll have an implant where if you think about a fact, it will just tell you the answer.”  Google’s Larry Page, in Steven Levy’s  “In the Plex”.

The argument that users don’t need websites (and their content) is advanced by SEO expert Aaron Bradley in his article “Zero Blue Links: Search After Traffic”.   Aaron asks us to “imagine a world in which there was still an internet, but no websites. A world in which you could still look for and find information, but not by clicking from web page to web page in a browser.”

Aaron notes that within Google search results, increasingly it is “data that’s being provided, rather than a document summary.”  Audiences can see a list of product specs, rather than a few sentences that discuss those specs. He sees this as the future of how audiences will access information on different devices.  “Users of search engines will increasingly be the owners of smart phones and smart watches and smart automobiles and smart TVs, and will come to expect seamless, connected, data-rich internet experiences that have nothing whatsoever to do with making website visits.”

In Aaron’s view, we are seeing a movement from “documents to data” on the web. “The evolution of search results in terms of the gradual supplanting of document references by data than it is to infer that direction through the enumeration of individual features.”  No need to read a document: search results will answer the question.  It’s an appealing notion, and one that is becoming more commonplace.  Content isn’t always necessary if clear, unambiguous data is available that can answer the question.

Google, or any search engine, is just a channel — an important one for sure, but not the end-all and be-all.  Search engines locate information created by others, but unless they have rights to that information, they are limited in what they do with it. Yet the principles here can apply to other kinds of interactive apps, channels and platforms that let users get information instantly, without wading through articles or videos.  So is content now obsolete?

There is an important limitation to considering SEO search results as data.  Even though the SEO community refers to search metadata as “structured data”, the use of this term is highly misleading.  The values described by the metadata aren’t true data that can be counted.  They are values to display, or are links to other values.  The problem with structured data as currently practiced is that it doesn’t enforce how the values need to be described.  The structured data values are never validated, so computers can’t be sure if two prices appearing on two random websites are both quoting the same currency, even if both mention dollars.  SEO structured data rarely requires controlled vocabulary for text values, and most of its values don’t include or mandate the data typing that computers would need to aggregate and compare different values.  Publishers are free to use almost any kind of text value they like in many situations.   The reality of SEO structured data is less glamorous than its image: much of the information described by SEO structured data is display content for humans to read, rather than data for machines to transform.  The customers who scan Google’s search results are people, not machines.  People still need to evaluate the information, and decide its credibility and relevance.  The values aren’t precise and reliable enough for computers to make such judgements.
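The difference between a display value and a typed value can be illustrated with a minimal Python sketch. The `Price` class, the `cheaper` helper, the currency codes, and the sample amounts are all hypothetical; nothing here comes from schema.org or any search engine. The sketch only shows that once a price carries an explicit type and currency, a machine can compare it safely or decline to compare it.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Price:
    """A typed price: a decimal amount plus an explicit currency code (hypothetical model)."""
    amount: Decimal
    currency: str  # assumed to be a code such as "USD" or "CAD"


def cheaper(a: Price, b: Price) -> Price:
    """Return the lower price, refusing to compare across currencies."""
    if a.currency != b.currency:
        raise ValueError(f"cannot compare {a.currency} with {b.currency}")
    return a if a.amount <= b.amount else b


# Illustrative values only: two sites might both display "$5".
us = Price(Decimal("5.00"), "USD")
ca = Price(Decimal("5.00"), "CAD")
us_sale = Price(Decimal("4.50"), "USD")

print(cheaper(us, us_sale))  # same currency, so the comparison is safe

try:
    cheaper(us, ca)  # both are "dollars", but not the same dollars
except ValueError as err:
    print("refused:", err)
```

With only the text “$5” to go on, no such check is possible, which is why a person has to do the evaluating.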

When an individual wants to know what time a shop closes, it’s a no-brainer to provide exactly that information, and no more. The strongest case for presenting data directly is when the user already knows exactly what they want to know, and they will understand the meaning and significance of the data shown.  These are the “known unknowns” (or “knowns but forgotten”) use cases.  Plenty of such cases exist.  But while the lure of instant gratification is strong, people aren’t always in a rush to get answers, and in many cases they shouldn’t be in a rush, because the question is bigger than a single answer can address.

The Limits of Data

Data in various circumstances can get in the way of what interests audiences.  At a time when the corporate world increasingly extols the virtues of data, it’s important to recognize when data can be useless, because it doesn’t answer questions that audiences have.  Publishers should identify when data is oversold as always being what audiences want.  Unless data reflects audiences’ priorities, the data is junk as far as audiences are concerned.

Data can bring credibility to content, though it has the potential to confuse and mislead as well.  Audiences can be blinded by data when it is hard to comprehend, or is too voluminous. Audiences need to be interested in the data for it to provide them with value.  Much of the initial enthusiasm for data journalism, the idea of writing stories based on the detailed analysis of facts and statistics, has receded.  Some stories have been of high quality, but many weren’t intrinsically interesting to large numbers of viewers.  Audiences didn’t necessarily see themselves in the minutiae, or feel compelled to interact with the raw material being offered to them.  Data journalism stories are different from commercially oriented information, which has well-defined use cases specifying how people will interact with data.  Data journalism can presume people will be interested in topics simply because public data on these topics is available.  However, this data may have been collected for a different purpose, often for technical specialists.  Presenting it doesn’t transform it into something interesting to audiences.

The experience of data journalism shows that not all data is intrinsically interesting or useful to audiences.  But some technologists believe that making endless volumes of data available is intrinsically worthwhile, because machines have the power to unlock value from the data that can’t be anticipated.

The notion that “data is God” has fueled the development of the semantic web approach, which has subsequently been  rebranded as “linked data”.  The semantic web has promised many things, including giving audiences direct access to information without the extraneous baggage of content.  It even promised to make audiences irrelevant in many cases, by handing over data to machines to act on, so that audiences don’t even need to view that data.  In its extreme articulation, the semantic web/linked data vision considers content as irrelevant, and even audiences as irrelevant.

These ideas, while still alive and championed by their supporters, have largely failed to live up to expectations.  There are many reasons for this failure, but a key one has been that proponents of linked data have failed to articulate its value to publishers and audiences. The goal of linked data always seems to be to feed more data to the machine.  Linked data discussions get trapped in the mechanics of what’s best for machines (dereferenceable URIs, machine values that have no intrinsic meaning to humans), instead of what’s useful for people.

The emergence of schema.org (the structured data standard used in SEO) represents a step back from such machine-centric thinking, to accommodate at least some of the needs of human metadata creators by allowing text values. But schema.org still doesn’t offer much in the way of controlled vocabularies for values, which would be both machine-reliable and human-friendly.  It only offers a narrow list of specialized “enumerations”, some of which are not easy-to-read text values.

Schema.org has lots of potential, but its current capabilities get over-hyped by some in the SEO community.  Just as schema.org metadata should not be considered structured data, it is not really the semantic web either.  It’s unable to make inferences, which was a key promise of the semantic web.  Its limitations show why content remains important. Google’s answer to the problem of how to make structured data relevant to people was the rich snippet.  Rich snippets displayed in Google search results are essentially a vanity statement. Sometimes these snippets answer the question, but other times they simply tease the user with related information.  Publishers and audiences alike may enjoy seeing an extract of content in search results, and certainly rich snippets are a positive development in search. But displaying extracts of information does not represent an achievement of the power of data.  A list of answers supplied by rich snippets is far less definitive than a list of answers supplied by a conventional structured query database — an approach that has been around for over three decades.

The value of data comes from its capacity to aggregate, manipulate and compare information relating to many items.  Data can be impactful when arranged and processed in ways that change an audience’s perception and understanding of a topic. Genuine data provides values that can be counted and transformed, something that schema.org doesn’t support very robustly, as previously mentioned.  Google’s snippets, when parsing metadata values from articles, simply display fragments  from individual items of content.  A list of snippets doesn’t really federate information from multiple sources into a unified, consolidated answer.  If you ask Google what store sells the cheapest milk in your city, Google can’t directly answer that question, because that information is not available as data that can be compared.  Information retrieval (locating information) is not the same as data processing (consolidating information).
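The distinction between information retrieval and data processing can be sketched in a few lines of Python. The store names, snippets, and prices are invented for illustration, and `retrieve` and `cheapest_per_litre` are hypothetical helpers rather than anything a search engine exposes. Retrieval returns fragments to read through; processing normalizes comparable values and returns one consolidated answer.

```python
from decimal import Decimal

# Hypothetical snippets: text fragments as they might appear in search results.
snippets = [
    "Corner Market: fresh milk from $1.29!",
    "Valley Grocer - Milk, 2L, on sale this week",
]

# Hypothetical structured records: same currency, explicit quantities.
offers = [
    {"store": "Corner Market", "product": "milk", "litres": Decimal("1"), "price": Decimal("1.29")},
    {"store": "Valley Grocer", "product": "milk", "litres": Decimal("2"), "price": Decimal("2.10")},
    {"store": "City Foods", "product": "milk", "litres": Decimal("1"), "price": Decimal("1.15")},
]


def retrieve(fragments, term):
    """Information retrieval: return the fragments that mention the term."""
    return [f for f in fragments if term.lower() in f.lower()]


def cheapest_per_litre(records, product):
    """Data processing: normalize to price per litre, then pick the minimum."""
    matching = [r for r in records if r["product"] == product]
    return min(matching, key=lambda r: r["price"] / r["litres"])


print(retrieve(snippets, "milk"))                   # a list a person must still read
print(cheapest_per_litre(offers, "milk")["store"])  # one consolidated answer
```

The second function works only because every record states its quantity and price as values that can be compared, which is exactly what a list of snippets lacks.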

“What is the point of all that data? A large data set is a product like any other. It must be maintained and updated, given attention. What are we to make of it?”  Paul Ford in “Usable Data”

But let’s assume that we do have solid data that machines can process without difficulty.  Can that data provide audiences with what they need?  Is content unnecessary when the data is machine quality?  Some evidence suggests that even the highest quality linked data isn’t sufficient to interest audiences.

The museum sector has been interested in linked data for many years.  Unlike most web publishers, museums haven’t been guided by schema.org and Google.  They’ve been developing their own metadata standards.  Yet this project has had its problems.  The data lead of a well-known art museum complained recently of the “fetishization of Linked Open Data (LOD)”.  Many museums approached data as something intrinsically valuable, without thinking through who would use the data, and why.  Museums reasoned that they have lots of great content (their collections), that they needed to provide information about their collections online to everyone, and that linked data was the way to do that.  But the author notes: ‘“I can’t wait to see what people do with our data” is not a clear ROI.’  When data is considered as the goal, instead of as a means to a goal, then audiences get left out of the picture.  This situation is common to many linked data projects, where getting data into a linked data structure becomes an all-consuming end, without anchoring the project in audience and business needs.  For linked data to be useful, it needs to address specific use cases for people relying on the data.

Much magical thinking about linked data involves two assumptions: that the data will answer burning questions audiences have, and these answers will be sufficient to make explanatory content unnecessary.  When combined, these assumptions become one: everything you could possibly want to know is now available as a knowledge graph.

The promise that data can answer any question is animating development of knowledge graphs and “intelligent assistants” by nearly every big tech company: Google, Bing, LinkedIn, Apple, Facebook, etc.  This latest wave of data enthusiasm again raises questions whether content is becoming less relevant.

Knowledge graphs are a special form of linked data.  Instead of the data living in many places, hosted by many different publishers, the data is consolidated into a single source curated by one firm, for example, Bing. A knowledge graph combines millions of facts about all kinds of things into a single data set. A knowledge graph creator generally relies on other publishers’ linked data. But it assumes responsibility for validating that data itself when incorporating the information in its knowledge graph.  In principle, the information is more reliable, both factually and technically.

Knowledge graphs work best for persistent data (the birth year of a celebrity) but less well for high velocity data that can change frequently (the humidity right now).   Knowledge graphs can be incredibly powerful.  They can allow people to find connections between pieces of data that might not seem related, but are.  Sometimes these connections are simply fun trivia (two famous people born in the same hospital on the same day). Other times these connections are significant as actionable information.  Because knowledge graphs hold so much potential, it is often difficult to know how they can be used effectively.   Many knowledge graph use cases relate to open ended exploration, instead of specific tasks that solve well defined user problems.   Few people can offer a succinct, universally relevant reply to the question: “What problem does a knowledge graph solve?” Most of the success I’ve seen for knowledge graphs has been in specialized vertical applications aimed at researchers, such as biomedical research or financial fraud investigations.  To be useful to general audiences, knowledge graphs require editorial decisions that queue up on-topic questions, and return information relevant to audience needs and interests.  Knowledge graphs are less useful when they simply provide a dump of information that’s related to a topic.
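The kind of connection described above, two people born in the same hospital on the same day, can be sketched with a toy graph in Python. The triples, the predicate names `born_in` and `born_on`, and the `shared_birth` function are hypothetical and far simpler than any production knowledge graph. The sketch stores facts as subject, predicate, object triples and groups subjects that share both values.

```python
from collections import defaultdict

# Hypothetical facts stored as (subject, predicate, object) triples.
triples = [
    ("Person A", "born_in", "General Hospital"),
    ("Person A", "born_on", "1970-01-01"),
    ("Person B", "born_in", "General Hospital"),
    ("Person B", "born_on", "1970-01-01"),
    ("Person C", "born_in", "General Hospital"),
    ("Person C", "born_on", "1982-06-15"),
]


def shared_birth(facts):
    """Group subjects by (birthplace, birth date); keep groups of two or more."""
    profile = defaultdict(dict)
    for subject, predicate, obj in facts:
        profile[subject][predicate] = obj

    groups = defaultdict(list)
    for subject, attrs in profile.items():
        key = (attrs.get("born_in"), attrs.get("born_on"))
        if None not in key:  # skip subjects missing either fact
            groups[key].append(subject)

    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


print(shared_birth(triples))
```

Note that the query had to be chosen in advance. The graph holds the facts, but someone still decides which connections are worth surfacing, which is the editorial decision discussed above.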

Knowledge graphs combine aspects of Wikipedia (the crowdsourcing of data) with aspects of a proprietary gatekeeping platform such as Facebook (the centralized control of access to and prioritization of information).  No one party can be expected to develop all the data needed in a knowledge graph, yet one party needs to own the graph to make it work consistently — something that doesn’t always happen with linked data.   The host of the knowledge graph enjoys a privileged position: others must supply data, but have no guarantee of what they receive in return.

Under this arrangement, suppliers of data to a knowledge graph can’t calculate their ROI. Publishers are back in the situation where they must take a leap of faith that they’ll benefit from their effort.  Publishers are asked to supply data to a service on the basis of a vague promise that the service will provide their customers with helpful answers.  Exactly how the service will use the data is often not transparent. Knowledge graphs don’t reveal what data gets used, and when.   Publishers also know their rivals are supplying data to the same graph.  The faith-based approach to developing data, in hopes that it will be used, has a poor track record.

The context of data retrieved from a knowledge graph may not be clear.  Google, Siri, Cortana, or Alexa may provide an answer.  But on what basis do they make that judgment?  The need for context to understand the meaning of data leads us back to content.   What a fact means may not be self-evident. Even facts that seem straightforward can depend on qualified definitions.

“A dataset precise enough for one purpose may not be sufficiently precise for another. Data on the Web may be wrong, or wrong in some context—with or without intent.” Bernstein, Hendler & Noy

The interaction between content and data is becoming even more consequential as the tech industry promotes services incorporating artificial intelligence.  In his book Free Speech, Timothy Garton Ash shared his experience using WolframAlpha, a semantic AI platform that competes with IBM Watson, and that boldly claims to make the “world’s knowledge computable.”  When Ash asked WolframAlpha “How free should speech be?”, it replied: “WolframAlpha doesn’t understand your query.”   This kind of result is entirely expected, but it is worth exploring why something billed as being smart fails to understand.  Conversational interfaces, after all, are promising to answer our questions.  Data needs to exist for questions to get answers.  For data to operate independently of content, an answer must be expressible as data. But many answers can’t be reduced to one or two values.  Sometimes they involve many values.  Sometimes answers can’t be expressed as a data value at all. This reality means that content will always be necessary for some answers.

Data as a Bridge to Content

Data and content have different temperaments.  The role of content is often to lead the audience to reveal what’s interesting.  The role of data is frequently to follow the audience as they indicate their interests. Content and data play complementary roles.  Each can be incomplete without the other.

Content, whether articles, video or audio, is typically linear.  Content is meant to be consumed in a prescribed order.   Stories have beginnings and ends, and procedures normally have fixed sequences of steps.  Hyperlinking content provides a partial solution to making a content experience less linear, when that is desired.  Linear experiences can be helpful when audiences need orientation, but they are constraining when such orientation isn’t necessary.

Data, to be seen, must first be selected. Publishers must select what data to highlight, or they must delegate that task to the audience. Data is non-linear: it can be approached in any order.  It can be highly interactive, providing audiences with the ability to navigate and explore the information in any order, and change the focus of the information.  With that freedom comes the possibility that audiences get lost, unable to identify information of value.  What data means is highly dependent on the audience’s previous understanding.  Data can be explained with other data, but even these explanations require prior  knowledge.

From an audience perspective, data plays various roles.  Sometimes data is an answer, and the end of a task.  Sometimes data is the start of a larger activity.  Data is sometimes a signal that a topic should be looked at more closely.  Few people decide to see a movie based on an average rating alone.  A high rating might prompt someone to read about the film.  Or the person may already be interested in reading about the film, and consults the average rating simply to confirm their own expectation of whether they’d like it.  Data can be an entryway into a topic, and a point of comparison for audiences.

Writers can undervalue data because they want to start with the story they wish to tell, rather than the question or fact that prompts initial interest from the audience.   Audiences often begin exploration by seeking out a fact. But what that fact may be will be different according to each individual.  Content needs facts to be discovered.

Data evangelists can undervalue content because they focus on the simple use cases, and ignore the messier ones.  Data can answer questions only in some situations.  In an ideal world, questions and answers get paired together as data. Just match the right data with the right question.  But audiences may find it difficult to articulate the right question, or they may not know what question to ask. Audiences may find they need to ask many specific questions to develop a broad understanding.  They may find the process of asking questions exhausting.  Search engines and intelligent agents aren’t going to Socratically enlighten us about new or unfamiliar topics.  Content is needed.

Ultimately, whether data or content is most important depends on how much communication is needed to support the audience.  Data supplies answers, but doesn’t communicate ideas.  Content communicates ideas, but can fail to answer if it lacks specific details (data) that audiences expect.

No bold line divides data from content.  Even basic information, such as expressing how to do something, can be approached either episodically as content, or atomically as data.  Publishers can present the minimal facts necessary to perform a task (the must do’s), or they can provide a story about possibilities of tasks to do (the may do’s).  How should they make that decision?

In my experience, publishers rarely create two radically alternative versions of online information, a data-centric and content-centric version, and test these against each other to see which better meets audience needs.  Such an approach could help publishers understand what the balance between content and data needs to be.  It could help them understand how much communication is required, so the information they provide is never in the way of the audience’s goals.

— Michael Andrews

Content & Decisions: A Unified Framework

Many organizations face a chasm between what they say they want to do, and what they are doing in practice.  Many say they want to transition toward digital strategy.  In practice, most still rely on measuring the performance of individual web pages, using the same basic approach that’s been around for donkey’s years. They have trouble linking the performance of their digital operations to their high level goals. They are missing a unified framework that would let them evaluate the relationship between content and decisions.

Why is a Unified Framework important?

Organizations, when tracking how well they are doing, tend to focus on web pages: abandonment rates, clicks, conversions, email opening rates, likes, views, and so on. Such granular measurements don’t reveal the bigger picture of how content is performing within the publishing organization. Even multi-page measurements such as funnels are little more than an arbitrary linking of discrete web pages.

Tracking the performance of specific web pages is necessary, but not sufficient. Because each page is potentially unique, summary metrics of different pages don’t explain variations in performance.   Page-level metrics tell how specific pages perform, but they don’t address important variables that transcend different pages, such as which content themes are popular, or which design features are being adopted.

Explaining how content fits into digital business strategy is a bit like trying to describe an elephant without being able to see the entire animal. Various people within an organization focus on different digital metrics. How all these metrics interact gets murky.  Operational staff commonly track lower level variables about specific elements or items. Executives track metrics that represent higher level activities and events, which have resource and revenue implications that don’t correspond to specific web pages.

Metadata can play an important role connecting information about various activities and events, and transcend the limitations of page-level metrics.  But first, organizations need a unified framework to see the bigger picture of how their digital strategy relates to their customers.
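As a rough illustration of how metadata can lift analysis above the page level, the following Python sketch groups page metrics by a content theme. The URLs, theme labels, and numbers are invented, and `conversion_rate_by_theme` is a hypothetical helper, not a feature of any analytics product. It assumes each page record already carries a `theme` value assigned through metadata.

```python
from collections import defaultdict

# Hypothetical page-level metrics, each tagged with a content theme via metadata.
pages = [
    {"url": "/guides/setup", "theme": "onboarding", "views": 1200, "conversions": 60},
    {"url": "/guides/import", "theme": "onboarding", "views": 800, "conversions": 20},
    {"url": "/blog/pricing-update", "theme": "pricing", "views": 500, "conversions": 50},
]


def conversion_rate_by_theme(records):
    """Sum views and conversions per theme, then compute each theme's rate."""
    totals = defaultdict(lambda: {"views": 0, "conversions": 0})
    for record in records:
        totals[record["theme"]]["views"] += record["views"]
        totals[record["theme"]]["conversions"] += record["conversions"]
    # Themes with zero views are left out to avoid dividing by zero.
    return {
        theme: t["conversions"] / t["views"]
        for theme, t in totals.items()
        if t["views"]
    }


print(conversion_rate_by_theme(pages))
```

The per-page numbers are unchanged; the theme label is what lets them be compared as a group rather than one page at a time.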

Layers of Activities and Decisions

To reveal how content relates to other decisions, we need to examine content at different layers. Think of these layers as a stack. One layer consists of the organization publishing content.  Another layer comprises the customers of the organization, the users of the organization’s content and products.  At the center is the digital interface, where organizations interact with their users.

We also need to identify how content interacts with other kinds of decisions within each layer.  Content always plays a supporting role.  The challenge is to measure how good a job it is doing supporting the goals of various actors.

Diagram showing relationships between organizations, their digital assets, and users/customers, and the interaction between content and platforms.

First let’s consider what’s happening within the organization that is publishing content.  The organization makes business decisions that define what the business sells to its customers, and how it services its customers.  Content needs to support these decisions.  The content strategy needs to support the business strategy.  As a practical matter, this means that the overall publishing activity (initiatives, goals, resources) needs to reflect the important business decisions that executives have made about what to emphasize and accomplish.  For example, publishing activity would reflect marketing priorities, or branding goals.  Conversely, an outsider could view the totality of an organization’s content, by viewing their website, and should get a sense of what’s important to that organization.  Publishing activity reveals an organization’s brand and priorities.

The middle layer is composed of assets that the organization has created for their customers to use.  This layer has two sides: the stock of content that’s available, and digital platforms customers access.  The stock of content reflects the organization’s publishing activity.  The digital platforms reflect the organization’s business decisions.  Digital platforms are increasingly an extension of the products and services the organization offers.  Customers need to access the digital platforms to buy the product or service, to use the product or service, and to resolve any problems after purchase.  Content provides the communications that customers need to access the platform.  Because of this relationship, the creation of content assets and the designs for digital platforms are commonly coordinated during their implementation.

Within the user layer, the customer accesses content and platforms.  They choose what content to view, and make decisions about how to buy, use, and maintain various products and services.  The relationship between content activity and user decisions is vital, and will be discussed shortly.  But its importance should not overshadow the influence of the other layers.  The user layer should not be considered in isolation from other decisions and activities that an organization has made.

Feedback Loops Between and Within Layers

Let’s consider how the layers interact.  Each layer has a content dimension, and a platform dimension, at opposite ends.  Content dimensions interact with each other within feedback loops, as do platform dimensions.  The content and platform dimensions ultimately directly interact with each other in a feedback loop within the user layer.

On the content side, the first feedback loop, the publishing operations loop, relates to how publishing activity affects the stock of content.  The organization decides the broad direction of its publishing. For many organizations, this direction is notional, but more sophisticated organizations will use structured planning to align their stock of content with the comprehensive goals they’ve set for the content overall.  This planning involves not only the creation of new content, but the revision of the existing stock of content to reflect changes in branding, marketing, or service themes.   The stock of content evolves as the direction of overall publishing activity changes.  At the same time, the stock of content reflects back on the orientation of publishing activity.  Some content is created or adjusted outside of a formal plan.  Such organic changes may be triggered in response to signals indicating how customers are using existing content. Publishers can compare their plans, goals, and activities, with the inventory of content that’s available.

The second content feedback loop, the content utilization loop, concerns how audiences are using content.  Given a stock of content available, publishers must decide what content to prioritize.  They make choices concerning how to promote content (such as where to position links to items), and how to deliver content (such as which platforms to make available for customers to access information).  At the same time, audiences are making their own choices about what content to consume.  These choices collectively suggest preferences for certain kinds of content within the available stock.

When organizations consider the interaction between the two loops of feedback, they can see the connection between overall publishing activity, and content usage activity.  Is the content the organization wants to publish the content that audiences want to view?

Two feedback loops are at work on the platform side as well.  The first, the business operations loop, concerns how organizations define and measure goals for their digital platforms.  Product managers will have specific goals, reflecting larger business priorities, and these goals get embodied in digital platforms for customers to access.  Product metrics on how customers access the platform provide feedback for adjusting goals, and inform the architectural design of platforms to realize those goals.

The second platform loop, the design optimization loop, concerns how the details of platform designs are adjusted.  Designs may be composed of different reusable web components, each of which could be tied to specific business goals.  A design might, for example, feature a chatbot that provides cost savings or a new revenue opportunity. The design optimization loop might look at how to improve the utilization of that chatbot functionality.  How users adopt that functionality will influence the optimization (iterative evolution) of its design. The architectural decision to introduce a chatbot, in contrast, would have happened within the business operations loop.

As with the content side, the two feedback loops on the platform side can be linked, so that the relationship between business decisions and user decisions is clearer.  User decisions may prompt minor changes within the design optimization loop, or if significant, potentially larger changes within the business operations loop.  Like content, a digital platform is an asset that requires continual refinement to satisfy both user and business goals.

The two parallel sides, content and design, meet at the user layer.  User decisions are shaped both by the design of the platforms they are accessing and by the content they are consuming while on the platform.  Users need to know what they can do, and want to do it.  Designs need to support users’ access to the content they need when making a decision. That content needs to provide users with the knowledge and confidence for their decision.

The relationship between content and design can sometimes seem obvious when looking at a web page.  But in cases where content and design don’t support each other, web pages aren’t necessarily the right structure to fix problems.  User experiences can span time and devices.  Some pages will be more about content, and other pages more about functionality. Relevant content and functionality won’t always appear together.  Both content and designs are frequently composed from reusable components.  Many web pages may suffer from common problems stemming from faulty components, or the wrong mix of components. The assets (content and functionality) available to customers may be determined by upstream decisions that can’t be fixed on a page level. Organizations need ways to understand larger patterns of user behavior, to see how content and designs support each other, or fail to.

Better Feedback

Content and design interact across many layers of activities and decisions. Organizations must first decide what digital assets to create and offer customers, and then must refine these so that they work well for users.  Organizations need more precise and comprehensible feedback on how their customers access information and services.  The content and designs that customers access are often composed from reusable components that appear in different contexts. In such cases, page-level metrics are not sufficient to provide situational insights.  Organizations need usage feedback that can be considered at the strategic layer.  They need the ability to evaluate global patterns of use to identify broad areas to change.
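
One way to picture the difference between page-level and component-level feedback is a small roll-up. This is only a sketch under assumptions: the page paths, component names, and view counts below are invented, and real analytics data would need far more than raw view counts.

```python
from collections import Counter

# Hypothetical page-level metrics: views per page.
PAGE_VIEWS = {
    "/pricing": 1200,
    "/help/returns": 300,
    "/product/router": 900,
}

# Hypothetical metadata: which reusable components appear on each page.
PAGE_COMPONENTS = {
    "/pricing": ["price-table", "chatbot"],
    "/help/returns": ["returns-policy", "chatbot"],
    "/product/router": ["price-table", "specs"],
}


def views_by_component(page_views, page_components):
    """Roll page-level views up to the components that appear on those pages."""
    totals = Counter()
    for page, views in page_views.items():
        for component in page_components.get(page, []):
            totals[component] += views
    return totals


component_views = views_by_component(PAGE_VIEWS, PAGE_COMPONENTS)
for component, views in component_views.most_common():
    print(component, views)
```

The same component can look minor on any single page yet account for a large share of total exposure, which is the kind of pattern a page-by-page report tends to hide.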

In a future post, I will draw on this framework to return to the topic of how descriptive, structural, technical and administrative metadata can help organizations develop deeper insights into the performance of both their content and their designs.  If you are not already familiar with these types of metadata, I invite you to learn about them in my recent book, Metadata Basics for Web Content, available on Amazon.

— Michael Andrews

Landscape of Content Variation

Publishers understandably want to leverage what they’ve already produced when creating new content.  They need to decide how to best manage and deliver new content that’s related to — but different from — existing content. To create different versions of content, they have three options, which I will refer to as the template-based, compositional, and elastic approaches.

To understand how the three approaches differ, it is useful to consider a critical distinction: how content is expressed, as distinct from the details the content addresses.

When creating new content, publishers face a choice of what existing material to use again, and what to change.  Should they change the expression of existing content, or the details of that content?  The answer will depend on whether they are seeking to amplify an existing core message, or to extend the message to cover additional material.  That core message straddles expression (how something is said) and details (specifics), which is one reason both these aspects, the style and the substance, get lumped together into a generic idea of “content”.  Telling an author to simply “change the content” does not indicate whether to change the connotation or the denotation of the content.  Authors need more clarity on the goal of the change.

Content variation results from the interaction of the two dimensions:

  1. The content expression (the approach of written prose or other manifestations such as video).
  2. The details (facts and concrete information).

Both expression and details can vary.  Publishers can change both the expression and the details of content, or they can focus on just one of the dimensions.

The interplay of content expression and details can explain a broad range of content variation.  Content management professionals commonly explain content variation by referring to a more limited concept: content structure —  the inclusion and arrangement of chunk-size components or sections.  Content structure does influence content variation in many cases, but not in all cases. Expressive variation can result when content is made up of different structural components.  Variation in detail can take place within a common structural component.   But rearranging content structure is not the only, or even necessarily the preferred, way to manage content variation.  Much content lacks formal structure, even though the content follows distinguishable variations that are planned and managed.

The expression of content (for example, the wording used) can be either fixed (static, consistent or definitive) or fluid (changeable or adaptable).  A fixed expression is present when all content sounds alike, even if the particulars of the content are different.  As an example, a “form” email is a fixed expression, where the only variation is whether the email is addressed to Jack or to Jill.  When the expression of content is fluid,  in contrast, the same basic content can exist in many forms.  For example, an anecdote could be expressed as a written short story, as a dramatized video clip, or as a comic book.

Details in content can also be either fixed, or they can vary.  Some details are fixed, such as when all webpages include the same contact details.  Other content is entirely about the variation of the details.  For example, tables often look similar (their expression is fixed), though their details vary considerably.

Diagram showing how both expression and details in content can vary (revised).  NB: elastic content can also fluidly address a diverse range of details, but its unique power comes from its ability to express the same fixed details different ways.

Now let’s look at three approaches for varying content.  Only one relies on leveraging structures within content, while the other two exist without using structure.

Template-based content has a fixed expression.  Think of a form letter, where details are merged into a fixed body of text.  With template-based content, the details vary, and are frequently what’s most significant about the content.   Template-based content resembles a “mad libs” style of writing, where the basic sentence structure is already in place, and only certain blanks get filled in with information.  Much of the automated writing referred to as robo-journalism relies on templates.  The Associated Press will, for example, feed variables into a template to generate thousands of canned sports and financial earnings reports.  Needless to say, the rigid, fixed expression of template-based writing rates low on the creativity scale.  On the other hand, fixed expression is valuable when even subtle changes in wording might cause problems, such as in legal disclaimers.
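
A minimal sketch of the template-based approach, using Python's standard `string.Template`. The wording, field names, and figures are invented for illustration; they are not the Associated Press's actual templates or data.

```python
from string import Template

# Fixed expression: the wording never changes, only the blanks do.
# The sentence and its field names are hypothetical.
REPORT_TEMPLATE = Template(
    "$company reported quarterly revenue of $revenue million dollars, "
    "compared with $prior_revenue million dollars a year earlier."
)


def render_report(details):
    """Merge one record of details into the fixed wording."""
    return REPORT_TEMPLATE.substitute(details)


# Hypothetical feed of details.
records = [
    {"company": "Acme Corp", "revenue": "120", "prior_revenue": "95"},
    {"company": "Globex", "revenue": "80", "prior_revenue": "88"},
]

reports = [render_report(record) for record in records]
for report in reports:
    print(report)
```

Both reports read identically apart from the filled-in values. The template cannot remark that the second company's revenue fell, because the expression is fixed no matter what the details say.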

Compositional content relies on structural components.  It is composed of different components that are fixed, relying on a process known as transclusion.  These components may include informational variables, but most often do not.  The expression of the content will vary according to which components are selected and included in the delivered content.  Compositional content allows some degree of customization, to reflect variations in interests and detail desired.  Content composed from different components can offer both expressive variation and consistency in content to some degree, though there is ultimately an intrinsic tradeoff between those goals.  Generally, the biggest limitation of compositional content is its narrow range of variation.  Compositional variation increases complexity, which tends to prioritize creating consistency in content instead of variation.  Compositional content can’t generate novel variation, since it must rely on existing structures to create new variants.
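
Transclusion can be sketched as assembling a deliverable from references to shared components. The component IDs, text, and variant names below are hypothetical, and a real component content management system would handle this with far more machinery.

```python
# Hypothetical library of fixed, reusable components, keyed by ID.
COMPONENTS = {
    "intro": "Our router sets up in minutes.",
    "specs": "It supports dual-band Wi-Fi and four Ethernet ports.",
    "warranty": "Every unit carries a two-year warranty.",
    "support": "Contact support at any time through your account page.",
}

# Each variant holds only references to components, never its own text.
VARIANTS = {
    "product_page": ["intro", "specs", "warranty"],
    "quick_start": ["intro", "support"],
}


def compose(variant):
    """Assemble a variant by including each referenced component in order."""
    return "\n".join(COMPONENTS[component_id] for component_id in VARIANTS[variant])


print(compose("product_page"))
print()
print(compose("quick_start"))
```

Editing the `intro` component changes every variant that references it, which is where the consistency comes from. A variant can only recombine what already exists in `COMPONENTS`, which is why this approach cannot produce novel expression.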

Elastic content is content that can be expressed in a multitude of ways.  With elastic content, the core informational details stay constant, but how these details are expressed will change. None of the content is fixed, except for the details.  In fact, so much variation in expression is possible that publishers may not notice how they can reuse existing informational details in new contexts.  Elastic content can even morph in form, by changing media.

Authors tend to repeat facts in content they create.  They may want to keep mentioning the performance characteristic of a product, or an award that it has won. Such proof points may appeal to the rational mind, but don’t by themselves stimulate  much interest.  To engage the reader’s imagination, the author creates various stories and narratives that can illustrate or reinforce facts they want to convey.  Each narrative is a different expression, but the core facts stay constant.  Authors rely on this tactic frequently, but sometimes unconsciously.  They don’t track how many separate narratives draw on the same facts. They can’t tell if a story failed to engage audiences because its expression was dull, or because the factual premise accompanying the narrative had become tired, and needs changing.  When authors track these informational details with metadata, they can monitor which stories mention which facts, and are in a better position to understand the relationships between content details and expression.
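
Tracking which stories mention which facts amounts to building an index from fact to story. The sketch below assumes each story carries a list of fact identifiers as metadata; the story names and fact IDs are made up for illustration.

```python
from collections import defaultdict

# Hypothetical metadata: each story lists the IDs of the facts it draws on.
STORIES = {
    "customer-road-trip": ["battery-life-12h", "design-award"],
    "field-engineer-profile": ["battery-life-12h"],
    "behind-the-design": ["design-award"],
    "holiday-gift-guide": ["battery-life-12h"],
}


def stories_by_fact(stories):
    """Invert the story-to-facts metadata into a fact-to-stories index."""
    index = defaultdict(list)
    for story_id, fact_ids in stories.items():
        for fact_id in fact_ids:
            index[fact_id].append(story_id)
    return dict(index)


fact_index = stories_by_fact(STORIES)
for fact_id, story_ids in fact_index.items():
    print(f"{fact_id}: used in {len(story_ids)} stories")
```

With an index like this, an author can check whether a weak story shares a heavily reused fact with other weak stories, or whether the fact performs well elsewhere and only this expression fell flat.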

Machines can generate elastic content as well.   When information details are defined by metadata, machines can use the metadata to express the details in various ways.  Consider content indicating the location of a store or an event.  The same information, captured as a geo-coordinate value in metadata, can be expressed multiple ways.  It can be expressed as a text address, or as a map.  The information can also be augmented, by showing a photo of the location, or with a list of related venues that are close by.  The metadata allows the content to become versatile.
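
As a rough sketch of the location example, the same stored details can be rendered in several ways. The venue, its coordinates, and its address are invented. The sketch also assumes the street address is stored as metadata alongside the coordinates, since deriving an address from coordinates would require a geocoding service. The `geo:` form follows the URI scheme defined in RFC 5870, which mapping applications can open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Venue:
    """Location details captured once as metadata (all values hypothetical)."""
    name: str
    latitude: float
    longitude: float
    street_address: str


def as_text(venue):
    """Express the location as a line of text."""
    return f"{venue.name}, {venue.street_address}"


def as_geo_uri(venue):
    """Express the location as a geo: URI that a map application can open."""
    return f"geo:{venue.latitude},{venue.longitude}"


def as_coordinate_label(venue):
    """Express the location as human-readable coordinates."""
    north_south = "N" if venue.latitude >= 0 else "S"
    east_west = "E" if venue.longitude >= 0 else "W"
    return (
        f"{abs(venue.latitude):.4f}° {north_south}, "
        f"{abs(venue.longitude):.4f}° {east_west}"
    )


store = Venue("Example Store", 40.7128, -74.006, "1 Example Street")
print(as_text(store))
print(as_geo_uri(store))
print(as_coordinate_label(store))
```

The details in `store` never change. Only the function chosen to express them does, which is the sense in which the content is elastic.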

As real-time information becomes more important in the workplace, individuals are discovering they want that information in different ways.  Some people want spreadsheet-like tools they can use to process and refine the raw alphanumeric values.  Others want data summarized in graphic dashboards.  And a growing number want the numbers and facts translated into narrative reports that highlight, in sentences, what is significant about the information.  Companies are now offering software that assesses information, contextualizes it, and writes narratives discussing the information.  In contrast to the fill-in-the-blank feeding of values in a template, this content is not fixed.  The content relies on metadata (rather than a blind feed as used in templates); the description changes according to the information involved.  The details of the information influence how the software creates the narrative.   By capturing key information as metadata, publishers have the ability to amplify how they express that information in content.  Readers can choose the medium through which they access the information.
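
To make the contrast with templates concrete, here is a toy, rule-based sketch in which the wording itself depends on the values. The thresholds and phrasing are arbitrary assumptions, and commercial narrative-generation software is far more sophisticated than this.

```python
def describe_change(metric, current, previous):
    """Choose the wording from the data instead of filling fixed blanks."""
    if previous == 0:
        return f"{metric} stands at {current:g}; there is no earlier figure to compare."
    change = (current - previous) / previous * 100
    if abs(change) < 1:  # arbitrary threshold for "no meaningful change"
        return f"{metric} held steady at {current:g}."
    direction = "rose" if change > 0 else "fell"
    intensity = "sharply " if abs(change) >= 20 else ""  # arbitrary threshold
    return f"{metric} {direction} {intensity}by {abs(change):.0f}% to {current:g}."


print(describe_change("Weekly sign-ups", 150, 100))
print(describe_change("Weekly sign-ups", 95, 100))
print(describe_change("Weekly sign-ups", 100, 100))
```

Unlike a form letter, the three calls produce differently structured sentences, because what is significant about the numbers decides what gets said.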

The next frontier in elastic content will be conversational interfaces, where natural language generation software will use informational details described with metadata, to generate a range of expressive statements on topics.  The success of conversational interfaces will depend on the ability of machines to break free from robotic, canned, template-based speech, and toward more spontaneous and natural sounding language that adapts to the context.

Weighing Options

How can publishers leverage existing content, so they don’t have to start from scratch?  They need to understand which dimensions of their content might change.  They also need to be realistic about what future needs can be anticipated and planned for.  Sometimes publishers over-estimate how much of their content will stay consistent, because they don’t anticipate the circumstantial need for variation.

Information details that don’t change often, or may be needed in the future, should be characterized with metadata.  In contrast, frequently changing and ephemeral details could be handled by a feed.

Standardized communications lend themselves to templates, while communications that require customization lend themselves to compositional approaches using different structural components.  Any approach that relies on a fixed expression of content can be rendered ineffective when the essence of the communication needs to change.

The most flexible and responsive content, with the greatest creative possibilities, is elastic content that draws on a well-described body of facts.  Publishers will want to consider how they can reuse information and facts to compose new content that will engage audiences.

— Michael Andrews