Category Archives: Content Engineering

Landscape of Content Variation

Publishers understandably want to leverage what they’ve already produced when creating new content.  They need to decide how to best manage and deliver new content that’s related to — but different from — existing content. To create different versions of content, they have three options, which I will refer to as the template-based, compositional, and elastic approaches.

To understand how the three approaches differ, it is useful to consider a critical distinction: how content is expressed, as distinct from the details the content addresses.

When creating new content, publishers face a choice of what existing material to use again, and what to change.  Should they change the expression of existing content, or the details of that content?  The answer will depend on whether they are seeking to amplify an existing core message, or to extend the message to cover additional material.  That core message straddles expression (how something is said) and details (the specifics), which is one reason both aspects, the style and the substance, get lumped together into a generic idea of “content”.  Telling an author to simply “change the content” does not indicate whether to change the connotation or the denotation of the content.  Authors need more clarity on the goal of the change.

Content variation results from the interaction of the two dimensions:

  1. The content expression (the style of written prose, or other manifestations such as video)
  2. The details (facts and concrete information).

Both expression and details can vary.  Publishers can change both the expression and the details of content, or they can focus on just one of the dimensions.

The interplay of content expression and details can explain a broad range of content variation.  Content management professionals commonly explain content variation by referring to a more limited concept: content structure —  the inclusion and arrangement of chunk-size components or sections.  Content structure does influence content variation in many cases, but not in all cases. Expressive variation can result when content is made up of different structural components.  Variation in detail can take place within a common structural component.   But rearranging content structure is not the only, or even necessarily the preferred, way to manage content variation.  Much content lacks formal structure, even though the content follows distinguishable variations that are planned and managed.

The expression of content (for example, the wording used) can be either fixed (static, consistent or definitive) or fluid (changeable or adaptable).  A fixed expression is present when all content sounds alike, even if the particulars of the content are different.  As an example, a “form” email is a fixed expression, where the only variation is whether the email is addressed to Jack or to Jill.  When the expression of content is fluid,  in contrast, the same basic content can exist in many forms.  For example, an anecdote could be expressed as a written short story, as a dramatized video clip, or as a comic book.

Details in content can also be either fixed, or they can vary.  Some details are fixed, such as when all webpages include the same contact details.  Other content is entirely about the variation of the details.  For example, tables often look similar (their expression is fixed), though their details vary considerably.

Diagram showing how both expression and details in content can vary (revised).  NB: elastic content can also fluidly address a diverse range of details, but its unique power comes from its ability to express the same fixed details different ways.

Now let’s look at three approaches for varying content.  Only one relies on leveraging structures within content, while the other two exist without using structure.

Template-based content has a fixed expression.  Think of a form letter, where details are merged into a fixed body of text.  With template-based content, the details vary, and are frequently what’s most significant about the content.  Template-based content resembles a Mad Libs style of writing, where the basic sentence structure is already in place, and only certain blanks get filled in with information.  Much of the automated writing referred to as robo-journalism relies on templates.  The Associated Press, for example, feeds variables into templates to generate thousands of canned sports and financial earnings reports.  Needless to say, the rigid, fixed expression of template-based writing rates low on the creativity scale.  On the other hand, fixed expression is valuable when even subtle changes in wording might cause problems, such as in legal disclaimers.
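As a minimal sketch of the form-letter idea, the example below uses Python’s standard `string.Template`.  The template text and the field names (`name`, `order_id`, `ship_date`) are hypothetical.  The point is structural: the wording never changes, and only the merged details do.

```python
from string import Template

# Hypothetical form letter: the expression is fixed; only the
# $placeholders (the details) change from one letter to the next.
FORM_LETTER = Template(
    "Dear $name,\n"
    "Your order $order_id will ship on $ship_date.\n"
    "Thank you for your business."
)

def render_letter(details: dict) -> str:
    """Merge one set of details into the fixed expression."""
    # substitute() raises KeyError when a detail is missing, so no
    # letter goes out with an unfilled blank.
    return FORM_LETTER.substitute(details)

if __name__ == "__main__":
    for person in ("Jack", "Jill"):
        print(render_letter({"name": person, "order_id": "A-100", "ship_date": "Monday"}))
```

Whether the letter goes to Jack or to Jill, every sentence reads the same; that sameness is both the strength and the limit of the template approach.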

Compositional content relies on structural components.  It is assembled from different fixed components, through a process known as transclusion.  These components may include informational variables, but most often do not.  The expression of the content varies according to which components are selected and included in the delivered content.  Compositional content allows some degree of customization, to reflect variations in interests and in the level of detail desired.  Content composed from different components can offer both expressive variation and consistency to some degree, though there is ultimately an intrinsic tradeoff between those goals.  Generally the biggest limitation of compositional content is that its range of variation is limited.  Because compositional variation increases complexity, it tends to favor consistency in content over variation.  Compositional content can’t generate novel variation, since it must rely on existing structures to create new variants.
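A minimal sketch of composition by transclusion follows.  The component IDs, their text, and the two variants are hypothetical.  Each variant is only a list of references to fixed components, so variants differ in which components they include, never in the wording inside a component.

```python
# Hypothetical component repository: each component is fixed text that is
# reused verbatim (transcluded) wherever a variant references it.
COMPONENTS = {
    "intro": "Our service helps teams publish content.",
    "pricing_basic": "The basic plan covers a single team.",
    "pricing_enterprise": "The enterprise plan covers an entire organization.",
    "support": "Support is available by email.",
}

# A variant is an ordered list of component IDs.
VARIANTS = {
    "small_business": ["intro", "pricing_basic", "support"],
    "enterprise": ["intro", "pricing_enterprise", "support"],
}

def compose(variant: str) -> str:
    """Assemble a variant by transcluding its components in order."""
    # A KeyError here means a variant references a component that does not
    # exist: composition cannot produce text no component already contains.
    return " ".join(COMPONENTS[cid] for cid in VARIANTS[variant])
```

Both variants open with exactly the same introduction, which illustrates the consistency composition buys, and also why it cannot produce genuinely new phrasing.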

Elastic content is content that can be expressed in a multitude of ways.  With elastic content, the core informational details stay constant, but how these details are expressed will change. None of the content is fixed, except for the details.  In fact, so much variation in expression is possible that publishers may not notice how they can reuse existing informational details in new contexts.  Elastic content can even morph in form, by changing media.

Authors tend to repeat facts in content they create.  They may want to keep mentioning the performance characteristics of a product, or an award that it has won.  Such proof points may appeal to the rational mind, but don’t by themselves stimulate much interest.  To engage the reader’s imagination, the author creates various stories and narratives that can illustrate or reinforce the facts they want to convey.  Each narrative is a different expression, but the core facts stay constant.  Authors rely on this tactic frequently, but sometimes unconsciously.  They don’t track how many separate narratives draw on the same facts.  They can’t tell if a story failed to engage audiences because its expression was dull, or because the factual premise accompanying the narrative had become tired and needed changing.  When authors track these informational details with metadata, they can monitor which stories mention which facts, and are in a better position to understand the relationships between content details and expression.
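One way to picture this tracking is a small index built from story metadata.  The sketch below assumes each story record carries a hypothetical `facts` field listing the IDs of the facts its narrative draws on; inverting that metadata shows how many narratives lean on each fact.

```python
from collections import defaultdict

# Hypothetical story records: each narrative declares, as metadata,
# which core facts it draws on.
STORIES = [
    {"title": "Customer anecdote", "facts": ["fact_award", "fact_speed"]},
    {"title": "Founder interview", "facts": ["fact_speed"]},
    {"title": "Launch recap", "facts": ["fact_award"]},
]

def stories_by_fact(stories):
    """Invert story metadata into a fact ID -> [story titles] index."""
    index = defaultdict(list)
    for story in stories:
        for fact in story["facts"]:
            index[fact].append(story["title"])
    return dict(index)
```

Joined with engagement data, an index like this could help separate a tired fact (many stories, falling interest) from a dull expression (one story underperforming others that share its facts).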

Machines can generate elastic content as well.   When information details are defined by metadata, machines can use the metadata to express the details in various ways.  Consider content indicating the location of a store or an event.  The same information, captured as a geo-coordinate value in metadata, can be expressed multiple ways.  It can be expressed as a text address, or as a map.  The information can also be augmented, by showing a photo of the location, or with a list of related venues that are close by.  The metadata allows the content to become versatile.
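The sketch below shows the elastic principle on a single stored geo-coordinate: the detail is captured once, and separate renderers express it in different ways.  The `GeoPoint` type and the two text styles are illustrative assumptions; turning the same value into a street address or a map would require a geocoding or mapping service, which is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GeoPoint:
    """A location captured once, as metadata, in decimal degrees."""
    lat: float
    lon: float

def as_decimal_text(p: GeoPoint) -> str:
    return f"{p.lat:.4f}, {p.lon:.4f}"

def _dms(value: float, pos: str, neg: str) -> str:
    """Format one coordinate as degrees, minutes, seconds plus hemisphere."""
    hemisphere = pos if value >= 0 else neg
    # Round to whole seconds first so the seconds can never show as 60.
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{degrees}°{minutes}'{seconds}\"{hemisphere}"

def as_dms_text(p: GeoPoint) -> str:
    return f"{_dms(p.lat, 'N', 'S')} {_dms(p.lon, 'E', 'W')}"

# The same fixed detail, many possible expressions.
RENDERERS = {"decimal": as_decimal_text, "dms": as_dms_text}

def express(p: GeoPoint, style: str) -> str:
    return RENDERERS[style](p)
```

Adding a new expression means adding a renderer; the underlying metadata does not change.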

As real-time information becomes more important in the workplace, individuals are discovering they want that information in different ways.  Some people want spreadsheet-like tools they can use to process and refine the raw alphanumeric values.  Others want data summarized in graphic dashboards.  And a growing number want the numbers and facts translated into narrative reports that highlight, in sentences, what is significant about the information.  Companies are now offering software that assesses information, contextualizes it, and writes narratives discussing the information.  In contrast to the fill-in-the-blank feeding of values into a template, this content is not fixed.  The content relies on metadata (rather than a blind feed as used in templates); the description changes according to the information involved.  The details of the information influence how the software creates the narrative.  By capturing key information as metadata, publishers have the ability to amplify how they express that information in content.  Readers can choose the medium through which they access the information.
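The contrast with templates can be sketched with a few rules where the data decides the shape of the sentence.  This is a toy, rule-based illustration under assumed thresholds (1% for “steady”, 20% for “sharply”), not how any commercial narrative-generation product works.

```python
def describe_change(metric: str, previous: float, current: float) -> str:
    """Choose the wording from the data itself, not from a fixed sentence."""
    if previous == 0:
        raise ValueError("previous value must be non-zero to compute a percentage change")
    change = (current - previous) / previous * 100
    # Hypothetical threshold: changes under 1% are reported as no change.
    if abs(change) < 1:
        return f"{metric} held steady at {current:g}."
    direction = "rose" if change > 0 else "fell"
    # Hypothetical threshold: large moves get a differently structured sentence.
    if abs(change) >= 20:
        return f"{metric} {direction} sharply, by {abs(change):.0f}%, to {current:g}."
    return f"{metric} {direction} {abs(change):.1f}% to {current:g}."
```

Unlike a template, the three inputs below yield three different sentence structures, because the values themselves determine what is worth saying.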

The next frontier in elastic content will be conversational interfaces, where natural language generation software will use informational details described with metadata to generate a range of expressive statements on topics.  The success of conversational interfaces will depend on the ability of machines to break free from robotic, canned, template-based speech and move toward more spontaneous, natural-sounding language that adapts to the context.

Weighing Options

How can publishers leverage existing content, so they don’t have to start from scratch?  They need to understand which dimensions of their content might change.  They also need to be realistic about what future needs can be anticipated and planned for.  Sometimes publishers over-estimate how much of their content will stay consistent, because they don’t anticipate the circumstantial need for variation.

Information details that don’t change often, or may be needed in the future, should be characterized with metadata.  In contrast, frequently changing and ephemeral details could be handled by a feed.

Standardized communications lend themselves to templates, while communications that require customization lend themselves to compositional approaches using different structural components.  Any approach that relies on a fixed expression of content can be rendered ineffective when the essence of the communication needs to change.

The most flexible and responsive content, with the greatest creative possibilities, is elastic content that draws on a well-described body of facts.  Publishers will want to consider how they can reuse information and facts to compose new content that will engage audiences.

— Michael Andrews

Content Structure in Tables

Content used in tables requires planning.  Some authors consider tables fussy and unnecessary, and assume readers find them confusing.  Dislike of tables often reflects mistaken ideas about what content in a table is meant to convey.  Many people treat tables as blank cells to fill in as they please.  Such free-form tables are the root cause of many problems in large-scale content publishing operations.  Tables are most effective when their structure is designed to support the meaning of the content they display.

Contrary to popular perception, tables are not really a single content type. Various types of tables exist, each with distinct  content structures.  Unfortunately, the tools authors use to make tables, whether spreadsheets such as Excel or plain old HTML markup, encourage them to think of tables as a blank canvas on which anything can be added.  Just choose the number of columns and rows you want, and a table results.  Tables shouldn’t be considered merely as a display format.  Tables should, where possible, convey the underlying structure of the content.  Structure provides editorial guidance to readers so they can understand the content more clearly, and helps manage how the content within the table is delivered and reused in different contexts.

Over the past five to ten years, data scientists have focused research on how to extract the information embedded in millions of HTML tables published on the web.  Researchers consider the data in HTML tables as “semi-structured”.  These tables frequently follow predictable patterns, but are subject to great inconsistency as well.  Even when the table follows a pattern, the structure is normally implicit rather than explicit.

Tables published as content should have an explicit structure that is clear to readers and to machines alike.  To reach that goal, we need to understand the implicit structure of tables published on the web today.  Many tables follow design patterns that reflect consistent content structures.  These patterns can provide the basis for table templates that are used consistently.

To perceive the implicit structure of content, think about the content as composed of three parts:

  • A subject or topic (which we will refer to as “S”)
  • A property or attribute of the subject (referred to as “P”)
  • An object or value of the property of the subject (“O”)

For example, we can identify the structure of the following statement: “Mary (S) knows (P) Jane (O).”  The property  announces some information about the subject that is revealed by the object of this statement.

Tables allow many statements to be expressed in a compact space.  The S-P-O technique can be applied to tabular information.  Sometimes the information in tables is more complex than the basic S-P-O structure, and some tables (such as a Sudoku puzzle) lack this structure entirely.  Nonetheless, the S-P-O structure can help identify the underlying structure of content in many common types of tables.
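As a minimal sketch of applying S-P-O to tabular content, the function below reads each data cell as one statement.  It assumes a simple table given as a list of column headings plus a list of rows, and a hypothetical `subject_column` parameter saying where the subject sits (usually the left column, but not always).

```python
from typing import NamedTuple

class Statement(NamedTuple):
    subject: str  # S
    prop: str     # P
    obj: str      # O

def table_to_statements(headers, rows, subject_column=0):
    """Read every non-subject cell as a Subject-Property-Object statement.

    Assumes each row's subject is in `subject_column` and every other
    column heading names a property of that subject.
    """
    statements = []
    for row in rows:
        subject = row[subject_column]
        for i, (heading, value) in enumerate(zip(headers, row)):
            if i != subject_column:
                statements.append(Statement(subject, heading, value))
    return statements
```

For example, a one-row table with headings `["Person", "Knows"]` and the row `["Mary", "Jane"]` yields the single statement “Mary (S) knows (P) Jane (O).”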

Let’s consider the content structure commonly found in tables.  No standard taxonomy of table formats seems to exist, so I will offer my own terms to refer to these structures.  Five common kinds of tables are:

  1. Mutual comparison tables
  2. Dimensional tables
  3. Alternative list tables
  4. Spectrum tables
  5. Matrix tables

These five kinds of tables are not the only kinds possible, and some authors or data experts will object that these examples limit options for arranging information.  Yet it is important to simplify and standardize how information is displayed in tables when publishing at enterprise scale.  Knowing widely used and effective patterns for tables provides a basis to develop standardized templates to display tabular information.

Mutual Comparison Tables

The mutual comparison is a very common table type.  It lists a number of items (subjects) that all belong to a common category, and then indicates different properties and values for these items.  Let’s look at some examples, starting with a table of most active stocks.  Each company is a subject, and different properties of the company are identified in the column headings.  The values (or objects) of these properties appear within each row.  All the companies belong to a common category: most active stocks.  It is common for the table heading of a mutual comparison table to refer in some way to the kind of subject listed, and one or more of the key properties associated with that subject.  The table makes two kinds of statements.  First, that the most active stocks include certain companies such as BAC and AMD.  Next, a second kind of statement is made where a company, say BAC, has a price that has a dollar value.  Each company in the table has multiple statements, presented in each column.

source: MSNBC

Let’s consider another mutual comparison table from the website FiveThirtyEight, showing sports team rankings.  While by convention the subjects in a table typically appear in the left column, in this table the subjects (the teams) appear in the third column.  The properties of the teams appear on either side of the team’s name.  The table heading only implicitly indicates what the different subjects in the table share in common: that they all belong to the NFL.

Source: FiveThirtyEight

The archetypal content structure for a mutual comparison table is illustrated below.  The arrows show the relationships between different elements in the table.  The overarching subject, the category that the table discusses, is indicated by S’.  The individual subjects have one or more properties, each property generally having a single value (but not necessarily so).  What the value means depends on the property that describes it, which in turn depends on the subject the property refers to.

Diagram of Mutual Comparison Table
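The two kinds of statements a mutual comparison table makes can be separated explicitly.  In this sketch the category S’, the headings, and the rows are hypothetical, and the values are invented placeholders rather than real market data.

```python
# Hypothetical mutual comparison table; all values are invented.
CATEGORY = "Most active stocks"   # the overarching subject, S'
HEADERS = ["Company", "Price", "Volume"]
ROWS = [
    ["Company X", "10.00", "1000"],
    ["Company Y", "20.00", "500"],
]

def mutual_comparison_statements(category, headers, rows, subject_column=0):
    """Split the table into its two kinds of statements."""
    membership = []  # kind 1: the category includes this subject
    properties = []  # kind 2: this subject has this property with this value
    for row in rows:
        subject = row[subject_column]
        membership.append((category, "includes", subject))
        for i, heading in enumerate(headers):
            if i != subject_column:
                properties.append((subject, heading, row[i]))
    return membership, properties
```

Keeping membership separate makes the table’s first claim (which subjects belong to the category) as explicit as its property values.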

Dimensional Tables

Dimensional tables are similar to mutual comparison tables, except that only one subject is discussed.  They reveal dimensions of a single topic.  Let’s look at a table from Wikipedia about winners of the Booker Prize for literature.  The subject of the table is the Booker Prize.

Source: Wikipedia

The table has rows of information relating to the winner for each year.  Although the table allows sorting by any column, the key column that defines how the content in a row is related is the year.  We can say that the year property is the primary dimension, while other properties such as country (of the author) and genre are secondary dimensions.  Such tables use a primary dimension that varies as the way to structure the content.  Typically the most important property of the subject appears in the left column of the table.  In many cases the primary property is one that must exist in order for the other properties to be present.  The best way to think about dimensional tables is to consider that they involve a chain of statements.  First, we announce that a Booker Prize was awarded in 2016.  If no Booker Prize had been given in 2016 (and some awards skip years if they don’t like the candidates), then none of the other properties relating to a 2016 winner would make sense.  Two kinds of statements are supported by the structure of the table:

  1. A Booker Prize was awarded in 2016.
  2. It was given to Paul Beatty.

Dimensional tables can be applied to qualitative as well as quantitative properties and values.  Here’s an example that’s more conceptual.  The subject of the table is strategic communication.  Again we see that the columns are not of equal importance.  The primary dimension relates to communication function, while other dimensions are structurally dependent on it.  The structure of the table indicates that its author considered communication function, rather than, say, channel, as the key to understanding communication approaches.

Source: MIT Sloan Management Review

The structure of a dimensional table is shown in the diagram below.  The secondary properties and their values depend on the primary property.

Diagram of Dimensional Table
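The chain of statements in a dimensional table can be sketched by keying rows on the primary dimension.  Only the 2016 row below comes from the example above; the dictionary layout and the `statements_for` helper are hypothetical.  Secondary statements are produced only when the primary statement holds.

```python
# One subject (the Booker Prize); rows keyed by the primary dimension, year.
# Secondary properties live under the key and cannot exist without it.
BOOKER_PRIZE = {
    2016: {"winner": "Paul Beatty"},
}

def statements_for(subject, table, key):
    """Return the chain of statements the table supports for one key."""
    if key not in table:
        # No primary statement, so no secondary statement can be made.
        return []
    statements = [f"A {subject} was awarded in {key}."]
    for prop, value in table[key].items():
        statements.append(f"Its {prop} was {value}.")
    return statements
```

Calling `statements_for("Booker Prize", BOOKER_PRIZE, 2016)` yields the two statements listed earlier, while a year with no award yields nothing at all.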

Alternative List Tables

An alternative list table is similar to a dimensional table in that the table refers to one subject only.  It differs in that each property addressed by the table is independent of the others.  This means that the rows presenting values aren’t related.  The following example of an alternative list table relates to spending decisions.  Although the table has no title, the subject of the table is spending.  The subject has two properties, which are alternatives.  The table answers which kinds of purchases are major, and which are minor.  Examples of each are listed under the columns.  This is a common kind of table to display content, and is often used to compare products.

Source: World Bank

The following diagram shows the structure of the content in an alternative list table.

Diagram of Alternative List Table
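Because each property in an alternative list table is independent, the content is best stored as separate lists, with rows created only at display time.  The headings mirror the spending example above, but the listed values are hypothetical placeholders.

```python
from itertools import zip_longest

# One subject (spending), two alternative properties, each with its own
# independent list of example values (placeholders, not real data).
SPENDING = {
    "Major purchases": ["example A", "example B", "example C"],
    "Minor purchases": ["example D"],
}

def render_columns(table):
    """Lay independent lists side by side for display.

    A value's row position carries no meaning, so shorter lists are
    simply padded with blanks.
    """
    headings = list(table)
    rows = [list(cells) for cells in zip_longest(*table.values(), fillvalue="")]
    return headings, rows
```

Storing the lists separately keeps a reader (or a reuse process) from inferring pairings the author never intended, which is exactly the ambiguity discussed next.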

What can make alternative list tables difficult to interpret is that sometimes the creator of the table leaves out information, or implies relationships that may or may not be intended.  Let’s look at another example, from the World Bank.  The table discusses two alternative ways of thinking, automatic and deliberative.  Examples are shown for each alternative.  Are the examples just a random list, or do they suggest some additional dimensions?

Source: World Bank

Many tables using this format choose to leave out a column on the left that would explain what each value represents.  In this table, a line is drawn across the values implying that each pair is related.  For example, “narrow frame” and “wide frame,” or “effortful” and “effortless,” both seem related pairs.  But what about “associative” and “based on reasoning”?  Are those opposite or similar, and what exactly do these values refer to?  What’s the difference between associative and intuitive, which are both properties of automatic thinking?  When tables drop labels, the reader can’t understand the structure of the content presented in the table without consulting accompanying text.  This prevents the table from being reusable in different contexts.

Spectrum Tables

A spectrum table is a special type that mixes elements from the alternative list (showing alternative kinds) and the dimensional (addressing distinct properties of a subject) models.  When structured properly, it is a sophisticated way to present content.

A spectrum table answers the question: how does a value for a property vary according to some other factor?  One set of properties is treated as dependent variables (their values change depending on that other factor), while the other set is treated as independent variables.  A concrete example will illustrate how the structure works.  We can see that the service hours provided increase with the price of the monthly package.

Example of Spectrum Table

In the left column are all the features that could be part of a service offered for sale.  The other columns represent different price options, and the table reveals how much of each feature is offered according to price.

The following diagram illustrates the structure of a spectrum table.  The column headings will all be of the same data type, which allows them to be compared directly.

Diagram of Spectrum Table
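The structural rules of a spectrum table lend themselves to simple checks.  In this sketch the price tiers and feature values are invented for illustration; the rules it enforces (column headings of one comparable data type, ordered along the spectrum, one value per tier for every feature) follow the description above.

```python
# Hypothetical price tiers (column headings): one shared data type,
# a monthly price, so they can be ordered and compared directly.
TIERS = [10, 25, 50]

# Rows are features; each cell is how much of the feature a tier offers.
# The figures are invented for illustration.
FEATURES = {
    "Service hours": [2, 5, 12],
    "Onboarding sessions": [0, 1, 3],
}

def validate(tiers, features):
    """Raise if the table breaks the rules a spectrum table relies on."""
    if any(not isinstance(t, (int, float)) for t in tiers):
        raise TypeError("column headings must share a numeric data type")
    if tiers != sorted(tiers):
        raise ValueError("tiers should be ordered along the spectrum")
    for name, cells in features.items():
        if len(cells) != len(tiers):
            raise ValueError(f"{name!r} needs exactly one value per tier")

def value_for(feature, tier):
    return FEATURES[feature][TIERS.index(tier)]

def grows_with_price(feature):
    cells = FEATURES[feature]
    return all(a <= b for a, b in zip(cells, cells[1:]))
```

A table that fails `validate` because a feature is simply missing from some tier is a sign it may really be an alternative list table presented as a spectrum.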

Most of us are familiar with the pattern shown in the example.  But not all product comparison tables are spectrum tables.  Many are alternative list tables.  Marketers often remove any labels explaining what the values refer to, which allows them to make apples-to-oranges comparisons, so that every option sounds positive, even though the options may not address the same properties.  It’s a disingenuous way to hide information when some options lack certain features.  Such tables can never be generated systematically from the features offered, since no rules govern what is shown.  They are brittle one-offs, and the approach does not scale well.

Matrix Tables

Many tables appear to be matrices, because they contain both row and column headings.  A genuine matrix table has unique structural characteristics.  Its defining characteristic is that both row and column headings are of equal importance.  Each value has two properties.  Matrix tables are not common in web content, but can be useful for certain situations, such as classifying concepts.  For example, e-learning content might use matrix tables.

First, let’s look at a quasi-matrix.  This example from the Sloan Management Review seems at first glance to be a matrix, but actually isn’t.  The table has a simple structure, showing percentage values, with column and row headings of seemingly equal importance.  On closer inspection, however, the table is actually a mutual comparison table.  Although not explicitly noted, the table emphasizes the percentages in the rows representing industry sectors, rather than the columns representing technologies.  The percentages in each row add up to 100%.  The rows of the table show the relative interest of each industry in different technologies, but the columns don’t show the relative interest in a technology among industries.  If the survey’s answers were translated into investment forecasts, the table might suggest what percentage of new investment by the retail industry will go into analytics, but it won’t suggest what percentage of spending on analytics will be made by the retail industry.  Hence the table is not truly bidirectional in structure.

Source: MIT Sloan Management Review
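A quick way to test whether a percentage table is bidirectional is to check which direction sums to 100%.  The shares below are invented, and apart from “Retail” and “Analytics” the row and column names are hypothetical.  Rows that are normalized answer only row questions; reading down a column would require extra information, such as how many respondents came from each industry, which the table does not provide.

```python
# Invented survey shares: each industry (row) splits its interest
# across technologies (columns), so every row sums to 100.
SHARES = {
    "Retail": {"Analytics": 50, "Technology B": 30, "Technology C": 20},
    "Industry B": {"Analytics": 40, "Technology B": 20, "Technology C": 40},
}

def row_totals(table):
    return {row: sum(cells.values()) for row, cells in table.items()}

def column_totals(table):
    totals = {}
    for cells in table.values():
        for column, value in cells.items():
            totals[column] = totals.get(column, 0) + value
    return totals

def sums_to_100(totals, tolerance=0.5):
    """True when every total is 100, within a rounding tolerance."""
    return all(abs(t - 100) <= tolerance for t in totals.values())
```

Here the rows pass the check and the columns do not, which marks the table as row-oriented rather than a genuine matrix.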

A genuine matrix can start with a value, and answer two  separate questions.   Consider the following example, from the World Bank.  It answers what kinds of relationships different social networks involve.  The relationships involve two independent facets, neither of which is more important than the other.  We can look at a friendship network such as Facebook and learn both the type and direction of the ties it involves (assuming of course we understand the terminology used in the table).

Source: World Bank

The structure of content in a matrix table is presented below.  Matrix tables describe two facets of a subject, and explore alternative categories for each facet.  Each value will have two properties, which together describe the subject.  In terms of the example, we can see that a social network that has both explicit and directed ties is called a friendship network.

Diagram of Matrix Table
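In a genuine matrix, a lookup works equally well in both directions: two facet values identify a category, and a category identifies its two facet values.  In this sketch only the “friendship network” cell (explicit, directed) comes from the example above; the “implicit” and “undirected” facet values and the placeholder network names are hypothetical.

```python
# Two independent facets index every cell; neither facet outranks the other.
# Only the friendship network cell comes from the example; the other
# facet values and cell names are hypothetical placeholders.
MATRIX = {
    ("explicit", "directed"): "friendship network",
    ("explicit", "undirected"): "placeholder network 1",
    ("implicit", "directed"): "placeholder network 2",
    ("implicit", "undirected"): "placeholder network 3",
}

def classify(tie_type, direction):
    """Facets -> category: read the matrix from its headings."""
    return MATRIX[(tie_type, direction)]

def facets_of(network):
    """Category -> facets: start with a value and answer both questions."""
    for facets, name in MATRIX.items():
        if name == network:
            return facets
    raise KeyError(network)
```

The reverse lookup, `facets_of("friendship network")`, returns both the type and the direction of ties at once, which is what distinguishes a matrix from the quasi-matrix above.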

Standardizing Tables

Tables presenting content should be planned according to long-term audience and business needs.  Unfortunately, ad-hoc tables are far too common. Problems arise when:

  • Audience-facing tables are not designed around their needs, but are simply generated on demand from queries.
  • Tables are hand-crafted without thought to their wider use.

For database experts who think about tables in terms of rows and columns, an endless variety of tables can be generated.  Much data-centric content suffers from looking like a raw database.  Having many variants can be useful for individuals needing highly specific reports, but an anything-is-possible approach lacks editorial oversight.  Audiences need tables that explain and compare key variables influencing their decisions.  They want confidence that they understand the table and are not missing any key information.

The other kind of unplanned table is created by authors who design tables on their own to fit a specific need.  The problem is especially common in content marketing.  Idiosyncratic tables routinely drop explanatory headings, and sometimes add extraneous rows or columns that aren’t directly related to the subject of the table.  Authors focus on emphasizing the wording of content in tables to attract attention to specific items of information.  They design online tables as if they were PowerPoint slides, centered on specific messages, instead of representing the content as a whole.  Their tables don’t consider how the content may need to be revised and reused in the future.  Audiences can find glib tables confusing and untrustworthy.

Content structure is discovered by deconstructing examples.  For those involved with content design and content engineering, the first step is to take an inventory of tables used in content published.  Look for patterns in tabular content, and standardize these patterns in templates.  Remove unnecessary variation, and designs that can’t be used widely.  Standardized tables allow content to become more flexible.  Templates based on standard table structures will make initial publication and subsequent reuse and updates easier.
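Once an inventory has surfaced recurring patterns, the resulting templates can be checked mechanically.  The registry below, its template names, and its required headings are hypothetical; the sketch only flags the two problems discussed above, missing explanatory headings and blank headings.

```python
# Hypothetical registry of standard table templates produced by an
# inventory; each fixes the table pattern and the headings it requires.
TEMPLATES = {
    "product_spectrum": {"pattern": "spectrum", "required_headings": ["Feature"]},
    "award_history": {"pattern": "dimensional", "required_headings": ["Year", "Winner"]},
}

def check_table(template_name, headings):
    """Return a list of problems; an empty list means the table fits."""
    template = TEMPLATES[template_name]
    problems = []
    for required in template["required_headings"]:
        if required not in headings:
            problems.append(f"missing heading: {required}")
    if any(not h.strip() for h in headings):
        problems.append("blank heading")
    return problems
```

Tables that fail every registered template are candidates either for a new standard pattern or for redesign.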

— Michael Andrews