
Content Structure in Tables

Content used in tables requires planning. Some authors consider tables fussy and unnecessary, and assume readers find them confusing. Dislike of tables often reflects mistaken ideas about what content in a table is meant to convey. Many people treat tables as blank cells to fill in as they please. Such free-form tables are the root of many problems in large-scale content publishing operations. Tables are most effective when their structure is designed to support the meaning of the content they display.

Contrary to popular perception, tables are not really a single content type. Various types of tables exist, each with distinct  content structures.  Unfortunately, the tools authors use to make tables, whether spreadsheets such as Excel or plain old HTML markup, encourage them to think of tables as a blank canvas on which anything can be added.  Just choose the number of columns and rows you want, and a table results.  Tables shouldn’t be considered merely as a display format.  Tables should, where possible, convey the underlying structure of the content.  Structure provides editorial guidance to readers so they can understand the content more clearly, and helps manage how the content within the table is delivered and reused in different contexts.

Over the past five to ten years, data scientists have focused research on how to extract the information embedded in millions of HTML tables published on the web.  Researchers consider the data in HTML tables as “semi-structured”.  These tables frequently follow predictable patterns, but are subject to great inconsistency as well.  Even when the table follows a pattern, the structure is normally implicit rather than explicit.

Published tables of content should indicate an explicit structure that is clear to readers and to machines alike. To reach that goal, we need to understand the implicit structure of tables published on the web today. Many tables follow design patterns that reflect consistent content structures. These patterns can provide the basis for designing table templates that will be used consistently.

To perceive the implicit structure of content, think about the content as composed of three parts:

  • A subject or topic (which we will refer to as “S”)
  • A property or attribute of the subject (referred to as “P”)
  • An object or value of the property of the subject (“O”)

For example, we can identify the structure of the following statement: “Mary (S) knows (P) Jane (O).”  The property  announces some information about the subject that is revealed by the object of this statement.

Tables allow many statements to be expressed in a compact space.  The S-P-O technique can be applied to tabular information.  Sometimes the information in tables is more complex than the basic S-P-O structure, and some tables (such as a Sudoku puzzle) lack this structure entirely.  Nonetheless, the S-P-O structure can help identify the underlying structure of content in many common types of tables.
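To make the S-P-O idea concrete, here is a minimal sketch of how one table row might be unpacked into statements. The names `Statement` and `row_to_statements` are hypothetical, and the "lives in" property is an invented illustration that does not come from the example above.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One subject-property-object statement."""
    subject: str
    prop: str
    obj: str


def row_to_statements(subject: str, row: dict[str, str]) -> list[Statement]:
    """Turn one table row into one statement per column.

    The row maps each column heading (the property) to the cell value
    (the object). The subject is whatever the row is about.
    """
    return [Statement(subject, prop, value) for prop, value in row.items()]


# "Mary knows Jane" comes from the text; "lives in" is an invented extra column.
statements = row_to_statements("Mary", {"knows": "Jane", "lives in": "Boston"})

for s in statements:
    print(f"{s.subject} (S) {s.prop} (P) {s.obj} (O)")
```

Each column contributes one statement about the row's subject, which is why a table can express many statements in a compact space.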

Let’s consider the content structure commonly found in tables.  No standard taxonomy of table formats seems to exist, so I will offer my own terms to refer to these structures.  Five common kinds of tables are:

  1. Mutual comparison tables
  2. Dimensional tables
  3. Alternative list tables
  4. Spectrum tables
  5. Matrix tables

These five kinds of tables are not the only kinds possible, and some authors or data experts will object that these examples limit options for arranging information.  Yet it is important to simplify and standardize how information is displayed in tables when publishing at enterprise scale.  Knowing widely used and effective patterns for tables provides a basis to develop standardized templates to display tabular information.

Mutual Comparison Tables

The mutual comparison is a very common table type. It lists a number of items (subjects) that all belong to a common category, and then indicates different properties and values for these items. Let’s look at some examples, starting with a table of most active stocks. Each company is a subject, and different properties of the company are identified in the column headings. The values (or objects) of these properties appear within each row. All the companies belong to a common category: most active stocks. It is common for the heading of a mutual comparison table to refer in some way to the kind of subject listed, and to one or more of the key properties associated with that subject. The table makes two kinds of statements. The first is that the most active stocks include certain companies, such as BAC and AMD. The second is that a company, say BAC, has a property, such as price, with a particular value, in this case a dollar amount. Each company in the table is the subject of multiple statements, one for each column.

source: MSNBC

Let’s consider another mutual comparison table from the website FiveThirtyEight, showing sports team rankings. While by convention the subjects in a table typically appear in the left column, in this table the subjects (the teams) appear in the third column. The properties of the teams appear on either side of the team’s name. The table heading only implicitly indicates what the different subjects in the table have in common: that they all belong to the NFL.

Source: FiveThirtyEight

The archetypal content structure for a mutual comparison table is illustrated below. The arrows show the relationships between different elements in the table. The overarching subject, the category that the table discusses, is indicated by S’. The individual subjects have one or more properties, each property generally (but not necessarily) having a single value. What the value means depends on the property that describes it, which in turn depends on the subject the property refers to.

Diagram of Mutual Comparison Table
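The structure described above can be sketched in code. This is only an illustration under stated assumptions: the function name `mutual_comparison_statements` is hypothetical, and the prices and volumes are invented placeholders, not figures from the MSNBC table.

```python
def mutual_comparison_statements(category, rows):
    """List the two kinds of statements a mutual comparison table makes.

    category: the overarching subject (S') shared by every row.
    rows: maps each subject to its {property: value} cells.
    """
    statements = []
    for subject, properties in rows.items():
        # First kind: the category includes this subject.
        statements.append((category, "includes", subject))
        # Second kind: the subject has a property with a value.
        for prop, value in properties.items():
            statements.append((subject, prop, value))
    return statements


# Placeholder values, invented for illustration only.
rows = {
    "BAC": {"price": 10.0, "volume": 1000},
    "AMD": {"price": 20.0, "volume": 2000},
}

stmts = mutual_comparison_statements("most active stocks", rows)
for s in stmts:
    print(s)
```

Because every row shares the same properties, any two subjects can be compared column by column.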

Dimensional Tables

Dimensional tables are similar to mutual comparison tables, except that only one subject is discussed.   They reveal  dimensions of a single topic.   Let’s look at a table from Wikipedia about winners of the Booker prize for literature.  The subject of the table is the Booker prize.

Source: Wikipedia

The table has rows of information relating to the winner for each year.  Although the table allows sorting by any column, the key column that defines how the content in a row is related is the year.  We can say that the year property is the primary dimension, while other properties such as country (of the author) and genre are secondary dimensions.  Such tables identify some primary dimension that varies as a way to structure the content.  Typically the most important property of the subject will be the left column of the table.  In many cases the primary property is one that is required to exist in order for other properties to also be present.  The best way to think about dimensional tables is to consider that they involve a chain of statements.  First, we announce that a Booker prize was awarded in 2016.  If no Booker prize was given in 2016 (and some awards choose to skip years if they don’t like the candidates), then none of the other properties relating to a 2016 winner would make sense.  Two kinds of statements are supported by the structure of the table:

  1. A Booker Prize was awarded in 2016.
  2. It was given to Paul Beatty.

Dimensional tables can be applied to qualitative as well as quantitative properties and values. Here’s an example that’s more conceptual. The subject of the table is strategic communication. Again we see that the columns are not of equal importance. The primary dimension relates to communication function, while other dimensions are structurally dependent on that. The structure of the table indicates that its author considered communication function, rather than, say, channel, as the key to understanding communication approaches.

Source: MIT Sloan Management Review

The structure of a dimensional table is shown in the diagram below.  The secondary properties and their values depend on the primary property.

Diagram of Dimensional Table
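The chain of statements in a dimensional table can be sketched as follows. The function name `dimensional_statements` and the property label "author" are hypothetical choices for illustration; only the 2016 award to Paul Beatty is taken from the text, and other secondary properties (such as country or genre) would be added in the same way.

```python
def dimensional_statements(subject, primary, rows):
    """Build the chain of statements in a dimensional table.

    rows maps each value of the primary dimension to the secondary
    properties that depend on it.
    """
    statements = []
    for key, secondary in rows.items():
        # The primary statement must exist before anything else makes sense.
        statements.append((subject, primary, key))
        # Secondary statements hang off the (subject, primary value) pair.
        for prop, value in secondary.items():
            statements.append(((subject, key), prop, value))
    return statements


rows = {2016: {"author": "Paul Beatty"}}
stmts = dimensional_statements("Booker Prize", "awarded in", rows)
for s in stmts:
    print(s)
```

If a year were missing from `rows`, no secondary statements for that year could be generated, which mirrors the dependency described above.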

Alternative List Tables

An alternative list table is similar to a dimensional table in that the table refers to one subject only.  It differs in that each property addressed by the table is independent of the others.  This means that the rows presenting values aren’t related.  The following example of an alternative list table relates to spending decisions.  Although the table has no title, the subject of the table is spending.  The subject has two properties, which are alternatives.  The table answers which kinds of purchases are major, and which are minor.  Examples of each are listed under the columns.  This is a common kind of table to display content, and is often used to compare products.

Source: World Bank

The following diagram shows the structure of the content in an alternative list table.

Diagram of Alternative List Table
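A rough sketch of this structure in code: each alternative holds its own independent list, so the lists can differ in length and nothing links items that happen to share a row. The purchase examples here are invented for illustration and are not the entries in the World Bank table; `alternative_list_statements` is a hypothetical name.

```python
def alternative_list_statements(subject, alternatives):
    """One statement per listed example; position within a list carries no meaning."""
    return [
        (subject, prop, value)
        for prop, values in alternatives.items()
        for value in values
    ]


# Invented examples. Note the lists are deliberately unequal in length.
alternatives = {
    "major": ["house", "car", "tuition"],
    "minor": ["coffee", "bus fare"],
}

stmts = alternative_list_statements("spending", alternatives)
for s in stmts:
    print(s)
```

Storing the columns as separate lists, rather than as rows, avoids implying that "house" and "coffee" are related merely because they would appear side by side.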

What can make alternative list tables difficult to interpret is that sometimes the creator of the table leaves out information, or implies relationships that may or may not be intended.  Let’s look at another example, from the World Bank.  The table discusses two alternative ways of thinking, automatic and deliberative.  Examples are shown for each alternative.  Are the examples just a random list, or do they suggest some additional dimensions?

Source: World Bank

Many tables using this format leave out a column on the left that would explain what each value represents. In this table, a line is drawn across the values, implying that each pair is related. For example, “narrow frame” and “wide frame,” or “effortful” and “effortless,” both seem to be related pairs. But what about “associative” and “based on reasoning”? Are those opposite or similar, and what exactly do these values refer to? What’s the difference between associative and intuitive, which are both properties of automatic thinking? When tables drop labels, the reader can’t understand the structure of the content presented in the table without consulting accompanying text. This prevents the table from being reusable in different contexts.

Spectrum Tables

A spectrum table is a special type that mixes elements from the alternative list (showing alternative kinds) and the dimensional (addressing distinct properties of a subject) models.  When structured properly, it is a sophisticated way to present content.

A spectrum table answers the question: how does a value for a property vary according to some other factor? One set of properties is treated as dependent variables (values change depending on the property considered), while the other set is treated as independent variables. A concrete example will illustrate how the structure works. We can see that the number of service hours provided increases with the price of the monthly package.

Example of Spectrum Table

In the left column are all the features that could be part of a service offered for sale.  The other columns represent different price options, and the table reveals how much of each feature is offered according to price.

The following diagram illustrates the structure of a spectrum table. The column headings will be of the same data type, which allows them to be compared directly.

Diagram of Spectrum Table
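As a sketch of what this structure allows, the following code checks two things about a spectrum table: that every option supplies a value for every feature, and that a feature's value rises along the ordered options. The package names and numbers are invented placeholders, and `missing_cells` and `rises_with` are hypothetical helper names.

```python
def missing_cells(table, options):
    """Return (feature, option) pairs that have no value in the table."""
    return [
        (feature, option)
        for feature, values in table.items()
        for option in options
        if option not in values
    ]


def rises_with(table, feature, options):
    """True if the feature's value never decreases along the ordered options."""
    sequence = [table[feature][option] for option in options]
    return all(a <= b for a, b in zip(sequence, sequence[1:]))


# Options ordered by monthly price, lowest first. All values are invented.
options = ["Basic", "Standard", "Premium"]
table = {
    "service hours": {"Basic": 2, "Standard": 5, "Premium": 10},
    "user accounts": {"Basic": 1, "Standard": 5, "Premium": 20},
}

print(missing_cells(table, options))
print(rises_with(table, "service hours", options))
```

Checks like these are only possible because each row is labeled with the feature it describes and each column represents the same kind of thing.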

Most of us are familiar with the pattern shown in the example. But not all product comparison tables are spectrum tables. Many are alternative list tables. Marketers often remove any labels explaining what the values refer to, which allows them to make apples-to-oranges comparisons, so that the values of every option sound positive, even though they may not be addressing the same properties. It’s a disingenuous way to hide information when some options lack certain features. Such tables cannot be generated systematically from the features offered, since no rules govern what is shown. They are brittle one-offs, and the approach does not scale well.

Matrix Tables

Many tables appear to be matrices, because they contain both row and column headings.  A genuine matrix table has unique structural characteristics.  Its defining characteristic is that both row and column headings are of equal importance.  Each value has two properties.  Matrix tables are not common in web content, but can be useful for certain situations, such as classifying concepts.  For example, e-learning content might use matrix tables.

First, let’s look at a quasi-matrix. This example from the Sloan Management Review on first appearance seems like a matrix, but actually isn’t. The table has a simple structure, showing percentage values, with both column and row headings of seemingly equal importance. On closer inspection, however, the table is actually a mutual comparison table. Although not explicitly noted, the table emphasizes the percentages in the rows representing industry sectors, rather than the columns representing technologies. The percentages in the rows add up to 100%. The rows of the table show the relative interest of each industry in different technologies, but the columns don’t show what the relative interest of a technology is among industries. If the survey’s answers were translated into investment forecasts, the table might suggest what percentage of new investment by the retail industry will go into analytics, but it won’t suggest what percentage of spending on analytics will be made by the retail industry. Hence the table is not truly bidirectional in structure.

Source: MIT Sloan Management Review
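One way to test whether a table of percentages can be read in both directions is to check which way the values add up to 100. The sketch below assumes whole-number percentages; the industries other than retail, the technologies other than analytics, and all of the numbers are invented placeholders rather than the survey's figures, and `reading_direction` is a hypothetical name.

```python
def reading_direction(table):
    """Report whether percentages total 100 across rows, columns, both, or neither.

    table maps each row heading to its {column heading: percentage} cells.
    """
    rows_ok = all(sum(cells.values()) == 100 for cells in table.values())

    column_totals = {}
    for cells in table.values():
        for column, value in cells.items():
            column_totals[column] = column_totals.get(column, 0) + value
    columns_ok = all(total == 100 for total in column_totals.values())

    if rows_ok and columns_ok:
        return "both"
    if rows_ok:
        return "rows"
    if columns_ok:
        return "columns"
    return "neither"


# Invented percentages: each industry's row totals 100, the columns do not.
table = {
    "Retail": {"Analytics": 50, "Mobile": 30, "Cloud": 20},
    "Banking": {"Analytics": 40, "Mobile": 25, "Cloud": 35},
}

print(reading_direction(table))
```

A result of "rows" indicates the pattern described above: the table compares subjects listed in the rows and should not be read down the columns.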

A genuine matrix can start with a value, and answer two  separate questions.   Consider the following example, from the World Bank.  It answers what kinds of relationships different social networks involve.  The relationships involve two independent facets, neither of which is more important than the other.  We can look at a friendship network such as Facebook and learn both the type and direction of the ties it involves (assuming of course we understand the terminology used in the table).

Source: World Bank

The structure of content in a matrix table is presented below.  Matrix tables describe two facets of a subject, and explore alternative categories for each facet.  Each value will have two properties, which together describe the subject.  In terms of the example, we can see that a social network that has both explicit and directed ties is called a friendship network.

Diagram of Matrix Table
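A matrix table can be sketched as a lookup keyed by two facets at once, so that it can be read in either direction. Only the one cell stated in the text (explicit, directed, friendship network) is included here; the remaining cells of the World Bank table are left out rather than guessed. The facet labels and function names are hypothetical.

```python
# Each cell is keyed by a pair: (type of tie, direction of tie).
matrix = {
    ("explicit", "directed"): "friendship network",
}


def lookup(matrix, tie_type, direction):
    """Start from the two facets and find the value in that cell."""
    return matrix.get((tie_type, direction))


def describe(matrix, value):
    """Start from a value and answer both questions about it."""
    for (tie_type, direction), cell in matrix.items():
        if cell == value:
            return {"type of tie": tie_type, "direction of tie": direction}
    return None


print(lookup(matrix, "explicit", "directed"))
print(describe(matrix, "friendship network"))
```

Neither facet is nested inside the other in the key, which reflects the equal importance of row and column headings in a genuine matrix.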

Standardizing Tables

Tables presenting content should be planned according to long-term audience and business needs.  Unfortunately, ad-hoc tables are far too common. Problems arise when:

  • Audience-facing tables are not designed around audience needs, but are simply generated on demand from queries.
  • Tables are hand-crafted without thought to their wider use.

For database experts who think about tables in terms of rows and columns, an endless variety of tables can be generated. Much data-centric content suffers from looking like a raw database. Having many variants can be useful for individuals needing highly specific reports, but an anything-is-possible approach lacks editorial oversight. Audiences need tables that explain and compare key variables influencing their decisions. They want confidence that they understand the content and are not missing any key information.

The other kind of unplanned table is created by authors who design tables on their own to fit a specific need. The problem is especially common in content marketing. Idiosyncratic tables routinely drop explanatory headings, and sometimes add extraneous rows or columns that aren’t directly related to the subject of the table. Authors focus on how to emphasize the wording of content that appears in tables to attract attention to specific items of information. They design online tables as if they were PowerPoint slides, centered on specific messages, instead of representing the content as a whole. Their tables don’t consider how the content may need to be revised and reused in the future. Audiences can find glib tables confusing and untrustworthy.

Content structure is discovered by deconstructing examples. For those involved with content design and content engineering, the first step is to take an inventory of tables used in published content. Look for patterns in tabular content, and standardize these patterns in templates. Remove unnecessary variation, and designs that can’t be used widely. Standardized tables allow content to become more flexible. Templates based on standard table structures will make initial publication and subsequent reuse and updates easier.

— Michael Andrews


Why Structured Data needs to talk to Structured Content

A recent post on Google’s webmaster blog  illustrates how metadata needs to address both the structure of web content, and the meaning of that content.

People who work in SEO talk about structured data a lot, while those who work in content strategy talk about structured content. These topics are obviously related, but the terminology used by each party obscures how each topic relates to the other. My take: structured data and structured content are different dimensions of metadata. Structured data is generally descriptive metadata identifying entities discussed in the content. Structured content provides the foundation for structural metadata that indicates the logic and organization of the content. Both descriptive and structural metadata are important in content, and they should ideally be integrated.

The Google blog advises publishers to include structured data in their content. The screenshot below shows how this advice is presented.

(source: Google Central Webmaster Blog)

The advice presented follows a pattern:

  • Advice to follow
  • Rationale
  • Best practices to implement advice (shown in green)
  • Actions not to do (shown in pink)

Some other items of advice in the post include another element:

  • Practices to avoid when implementing advice (shown in yellow)

We can see that the post follows good structure that is easy to scan and understand, and provides a foundation to reuse the information in other contexts. Now, let’s look at the post’s source code. This is where we’d expect to see the structured data associated with the content.

Source code for Blog post.

Disappointingly, no structured data is associated with the specific items of advice. The details of the advice are marked up with “class” attributes intended to style the content, but not to identify the meaning of the content. The only structured data on the page relates to the blog post in general (such as its author).

Imagine how the content could be reused if structured data identified the meaning of the advice. Someone might type a search looking for tips on “mistakes when using schema.org,” “why use schema.org,” or “schema.org best practices” and get specific bullets of content relating to their query.
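To illustrate the kind of reuse imagined here, the sketch below models one item of advice with its parts labeled by role, then pulls out only the parts matching a role. This is a hypothetical in-house structure, not a schema.org type, and the bracketed strings are placeholders rather than text from the Google post. The names `AdviceItem` and `select` are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AdviceItem:
    """One item of advice, with each part labeled by its role."""
    advice: str
    rationale: str
    do: list[str] = field(default_factory=list)      # best practices
    avoid: list[str] = field(default_factory=list)   # practices to avoid
    dont: list[str] = field(default_factory=list)    # actions not to do


def select(items, role):
    """Collect the parts of every advice item that play the given role."""
    results = []
    for item in items:
        part = getattr(item, role)
        results.extend(part if isinstance(part, list) else [part])
    return results


# Placeholder content; only the advice itself paraphrases the post.
items = [
    AdviceItem(
        advice="Include structured data in your content",
        rationale="<rationale text>",
        do=["<best practice>"],
        avoid=["<practice to avoid>"],
        dont=["<action not to do>"],
    )
]

print(select(items, "rationale"))  # could answer a "why use schema.org" query
print(select(items, "do"))         # could answer a "best practices" query
print(select(items, "dont"))       # could answer a "mistakes" query
```

Styling classes alone cannot support this kind of selection, because they describe how a bullet looks rather than what role it plays.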

In this example, the post’s author has done nothing wrong, though an opportunity has been missed nonetheless. Currently, schema.org doesn’t have any entity types that address advice statements that would contain sub-elements such as Rationale, Do, Avoid, and Don’t. The closest types are related to Questions and Answers, which are slightly different in their structure.

Because the structured data used in SEO, particularly schema.org, tends to focus on descriptive metadata, it has less coverage of other dimensions of metadata such as structural metadata indicating the role of content elements, or technical, administrative and rights metadata. All these kinds of metadata are important to address, to allow content to be shared and reused across different platforms and in different contexts. Fortunately, schema.org has been evolving quickly, and its coverage is improving every month. This expansion will allow for genuinely integrated metadata that indicates both the meaning and the structure of the content.

Metadata is a rich and important topic for everyone concerned with content published on the web. If you are interested in learning more about the many dimensions of metadata, you may be interested in my forthcoming book, Metadata Basics for Web Content, which will be available in early 2017 on Amazon.

— Michael Andrews