Category: Content Engineering

Your Content Needs a Metadata Strategy

Post author By Michael Andrews
Post date May 31, 2017

What’s your metadata strategy? So few web publishers have an articulated metadata strategy that a skeptic may think I’ve made up the concept and coined a new buzzword. Yet almost a decade ago, Kristina Halvorson explicitly cited metadata strategy as one of “a number of content-related disciplines that deserve their own definition” in her seminal A List Apart article, “The Discipline of Content Strategy”. She also cites metadata strategy in her widely read book on content strategy. Even so, in the years since Kristina’s article, the discipline of content strategy still hasn’t given metadata strategy the attention it deserves.

A content strategy, to have a sustained impact, needs a metadata strategy to back it up. Without a metadata strategy, content strategy can get stuck in firefighting mode. Many organizations keep making the same mistakes with their content because they ask overwhelmed staff to track too many variables. Metadata can liberate staff from checklists by allowing IT systems to handle low-level details that are important but exhausting to deal with. Staff may come and go, and their enthusiasm can wax and wane. But metadata, like the Energizer bunny, keeps performing: it can keep the larger strategy on track. Metadata can deliver consistency to content operations, and can enhance how content is delivered to audiences.

A metadata strategy is a plan for how a publisher can leverage metadata to accomplish specific content goals. It articulates what metadata publishers need for their content, how they will create that metadata, and most importantly, how both the publisher and audiences can utilize the metadata. When metadata is an afterthought, publishers end up with content strategies that can’t be implemented, or are implemented poorly.

The Vaporware Problem: When you can’t implement your Plan

A content strategy may include many big ideas, but translating those ideas into practice can be the hardest part. A strategy will be difficult to execute when its documentation and details are too much for operational teams to absorb and follow. The group designing the content strategy may have done a thorough analysis of what’s needed. They identified goals and metrics, modeled how content needs to fit together, and considered workflows and the editorial lifecycle. But large content teams, especially when geographically distributed, can face difficulties implementing the strategy. Documentation, emails and committees are unreliable ways to coordinate content on a large scale. Instead, key decisions should be embedded into the tools the team uses wherever possible. When their tools have encoded relevant decisions, teams can focus on accomplishing their goals, instead of following rules and checklists.

In the software industry, vaporware is a product concept that’s been announced, but not built. Plans that can’t be implemented are vaporware. Content strategies are sometimes conceived with limited consideration of how to implement them consistently. When executing a content strategy, metadata is where the rubber hits the road. It’s a key ingredient for turning plans into reality. But publishers need to have the right metadata in place before they can use it to support their broader goals.

Effective large-scale content governance is impossible without effective metadata, especially administrative metadata. Without a metadata strategy, publishers tend to rely on what their existing content systems offer them, instead of asking first what they want from their systems. Your existing system may provide only some of the key metadata attributes you need to coordinate and manage your content. That metadata may be in a proprietary format, meaning it can’t be used by other systems. The default settings offered by your vendors’ products are unlikely to provide the coordination and flexibility required.

Consider all the important information about your content that needs to be supported with metadata. You need to know details about the history of the content (when it was created, last revised, reused from elsewhere, or scheduled for removal), where the content came from (author, approvers, licensing rights for photos, or location information for video recordings), and goals for the content (intended audiences, themes, or channels). Those are just some of the metadata attributes content systems can use to manage routine reporting, tracking, and routing tasks, so web teams can focus on tasks of higher value.
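As a rough illustration of how such attributes can take over routine work, the sketch below models a content item’s history, provenance, and goals as a Python record, then uses it for one tracking task and one routing task. The class, field names, thresholds, and item data are all hypothetical, chosen only for this example; a real CMS would store these attributes in its own way.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class ContentMetadata:
    """Hypothetical administrative metadata for one content item.

    Field names are illustrative; they are not taken from a specific
    CMS or metadata standard.
    """
    item_id: str
    # History: when the item was created, revised, reused, or will be removed
    created: date
    last_revised: date
    reused_from: Optional[str] = None
    scheduled_removal: Optional[date] = None
    # Provenance: who wrote and approved it
    author: str = ""
    approvers: List[str] = field(default_factory=list)
    # Goals: who it is for and where it is published
    audiences: List[str] = field(default_factory=list)
    channels: List[str] = field(default_factory=list)


def due_for_removal(items: List[ContentMetadata], today: date) -> List[str]:
    """Tracking task: IDs of items whose scheduled removal date has arrived."""
    return [m.item_id for m in items
            if m.scheduled_removal is not None and m.scheduled_removal <= today]


def route_for_review(items: List[ContentMetadata], today: date,
                     max_age_days: int = 365) -> Dict[str, List[str]]:
    """Routing task: group items not revised within max_age_days by author."""
    queue: Dict[str, List[str]] = {}
    for m in items:
        if (today - m.last_revised).days > max_age_days:
            queue.setdefault(m.author, []).append(m.item_id)
    return queue


items = [
    ContentMetadata("faq-01", created=date(2015, 3, 1), last_revised=date(2015, 6, 1),
                    scheduled_removal=date(2017, 5, 1), author="editor-a"),
    ContentMetadata("guide-02", created=date(2016, 9, 1), last_revised=date(2017, 4, 20),
                    author="editor-b", audiences=["new customers"]),
]
today = date(2017, 5, 31)
print(due_for_removal(items, today))   # ['faq-01']
print(route_for_review(items, today))  # {'editor-a': ['faq-01']}
```

Once the attributes exist, reports like these run on a schedule instead of depending on someone remembering a checklist.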

If you have grander visions for your content, such as making your content “intelligent”, then having a metadata strategy becomes even more important. Countless vendors are hawking products that claim to add AI to content. Just remember: metadata is what makes content intelligent, ready for applications (user decisions), algorithms (machine decisions) and analytics (assessment). Don’t buy new products without first having your own metadata strategy in place. Otherwise you’ll likely be stuck with the vendor’s proprietary vision and roadmap, instead of your own.

Lack of Strategy creates Stovepipe Systems

A different problem arises when a publisher tries to do many things with its content, but does so in a piecemeal manner. Perhaps a big bold vision for a content strategy, embodied in a PowerPoint deck, gets tossed over to the IT department. Various IT members consider what systems are needed to support different functionality. Unless there is a metadata strategy in place, each system is likely to operate according to its own rules:

Content structuring relies on proprietary templates
Content management relies on proprietary CMS data fields
SEO relies on meta tags
Recommendations rely on page views and tags
Analytics rely on page titles and URLs
Digital assets rely on proprietary tags
Internal search uses keywords and not metadata
Navigation uses a CMS-defined custom taxonomy or folder structure
Screen interaction relies on custom JSON
Backend data relies on a custom data model

Sadly such uncoordinated labeling of content is quite common.

Without a metadata strategy, each area of functionality is considered as a separate system. IT staff then focus on systems integration: trying to get different systems to talk to each other. In reality, they have a collection of stovepipe systems, where metadata descriptions aren’t shared across systems. That’s because various systems use proprietary or custom metadata, instead of using common, standards-based metadata. Stovepipe systems lack a shared language that allows interoperability. Attributes that are defined by your CMS or other vendor system are hostage to that system.

Proprietary metadata is far less valuable than standards-based metadata. Proprietary metadata can’t be shared easily with other systems and is hard or impossible to migrate if you change systems. Proprietary metadata is a sunk cost that’s expensive to maintain, rather than being an investment that will have value for years to come. Unlike standards-based metadata, proprietary metadata is brittle: new requirements can mess up an existing integration configuration.
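To make the contrast concrete, here is a minimal sketch of translating a proprietary CMS record into standards-based metadata. The CMS field names (art_ttl, auth_nm, and so on) are invented for illustration; the target names (headline, author, datePublished, dateModified) are real schema.org properties, emitted as JSON-LD. Fields with no standard equivalent are reported rather than silently dropped, since those are the ones that stay hostage to a single system.

```python
import json

# Hypothetical proprietary field names, as one CMS might export them, mapped
# to schema.org properties available on Article (inherited from CreativeWork).
CMS_TO_SCHEMA_ORG = {
    "art_ttl": "headline",
    "auth_nm": "author",
    "pub_dt": "datePublished",
    "mod_dt": "dateModified",
}


def to_schema_org(cms_record, schema_type="Article"):
    """Translate a proprietary CMS record into a schema.org JSON-LD dict.

    Fields with no agreed mapping are returned separately, so the gaps stay
    visible instead of being dropped silently.
    """
    doc = {"@context": "https://schema.org", "@type": schema_type}
    unmapped = {}
    for key, value in cms_record.items():
        prop = CMS_TO_SCHEMA_ORG.get(key)
        if prop is None:
            unmapped[key] = value
        elif prop == "author":
            # schema.org's expected types for author are Person or Organization
            doc[prop] = {"@type": "Person", "name": value}
        else:
            doc[prop] = value
    return doc, unmapped


record = {
    "art_ttl": "Your Content Needs a Metadata Strategy",
    "auth_nm": "Michael Andrews",
    "pub_dt": "2017-05-31",
    "tmpl_id": 42,  # a CMS-internal field with no standard equivalent
}
doc, leftovers = to_schema_org(record)
print(json.dumps(doc, indent=2))
print(leftovers)  # {'tmpl_id': 42}
```

The mapping table is the only system-specific part; once content is described in the shared vocabulary, any consumer that understands schema.org can read it.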

Metadata standards are like an operating system for your content. They allow content to be used, managed and tracked across different applications. Metadata standards create an ecosystem for content. Metadata strategy asks: What kind of ecosystem do you want, and how are you going to develop it, so that your content is ready for any task?

Who is doing Metadata Strategy right?

Let’s look at how two well-known organizations are doing metadata strategy. One example is current and news-worthy, while the other has a long backstory.

eBay

eBay decided that the proprietary metadata they used in their content wasn’t working, as it was preventing them from leveraging metadata to deliver better experiences for their customers. They embarked on a major program called the “Structured Data Initiative”, migrating their content to metadata based on the schema.org web standard. Wall Street analysts have been following eBay’s metadata strategy closely over the past year, as it is expected to improve the profitability of the ecommerce giant. The adoption of metadata standards has allowed for a “more personal and discovery-based buying experience with highly tailored choices and unique selection”, according to eBay. eBay is leveraging the metadata to work with new AI technologies to deliver a personalized homepage to each of its customers. It is also leveraging the metadata in its conversational commerce product, the eBay ShopBot, which connects with Facebook Messenger. eBay’s experience shows that a company shouldn’t try to adopt AI without first having a metadata strategy.

Significantly, eBay’s metadata strategy adopts the schema.org standard for its internal content management, in addition to using it for search engine consumers such as Google and Bing. Plenty of publishers use schema.org for search engine purposes, but few have taken the next step like eBay to use it as the basis of their content operations. eBay is also well positioned to take advantage of any new third party services that can consume their metadata.

Australian Government

From the earliest days of online content, the Australian government has been concerned with how metadata can improve online content availability. The Australian government isn’t a single publisher, but comprises a federation of many government websites run by different government organizations. The governance challenges are enormous. Fortunately, metadata standards can help coordinate diverse activity. The AGLS metadata standard has been in use for nearly 20 years to classify services provided by different organizations within the Australian government.

The AGLS metadata strategy is unique in a couple of ways. First, it adopts an existing standard and builds upon it. The government identified areas where existing standards didn’t offer attributes that were needed. The government adopted the widely used Dublin Core metadata standard, but added some additional elements that were specific to their needs (for example, indicating the “jurisdiction” that the content relates to). Starting from an existing standard, they extended it and got the W3C to recognize their extension.
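A sketch of this adopt-and-extend approach, assuming HTML meta elements as the encoding: standard Dublin Core terms are reused unchanged, and a separate namespace carries the extension element, here jurisdiction. The DCTERMS. and AGLSTERMS. name prefixes are assumed to follow AGLS HTML conventions, and the page values are hypothetical; the AGLS documentation is the authority on exact element names and schemes.

```python
from html import escape


def agls_meta_tags(dc_terms, agls_terms):
    """Render HTML meta elements for Dublin Core terms plus AGLS extensions.

    Assumption: names are written as PREFIX.term, with DCTERMS for reused
    Dublin Core terms and AGLSTERMS for elements the extension adds.
    """
    tags = []
    for prefix, terms in (("DCTERMS", dc_terms), ("AGLSTERMS", agls_terms)):
        for name, value in terms.items():
            tags.append(f'<meta name="{prefix}.{escape(name)}" content="{escape(value)}">')
    return "\n".join(tags)


markup = agls_meta_tags(
    # Standard Dublin Core elements, reused as-is (hypothetical values)
    {"title": "Apply for a fishing licence", "creator": "Example Department"},
    # The extension element added for government-specific needs
    {"jurisdiction": "Victoria"},
)
print(markup)
```

Keeping the two namespaces separate means tools that only understand Dublin Core can still read the standard elements, while AGLS-aware tools also get the extension.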

Second, the AGLS strategy addresses implementation at different levels in different ways. The metadata standard allows different publishers to describe their content consistently. It ensures all published content is interoperable. Individual publishers, such as the state government of Victoria, have their own government website principles and requirements, but these mandate the use of the AGLS metadata standard. The common standard has also promoted the availability of tools to implement the standard. For example, Drupal, which is widely used for government websites in Australia, has a plugin that provides support for adding the metadata to content. Currently, over 700 sites use the plugin. But significantly, because AGLS is an open standard, it can work with any CMS, not just Drupal. I’ve also seen a plugin for Joomla.

Australia’s example shows how content metadata isn’t an afterthought, but is a core part of content publishing. A well-considered metadata strategy can provide benefits for many years. Given its long history, AGLS is sure to continue to evolve to address new requirements.

Strategy focuses on the Value Metadata can offer

Occasionally, I encounter someone who warns of the “dangers” of “too much” metadata. When I try to uncover the source of the perceived concern, I learn that the person thinks about metadata as a labor-intensive activity. They imagine they need to hand-create the metadata serially. They think that metadata exists so they can hunt and search for specific documents. This sort of thinking is dated but still quite common. It reflects how librarians and database administrators approached metadata in the past, as a tedious form of record keeping. The purpose of metadata has evolved far beyond record keeping. Metadata is no longer primarily about “findability,” powered by clicking labels and typing within form fields. It is now more about “discovery”: revealing relevant information through automation. Leveraging metadata depends on understanding the range of uses for it.

When someone complains about too much metadata, it also signals to me that a metadata strategy is missing. In many organizations, metadata is relegated to being an electronic checklist, instead of being positioned as a valuable tool. When that’s the case, metadata can seem overwhelming. Organizations can have too much metadata when:

Too much of their metadata is incompatible, because different systems define content in different ways
Too much metadata is used for a single purpose, instead of serving multiple purposes

Siloed thinking about metadata results in stovepipe systems. New metadata fields are created to address narrow needs, such as tracking or locating items for specific purposes. Fields proliferate across various systems. And everyone is confused about how anything relates to anything else.

Strategic thinking about metadata considers how metadata can serve all the needs of the publisher, not just the needs of an individual team member or role. When teams work together to develop requirements, they can discuss what metadata is useful for different purposes. They can identify how a single metadata item can be used in different contexts. If the metadata describes when an item was last updated, the team might consider how that date could be used by content creators, by the analytics team, by the UX design team, and by the product manager.
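The “last updated” example can be sketched in a few lines of Python: one date attribute serving several roles. The attribute name borrows the schema.org property dateModified; the items, dates, and age thresholds are hypothetical, and each function stands in for one of the roles named above.

```python
from datetime import date

# One attribute, reused by several roles. Items and dates are hypothetical.
items = [
    {"id": "pricing", "dateModified": date(2017, 5, 30)},
    {"id": "setup-guide", "dateModified": date(2016, 10, 1)},
]
today = date(2017, 5, 31)


def age_in_days(item):
    return (today - item["dateModified"]).days


def review_queue(items, max_age_days=180):
    """Content creators: items that have gone too long without an update."""
    return [i["id"] for i in items if age_in_days(i) > max_age_days]


def freshness_label(item):
    """UX design: a reader-facing label derived from the same attribute."""
    days = age_in_days(item)
    if days == 0:
        return "Updated today"
    if days == 1:
        return "Updated yesterday"
    return f"Updated {days} days ago"


def age_buckets(items):
    """Analytics and product management: how much content falls in each age band."""
    buckets = {"under 30 days": 0, "30 to 180 days": 0, "over 180 days": 0}
    for i in items:
        days = age_in_days(i)
        if days < 30:
            buckets["under 30 days"] += 1
        elif days <= 180:
            buckets["30 to 180 days"] += 1
        else:
            buckets["over 180 days"] += 1
    return buckets


print(review_queue(items))        # ['setup-guide']
print(freshness_label(items[0]))  # Updated yesterday
print(age_buckets(items))
```

No role needs its own field: each derives what it needs from the single shared attribute.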

Publishers should ask themselves how they can do more for their customers by using metadata. They need to think about the productivity of their metadata: making specific metadata descriptions do more things that can add value to the content. And they need a strategy to make that happen.

— Michael Andrews

Tags metadata, schema.org

Content Engineering

Content Structure in Tables

Post author By Michael Andrews
Post date November 28, 2016

Content used in tables requires planning. Some authors consider tables fussy and unnecessary, and assume readers find them confusing. Dislike of tables often reflects mistaken ideas about what content in a table is meant to convey. Many people treat tables as blank cells to fill in as they please. Such free-form tables are a root cause of many problems in large-scale content publishing operations. Tables are most effective when their structure is designed to support the meaning of the content they display.

Contrary to popular perception, tables are not really a single content type. Various types of tables exist, each with distinct content structures. Unfortunately, the tools authors use to make tables, whether spreadsheets such as Excel or plain old HTML markup, encourage them to think of tables as a blank canvas on which anything can be added. Just choose the number of columns and rows you want, and a table results. Tables shouldn’t be considered merely as a display format. Tables should, where possible, convey the underlying structure of the content. Structure provides editorial guidance to readers so they can understand the content more clearly, and helps manage how the content within the table is delivered and reused in different contexts.

Over the past five to ten years, data scientists have focused research on how to extract the information embedded in millions of HTML tables published on the web. Researchers consider the data in HTML tables as “semi-structured”. These tables frequently follow predictable patterns, but are subject to great inconsistency as well. Even when the table follows a pattern, the structure is normally implicit rather than explicit.

Published tables should make their structure explicit, so that it is clear to readers and machines alike. To reach that goal, we need to understand the implicit structure of tables published on the web today. Many tables follow design patterns that reflect consistent content structures. These patterns can provide the basis for designing templates for tables that will be used consistently.

To perceive the implicit structure of content, think about the content as composed of three parts:

A subject or topic (which we will refer to as “S”)
A property or attribute of the subject (referred to as “P”)
An object or value of the property of the subject (“O”)

For example, we can identify the structure of the following statement: “Mary (S) knows (P) Jane (O).” The property announces some information about the subject that is revealed by the object of this statement.

Tables allow many statements to be expressed in a compact space. The S-P-O technique can be applied to tabular information. Sometimes the information in tables is more complex than the basic S-P-O structure, and some tables (such as a Sudoku puzzle) lack this structure entirely. Nonetheless, the S-P-O structure can help identify the underlying structure of content in many common types of tables.

Let’s consider the content structure commonly found in tables. No standard taxonomy of table formats seems to exist, so I will offer my own terms to refer to these structures. Five common kinds of tables are:

Mutual comparison tables
Dimensional tables
Alternative list tables
Spectrum tables
Matrix tables

These five kinds of tables are not the only kinds possible, and some authors or data experts will object that these examples limit options for arranging information. Yet it is important to simplify and standardize how information is displayed in tables when publishing at enterprise scale. Knowing widely used and effective patterns for tables provides a basis to develop standardized templates to display tabular information.

Mutual Comparison Tables

The mutual comparison is a very common table type. It lists a number of items (subjects) that all belong to a common category, and then indicates different properties and values for these items. Let’s look at some examples, starting with a table of the most active stocks. Each company is a subject, and different properties of the company are identified in the column headings. The values (or objects) of these properties appear within each row. All the companies belong to a common category: most active stocks. It is common for the table heading of a mutual comparison table to refer in some way to the kind of subject listed, and one or more of the key properties associated with that subject. The table makes two kinds of statements. First, it states that the most active stocks include certain companies such as BAC and AMD. Second, it states that a company, say BAC, has a price with a dollar value. Each company in the table is the subject of multiple statements, one for each column.

Let’s consider another mutual comparison table from the website FiveThirtyEight, showing sports team rankings. While by convention the subjects in the table typically appear in the left column, in this table the subjects (the teams) appear in the third column. The properties of the teams appear on either side of the team’s name. The table heading only implicitly indicates what the different subjects in the table share in common: that they all belong to the NFL.

The archetypal content structure for a mutual comparison table is illustrated below. The arrows show the relationships between different elements in the table. The overarching subject, the category that the table discusses, is indicated by S’. The individual subjects have one or more properties, each property generally having a single value (but not necessarily so). What the value means depends on the property that describes it, which in turn depends on the subject the property refers to.
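The two kinds of statements a mutual comparison table makes can be extracted mechanically once the subject column is known. This minimal Python sketch assumes each row is a list of cell strings and that one column holds the subjects; the subject_index parameter handles tables like the FiveThirtyEight rankings, where subjects are not in the left column. The predicate “belongs to” and all cell values are illustrative placeholders, not real data.

```python
from typing import List, NamedTuple, Sequence


class Triple(NamedTuple):
    subject: str
    prop: str
    obj: str


def mutual_comparison_triples(category: str, header: Sequence[str],
                              rows: Sequence[Sequence[str]],
                              subject_index: int = 0) -> List[Triple]:
    """Decompose a mutual comparison table into S-P-O statements.

    category is the overarching subject S'. subject_index is the column
    holding the individual subjects (0 for the usual left column).
    """
    triples = []
    for row in rows:
        subject = row[subject_index]
        # First kind of statement: the subject belongs to the category S'
        triples.append(Triple(subject, "belongs to", category))
        # Second kind: each column heading is a property, each cell its value
        for i, (prop, value) in enumerate(zip(header, row)):
            if i != subject_index:
                triples.append(Triple(subject, prop, value))
    return triples


# Placeholder cell values; these are not real market data.
header = ["Company", "Price", "Volume"]
rows = [["BAC", "price-1", "volume-1"], ["AMD", "price-2", "volume-2"]]
for t in mutual_comparison_triples("most active stocks", header, rows):
    print(t)
```

Because the category is passed in explicitly, a table whose heading only implies it (as with the NFL rankings) still yields complete membership statements.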

Dimensional Tables

Dimensional tables are similar to mutual comparison tables, except that only one subject is discussed. They reveal dimensions of a single topic. Let’s look at a table from Wikipedia about winners of the Booker Prize for literature. The subject of the table is the Booker Prize.

The table has rows of information relating to the winner for each year. Although the table allows sorting by any column, the key column that defines how the content in a row is related is the year. We can say that the year property is the primary dimension, while other properties such as country (of the author) and genre are secondary dimensions. Such tables identify some primary dimension that varies as a way to structure the content. Typically the most important property of the subject will appear in the left column of the table. In many cases the primary property is one that is required to exist in order for other properties to also be present. The best way to think about dimensional tables is to consider that they involve a chain of statements. First, we announce that a Booker Prize was awarded in 2016. If no Booker Prize was given in 2016 (and some awards choose to skip years if they don’t like the candidates), then none of the other properties relating to a 2016 winner would make sense. Two kinds of statements are supported by the structure of the table:

A Booker Prize was awarded in 2016.
It was given to Paul Beatty.

Dimensional tables can be applied to qualitative as well as quantitative properties and values. Here’s an example that’s more conceptual. The subject of the table is strategic communication. Again we see that the columns are not of equal importance. The primary dimension relates to communication function, while other dimensions are structurally dependent on that. The structure of the table indicates that its author considered communication function as the key to understanding communication approaches, rather than, say, channel.

The structure of a dimensional table is shown in the diagram below. The secondary properties and their values depend on the primary property.
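A minimal sketch of that dependency, assuming each row is a dictionary keyed by column heading: the primary dimension is asserted first, and secondary properties are attached to that specific occurrence. The Booker row uses only the facts quoted above; the second row is a hypothetical malformed entry, included to show that without a primary value there is nothing for secondary values to describe.

```python
def dimensional_statements(subject, primary, rows):
    """Turn a dimensional table into a chain of statements.

    Each row is a dict keyed by column heading. The primary dimension is
    asserted first; the row's secondary properties are then attached to that
    particular occurrence. A row with no primary value yields nothing,
    because its secondary values would have nothing to describe.
    """
    statements = []
    for row in rows:
        key = row.get(primary)
        if key is None:
            continue
        # Statement 1: the subject occurred for this primary value
        statements.append((subject, primary, key))
        # Statements 2..n: secondary properties describe that occurrence
        occurrence = f"{subject} ({key})"
        for prop, value in row.items():
            if prop != primary:
                statements.append((occurrence, prop, value))
    return statements


rows = [
    {"year": 2016, "winner": "Paul Beatty"},
    {"winner": "orphan value"},  # hypothetical row with no year: skipped
]
for s in dimensional_statements("Booker Prize", "year", rows):
    print(s)
# ('Booker Prize', 'year', 2016)
# ('Booker Prize (2016)', 'winner', 'Paul Beatty')
```

The two printed tuples correspond to the two statements listed earlier: a Booker Prize was awarded in 2016, and it was given to Paul Beatty.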

Alternative List Tables

An alternative list table is similar to a dimensional table in that the table refers to one subject only. It differs in that each property addressed by the table is independent of the others. This means that the rows presenting values aren’t related. The following example of an alternative list table relates to spending decisions. Although the table has no title, the subject of the table is spending. The subject has two properties, which are alternatives. The table answers which kinds of purchases are major, and which are minor. Examples of each are listed under the columns. This is a common kind of table to display content, and is often used to compare products.

The following diagram shows the structure of the content in an alternative list table.
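Because the alternatives are independent, an alternative list table can be modeled as a mapping from each alternative to its own list, rather than as rows. This sketch assumes that representation; the purchase examples are hypothetical, since the original table’s entries aren’t reproduced here. Note that no statement links items that happen to share a row.

```python
def alternative_list_statements(subject, columns):
    """columns maps each alternative to the examples listed under it.

    The alternatives are independent: lists may differ in length, and an
    example's row position carries no meaning, so none is recorded.
    """
    statements = []
    for alternative, examples in columns.items():
        for example in examples:
            statements.append((example, "is a kind of", f"{alternative} {subject}"))
    return statements


# Hypothetical examples, for illustration only.
columns = {"major": ["car", "home renovation"], "minor": ["coffee"]}
for s in alternative_list_statements("purchase", columns):
    print(s)
```

Storing each alternative as its own list makes uneven lengths natural and avoids implying pairings that the author never intended.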

What can make alternative list tables difficult to interpret is that sometimes the creator of the table leaves out information, or implies relationships that may or may not be intended. Let’s look at another example, from the World Bank. The table discusses two alternative ways of thinking, automatic and deliberative. Examples are shown for each alternative. Are the examples just a random list, or do they suggest some additional dimensions?

Many tables using this format choose to leave out a column on the left that would explain what each value represents. In this table, a line is drawn across the values implying that each pair is related. For example, “narrow frame” and “wide frame,” or “effortful” and “effortless,” both seem related pairs. But what about “associative” and “based on reasoning”? Are those opposite or similar, and what exactly do these values refer to? What’s the difference between associative and intuitive, which are both properties of automatic thinking? When tables drop labels, the reader can’t understand the structure of the content presented in the table without consulting accompanying text. This prevents the table from being reusable in different contexts.

Spectrum Tables

A spectrum table is a special type that mixes elements from the alternative list (showing alternative kinds) and the dimensional (addressing distinct properties of a subject) models. When structured properly, it is a sophisticated way to present content.

A spectrum table answers the question: how does a value for a property vary according to some other factor? One set of properties is treated as dependent variables (values change depending on the property considered), while the other set is treated as independent variables. A concrete example will illustrate how the structure works. We can see that the number of service hours provided increases with the price of the monthly package.

In the left column are all the features that could be part of a service offered for sale. The other columns represent different price options, and the table reveals how much of each feature is offered according to price.

The following diagram illustrates the structure of a spectrum table. The column headings will be of the same data type, which allows them to be compared directly.
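The requirement that column headings share one data type, and that every feature has a value under every column, can be enforced when the table is built. This sketch assumes numeric monthly prices as the column headings and uses None to mark a feature that a tier doesn’t include, so absence is stated rather than hidden. All prices and quantities are hypothetical.

```python
def build_spectrum(tiers, features):
    """Validate and index a spectrum table.

    tiers are the column headings, all of one data type (monthly price here),
    so columns can be compared directly. Each feature must supply exactly one
    value per tier; None marks "not offered" explicitly instead of omitting it.
    """
    if len(set(map(type, tiers))) > 1:
        raise ValueError("column headings must share one data type")
    table = {}
    for name, values in features.items():
        if len(values) != len(tiers):
            raise ValueError(f"{name!r} needs {len(tiers)} values, got {len(values)}")
        table[name] = dict(zip(tiers, values))
    return table


def varies_with_price(table, feature):
    """How does one feature's value change as the tier price rises?"""
    return sorted(table[feature].items())


# Hypothetical prices and quantities illustrating the pattern in the text.
spectrum = build_spectrum(
    tiers=[10, 25, 50],
    features={"Service hours": [2, 5, 12], "Priority support": [None, None, "yes"]},
)
print(varies_with_price(spectrum, "Service hours"))  # [(10, 2), (25, 5), (50, 12)]
print(spectrum["Priority support"])  # {10: None, 25: None, 50: 'yes'}
```

Because every row must be filled for every column, a table built this way can be generated from a feature list, which is exactly what label-free marketing tables prevent.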

Most of us are familiar with the pattern shown in the example. But not all product comparison tables are spectrum tables. Many are alternative list tables. Marketers often remove any labels explaining what the values refer to, which allows them to make apples-to-oranges comparisons: every option sounds positive, even though the values may not address the same properties. It’s a disingenuous way to hide information when some options lack certain features. Such tables can never be generated systematically from the features offered, since no rules govern what is shown. They are brittle one-offs, an approach that doesn’t scale well.

Matrix Tables

Many tables appear to be matrices, because they contain both row and column headings. A genuine matrix table has unique structural characteristics. Its defining characteristic is that both row and column headings are of equal importance. Each value has two properties. Matrix tables are not common in web content, but can be useful for certain situations, such as classifying concepts. For example, e-learning content might use matrix tables.

First, let’s look at a quasi-matrix. This example from the Sloan Management Review seems at first glance like a matrix, but actually isn’t. The table has a simple structure, showing percentage values, with both column and row headings of seemingly equal importance. On closer inspection, however, the table is actually a mutual comparison table. Although not explicitly noted, the table emphasizes the percentages in the rows representing industry sectors, rather than the columns representing technologies. The percentages in each row add up to 100%. The rows of the table show the relative interest of each industry in different technologies, but the columns don’t show what the relative interest of a technology is among industries. If the survey’s answers were translated into investment forecasts, the table might suggest what percentage of new investment by the retail industry will go into analytics, but it won’t suggest what percentage of spending on analytics will be made by the retail industry. Hence the table is not truly bidirectional in structure.
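The row-versus-column test described above is easy to automate for percentage tables. This sketch, using hypothetical shares, reports which direction sums to 100, with a small tolerance for rounding in published figures. A result of “rows” indicates the percentages should be read across the rows only, as in the quasi-matrix example.

```python
def normalized_along(table, tol=0.5):
    """Report which direction of a percentage table sums to 100.

    Returns 'rows', 'columns', 'both', or 'neither'. tol allows for
    rounding in published figures.
    """
    def is_100(values):
        return abs(sum(values) - 100) <= tol

    rows_ok = all(is_100(r) for r in table)
    cols_ok = all(is_100(c) for c in zip(*table))
    if rows_ok and cols_ok:
        return "both"
    if rows_ok:
        return "rows"
    if cols_ok:
        return "columns"
    return "neither"


# Hypothetical survey shares: rows are industries, columns are technologies.
shares = [
    [50, 30, 20],  # industry A
    [40, 40, 20],  # industry B
]
print(normalized_along(shares))  # rows
```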

A genuine matrix can start with a value, and answer two separate questions. Consider the following example, from the World Bank. It answers what kinds of relationships different social networks involve. The relationships involve two independent facets, neither of which is more important than the other. We can look at a friendship network such as Facebook and learn both the type and direction of the ties it involves (assuming of course we understand the terminology used in the table).

The structure of content in a matrix table is presented below. Matrix tables describe two facets of a subject, and explore alternative categories for each facet. Each value will have two properties, which together describe the subject. In terms of the example, we can see that a social network that has both explicit and directed ties is called a friendship network.
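The bidirectional character of a genuine matrix shows up in code as two lookups over the same cells: from a pair of facet values to a category, and from a category back to both facets. This sketch includes only the one cell described in the text; the facet labels “tie type” and “direction” paraphrase the example, and a full table would list more categories.

```python
# Keys are (tie type, tie direction). Only the cell described in the text is
# filled in; a full matrix would list the other facet combinations too.
MATRIX = {("explicit", "directed"): "friendship network"}


def network_for(tie_type, direction):
    """Start from the two facets and find the kind of network."""
    return MATRIX.get((tie_type, direction))


def facets_of(network):
    """Start from a value and answer both questions about it at once."""
    for (tie_type, direction), name in MATRIX.items():
        if name == network:
            return {"tie type": tie_type, "direction": direction}
    return None


print(network_for("explicit", "directed"))  # friendship network
print(facets_of("friendship network"))      # {'tie type': 'explicit', 'direction': 'directed'}
```

Neither facet is privileged in the key, which is what distinguishes this from the quasi-matrix above.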

Standardizing Tables

Tables presenting content should be planned according to long-term audience and business needs. Unfortunately, ad-hoc tables are far too common. Problems arise when:

Audience-facing tables are not designed around audience needs, but are simply generated on demand from queries.
Tables are hand-crafted without thought to their wider use.

For database experts who think about tables in terms of rows and columns, an endless variety of tables can be generated. Much data-centric content suffers from looking like a raw database. Having many variants can be useful for individuals needing highly specific reports, but an anything-is-possible approach lacks editorial oversight. Audiences need tables that explain and compare key variables influencing their decisions. They want confidence that they understand the table and are not missing any key information.

The other kind of unplanned table is created by authors who design tables on their own to fit a specific need. The problem is especially common in content marketing. Idiosyncratic tables routinely drop explanatory headings, and sometimes add extraneous rows or columns that aren’t directly related to the subject of the table. Authors focus on how to emphasize the wording of content that appears in tables to attract attention to specific items of information. They design online tables as if they were PowerPoint slides, centered on specific messages, instead of representing the content as a whole. Their tables don’t consider how the content may need to be revised and reused in the future. Audiences can find glib tables confusing and untrustworthy.

Content structure is discovered by deconstructing examples. For those involved with content design and content engineering, the first step is to take an inventory of tables used in content published. Look for patterns in tabular content, and standardize these patterns in templates. Remove unnecessary variation, and designs that can’t be used widely. Standardized tables allow content to become more flexible. Templates based on standard table structures will make initial publication and subsequent reuse and updates easier.
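One way to standardize a pattern found in the inventory is a rendering template that makes the structure explicit in the markup. This sketch renders a mutual comparison table using standard HTML table elements (caption, and th with scope="col" and scope="row") and rejects rows that don’t fit the template. The function name and sample data are hypothetical.

```python
from html import escape


def render_mutual_comparison(caption, header, rows):
    """Render one standardized template for mutual comparison tables.

    The caption names the category, column headings are scoped to their
    columns, and each row's subject cell is marked as a row header. Rows
    that don't match the header width are rejected rather than rendered
    as one-off variants.
    """
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"row {row!r} has {len(row)} cells; expected {len(header)}")
    out = ["<table>", f"  <caption>{escape(caption)}</caption>", "  <thead>", "    <tr>"]
    out += [f'      <th scope="col">{escape(h)}</th>' for h in header]
    out += ["    </tr>", "  </thead>", "  <tbody>"]
    for subject, *values in rows:
        out.append("    <tr>")
        out.append(f'      <th scope="row">{escape(str(subject))}</th>')
        out += [f"      <td>{escape(str(v))}</td>" for v in values]
        out.append("    </tr>")
    out += ["  </tbody>", "</table>"]
    return "\n".join(out)


print(render_mutual_comparison(
    "Most active stocks", ["Company", "Price"], [["BAC", "price-1"], ["AMD", "price-2"]]
))
```

Because the subject and property roles are encoded in the markup, the same table can be read by assistive technology and parsed by machines without guessing at its structure.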

— Michael Andrews

Tags tables