Your Content Needs a Metadata Strategy

What’s your metadata strategy? So few web publishers have an articulated metadata strategy that a skeptic may think I’ve made up the concept and coined a new buzzword. Yet almost a decade ago, Kristina Halvorson explicitly cited metadata strategy as one of “a number of content-related disciplines that deserve their own definition” in her seminal A List Apart article, “The Discipline of Content Strategy”. She also cites metadata strategy in her widely read book on content strategy. In the years since, the discipline of content strategy still hasn’t given metadata strategy the attention it deserves.

To have a sustained impact, a content strategy needs a metadata strategy to back it up. Without a metadata strategy, content strategy can get stuck in firefighting mode. Many organizations keep making the same mistakes with their content because they ask overwhelmed staff to track too many variables. Metadata can liberate staff from checklists by allowing IT systems to handle low-level details that are important but exhausting to deal with. Staff may come and go, and their enthusiasm can wax and wane. But metadata, like the Energizer bunny, keeps performing: it can keep the larger strategy on track. Metadata can deliver consistency to content operations, and can enhance how content is delivered to audiences.

A metadata strategy is a plan for how a publisher can leverage metadata to accomplish specific content goals.  It articulates what metadata publishers need for their content, how they will create that metadata, and most importantly, how both the publisher and audiences can utilize the metadata.  When metadata is an afterthought, publishers end up with content strategies that can’t be implemented, or are implemented poorly.

The Vaporware Problem: When you can’t implement your Plan

A content strategy may include many big ideas, but translating those ideas into practice can be the hardest part.  A strategy will be difficult to execute when its documentation and details are too much for operational teams to absorb and follow.  The group designing the content strategy may have done a thorough analysis of what’s needed.  They identified goals and metrics, modeled how content needs to fit together, and considered workflows and the editorial lifecycle.  But large content teams, especially when geographically distributed, can face difficulties implementing the strategy.  Documentation, emails and committees are unreliable ways to coordinate content on a large scale.  Instead, key decisions should be embedded into the tools the team uses wherever possible.  When their tools have encoded relevant decisions, teams can focus on accomplishing their goals, instead of following rules and checklists.

In the software industry, vaporware is a product concept that’s been announced, but not built. Plans that can’t be implemented are vaporware. Content strategies are sometimes conceived with limited consideration of how to implement them consistently.  When executing a content strategy, metadata is where the rubber hits the road.  It’s a key ingredient for turning plans into reality.  But first, publishers need to have the right metadata in place before they can use it to support their broader goals.

Effective large-scale content governance is impossible without effective metadata, especially administrative metadata. Without a metadata strategy, publishers tend to rely on what their existing content systems offer them, instead of asking first what they want from their systems. Your existing system may provide only some of the key metadata attributes you need to coordinate and manage your content. That metadata may be in a proprietary format, meaning it can’t be used by other systems. The default settings offered by your vendors’ products are unlikely to provide the coordination and flexibility required.

Consider all the important information about your content that needs to be supported with metadata.  You need to know details about the history of the content (when it was created, last revised, reused from elsewhere, or scheduled for removal), where the content came from (author, approvers, licensing rights for photos, or location information for video recordings), and goals for the content (intended audiences, themes, or channels).  Those are just some of the metadata attributes content systems can use to manage routine reporting, tracking, and routing tasks, so web teams can focus on tasks of higher value.
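As a rough illustration, the attributes listed above could be captured in a simple record that a content system can query. This is a minimal sketch in Python: the field names, the sample records, and the due_for_removal helper are all hypothetical, not taken from any particular CMS.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ContentRecord:
    """Hypothetical administrative metadata for one content item."""
    title: str
    # History of the content
    created: date
    last_revised: date
    scheduled_removal: Optional[date] = None
    # Where the content came from
    author: str = ""
    approvers: list[str] = field(default_factory=list)
    photo_license: Optional[str] = None
    # Goals for the content
    audiences: list[str] = field(default_factory=list)
    channels: list[str] = field(default_factory=list)


def due_for_removal(records: list[ContentRecord], today: date) -> list[str]:
    """Return the titles whose scheduled removal date has arrived."""
    return [
        r.title
        for r in records
        if r.scheduled_removal is not None and r.scheduled_removal <= today
    ]


# Made-up sample data for illustration only.
records = [
    ContentRecord("Spring promotion", date(2017, 3, 1), date(2017, 3, 15),
                  scheduled_removal=date(2017, 6, 1), author="A. Writer"),
    ContentRecord("Returns policy", date(2016, 1, 10), date(2017, 2, 2),
                  author="B. Editor", approvers=["Legal"]),
]

print(due_for_removal(records, today=date(2017, 7, 1)))
```

With the dates recorded as metadata, a routine task such as finding expired content becomes a query the system runs, rather than a checklist item someone has to remember.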

If you have grander visions for your content, such as making your content “intelligent”, then having a metadata strategy becomes even more important. Countless vendors are hawking products that claim to add AI to content. Just remember: metadata is what makes content intelligent, ready for applications (user decisions), algorithms (machine decisions) and analytics (assessment). Don’t buy new products without first having your own metadata strategy in place. Otherwise you’ll likely be stuck with the vendor’s proprietary vision and roadmap, instead of your own.

Lack of Strategy creates Stovepipe Systems

A different problem arises when a publisher tries to do many things with its content, but does so in a piecemeal manner.  Perhaps a big bold vision for a content strategy, embodied in a PowerPoint deck, gets tossed over to the IT department.  Various IT members consider what systems are needed to support different functionality.  Unless there is a metadata strategy in place, each system is likely to operate according to its own rules:

  • Content structuring relies on proprietary templates
  • Content management relies on proprietary CMS data fields
  • SEO relies on meta tags
  • Recommendations rely on page views and tags
  • Analytics rely on page titles and URLs
  • Digital assets rely on proprietary tags
  • Internal search uses keywords and not metadata
  • Navigation uses a CMS-defined custom taxonomy or folder structure
  • Screen interaction relies on custom JSON
  • Backend data relies on a custom data model.

Sadly such uncoordinated labeling of content is quite common.

Without a metadata strategy, each area of functionality is considered as a separate system.  IT staff then focus on systems integration: trying to get different systems to talk to each other.  In reality, they have a collection of stovepipe systems, where metadata descriptions aren’t shared across systems.  That’s because various systems use proprietary or custom metadata, instead of using common, standards-based metadata.  Stovepipe systems lack a shared language that allows interoperability.  Attributes that are defined by your CMS or other vendor system are hostage to that system.

Proprietary metadata is far less valuable than standards-based metadata.  Proprietary metadata can’t be shared easily with other systems and is hard or impossible to migrate if you change systems.  Proprietary metadata is a sunk cost that’s expensive to maintain, rather than being an investment that will have value for years to come. Unlike standards-based metadata, proprietary metadata is brittle — new requirements can mess up an existing integration configuration.
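To make the contrast concrete, here is a small sketch of translating proprietary CMS fields into schema.org property names. The CMS field names (post_title, widget_zone and so on) are invented for illustration. The schema.org properties (headline, author, datePublished, dateModified) are real, though in practice author would normally point to a Person or Organization rather than a plain string.

```python
# Hypothetical proprietary CMS field names mapped to schema.org property names.
CMS_TO_SCHEMA_ORG = {
    "post_title": "headline",
    "post_author": "author",
    "publish_ts": "datePublished",
    "modified_ts": "dateModified",
}


def to_standard(cms_record: dict) -> tuple[dict, list[str]]:
    """Translate a CMS record; return (standard metadata, unmapped field names)."""
    standard = {"@context": "https://schema.org", "@type": "Article"}
    unmapped = []
    for field_name, value in cms_record.items():
        prop = CMS_TO_SCHEMA_ORG.get(field_name)
        if prop is None:
            # No standard equivalent in this mapping: the field stays CMS-specific.
            unmapped.append(field_name)
        else:
            standard[prop] = value
    return standard, unmapped


# Made-up sample record.
cms_record = {
    "post_title": "Example article",
    "post_author": "A. Writer",
    "modified_ts": "2017-05-01",
    "widget_zone": "sidebar",
}

standard, unmapped = to_standard(cms_record)
print(standard)
print("Fields with no standard equivalent:", unmapped)
```

The list of unmapped fields is a quick way to see how much of a system's metadata would be lost or need rework in a migration.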

Metadata standards are like an operating system for your content.  They allow content to be used, managed and tracked across different applications.  Metadata standards create an ecosystem for content.  Metadata strategy asks: What kind of ecosystem do you want, and how are you going to develop it, so that your content is ready for any task?

Who is doing Metadata Strategy right?

Let’s look at how two well-known organizations are doing metadata strategy. One example is current and newsworthy, while the other has a long backstory.

eBay

eBay decided that the proprietary metadata they used in their content wasn’t working, as it was preventing them from leveraging metadata to deliver better experiences for their customers. They embarked on a major program called the “Structured Data Initiative”, migrating their content to metadata based on schema.org, the community-maintained web metadata standard. Wall Street analysts have been following eBay’s metadata strategy closely over the past year, as it is expected to improve the profitability of the ecommerce giant. The adoption of metadata standards has allowed for a “more personal and discovery-based buying experience with highly tailored choices and unique selection”, according to eBay. eBay is leveraging the metadata to work with new AI technologies to deliver a personalized homepage to each of its customers. It is also leveraging the metadata in its conversational commerce product, the eBay ShopBot, which connects with Facebook Messenger. eBay’s experience shows that a company shouldn’t try to adopt AI without first having a metadata strategy.

eBay’s strategy for structured data (metadata). Screenshot via eBay

Significantly, eBay’s metadata strategy adopts the schema.org standard for their internal content management, in addition to using it for search engine consumers such as Google and Bing. Plenty of publishers use schema.org for search engine purposes, but few have taken the next step, as eBay has, of using it as the basis of their content operations. eBay is also well positioned to take advantage of any new third-party services that can consume their metadata.

Australian Government

From the earliest days of online content, the Australian government has been concerned with how metadata can improve online content availability. The Australian government isn’t a single publisher, but comprises a federation of many government websites run by different government organizations. The governance challenges are enormous. Fortunately, metadata standards can help coordinate diverse activity. The AGLS metadata standard has been in use for nearly 20 years to classify services provided by different organizations within the Australian government.

The AGLS metadata strategy is unique in a couple of ways.  First, it adopts an existing standard and builds upon it.  The government identified areas where existing standards didn’t offer attributes that were needed.  The government adopted the widely used Dublin Core metadata standard, but added some additional elements that were specific to their needs (for example, indicating the “jurisdiction” that the content relates to).  Starting from an existing standard, they extended it and got the W3C to recognize their extension.
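The "adopt and extend" approach can be sketched as page metadata that mixes terms from two namespaces: Dublin Core terms for the common attributes, and AGLS terms for the additions. The sketch below renders such metadata as HTML meta tags. The prefix.term naming and the specific element names are assumptions made for illustration and should be checked against the AGLS specification. The page values are invented.

```python
from html import escape

# Element names use a "PREFIX.term" convention. The exact names are assumptions
# for illustration only; consult the AGLS specification before relying on them.
page_metadata = [
    ("DCTERMS.title", "Renew a driver licence"),   # Dublin Core term
    ("DCTERMS.creator", "Example Department"),     # Dublin Core term
    ("AGLSTERMS.jurisdiction", "Victoria"),        # AGLS extension term
]


def render_meta_tags(pairs: list[tuple[str, str]]) -> str:
    """Render (name, content) pairs as one HTML meta tag per line."""
    return "\n".join(
        f'<meta name="{escape(name)}" content="{escape(value)}">'
        for name, value in pairs
    )


print(render_meta_tags(page_metadata))
```

Because the extension terms sit alongside the Dublin Core terms instead of replacing them, a system that understands only Dublin Core can still read the shared attributes.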

Second, the AGLS strategy addresses implementation at different levels in different ways. The metadata standard allows different publishers to describe their content consistently. It ensures all published content is interoperable. Individual publishers, such as the state government of Victoria, have their own government website principles and requirements, but these mandate the use of the AGLS metadata standard. The common standard has also promoted the availability of tools to implement the standard. For example, Drupal, which is widely used for government websites in Australia, has a plugin that provides support for adding the metadata to content. Currently, over 700 sites use the plugin. But significantly, because AGLS is an open standard, it can work with any CMS, not just Drupal. I’ve also seen a plugin for Joomla.

Australia’s example shows how content metadata isn’t an afterthought, but is a core part of content publishing.  A well-considered metadata strategy can provide benefits for many years.  Given its long history, AGLS is sure to continue to evolve to address new requirements.

Strategy focuses on the Value Metadata can offer

Occasionally, I encounter someone who warns of the “dangers” of “too much” metadata.  When I try to uncover the source of the perceived concern, I learn that the person thinks about metadata as a labor-intensive activity. They imagine they need to hand-create the metadata serially.  They think that metadata exists so they can hunt and search for specific documents. This sort of thinking is dated but still quite common.  It reflects how librarians and database administrators approached metadata in the past, as a tedious form of record keeping.  The purpose of metadata has evolved far beyond record keeping.  Metadata no longer is primarily about “findability,” powered by clicking labels and typing within form fields. It is now more about “discovery” — revealing relevant information through automation.  Leveraging metadata depends on understanding the range of uses for it.

When someone complains about too much metadata, it also signals to me that a metadata strategy is missing.  In many organizations, metadata is relegated to being an electronic checklist, instead of positioned as a valuable tool.   When that’s the case, metadata can seem overwhelming.  Organizations can have too much metadata when:

  • Too much of their metadata is incompatible, because different systems define content in different ways
  • Too much metadata is used for a single purpose, instead of serving multiple purposes.

Siloed thinking about metadata results in stovepipe systems. New metadata fields are created to address narrow needs, such as tracking or locating items for specific purposes. Fields proliferate across various systems. And everyone is confused about how anything relates to anything else.

Strategic thinking about metadata considers how metadata can serve all the needs of the publisher, not just the needs of an individual team member or role. When teams work together to develop requirements, they can discuss what metadata is useful for different purposes. They can identify how a single metadata item can be used in different contexts. If the metadata describes when an item was last updated, the team might consider how that metadata might be used in different contexts. How might it be used by content creators, by the analytics team, by the UX design team, and by the product manager?
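Here is one way the "last updated" example might play out: a single date value feeding three different uses. The function names, the one-year review window, and the 90-day bucket are hypothetical choices made for this sketch.

```python
from datetime import date


def days_since(last_updated: date, today: date) -> int:
    """Shared helper: age of the content in days."""
    return (today - last_updated).days


def needs_review(last_updated: date, today: date, max_age_days: int = 365) -> bool:
    """Content creators: flag items not revised within the review window."""
    return days_since(last_updated, today) > max_age_days


def freshness_label(last_updated: date, today: date) -> str:
    """UX design: a label that could be shown to audiences."""
    age = days_since(last_updated, today)
    if age == 0:
        return "Updated today"
    if age < 30:
        return f"Updated {age} day{'s' if age != 1 else ''} ago"
    return f"Updated {last_updated:%B %Y}"


def age_bucket(last_updated: date, today: date) -> str:
    """Analytics: group items by age so engagement can be compared by bucket."""
    return "under 90 days" if days_since(last_updated, today) < 90 else "90 days or more"


today = date(2017, 6, 1)
last_updated = date(2017, 5, 22)  # made-up sample value
print(needs_review(last_updated, today))
print(freshness_label(last_updated, today))
print(age_bucket(last_updated, today))
```

One metadata value, recorded once, supports an editorial workflow, an interface label, and a reporting dimension.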

Publishers should ask themselves how they can do more for their customers by using metadata.  They need to think about the productivity of their metadata: making specific metadata descriptions do more things that can add value to the content.  And they need a strategy to make that happen.

— Michael Andrews

A Visual Approach to Learning Schema.org Metadata

Everyone involved with publishing web content, whether a writer, designer, or developer, should understand how  metadata can describe content. Unfortunately, web metadata has a reputation, not entirely undeserved, for being a beast to understand. My book, Metadata Basics for Web Content, explains the core concepts of metadata. This post is for those ready to take the next step: to understand how a metadata standard relates to their specific content.

Visualizing Metadata

How can web teams make sense of voluminous and complex metadata documentation?  Documentation about web metadata is generally written from a developer perspective, and can be hard for non-techies to comprehend. When relying on detailed documentation, it can be difficult for the entire web team to have a shared understanding of what metadata is available.  Without such a shared understanding, teams can’t have a meaningful discussion of what metadata to use in their content, and how to take advantage of it to support their content goals.

The good news is that metadata can be visualized.  I want to show how anyone can do this, with specific reference to schema.org, the most important web metadata standard today. The technique can be useful not only for content and design team members who lack a technical background, but also for developers.

Everyone who works with a complex metadata standard such as schema.org faces common challenges:

  1. A large and growing volume of entities and properties to be aware of
  2. Cases where entities and properties sometimes have overlapping roles that may not be immediately apparent
  3. Terminology that can be misunderstood unless the context is comprehended correctly
  4. The prevalence of many horizontal linkages between entities and properties, making navigation through documentation a pogo-like experience.

First, team members need to understand what kinds of things associated with their content can be described by a metadata standard.  Things mentioned in content are called entities.  Entities have properties.  Properties describe values, or  they express the relationship of one entity to another.
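A short sketch may help show the difference between the two kinds of properties. It expresses a product as JSON-LD (one common syntax for schema.org metadata) using a Python dictionary. The product and company names are made up, and the describe helper is a hypothetical convenience for this example.

```python
import json

product = {
    "@context": "https://schema.org",
    "@type": "Product",              # the entity type
    "name": "Compact Tractor",       # property with a literal value
    "color": "green",                # property with a literal value
    "manufacturer": {                # property whose value is another entity
        "@type": "Organization",
        "name": "Example Tractor Co.",
    },
}


def describe(entity: dict) -> dict:
    """Label each property as pointing to a literal value or to another entity."""
    kinds = {}
    for prop, value in entity.items():
        if prop.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @type
        is_entity = isinstance(value, dict) and "@type" in value
        kinds[prop] = "entity" if is_entity else "literal"
    return kinds


print(json.dumps(product, indent=2))
print(describe(product))
```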

Entities are classified according to types, which range from general to specific. Entity types form a hierarchy that can be expressed as a tree. All entities derive from the parent entity, called Thing. Currently, schema.org has over 600 entity types. Dan Brickley, an engineer at Google who is instrumental in the development of schema.org, has helpfully developed an interactive visualization in D3 (a JavaScript library for data visualization), presented as a radial tree, which shows the distribution of entity types within schema.org. The tool is a helpful way to explore the scope of entities addressed, and the different levels of granularity available.

Screenshot of entity tree, available at http://bl.ocks.org/danbri/raw/1c121ea8bd2189cf411c/
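The same tree idea can be sketched in a few lines of code. The dictionary below is a small, hand-picked subset of schema.org types, simplified so that each type has a single parent (some schema.org types have more than one). The ancestry function is a hypothetical helper.

```python
# A small subset of the schema.org type hierarchy, stored as child -> parent.
# Simplified for illustration: each type here has exactly one parent.
PARENT = {
    "Thing": None,
    "Product": "Thing",
    "Organization": "Thing",
    "Person": "Thing",
    "Intangible": "Thing",
    "Offer": "Intangible",
    "Rating": "Intangible",
    "AggregateRating": "Rating",
}


def ancestry(entity_type: str) -> list[str]:
    """Walk from a type up to Thing, returning most specific to most general."""
    path = [entity_type]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


print(" > ".join(reversed(ancestry("AggregateRating"))))
```

Walking up the tree shows the levels of granularity: an AggregateRating is a kind of Rating, which is a kind of Intangible, which is a kind of Thing.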

D3 is a great visualization library, but it requires both knowledge and time to code.  For our  second kind of visualization, we’ll rely on a much simpler tool.

Graphs of Linked Data

Web metadata can connect or link different items of information together, forming a graph of knowledge.  Graphs are ideal to visualize.  By visualizing this structure, content teams can see how entities have properties that relate to other entities, or that have different kinds of values.  This kind of visualization is known as a concept map.

Let’s visualize a common topic for web content: product information. Many things can be said about a product: who it is from, what it is like, and how much it costs. I’ve created the graph below using an affordable and easy-to-use concept mapping app called Conceptorium (though other graphic tools can be used). Working from the schema.org documentation for products, I’ve identified some common properties and relationships for products. Entities (things described with metadata) are in green boxes, while literal values (data you might see about them) are in salmon-colored boxes. Properties (attributes or qualities of things) are represented by lines with arrows, with the name of the property next to the line.

Concept map of schema.org entities and properties related to products

The graph illustrates some key issues in schema.org that web teams need to understand:

  • The boundary between different entity types that address similar properties
  • The difference between different instances of the same entity type
  • The directional relationships of properties.

Entity Boundaries

Concept maps help us see the boundaries between related entity types.  A product, shown in the center of our graph, has various properties, such as a name, a color, and an average user rating (AggregateRating).  But when the product is offered for sale, properties associated with the conditions of sale need to be expressed through the Offer entity.  So in schema.org, we can see that products don’t have prices or warranties; offers have prices or warranties.  Schema.org allows publishers to express an offer without providing granular details about a product.  Publishers can note the name and product code (referred to as gtin14) in the offer together with the price, and not need to use the Product entity type at all.  The Offer and Product entity types both use the name and product code (gtin14) properties.   So when discussing a product, the team needs to decide if the content is mostly about the terms of sale (the Offer), or about the features of the product (the Product), or both.
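The boundary between the two entity types can be sketched as follows. The first dictionary is an Offer on its own, carrying the name and product code next to the price. The second is a Product whose terms of sale are nested under the offers property. All values, including the gtin14 code, are made-up samples, and price_of is a hypothetical helper.

```python
from typing import Optional

# An offer on its own: name and product code sit alongside the price.
standalone_offer = {
    "@context": "https://schema.org",
    "@type": "Offer",
    "name": "Compact Tractor",
    "gtin14": "00012345678905",
    "price": "18500.00",
    "priceCurrency": "USD",
}

# A product described by its features, with the terms of sale nested in an Offer.
product_with_offer = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Compact Tractor",
    "gtin14": "00012345678905",
    "color": "green",
    "offers": {
        "@type": "Offer",
        "price": "18500.00",
        "priceCurrency": "USD",
    },
}


def price_of(item: dict) -> Optional[str]:
    """Find the price: on the Offer itself, or on a Product's nested Offer."""
    if item.get("@type") == "Offer":
        return item.get("price")
    return item.get("offers", {}).get("price")


print(price_of(standalone_offer))
print(price_of(product_with_offer))
```

Note that the Product dictionary has no price of its own: the price is only reachable through the Offer.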

Instances and Entity Types

Concept maps help us distinguish different instances of entities, as well as cases where instances are performing different roles. From the graph, we can see that a product can be related to other products. This can be hard to grasp in the documentation, where an entity type is presented as both the subject and the object of various properties. Graphs can show how there can be different product instances that may have different values for the same properties (e.g., all products have a name, but each product has a different name). In our example, we can see that one product at the bottom right is a competitive product to the product in the center. We can compare the average rating of the competitor product with the average rating of the main product. We can also see another related product, which is an accessory for the main product. This relationship can help identify products to display as complements.

An entity type provides a list of properties available to describe something.  Web content may discuss numerous, related things that all belong to the same entity type.  In our example, we see several instances of the Organization entity type.  In one case, an organization owns a product (perhaps a tractor).  In another case, the Organization is a seller.  In a third case, the Organization is a manufacturer of the product. Organizations can have different roles relating to an entity.

Content teams need to identify in their metadata which Organizations are responsible for which role.  Is the seller the manufacturer of the product, or are two different Organizations involved?  Our example illustrates how a single Person can be both an owner and a seller of a Product.
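A brief sketch shows how metadata can answer the question of whether the seller and the manufacturer are the same party. It uses the JSON-LD @id keyword to give each organization an identifier. The example.com identifiers, the names, and the seller_is_manufacturer helper are all hypothetical.

```python
# Two distinct Organization instances, each with its own (made-up) identifier.
maker = {"@type": "Organization", "@id": "https://example.com/org/maker",
         "name": "Example Tractor Co."}
dealer = {"@type": "Organization", "@id": "https://example.com/org/dealer",
          "name": "Example Farm Supplies"}

product = {
    "@type": "Product",
    "name": "Compact Tractor",
    "manufacturer": maker,                          # role: manufacturer
    "offers": {"@type": "Offer", "seller": dealer}, # role: seller
}


def seller_is_manufacturer(item: dict) -> bool:
    """Compare identifiers to see whether one party fills both roles."""
    maker_id = item.get("manufacturer", {}).get("@id")
    seller_id = item.get("offers", {}).get("seller", {}).get("@id")
    return maker_id is not None and maker_id == seller_id


print(seller_is_manufacturer(product))

# The same check when the manufacturer sells directly.
direct_sale = dict(product, offers={"@type": "Offer", "seller": maker})
print(seller_is_manufacturer(direct_sale))
```

Comparing identifiers rather than names avoids confusing two instances of the same entity type that happen to be described similarly.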

What Properties Mean

Concept maps can help web teams see what properties really represent. Each line with an arrow has a label, which is the name of the property associated with an entity type. Properties have a direction, indicated by the arrow. The names of properties don’t always directly translate into an English verb, even when they at first appear to. For example, in English, Product > manufacturer > Organization doesn’t make much sense. The product doesn’t make the organization, but rather the organization manufactures the product. It’s important to pay attention to the direction of a property, and to which entity type is expected as its value, especially when these relationships seem inverted from how we normally think about them.

Many properties are adjectives or even nouns, and need helper verbs such as “has” to make sense.  If the property describes another entity, then that entity can involve many more properties to describe additional dimensions of that entity.  So we might say that “a Product has a manufacturer which is an Organization (having a name, address, etc.)”  That’s not very elegant in English, but the diagram keeps the focus on the nature of the relationships described.
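The "has ... which is" reading can be applied mechanically to any arrow in a concept map. In this sketch, each arrow is stored as a (subject type, property, expected type) triple. The triples use real schema.org properties, while the read_aloud and article helpers are hypothetical.

```python
# Each arrow in the concept map: (subject type, property, expected type).
TRIPLES = [
    ("Product", "manufacturer", "Organization"),
    ("Product", "brand", "Brand"),
    ("Offer", "seller", "Organization"),
]


def article(word: str) -> str:
    """Pick 'a' or 'an' using a simple first-letter rule."""
    return "an" if word[0].lower() in "aeiou" else "a"


def read_aloud(subject: str, prop: str, obj: str) -> str:
    """Read a triple in the direction of its arrow, adding the helper verb 'has'."""
    return (f"{article(subject).capitalize()} {subject} has {article(prop)} {prop}, "
            f"which is {article(obj)} {obj}.")


for triple in TRIPLES:
    print(read_aloud(*triple))
```

Reading every arrow from subject to object, in the same way each time, makes it harder to invert a relationship by accident.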

Broader Benefits of Concept Mapping for Content Strategy

So far, we’ve discussed how concept maps can help web teams understand what the metadata means, and how they need to organize their metadata descriptions.  Concept maps can also help web teams plan their content.  Teams can use maps to decide what content to present to audiences, and even what content to create that audiences may be interested in.

Content Planning

Jarno van Driel, a Dutch SEO expert, notes that many publishers treat schema.org as “an afterthought.”  Instead, Jarno argues, publishers should consult the properties available in schema.org to plan their content.  Schema.org is a collective project, where different contributors identify properties relating to entities they would like to mention that they feel would be of interest to audiences.  Schema.org can be thought of as a blueprint for information you can provide audiences about different things you publish.  While our example concept map for product properties is simplified to conserve space, a more complete map would show many more properties, some of which you might decide to address in your content.  For example, audiences might want to know about the material, the width, or the weight of the product — properties available in schema.org that publishers may not have considered including in their content.
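As a planning aid, a team could compare the properties its content already covers against a shortlist drawn from schema.org. The shortlist below is a hypothetical selection of real Product properties, and the product description is a made-up sample.

```python
# A hypothetical shortlist of schema.org Product properties a team might review.
CANDIDATE_PROPERTIES = ["name", "color", "material", "width", "weight"]

# What the published content currently describes (made-up sample).
current_product = {
    "@type": "Product",
    "name": "Compact Tractor",
    "color": "green",
}


def unused_properties(product: dict, candidates: list[str]) -> list[str]:
    """List candidate properties the content does not yet address."""
    return [prop for prop in candidates if prop not in product]


print(unused_properties(current_product, CANDIDATE_PROPERTIES))
```

The resulting list is a prompt for discussion, not a to-do list: the team still decides which of the missing properties audiences would care about.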

Content Design and Interaction Design

Concept maps can also reveal relationships between different levels of information that publishers can present.  Consider how this information is displayed on the screen.  Audiences may want to compare different values. They may want to know all the values for a specific property (such as all the colors available), or they want to compare the values for a property of two different instances (average rating of two different products).
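Both kinds of comparison can be sketched briefly: gathering all the values of one property, and comparing one property across two instances. The products and ratings below are invented sample data, and values_for and higher_rated are hypothetical helpers. The property names (color, aggregateRating, ratingValue, reviewCount) come from schema.org.

```python
# Made-up product instances, each with its own values for the same properties.
products = [
    {"@type": "Product", "name": "Compact Tractor", "color": "green",
     "aggregateRating": {"@type": "AggregateRating",
                         "ratingValue": 4.4, "reviewCount": 89}},
    {"@type": "Product", "name": "Rival Tractor", "color": "red",
     "aggregateRating": {"@type": "AggregateRating",
                         "ratingValue": 4.1, "reviewCount": 132}},
]


def values_for(items: list[dict], prop: str) -> list:
    """All distinct values of one property across instances, sorted."""
    return sorted({item[prop] for item in items if prop in item})


def higher_rated(a: dict, b: dict) -> str:
    """Compare the average rating of two instances; return the winner's name."""
    rating_a = a["aggregateRating"]["ratingValue"]
    rating_b = b["aggregateRating"]["ratingValue"]
    return a["name"] if rating_a >= rating_b else b["name"]


print(values_for(products, "color"))
print(higher_rated(products[0], products[1]))
```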

Concept maps can reveal qualifications about the content (e.g., an Offer may be qualified by an area served).  Values (shown in salmon) can be sorted and ranked.  Concept maps also help web teams decide on the right level of detail to present.  Do they want to show average ratings for a specific product, or a brand overall?  By consulting the map, they can consider what data is available, and what data would be most useful to audiences.

Concept map app shows columns of entities and values, which allow exploration of relationships

Conclusion

Creating a concept map requires effort, but is rewarding. It requires you to compare the specification of the standard with your representation of it, to check that relationships are known and understood correctly. It allows you to see some characteristics, such as properties used by more than one entity. It can help content teams see the bigger picture of what’s available in schema.org to describe their content, so that the team can collectively agree on metadata requirements relating to their web content. If you want to understand schema.org more completely, to know how it relates to the content you publish, creating a concept map is a good place to start.

— Michael Andrews