Categories
Content Engineering

Your Content Needs a Metadata Strategy

What’s your metadata strategy?  So few web publishers have an articulated metadata strategy that a skeptic may think I’ve made up the concept, and coined a new buzzword.  Yet almost a decade ago, Kristina Halvorson explicitly cited metadata strategy as one of “a number of content-related disciplines that deserve their own definition” in her seminal  A List Apart article, “The Discipline of Content Strategy”.   She also cites metadata strategy in her widely read book on content strategy.  It’s been nearly a decade since Kristina’s article, but the discipline of content strategy still hasn’t given metadata strategy the attention it deserves.

A content strategy, to have a sustained impact, needs a metadata strategy to back it up.  Without metadata strategy, content strategy can get stuck in a firefighting mode.  Many organizations keep making the same mistakes with their content, because they ask overwhelmed staff to track too many variables.  Metadata can liberate staff from checklists, by allowing IT systems to handle low level details that are important, but exhausting to deal with.  Staff may come and go, and their enthusiasm can wax and wane.  But metadata, like the Energizer bunny, keeps performing: it can keep the larger strategy on track. Metadata can deliver consistency to content operations, and can enhance how content is delivered to audiences.

A metadata strategy is a plan for how a publisher can leverage metadata to accomplish specific content goals.  It articulates what metadata publishers need for their content, how they will create that metadata, and most importantly, how both the publisher and audiences can utilize the metadata.  When metadata is an afterthought, publishers end up with content strategies that can’t be implemented, or are implemented poorly.

The Vaporware Problem: When you can’t implement your Plan

A content strategy may include many big ideas, but translating those ideas into practice can be the hardest part.  A strategy will be difficult to execute when its documentation and details are too much for operational teams to absorb and follow.  The group designing the content strategy may have done a thorough analysis of what’s needed.  They identified goals and metrics, modeled how content needs to fit together, and considered workflows and the editorial lifecycle.  But large content teams, especially when geographically distributed, can face difficulties implementing the strategy.  Documentation, emails and committees are unreliable ways to coordinate content on a large scale.  Instead, key decisions should be embedded into the tools the team uses wherever possible.  When their tools have encoded relevant decisions, teams can focus on accomplishing their goals, instead of following rules and checklists.

In the software industry, vaporware is a product concept that’s been announced, but not built. Plans that can’t be implemented are vaporware. Content strategies are sometimes conceived with limited consideration of how to implement them consistently.  When executing a content strategy, metadata is where the rubber hits the road.  It’s a key ingredient for turning plans into reality.  But first, publishers need to have the right metadata in place before they can use it to support their broader goals.

Effective large-scale content governance is impossible without effective metadata, especially administrative metadata.  Without a metadata strategy, publishers tend to rely on what their existing content systems offer them, instead of asking first what they want from their systems.  Your existing system may provide only some of the key metadata attributes you need to coordinate and manage your content. That metadata may be in a proprietary format, meaning it can’t be used by other systems. The default settings offered by your vendors’ products are likely not to provide the coordination and flexibility required.

Consider all the important information about your content that needs to be supported with metadata.  You need to know details about the history of the content (when it was created, last revised, reused from elsewhere, or scheduled for removal), where the content came from (author, approvers, licensing rights for photos, or location information for video recordings), and goals for the content (intended audiences, themes, or channels).  Those are just some of the metadata attributes content systems can use to manage routine reporting, tracking, and routing tasks, so web teams can focus on tasks of higher value.
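
To make that list concrete, here is a minimal sketch of how such attributes might be recorded so a system, rather than a person, can act on them. The `ContentRecord` class, its field names, and the `due_for_removal` helper are all hypothetical, invented for this illustration rather than taken from any metadata standard or CMS.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ContentRecord:
    """Hypothetical administrative metadata for one content item."""
    title: str
    author: str                                   # where the content came from
    created: date                                 # history of the content
    last_revised: date
    removal_date: Optional[date] = None           # scheduled removal, if any
    approvers: List[str] = field(default_factory=list)
    audiences: List[str] = field(default_factory=list)   # goals for the content
    channels: List[str] = field(default_factory=list)


def due_for_removal(records: List[ContentRecord], today: date) -> List[str]:
    """Return the titles whose scheduled removal date has arrived."""
    return [r.title for r in records
            if r.removal_date is not None and r.removal_date <= today]


records = [
    ContentRecord("Spring promotion", "A. Writer", date(2017, 3, 1),
                  date(2017, 3, 15), removal_date=date(2017, 6, 1),
                  audiences=["customers"], channels=["web", "email"]),
    ContentRecord("Returns policy", "B. Editor", date(2016, 1, 10),
                  date(2017, 2, 2), approvers=["Legal"]),
]

print(due_for_removal(records, today=date(2017, 7, 1)))
```

Once the removal date lives in metadata, the routine check becomes a query the system can run on a schedule, instead of an item on someone's checklist.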

If you have grander visions for your content, such as making your content “intelligent”, then having a metadata strategy becomes even more important.  Countless vendors are hawking products that claim to add AI to content.  Just remember: metadata is what makes content intelligent, meaning ready for applications (user decisions), algorithms (machine decisions) and analytics (assessment).  Don’t buy new products without first having your own metadata strategy in place.  Otherwise you’ll likely be stuck with the vendor’s proprietary vision and roadmap, instead of your own.

Lack of Strategy creates Stovepipe Systems

A different problem arises when a publisher tries to do many things with its content, but does so in a piecemeal manner.  Perhaps a big bold vision for a content strategy, embodied in a PowerPoint deck, gets tossed over to the IT department.  Various IT members consider what systems are needed to support different functionality.  Unless there is a metadata strategy in place, each system is likely to operate according to its own rules:

  • Content structuring relies on proprietary templates
  • Content management relies on proprietary CMS data fields
  • SEO relies on meta tags
  • Recommendations rely on page views and tags
  • Analytics rely on page titles and URLs
  • Digital assets rely on proprietary tags
  • Internal search uses keywords and not metadata
  • Navigation uses a CMS-defined custom taxonomy or folder structure
  • Screen interaction relies on custom JSON
  • Backend data relies on a custom data model.

Sadly such uncoordinated labeling of content is quite common.

Without a metadata strategy, each area of functionality is considered as a separate system.  IT staff then focus on systems integration: trying to get different systems to talk to each other.  In reality, they have a collection of stovepipe systems, where metadata descriptions aren’t shared across systems.  That’s because various systems use proprietary or custom metadata, instead of using common, standards-based metadata.  Stovepipe systems lack a shared language that allows interoperability.  Attributes that are defined by your CMS or other vendor system are hostage to that system.

Proprietary metadata is far less valuable than standards-based metadata.  Proprietary metadata can’t be shared easily with other systems and is hard or impossible to migrate if you change systems.  Proprietary metadata is a sunk cost that’s expensive to maintain, rather than being an investment that will have value for years to come. Unlike standards-based metadata, proprietary metadata is brittle — new requirements can mess up an existing integration configuration.

Metadata standards are like an operating system for your content.  They allow content to be used, managed and tracked across different applications.  Metadata standards create an ecosystem for content.  Metadata strategy asks: What kind of ecosystem do you want, and how are you going to develop it, so that your content is ready for any task?

Who is doing Metadata Strategy right?

Let’s look at how two well-known organizations are doing metadata strategy.  One example is current and newsworthy, while the other has a long backstory.

eBay

eBay decided that the proprietary metadata they used in their content wasn’t working, as it was preventing them from leveraging metadata to deliver better experiences for their customers. They embarked on a major program called the “Structured Data Initiative”, migrating their content to metadata based on schema.org, the widely adopted web metadata vocabulary.  Wall Street analysts have been following eBay’s metadata strategy closely over the past year, as it is expected to improve the profitability of the ecommerce giant. The adoption of metadata standards has allowed for a “more personal and discovery-based buying experience with highly tailored choices and unique selection”, according to eBay.  eBay is leveraging the metadata to work with new AI technologies to deliver a personalized homepage to each of its customers.  It is also leveraging the metadata in its conversational commerce product, the eBay ShopBot, which connects with Facebook Messenger.  eBay’s experience shows that a company shouldn’t try to adopt AI without first having a metadata strategy.

eBay’s strategy for structured data (metadata). Screenshot via eBay

Significantly, eBay’s metadata strategy adopts the schema.org vocabulary for their internal content management, in addition to using it for search engine consumers such as Google and Bing.  Plenty of publishers use schema.org for search engine purposes, but few have taken the next step like eBay to use it as the basis of their content operations.  eBay is also well positioned to take advantage of any new third party services that can consume their metadata.

Australian Government

From the earliest days of online content, the Australian government has been concerned with how metadata can improve online content availability. The Australian government isn’t a single publisher, but comprises a federation of many government websites run by different government organizations.  The governance challenges are enormous.  Fortunately, metadata standards can help coordinate diverse activity.  The AGLS metadata standard has been in use nearly 20 years to classify services provided by different organizations within the Australian government.

The AGLS metadata strategy is unique in a couple of ways.  First, it adopts an existing standard and builds upon it.  The government identified areas where existing standards didn’t offer attributes that were needed.  The government adopted the widely used Dublin Core metadata standard, but added some additional elements that were specific to their needs (for example, indicating the “jurisdiction” that the content relates to).  Starting from an existing standard, they extended it and got the W3C to recognize their extension.

Second, the AGLS strategy addresses implementation at different levels in different ways.  The metadata standard allows different publishers to describe their content consistently.  It ensures all published content is interoperable.  Individual publishers, such as the state government of Victoria, have their own government website principles and requirements, but these mandate the use of the AGLS metadata standard.  The common standard has also promoted the availability of tools to implement the standard.  For example, Drupal, which is widely used for government websites in Australia, has a plugin that provides support for adding the metadata to content.  Currently, over 700 sites use the plugin.  But significantly, because AGLS is an open standard, it can work with any CMS, not just Drupal.  I’ve also seen a plugin for Joomla.

Australia’s example shows how content metadata isn’t an afterthought, but is a core part of content publishing.  A well-considered metadata strategy can provide benefits for many years.  Given its long history, AGLS is sure to continue to evolve to address new requirements.

Strategy focuses on the Value Metadata can offer

Occasionally, I encounter someone who warns of the “dangers” of “too much” metadata.  When I try to uncover the source of the perceived concern, I learn that the person thinks about metadata as a labor-intensive activity. They imagine they need to hand-create the metadata serially.  They think that metadata exists so they can hunt and search for specific documents. This sort of thinking is dated but still quite common.  It reflects how librarians and database administrators approached metadata in the past, as a tedious form of record keeping.  The purpose of metadata has evolved far beyond record keeping.  Metadata no longer is primarily about “findability,” powered by clicking labels and typing within form fields. It is now more about “discovery” — revealing relevant information through automation.  Leveraging metadata depends on understanding the range of uses for it.

When someone complains about too much metadata, it also signals to me that a metadata strategy is missing.  In many organizations, metadata is relegated to being an electronic checklist, instead of positioned as a valuable tool.   When that’s the case, metadata can seem overwhelming.  Organizations can have too much metadata when:

  • Too much of their metadata is incompatible, because different systems define content in different ways
  • Too much metadata is used for a single purpose, instead of serving multiple purposes.

Siloed thinking about metadata results in stovepipe systems. New metadata fields are created to address narrow needs, such as tracking or locating items for specific purposes.  Fields proliferate across various systems.  And everyone is confused about how anything relates to anything else.

Strategic thinking about metadata considers how metadata can serve all the needs of the publisher, not just the needs of an individual team member or role.  When teams work together to develop requirements, they can discuss what metadata is useful for different purposes. They can identify how a single metadata item can be used in different contexts.  If the metadata describes when an item was last updated, the team might consider how that metadata might be used in different contexts.  How might it be used by content creators, by the analytics team, by the UX design team, and by the product manager?
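
As a rough sketch of that last-updated example, the snippet below feeds one value to three different consumers. `dateModified` is a schema.org property name; the three functions, the 90-day review threshold, and the age buckets are assumptions made up for this illustration.

```python
from datetime import date

# One hypothetical content item carrying a single last-updated value.
item = {"headline": "Returns policy", "dateModified": date(2017, 2, 2)}


def needs_review(item: dict, today: date, max_age_days: int = 90) -> bool:
    """Content creators: flag items not revised within max_age_days."""
    return (today - item["dateModified"]).days > max_age_days


def freshness_label(item: dict, today: date) -> str:
    """UX design: a reader-facing label derived from the same value."""
    days = (today - item["dateModified"]).days
    if days == 0:
        return "Updated today"
    unit = "day" if days == 1 else "days"
    return f"Updated {days} {unit} ago"


def age_bucket(item: dict, today: date) -> str:
    """Analytics: group items by age so traffic can be compared by freshness."""
    days = (today - item["dateModified"]).days
    if days <= 30:
        return "0-30 days"
    if days <= 180:
        return "31-180 days"
    return "over 180 days"


today = date(2017, 7, 1)
print(needs_review(item, today), freshness_label(item, today), age_bucket(item, today))
```

Nobody had to create a second field: the same date drives an editorial alert, an on-screen label, and a reporting dimension.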

Publishers should ask themselves how they can do more for their customers by using metadata.  They need to think about the productivity of their metadata: making specific metadata descriptions do more things that can add value to the content.  And they need a strategy to make that happen.

— Michael Andrews

Categories
Content Integration

The Future of Content is Multimodal

We’re entering a new era of digital transformation: every product and service will become connected, coordinated, and measured. How can publishers prepare content that’s ready for anything?  The stock answer over the past decade has been to structure content.  This advice — structuring content — turns out to be inadequate.  Disruptive changes underway have overtaken current best practices for making content future-ready.  The future of content is no longer about different formats and channels.  The future of content is about different modes of interaction.  To address this emerging reality, content strategy needs a new set of best practices centered on the strategic use of metadata.  Metadata enables content to be multimodal.

What does the Future of Content look like?

For many years, content strategists have discussed how people need their content in terms of making it available in any format, at any time, through any channel that the user wanted.  For a while, the format-shifting, time-shifting, and channel-shifting seemed like it could be managed.  Thoughtful experts advocated ideas such as single-sourcing and COPE (create once, publish everywhere) which seemed to provide a solution to the proliferation of devices.  And it did, for a while.  But what these approaches didn’t anticipate was a new paradigm.  Single-sourcing and COPE assume all content will be delivered to a screen (or its physical facsimile, paper).  Single-sourcing and COPE didn’t anticipate screenless content.

Let’s imagine how people will use content in the very near future — perhaps two or three years from now.  I’ll use the classic example of managed content: a recipe.  Recipes are structured content, and provide opportunities to search according to different dimensions.  But nearly everyone still imagines recipes as content that people need to read.  That assumption no longer is valid.

Cake made by Meredith via Flickr (CC BY-SA 2.0)

In the future, you may want to bake a cake, but you might approach the task a bit differently.  Cake baking has always been a mixture of high-touch craft and low-touch processes.  Some aspects of cake baking require the human touch to deliver the best results, while other steps can be turned over to machines.

Your future kitchen is not much different, except that you have a speaker/screen device similar to the new Amazon Echo Show, and also a smart oven that’s connected to  the Internet of Things in the cloud.

You ask the voice assistant to find an appropriate cake recipe based on wishes you express.  The assistant provides a recipe, which offers a choice of how to prepare the cake.  You have a dialog with the voice assistant about your preferences.  You can either use a mixer, or hand mix the batter.  You prefer hand mixing, since this ensures you don’t over-beat the eggs, and keeps the cake light.  The recipe is read aloud, and the voice assistant asks if you’d like to view a video about how to hand-beat the batter.  You can ask clarifying questions.  As the interaction progresses, the recipe sends a message to the smart oven to tell it to preheat, and provides the appropriate temperature.  There is no need for the cook to worry about when to start preheating the oven and what temperature to set: the recipe can provide that information directly to the oven.  The cake batter is placed in the ready oven, and is cooked until the oven alerts you that the cake is ready.  The readiness is not simply a function of elapsed time, but is based on sensors detecting moisture and heat.  When the cake is baked, it’s time to return to giving it the human touch.  You get instructions from the voice/screen device on how to decorate it.  You can ask questions to get more ideas, and tips on how to execute the perfect finishing touches.  Voila.

Baking a cake provides a perfect example of what is known in human-computer interaction as a multimodal activity.  People seamlessly move between different digital and physical devices.  Some of these are connected to the cloud, and some things are ordinary physical objects.  The essential feature of multimodal interaction is that people aren’t tied to a specific screen, even if it is a highly mobile and portable one.  Content flows to where it is needed, when it is needed.

The Three Interfaces

Our cake baking example illustrates three different interfaces (modes) for exchanging content:

  1. The screen interface, which SHOWS content and relies on the EYES
  2. The conversational interface, which TELLS and LISTENS, and relies on the EARS and VOICE
  3. The machine interface, which processes INSTRUCTIONS and ALERTS, and relies on CODE.

The scenario presented is almost certain to materialize.  There are no technical or cost impediments. Both voice interaction and smart, cloud-connected appliances are moving into the mainstream. Every major player in the world of technology is racing to provide this future to consumers. Conversational UX is an emerging discipline, as is ambient computing that embeds human-machine interactions in the physical world. The only uncertainty is whether content will be ready to support these scenarios.

The Inadequacy of Screen-based Paradigms

These are not the only modes that could become important in the future: gestures, projection-based augmented reality (layering digital content over physical items), and sensor-based interactions could become more common.  Screen reading and viewing will no longer be the only way people use content.  And machines of all kinds will need access to the content as well.

Publishers, anchored in a screen-based paradigm, are unprepared for the tsunami ahead.  Modularizing content is not enough.  Publishers can’t simply write once, and publish everywhere.  Modular content isn’t format-free.  That’s because different modes require content in different ways.  Modes aren’t just another channel.  They are fundamentally different.

Simply creating chunks or modules of content doesn’t work when providing content to platforms that aren’t screens:

  • Pre-written chunks of content are not suited to conversational dialogs that are spontaneous and need to adapt.  Natural language processing technology is needed.
  • Written chunks of content aren’t suited to machine-to-machine communication, such as having a recipe tell an oven when to start.  Machines need more discrete information, and more explicit instructions.

Screen-based paradigms presume that chunks of content would be pushed to audiences.  In the screen world, clicking and tapping are annoyances, so the strategy has been to assemble the right content at delivery.  Structured content based on chunks or modules was never designed for rapid iterations of give and take.

Metadata Provides the Solution for Multimodal Content

Instead of chunks of content, platforms need metadata that explains the essence of the content.  The metadata allows each platform to understand what it needs to know, and utilize the essential information to interact with the user and other devices.  Machines listen to metadata in the content.  The metadata allows the voice interface and oven to communicate with the user.

These are early days for multimodal content, but the outlines of standards are already in evidence (see my book, Metadata Basics for Web Content, for a discussion of standards).  To return to our example, recipes published on the web are already well described with metadata.  An early web standard for metadata, microformats, provided a schema for recipes, and schema.org, today’s popular metadata standard, provides a robust set of properties to express recipes.  Already millions of online recipes are described with metadata standards, so the basic content is already in place.

The extra bits needed to allow machines to act on recipe metadata are now emerging.  Schema.org provides a basic set of actions that could be extended to accommodate IoT actions (such as Bake).  And schema.org is also establishing a HowTo entity that can specify more specific instructions relating to a recipe, that would allow appliances to act on the instructions.
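
To give a feel for how a machine might act on recipe metadata, here is a small sketch. `Recipe`, `HowToStep`, `name`, `cookTime`, `recipeInstructions` and `text` are schema.org terms. The `x-ovenTemperatureC` property and the shape of the oven message are made up for this example; they are not part of schema.org or of any appliance API.

```python
import json
import re

# A recipe described with schema.org terms, plus one made-up extension
# property ("x-ovenTemperatureC") used only for this sketch.
RECIPE_JSON_LD = """
{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Sponge cake",
  "cookTime": "PT35M",
  "recipeInstructions": [
    {"@type": "HowToStep", "text": "Hand-beat the eggs and sugar."},
    {"@type": "HowToStep", "text": "Preheat the oven.", "x-ovenTemperatureC": 180},
    {"@type": "HowToStep", "text": "Bake until done."}
  ]
}
"""


def minutes_from_duration(value: str) -> int:
    """Convert a simple ISO 8601 duration such as PT35M or PT1H5M to minutes."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value}")
    hours, minutes = (int(part or 0) for part in match.groups())
    return hours * 60 + minutes


def oven_command(recipe: dict) -> dict:
    """Build a hypothetical preheat message from the recipe metadata."""
    for step in recipe["recipeInstructions"]:
        if "x-ovenTemperatureC" in step:
            return {
                "action": "preheat",
                "temperature_c": step["x-ovenTemperatureC"],
                "expected_minutes": minutes_from_duration(recipe["cookTime"]),
            }
    raise LookupError("no oven step found")


recipe = json.loads(RECIPE_JSON_LD)
print(oven_command(recipe))
```

The prose instruction ("Preheat the oven.") stays available for the screen and the voice assistant, while the discrete values sitting beside it are what the oven would consume.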

Metadata doesn’t eliminate the need for written text or video content.  Metadata makes such content more easily discoverable.  One can ask Alexa, Siri, or Google to find a recipe for a dish, and have them read aloud or play the recipe.  But what’s needed is the ability to transform traditional stand-alone content such as articles or videos into content that’s connected and digitally native.  Metadata can liberate the content from being a one-way form of communication, and transform it into being a genuine interaction.  Content needs to accommodate dialog.  People and machines need to be able to talk back to the content, and the content needs to provide an answer that makes sense for the context.  When the oven says the cake is ready, the recipe needs to tell the cook what to do next.  Metadata allows that seamless interaction between oven, voice assistant and user to happen.

Future-ready content needs to be agnostic about how it will be used.  Metadata makes that future possible.  It’s time for content strategists to develop comprehensive metadata requirements for their content, and have a metadata strategy that can support their content strategy in the future. Digital transformation is coming to web content. Be prepared.

— Michael Andrews

Categories
Storytelling

Writers Should Care About Metadata

Writers and computers share a common trait: a fussiness about words.  Writers choose their words with care. Computers are selective about the words they notice as well.

Metadata helps computers understand writing.  Writers should care about metadata.  Metadata influences how their writing connects to audiences.  Metadata is an important editorial tool, though writers often don’t appreciate the value it offers.

I can hear some writers saying: “Hold on! — That’s not my job.  I don’t know anything about metadata  — I studied literature in college.”  Metadata sounds like the antithesis of creative flair.  And in some ways it is.  I want to assure my friends who are writers that I’m not trying to turn them into geeks.  Instead, I want to suggest that by having a little understanding of the geeky side of content, they can be more successful as writers.

Metadata, put very simply, is computer code that explains the meaning of content.  That computer code can seem forbidding.  But such code offers practical benefits to writers, and helps make content more interesting.

Writers should think about metadata as a form of communication, just as pantomime and poetry are.  Metadata expresses ideas that are conveyed to audiences.

Metadata is a special form of communication, however.  Unlike pantomime, metadata is purpose-built for the web.

Metadata as Describing

The most common type of metadata is the description.  Most web articles have META descriptions, which are short, pithy statements summarizing the article.  These statements often appear below the article title in Google search results.  How well they are written can influence whether someone clicks on the link to read the article.  Descriptions, by their nature, involve editorial decisions.

Another important description relates to photographs.  Writers need to tell people what’s in a photograph.  If the description is boring and vague, why would people want to view the photo?  Describing visuals is becoming more important as people switch off their screens, and have content read aloud to them.

Metadata plays a valuable editorial role.  It indicates what’s important about the content.  Let’s consider some areas where metadata can help writers.

Suppose you are a film critic.  You’ve quit a boring job writing training manuals about industrial equipment, and can finally use your literature degree in your work.  Even as a film critic, you can amplify your writing by using metadata.  Contrary to what you might expect, metadata can help writers tell stories.

Let’s imagine you want to review a new French film about the painter Paul Cézanne.  The first conundrum is deciding how to refer to the film.  Do you use the original French title, Cézanne et moi, or the translated English title, Cezanne and I?  Fortunately, by using metadata, you can skirt this decision, by including the titles in both languages.  Metadata can indicate the language of content.  Someone in France could use language metadata to locate English language reviews of this French film, and compare them with the French language reviews.  Do French and English speaking critics rate the film in the same way?
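
For writers curious what this looks like under the hood, here is a minimal sketch. `Review`, `Movie`, `itemReviewed`, `name`, `alternateName`, `inLanguage` and `author` are schema.org terms, and the `@value`/`@language` pairs follow the JSON-LD convention for language-tagged text. The two reviews, the critic names (simplified to plain strings), and the helper functions are invented for illustration.

```python
# The film carries both titles, each tagged with its language.
film = {
    "@type": "Movie",
    "name": {"@value": "Cézanne et moi", "@language": "fr"},
    "alternateName": {"@value": "Cezanne and I", "@language": "en"},
}

# Two invented reviews of the same film, written in different languages.
reviews = [
    {"@type": "Review", "inLanguage": "en", "author": "Critic A", "itemReviewed": film},
    {"@type": "Review", "inLanguage": "fr", "author": "Critic B", "itemReviewed": film},
]


def known_titles(movie: dict) -> set:
    """Collect every title the metadata records, whatever its language."""
    return {movie[key]["@value"] for key in ("name", "alternateName") if key in movie}


def find_reviews(title: str, language: str, reviews: list) -> list:
    """Return the authors of reviews of `title` written in `language`."""
    return [r["author"] for r in reviews
            if title in known_titles(r["itemReviewed"]) and r["inLanguage"] == language]


# A reader in France searches by the French title but wants English reviews.
print(find_reviews("Cézanne et moi", "en", reviews))
```

Because both titles are attached to the film, the search works whichever title the reader types, and the language tag on each review does the filtering.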

Another decision might be how to categorize the theme of the film.  As a writer, you want your review to appear with other reviews about similar themes.  Is the film about friendship, or is it a buddy movie?   These terms relate to a concept  in metadata called controlled vocabulary values.  The writer needs to decide whether the theme is more about the friendship between two men, or about friendship generally.  The decision will influence who sees the review, based on their interests and expectations.

Metadata can describe many aspects of a film, such as all the cast and crew involved.  Writers might wonder, how interesting is all this information?

From the audience perspective, some information will be interesting to almost everyone, while other information is of interest only to committed fans.  For some, detailed information seems like a list of dry facts.  But for those who enjoy a film, the credits at the end provide extra value that enriches their experience.

We can see the different editorial dimensions of metadata in the IMDb entry for the Cézanne film.  (IMDb, the Amazon owned database, uses metadata extensively.  But I’ll hide the code, and show only what’s presented on the screen.)

‘Storyline’ metadata from IMDb description of film Cezanne and I. (screenshot via IMDb)

First, we have the storyline, or plot summary.  Several sentences describe the film. To audiences, this is what’s important.  Does the film sound interesting or boring?  What is it really about (beyond friendship)?  Audiences need to know if the film is potentially interesting before they will care enough to read a critique of it.

Metadata and Prose

The metadata for the storyline is prose, in contrast to the list of names of cast members.  Some content strategists consider such prose as an unstructured “blob”: long passages, full of details, that aren’t broken out into a list or table.  But it is a mistake to view prose content as being beyond the reach of metadata.  Structuring content by breaking it into sections is a separate activity from adding metadata to content. Writers don’t need to “structure” their content into a list, table or other tightly defined unit to take advantage of metadata. Writers can, and should, add metadata to their prose.  By doing so, they will highlight some of the most interesting material.  Metadata is not a straitjacket that limits how writers express their perspectives.  Writers can write words, sentences and paragraphs as they please, and then add metadata to highlight important people, places and things mentioned in their text.

We can see in the storyline that the film concerns not only the painter Paul Cézanne (which we knew from the film title), but also the writer Emile Zola.  After reading the storyline, people may be interested in learning more about the film, or may want to learn more about the subject of the film.  Metadata can link this review to other writings related to the film in some way.  Perhaps readers want to read reviews about other films concerning Paul Cézanne, or concerning the same time period.  Metadata acts as a curator: linking to writings on related topics.

‘Details’ metadata from IMDb entry for Cezanne and I film (screenshot via IMDb)

Let’s turn to the more fact-oriented metadata.   To many writers, this material is dull.  Because it is presented in a list or in a table, and deals with minutiae such as film duration and release date, the content seems to offer little editorial interest.  Unless you are a big fan of someone in the film, or collect obscure facts to win pub quizzes, why would someone care about these details?

Stories from Metadata

For the writer, such detailed metadata presents an opportunity to tell more stories.  It may not be immediately obvious, but some of the details are unusual, or notable for some reason.  Since these details are described in a way computers can understand, the writer can easily compare these details with details for other films.  The writer can tell readers what’s significant about the film — in terms of casting, location, historical firsts, or contribution to overall performance for different kinds of film.

Metadata offers writers a lens to think about different dimensions of a topic.  By identifying various characteristics, metadata highlights connections between two or more of them.  This film is one of a number of friendship-themed movies that use the musical composition “Roses of Picardy” by Haydn Wood.  (Other films include A Passage to India, and Charlie Brown’s Halloween special.)  What’s going on with this use of music?  There’s a story there, somewhere.

Metadata can bring attention to details that might not otherwise be noticed.  Writers can use metadata to discover and highlight details of interest to audiences.

Metadata can be a writer’s friend. It can help writers tell stories. Writers, for their part, can help computers appreciate their words and ideas by using metadata.

To become friends with metadata, writers will want to know more about how to create metadata and include it in their content.  They can learn about how that’s done in my new book, Metadata Basics for Web Content.  Read the book, so the content you write will be content that is read.  Make it your job to identify metadata that will connect audiences with your writing.

— Michael Andrews