
Multi-source Publishing: the Next Evolution

Most organizations that create web content focus primarily on how to publish and deliver that content to audiences directly.  In this age where “everyone is a publisher,” organizations have become engrossed in forming a direct relationship with audiences, without a third-party intermediary.  Yet as publishers try to cultivate audiences, some are noticing that audience attention is drifting away from their websites.  Increasingly, content delivery platforms are collecting and combining content from multiple sources, and presenting that integrated content to audiences to provide a more customer-centric experience.  Publishers need to consider, and plan for, how their content will fit into an emerging framework of integrated, multi-source publishing.

The Changing Behaviors of Content Consumption: from bookmarks to snippets and cards

Bookmarks were once an important tool for accessing websites.  People wanted to remember great sources of content, and how to get to them.  A poster child of the Web 2.0 era was a site called Delicious, which combined bookmarking with a quaint labeling approach called a folksonomy.  Earlier this year, Delicious, abandoned and forgotten, was sold at a fire sale for a few thousand dollars, the scrap value of its legacy data.

People have largely stopped bookmarking sites.  I don’t even know how to use bookmarks on my smartphone.  It seems unnecessary to track websites anymore.  People expect the information they need to come to them.  They’ve become accustomed to seeing snippets and cards that surface in lists and timelines within their favorite applications.

Delicious represents the apex of the publisher-centric era for content.  Websites were king, and audiences collected links to them.

Single Source Publishing: a publisher-centric approach to targeting information

In the race to become the best source of information — the top bookmarked website — publishers have struggled with how a single website can satisfy a diverse range of audience needs.  As audience expectations grew, publishers sought to create more specific web pages that would address the precise informational needs of individuals.  Some publishers embraced single source publishing.  Single source publishing assembles many different “bundles” of content that all come from the same publisher.  The publisher uses a common content repository (a single source) to create numerous content variations.  Audiences benefit when they can read custom webpages that address their precise needs.  Provided the audience locates the exact variant of information they need, they can bookmark it for later retrieval.

By using single source publishing, publishers have been able to dramatically increase the volume of webpages they produce.  That content, in theory, is much more targeted.  But the escalating volume of content has created new problems.  Locating specific webpages with relevant information in a large website can be as challenging as finding relevant information on more generic webpages within a smaller website.  Single source publishing, by itself, doesn’t solve the information hunting problem.

The Rise of Content Distribution Platforms: curated content

As publishers focused on making their websites king of the hill, audiences were finding new ways to avoid visiting websites altogether.  Over the past decade, content aggregation and distribution platforms have become the first port of call for audiences seeking information.  These platforms include social media such as Facebook, Snapchat, Instagram and Pinterest; aggregation apps such as Flipboard and Apple News; and a range of Google products and apps.  In many cases, audiences get all the information they need while within the distribution or aggregation platform, with no need to visit the website hosting the original content.

Hipmunk aggregates content from other websites, as well as from other aggregators.

The rise of distribution platforms mirrors broader trends toward customer-driven content consumption.  Audiences are reluctant to believe that any single source of content provides comprehensive and fully credible information.  They want easy access to content from many sources.  An early example of this trend was travel aggregators that allow shoppers to compare airfares and hotel rates from different vendor websites.  The travel industry has fought hard to counter this trend, with limited success.  Audiences are reluctant to rely on a single source, such as an airline or hotel website, to make choices about their plans.  They want options.  They want to know what different websites are offering, and to compare those options.  They also want to know the range of perspectives on a topic.  Review and opinion websites such as Rotten Tomatoes present judgments gathered from many different websites.

The movie review site Rotten Tomatoes republishes snippets of reviews from many websites.

Another harbinger of the future has been the evolution of Google search away from its original purpose of presenting links to websites, and toward providing answers.  Consider Google’s “featured snippets,” which interpret user queries and provide a list of related questions and answers.  Featured snippets are significant in two respects:

  1. They present answers on the Google platform, instead of taking the user to the publisher’s website.
  2. They show different related questions and answers, meaning the publisher has less control over how users frame a topic.
Google’s “featured snippets” present related questions together, with answers using content extracted directly from different websites.

Google draws on content from many different websites, and combines that content together.  Google scrapes content from different webpages, and reuses it as it decides will be in the best interest of Google searchers.  Website publishers can’t ask Google to include them in a featured snippet.  Instead, they must opt out with a <meta name="googlebot" content="nosnippet"> directive if they don’t want their content used by Google in such snippets.  These developments illustrate how publishers no longer control exactly how their content is viewed.
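
As a minimal sketch, the opt-out directive sits in the head of an HTML page (the page and title here are hypothetical):

    <html>
      <head>
        <title>Visiting Lisbon on a Budget</title>
        <!-- Ask Google not to display snippets of this page in search results -->
        <meta name="googlebot" content="nosnippet">
      </head>
      ...
    </html>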

A Copernican Revolution Comes to Publishing

Despite lip service to the importance of the customer, many publishers still have a publisher-centric mentality that imagines customers orbiting around them.  The publisher considers itself the center of the customer’s universe.  In this view, nothing has changed: customers seek out the publisher’s content and visit the publisher’s website.  Publishers still expect customers to come to them.  The customer is not at the center of the process.

Publishers do acknowledge the role of Facebook and Google in driving traffic, and more of them now publish directly on these platforms.  Yet such measures fall short of genuine customer-centricity.  Publishers still want to talk uninterrupted, instead of contributing information that will fill in the gaps in the audience’s knowledge and understanding.  They expect audiences to read or view an entire article or presentation, even if that content contains information the audience already knows.

A publisher-centric mentality assumes the publisher can be, and will be, the one best source of information, covering everything important about a topic.  The publisher decides what it believes the audience needs to know, then proceeds to tell the audience about all those things.

A customer-centric approach to content, in contrast, expects and accepts that audiences will be viewing many sources of content.  It recognizes that no one source of content will be complete or definitive.  It assumes that the customer already has prior knowledge about a topic, which may have been acquired from other sources.  It also assumes that audiences don’t want to view redundant information.

Let’s consider content needs from an audience perspective.  Earlier this month I was on holiday in Lisbon.  I naturally consulted travel guides to the city from various sources such as Lonely Planet, Rough Guides and Time Out.  Which source was best?  While each source did certain things slightly better than their rivals, there wasn’t a big difference in the quality of the content.  Travel content is fairly generic: major sources approach information in much the same way.  But while each source was similar, they weren’t identical.  Lisbon is a large enough city that no one guide could cover it comprehensively.  Each guide made its own choices about what specific highlights of the city to include.

As a consumer of this information, I wanted the ability to merge and compare the different entries from each source.  Each source has a list of “must see” attractions.  Which attractions are common to all sources (the standards), and which are unique to one source (perhaps more special)?  For the specific neighborhood where I was staying, each guide could only list a few restaurants.  Did any restaurants get multiple mentions, which perhaps indicated exquisite food, but also possibly signaled a high concentration of tourists? As a visitor to a new city, I want to know about what I don’t know, but also want to know about what others know (and plan to do), so I can plan with that in mind.  Some experiences are worth dealing with crowds; others aren’t.

The situation with travel content applies to many content areas.  Generally speaking, no one publisher has comprehensive and definitive information.  People by and large want to compare perspectives from different sources, and they find it inconvenient to bounce between them.  As the Google featured snippets example shows, audiences gravitate toward destinations that provide convenient access to content drawn from multiple sources.

A publisher-centric attitude is no longer viable.  Publishers that expect audiences to read through monolithic articles on their websites will find audiences less inclined to make that effort.  The publishers that win audience attention will be those who can unbundle their content, so that audiences can get precisely what they want and need (perhaps as a snippet on a card on their smartphone).

Platforms have re-intermediated the publishing process, inserting themselves between the publisher and the audience.  Audiences are now more loyal to the channel that distributes content than to the source that creates it.  They value the convenience of one-stop access to content.  Nonetheless, the role of publishers remains important: customer-centric content depends on publishers.  To navigate these changes, publishers need to understand the benefits of unbundling content, and how it is done.

Content Unbundling, and playing well with others

Audiences face a rich menu of choices for content.  For most publishers, it is unrealistic to aspire to be the single best source of content, with the notable exception of content about your own organization and products.  Even in those cases, audiences will often be considering content from other organizations that competes with your own.

CNN’s view of different content platforms where their audiences may be spending time. Screenshot via Tow Center report on the Platform Press.

Single source publishing is best suited for captive audiences, when you know the audience is looking for something specific, from you specifically.  Enterprise content about technical specifications or financial results is a good candidate for single source publishing.  Publishers face a more challenging task when seeking to participate in the larger “dialog” the audience is having about a topic not “owned” by any brand.  For most topics, audiences consult many sources of information, and often discuss this information among themselves.  Businesses rely on social media, for example: finding forums where different perspectives are discussed, and inserting teasers with links to articles.  But much content consumption happens outside of active social media discussions, where audiences explicitly express their interests.  Publishers need more robust ways to deliver relevant information when people are scanning content from multiple sources.

Consumers want all relevant content in one place. Publishers must decide where that one place might be for their audiences.  Sometimes consumers will look to topic-specific portals that aggregate perspectives from different sources.  Other times consumers will rely on generic content delivery platforms to gather preliminary information. Publishers need their content to be prepared for both scenarios.

To participate in multi-source publishing, publishers need to prepare their content so it can be used by others.  They need to follow the Golden Rule: make it easy for others to incorporate your content in their content.  Part of that task is technical: providing the foundation for sharing content between different organizations.  The other part is a shift in perspective: letting go of possessiveness about content, and of fears about losing control.

Rewards and Risks of Multi-source Publishing

Multi-source content involves a different set of risks and rewards than distributing content directly does.  Publishers must answer two key questions:

  1. How can publishers maximize the use of their content across platforms? (Pursue rewards)
  2. What conditions, if any, do they want to place on that use? (Manage risks)

More fundamentally, why would publishers want other platforms to display their content?  The benefits are manifold.  Other platforms:

  • Can increase reach, since these platforms will often get more traffic than one’s own website, and will generally offer incrementally more views of one’s content
  • May have better authority on a topic, since they combine information from multiple sources
  • May have superior algorithms that understand the importance of different informational elements
  • Can make it easier for audiences to locate specific content of interest
  • May have better contextual or other data about audiences, which can be leveraged to provide more precise targeting.

In short, multi-source publishing can reduce the information hunting problem that audiences face. Publishers can increase the likelihood that their content will be seen at opportune moments.

Publishers have a choice about what content to limit sharing, and what content to make easy to share.  If left unmanaged, some of their content will be used by other parties regardless, and not necessarily in ways the publisher would like.  If actively managed, the publisher can facilitate the sharing of specific content, or actively discourage use of certain content by others. We will discuss the technical dimensions shortly.  First, let’s consider the strategic dimensions.

When deciding how to position their content with respect to third party publishing and distribution, publishers need to be clear on the ultimate purpose of their content.  Is the content primarily about a message intended to influence a behavior?  Is the content primarily about forming a relationship with an audience and measuring audience interests?  Or is the content intended to produce revenues through subscriptions or advertising?

Publishers will want to control access to revenue-producing content, to ensure they capture the subscription or advertising revenues from that content, rather than letting that value benefit a free-rider.  They want to avoid unmanaged content reuse.

In the other two cases, more permissive access can make business sense.  Let’s call the first case the selective exposure of content highlights — for example, short tips related to the broader category of product you offer.  If the purpose of content is to form a relationship, then it is important to attract interest in your perspectives, and to demonstrate the brand’s expertise and helpfulness.  Some information and messages can be highlighted by third-party platforms, and audiences can see that your brand is trying to be helpful.  Some of these viewers, who may not have been aware of your brand or website, may decide to click through to see the complete article.  Exposure through a platform to new audiences can be the start of new customer relationships.

The second case, promoted content, relates to content about a brand, product or company.  It might be a specification for a forthcoming product, a troubleshooting issue, or news about a store opening.  When people are actively seeking out these details, or would want to be alerted to news about them, it makes sense to provide this information directly on whatever platform they are using.  Get their questions answered and keep them happy.  Don’t worry about trying to cross-sell them on viewing content about other things.  They know where to find your website if they need greater detail.  The key metric to measure is customer satisfaction, not the volume of articles read.  In this case, exposure through a platform to an existing audience can improve the customer relationship.

How to Enable Content to be Integrated Anywhere

Many pioneering examples of multi-source publishing, such as price comparison aggregators, job search websites, and Google’s featured snippets, have relied on a brute-force method of mining content from other websites.  They crawl websites, looking for patterns in the content, and extract relevant information programmatically.  Now, the rise of metadata standards for content, and their increased implementation by publishers, makes it easier to assemble content derived from different sources.  Standards-based metadata can connect a publisher’s content to content elsewhere.
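
As an illustration, a publisher might describe an article with schema.org markup embedded in the page, so that aggregators can identify its key elements without guessing at page structure.  This is a hypothetical sketch; the headline, author and dates are invented:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Lisbon's Best Miradouros",
      "author": { "@type": "Person", "name": "Jane Silva" },
      "datePublished": "2017-09-12",
      "description": "Six viewpoints that reward the climb."
    }
    </script>

Any crawler that understands schema.org can now extract the headline, author and description as labeled fields, rather than mining the page for patterns.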

No one knows which new content distribution or aggregation platform will become the next Hipmunk or Flipboard.  But we can expect aggregation platforms to continue to evolve and expand.  Data on content consumption behavior (e.g., hours spent each week by website, channel and platform) indicates that customers increasingly favor consolidated and integrated content.  The technical effort needed to deliver content sourced from multiple websites is decreasing.  Platforms have a range of financial incentives to assemble content from other sources, including ad revenues, the development of comparative data on customer interest in different products, and the opportunity to present complementary content on topics related to the content being republished.  Provided your content is useful in some form to audiences, other parties will find opportunities to make money featuring it.  Price comparison sites, for example, make money from vendors who pay for the privilege of appearing on their site.

To get in front of audiences as they browse content from different sources, a publisher needs to be able to merge its content into the feeds and streams audiences use, whether a timeline, a list of search results, or a series of recommendations that appear as audiences scroll down their screen.  Two options are available to facilitate content merging:

  1. Planned syndication
  2. Discoverable reuse

Planned Syndication

Publishers can syndicate their content, and plan how they want others to use it.  The integration of content between different publishers can be either tightly coupled or loosely coupled.  Publishers who follow a single sourcing process, such as DITA, can integrate their content with content from other publishers, provided those publishers follow the same DITA approach.  Seth Earley, a leading expert on content metadata, describes a use case for syndicating content using DITA:

“Manufacturers of mobile devices work through carriers like Verizon who are the distribution channels.   Content from an engineering group can be syndicated through to support who can in turn syndicate their content through marketing and through distribution partners.  In other words, a change in product support or technical specifications or troubleshooting content can be pushed off through channels within hours through automated and semi-automated updates instead of days or weeks with manual conversions and refactoring of content.”

While such tightly coupled approaches can be effective, they aren’t flexible: they require all partners to follow a common, publisher-defined content architecture.  A more flexible approach is available when publisher systems are decoupled, and content is exchanged via APIs.  Content integration via APIs embraces a very different philosophy than the single sourcing approach.  APIs define chunks of content to exchange flexibly, whereas single-sourcing approaches like DITA define chunks more formally and rigidly.  While APIs can accommodate a wide range of source content based on any content architecture, single sourcing only allows content that conforms to a publisher’s existing content architecture.  Developers are increasingly using flexible microservices to make content available to different parties and platforms.

In the API model, publishers can expand the reach of their content in two ways.  They can submit their content to other parties, and/or permit other parties to access and use their content.  The precise content exchanged, and the conditions under which it is exchanged, are defined by the API.  Publishers can define their content idiosyncratically when using an API, but if they follow metadata standards, the API will be easier to adopt and use.  The use of metadata standards in APIs can reduce the amount of special API documentation required.
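
To make this concrete, here is a rough sketch of what a standards-friendly content API might return.  The endpoint is hypothetical, but the field names come from the schema.org vocabulary, so a consuming platform already knows what each one means:

    GET https://api.example-publisher.com/v1/articles/lisbon-tips

    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Five Lisbon Restaurants Worth the Queue",
      "articleSection": "Travel",
      "license": "https://example-publisher.com/syndication-terms",
      "articleBody": "..."
    }

Because the response reuses standard terms instead of idiosyncratic field names, a platform integrating this feed needs little special documentation beyond the access conditions.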

Discoverable Reuse

Many examples cited earlier involve the efforts of a single party, rather than the cooperation of two parties.  Platforms often acquire content from many sources without the active involvement of the original publishers.  When the original publisher of the content does not need to be involved with the reuse of their content, the content has the capacity to reach a wider audience, and be discovered in unplanned, serendipitous ways.

Aggregators and delivery platforms can bypass the original publisher in two ways.  First, they can rely on crowdsourcing.  Audiences might submit content to the platform, such as Pinterest’s “pins”.  Users can pin images to Pinterest because those images carry Open Graph or schema.org metadata.
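
A sketch of the markup that makes this possible (the product and URLs are hypothetical): when a page carries Open Graph tags like these, Pinterest can extract a usable title, image and description at the moment a user pins the page, with no further involvement from the publisher:

    <meta property="og:title" content="Handmade Ceramic Vase">
    <meta property="og:image" content="https://example-shop.com/images/vase.jpg">
    <meta property="og:url" content="https://example-shop.com/vase">
    <meta property="og:description" content="A hand-thrown stoneware vase with a matte glaze.">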

Second, platforms and aggregators can discover content algorithmically.  Programs can crawl websites to find interesting content to extract.  Web scraping, once done solely by search engines such as Google, has become easier and more widely available, thanks to the emergence of services such as Import.IO.  Aided by advances in machine learning, some web scraping tools don’t require any coding at all, though achieving greater precision requires some coding.  The content most easily discovered by crawlers is content described by metadata standards such as schema.org.  Tools can use simple Regex or XPath expressions to extract specific content that is defined by metadata.
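
For instance, consider a hypothetical product page marked up with schema.org microdata.  Because the price carries an explicit itemprop label, a scraper can target it with a one-line XPath expression instead of reverse-engineering the page layout:

    <div itemscope itemtype="https://schema.org/Product">
      <span itemprop="name">Wireless Keyboard</span>
      <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
        <span itemprop="price">49.99</span>
        <meta itemprop="priceCurrency" content="USD">
      </div>
    </div>

    <!-- One way to extract the price: //*[@itemprop='price']/text() -->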

Influencing Third-party Re-use

Publishers can benefit when other parties want to republish their content, but they will also want to influence how their content is used by others.  Whether they actively manage this process by creating or accessing an API, or choose not to coordinate directly with other parties, publishers can influence how others use their content through various measures:

  • They can choose which content elements to describe with metadata, which facilitates use of that content elsewhere
  • They can assert their authorship and copyright ownership of the content using metadata, to ensure that appropriate credit is given to the original source
  • They can indicate, using metadata, any content licensing requirements (see the sketch after this list)
  • For publishers using APIs, they can control access via API keys, and limit the usage allowed to a party
  • When the volume of reuse justifies it, publishers can explore revenue sharing agreements with platforms, as newspapers are doing with Facebook.
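
As a sketch of the second and third measures, authorship, copyright and licensing can be asserted in the same schema.org description that makes the content discoverable.  The names and license URL below are hypothetical placeholders:

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Ten Tips for Visiting Lisbon",
      "author": { "@type": "Person", "name": "Jane Silva" },
      "copyrightHolder": { "@type": "Organization", "name": "Example Travel Media" },
      "copyrightYear": 2017,
      "license": "https://creativecommons.org/licenses/by-nc/4.0/"
    }
    </script>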

Readers interested in these issues can consult my book, Metadata Basics for Web Content, for a discussion of rights and permissions metadata, which covers issues such as content attribution and licensing.

Where is Content Sourcing heading?

Digital web content in some ways is starting to resemble electronic dance music, where content gets “sampled” and “remixed” by others. The rise of content microservices, and of customer expectations for multi-sourced, integrated content experiences, are undermining the supremacy of the article as the defining unit of content.

For publishers accustomed to being in control, the rise of multi-source publishing represents a “who moved my cheese” moment.  Publishers need to adapt to a changing reality that is uncertain and diffuse.  Unlike the parable about the cheese, publishers have choices about how to respond.  New opportunities also beckon.  This area is still very fluid, and eludes any simple list of best practices.  Publishers would be foolish, however, to ignore the many signals that collectively suggest a shift away from individual websites and toward more integrated content destinations.  They need to engage with these trends to be able to capitalize on them effectively.

— Michael Andrews


Your Content Needs a Metadata Strategy

What’s your metadata strategy?  So few web publishers have an articulated metadata strategy that a skeptic may think I’ve made up the concept, and coined a new buzzword.  Yet almost a decade ago, Kristina Halvorson explicitly cited metadata strategy as one of “a number of content-related disciplines that deserve their own definition” in her seminal A List Apart article, “The Discipline of Content Strategy”.  She also cites metadata strategy in her widely read book on content strategy.  Nearly a decade on, the discipline of content strategy still hasn’t given metadata strategy the attention it deserves.

A content strategy, to have a sustained impact, needs a metadata strategy to back it up.  Without a metadata strategy, content strategy can get stuck in firefighting mode.  Many organizations keep making the same mistakes with their content, because they ask overwhelmed staff to track too many variables.  Metadata can liberate staff from checklists, by allowing IT systems to handle low-level details that are important, but exhausting to deal with.  Staff may come and go, and their enthusiasm can wax and wane.  But metadata, like the Energizer bunny, keeps performing: it can keep the larger strategy on track.  Metadata can deliver consistency to content operations, and can enhance how content is delivered to audiences.

A metadata strategy is a plan for how a publisher can leverage metadata to accomplish specific content goals.  It articulates what metadata publishers need for their content, how they will create that metadata, and most importantly, how both the publisher and audiences can utilize the metadata.  When metadata is an afterthought, publishers end up with content strategies that can’t be implemented, or are implemented poorly.

The Vaporware Problem: When you can’t implement your Plan

A content strategy may include many big ideas, but translating those ideas into practice can be the hardest part.  A strategy will be difficult to execute when its documentation and details are too much for operational teams to absorb and follow.  The group designing the content strategy may have done a thorough analysis of what’s needed.  They identified goals and metrics, modeled how content needs to fit together, and considered workflows and the editorial lifecycle.  But large content teams, especially when geographically distributed, can face difficulties implementing the strategy.  Documentation, emails and committees are unreliable ways to coordinate content on a large scale.  Instead, key decisions should be embedded into the tools the team uses wherever possible.  When their tools have encoded relevant decisions, teams can focus on accomplishing their goals, instead of following rules and checklists.

In the software industry, vaporware is a product concept that’s been announced, but not built. Plans that can’t be implemented are vaporware. Content strategies are sometimes conceived with limited consideration of how to implement them consistently.  When executing a content strategy, metadata is where the rubber hits the road.  It’s a key ingredient for turning plans into reality.  But first, publishers need to have the right metadata in place before they can use it to support their broader goals.

Effective large-scale content governance is impossible without effective metadata, especially administrative metadata.  Without a metadata strategy, publishers tend to rely on what their existing content systems offer, instead of first asking what they want from their systems.  Your existing system may provide only some of the key metadata attributes you need to coordinate and manage your content.  That metadata may be in a proprietary format, meaning it can’t be used by other systems.  The default settings offered by your vendors’ products are unlikely to provide the coordination and flexibility required.

Consider all the important information about your content that needs to be supported with metadata.  You need to know details about the history of the content (when it was created, last revised, reused from elsewhere, or scheduled for removal), where the content came from (author, approvers, licensing rights for photos, or location information for video recordings), and goals for the content (intended audiences, themes, or channels).  Those are just some of the metadata attributes content systems can use to manage routine reporting, tracking, and routing tasks, so web teams can focus on tasks of higher value.
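
Many of these attributes map to standard vocabulary terms.  Here is a rough sketch using schema.org (all values invented; some attributes, such as approvers, typically live in CMS-specific fields because schema.org has no direct term for them):

    {
      "@context": "https://schema.org",
      "@type": "Article",
      "dateCreated": "2017-02-01",
      "dateModified": "2017-08-20",
      "expires": "2018-02-01",
      "author": { "@type": "Person", "name": "Jane Silva" },
      "audience": { "@type": "Audience", "audienceType": "prospective customers" },
      "about": { "@type": "Thing", "name": "home insurance" }
    }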

If you have grander visions for your content, such as making your content “intelligent”, then having a metadata strategy becomes even more important.  Countless vendors are hawking products that claim to add AI to content.  Just remember: metadata is what makes content intelligent, ready for applications (user decisions), algorithms (machine decisions) and analytics (assessment).  Don’t buy new products without first having your own metadata strategy in place.  Otherwise you’ll likely be stuck with the vendor’s proprietary vision and roadmap, instead of your own.

Lack of Strategy creates Stovepipe Systems

A different problem arises when a publisher tries to do many things with its content, but does so in a piecemeal manner.  Perhaps a big bold vision for a content strategy, embodied in a PowerPoint deck, gets tossed over to the IT department.  Various IT members consider what systems are needed to support different functionality.  Unless there is a metadata strategy in place, each system is likely to operate according to its own rules:

  • Content structuring relies on proprietary templates
  • Content management relies on proprietary CMS data fields
  • SEO relies on meta tags
  • Recommendations rely on page views and tags
  • Analytics rely on page titles and URLs
  • Digital assets rely on proprietary tags
  • Internal search uses keywords and not metadata
  • Navigation uses a CMS-defined custom taxonomy or folder structure
  • Screen interaction relies on custom JSON
  • Backend data relies on a custom data model.

Sadly, such uncoordinated labeling of content is quite common.

Without a metadata strategy, each area of functionality is considered as a separate system.  IT staff then focus on systems integration: trying to get different systems to talk to each other.  In reality, they have a collection of stovepipe systems, where metadata descriptions aren’t shared across systems.  That’s because various systems use proprietary or custom metadata, instead of using common, standards-based metadata.  Stovepipe systems lack a shared language that allows interoperability.  Attributes that are defined by your CMS or other vendor system are hostage to that system.

Proprietary metadata is far less valuable than standards-based metadata.  Proprietary metadata can’t be shared easily with other systems and is hard or impossible to migrate if you change systems.  Proprietary metadata is a sunk cost that’s expensive to maintain, rather than being an investment that will have value for years to come. Unlike standards-based metadata, proprietary metadata is brittle — new requirements can mess up an existing integration configuration.

Metadata standards are like an operating system for your content.  They allow content to be used, managed and tracked across different applications.  Metadata standards create an ecosystem for content.  Metadata strategy asks: What kind of ecosystem do you want, and how are you going to develop it, so that your content is ready for any task?

Who is doing Metadata Strategy right?

Let’s look at how two well-known organizations are doing metadata strategy.  One example is current and newsworthy, while the other has a long backstory.

eBay

eBay decided that the proprietary metadata used in its content wasn’t working: it prevented them from leveraging metadata to deliver better experiences for their customers.  They embarked on a major program called the “Structured Data Initiative”, migrating their content to metadata based on the schema.org standard.  Wall Street analysts have been following eBay’s metadata strategy closely over the past year, as it is expected to improve the profitability of the ecommerce giant.  The adoption of metadata standards has allowed for a “more personal and discovery-based buying experience with highly tailored choices and unique selection”, according to eBay.  eBay is leveraging the metadata to work with new AI technologies to deliver a personalized homepage to each of its customers.  It is also leveraging the metadata in its conversational commerce product, the eBay ShopBot, which connects with Facebook Messenger.  eBay’s experience shows that a company shouldn’t try to adopt AI without first having a metadata strategy.

eBay’s strategy for structured data (metadata). Screenshot via eBay

Significantly, eBay’s metadata strategy adopts the schema.org standard for internal content management, in addition to using it for search engine consumers such as Google and Bing.  Plenty of publishers use schema.org for search engine purposes, but few have taken the next step, as eBay has, of using it as the basis of their content operations.  eBay is also well positioned to take advantage of any new third-party services that can consume its metadata.
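
eBay hasn’t published the details of its internal implementation, but the flavor of a standards-based product description looks something like this schema.org sketch (the product and values are invented):

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Product",
      "name": "Vintage 35mm Film Camera",
      "brand": { "@type": "Brand", "name": "Examplar" },
      "itemCondition": "https://schema.org/UsedCondition",
      "offers": {
        "@type": "Offer",
        "price": "120.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock"
      }
    }
    </script>

Because condition, price and availability are labeled with standard terms, the same description can feed search engines, a personalized homepage, and a conversational bot without translation.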

Australian Government

From the earliest days of online content, the Australian government has been concerned with how metadata can improve the availability of online content.  The Australian government isn’t a single publisher; it comprises a federation of many government websites run by different government organizations.  The governance challenges are enormous.  Fortunately, metadata standards can help coordinate diverse activity.  The AGLS metadata standard has been in use for nearly 20 years to classify services provided by different organizations within the Australian government.

The AGLS metadata strategy is distinctive in a couple of ways.  First, it adopts an existing standard and builds upon it.  The government adopted the widely used Dublin Core metadata standard, identified areas where it didn’t offer needed attributes, and added elements specific to Australian needs (for example, indicating the “jurisdiction” that content relates to).  Rather than starting from scratch, they extended an existing standard and got their extension formally recognized.

Second, the AGLS strategy addresses implementation at different levels in different ways.  The metadata standard allows different publishers to describe their content consistently.  It ensures all published content is interoperable.  Individual publishers, such as the state government of Victoria, have their own government website principles and requirements, but these mandate the use of the AGLS metadata standard.  The common standard has also promoted the availability of tools to implement it.  For example, Drupal, which is widely used for government websites in Australia, has a plugin that supports adding the metadata to content.  Currently, over 700 sites use the plugin.  But significantly, because AGLS is an open standard, it can work with any CMS, not just Drupal.  I’ve also seen a plugin for Joomla.
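
For readers who haven’t seen it, AGLS metadata is typically expressed as HTML meta elements that combine Dublin Core terms with the Australian extensions.  A sketch of what a page description might look like (the agency and values are hypothetical):

    <meta name="DCTERMS.title" content="Renew your driver licence">
    <meta name="DCTERMS.creator" content="Department of Transport">
    <meta name="DCTERMS.date" content="2017-08-01">
    <meta name="AGLSTERMS.jurisdiction" content="Victoria">
    <meta name="AGLSTERMS.function" content="transport regulation">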

Australia’s example shows how content metadata isn’t an afterthought, but is a core part of content publishing.  A well-considered metadata strategy can provide benefits for many years.  Given its long history, AGLS is sure to continue to evolve to address new requirements.

Strategy focuses on the Value Metadata can offer

Occasionally, I encounter someone who warns of the “dangers” of “too much” metadata.  When I try to uncover the source of the concern, I learn that the person thinks of metadata as a labor-intensive activity.  They imagine they need to hand-create the metadata serially.  They think metadata exists so they can hunt and search for specific documents.  This sort of thinking is dated but still quite common.  It reflects how librarians and database administrators approached metadata in the past, as a tedious form of record keeping.  The purpose of metadata has evolved far beyond record keeping.  Metadata is no longer primarily about “findability,” powered by clicking labels and typing within form fields.  It is now more about “discovery”: revealing relevant information through automation.  Leveraging metadata depends on understanding the range of uses for it.

When someone complains about too much metadata, it also signals to me that a metadata strategy is missing.  In many organizations, metadata is relegated to being an electronic checklist, instead of positioned as a valuable tool.  When that’s the case, metadata can seem overwhelming.  Organizations can have too much metadata when:

  • Too much of their metadata is incompatible, because different systems define content in different ways
  • Too much metadata is used for a single purpose, instead of serving multiple purposes.

Siloed thinking about metadata results in stovepipe systems. New metadata fields are created to address narrow needs, such as tracking or locating items for specific purposes.  Fields proliferate across various systems.  And everyone is confused how anything relates to anything else.

Strategic thinking about metadata considers how metadata can serve all the needs of the publisher, not just the needs of an individual team member or role.  When teams work together to develop requirements, they can discuss what metadata is useful for different purposes.  They can identify how a single metadata item can be used in different contexts.  If the metadata describes when an item was last updated, the team might consider how that metadata could be used in different contexts: by content creators, by the analytics team, by the UX design team, and by the product manager.

Publishers should ask themselves how they can do more for their customers by using metadata.  They need to think about the productivity of their metadata: making specific metadata descriptions do more things that can add value to the content.  And they need a strategy to make that happen.

— Michael Andrews