
What is Content Design?

The growing interest in content design is a welcome development.  Such interest recognizes that content decisions can’t be separated from the context in which the content will be used.  Consideration of content design corrects two common misperceptions: the notion that content presentation is simply visual styling, and the belief that because content may need to exist in many contexts, the context in which content is displayed becomes irrelevant.  Direct collaboration between writers and UI designers is now encouraged.  Content must fit the design where it appears, and conversely, UI designs must support the content displayed.  Content has no impact independently of the container or interaction platform for which it has been designed, and on which users rely.  Content depends on context.  And context frames the content experience.

Yet content design is more than a collaborative attitude. What content design actually entails is still not well understood. Content design requires all involved to consider how different elements should work together as a system.

“Content and Design Are Inseparable Work Partners”  — Jared Spool

Current Definitions of Content Design

There is no single accepted definition of content design.  Two meanings are in use, both of which are incomplete.

The first emphasizes layout and UI decisions relating to the presentation of content.  It looks at such questions as whether the text will fit on the screen, or how to show and hide information.  The layout perspective of content design is sometimes referred to as the application of content patterns.

The second, popularized by the Government Digital Service (GDS) in Britain, focuses on whether the words being presented in an article support the tasks that users are trying to accomplish.  The GDS instructs: “know your users’ needs and design your content around them” and talks about “designing by writing great content.”  The GDS’ emphasis on words reflects the fixed character of their content types: a stock of 40 formats.  These structures provide ready templates for inserting content, but don’t give content creators a voice in how or what to present apart from wording.

Content design encompasses much more than wording and layout.

The design of content, including printed media, has always involved layout and wording, and the interaction between the two. Comprehensive content design today goes further by considering behavior: the behavior of the content, and the behavior of users interacting with the content.  It designs content as a dynamic resource.  It evaluates and positions content within a stream of continuous interaction.

Most discussion of content design approaches content from a “one size fits all” perspective.  What’s missing in current discussions is how to design content that can serve multiple needs.  User needs are neither fixed, nor uniform.  Designs must be able to accommodate diverse needs.  Formulaic templates generally fall short of doing this.  Content must be supported by structures that are sophisticated enough to accommodate different scenarios of use.

Breaking Free from the Static Content Paradigm

Content creators typically think about content in terms of topics.  Topics are monolithic.  They are meant to be solid: to provide the answers to questions the audience has. In an ideal scenario, the content presented on the topic perfectly matches the goals of the audience.

The problem with topics is that they too often reflect a publisher-centric view of the world.  Publishers know people are seeking information about certain topics — their web logs tell them this.  They know the key information they need to provide on the topic.  They strive to provide succinct answers relating to the topic.  But they don’t consider the wide variation of user needs relating to the topic.  They can’t imagine that numerous people all reading the same content might want slightly different things.

Consider the many factors that can influence what people want and expect from content:

  • Their path of arrival — where they have come from and what they’ve seen already
  • Their prior knowledge of the topic
  • Their goals or motivations that brought them to the content
  • The potential actions they might want to take after they’ve seen the content

Some people are viewing the content to metaphorically “kick the tires,” while others approach the content motivated to take action.  Some people will choose to take action after seeing the content, but others will defer action.  People may visit the content with one goal, and after viewing the content have a different goal. Regardless of the intended purpose of the content, people are prone to redefine their goals, because their decisions always involve more than what is presented on the screen.

In the future, content might be able to adjust automatically to accommodate differences in user familiarity and intent.  Until that day arrives (if it ever does), creators of content need to produce content that addresses a multitude of users with slightly varying needs.  This marks the essence of content design: to create units of content that can address diverse needs successfully.

A common example of content involving diverse needs relates to product comparison.  Many people share a common task of comparing similar products.  But they may differ in what precisely they are most interested in:

  • What’s available?
  • What’s best?
  • What are the tradeoffs between products?
  • What options are available?
  • How to configure product options and prices?
  • How to save options for use later?
  • How to buy a specific configuration?

A single item of content providing a product comparison may need to support many different purposes, and accommodate people with different knowledge and interests.  That is the challenge of content design.

Aspects of Content Design

How does one create content structures that respond to the diverse needs of users in different scenarios? Content design needs to think beyond words and static informational elements.  When designs include features and dynamic information, content can accomplish more.  The goal is to build choice into the content, so that different people can take away different information from the same item of content.

Design of Content Features

A feature in content is any structural element of the content that is generated by code.  Much template-driven content, in contrast, renders the structure fixed, and makes the representation static.  Content features can make content more “app-like” — exhibiting behaviors such as updating automatically, and offering interactivity.  Designing content features involves asking how functionality can change the representation of content to deliver additional value to audiences and the business.  Features can provide for different views of content, with different levels of detail or different perspectives.

Consider a simple content design decision: should certain information be presented as a list, in a table, or as a graph?  Each of these options is a structure.  The same content can be presented in all three structures.  Each structure has benefits. Graphs are easy to scan, tables allow more exact information, while lists are better for screen readers.  The “right” choice may depend on the expected context of use, assuming only one exists.  But it is also possible that the same content could be delivered in all three structures, which could be used by different users in different contexts.
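As a rough illustration of one item of content delivered in three structures, here is a minimal Python sketch.  The `sales` figures and the function names are invented for the example; the point is only that the same rows can feed a list, a table, and a (text) graph.

```python
# Hypothetical content: the same rows feed all three structures.
sales = [("North", 12), ("South", 7), ("West", 3)]

def as_list(rows):
    """One linear line per item, which suits screen readers."""
    return "\n".join(f"- {name}: {value}" for name, value in rows)

def as_table(rows, label="Region", unit="Units"):
    """Aligned columns that keep the exact figures visible."""
    width = max([len(label)] + [len(name) for name, _ in rows])
    lines = [f"{label.ljust(width)}  {unit}"]
    lines += [f"{name.ljust(width)}  {value}" for name, value in rows]
    return "\n".join(lines)

def as_graph(rows):
    """A crude text bar chart that is quick to scan."""
    width = max(len(name) for name, _ in rows)
    return "\n".join(f"{name.ljust(width)} {'#' * value}" for name, value in rows)

for render in (as_list, as_table, as_graph):
    print(render(sales), end="\n\n")
```

Because the structures are generated from the content rather than fixed in a template, offering all three views costs little more than offering one.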

Design of Data-driven Information

Many content features depend on data-driven information.  Instead of considering content as static — only reflecting what was known at the time it was published — content can be designed to incorporate information about activities related to the content that have happened after publication of the article.

Algorithmically-generated information is increasingly common.  A major goal is to harvest behavioral data that might be informative to audiences, and use that data to manage and prioritize the display of information.  Doing this successfully requires the designer to think in terms of a system of inter-relationships between activities, needs, and behavioral scenarios.

Features and data can be tools to solve problems that words and layout alone can’t address.  Both these aspects involve loosening the control over what the audience sees and notices.  Features and data can enrich the content experience.  They can provide different points of interest, so that different people can choose to focus on what elements of information interest them the most.   Features and data can make the content more flexible in supporting various goals by offering users more choice.

Content Design in the Wild

Real-world examples provide the best way to see the possibilities of content design, and the challenges involved.

Amazon is famous for both the depth of its product information, and its use of data.  Product reviews on Amazon are sometimes vital to the success of a product.  Many people read Amazon product reviews, even if they’ve no intention of buying the product from Amazon.  And people who have not bought the product are allowed to leave reviews, and often do.

Amazon’s product reviews illustrate different aspects of content design.  The reviews are enriched with various features and data that let people scan and filter the content according to their priorities.  But simply adding features and data does not automatically result in a good design.

Below is a recent screenshot of reviews for a book on Amazon.  It illustrates some of the many layers of information available.  There are ratings of books, comments on the books, identification of the reviewers, and reactions to the ratings.  Seemingly, everything that might be useful has been included.

Design of product review information for a book

The design is sophisticated on many levels. But instead of providing clear answers for users trying to evaluate the suitability of a book, the design raises various questions.  Consider the information conveyed:

  • The book attracted three reviews
  • All three reviewers rated the book highly, either four or five stars
  • All the reviewers left a short comment
  • Some 19 people provided feedback on the reviews
  • Only one person found the reviews helpful; the other 18 found the reviews unhelpful

Perhaps the most puzzling element is the heading: “Most Helpful Customer Reviews.”  Clearly people did not find the reviews helpful, but the page indicates the opposite.

This example illustrates some important aspects of content design.  First, different elements of content can be inter-dependent.  The heading depends on the feedback on reviews, and the feedback on reviews depends on the reviews themselves. Second, because the content is dynamic, what gets displayed is subject to a wide range of inputs that can change over time.  Whether what’s displayed makes sense to audiences will depend on the design’s capacity to adapt to different scenarios in a meaningful way. Content design depends on a system of interactions.

Content Design as Problem Solving

Content design is most effective when treated as the exploration of user problems, rather than as the fulfillment of user tasks.  Amazon’s design checks the box in terms of providing information that can be consulted as part of a purchase decision.  A purely functional perspective would break tasks into user stories: “Customer reads reviews”, etc.  But tasks have a tendency to make the content interaction too generic. The design exploration needs to come before writing the stories, rather than the reverse. The design needs to consider various problems the user may encounter.  Clearly the example we are critiquing did not consider all these possibilities.

An examination of the content as presented in the design suggests the source of problems readers of the reviews encountered.  They did not find the comments helpful.  The comments are short, and vague as to what would justify the high rating.  A likely reason the comments are vague is that the purchasers of the product were not the true end users of the product, so they refrained from evaluating the qualities of the product, and commented on their purchase experience instead.  The algorithms that prioritize the reviews don’t have a meaningful subroutine for dealing with cases where all the reviews are rated as unhelpful.
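To make the dependency between heading and feedback concrete, here is a minimal Python sketch of a heading that only claims helpfulness when the votes support it.  Everything in it is hypothetical: the `Review` fields, the thresholds, and the fallback headings are assumptions for illustration, not a description of how Amazon’s system works.  The split of the 19 votes across the three reviews is also invented, since the example only reports the totals.

```python
from dataclasses import dataclass

@dataclass
class Review:
    stars: int
    helpful_votes: int
    unhelpful_votes: int

def is_helpful(review, min_votes=3, min_share=0.5):
    """Assumed rule: enough votes, and at least half of them positive."""
    total = review.helpful_votes + review.unhelpful_votes
    return total >= min_votes and review.helpful_votes / total >= min_share

def section_heading(reviews):
    """Choose a heading that matches what the feedback actually says."""
    if not reviews:
        return "No Customer Reviews Yet"
    if any(is_helpful(r) for r in reviews):
        return "Most Helpful Customer Reviews"
    return "Customer Reviews"

# Three highly rated reviews; 1 helpful vote and 18 unhelpful votes in total.
example = [Review(5, 1, 6), Review(4, 0, 6), Review(5, 0, 6)]
print(section_heading(example))
```

A guard like this only keeps the label honest.  It does nothing to make the reviews themselves more useful.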

Critique as the Exploration of Questions

Critiquing the design of content allows content creators to consider the interaction of content as seen from the audience perspective.  As different scenarios are applied to various content elements, the critique can ask more fundamental questions about audience expectations, and in so doing, reconsider design assumptions.

Suppose we shift the discussion away from the minutiae of screen elements to consider the people involved.  The issue is not necessarily whether a specific book is sold.  The lifetime value of customers shopping on Amazon is far more important.  And here, the content design is failing in a big way.

Customers want to know if a book, which they can’t look at physically or in extensive detail, is really what they want to purchase.  Amazon counts on customers to give other customers confidence that what they purchase is what they want.  Returned merchandise is a lose-lose proposition for everyone.  Most customers who leave reviews do so voluntarily, without direct benefit, which is what makes their reviews credible.  So we have buyers of a book altruistically offering their opinion about the product.  They have taken the trouble to log in and provide a review, with the expectation the review will be published, and the hope it will be helpful to others.  Instead, potential buyers of the book are dinging the reviews.  The people who have volunteered their time to help others are being criticized, while people who are interested in buying the book are unhappy they can’t get reliable information.  Through poor content design, Amazon is alienating two important customer constituencies at once: loyal customers who provide reviews on which Amazon depends, and potential buyers considering a product.

How did this happen, and how can it be fixed?  Amazon has talented employees, and unrivaled data analytics.  Despite those enviable resources, the design of the product review information nonetheless has issues.  Issues of this sort don’t lend themselves to A/B testing, or quick fixes, because of the interdependencies involved.  One could deploy a quick fix such as changing the heading if no helpful reviews exist, but the core problems would remain.  Indeed, the tendency in agile IT practices to apply incremental changes to designs is often a source of content design problems, rather than a means of resolving them.  Such patchwork changes mean that elements are considered in isolation, rather than as part of a system involving interdependencies.

Many sophisticated content designs such as the product review pages evolve over time.  No one person is directing the design: different people work on the design at different stages, sometimes over the course of years.  Paradoxically, even though the process is trumpeted as being agile, it can emulate some of the worst aspects of a “design by committee” approach where everyone leaves their fingerprints on the design, but no holistic concept is maintained.

News reports indicate Amazon has been concerned with managing review contributors.  Amazon wants to attract known reviewers, and has instituted a program called Vine that provides incentives to approved reviewers.  At the same time, it wants to discourage reviewers who are paid by outside parties, and has sued people it believes provide fake reviews.  To address the issue of review veracity, reviews use badges indicating the reviewer’s status as being a verified purchaser, a top reviewer, or a Vine reviewer.  The feedback concerning whether a review is helpful is probably also linked to goals of being able to distinguish real reviews from fake ones.  It would appear that the issue of preventing fake reviews has become conflated with the issue of providing helpful reviews, when in reality they are separate issues.  The example clearly shows that real reviews are not necessarily helpful reviews.

The content design should support valid business goals, but it needs to make sure that doing so doesn’t work at cross-purposes with the goals of audiences using the design.  Letting customers criticize other customers may support the management of review content, but in some cases it may do so at the cost of customer satisfaction.

A critique of the design also brings into focus the fact that the review content involves two distinct user segments: the readers of reviews, and the writers of reviews.  The behavior of each affects the other.  The success of the content depends on meeting the needs of both.

The design must look beyond the stated problem of how to present review information.  It must also solve second-order problems.  How to encourage useful reviews?  What to do when there are no useful reviews?  Many critical design issues may be lurking behind the assumptions of the “happy path” scenario.

Re-examining Assumptions

A comprehensive content design process keeps in mind the full range of (sometimes competing) goals the design needs to fulfill, and the range of scenarios the design must accommodate.  From these vantage points, it can test assumptions about how a design solution performs in different situations and against different objectives.

When applied to the example of product reviews, different vantage points raise different core questions.   Let’s focus on the issue of encouraging helpful reviews, given its pivotal leverage.  The issue involves many dimensions.

Who is the audience for the reviews: other customers, or the seller or maker of the product?  Who do the reviewers imagine is seeing their content, and what do they imagine is being done with that information?  What are the expectations of reviewers, and how can the content be designed to match their expectations — or to reset them?

What are the reviewers supposed to be rating?  Are they rating the product, or rating Amazon?  When the product is flawed, who does the reviewer hold accountable, and is that communicated clearly?  Do raters or readers of ratings want finer distinctions, or not?  How does the content design influence these expectations?

What do the providers of feedback on reviews expect will be done with their feedback?  Do they expect it to be used by Amazon, by other customers, or be seen and considered by the reviewer evaluated?  How does the content design communicate these dimensions?

What is a helpful review, according to available evidence?  What do customers believe is a helpful review?  Is “most helpful” the best metric?  Suppose long reviews are more likely to be considered helpful reviews. Is “most detailed” a better way to rank reviews?

What kinds of detail are expected in the review comments?  What kinds of statements do people object to?  How does the content design impact the quality of the comments?

What information is not being presented?  Should Amazon include information about number of returns?  Should people returning items provide comments that show up in the product reviews?

There are of course many more questions that could be posed.  The current design reflects a comment moderation structure, complete with a “report abuse” link.  The policing of comments and the voting on reviews hit on extrinsic motivators: people seeking status from positive feedback, or skirting negative feedback. But they don’t do much to address intrinsic motivators to participate and contribute.  A fun exercise to shift perspective would be to try imagining how to design the reviews to rank-order them according to their sincerity. Because people can be so different in what they seek in product information, it is always valuable to ask what different people care about most, and never to assume we know the answer with certainty.

Designing Experiences, Not Tasks

Tasks are a starting point for thinking about content design, but are not sufficient for developing a successful design.  Tasks tend to simplify activities, without giving sufficient attention to contextual issues or alternative scenarios.  A task-orientation tends to make assumptions about user motivations to do things.

Content design is stronger when content is considered experientially.  Trust is a vitally important factor for content, but it is difficult to reduce into a task.  Part of what makes trust so hard is that it is subjective.  Different people value different factors when assessing trustworthiness — rationality or emotiveness, thoroughness or clarity.  For that reason, content designs often need to provide a range of information and detail.

Designing for experiences frees us from thinking about user content needs as being uniform. Instead of focusing only on what people are doing (or what we want them to be doing), the experiential perspective focuses on why people may want to take action, or not.

People expect a richly-layered content experience, able to meet varied and changing needs.  Delivering this vision entails creating a dynamic ecosystem that provides the right kinds of details. The details must be coordinated so that they are meaningful in combination. Content becomes a living entity, powered by many inputs.  Dynamic content, properly designed, can provide people with positive and confidence-inducing experiences. Unless people feel comfortable with the information they view, they are reluctant to take action.  Experience may seem intangible, and thus inconsequential.  But the content experience has real-world consequences: it impacts behavior.

— Michael Andrews


Data Types and Data Action

We often think about content from a narrative perspective, and tend to overlook the important roles that data play for content consumers. Specific names or numeric figures often carry the greatest meaning for readers. Such specific factual information is data. It should be described in a way that lets people use the data effectively.

Not all data is equally useful; what matters is our ability to act on data. Some data allows you to do many different things with it, while other data is more limited. The stuff one can do with types of data is sometimes described as the computational affordances of data, or as data affordances.

The concept of affordances comes from the field of ecological psychology, and was popularized by the user experience guru Donald Norman. An affordance is a signal encoded in the appearance of an object that suggests how it can be used and what actions are possible. A door handle may suggest that it should be pushed, pulled or turned, for example. Similarly, with content we need to be able to recognize the characteristics of an item of data, to understand how it can be used.

Data types and affordances

The postal code is an important data type in many countries. Why is it so important? What can you do with a postal code? How people use postal codes provides a good illustration of data affordances in action.

Data affordances can be considered in terms of their purpose-depth, and purpose-scope, according to Luciano Floridi of the Oxford Internet Institute. Purpose-depth relates to how well the data serves its intended purpose. Purpose-scope relates to how readily the data can be repurposed for other uses. Both characteristics influence how we perceive the value of the data.

A postal code is a simplified representation of a location composed of households. Floridi notes that postal codes were developed to optimize the delivery of mail, but subsequently were adopted by other actors for other purposes, such as to allocate public spending, or calculate insurance premiums.

He states: “Ideally, high quality information… is optimally fit for the specific purpose/s for which it is elaborated (purpose–depth) and is also easily re-usable for new purpose/s (purpose–scope). However, as in the case of a tool, sometimes the better [that] some information fits its original purpose, the less likely it seems to be repurposable, and vice versa.” In short, we don’t want data to be too vague or imprecise, and we also want the data to have many ways it can be used.

Imagine if all data were simple text. That would limit what one could do with that data. Defining data types is one way that data can work harder for specific purposes, and become more desirable in various contexts.

A data type determines how an item is formatted and what values are allowed. The concept will be familiar to anyone who works with Excel spreadsheets, and notices how Excel needs to know what kind of value a cell contains.

In computer programming, data types tell a program how to assess and act on variables. Many data types relate to issues of little concern to content strategy, such as various numeric types that impact the speed and precision of calculations. However, there is a rich range of data types that provide useful information and functionality to audiences. People make decisions based on data, and how that data is characterized influences how easily they can make decisions and complete tasks.

Here are some generic data types that can be useful for audiences, each of which has different affordances:

  • Boolean (true or false)
  • Code (showing computer code to a reader, such as within the HTML code tags)
  • Currency (monetary cost or value denominated in a currency)
  • Date
  • Email address
  • Geographic coordinate
  • Number
  • Quantity (a number plus a unit type, such as 25 kilometers)
  • Record (an identifier composed of compound properties, such as 13th president of a country)
  • Telephone number
  • Temperature (similar to quantity)
  • Text – controlled vocabulary (such as the limited range of values available in a drop down menu)
  • Text – variable length free text
  • Time duration (number of minutes, not necessarily tied to a specific date)
  • URI or URN (authoritative resource identifier belonging to a specific namespace, such as an ISBN number)
  • URL (webpage)

Not all content management systems will provide structure for these data types out of the box, but most should be supportable with some customization. I have adapted the above list from the listing of data types supported by Semantic MediaWiki, a widely used open source wiki, and the data types common in SQL databases.
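As a rough illustration of how a data type fixes both the format of an item and the values it allows, here is a minimal Python sketch.  The `parse_field` function, the `Quantity` class, and the `Difficulty` vocabulary are hypothetical names invented for this example; they are not taken from Semantic MediaWiki or any particular content management system.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class Difficulty(Enum):
    """A controlled vocabulary: only these values are allowed."""
    EASY = "easy"
    MEDIUM = "medium"
    HARD = "hard"

@dataclass(frozen=True)
class Quantity:
    """A number plus a unit type, such as 25 kilometers."""
    amount: float
    unit: str

def parse_field(kind, raw):
    """Turn raw text into a typed value, rejecting values the type does not allow."""
    text = raw.strip()
    if kind == "boolean":
        if text.lower() not in ("true", "false"):
            raise ValueError(f"not a boolean: {raw!r}")
        return text.lower() == "true"
    if kind == "date":
        return date.fromisoformat(text)      # ValueError on a malformed date
    if kind == "quantity":
        amount, unit = text.split(maxsplit=1)  # ValueError if the unit is missing
        return Quantity(float(amount), unit)
    if kind == "difficulty":
        return Difficulty(text.lower())      # ValueError if outside the vocabulary
    return text                              # free text: anything is accepted

print(parse_field("quantity", "25 kilometers"))
print(parse_field("date", "2015-03-01"))
```

Free text accepts anything, which is exactly why so little can be done with it afterwards.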

By having distinct data types with unique affordances, publishers and audiences can do more with content. The ways people can act on data are many:

  • Filter by relevant criteria: Content might use geolocation data to present a telephone number in the reader’s region
  • Start an action: Readers can click-to-call telephone numbers that conform to an international standard format
  • Sort and rank: Various data types can be used to sort items or rank them
  • Average: When using controlled vocabularies in text, the number of items with a given value can be counted or averaged
  • Sum together: Content containing quantities can be summed: for example, recipe apps allow users to add together common ingredients from different dishes to determine the total amount of an ingredient required for a meal
  • Convert: A temperature can be converted into different units depending on the reader’s preference
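A few of the actions above can be sketched in Python.  This is a minimal illustration under stated assumptions: the article list, the ingredient tuples, and the function names are all invented for the example, and the phone check is a deliberately loose test for a "+" followed by digits rather than full validation against a telephone numbering standard.

```python
from collections import defaultdict
from datetime import date

# Sort and rank: a date type lets items be ordered newest first.
articles = [("Older post", date(2014, 5, 2)), ("Newer post", date(2015, 1, 9))]
newest_first = sorted(articles, key=lambda item: item[1], reverse=True)

def total_ingredients(dishes):
    """Sum together: add up quantities that share an ingredient and a unit."""
    totals = defaultdict(float)
    for dish in dishes:
        for name, amount, unit in dish:
            totals[(name, unit)] += amount
    return dict(totals)

def celsius_to_fahrenheit(celsius):
    """Convert: show a temperature in the reader's preferred unit."""
    return celsius * 9 / 5 + 32

def tel_link(number):
    """Start an action: build a click-to-call link from an international number."""
    digits = number.replace(" ", "")
    if not (digits.startswith("+") and digits[1:].isdigit()):
        raise ValueError(f"not in international format: {number!r}")
    return f"tel:{digits}"

meal = [
    [("flour", 200.0, "g"), ("egg", 2.0, "each")],  # dish one
    [("flour", 150.0, "g")],                        # dish two
]
print(total_ingredients(meal))
print(celsius_to_fahrenheit(20))
print(tel_link("+44 20 7123 4567"))
```

None of these actions would be possible if the dates, quantities, and phone numbers were stored as undifferentiated strings.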

The choice of data type should be based on what your organization wants to do with the content, and what your audience might want to do with it. It is possible to reduce most character-based data to either a string or a number, but such simplification will reduce the range of actions possible.

Data versus Metadata

The boundary between data and metadata is often blurry. Data associated with both metadata and the content body-field have important affordances. Metadata and data together describe things mentioned within or about the content. We can act on data in the content itself, as well as act on data within metadata framing the content.

Historically, structural metadata outside the content played a prominent role indicating the organization of the content that implied what the content was about. Increasingly, meaning is being embedded with semantic markup within the content itself, and structural metadata surrounding the content may be limited. A news article may no longer indicate a location in its dateline, but may have the story location marked up within the article that is referenced by content elsewhere.

Administrative metadata, often generated by a computer and traditionally intended for internal use, may have value to audiences. Consider the humble date stamp, indicating when an article was published. By seeing a list of most recent articles, audiences can tell what’s new and what that content is about, without necessarily viewing the content itself.

Van Hooland and Verborgh ask in their recent book on linked data: “[W]here to draw the line between data and metadata. The short answer is you cannot. It is the context of the use which decides whether to consider data as metadata or not. You should also not forget that one of the basic characteristics of metadata: they are ever extensible …you can always add another layer of metadata to describe your metadata.” They point out that annotations, such as reviews of products, become content that can itself be summarized and described by other data. The number of stars a reviewer gives a product is aggregated with the feedback of other reviewers to produce an average rating, which is metadata about both the product and the individual reviews on which it is based.
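The aggregation of star ratings can be sketched in a few lines of Python.  This is a minimal illustration, and the `rating_summary` function and its field names are hypothetical: each star rating is data within a review, while the summary computed from the ratings acts as metadata about the product.

```python
from statistics import mean

def rating_summary(star_ratings):
    """Aggregate individual star ratings into product-level metadata."""
    if not star_ratings:
        return {"review_count": 0, "average_stars": None}
    return {
        "review_count": len(star_ratings),
        "average_stars": round(mean(star_ratings), 1),
    }

print(rating_summary([5, 4, 5]))
```

The summary could in turn be described by a further layer, such as the date it was last recalculated, which is the extensibility the authors describe.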

Arguably, the rise of social interaction with nearly all facets of content merits an expansion of metadata concepts. By convention, information standards divide metadata into three categories: structural metadata, administrative metadata and descriptive metadata. But one academic body suggests a fourth type of metadata they call “use metadata,” defined as “metadata collected from or about the users themselves (e.g., user annotations, number of people accessing a particular resource).” Such metadata would blend elements of administrative and descriptive metadata relating to readers, rather than authors.

Open Data and Open Metadata

Open data is another data dimension of interest to content strategy. Often people assume open data refers to numeric data, but it is more helpful to think of open data as the re-use of facts.

Open data offers a rich range of affordances, including the ability to discover and use other people’s data, and the ability to make your data discoverable and available to others. Because of this emphasis on the exchange of data, how this data is described and specified is important. In particular, transparency and use rights are key concerns, because administrative metadata is a weak point of open data.

Unfortunately, discussion of open data often focuses on the technical accessibility of data to systems, rather than the utility of data to end-users. There is an emphasis on data formats, but not on vocabularies to describe the data. Open data promotes the use of open formats that are non-proprietary. While important, this focus misses the criticality of having shared understandings of what the data represents.

To the content strategist, the absence of guidelines for metadata standards is a shortcoming in the open data agenda. This problem was recognized in a recent editorial in the Semantic Web Journal entitled “Five Stars of Linked Data Vocabulary Use.” Its authors note: “When working with data providers and software engineers, we often observe that they prefer to have control over their local vocabulary instead of importing a wide variety of (often under-specified, not regularly maintained) external vocabularies.” In other words, because there is not a commonly agreed and used metadata standard, people rely on proprietary ones instead, even when they publish their data openly, which has the effect of limiting the value of that data. They propose a series of criteria to encourage the publication of metadata about vocabulary used to describe data, and the provision of linkages between different vocabularies used.

Classifying Openness

Whether data is truly open depends on how freely available the data is, and whether the metadata vocabulary (markup) used to describe it is transparent. In contrast to the Open Data Five Star frameworks, I view how proprietary the data is as a decisive consideration. Data can be either open or proprietary, and the metadata used to describe the data can be based either on an open or proprietary standard. Not all data that is described as “Open” is in fact non-proprietary.

What is proprietary? For data and metadata, the criteria for what is non-proprietary can be ambiguous, unlike with creative content, where the Creative Commons framework governs rights for use and modifications. Modification of data and its metadata is of less concern, since such modifications can destroy the re-use value of the content. Practicality of data use and metadata visibility are the central concerns. To untangle various issues, I will present a tentative framework, recognizing that some distinctions are difficult to make. The degree to which data and metadata are proprietary often reflects how much control the body responsible for this information exerts. Generally, data and metadata standards that are collectively managed are more open than those managed by a single firm.

Data

We can grade data into three degrees, based on how much control is applied to its use:

  1. Freely available open data
  2. Published but copyrighted data
  3. Selectively disclosed data

Three criteria are relevant:

  1. Is all the data published?
  2. Does a user need to request specific data?
  3. Are there limits on how the data can be used?
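
The three criteria can be read as a small decision procedure. The following is a minimal Python sketch of one possible reading, not a definitive rule: the article does not spell out how each answer maps to a degree, so the mapping below is an assumption. The names `DataProfile`, `DataOpenness`, and `grade_data` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class DataOpenness(Enum):
    """The three degrees of data openness listed above."""
    FREELY_AVAILABLE = "Freely available open data"
    PUBLISHED_COPYRIGHTED = "Published but copyrighted data"
    SELECTIVELY_DISCLOSED = "Selectively disclosed data"


@dataclass(frozen=True)
class DataProfile:
    """Answers to the three criteria for one data source (hypothetical type)."""
    all_published: bool     # Is all the data published?
    request_required: bool  # Does a user need to request specific data?
    use_restricted: bool    # Are there limits on how the data can be used?


def grade_data(profile: DataProfile) -> DataOpenness:
    # Assumed mapping: partial publication or a request step means the
    # publisher decides what to disclose, so the data is selectively disclosed.
    if not profile.all_published or profile.request_required:
        return DataOpenness.SELECTIVELY_DISCLOSED
    # Fully published, but with limits on use.
    if profile.use_restricted:
        return DataOpenness.PUBLISHED_COPYRIGHTED
    # Fully published, no request needed, no limits on use.
    return DataOpenness.FREELY_AVAILABLE


# Hypothetical publisher: everything is published and no request is needed,
# but reuse is limited by terms of use.
print(grade_data(DataProfile(True, False, True)).value)
```

Checking disclosure before use rights is a design choice: data that cannot be obtained in full is treated as controlled regardless of what its license says.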

If factual data is embedded within other content (for example, using RDFa markup within articles), it is possible that only the data is freely available to re-use, while the surrounding contextual content is not. Factual data cannot be copyrighted in the United States, but may under certain conditions be subject to protection in the EU when a significant investment was made in collecting these facts.

Rights management and rights clearance for open data are areas of ongoing (if inconclusive) deliberation among commercial and fee-funded organizations. The BBC, for example, contributes open data for wider community use but generally retains the copyright on its content. More and more organizations are making their data discoverable by adopting open metadata standards, but the extent to which they sanction the re-use of that data for purposes different from its original intention is not always clear. In many cases, everyday practices concerning data re-use are evolving ahead of official policies defining what is and is not permitted.

Metadata

Metadata is either open or proprietary. Metadata is open when the structure and vocabulary that describe the data are fully published and available for anyone to use for their own purposes. The metadata is intended to be a standard that anyone can use. Ideally, users of the standard can link their own data, described with this metadata vocabulary, to data sets elsewhere. This ability to describe and link one’s own data distinguishes open metadata from proprietary metadata standards.

Metadata is proprietary when the schema is not published or is only partially published, or when the metadata restricts a person’s ability to define their own data using the vocabulary.
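
The open versus proprietary test for metadata can be expressed as a simple predicate. This is a minimal Python sketch that follows the two conditions stated above; the names `MetadataProfile` and `is_open_metadata` are hypothetical, and a real assessment would likely need more nuance than two boolean answers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataProfile:
    """Two yes/no facts about a metadata vocabulary (hypothetical type)."""
    schema_fully_published: bool  # False if unpublished or only partially published
    usable_for_own_data: bool     # Can anyone describe their own data with it?


def is_open_metadata(profile: MetadataProfile) -> bool:
    # Open only if both conditions hold; failing either one makes the
    # metadata proprietary under the definitions given above.
    return profile.schema_fully_published and profile.usable_for_own_data


# A hypothetical vocabulary whose schema is only partially published.
partially_published = MetadataProfile(schema_fully_published=False,
                                      usable_for_own_data=True)
print("open" if is_open_metadata(partially_published) else "proprietary")
```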

Examples

Freely Available Open Data

  • With Open Metadata. Open data published using a publicly available, non-proprietary markup. Many standards organizations are creating open metadata vocabularies. Examples include public content marked up in Schema.org or NewsML. These are publicly available standards without restrictions on use, although some standards bodies have closed participation: Google, Yahoo, and Bing decide what vocabulary to include in Schema.org, for example.
  • With Proprietary Metadata. It may seem odd to publish your data openly but use proprietary markup. However, organizations may choose to use a proprietary markup if they feel a good public one is not available. Non-profit organizations might use OpenCalais, a markup service available for free, which is maintained by Reuters. Much of this markup is based on open standards, but it also uses identifiers that are specific to Reuters.

Published But Copyrighted Data

  • With Open Metadata. This situation is common with organizations that make their content available through a public API. They publish the vocabularies used to describe the data and may use common standards, but they maintain the rights to the content. Anyone wishing to use the content must agree to the terms of use for the content. An example would be NPR’s API.
  • With Proprietary Metadata. Many organizations publish content using proprietary markup to describe their data. This situation encourages web-scraping by others to unlock the data. Sometimes publishers may make their content available through an API, but they retain control over the metadata itself. Amazon’s ASIN product metadata would be an example: other parties must rely on Amazon to supply this number.

Selectively Disclosed Proprietary Data

  • With Open Metadata. Just because a firm uses a data vocabulary that’s been published and is available for others to use, it doesn’t mean that the firm is willing to share its own data. Many firms use metadata standards because it is easier and cheaper to do so than to develop their own. Facebook, for example, has published its Open Graph schema to encourage others to use it so that content can be read by Facebook applications. But Facebook retains control over the actual data generated by the markup.
  • With Proprietary Metadata. This applies to any situation where firms have limited or no incentive to share data. Customer data is often in this category.
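
Taken together, the examples above form a three-by-two grid of data openness against metadata openness. As a rough illustration, the grid can be held in a lookup table. The entries are the examples named in this section; the dictionary layout and the helper `examples_for` are hypothetical.

```python
# Keys are (data degree, metadata type); values are the examples named above.
OPENNESS_EXAMPLES = {
    ("freely available", "open"): ["Schema.org", "NewsML"],
    ("freely available", "proprietary"): ["OpenCalais"],
    ("published but copyrighted", "open"): ["NPR's API"],
    ("published but copyrighted", "proprietary"): ["Amazon ASIN"],
    ("selectively disclosed", "open"): ["Facebook Open Graph"],
    ("selectively disclosed", "proprietary"): ["customer data"],
}


def examples_for(data_degree: str, metadata_type: str) -> list[str]:
    """Return the examples for one cell of the grid, or an empty list."""
    key = (data_degree.lower(), metadata_type.lower())
    return OPENNESS_EXAMPLES.get(key, [])


print(examples_for("Published but copyrighted", "Open"))
```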

Taking Action on Data

Try to do more with the data in your content. Think about how to enable audiences to take actions on the data, or how to have your systems take actions to spare your audiences unnecessary effort. Data needs to be designed, just like other elements of content. Making this investment will allow your organization to reuse the data in more contexts.

— Michael Andrews