Categories
Intelligent Content

A Visual Approach to Learning Schema.org Metadata

Everyone involved with publishing web content, whether a writer, designer, or developer, should understand how metadata can describe content. Unfortunately, web metadata has a reputation, not entirely undeserved, for being a beast to understand. My book, Metadata Basics for Web Content, explains the core concepts of metadata. This post is for those ready to take the next step: to understand how a metadata standard relates to their specific content.

Visualizing Metadata

How can web teams make sense of voluminous and complex metadata documentation?  Documentation about web metadata is generally written from a developer perspective, and can be hard for non-techies to comprehend. When relying on detailed documentation, it can be difficult for the entire web team to have a shared understanding of what metadata is available.  Without such a shared understanding, teams can’t have a meaningful discussion of what metadata to use in their content, and how to take advantage of it to support their content goals.

The good news is that metadata can be visualized.  I want to show how anyone can do this, with specific reference to schema.org, the most important web metadata standard today. The technique can be useful not only for content and design team members who lack a technical background, but also for developers.

Everyone who works with a complex metadata standard such as schema.org faces common challenges:

  1. A large and growing volume of entities and properties to be aware of
  2. Cases where entities and properties sometimes have overlapping roles that may not be immediately apparent
  3. Terminology that can be misunderstood unless the context is comprehended correctly
  4. The prevalence of many horizontal linkages between entities and properties, making navigation through documentation a pogo-like experience.

First, team members need to understand what kinds of things associated with their content can be described by a metadata standard.  Things mentioned in content are called entities.  Entities have properties.  A property either holds a value (such as a name or a price) or expresses the relationship of one entity to another.

Entities are classified according to types, which range from general to specific.  Entity types form a hierarchy that can be expressed as a tree.  All entity types descend from the most general type, called Thing.  Currently, schema.org has over 600 entity types.  Dan Brickley, an engineer at Google who is instrumental in the development of schema.org, has helpfully developed an interactive visualization in D3 (a JavaScript library for data visualization), presented as a radial tree, which shows the distribution of entity types within schema.org.  The tool is a helpful way to explore the scope of entities addressed and the different levels of granularity available.

Screenshot of entity tree, available at http://bl.ocks.org/danbri/raw/1c121ea8bd2189cf411c/
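The hierarchy idea can be made concrete with a few lines of code. The sketch below, in Python, hand-codes a small subset of schema.org types with their parent types (a tiny fraction of the full vocabulary) and walks any type up to Thing. It is an illustration only, not a way to load the real vocabulary.

```python
# A small, hand-picked subset of the schema.org type hierarchy,
# mapping each type to its parent. The real vocabulary has hundreds more.
PARENT = {
    "CreativeWork": "Thing",
    "Movie": "CreativeWork",
    "Review": "CreativeWork",
    "Intangible": "Thing",
    "Offer": "Intangible",
    "Rating": "Intangible",
    "AggregateRating": "Rating",
    "Organization": "Thing",
    "Person": "Thing",
    "Product": "Thing",
}


def lineage(type_name):
    """Return the path from a type up to the root type, Thing."""
    path = [type_name]
    while path[-1] != "Thing":
        path.append(PARENT[path[-1]])
    return path


def children(type_name):
    """Return the direct subtypes of a type, within this subset only."""
    return sorted(child for child, parent in PARENT.items() if parent == type_name)


print(" > ".join(reversed(lineage("AggregateRating"))))
# Thing > Intangible > Rating > AggregateRating
```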

D3 is a great visualization library, but it requires both knowledge and time to code.  For our second kind of visualization, we’ll rely on a much simpler tool.

Graphs of Linked Data

Web metadata can connect or link different items of information together, forming a graph of knowledge.  Graphs are ideal to visualize.  By visualizing this structure, content teams can see how entities have properties that relate to other entities, or that have different kinds of values.  This kind of visualization is known as a concept map.

Let’s visualize a common topic for web content: product information.  Many things can be said about a product: who it is from, what it is like, and how much it costs.  I’ve created the graph below using an affordable and easy-to-use concept mapping app called Conceptorium (though other graphic tools can be used).  Working from the schema.org documentation for products, I’ve identified some common properties and relationships for products.  Entities (things described with metadata) are in green boxes, while literal values (data you might see about them) are in salmon-colored boxes.  Properties (attributes or qualities of things) are represented by lines with arrows, with the name of the property next to the line.

Concept map of schema.org entities and properties related to products
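Underneath the drawing, a concept map is a list of statements, each linking a subject to an object through a property. The minimal Python sketch below stores a simplified fragment of a product map that way; the node IDs and values (such as "product" and "Example tractor") are hypothetical. Wrapping literal values in a small `Literal` class lets the code separate the entity boxes from the value boxes, mirroring the green and salmon colors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A plain data value (a salmon box), as opposed to an entity (a green box)."""
    value: str


# (subject, property, object) statements; node IDs and values are hypothetical.
TRIPLES = [
    ("product", "name", Literal("Example tractor")),
    ("product", "color", Literal("Green")),
    ("product", "aggregateRating", "rating"),
    ("rating", "ratingValue", Literal("4.2")),
    ("offer", "itemOffered", "product"),
    ("offer", "price", Literal("25000")),
    ("product", "manufacturer", "maker"),
]


def edges(kind):
    """Return entity-to-entity links (kind="entity") or entity-to-value links (kind="literal")."""
    want_literal = kind == "literal"
    return [t for t in TRIPLES if isinstance(t[2], Literal) == want_literal]


for subject, prop, obj in edges("entity"):
    print(f"{subject} --{prop}--> {obj}")
```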

The graph illustrates some key issues in schema.org that web teams need to understand:

  • The boundary between different entity types that address similar properties
  • The difference between different instances of the same entity type
  • The directional relationships of properties.

Entity Boundaries

Concept maps help us see the boundaries between related entity types.  A product, shown in the center of our graph, has various properties, such as a name, a color, and an average user rating (AggregateRating).  But when the product is offered for sale, properties associated with the conditions of sale need to be expressed through the Offer entity.  So in schema.org, products don’t have prices or warranties; offers do.  Schema.org also allows publishers to express an offer without providing granular details about a product.  Because the Offer and Product entity types both use the name and product code (referred to as gtin14) properties, publishers can note the name and product code in the offer together with the price, without needing to use the Product entity type at all.  So when discussing a product, the team needs to decide if the content is mostly about the terms of sale (the Offer), the features of the product (the Product), or both.
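In JSON-LD markup, the two options described above might look like this. The sketch builds both as Python dictionaries; the product name, GTIN-14, and price are placeholders, not real data. Note that `price` and `priceCurrency` sit on the Offer in both versions, never directly on the Product.

```python
import json

# Option 1: an Offer on its own. "name" (inherited from Thing) and "gtin14"
# both exist on Offer, so no Product entity is needed. Values are placeholders.
offer_only = {
    "@context": "https://schema.org",
    "@type": "Offer",
    "name": "Example garden tractor",
    "gtin14": "00012345678905",
    "price": "2499.00",
    "priceCurrency": "USD",
}

# Option 2: a Product describing features, with the terms of sale in a
# nested Offer. The price lives on the Offer, not on the Product.
product_with_offer = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example garden tractor",
    "gtin14": "00012345678905",
    "color": "Green",
    "offers": {"@type": "Offer", "price": "2499.00", "priceCurrency": "USD"},
}

print(json.dumps(product_with_offer, indent=2))
```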

Instances and Entity Types

Concept maps help us distinguish different instances of entities, as well as cases where instances are performing different roles. From the graph, we can see that a product can be related to other products.  This can be hard to grasp in the documentation, where an entity type is presented as both the subject and the object of various properties.  Graphs can show how there can be different product instances that may have different values for the same properties (e.g., all products have a name, but each product has a different name).  In our example, we can see that one product at the bottom right is a competitive product to the product in the center.  We can compare the average rating of the competitor product with the average rating of the main product.  We can also see another related product, which is an accessory for the main product.  This relationship can help identify products to display as complements.
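Here is a hedged sketch of those relationships as JSON-LD-style Python dictionaries. `isSimilarTo`, `isAccessoryOrSparePartFor`, and `aggregateRating` are schema.org Product properties; choosing `isSimilarTo` to express the competitor link is an editorial assumption, and the names, IDs, and ratings are invented for illustration. The two helper functions are likewise hypothetical: one compares the same property across two instances, the other finds accessories by following the link back to the main product.

```python
# Hypothetical instances. isSimilarTo and isAccessoryOrSparePartFor are
# schema.org Product properties; using isSimilarTo for a competitor is an
# editorial assumption.
main = {
    "@type": "Product", "@id": "#tractor-a", "name": "Tractor A",
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": 4.3},
}
competitor = {
    "@type": "Product", "@id": "#tractor-b", "name": "Tractor B",
    "isSimilarTo": {"@id": "#tractor-a"},
    "aggregateRating": {"@type": "AggregateRating", "ratingValue": 3.9},
}
accessory = {
    "@type": "Product", "@id": "#mower-deck", "name": "Mower deck",
    "isAccessoryOrSparePartFor": {"@id": "#tractor-a"},
}


def better_rated(a, b):
    """Compare the same property (ratingValue) on two different instances."""
    ra = a["aggregateRating"]["ratingValue"]
    rb = b["aggregateRating"]["ratingValue"]
    return a["name"] if ra >= rb else b["name"]


def accessories_for(product, catalog):
    """Find catalog items that point at `product` as their main product."""
    return [
        item["name"]
        for item in catalog
        if item.get("isAccessoryOrSparePartFor", {}).get("@id") == product["@id"]
    ]


print(better_rated(main, competitor))                        # Tractor A
print(accessories_for(main, [main, competitor, accessory]))  # ['Mower deck']
```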

An entity type provides a list of properties available to describe something.  Web content may discuss numerous related things that all belong to the same entity type.  In our example, we see several instances of the Organization entity type.  In one case, an Organization owns a product (perhaps a tractor).  In another case, the Organization is a seller.  In a third case, the Organization is a manufacturer of the product. Organizations can have different roles relating to an entity.

Content teams need to identify in their metadata which Organizations are responsible for which role.  Is the seller the manufacturer of the product, or are two different Organizations involved?  Our example illustrates how a single Person can be both an owner and a seller of a Product.
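A minimal sketch of role assignment, assuming a single JSON-LD `@graph` in which each node has a hypothetical `@id`. `manufacturer`, `seller`, and `owns` are schema.org properties. Because roles are expressed as references to IDs, checking whether the seller is also the manufacturer, or whether the seller also owns the product, becomes a simple comparison. The `ref` helper is illustrative only.

```python
from typing import Optional

# Hypothetical IDs and names. "manufacturer" (on Product), "seller" (on Offer)
# and "owns" (on Person or Organization) are schema.org properties.
doc = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "@id": "#acme", "name": "Acme Machinery"},
        {"@type": "Person", "@id": "#pat", "name": "Pat Smith",
         "owns": {"@id": "#tractor-a"}},
        {"@type": "Product", "@id": "#tractor-a", "name": "Tractor A",
         "manufacturer": {"@id": "#acme"}},
        {"@type": "Offer", "@id": "#offer-1", "itemOffered": {"@id": "#tractor-a"},
         "seller": {"@id": "#pat"}, "price": "8000", "priceCurrency": "USD"},
    ],
}

# Index the graph by @id so references can be followed.
nodes = {node["@id"]: node for node in doc["@graph"]}


def ref(node_id: str, prop: str) -> Optional[str]:
    """Follow one property from a node; return the @id it points at, if any."""
    target = nodes[node_id].get(prop)
    return target["@id"] if target else None


seller = ref("#offer-1", "seller")
print("Seller is also manufacturer:", seller == ref("#tractor-a", "manufacturer"))  # False
print("Seller also owns the product:", ref(seller, "owns") == "#tractor-a")  # True
```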

What Properties Mean

Concept maps can help web teams see what properties really represent.  Each line with an arrow has a label, which is the name of the property associated with an entity type.  Properties have a direction, indicated by the arrow.  The names of properties don’t always directly translate into an English verb, even when they at first appear to.  For example, in English, Product > manufacturer > Organization doesn’t make much sense. The product doesn’t make the organization, but rather the organization manufactures the product.  It’s important to pay attention to the direction of a property and to the entity type it expects as its value, especially when these relationships seem inverted compared with how we normally think about them.

Many properties are adjectives or even nouns, and need helper verbs such as “has” to make sense.  If the property describes another entity, then that entity can involve many more properties to describe additional dimensions of that entity.  So we might say that “a Product has a manufacturer which is an Organization (having a name, address, etc.)”  That’s not very elegant in English, but the diagram keeps the focus on the nature of the relationships described.
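The “has” reading can even be automated. This small Python function, a hypothetical helper rather than anything provided by schema.org, always starts the sentence from the subject of the property, which keeps the direction of the arrow intact even when the English verb would naturally run the other way.

```python
def indefinite(word):
    """Prefix "a" or "an" using the first letter (a rough heuristic)."""
    return ("an " if word[0].lower() in "aeiou" else "a ") + word


def read_aloud(subject_type, prop, value_type):
    """Read a directed property as an English sentence built around "has".

    The sentence always starts from the subject, following the arrow, even
    when the natural English verb runs the other way.
    """
    subject = indefinite(subject_type)
    return f"{subject[0].upper()}{subject[1:]} has {indefinite(prop)}, which is {indefinite(value_type)}."


print(read_aloud("Product", "manufacturer", "Organization"))
# A Product has a manufacturer, which is an Organization.
```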

Broader Benefits of Concept Mapping for Content Strategy

So far, we’ve discussed how concept maps can help web teams understand what the metadata means, and how they need to organize their metadata descriptions.  Concept maps can also help web teams plan their content.  Teams can use maps to decide what content to present to audiences, and even what content to create that audiences may be interested in.

Content Planning

Jarno van Driel, a Dutch SEO expert, notes that many publishers treat schema.org as “an afterthought.”  Instead, Jarno argues, publishers should consult the properties available in schema.org to plan their content.  Schema.org is a collective project, where contributors propose properties for the entities they want to describe, based on what they believe audiences will find interesting.  Schema.org can be thought of as a blueprint for information you can provide audiences about different things you publish.  While our example concept map for product properties is simplified to conserve space, a more complete map would show many more properties, some of which you might decide to address in your content.  For example, audiences might want to know about the material, the width, or the weight of the product, all properties available in schema.org that publishers may not have considered including in their content.
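One way to use schema.org as a planning blueprint is a simple gap check: compare the properties a type offers with those your content already covers. In this sketch the property list is a small, hand-picked subset of what schema.org defines for Product, and the `covered` set is hypothetical.

```python
# A small, hand-picked subset of the properties schema.org defines for Product.
PRODUCT_PROPERTIES = {
    "name", "color", "material", "width", "weight",
    "gtin14", "aggregateRating", "manufacturer",
}

# Hypothetical: the properties a current product page already covers.
covered = {"name", "color", "gtin14", "aggregateRating"}

# Properties the content could address but currently does not.
gaps = sorted(PRODUCT_PROPERTIES - covered)
print(gaps)  # ['manufacturer', 'material', 'weight', 'width']
```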

Content Design and Interaction Design

Concept maps can also reveal relationships between different levels of information that publishers can present.  Consider how this information is displayed on the screen.  Audiences may want to compare different values. They may want to know all the values for a specific property (such as all the colors available), or they may want to compare the values for a property of two different instances (average rating of two different products).

Concept maps can reveal qualifications about the content (e.g., an Offer may be qualified by an area served).  Values (shown in salmon) can be sorted and ranked.  Concept maps also help web teams decide on the right level of detail to present.  Do they want to show average ratings for a specific product, or a brand overall?  By consulting the map, they can consider what data is available, and what data would be most useful to audiences.

Concept map app shows columns of entities and values, which allow exploration of relationships
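The two kinds of comparison mentioned above can be sketched in a few lines of Python. The records are hypothetical, flattened from Product markup for simplicity: one query gathers every value of a single property, the other ranks instances by the same property.

```python
# Hypothetical records, flattened from Product markup for simplicity.
products = [
    {"name": "Tractor A", "color": "Green", "ratingValue": 4.3},
    {"name": "Tractor A", "color": "Red", "ratingValue": 4.3},
    {"name": "Tractor B", "color": "Green", "ratingValue": 3.9},
]

# Every value of one property across instances (e.g., all colors available).
colors = sorted({p["color"] for p in products})

# One property compared across instances, ranked from highest to lowest.
ranking = sorted(
    {(p["name"], p["ratingValue"]) for p in products},
    key=lambda pair: pair[1],
    reverse=True,
)

print(colors)   # ['Green', 'Red']
print(ranking)  # [('Tractor A', 4.3), ('Tractor B', 3.9)]
```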

Conclusion

Creating a concept map requires effort, but is rewarding.  It requires you to compare the specification of the standard with your representation of it, to check that relationships are known and understood correctly.  It allows you to see some characteristics, such as properties used by more than one entity. It can help content teams see the bigger picture of what’s available in schema.org to describe their content, so that the team can collectively agree to metadata requirements relating to their web content.  If you want to understand schema.org more completely, to know how it relates to the content you publish, creating a concept map is a good place to start.

— Michael Andrews

Categories
Storytelling

Writers Should Care About Metadata

Writers and computers share a common trait: a fussiness about words.  Writers choose their words with care. Computers are selective about the words they notice as well.

Metadata helps computers understand writing.  Writers should care about metadata.  Metadata influences how their writing connects to audiences.  Metadata is an important editorial tool, though writers often don’t appreciate the value it offers.

I can hear some writers saying: “Hold on! — That’s not my job.  I don’t know anything about metadata — I studied literature in college.”  Metadata sounds like the antithesis of creative flair.  And in some ways it is.  I want to assure my friends who are writers that I’m not trying to turn them into geeks.  Instead, I want to suggest that by having a little understanding of the geeky side of content, they can be more successful as writers.

Metadata, put very simply, is computer code that explains the meaning of content.  That computer code can seem forbidding.  But such code offers practical benefits to writers, and helps make content more interesting.

Writers should think about metadata as a form of communication, just as pantomime and poetry are.  Metadata expresses ideas that are conveyed to audiences.

Metadata is a special form of communication, however.  Unlike pantomime, metadata is purpose-built for the web.

Metadata as Describing

The most common type of metadata is the description.  Most web articles have a META description: a short, pithy statement summarizing the article.  These statements often appear below the article title in Google search results.  How well they are written can influence whether someone clicks on the link to read the article.  Descriptions, by their nature, involve editorial decisions.
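For the technically curious, the META description is a single HTML tag in the page’s head. This Python sketch builds one and escapes characters such as “&” that would otherwise break the markup. The 160-character trim is an assumed rule of thumb, not a requirement of HTML or of Google, and the sample summary is invented.

```python
from html import escape


def meta_description(summary, limit=160):
    """Build a META description tag, trimming long text at a word boundary.

    The 160-character limit is an assumed rule of thumb, not a standard.
    """
    if len(summary) > limit:
        summary = summary[:limit].rsplit(" ", 1)[0] + "…"
    return f'<meta name="description" content="{escape(summary, quote=True)}">'


print(meta_description("Cézanne & Zola: a review of a new French film."))
# <meta name="description" content="Cézanne &amp; Zola: a review of a new French film.">
```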

Another important description relates to photographs.  Writers need to tell people what’s in a photograph.  If the description is boring and vague, why would people want to view the photo?  Describing visuals is becoming more important as people switch off their screens and have content read aloud to them.

Metadata plays a valuable editorial role.  It indicates what’s important about the content.  Let’s consider some areas where metadata can help writers.

Suppose you are a film critic.  You’ve quit a boring job writing training manuals about industrial equipment, and can finally use your literature degree in your work.  Even as a film critic, you can amplify your writing by using metadata.  Contrary to what you might expect, metadata can help writers tell stories.

Let’s imagine you want to review a new French film about the painter Paul Cézanne.  The first conundrum is deciding how to refer to the film.  Do you use the original French title, Cézanne et moi, or the translated English title, Cezanne and I?  Fortunately, by using metadata, you can skirt this decision by including the titles in both languages.  Metadata can indicate the language of content.  Someone in France could use language metadata to locate English-language reviews of this French film, and compare them with the French-language reviews.  Do French- and English-speaking critics rate the film in the same way?
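Here is a hedged sketch of the idea in JSON-LD, written as Python data. The `@value`/`@language` pairs are standard JSON-LD language-tagged strings, and `inLanguage` is a schema.org CreativeWork property; whether a particular search engine makes use of language-tagged titles is not guaranteed. The `@id`, the review entries, and the helper functions are hypothetical.

```python
# Language-tagged JSON-LD values, written as Python data. The @id and the
# review entries are hypothetical placeholders.
movie = {
    "@context": "https://schema.org",
    "@type": "Movie",
    "@id": "#cezanne-et-moi",
    "name": [
        {"@value": "Cézanne et moi", "@language": "fr"},
        {"@value": "Cezanne and I", "@language": "en"},
    ],
}

reviews = [
    {"@type": "Review", "inLanguage": "fr", "itemReviewed": {"@id": "#cezanne-et-moi"}},
    {"@type": "Review", "inLanguage": "en", "itemReviewed": {"@id": "#cezanne-et-moi"}},
    {"@type": "Review", "inLanguage": "en", "itemReviewed": {"@id": "#cezanne-et-moi"}},
]


def title_in(work, lang):
    """Return the title whose language tag matches lang."""
    return next(n["@value"] for n in work["name"] if n["@language"] == lang)


def reviews_in(lang):
    """Return the reviews written in a given language."""
    return [r for r in reviews if r["inLanguage"] == lang]


print(title_in(movie, "en"), len(reviews_in("en")))  # Cezanne and I 2
```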

Another decision might be how to categorize the theme of the film.  As a writer, you want your review to appear with other reviews about similar themes.  Is the film about friendship, or is it a buddy movie?  Terms like these are examples of what metadata calls controlled vocabulary values: an agreed list of terms used to classify content.  The writer needs to decide whether the theme is more about the friendship between two men, or about friendship generally.  The decision will influence who sees the review, based on their interests and expectations.
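A controlled vocabulary can be as simple as a lookup table. In this sketch the vocabulary and its variants are hypothetical; the point is that a writer’s free-text label is mapped to one preferred term, or rejected, so that reviews on the same theme are filed together.

```python
# A hypothetical theme vocabulary: each preferred term lists the free-text
# labels writers might use for it.
VOCABULARY = {
    "friendship": {"friendship", "friends"},
    "buddy film": {"buddy film", "buddy movie", "buddy picture"},
}


def to_theme(label):
    """Map a writer's label to its preferred term, or reject unknown labels."""
    cleaned = label.strip().lower()
    for preferred, variants in VOCABULARY.items():
        if cleaned in variants:
            return preferred
    raise ValueError(f"{label!r} is not in the theme vocabulary")


print(to_theme("Buddy Movie"))  # buddy film
```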

Metadata can describe many aspects of a film, such as all the cast and crew involved.  Writers might wonder, how interesting is all this information?

From the audience perspective, some information will be interesting to almost everyone, while other information is of interest only to committed fans.  For some, detailed information seems like a list of dry facts.  But for those who enjoy a film, the credits at the end provide extra value that enriches their experience.

We can see the different editorial dimensions of metadata in the IMDb entry for the Cézanne film.  (IMDb, the Amazon-owned database, uses metadata extensively.  But I’ll hide the code, and show only what’s presented on the screen.)

‘Storyline’ metadata from IMDb description of film Cezanne and I. (screenshot via IMDb)

First, we have the storyline, or plot summary.  Several sentences describe the film. To audiences, this is what’s important.  Does the film sound interesting or boring?  What is it really about (beyond friendship)?  Audiences need to know if the film is potentially interesting before they will care enough to read a critique of it.

Metadata and Prose

The metadata for the storyline is prose, in contrast to the list of names of cast members.  Some content strategists consider such prose an unstructured “blob” — long passages, full of details, that aren’t broken out into a list or table.  But it is a mistake to view prose content as being beyond the reach of metadata.  Structuring content by breaking it into sections is a separate activity from adding metadata to content. Writers don’t need to “structure” their content into a list, table or other tightly defined unit to take advantage of metadata. Writers can, and should, add metadata to their prose.  By doing so, they will highlight some of the most interesting material.  Metadata is not a straitjacket that limits how writers express their perspectives.  Writers can write words, sentences and paragraphs as they please, and then add metadata to highlight important people, places and things mentioned in their text.

We can see in the storyline that the film concerns not only the painter Paul Cézanne (which we knew from the film title), but also the writer Emile Zola.  After reading the storyline, people may be interested in learning more about the film, or may want to learn more about the subject of the film.  Metadata can link this review to other writings related to the film in some way.  Perhaps readers want to read reviews about other films concerning Paul Cézanne, or concerning the same time period.  Metadata acts as a curator: linking to writings on related topics.
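In markup, highlighting people named in prose might look like the sketch below. `mentions` (a CreativeWork property) and `itemReviewed` (a Review property) come from schema.org; the review text and the second review are hypothetical placeholders. Comparing the people two reviews mention is one simple way metadata could link related writing.

```python
# "mentions" and "itemReviewed" are schema.org properties. The review text
# and the second review are hypothetical placeholders.
review = {
    "@context": "https://schema.org",
    "@type": "Review",
    "itemReviewed": {"@type": "Movie", "name": "Cézanne et moi"},
    "reviewBody": "(Review prose naming Paul Cézanne and Émile Zola.)",
    "mentions": [
        {"@type": "Person", "name": "Paul Cézanne"},
        {"@type": "Person", "name": "Émile Zola"},
    ],
}

other_review = {
    "@type": "Review",
    "itemReviewed": {"@type": "Movie", "name": "Hypothetical Cézanne documentary"},
    "mentions": [{"@type": "Person", "name": "Paul Cézanne"}],
}


def mentioned(work):
    """Return the names of everything a piece of writing mentions."""
    return {m["name"] for m in work.get("mentions", [])}


# Shared mentions suggest the two reviews could be linked as related reading.
print(mentioned(review) & mentioned(other_review))  # {'Paul Cézanne'}
```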

‘Details’ metadata from IMDb entry for Cezanne and I film (screenshot via IMDb)

Let’s turn to the more fact-oriented metadata.   To many writers, this material is dull.  Because it is presented in a list or in a table, and deals with minutiae such as film duration and release date, the content seems to offer little editorial interest.  Unless you are a big fan of someone in the film, or collect obscure facts to win pub quizzes, why would someone care about these details?

Stories from Metadata

For the writer, such detailed metadata presents an opportunity to tell more stories.  It may not be immediately obvious, but some of the details are unusual, or notable for some reason.  Since these details are described in a way computers can understand, the writer can easily compare these details with details for other films.  The writer can tell readers what’s significant about the film — in terms of casting, location, historical firsts, or contribution to overall performance for different kinds of film.

Metadata offers writers a lens to think about different dimensions of a topic.  By identifying various characteristics, metadata highlights connections between works that share them.  This film is one of a number of friendship-themed movies that use the musical composition “Roses of Picardy” by Haydn Wood.  (Other films include A Passage to India and Charlie Brown’s Halloween special.)  What’s going on with this use of music?  There’s a story there, somewhere.
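The same curiosity can be applied systematically. This sketch groups films by a shared piece of music and keeps only pieces that appear in more than one film. The `soundtrack` field name is a hypothetical label for this example, not a schema.org property; the first three film entries come from the text above and list only the composition it mentions, while the fourth is invented for contrast.

```python
from collections import defaultdict

# The first three entries come from the text and list only the one piece it
# mentions; the fourth is invented for contrast. "soundtrack" is a
# hypothetical field name, not a schema.org property.
films = [
    {"title": "Cézanne et moi", "soundtrack": ["Roses of Picardy"]},
    {"title": "A Passage to India", "soundtrack": ["Roses of Picardy"]},
    {"title": "Charlie Brown's Halloween special", "soundtrack": ["Roses of Picardy"]},
    {"title": "Hypothetical film", "soundtrack": ["Some other piece"]},
]


def shared_music(records):
    """Map each composition used in more than one film to those films' titles."""
    index = defaultdict(list)
    for film in records:
        for piece in film["soundtrack"]:
            index[piece].append(film["title"])
    return {piece: titles for piece, titles in index.items() if len(titles) > 1}


for piece, titles in shared_music(films).items():
    print(f"{piece}: {', '.join(titles)}")
```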

Metadata can bring attention to details that might not otherwise be noticed.  Writers can use metadata to discover and highlight details of interest to audiences.

Metadata can be a writer’s friend. It can help writers tell stories. Writers, for their part, can help computers appreciate their words and ideas by using metadata.

To become friends with metadata, writers will want to know more about how to create metadata and include it in their content.  They can learn about how that’s done in my new book, Metadata Basics for Web Content.  Read the book, so the content you write will be content that is read.  Make it your job to identify metadata that will connect audiences with your writing.

— Michael Andrews