Writers Should Care About Metadata

Writers and computers share a common trait: a fussiness about words.  Writers choose their words with care. Computers are selective about the words they notice as well.

Metadata helps computers understand writing.  Writers should care about metadata.  Metadata influences how their writing connects to audiences.  Metadata is an important editorial tool, though writers often don’t appreciate the value it offers.

I can hear some writers saying: “Hold on! That’s not my job.  I don’t know anything about metadata.  I studied literature in college.”  Metadata sounds like the antithesis of creative flair.  And in some ways it is.  I want to assure my friends who are writers that I’m not trying to turn them into geeks.  Instead, I want to suggest that by having a little understanding of the geeky side of content, they can be more successful as writers.

Metadata, put very simply, is computer code that explains the meaning of content.  That computer code can seem forbidding.  But such code offers practical benefits to writers, and helps make content more interesting.

Writers should think about metadata as a form of communication, just as pantomime and poetry are.  Metadata expresses ideas that are conveyed to audiences.

Metadata is a special form of communication, however.  Unlike pantomime, metadata is purpose-built for the web.

Metadata as Describing

The most common type of metadata is the description.  Most web articles have META descriptions: short, pithy statements summarizing the article.  These statements often appear below the article title in Google search results.  How well they are written can influence whether someone clicks on the link to read the article.  Descriptions, by their nature, involve editorial decisions.
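For writers curious about what a META description looks like behind the scenes, here is a minimal sketch in Python.  It assumes a made-up page for a film review (the `SAMPLE_PAGE` text and the `find_description` helper are my own illustrations, not taken from any real site) and simply pulls out the description that a search engine might show.

```python
from html.parser import HTMLParser


class DescriptionFinder(HTMLParser):
    """Remembers the content of <meta name="description"> if the page has one."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name == "description":
            self.description = attributes.get("content")


# Hypothetical page head for a film review; the wording is illustrative only.
SAMPLE_PAGE = """
<html><head>
<title>Review: Cezanne and I</title>
<meta name="description" content="A review of Cezanne and I, a French film about the painter Paul Cezanne.">
</head><body></body></html>
"""


def find_description(html_text):
    """Return the META description of a page, or None if it has none."""
    finder = DescriptionFinder()
    finder.feed(html_text)
    return finder.description


print(find_description(SAMPLE_PAGE))
```

The description is one short attribute in the page, but the sentence inside it is an editorial choice the writer makes.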

Another important description relates to photographs.  Writers need to tell people what’s in a photograph.  If the description is boring and vague, why would people want to view the photo?  Describing visuals is becoming more important as people switch off their screens, and have content read aloud to them.

Metadata plays a valuable editorial role.  It indicates what’s important about the content.  Let’s consider some areas where metadata can help writers.

Suppose you are a film critic.  You’ve quit a boring job writing training manuals about industrial equipment, and can finally use your literature degree in your work.  Even as a film critic, you can amplify your writing by using metadata.  Contrary to what you might expect, metadata can help writers tell stories.

Let’s imagine you want to review a new French film about the painter Paul Cézanne.  The first conundrum is deciding how to refer to the film.  Do you use the original French title, Cézanne et moi, or the translated English title, Cezanne and I?  Fortunately, by using metadata, you can skirt this decision by including the titles in both languages.  Metadata can indicate the language of content.  Someone in France could use language metadata to locate English-language reviews of this French film, and compare them with the French-language reviews.  Do French-speaking and English-speaking critics rate the film in the same way?
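Here is a rough sketch of how both titles and the review language might be recorded.  It uses Python to build a JSON-LD description with the schema.org types Review and Movie and the `inLanguage` property.  The overall structure is my assumption about one reasonable way to express this, not a recipe any particular search engine is known to require, and `titles_by_language` is a hypothetical helper.

```python
import json

# Illustrative metadata for an English-language review of a French film.
review = {
    "@context": "https://schema.org",
    "@type": "Review",
    "inLanguage": "en",  # language the review is written in
    "itemReviewed": {
        "@type": "Movie",
        "inLanguage": "fr",  # language of the film itself
        # JSON-LD lets a value carry a language tag, so both titles can coexist.
        "name": [
            {"@language": "fr", "@value": "Cézanne et moi"},
            {"@language": "en", "@value": "Cezanne and I"},
        ],
    },
}


def titles_by_language(review_data):
    """Map each language code to the film title recorded in that language."""
    names = review_data["itemReviewed"]["name"]
    return {entry["@language"]: entry["@value"] for entry in names}


print(json.dumps(review, ensure_ascii=False, indent=2))
print(titles_by_language(review))
```

With both titles recorded, the writer is free to use whichever one reads best in the review itself.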

Another decision might be how to categorize the theme of the film.  As a writer, you want your review to appear with other reviews about similar themes.  Is the film about friendship, or is it a buddy movie?   These terms relate to a concept  in metadata called controlled vocabulary values.  The writer needs to decide whether the theme is more about the friendship between two men, or about friendship generally.  The decision will influence who sees the review, based on their interests and expectations.

Metadata can describe many aspects of a film, such as all the cast and crew involved.  Writers might wonder, how interesting is all this information?

From the audience perspective, some information will be interesting to almost everyone, while other information is of interest only to committed fans.  For some, detailed information seems like a list of dry facts.  But for those who enjoy a film, the credits at the end provide extra value that enriches their experience.

We can see the different editorial dimensions of metadata in the IMDb entry for the Cézanne film.  (IMDb, the Amazon-owned database, uses metadata extensively.  But I’ll hide the code, and show only what’s presented on the screen.)

‘Storyline’ metadata from IMDb description of film Cezanne and I. (screenshot via IMDb)

First, we have the storyline, or plot summary.  Several sentences describe the film. To audiences, this is what’s important.  Does the film sound interesting or boring?  What is it really about (beyond friendship)?  Audiences need to know if the film is potentially interesting before they will care enough to read a critique of it.

Metadata and Prose

The metadata for the storyline is prose, in contrast to the list of names of cast members.  Some content strategists consider such prose an unstructured “blob”: long passages, full of details, that aren’t broken out into a list or table.  But it is a mistake to view prose content as being beyond the reach of metadata.  Structuring content by breaking it into sections is a separate activity from adding metadata to content. Writers don’t need to “structure” their content into a list, table or other tightly defined unit to take advantage of metadata. Writers can, and should, add metadata to their prose.  By doing so, they will highlight some of the most interesting material.  Metadata is not a straitjacket that limits how writers express their perspectives.  Writers can write words, sentences and paragraphs as they please, and then add metadata to highlight important people, places and things mentioned in their text.

We can see in the storyline that the film concerns not only the painter Paul Cézanne (which we knew from the film title), but also the writer Emile Zola.  After reading the storyline, people may be interested in learning more about the film, or may want to learn more about the subject of the film.  Metadata can link this review to other writings related to the film in some way.  Perhaps readers want to read reviews about other films concerning Paul Cézanne, or concerning the same time period.  Metadata acts as a curator: linking to writings on related topics.
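To make the idea of metadata inside prose concrete, here is a small sketch.  It takes an ordinary sentence (one I wrote for illustration, not the IMDb storyline) and wraps the people it mentions in RDFa markup using the schema.org Person type.  The helper names `mark_person` and `annotate` are hypothetical, and the simple text matching is a toy; a real publishing tool would be more careful.

```python
import html


def mark_person(text, name):
    """Wrap the first mention of `name` in RDFa markup for a schema.org Person."""
    escaped_name = html.escape(name)
    tagged = (
        '<span typeof="Person"><span property="name">'
        + escaped_name
        + "</span></span>"
    )
    return text.replace(escaped_name, tagged, 1)


def annotate(sentence, names):
    """Return the sentence as a paragraph with each named person marked up."""
    body = html.escape(sentence)
    for name in names:
        body = mark_person(body, name)
    # vocab tells consumers that the types and properties come from schema.org.
    return '<p vocab="https://schema.org/">' + body + "</p>"


# Illustrative sentence; the wording stays exactly as the writer chose it.
sentence = "The film follows the painter Paul Cézanne and the writer Emile Zola."
print(annotate(sentence, ["Paul Cézanne", "Emile Zola"]))
```

The sentence reads the same to a human.  Only the markup around the two names changes, which is what lets a computer recognize them as people.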

‘Details’ metadata from IMDb entry for Cezanne and I film (screenshot via IMDb)

Let’s turn to the more fact-oriented metadata.   To many writers, this material is dull.  Because it is presented in a list or in a table, and deals with minutiae such as film duration and release date, the content seems to offer little editorial interest.  Unless you are a big fan of someone in the film, or collect obscure facts to win pub quizzes, why would you care about these details?

Stories from Metadata

For the writer, such detailed metadata presents an opportunity to tell more stories.  It may not be immediately obvious, but some of the details are unusual, or notable for some reason.  Since these details are described in a way computers can understand, the writer can easily compare these details with details for other films.  The writer can tell readers what’s significant about the film — in terms of casting, location, historical firsts, or contribution to overall performance for different kinds of film.

Metadata offers writers a lens to think about different dimensions of a topic.  By identifying various characteristics, metadata highlights connections between two or more of them.  This film is one of a number of friendship-themed movies that use the musical composition “Roses of Picardy” by Haydn Wood.  (Other films include A Passage to India, and Charlie Brown’s Halloween special.)  What’s going on with this use of music?  There’s a story there, somewhere.
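Once details are recorded as metadata, spotting a connection like this one takes very little work.  The sketch below is a toy: the `FILMS` list holds only the three titles mentioned above, in a layout I made up for illustration, and `shared_music` is a hypothetical helper that reports any piece of music used by more than one film.

```python
from collections import defaultdict

# Toy records limited to the films named in the text; the layout is illustrative.
FILMS = [
    {"title": "Cezanne and I", "music": ["Roses of Picardy"]},
    {"title": "A Passage to India", "music": ["Roses of Picardy"]},
    {"title": "Charlie Brown's Halloween special", "music": ["Roses of Picardy"]},
]


def films_by_music(films):
    """Group film titles under each piece of music they use."""
    index = defaultdict(list)
    for film in films:
        for piece in film["music"]:
            index[piece].append(film["title"])
    return dict(index)


def shared_music(films):
    """Keep only the pieces of music that appear in more than one film."""
    return {
        piece: titles
        for piece, titles in films_by_music(films).items()
        if len(titles) > 1
    }


print(shared_music(FILMS))
```

A real film database would hold many more records, but the principle is the same: the comparison is cheap, and the writer supplies the story.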

Metadata can bring attention to details that might not otherwise be noticed.  Writers can use metadata to discover and highlight details of interest to audiences.

Metadata can be a writer’s friend. It can help writers tell stories. Writers, for their part, can help computers appreciate their words and ideas by using metadata.

To become friends with metadata, writers will want to know more about how to create metadata and include it in their content.  They can learn about how that’s done in my new book, Metadata Basics for Web Content.  Read the book, so the content you write will be content that is read.  Make it your job to identify metadata that will connect audiences with your writing.

— Michael Andrews

Why Standards Compliance is a Tricky Notion

I just published a book about metadata, called Metadata Basics for Web Content.  The book refers to many standards, and provides samples of code illustrating metadata (or structured data, if you prefer) using these standards.  To locate good code examples, I relied on international organizations such as the W3C, industry working groups such as schema.org, and prominent companies such as Google.

All these sources are important ones for publishers to consult.  But if you pay very close attention, you may notice that the various sources aren’t always completely aligned with one another. This is a bit disconcerting. Publishers, after all, are expected to comply with standards. Various standards reference and build on each other. But certain details are different as you move between different actors in the standards arena. How can that be, that standards aren’t completely aligned?  To answer that question one must consider the governance, mission, and adoption goals of various parties involved with standards.

Publishers should recognize that no one party is in charge of metadata standards. Many parties are involved.  Decisions and practices evolve organically through a combination of planning and adaptation.  Different parties offer different choices.

The W3C is the largest standards body addressing web content.  It has a fairly open structure.  If there is sufficient interest in a topic, and enough people volunteer to work on a standards issue, then a group can be started, which can begin a process of drafting notes, recommendations, and eventually standards.  The W3C doesn’t always initiate standards.  Sometimes it embraces standards that have been developed by other groups.  And sometimes the W3C has different groups addressing broadly similar issues, but in different ways.  While W3C recommendations and standards carry tremendous weight, they do not always represent a single consensus about priorities.  Generally, they skew toward accommodating a diverse range of needs, rather than enforcing a narrow set of practices.  As a nonprofit body, the W3C isn’t marketing anything, or promoting adoption of one standard over another.

Many industry groups develop standards as well.  An important one in the area of web content metadata is called schema.org.  This group started out as a partnership between search engine companies, namely Google, Bing, Yahoo and Yandex.  These companies developed a core set of standards for describing common web content with metadata.  Now that the core standard has been developed, schema.org has become a W3C community group.  Google remains the single most important driver of schema.org’s development.  But as a community group, schema.org has accepted contributions from many parties, and the scope of the standard is expanding.

In addition to international bodies and industry groups, certain companies, on account of their size and reach, shape standards practices through the implementation choices they make.  They may set trends of what are deemed “best practices” or they may recommend to others how to do things.  Google again is a leading example of a single firm having a big influence on standards.  As a private company, it recommends guidelines to its customers, the publishers who want their content to display in Google’s search results.  These guidelines seem like standards, though they are specific to one company.

Let’s consider how different levels of standards interact with each other.

Metadata needs to be encoded using a syntax. One widely used syntax is called RDFa, which is a W3C standard.

Metadata also needs schema to indicate entities and properties within the content.  Schema.org metadata can be encoded using RDFa syntax.  So we have one standard relying on another.  But schema.org only uses part of the RDFa specification.  There are some features in RDFa that aren’t needed when implementing schema.org.  Other metadata schemas also use the RDFa syntax, and some of these take advantage of the additional features.  The group designing schema.org decided to pare down what was needed to implement schema.org in RDFa.  They chose to keep things as simple as they could to help promote adoption of their schema.
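To give a feel for what a pared-down RDFa looks like, here is a sketch.  I am assuming that the simplified subset corresponds to what the W3C calls RDFa Lite, which has five attributes: vocab, typeof, property, resource and prefix.  The sample markup and the `uses_only_lite` helper are my own illustrations, and the list of "fuller" attributes is deliberately partial.

```python
from html.parser import HTMLParser

# The five attributes that make up RDFa Lite.
RDFA_LITE_ATTRIBUTES = {"vocab", "typeof", "property", "resource", "prefix"}

# A partial list of attributes from the fuller RDFa specification.
# (rel and rev are left out here because plain HTML uses them too.)
RDFA_FULL_ONLY_ATTRIBUTES = {"about", "datatype", "inlist"}


class RdfaAttributeCollector(HTMLParser):
    """Records which RDFa attributes appear anywhere in the markup."""

    def __init__(self):
        super().__init__()
        self.used = set()

    def handle_starttag(self, tag, attrs):
        known = RDFA_LITE_ATTRIBUTES | RDFA_FULL_ONLY_ATTRIBUTES
        for name, _value in attrs:
            if name in known:
                self.used.add(name)


def uses_only_lite(markup):
    """True if the markup sticks to the RDFa Lite attributes."""
    collector = RdfaAttributeCollector()
    collector.feed(markup)
    return collector.used <= RDFA_LITE_ATTRIBUTES


# Illustrative schema.org markup for the film, written with Lite attributes only.
MOVIE_MARKUP = """
<div vocab="https://schema.org/" typeof="Movie">
  <span property="name">Cezanne and I</span>
</div>
"""

print(uses_only_lite(MOVIE_MARKUP))
```

Markup that uses an attribute such as about or datatype can still be valid RDFa; it simply goes beyond the simplified subset.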

As mentioned earlier, Google is a key player as both a developer of schema.org, and as a consumer of schema.org metadata.  Google evangelizes the use of schema.org metadata, and they offer guidelines and tools to help webmasters learn what they need to do.  Publishers often take this advice as gospel.  They presume they need to comply with Google’s standards, at least as they understand them.   What they may not realize is that Google’s tools and guidelines are often advice rather than rigid rules.  When developing its advice and tools, Google has chosen to focus on high priority content that many organizations produce, and provide guidelines to help webmasters ensure that they don’t make mistakes when creating metadata for such content.  Google’s guidelines only cover a subset of the range of content addressed by schema.org.  In effect, Google has chosen to simplify schema.org further to encourage wider adoption of it.

Google’s guidelines provide assurance that, if followed, the metadata will work with Google.  However, it does not follow that metadata is wrong if the publisher deviates from Google’s guidelines.  Many publishers use Google’s structured data testing tool (SDTT) to validate their metadata.  It’s a useful tool, but it validates only some dimensions of schema.org metadata, not all dimensions.

Google’s structured data testing tool “complaining” about a webpage on the schema.org website

We can see the limitations of Google’s structured data testing tool by looking at how it assesses the schema.org website.  We can find pages where the schema.org website, which Google is involved with developing, fails Google’s own SDTT.  How can that be?  The schema.org website and Google’s SDTT serve different purposes, and even different audiences.  The SDTT is trying to encourage certain practices, and in an almost gamified manner, gives a thumbs up if the metadata code conforms to the advice.  Schema.org continually develops to cover a range of needs.  Some of these needs will be more specialized, and publishers may decide to implement metadata in a standards-compliant manner that doesn’t pass inspection by Google’s SDTT.  I would not assume, however, that Google’s search algorithms are incapable of interpreting standards-compliant metadata that fails Google’s SDTT.   I’d guess that Google’s search algorithms are probably more sophisticated than the code used in the SDTT.  Sometimes the SDTT is playing catch-up with new developments in schema.org.

Google is trying to do two things at once: expand the coverage of schema.org to make it even more useful in a wider range of domains and scenarios, and popularize schema.org by presenting a simple set of guidelines for publishers to follow.  It’s a difficult balance: managing and evolving standards over time, while promoting easy-to-follow guidelines that publishers consider reliable.  I would not expect Google to encourage publishers to adopt complicated metadata implementations that some would struggle to code correctly.  If less sophisticated publishers fail, they might fault Google for encouraging them to try something that exceeded their understanding or abilities.

Sometimes publishers gripe that they’ve created logically valid schema.org metadata that nonetheless fails Google’s SDTT.   But publishers seem more upset when they’ve created metadata that passes the SDTT, yet they fail to see how it shines in Google’s search results.  “Where’s the rich snippet I was expecting?” they complain.  For many publishers, seeing the rich snippet payoff is the reward for using schema.org structured data, and for using the SDTT.  The SDTT is not just a technical tool: it is a marketing and public relations tool for Google.

A representative rich snippet as shown in Google’s SDTT. For some publishers, seeing their structured data in search results provides tangible proof they are correct and compliant with standards.

So does metadata compliance mean that one follows the pages of details in W3C standards, or that one gets a snippet to show in Google’s search results? Standards compliance can involve many layers. There is no one standard to follow: there can be various permutations of a standard that are sanctioned or encouraged by different parties. Publishers need to rely on the standards guidance that best supports the goals they are trying to achieve with their metadata.

— Michael Andrews