Categories: Content Integration

Metadata Standards and Content Portability

Content strategists encounter numerous metadata standards.  It can be confusing why they matter and how to use them.  Don’t feel bad if you find metadata standards confusing: they are confusing.  It’s not you.  But don’t give up: it’s useful to understand the landscape.  Metadata standards are crucial to content portability.

Trees in the Forest

The most frustrating experiences can be when we have trouble getting to where we want to go.  We want to do something with our content, but our content isn’t set up to allow us to do that, often because it lacks the metadata standards to enable that.

The problem of informational dead-ends is not new.  The sociologist Andrew Abbott compares the issue to how primates move through a forest.  “You need to think about an ape swinging through the trees,” he says.  “You’ve got your current source, which is the branch you are on, and then you see the next source, on the next branch, so you swing over. And on that new hanging vine, you see the next source, which you didn’t see before, and you swing again.”  Our actions are prompted by the opportunities available.

Need a branch to grab: Detail of painting of gibbon done by Ming Dynasty Emperor Zhu Zhanji, via Wikipedia.

When moving around, one wants to avoid becoming the ape “with no branch to grab, and you are stopped, hanging on a branch with no place to go.”  Abbott refers to this notion of primates swinging between trees (and by extension people moving between information sources) by the technical name of brachiation.  That word comes from the Latin word for arm — tree-swinging primates have long arms.  We want long arms to be able to swing from place to place.

We can use this idea of swinging between trees to think about content.  We are in one context, say a website, and want to shift the content to another context: perhaps download it to an application we have on our tablet or laptop.  Or we want to share something we have on our laptop with a site in the cloud, or discuss it in a social network.

The content-seeking human encounters different trees of content: the different types of sites and applications where content lives.  When we swing between these sites, we need branches to grab.  That’s where metadata comes in.  Metadata provides the branches we can reach for.

Content Shifting

The range of content people use each day is quite diverse.  There is content people control themselves, because it is only available to them or to people they designate.  And there is content that is published and fully public.

There is content that people get from other sources, and there is content they create themselves.

We can divide content into four broad categories:

  • Published content that relates to topics people follow, products and events they want to purchase, and general interests they have
  • Purchased and downloaded content, which is largely personal media of differing types
  • Personal data, which includes personal information and restricted social media content
  • User generated content of different sorts that has been published on cloud-based platforms
Diagram of different kinds of content sources, according to creator and platform

There are many ways content in each area might be related, and benefit from being connected.  But because they are hosted on different platforms, they can be siloed, and the connections and relationships between the different content items might not be made.

To overcome the problem of siloed content, three approaches have been used:

  1. Putting all the content on a common platform
  2. Using APIs
  3. Using common metadata standards

These approaches are not mutually exclusive, though different players tend to emphasize one approach over others.

Common Platform

The common platform approach seems elegant, because everything is together using a shared language.  One interesting example of this approach was pursued a few years ago by the open source KDE semantic desktop NEPOMUK project.  It developed a common, standards-based language for the different kinds of content people use, called a personal information model (PIMO), with the aim of integrating them.  The pathbreaking project may have been too ambitious, and ultimately failed to gain traction.

Diagram of PIMO content model, via semanticdesktop.org

More recently, Microsoft has introduced Delve, a cloud-based knowledge graph for Microsoft Office that resembles aspects of the KDE semantic desktop.  Microsoft has unparalleled access to enterprise content and can use metadata to relate different pieces of that content to each other.  However, it is a closed system, with proprietary metadata standards and a limited ability to incorporate content from outside the Office ecosystem.

In the realm of personal content, Facebook’s recent moves to host publisher content and expand into video hint that it is aiming to become a general content platform, where it can tightly integrate personal and social content with external content.  But the inherently closed nature of this ecosystem calls into question how far it can take this vision.

APIs

API use is growing rapidly.  APIs are a highly efficient solution for narrow problems.  But they don’t provide an ideal solution for a many-to-many environment where diverse content is needed by diverse actors.  By definition, consumers need to form agreements with providers to use their APIs.  It is a “you come to me and sign my agreement” approach.  This means it doesn’t scale well if someone needs many kinds of content from many different sources.  There are often restrictions on the types or amount of content available, or on its uses.  APIs are often a way for content providers to avoid offering their content in an industry-standard metadata format.  The consumer of the content may get it in a schemaless JSON feed, and needs to create their own schema to manage the content.  For content consumers, APIs can foster dependence, rather than independence.

Common Metadata Standards

Content reuse is greatly enhanced when both content providers and content consumers embrace common metadata standards.  This content does not need to be on the same platform, and there does not need to be explicit party-to-party agreement for reuse to happen.  Because the metadata schema is included, it is easy to repurpose the content without having to rebuild a data architecture around it.
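
To make the contrast concrete, here is a minimal Python sketch (the item, field names, and values are all hypothetical) comparing a provider-specific API payload with the same item carrying schema.org metadata as JSON-LD. The point is that the second version travels with its own schema, so a consumer does not have to rebuild one.

```python
import json

# A hypothetical item from a provider-specific API feed: the field names
# are the provider's own invention, so the consumer must build a schema
# around them before the content can be reused.
api_feed_item = {
    "ttl": "Ten questions to ask before renovating a kitchen",
    "auth": "J. Blogs",
    "pub": "2015-06-01",
}

# The same item carrying schema.org metadata as JSON-LD: the type and
# property names belong to a shared standard, so any consumer that knows
# schema.org can interpret it without a party-to-party agreement.
standards_based_item = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Ten questions to ask before renovating a kitchen",
    "author": {"@type": "Person", "name": "J. Blogs"},
    "datePublished": "2015-06-01",
}

print(json.dumps(standards_based_item, indent=2))
```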

So why doesn’t everyone just rely on common metadata standards?  They should in theory, but in practice there are obstacles.  The major one is that not everyone is playing by the same rules.  Metadata standards are chaotic.  No one organization is in charge.  People are free to follow whichever ones they like.  There may be competing standards, or no accepted common standard at all.  Some of this is by design: to encourage flexibility and innovation.  People can even mix-and-match different standards.

But chaos is hard to manage.  Some content providers ignore standards, or impose them on others but don’t offer them in return.  Standards are sometimes less robust than they could be.  Some standards like Dublin Core are so generic that it can be hard to figure out how to use them effectively.

The Metadata Landscape

Because there are so many metadata standards available that relate to so many different domains, I conducted a brief inventory of them to identify ones relating to everyday kinds of content.  This is a representative list, meant to highlight the kinds of metadata a content strategist might encounter.  These aren’t necessarily recommendations on standards to use, which can be very specific to project needs.  But by having some familiarity with these standards, one may be able to identify opportunities to piggyback on content using these standards to benefit content users.

Diagram showing common metadata standards used for everyday content

Let’s imagine you want to offer a widget that lets readers compile a list of items relating to a theme.  They may want to pull content from other places, and they may want to push the list to another platform, where it might be transformed again.  Metadata standards can enable this kind of movement of content between different sources.

Consider tracking apps.  Fitness, health and energy tracking apps are becoming more popular.  Maybe the next thing will be content tracking apps.  Publishers already collect heaps of data about what we look at.  We are what we read and view.  It would be interesting for readers to have access to those same insights.  Content users would need access to metadata across different platforms to get a consolidated picture of their content consumption habits and behavior.  There are many other untapped possibilities for using content metadata from different sources.

What is clear from looking at the metadata available for different kinds of content is that there are metadata givers, and metadata takers.  Publishers are often givers.  They offer content with metadata in order to improve their visibility on other platforms.  Social media platforms such as Facebook, LinkedIn and Twitter are metadata takers.  They want metadata to improve their management of content, but they are dead-end destinations: once the content is in their ecosystems, it’s trapped.  Perhaps the worst parties are the platforms that host user generated content, the so-called sharing platforms such as Slideshare or YouTube.  They are often indifferent to metadata standards.  Not only are they a dead end (content published there can’t be repurposed easily), they sometimes ask people to fill in proprietary metadata to fulfill their own platform needs.  Essentially, they ask people to recreate metadata because they don’t use common standards.

Three standards stand out for their ubiquity: Open Graph, schema.org, and iCal.  Open Graph is very limited in what it describes, and is largely the product of Facebook.  It is used opportunistically by other social networks (except Twitter), so it is important for content visibility.  The schema.org vocabulary is still oriented toward the search needs of Google (one of its originators and its main patron), but it shows some signs of becoming a more general-purpose metadata schema.  Its strength is also its weakness: a tight alignment with search marketing.  For example, airlines don’t use it for flight information; they rely instead on APIs linked to their databases to seed vertical travel search engines that compete with Google.  So travel information marked up in schema is limited, even though there is a yawning gap in markup standards for travel information.  Finally, iCal is important simply because it is the critical standard that turns informational content about events into entries in users’ calendars.  Enabling people to take actions on content will be increasingly important, and getting something into or from someone’s calendar is an essential part of almost any action.
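
As an illustration of how these standards can work together, here is a rough Python sketch (with made-up event details, and omitting fields such as UID and DTSTAMP that a production iCalendar file would need) that takes an event described with schema.org properties and emits a bare-bones iCalendar entry — the kind of conversion that gets event content into a user’s calendar.

```python
# A rough sketch of moving between standards: take an event described with
# schema.org properties and emit a minimal iCalendar (RFC 5545) entry.
# The event details are hypothetical.
schema_event = {
    "@type": "Event",
    "name": "Content Strategy Meetup",
    "startDate": "2015-09-10T18:30:00",
    "endDate": "2015-09-10T20:00:00",
    "location": {"@type": "Place", "name": "Public Library Room 2"},
}

def to_ical(event):
    """Convert a schema.org Event dict into a minimal iCalendar string."""
    stamp = lambda iso: iso.replace("-", "").replace(":", "")  # 20150910T183000
    return "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "BEGIN:VEVENT",
        "SUMMARY:" + event["name"],
        "DTSTART:" + stamp(event["startDate"]),
        "DTEND:" + stamp(event["endDate"]),
        "LOCATION:" + event["location"]["name"],
        "END:VEVENT",
        "END:VCALENDAR",
    ])

print(to_ical(schema_event))
```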

Whither Standards

Content strategists need to work with the standards available, both to reuse content marked up in these standards, and to leverage existing markup so as not to reinvent the wheel.  The most solid standards concern anchoring information such as dates, geolocations, and identity (the central OAuth standard).  Metadata for some areas, such as video, seems far from unified. Metadata relating to other areas, such as people profiles and event information, can be converted between different standards.

If recent trends continue, independently developed standards such as microformats will have an increasingly difficult time gaining wide acceptance, which is a pity.  This is a reflection of the consolidation of the digital industry into the so-called GAFAM group (Google/Apple/Facebook/Amazon/Microsoft), and the shift from the openness associated with firms like Sun Microsystems in the past to the epic turf battles and secrecy that now dominate the headlines in the tech press.  Currently, Google is probably the member of this group most invested in promoting open metadata standards, through its work with schema.org, although it promotes proprietary standards for its cloud-based document suite.  Adobe, now very second tier, also promotes some open standards.  Facebook and Apple, both enjoying a strong position these days, seem content to run closed ecosystems and don’t show much commitment to open metadata standards.  The same is true of Amazon.

The beauty of standards is that they are fungible: you can convert from one to another.  It is always wise to adopt an existing standard: you will enjoy more flexibility to change in the future by doing so.  Don’t be caught without a branch to swing to.

— Michael Andrews

Categories: Intelligent Content

Key Verbs: Actions in Taxonomies

When authors tell stories, verbs provide the action. Verbs move audiences. We want to know “what happened next?” But verbs are hard to categorize in ways computers understand and can act on. Despite that challenge, verbs are important enough that we must work harder to capture their intent, so we can align content with the needs of audiences. I will propose two approaches to overcome these challenges: task-focused and situational taxonomies. These approaches involve identifying the “key verbs” in our content.

Nouns and Verbs in Writing

I recently re-read a classic book on writing by Sir Ernest Gowers entitled The Complete Plain Words. Published immediately after the Second World War, the book was one of the first to advocate the use of plain language.

Gowers attacks obtuse, abstract writing. He quotes with approval a now forgotten essayist G.M. Young:

“Excessive reliance on the noun at the expense of the verb will in the end detach the mind of the writer from the realities of here and now, from when and how, and in what mood this thing was done and insensibly induce a habit of abstraction, generalization and vagueness.”

If we look past the delicious irony — a critique of abstraction that is abstract — we learn that writing that emphasizes verbs is vivid.

Gowers refers to this snippet as an example of abstract writing:

  • “Communities where anonymity in personal relationships prevails.”

Instead, he says the wording should be:

  • “Communities where people do not know one another.”

Without doubt the second example is easier to read, and feels more relevant to us as individuals. But the meaning of the two sentences, while broadly similar, is subtly different. We can see this by diagramming the key components. The strengthening of the verb in the second example has the effect of making the subject and object more vague.

role of verb in sentence

It is easy to see the themes in the first example, which are explicitly called out. The first diagram highlights themes of anonymity and personal relationships in a way the second diagram does not. The different levels of detail in the wording will draw attention to different dimensions.

With a more familiar style of writing, the subject is often personalized or implied. The subject is about you, or people like you. Abstract wording, by contrast, keeps specific people out of the picture. This may be one reason why lawyers and government officials like to use abstract words. They aren’t telling a specific story; they are trying to make a point about a more general concept.

Abstract vs Familiar Styles

I will make a simple argument. Abstract writing focuses on nouns, which are easier for computers to understand. Conversely, content written in a familiar style is more difficult for computers to understand and act on. Obviously, people — and not computers — are the audience for our content. I am not advocating an abstract style of writing. But we should understand and manage the challenges that familiar styles of writing pose for computers. Computers do matter. Until natural language processing by computers truly matches human abilities, humans are going to need to help computers understand what we mean. Because it’s hard for computers to understand discussions about actions, it is even more important that we have metadata that describes those actions.

The table below summarizes the orientations of each style. These sweeping characterizations won’t be true in all cases. Nonetheless, these tendencies are prevalent, and longstanding.

Abstract Style
  • Emphasis: nouns; general concepts; the reader is outside the article context
  • Major uses: represents a class of concepts or events; good for navigation
  • Benefits: promotes analytic tracking; promotes automated content recommendations
  • Limitations: can trigger weak writing

Familiar Style
  • Emphasis: verbs; specific advice; the reader is within the article context
  • Major uses: shows an instance of a concept or event; good for reading
  • Benefits: promotes content engagement; promotes social referrals
  • Limitations: can trigger weak metadata

These tendencies are not destiny. Steven Pinker, the Harvard cognitive scientist turned prose guru, can write about abstract topics in an accessible manner — he makes an effort to do so. Likewise, it is possible to develop good metadata for narrative content. It requires the ability to sense what is missing and implied.

Challenges of a Taxonomy of Verbs

Why is metadata difficult for narrative content? Why is so much metadata tilted toward abstract content? There are three main issues:

  • Indexing relies on basic vocabulary matching
  • Taxonomies are noun-centric
  • Verbs are difficult to specify

Indexing and Vocabulary Matching

Computers rely on indexes to identify content. Metadata is a type of index that identifies and describes the content. Metadata indexes may be based on the manual tagging of content (the application of metadata) with descriptive terms, or on auto-indexing and auto-categorization.

Computers can easily identify and index nouns, often referred to as entities. Named entity recognition can identify proper nouns such as personal names. It is also comparatively easy to identify common nouns in a text when a list of nouns of interest has been identified ahead of time. This is done either through string indexing (matching the character string to the index term) or assigned indexing (matching a character string to a concept term that has been identified as equivalent).
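
A toy Python sketch of the difference, using a hypothetical text, vocabulary, and synonym table: string indexing only finds terms that literally appear in the text, while assigned indexing maps surface wording to preferred concept terms.

```python
# A toy comparison of the two indexing approaches. The text, vocabulary,
# and synonym table are all hypothetical.
text = "Our shelter helps families adopt rescue dogs and arrange home visits."
lowered = text.lower()

# String indexing: an index term counts only if the literal string appears.
string_index_terms = ["shelter", "families", "dogs", "cats"]
string_matches = [term for term in string_index_terms if term in lowered]

# Assigned indexing: surface strings map to preferred concept terms, so
# different wordings resolve to the same concept.
concept_map = {
    "adopt": "pet adoption",
    "rescue": "pet adoption",
    "home visits": "home assessment",
}
assigned_matches = {concept for phrase, concept in concept_map.items()
                    if phrase in lowered}

print(string_matches)    # ['shelter', 'families', 'dogs']
print(assigned_matches)  # {'pet adoption', 'home assessment'}
```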

The manual tagging of entities is also straightforward. A person will identify the major nouns used in a text, and select the appropriate index term that corresponds to the noun. When they decide what things are most important in the article (often the things mentioned most frequently), they find tags that describe those things.

When the text has entities that are proper or common nouns, it isn’t too hard to identify which ones are important and should be indexed. Abstract content is loaded with such nouns, and computers (and people) have an easy time identifying key words that describe the content. But as we will see, when the meaning of a text is based on the heavy use of pronouns and descriptive verbs, the task of matching terms to an index vocabulary becomes more difficult. Narrative content, where verbs are especially important to the meaning, is challenging to index. Nouns are easier to decipher than verbs.

Taxonomies are Noun-centric

When we offer a one-word description, we tend to label stuff using nouns. The headings in an encyclopedia are nouns. Taxonomies similarly rely on nouns to identify what an article is about. It’s our default way of thinking about descriptions.

Because we focus on the nouns, we can easily overlook the meaning carried by the verbs when tagging our content. But verbs can carry great meaning. Consider an article entitled “How to feel more energetic.” There are no nouns in the title to match up with taxonomy terms. Depending on the actual content of the article, it might relate to exercise, or diet, or mental attitude, but those topics are secondary to the theme of the article, which is about feeling better. A taxonomy may have granular detail, and include a thesaurus of equivalent and related terms, but the most critical issue is whether the explicit wording of the article can be translated into the vocabulary used in the taxonomy.

Verbs are Difficult to Specify

Verbs also can be included in descriptive vocabularies for content, but they are more challenging to use. Verbs are sometimes looser in meaning than nouns. Sometimes they are figurative.

graph of verb definition
Verbs such as to make can have many different meanings

A verb may have many meanings. These meanings are sometimes fuzzy. Actions and sentiments can be described by multiple verbs and verbal phrases. Consider the most overworked and meaningless verb used on the web today: to like. If Ralph “likes” this, what does that really mean? Compared to what else? The English language has a range of nuanced verbs (love, being fond of, being interested in, being obsessed with, etc.) to express positive sentiment, though it is hard to demarcate their exact equivalences and differences.

Many common verbs (such as work, make or do) have a multitude of meanings. When the meaning of a verb is nebulous, it takes more work to identify the preferred synonym used in a taxonomy. Consider this example from a text-tagging tool. The person reading the text needs to make the mental leap that the verb “moving” refers to “money transfer.” The task is not simply to match a word, but to represent a concept for an activity. We often use an imprecise verb like move instead of a more precise phrase like transfer money. Such verbal informality makes tagging more difficult.

Tagging a verb with a taxonomy term. Screenshot via Brat.

With the semantic web, predicates play the role of verbs defining the relationship between subjects and objects. The predicates can have many variants to express related concepts. If we say, “Jane Blogs was formerly married to Joe Blogs,” we don’t know what other verbal phrase would be equivalent. Did Jane Blogs divorce Joe Blogs? Did Joe Blogs die? Another piece of information may be needed to infer the full meaning. Verbal phrases can carry a degree of ambiguity, and this makes using a standard vocabulary for verbs harder to do.
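
To make the ambiguity concrete, here is a small sketch (all predicate names are invented for illustration) that represents the statement as a plain subject-predicate-object triple and shows the kind of vocabulary mapping a standard would need to supply before equivalences can be inferred.

```python
# The statement as a plain subject-predicate-object triple.
triples = [
    ("Jane Blogs", "formerlyMarriedTo", "Joe Blogs"),
]

# More specific predicates the statement might imply; without extra facts
# and an agreed vocabulary, we cannot pick one.
candidate_predicates = ["divorcedFrom", "widowOf", "marriageAnnulledWith"]

# A controlled vocabulary could at least declare how the terms relate,
# so different sources can be reconciled.
broader_term = {
    "divorcedFrom": "formerlyMarriedTo",
    "widowOf": "formerlyMarriedTo",
    "marriageAnnulledWith": "formerlyMarriedTo",
}
print(broader_term["divorcedFrom"])  # formerlyMarriedTo
```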

Samuel Goto, a software engineer at Google, has said: “Verbs … they are kind of weird.”

Computers can’t understand verbs easily. Verb concepts are challenging for humans to describe with standardized vocabulary. Tagging verbs requires thought.

Why Verb Metadata Matters

If verbs are a pain to tag, why bother? So we can satisfy both the needs of our audiences and the needs of the computers that must be able to offer our audiences precisely what they want. As an organization, we need to make sure all this is happening effectively. We need to harmonize three buckets of needs: audience, IT, and brand.

Audience needs: Most audiences expect content written in familiar style, and want content with strong, active verbs. Those verbs often carry a big share of the meaning of what you are communicating. Audiences also want precise content, rather than hoping to stumble on something they like by accident. This requires good metadata.

IT needs: Computers have trouble understanding the meaning of verbs. Computers need a good taxonomy to support navigation through the content, and deliver good recommendations.

Brand needs: Brands need to be able to manage and analyze content according to the activities discussed in the content, not just the static nouns mentioned in it. If they don’t have a plan in place to identify key verbs in their content, and tag their meaning, they run the risk of having a hollow taxonomy that doesn’t deliver the results needed.

A solution to these competing needs is to have our metadata represent the actions mentioned in the content. I’m calling this approach finding your key verbs.[1]

Approaches to a Metadata of Actions

Two approaches are available to represent verb concepts. The first is to make verbs part of your taxonomy. The second is to translate verbs in your content into nouns in your taxonomy.

Task-focused Taxonomies

The first approach is to develop a list of verbs that express the actions discussed in your content. Starting with the general topics about which you produce content, you can do an analysis and see what specific activities the content discusses. We’ll call these activities “tasks.”

Think about the main tasks for the people we want to reach. How do they talk about these tasks? People don’t label themselves as a new-home buyer: they are looking for a new home. They may never actually buy, but they are looking. Verbs help us focus on what the story is. There may be subtasks that our readers do, and want to read about. Not only are they looking for a new home, they are evaluating kitchens and getting recommendations on renovations. This task focus is important to help us manage content components, and track their value to audience segments. We can do this using a task-focused taxonomy.
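
Here is a minimal sketch, in Python, of what a small task-focused taxonomy and a tagged content item might look like. The task terms and the article are hypothetical; a real list would come from auditing your own content and from audience research.

```python
# A minimal sketch of a task-focused taxonomy for home-buying content.
# The task terms and the tagged article are hypothetical.
task_taxonomy = {
    "looking for a new home": [
        "evaluating kitchens",
        "comparing neighborhoods",
        "getting recommendations on renovations",
    ],
    "financing a home": [
        "comparing mortgage rates",
        "estimating closing costs",
    ],
}

# A content item tagged with task terms, alongside the usual noun topics:
article_tags = {
    "title": "What to look for when touring an older house",
    "tasks": ["looking for a new home", "evaluating kitchens"],
    "topics": ["houses", "kitchens"],
}
print(article_tags["tasks"])
```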

I am aware of two general-purpose taxonomies that incorporate verbs. The tasks these taxonomies address may differ from your needs, but they may provide a starting point for building your own.

The new “actions” vocabulary available in schema.org is the better known of the two. Schema.org has identified around 100 actions “to describe what can be done with resources.” The purpose is to be able not only to find content items according to the action discussed, but to enable actions to be taken with the content. As a simple example, you might find an event, and click a button to confirm your attendance. Behind the scenes, that action will be managed by the vocabulary.

The schema actions are diverse. Some describe high-level activities such as to travel, while others refer to very granular activities, such as to follow somebody on a social network. Some tasks are real-world tasks, and others strictly digital ones. I presume real-world actions are included to support activity reporting from Internet of Things (IoT) devices that monitor real-world phenomena such as exercise.
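
Returning to the RSVP example above, here is a rough sketch of how an event might be marked up with the schema.org Actions vocabulary as JSON-LD (expressed here as a Python dict; the event name and URL are made up).

```python
import json

# A rough sketch of the event/RSVP example, using schema.org's
# potentialAction pattern. The event name and URL are hypothetical.
event_markup = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Community Open House",
    "startDate": "2015-11-02",
    "potentialAction": {
        "@type": "RsvpAction",
        # The action a consumer can take with this content item:
        "target": {
            "@type": "EntryPoint",
            "urlTemplate": "https://example.com/events/123/rsvp",
        },
    },
}

print(json.dumps(event_markup, indent=2))
```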

screenshot of schema.org actions terms
Schema.org actions taxonomy (partial)

Framenet, a semantic tagging vocabulary used by linguists, is another general vocabulary that provides coverage of verbs. If a sentence uses the verb “freeze” (in the sense of “to stop”), it is tagged with the concept of “activity_pause.” It is easiest to see how the Framenet verb vocabulary works using an example from David Caswell’s project, Structured Stories. Verbs that encapsulate events form the core of each story element. [2]

screenshot structured stories
Screenshot from the Structured Stories project, which uses Framenet.

Applications of Task Taxonomies

While both these vocabularies describe actions at the sentence or statement level, they can be applied to an entire article or section of content as well.

A task focus offers several benefits. Brands can track and manage content about activities independently of who specifically is featured doing the activity, or where/what the object or outcome of the activity is. So if brands produce content discussing options to travel, they might want to examine the performance of travel as a theme, rather than the variants of who travels or where they travel.

Task taxonomies also enable task-focused navigation, which lets people start with an activity, then narrow down aspects of it. A sequence might start: What do you want to do? Then ask: Where would you like to do that? The sequence can work in reverse as well: people can discover something of interest (a destination) and then want to explore what to do there (a task).

Situational taxonomies

A second option uses nouns to indicate the notable events or situations discussed. Using nouns as proxies for actions unfortunately doesn’t capture a sense of dynamic movement. But if you can’t support a faceted taxonomy that can mix nouns and verbs, it may be the most practical option. When you have a list of descriptors that express actions discussed in your content, you are more likely to tag these qualities than if your taxonomy is entirely thing-centric. I’ll call a taxonomy that represents occasions using noun phrases a situational taxonomy. The terms in a situational taxonomy describe situations and events that may involve one or more activities.

If you have ever done business process modeling, you are familiar with the idea of describing things as passing through a routine lifecycle. We reify activities by giving them statuses: a project activity is under development, in review, launched, and so on. Many dimensions of our work and life involve routines with stages or statuses. When we produce content about these dimensions, we should tag the situation discussed.
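
A brief sketch of what a situational taxonomy might look like in practice: each noun-phrase term stands in for a cluster of related actions, and a content item is tagged with the situation even when its text talks mostly in verbs. The terms and the tagged item below are hypothetical.

```python
# A sketch of a situational taxonomy: noun-phrase terms for the stages or
# circumstances discussed in content, each standing in for several related
# actions. The terms are hypothetical.
situational_taxonomy = {
    "pet adoption": ["match", "visit", "apply", "take home"],
    "project launch": ["develop", "review", "approve", "release"],
    "loan default": ["miss a payment", "negotiate", "refinance"],  # an "unhappy path"
}

# A content item is tagged with the situation, even when its text talks
# mostly in verbs.
article_tags = {
    "title": "Finding the right dog for your family",
    "situation": "pet adoption",
}
print(article_tags["situation"])
```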

One way to develop a situational taxonomy is by creating a blueprint of a detailed user journey: an end-to-end analysis of the various stages that real-world users go through, including the “unhappy path” where they encounter a situation they don’t want. Andrew Hinton has made a compelling case in his book Understanding Context that the situations people find themselves in drive the needs they have. Many user journey maps don’t name the circumstances; they jump immediately to the actions people might take. Try to avoid doing that. Name each distinct situation: both those people actively choose and those foisted on them. Then map these terms to your content.

Situational taxonomies are suited to content about third parties (news for example) or when emphasizing the outcomes of a process rather than the factors that shape it. Processes that are complex or involve chance (financial gyrations or a health misfortune, for example) are suited to situational taxonomies. A situational taxonomy term describes “what happened?” at a high level. Thinking about events as a process or sequence can help to identify terms to describe the action discussed in the content.

The technical word for making nouns out of verbs is “nominalization.” For example, the verb “decide” becomes the noun “decision.” Not all nominalizations are equal: some are very clunky or empty of meaning. Decision is a better word than determination, for example. Try to keep situational terms from becoming too abstract.

Situational taxonomies are less granular than task-based ones. They provide an umbrella term that can represent several related actions. They can enhance tracking, navigation and recommendations, but not as precisely as task-based terms. Task taxonomies express more, suggesting not only what happens, but also how it happens.

Key Verbs Mark the Purpose of the Content

Identifying key verbs can be challenging work. Not all headlines will contain verbs. But ideally the opening paragraph should reveal verbs that frame the purpose of the article. Content strategists know that too much content is created without a well-defined purpose. Taxonomy terms focused on actions indicate what happens in the content, and suggest why that matters. Headlines, and taxonomy terms that rely entirely on nouns, don’t offer that.

We will look at some text from an animal shelter. I have intentionally removed the headline so we can focus on the content, to find the core concepts discussed. A simple part-of-speech app will allow us to isolate different kinds of words. First we will focus on the verbs in the text, which includes the terms “match”, “spot”, “suit”, “ask”, and “arrange”. The verb focus seems to be “matching.” Matching could be a good candidate term in a task taxonomy.

Part-of-speech view of the verbs in the narrative

Now we’ll look at nouns. In addition to common nouns such as dogs and families, we see some nouns that suggest a process. Specifically, several nouns include the word “adoption.” Adoption would be a candidate term in a situational taxonomy. Note the shift in focus: adoption suggests a broader discussion about the process, whereas matching suggests a more specific goal.

Part-of-speech view of the nouns in the narrative
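
For readers who want to try this themselves, here is a rough sketch of the part-of-speech pass using the open source NLTK toolkit. The sample text paraphrases the animal-shelter example and is hypothetical; exact tags will vary with the tagger model.

```python
import nltk

# Assumes the NLTK tokenizer and tagger resources have already been
# downloaded via nltk.download(). The sample text is hypothetical.
text = ("We match dogs with families. You can spot a dog that suits your "
        "household, ask about the adoption process, and arrange a visit.")

tokens = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokens)   # e.g. [('We', 'PRP'), ('match', 'VBP'), ...]

verbs = [word for word, tag in tagged if tag.startswith("VB")]
nouns = [word for word, tag in tagged if tag.startswith("NN")]

print("verbs:", verbs)  # likely: match, spot, suits, ask, arrange ...
print("nouns:", nouns)  # likely: dogs, families, dog, household, adoption, ...
```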

When you look at content through the lens of verbs, questions arise. What verbs capture what the content is describing? Why is the content here? What is the reader or viewer supposed to do with this information? Could they tell someone else what is said here?

If you are having trouble finding key verbs, that could indicate problems with the content. Your content may not describe an activity. There is plenty of content that is “background content,” where readers are not expected to take any action after reading the content. If your goal for producing the content is simply to offer information for reference purposes, then it is unlikely you will find key verbs, because the content will probably be very noun-centric. The other possibility is that the writing is not organized clearly, and so key actions discussed are not readily seen. Both possibilities suggest a strategy check-up might be useful.

Avoid a Hollow Taxonomy

Even when tagging well-written content, capturing what activity is represented will require some effort. This can’t be automated, and the people doing the tagging need to pay close attention to what is implied in the content. They are identifying concepts, not simply matching words.

Tagging is easier to do when you already have a vocabulary to describe the activities mentioned in your content. That requires auditing, discovery and planning. If your taxonomy only addresses things and not actions, it may be hollow. It can have gaps.

Most content is created to deliver an outcome. Metadata shouldn’t only describe the things that are mentioned. It should describe the actions that the content discusses, which will be explicitly or implicitly related to the actions you would like your customers to take. You want to articulate within metadata the intent of the content, and thus be in a position to use the content more effectively as a result. Key verbs let you capture the essence of your content.

By identifying key verbs, brands can use active terminology in their metadata to deliver content that is aligned with the intent of audiences.

diagram of key verb roles
How key verb metadata can support content outcomes

The Future Web of Verbs

Web search is moving “from the noun to the verb,” according to Prabhakar Raghavan, Google’s Vice President of Engineering.

We are at the start of a movement toward a web of verbs, the fusing of content and actions. Taxonomy is moving away from its bookish origins as the practice of describing documents. Its future will increasingly be focused on supporting user actions, not just finding content. But before we can reach that stage, we need to understand the relationship between the content and actions of interest to the user.

Taxonomies need to reflect the intent of the user. We can understand that intent better when we can track content according to the actions it discusses. We can serve that intent better when we can offer options (recommendations or choices) centered on the actions of greatest interest to the user.

The first applications of verb taxonomies will likely be transactional ones, such as making reservations using schema.org Actions. But the applications are much broader than these “bottom of the funnel” interventions. Brands should start to think about using action-oriented taxonomy terms throughout their content offerings. This is an uncharted area, linking our metadata to our desired content outcomes.

— Michael Andrews


  1. Key verbs build on the pre-semantic idea of key words, but are specific to activities, and represent concepts (semantic meaning) instead of literal word strings.  ↩
  2. You can watch a great video of the process on YouTube.  ↩
Categories: Content Effectiveness

The Growing Irrelevance of SEO

If you listen to discussions about penguins, hummingbirds, pandas, and knowledge bots, you might get the impression that search engine optimization is starting to converge with the discipline of content strategy. The SEO industry sounds enlightened when they talk about the importance of content quality, and the value of semantics in positioning content. But it would be a mistake to assume that the SEO industry is now on the same page as content strategy. SEO consultants continue to view content through the wrong end of the telescope, and believe that demystifying Google is the key to content success. They still don’t understand that Google is not the audience for your content. The more you worry about Google, the more likely your content won’t meet the needs of your real audience, because you’ve diverted your attention from important goals, and squandered limited resources.

Why Fear Google?

No one knows exactly how big the SEO industry is — or even how to define it. According to some estimates, global SEO spending accounts for several billion dollars each year. Unlike search engine marketing, SEO is supposed to be cost-free; yet counterintuitively, firms spend billions of dollars on it. Brands seem to hire SEO consultants for two main reasons: the fear of making a costly mistake, and the fear that they don’t understand what exactly Google is doing that might affect them. Google is a formidable (and secretive) $60 billion company. The software industry is full of consultants who exist to explain the proprietary products of big vendors. Microsoft and SAP have their third-party explainers who decipher these products for customers and help implement them. Marketers have the SEO industry to take the fear out of Google. And since Google keeps changing things, the SEO consultants carry on explaining the supposed implications of the latest changes.

Change and Denial

On nearly every front, content discovery is experiencing massive changes. Search is declining as a source of referral traffic, as social media (Facebook in particular) becomes more important. Referral traffic is harder to track, as more traffic comes from offline and encrypted referrers. Digital ad revenues — the raison d’être for Google’s search business — are under pressure, due to the lack of mobile cookies, audience cross channel hopping, and ad oversupply. In the face of these pressures, Google has iteratively increased the sophistication of its search. It has transformed search ranking from being a set of practices that could be partially reverse engineered, to a complex data structure that is not knowable to outside parties.

SEO consultants comment on these changes profusely, while maintaining that the key to success is to continue following the same advice they’ve always given. Good SEO advice is timeless, they’d have you believe. I have yet to read any SEO consultant admit they’ve changed their mind about what’s effective. Tactics go out of fashion when Google publicly belittles them, changes priorities, or chooses to make an example out of a party that audaciously believes it can crack the system. The consultants tend to deny, when pressed, that they ever advocated now unfashionable tactics such as link building or keyword stuffing. But the power of keywords remains a core belief of the SEO industry.

Old Formulas are Broken

I was recently at a content strategy conference that featured a speaker who is an SEO specialist. I wasn’t previously familiar with her, but judging from her social media profile she’s well known with a large following. She provided a standard menu of recommendations about content:

  1. Research keywords you can use in your content
  2. Write content using your keywords
  3. Get a bigger audience.

Most of the talk focused on researching keywords. There are numerous tools trying to estimate or simulate what is happening in the web universe, and to slice this data in various ways to provide insights. I admire the inventiveness of the data brokers in offering information that — on the surface at least — looks like it should be valuable. Who wouldn’t want to know which keywords are popular, or which AdWords terms competitors are bidding on? Tools proliferate to provide pieces of data you don’t know. But rarely do SEO consultants debate how important it is to know these things, or how accurate the data is. Most of the data simply isn’t relevant.

SEO is driven by the herd instinct. What are other people doing? Let’s do what other people are doing! The practice of SEO is the practice of mimicry: follow trends, rather than pursue your own goals or be guided by your own results. When Google activity acts as the North Star guiding decisions, the interests of brands and audiences become a secondary priority.

Brands hope that if they rank high in a search by finding the perfect combination of popular but underutilized keywords, audiences will want to engage with them. Brands can blame low engagement on poor search visibility due to a few poor keyword choices. With an SEO focus, the underlying quality of the content is never questioned.

SEO has been described, with justification, as the practice of “writing for Google.” Writing for Google is not the same as writing for audiences. It’s dangerous to assume that Google keywords reflect the content needs of audiences you want to reach.

Let’s imagine a small craft beer company. They take pride in the fine ingredients they use, and the attention they give to their beer making. But an SEO consultant tells them they are missing out on valuable web traffic. His research indicates that people online are searching for beer in combination with the topic of the beach. Moreover, one of the big beer companies is running a beach-themed marketing campaign for its beer. So the craft brewer develops some beach-themed content featuring games with people in swimsuits, and does find that web traffic increases. But sales don’t improve: in fact, some core fans are turned off by the beach stuff. It turns out that the new web visitors are largely 14-year-olds. By the time they can legally purchase the product, they will consider the brand too juvenile for their tastes.

This parable illustrates the two core fallacies of SEO.

SEO fallacy #1: Treating All Page Views as Equally Valuable

The logic of SEO is beguilingly simple: Better performing search terms result in higher rankings that result in more page views. This narrative is PowerPoint and Excel friendly. It’s easy to digest, because it avoids discussion of a messy variable called people.

Where is the audience? They are hiding behind the search terms, and the page views. SEO consultants presume there is one mass audience that is typing search terms and one mass audience viewing pages, and that these audiences are one and the same. Then, in an even bigger leap of faith, they assume that the audience viewing a page is the same as the audience you are looking to attract because, after all, they clicked on your page.

Even if we assume SEO can deliver a larger audience, it doesn’t follow that it delivers the right audience. SEO lacks an effective concept of audience segmentation. It may be able to tell you what terms are being used in searches, but it can’t tell you who is using those terms. Even AdWords offers only crude segmentation data, and provides no real ability to parse how different audience segments use words in different ways. The terms that are most popular in a general sense aren’t necessarily the terms popular with your target audience.

The limitations of keywords are apparent when one starts from the end goal and works backwards. Suppose you produce an expensive ceiling fan that looks amazing and sells for several thousand dollars. You want to attract an audience prepared to spend that amount of money on a fan and appreciate it. What keywords do you use? Ideally you want to use the keywords that would be used by the real customers of your product. But the mass audience of SEO doesn’t help you pinpoint which keywords are right. You don’t know if high-end shoppers really search using terms such as “luxury” or “designer,” or if those are down-market terms used by people who aspire to products they believe are fancier than they really are. A flawed keyword focus might end up driving traffic to your website from people looking for a designer fan they saw at Walmart. You’ve won the page view sweepstakes, but haven’t succeeded in attracting serious prospects. Rather than try to second-guess search terms, it could be more effective to talk authentically about the product and rely on the content to provide the connections to search engines.

SEO Fallacy #2: Having Search Terms Drive Brand and Content Strategy

Chasing what’s popular means you are hostage to fads. Planning content around popular keywords is not strategic. Popularity changes. You may believe you “own” a keyword until a bigger competitor starts using it and wipes out your search position. Keywords are rarely decisive in determining search rank, which is heavily influenced by the general authority of the hosting website.

Your content should reflect consistent and enduring priorities for your brand and your content strategy. What ranks well on Google today may not rank well in six months, if keywords are the decisive reason. Google is changing its ranking algorithm continuously, and it is foolish to try to shape your content to fit it if you want your content to be valuable over the longer term.

With keyword-driven content, you surrender control over what you talk about. You start creating content because it is popular, not because it is relevant to your brand or to the specific audience you want to attract. You lose control of your brand voice and message, since keywords reflect a generic, lowest-common-denominator mode of expression, a modern form of caveman talk. People may use primitive vocabulary in a search box, but they don’t necessarily want the content they see to be as dumbed down as the search they have parsimoniously entered. Emphasizing the most popular keywords in your content can undermine your brand’s credibility with the audiences you most want to attract.

You Don’t Control Semantic Search

There are signs that some SEO consultants are starting to pivot on keywords. As Google search increasingly relies on identifying semantic and linguistic relationships, SEO consultants have turned their attention to unlocking how semantic search works.

Even though Google has redefined how they retrieve and rank search items, the idea that you can, and should, write for Google refuses to die in the SEO industry. What remains true is that the ability to gain a competitive advantage by writing for search engines is limited. Making search engines the priority of your writing is ultimately counterproductive. If you adopt some of the latest SEO thinking, you will make your content operations less efficient, and baffle your audience in the process.

Various SEO consultants in recent months have offered explanations of semantic search, making it sound fiendish. If fear of the unknown animated prior discussion of SEO mysteries, semantic search is presented as even more cryptic; SEO consultants seem eager to detail its complexities. But rather than admit they don’t know exactly how Google weights the numerous factors they use, SEO consultants imply the black box of Google search can still be reverse engineered. The advice being offered can border on comical. Instead of suggesting repetitively using keywords (the so-called keyword density tactic), SEO consultants now suggest using many synonyms in what you write about, since Google considers synonyms in its search results. The theory is that using lots of synonyms will make the content appear “less thin” to Google.

We find SEO consultants urging clients to develop topic modelling of their content so they can improve “on page optimization.” How toying with topic modelling (the computer modelling of thematically related words) is supposed to improve search ranking is never clear; presumably it is based on the idea that if the brand talks the same way that Google’s algorithms evaluate pages, then it will rank more highly. Like much other semantic SEO advice, its value is taken on faith. The advice is not actionable by authors, who have no practical means to implement it.

A writer on Moz asks: “What is this page about? As marketers, helping search engines answer that basic question is one of our most important tasks.” He recommends clients evaluate their “term frequency–inverse document frequency” to “help” Google. Here is another example of expropriating a technical concept from the science of information retrieval, and assuming that content authors can somehow usefully apply these insights to better serve audiences.
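
For context, term frequency-inverse document frequency is a simple weighting scheme rather than a Google secret: a term scores highly in a document when it is frequent there but rare across the collection. A small hand-rolled Python illustration, with made-up documents:

```python
import math
from collections import Counter

# tf-idf in miniature: a term scores highly when it is frequent in one
# document but rare across the collection. The documents are hypothetical.
docs = [
    "craft beer brewing uses malt hops yeast and water",
    "beach games and beer coolers for summer parties",
    "hops varieties give craft beer its aroma",
]

def tf_idf(term, doc, corpus):
    words = doc.split()
    tf = Counter(words)[term] / len(words)            # frequency within the document
    df = sum(1 for d in corpus if term in d.split())  # documents containing the term
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

print(tf_idf("hops", docs[0], docs))  # positive: "hops" is distinctive here
print(tf_idf("beer", docs[0], docs))  # 0.0: "beer" appears in every document
```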

Much of the new wave of semantic SEO advice is warmed-over keyword stuffing. Instead of stuffing keywords, they urge clients to stuff “concepts.” Writers are supposed to add pointless words to their content to bulk up the number of explicit conceptual associations mentioned. Never mind if the audience finds this verbiage superfluous. The semantic SEO advice implies that all pages should look like Wikipedia: brimming with as many concrete nouns as possible so that they rank highly according to what they imagine Google’s semantic search is looking for.

If brands embrace this new talk about concept stuffing, it is only a matter of time before Google identifies and penalizes black hat semantic markup that is superfluous and not reflective of the genuine substance of an article.

It may be a shock to the SEO industry, but Google doesn’t need their help to understand what a page is about. Google is famous for developing a driverless car. They certainly don’t need back seat drivers directing their search engines. Google has been trying to shake off the influence of SEO consultants for some time. They’d rather collect ad money from brands directly, instead of having SEO consultants volunteer confusing guidance that makes brands wary of Google.

Google Doesn’t Care about Keywords, but Humans Do

For better or worse, Google search has entered a post-literal phase. In the past, one could type a phrase with a unique combination of words, and retrieve a document containing those search terms. The “Googlewhack” became a source of amusement and fascination, discovering what mysteries were hidden in the vast ocean of content. Today, Google will reinterpret your Googlewhack search and spoil any fun. So many factors influence search today that one is never sure what results will return highly when entering a search. The relationship between what a brand writes, and what a user types in a search box, has never been less clear.

This is not to imply that language doesn’t matter. It matters to people. Content professionals should be concerned about what words mean to people, instead of what they mean to search engines. According to its original meaning in corpus linguistics, keywords refer to the words a specific group of people use most frequently in their speech or writing relative to other groups. It is important to use the keywords of your audience: just don’t expect to find them from Google searches. Most people rely on a small set of words in daily conversation and writing. I have a handy dictionary on my iPhone called the Longman Keywords Dictionary that lists the 3000 most frequent words in spoken and written English. It also provides common collocations of words (words that tend to be used together). While intended for learners of English as a second language, it provides a white list of words you should be using if trying to reach a broad audience. These are the words people use and know without thinking twice. You can save more unique words for special situations or ideas where you want to bring attention to what you are discussing, and make people notice a less common word or phrase. The goal should be to focus audience attention on what’s novel and interesting, not to bludgeon them with repetition.
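
Here is a rough sketch of keywords in that corpus-linguistics sense: compare how often your audience uses each word against a reference corpus, and the words with unusually high relative frequency are the audience’s keywords. The corpora below are tiny and invented, and the scoring is deliberately simplistic.

```python
from collections import Counter

# "Keywords" in the corpus-linguistics sense: words the audience uses
# unusually often relative to a reference corpus. Both corpora are tiny
# and invented; real analysis would use much larger samples.
audience_corpus = ("we love a crisp hoppy ale with a clean finish "
                   "this ale pours a hazy gold with a hoppy aroma")
reference_corpus = ("we love a good story with a clean ending "
                    "the gold market finished the week with a clean gain")

def relative_freq(text):
    words = text.split()
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}

aud = relative_freq(audience_corpus)
ref = relative_freq(reference_corpus)

# Crude keyness score: ratio of relative frequencies, with smoothing so
# words absent from the reference corpus don't divide by zero.
keyness = {w: aud[w] / (ref.get(w, 0) + 1e-6) for w in aud}

for word, score in sorted(keyness.items(), key=lambda kv: -kv[1])[:5]:
    print(word, round(score, 1))
```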

Don’t worry about how Google manages your content — worry about how you manage it

SEO consultants at times highlight interesting information from Google such as academic research and patent applications. Google is a clever and fascinating company, and people who use Google search are naturally curious about what the search giant is doing. But apart from a small quantity of Google-published materials, people who do not work at Google can’t possibly explain with any confidence what is happening inside an impenetrable, proprietary product. So instead we get speculation about what Google is doing, opinion surveys of consultants that rank order their opinions, and experimental tests that generally can’t be reproduced over time by different people.

While impressive, Google search is far from perfect. It will continue to evolve. Semantic search will continue to play a central role, but contextual data relating to personal behavior will probably become more prominent in future releases of Google search. Google search is a moving target: there’s little point trying to subdue it by orienting your content to suit its changing characteristics.

Rather than worry about how Google manages their content, brands should worry about how they manage their content themselves. The needs of human audiences are straightforward compared with the ever-shifting priorities of Google search algorithms. Brands should focus on audience needs, and resist the distraction of chasing fickle search rankings. They need to make a sustained effort to understand and serve the needs of core customers.

One benefit of all the chatter about semantic search is the growing awareness of semantic technologies. Many of the same technical approaches Google uses to index and evaluate content can be used by any brand for their own content operations. Such open source tools as Mallet, NLTK, Solr and elasticsearch offer amazing capabilities to improve the discoverability and distribution of content within the brand’s own content platforms. Critically, brands that make investments in their own platforms gain valuable knowledge of audiences from the data they generate, in stark contrast to the black box of Google.

SEO’s Value and Future

The primary value of SEO is promoting clean metadata. SEO consultants provide a service when they highlight the potential problems arising from lacking proper metadata. Due to the size of the SEO industry, they have become, through the twists of fate, the door-to-door sales force explaining the concept of metadata to ordinary marketers. Many organizations learn about metadata through their engagement of an SEO consultant.

Unfortunately, because SEO consultants talk selectively about metadata such as Schema.org, people who are not content professionals can erroneously assume that search engine metadata is the only metadata that matters. Most marketers mistakenly believe that Schema.org markup is useful only for search. They do not realize that it can be used in conjunction with APIs to make content available to resellers, or provide dynamic updates. Metadata can play a far larger role than supporting search. Metadata is essential to enable the effective utilization of content for many different purposes.

The future of SEO is uncertain. Google’s de-emphasis of links and keywords has rendered it largely irrelevant. It is becoming a sideshow to search marketing and other “inbound” marketing techniques. As a branch of marketing, the SEO industry is engulfed by the ethos of pay-to-play: performing better than the competition requires spending more ad dollars.

For SEO consultants who are genuinely interested in the power of content quality to improve organic engagement, I hope they will apply their knowledge of metadata and analytics more broadly to the field of content strategy. Much SEO knowledge is highly transferrable, and is far more impactful when applied to all dimensions of content, not just search.

— Michael Andrews