Auditing Metadata Serialized in JSON-LD

As websites publish more metadata, publishers need ways to audit what they’ve published. This post will look at a tool called jq that can be used to audit metadata.

Metadata code is invisible to audiences. It operates behind the scenes. To find out what metadata exists entails looking at the source code, squinting at a jumble of div tags, CSS, JavaScript, and other stuff. Glancing at the source code is not an efficient way to see what metadata is included with the content. Publishers need easy ways for their web teams to find out what metadata they’ve published.

This discussion will focus on metadata that’s serialized in the JSON-LD format. One nice thing about JSON-LD is that it separates the metadata from other code, making it easier to locate. For those not familiar with JSON-LD, a brief introduction. JSON-LD is the latest format for encoding web metadata, especially metadata using widely-used vocabularies. JSON-LD is still less pervasive than microdata and RDFa, which are described within HTML elements. But JSON-LD has quickly emerged as the preferred syntax for many websites. It is more developer-friendly than HTML syntaxes, and shares a common heritage with the widely-used JSON data format.

According to statistics, around 225,000 websites are using JSON-LD. That’s about 21% of all websites globally, and nearly 30% of English language websites. Some major sites using JSON-LD for metadata include Apple, eBay, LinkedIn, and Yelp.

Why Audit Metadata?

I’ve previously touched on the value of auditing metadata in my book, Metadata Basics for Web Content. For this discussion, I want to highlight a few specific benefits.

For those who work with SEO, the value of knowing what metadata exists is obvious: it influences discovery through search. But content creators will also want to know the metadata profile of their content. It can yield important insights useful for editorial planning.

Metadata provides a useful summary of the key information within published content. Reviewing metadata can provide a quick synopsis of what the content is about. At the same time, if metadata is missing, that means that machines can’t find the key information that audiences will want to know when viewing the content.

Auditing can reveal:

  • what key information is included in the content
  • if any important properties are missing that should be included

Online publishers should routinely audit their own metadata. And they may decide they’d benefit by auditing their competitor’s metadata as well. Generally, the more detailed and complete the metadata is, the more likely a publisher will be successful with their content. So seeing how well one’s own metadata compares with one’s competitors can reveal important insights into how readily audiences can access information.

How to Audit JSON-LD Metadata

Metadata is code, written for machines. So how can members of web teams, whether writers or SEO specialists, get a quick sense of what metadata they currently have? Since I have a mission to evangelize the benefits of metadata to all content stakeholders, including less technical ones, I’ve been looking for lightweight ways to help all kinds of people discover what metadata they have.

For metadata encoded in HTML tags, the simplest way to explore it is using XPath, a query language that searches down the DOM tree to find the relevant parts containing the metadata. XPath is not too hard to learn (at least for basic needs), and is available within common tools such as Google Sheets.

Unfortunately, XPath can’t be used for metadata in JSON-LD. But happily, there is an equivalent to XPath that can be used to query JSON-based metadata. It is called jq.

The first step in doing an audit is to extract the JSON-LD from the website you want to audit. It lives within the element <script type="application/ld+json"></script>. Even if you need to manually extract the JSON-LD, it is easy to find in the source code (use CTRL-F and search for ld+json). Be aware that there may be more than one JSON-LD metadata statement. For example, when looking at the source code of a webpage on Apple’s website, I notice three JSON-LD script elements representing three different statements: one covering product information (Offer), one covering the company (Organization), and another covering the website structure (BreadcrumbList). Some automated tools have been known to stop harvesting JSON-LD statements after finding the first one, so make sure you get them all, especially the ones with information unique to the webpage.
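As a sketch of this extraction step, the script elements can also be collected programmatically. This example uses Python’s standard-library HTMLParser; the sample page is invented for illustration:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collects the contents of every <script type="application/ld+json"> element."""
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld and data.strip():
            self.statements.append(json.loads(data))

# Invented sample page containing two JSON-LD statements
html = """
<html><head>
<script type="application/ld+json">{"@type": "Organization", "name": "Example Co"}</script>
<script type="application/ld+json">{"@type": "BreadcrumbList"}</script>
</head><body></body></html>
"""

extractor = JSONLDExtractor()
extractor.feed(html)
print(len(extractor.statements))  # both statements are captured, not just the first
```

A real audit would fetch each URL’s HTML first; the point is that the extractor keeps collecting after the first statement, which is exactly where some automated tools fall short.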

Once you have collected the JSON-LD statements, you can begin to audit them to see what information they contain. Much like a content audit, you can set up a spreadsheet to track metadata for specific URLs.

Exploring JSON-LD with jq

jq is a “command line” application, which can present a hurdle for non-developers. But an online version of it exists called jq Play that is easy to use.

Although jq was designed for filtering ordinary plain JSON, it can also be used for JSON-LD. Just paste your JSON-LD statement in jq Play, and add a filter.

Let’s look at some simple filters that can identify important information in JSON-LD statements.

The first filter can tell us what properties are mentioned in the metadata. We can find that out using the “keys” filter. Type keys and you will get a list of properties at the highest level of the tree. Some of these have an @ symbol, indicating they are structural properties (for example "@context", "@id", "@type"). Don’t worry about those for now. Others will resemble words and be more understandable, for example, “contactPoint”, “logo”, “name”, “sameAs”, and “url”. These keys, from Apple’s Organization statement, tell us the kinds of information Apple includes about itself on its website.
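For comparison, jq’s keys filter corresponds to the sorted top-level keys of the parsed object. The Organization statement below is abbreviated and invented for illustration:

```python
import json

# Abbreviated, invented Organization statement
jsonld = json.loads("""
{
  "@context": "http://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "logo": "https://example.com/logo.png",
  "sameAs": ["https://twitter.com/example"],
  "url": "https://example.com"
}
""")

# jq's `keys` filter returns the top-level property names in sorted order
print(sorted(jsonld.keys()))
# ['@context', '@type', 'logo', 'name', 'sameAs', 'url']
```

Note that the structural @-properties sort to the front, which makes them easy to skip past when scanning the audit results.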


Let’s suppose we have JSON-LD for an event. An event has many different kinds of entities associated with it, such as a location, the event’s name, and the performer. It would be nice to know what entities are mentioned in the metadata. All kinds of entities use a common property: name. Filtering on the name property can let us know what entities are mentioned in the metadata.

Using jq, we find the entities by using the filter ..|.name? which provides a list of names. When applied to a JSON-LD code sample from the website, we get the names associated with the Event: the name of the orchestra, the auditorium, the conductor, and the two symphonic works.

The filter was constructed using the pattern ..|.foo? (foo is a placeholder for whatever property you want to filter on). JSON-LD stores information in a tree that may be deeply nested: entities can refer to other entities. The pattern lets the filtering move through the tree and keep looking for potential matches.
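The recursive descent behind ..|.name? can be mimicked with a short recursive function, for those who prefer to work in Python. The Event statement below is invented for illustration:

```python
import json

def find_values(node, prop):
    """Walk the JSON tree and collect every value of `prop`, like jq's ..|.prop?"""
    results = []
    if isinstance(node, dict):
        if prop in node:
            results.append(node[prop])
        for value in node.values():
            results.extend(find_values(value, prop))
    elif isinstance(node, list):
        for item in node:
            results.extend(find_values(item, prop))
    return results

# Invented Event statement with nested entities
event = json.loads("""
{
  "@type": "MusicEvent",
  "name": "Season Finale",
  "location": {"@type": "MusicVenue", "name": "Example Hall"},
  "performer": [
    {"@type": "MusicGroup", "name": "Example Symphony Orchestra"},
    {"@type": "Person", "name": "Jane Conductor"}
  ]
}
""")

print(find_values(event, "name"))
# ['Season Finale', 'Example Hall', 'Example Symphony Orchestra', 'Jane Conductor']
```

The recursion is what lets the filter reach names buried several levels down, no matter how the publisher nested the entities.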

Results from jq Play when filtering by name

Finally, let’s make use of the structural information encoded with the @ symbol. Because lots of different entities have names, we also want to know the type of entity something is. Is the “Chicago Symphony” the name of a symphonic work, or the name of an orchestra? In JSON-LD, the type of entity is indicated with the @type property. We can use jq to find what types of entities are included in the metadata. To do this, the filter would be ..|."@type"? . It follows the same ..|.foo? pattern, except that structural properties with an @ prefix need to be within quotes, because ordinary JSON doesn’t use the @ prefix and jq doesn’t recognize it unless it’s in quotes.
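The same kind of recursive walk retrieves the types. Note that in Python the @ prefix needs no special quoting, since keys are ordinary strings; the quoting requirement is specific to jq. The Event statement here is again invented and abbreviated:

```python
import json

def walk(node):
    """Yield every object (dict) in the JSON tree, depth-first."""
    if isinstance(node, dict):
        yield node
        node = list(node.values())
    if isinstance(node, list):
        for item in node:
            yield from walk(item)

# Invented Event statement, abbreviated to its types
event = json.loads("""
{
  "@type": "MusicEvent",
  "location": {"@type": "MusicVenue"},
  "performer": [
    {"@type": "MusicGroup"},
    {"@type": "Person"}
  ]
}
""")

# @-prefixed keys are ordinary strings in Python, so no quoting workaround is needed
types = [obj["@type"] for obj in walk(event) if "@type" in obj]
print(types)  # ['MusicEvent', 'MusicVenue', 'MusicGroup', 'Person']
```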

When we use this filter for an Event, we learn that the statement covers the following types of entities:

  • “MusicEvent”
  • “MusicVenue”
  • “Offer”
  • “MusicGroup”
  • “Person”
  • “CreativeWork”

That one simple query reveals a lot about what is included. We can confirm that the star of the show (type Person) is included in the metadata. If not, we know to add the name of the conductor.

Explore Further

I’m unable here to go into the details of how JSON-LD and metadata statements are constructed — though I do cover these basics in my book. To use jq in an audit, you will need some basic knowledge of important entities and properties, and know how JSON-LD creates objects (the curly braces) and lists (the brackets). If you don’t know these things yet, they can be learned easily.

The patterns in jq can be sophisticated, but at times, they can be fussy to wrangle. JSON-LD statements are frequently richer and more complex than simple statements in plain JSON. If you want to extract some specific information within JSON-LD, don’t hesitate to ask a friendly developer to help you set up a filter. Once you have the pattern, you can reuse it to retrieve similar information.

JSON-LD is still fairly new. Hopefully, purpose-built tools will emerge to help with auditing JSON-LD metadata. Until then, jq provides a lightweight option for exploring JSON-LD statements.

— Michael Andrews

Ranking in Content Marketing

Content marketing rests on a simple premise: Great content will attract interest from readers.  It sounds simple — but the ingredients of great content, on closer inspection, seem ineffable.  We can come up with any number of criteria necessary for great content.  But satisfying these criteria won’t necessarily result in lots of people finding your content, and using it.  It is possible to have great writing about useful topics that is promoted diligently, and still find that the content fails to generate expected interest.  Hard work alone doesn’t explain outcomes.

How then does content marketing rank highly? I’m not an SEO, so I’m not going to offer SEO advice here.  I’m using the SEO term “ranking” in a more general sense of gaining visibility based on audience expressions of interest.  It may be  ranking in SERPs, or in social media shares, or bookmarks, or another metric that indicates how people vote for what content they find most useful.  The key to the ranking question is to think about online content as a market, where there are buyers and sellers.  Unfortunately, it is not a simple market, where there is a perfect match for everyone.  Some sellers never find buyers, and some buyers never find the right seller either, and have to settle for something less than optimal.  Online content is sometimes efficient, but very often is prone to market failure.

Navigating through the Content Glut

Like many other people who work with online content, I believe we face a content glut.  There’s too much content online. Too much content is ignored by audiences.   Many organizations consider it acceptable to create content that only receives one or two hundred views.  A shocking amount of content that’s created is never viewed at all!  It would be easy to dismiss all this content as low quality content, but that would not capture the full story.  It’s more accurate to say that this content doesn’t match the needs of audiences.  Not all content needs to generate high numbers of views — if it is intended for a narrow, specific audience.  But most content that’s created has a potential audience that’s far larger than the actual audience it attracts.  It gets lost in the content glut.

To understand how audiences select content, it helps to consider content as being traded in one of two different markets.  One market involves people who all have the same opinion about what is great.  The other market involves people who have different ideas about what is great.  It’s vitally important not to confuse which group you are writing for, and hoping to attract.

More formally, I will describe these two markets as a “winner-takes-all” market, and as an “auction” market. I’m borrowing (and repurposing) these terms from Cal Newport, a Georgetown computer science professor who wrote a career advice book called So Good They Can’t Ignore You. His distinction between winner-takes-all versus auction markets is very relevant to how online content is accessed, and valued, by audiences.

Winner-Takes-All Markets

When a large audience segment all want the same thing — applying the same standards — it can create a race to determine who provides the best offering.  It gives rise to a winner-takes-all market.

Let’s illustrate the concept with a non-content example. Sport stars are a classic winner-takes-all market.  Fans like players who score exceptionally, so the player who scores most generally wins the most fans.  The top players make much more money than those who are just short of being as good as them.  Fans only want so many stars.

Many content topics have a homogenous user preference profile.  Nearly everyone seeking health information wants up-to-date, accurate, comprehensive, authoritative information.  The US National Institutes of Health is the gold standard for that kind of information.  Other online publishers, such as the Mayo Clinic or WebMD, are being judged in comparison to the NIH.  They may be able to provide slightly friendlier information, or present emerging advice that isn’t yet orthodox.   But they need to have thoroughness and credibility to compete.  Lesser known sources of health information will be at a disadvantage.  Health information is a winner-takes-all market.  The best-regarded sources get the lion’s share of views.  Breaking into the field is difficult for newly established brands.  When everyone wants the same kind of information, and all the content is trying to supply the same kind of information, only the best content will be preferred.  Why settle for second best?

How do you know when a topic is a winner-takes-all market? A strong signal is when all content about the topic, no matter by whom it is published, has the same basic information, and often even sounds the same.  It is hard to be different under such circumstances, and to rank more highly than others.

Another example of a winner-takes-all market for content is SEO advice.  If you want to learn about (say) the latest changes Google announced last month, you will find hundreds of blog posts by different local SEO agencies, all of which will have the same information.  Only a few sources will rank highly, such as Moz or Search Engine Land.  The rest will be added to the content glut.

It is extremely hard to win the game of becoming the most authoritative source of information about a topic that is widely covered and has a uniformity of views.  Generally, the first-movers in such a topic gain a sustained advantage, as they develop a reputation of being the go-to source of information.  

There are a couple of tactics sellers of content use in winner-takes-all markets.  The first is to set up a franchise, so that the publisher develops a network of contributors to increase their scale and visibility.  This is the approach used, for example, by Moz and other large SEO websites.  Contributors get some visibility, and build some reputation, but may not develop solid brand recognition.

The second tactic, advocated by some content marketing specialists, is to develop “pillar” content.  The goal of this tactic is to build up a massive amount of content about a topic, so that no one else has the opportunity to say something that you haven’t already addressed.  You can think of this approach as a “build your own Wikipedia”.  Some proponents advocate articles of 5000 words or more, cross-linked to other related articles.  It’s an expensive tactic to pursue, with no guarantees.  In certain cases, pillar content might work, for a topic that is not well covered currently, and for which there is a strong demand for extremely detailed information.  But otherwise, it can be a folly.  Pillar content tactics can trigger an arms race of trying to out-publish competitors with more and longer content.  In the race to become authoritative, the publisher can lose sight of what audiences want.  Do they really want 5000 word encyclopedic articles?  Generally they don’t.

Winner-takes-all applies to a competitive (non-captive) market.  If you have a captive audience (like your email list) you can be more successful with generic topics. But you will still be competing with the leaders.

Don’t forget: the characteristic of winner-takes-all markets is that there are few winners, and many losers.  Make sure you aren’t competing on a topic you are unprepared to win.

Auction Markets

The defining characteristic of an auction market is that different people price stuff differently.  There’s no single definition of what the best is.  People value content differently, according to what they perceive as what’s unique or special about it.

A non-content example of an auction market is hiring an interior decorator.  It’s a very diverse market: decorators serve different segments of buyers (rich, budget, urban, suburban,…), and within segments people have widely different tastes (eclectic, mid-century modern, cutting edge, traditional…).  Different decorators are “right” for different people.  But that doesn’t mean there’s no competition.  Far more decorators want to design interiors that could be featured in Architectural Digest than there are clients looking to hire such decorators.  There’s an overabundance of decorators who favor the white sofa look that gets featured in Architectural Digest.  And budget buyers may have trouble finding a budget decorator who has sophisticated taste and who can hire affordable and reliable contractors.  It’s hard to get the niche right, where buyers want what you can offer.  

The value that audiences assign to content in auctions depends on the parameters they most care about. A broad topic that has wide interest can potentially be discussed in different ways: by tailoring the topic so that it is targeted at a segment, offering a unique point of view (POV), or accommodating a specific level of prior knowledge about the topic.

Many areas of content marketing are auction markets.  Some consumers are enthusiastic about learning the details of  products;  others are reluctant buyers worried about costs or reliability.   For example, home repair is a topic of fascination for a handyman. It’s a chore and headache for an exasperated homeowner dealing with an emergency.  

Auction markets rank on the basis of differentiation.  Brands make an appeal: We are right for you! Others are wrong for you! And by extension: We are wrong for people who aren’t like you!  Brands aim for what  could be called the audience-content-brand fit.  The moment a brand tries to become a multi-audience pleaser, it risks losing relevance.  It is then playing the winner-takes-all strategy.

Audience segments most value content that addresses specific needs that seem unique, and is not offered by others.  This places a premium on differentiation.  Segmentation is based on approach.  How content addresses a topic will mirror how audience segments coalesce around themes, interests or motivations.

Many marketers have trouble addressing fuzzy segments.  Groups of people may be drawn to a combination of overlapping interests, be looking for fresh points of view, and have varying levels of knowledge.   Such segments are fiendishly hard to define quantitatively.  How many people fit in each box?  It can be more productive to define the box as an idea to test, rather than as a fixed number.  Auctions discover segments; they don’t impose them.  People vote their priorities in auctions.  One can’t know what people want in an auction before it happens.  By their nature, auctions are meant to surprise us.

Auctions are fluid.  People’s interests shift.  Their knowledge may grow, or their willingness to learn may lessen.  It is even possible for an auction market to morph into a winner-takes-all market.  Today’s hottest debates can turn into tomorrow’s best practice orthodoxy. 

Matching the Content to the Audience

Ranking is fundamentally about being relevant.  Brands must offer content that is relevant.  Yet in the end, it is audiences who judge the relevance.

Marketers will find their content lost in the content glut if they fail to understand whether the audience segment they want to reach wants content that’s unique in some way, or wants the kind of content that everyone agrees is the best available.  

Brands should aim for share of mind, not just raw numbers.  Many marketers start by thinking about hypothetical reach.  They imagine all the people who, in theory, might be interested in the topic abstractly, and then work to improve their yield.  They create content they think a million people might want to read, without testing whether such an assumption is realistic.  They then try to improve on the minuscule portion of people viewing the content. That approach rarely builds a sustained audience. 

It’s better to garner 20% of a potential audience of 1,000 people than 1% of a segment of 20,000 people, even if the raw numbers are the same (200 views).   A well-defined segment is essential to figure out how to improve what you offer them.  If everyone wants exactly the same thing, then knowing what people want is that much easier.  But being the best when delivering it to them is that much harder.

— Michael Andrews

Predicting Content Attention and Behavior

Audiences, at times, seem inscrutable. We want to know how audiences will respond to content, but audiences don’t behave consistently.  Sometimes they skim content; sometimes they read it closely.  Even if we wish it were otherwise, we can’t escape the messy reality that audiences in many respects don’t behave in a single consistent way.  But do they behave predictably?  Can we predict what kind of content will engage online audiences, if we could account for known variables? To date, progress untangling this problem has been limited.  But we have reason to be optimistic it won’t always be this way.  A more data-centric approach to content strategy could help us understand what variables influence audience behavior.

The biggest weakness in content strategy today is that it lacks predictive explanatory power.   Whenever someone advances a proposition about what audiences want or will do, it is easy to find counter examples of when that doesn’t hold.  Nearly all categorical assertions about what people want from content don’t survive even minimal scrutiny.  Do audiences want more or less content?  Do they want simple or detailed explanations?  Do they want to engage with the content, or get the content in the most digestible form possible? Such binary questions seem reasonable to ask, and call for reasonable  answers in return.  But such questions too often prompt simplistic answers that promise a false sense of certainty.  Content behavior is complex — just like human behavior in general.  Yet that doesn’t mean it is not possible to learn some deeper truths — truths that may not be complete and exhaustive, but are nonetheless accurate and robust.  What we need is better data that can explain complexity.  

To provide predictive explanatory power, content strategy guidelines should be based on empirical data that can be reproduced by others.  Guidelines should be based on data that covers a breadth of situations, and has a depth of description.  That’s why I was so excited to read the new study presented last week at the 2018 World Wide Web Conference by Nir Grinberg of Northeastern University, entitled “Identifying Modes of User Engagement with Online News and Their Relationship to Information Gain in Text.”  The research provides a rare large scale empirical analysis of content, which reveals many hidden dimensions that will be useful to apply and build on.  I encourage you to read the study, though I caution that the study at times can be dense, filled with academic and statistical terminology.  I will summarize some of its highlights, and how they can be useful to content strategy practitioners.

Grinberg’s study looked at “a large, client-side log dataset of over 7.7 million page views (including both mobile and non-mobile devices) of 66,821 news articles from seven popular news publishers.”   By looking at content on such a large scale (nearly 8 million page views), we can transcend the quirks of the content we deal with in our own projects.  We want to understand if the features of our content are typical of content generally, or are characteristics that apply to only some kinds of content.

The study focused on content from news websites that specialize in different topics.  It does not represent the full spectrum of content that professionals in content strategy address, but it does cover a range of genres that are commonly discussed.  The study covered seven distinct genres:

  • Financial news
  • Technology
  • How To
  • Science
  • Women
  • Sports
  • Magazine features

Grinberg was motivated by a desire to improve the value of content.  “Post-hoc examination of the extent to which readers engaged with articles can enable editors to better understand their audience interests, and inform both the coverage and writing style of future articles.”

Why do analytics matter? Content that audiences use is content that audiences value.  The question is how to measure the audience’s use of content, after they click on a link.  Page views are not a meaningful metric, since many views “bounce”.  Other metrics draw controversy.  Is a long time on a page desirable or not desirable?  With simple metrics, the metric can become hostage to one’s own ideological worldview about what’s best for users, instead of being a resource to learn what users are really trying to accomplish.

First, how can we measure attention?  The study considered six metrics available in analytics relating to attention:

  1. Depth — how far scrolled in an article, a proxy for how much of the content was viewed or read
  2. Dwell time — total user time on a page (good for non-reading engagement such as watching a video)
  3. Engagement — how much interaction happens on a page (for example, cursor movements, highlighting)
  4. Relative depth — how much of an article was visible on a user’s screen
  5. Speed — speed of scrolling,  a proxy for how quickly the readers “read” the content
  6. Normalized engagement — engagement relative to article length

The metrics that are “relative” and “normalized” attempt to control for differences between the absolute values of shorter and longer content.  
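To make that concrete, here is a sketch of how the relative and normalized variants control for length. The field names and values are invented, and the study’s actual definitions differ in detail:

```python
# Invented page-view records: dwell time (seconds), pixels scrolled,
# interaction events, and total article length (pixels). Purely illustrative.
views = [
    {"dwell": 40, "scrolled": 3000, "events": 12, "length": 3000},   # short article
    {"dwell": 40, "scrolled": 3000, "events": 12, "length": 12000},  # long article
]

for v in views:
    relative_depth = v["scrolled"] / v["length"]   # share of the article that was seen
    speed = v["scrolled"] / v["dwell"]             # scrolling speed, a proxy for reading speed
    normalized = v["events"] / v["length"]         # engagement relative to article length
    print(f"depth={relative_depth:.2f} speed={speed:.0f}px/s engagement/len={normalized:.4f}")
```

The two records have identical raw behavior, but the relative and normalized values differ: the reader covered all of the short article and only a quarter of the long one, which is the distinction these variants are designed to capture.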

Next, what might these metrics say about audience behavior?  Through a cluster analysis, the study found these indicators interact to form five content engagement patterns:

  • Shallow  (not getting far in an article)
  • Idle (short period of activity followed by period of inactivity, followed by more activity)
  • Scan (skimming an article quickly)
  • Read (reading the article for comprehension)
  • Long read (engaging with supplementary materials such as comments)

So how do specific behaviors relate to engagement patterns?  The study showed that the indicators were associated with specific engagement patterns.

Depth  (ranked from low to high depth of scrolling)

  1. Shallow
  2. Idle
  3. Scan
  4. Read
  5. Long read

Dwell time (ranked from short to long dwell time)

  1. Scan
  2. Read 
  3. Long read 
  4. Idle
  5. Shallow

Engagement (ranked from low to high engagement)

  1. Shallow
  2. Scan
  3. Idle
  4. Read
  5. Long read

Relative depth (ranked from short to long relative depth)

  1. Shallow
  2. Idle
  3. Scan
  4. Read
  5. Long read

Speed (ranked from fast to slow)

  1. Scan
  2. Read
  3. Long read
  4. Idle
  5. Shallow

Normalized engagement (ranked from low to high)

  1. Shallow
  2. Idle
  3. Scan
  4. Read
  5. Long read

So what does this mean for different kinds of content? “We found substantially more scanning in Sports, more idling in “How To”, and more extensive reading for long-form magazine content.”  That may not sound like a profound conclusion, but it feels valid, and it’s backed by real world data. This gives us markers to plan with.  We have patterns to compare. Is your content more like sports, a how-to, or a feature?   

For sports, readers scan, often just checking scores or other highlights rather than reading the full text.  They are looking for some specific information, rather than a complete explanation.  Sports is a genre closely associated with scanning.  When sports is presented in long form, as was done on the now defunct Grantland website, it only appeals to a niche.  ESPN found Grantland unprofitable.  Grantland violated the expectations of the genre.

Magazines were most likely to be read shallowly, where only the first few sentences are read, as well as the most likely to be read thoroughly, where even comments are read.  This shows that the reader makes an investment decision about whether the content looks sufficiently interesting to read in depth.  They may leave a tab open, hoping to get back to the article, but never do.  But sometimes, a preview summary such as an abstract can provide sufficient detail for most people, and only some will want to read the entire text.

The study found  a “relatively high percent of Idle engagements in How To articles. The few articles we examined from this site gave instructions for fixing, making, or doing something in the physical world. It is therefore plausible that people disengage from their digital devices to follow instructions in the physical world.”  

How the Study Advances Our Practice

The study considers how reading characteristics converge into common reading patterns, and how different genres are related to distinct reading patterns.  

The study brings a more sophisticated use of metrics to infer content attention.  It shows how features of content influence attention and behavior.  For example, “total dwell time on a page is associated with longer articles, having more images and videos.”   Not all content is text.  How to measure the use of video or images, or the exploration of data, are important considerations.

We have concrete parameters to define engagement patterns.  We may casually talk about skimming, but what does that mean exactly?  Once we define it and have a way to measure it, we can test whether content is skim-able, and compare it to less skim-able content.  

Solid detailed data helps us separate what is happening from why it may be happening.  Slow reading speed is not necessarily an indication that the material is difficult to read.  Fast reading speed doesn’t necessarily indicate the topic is boring. Readers may be involved with other activities.  They may have prior knowledge that allows them to skim.  Instead of debating what is happening, we can focus on the more interesting topic of why it might be happening, and how to address it.   And with benchmark data, teams can test alternative content designs and see how the performance changes.

How Content Strategy Can Build on the Study

The study shows that more robust analytics can allow us to directly compare utilization characteristics of content from different sources, and compare the utilization characteristics of different genres and formats of content.  Standardized data allows for comparisons.

The study suggests more sophisticated ways to measure attention, and suggests that attention patterns can depend on the genre of content.  It also identified six content behaviors that could be useful for classifying content utilization.  These elements could contribute to a more rigorous approach to using analytics to assess audience content needs.

A framework using detailed metrics and patterns can help us baseline what’s actually happening, and compare it with what might be desirable.

For example, what kinds of content elicit shallow engagement?  Is shallow engagement ever good, or at least an opportunity?  Perhaps people start then abandon an article because it is the wrong time for them to view it. Maybe they’d benefit from a “save for later” feature.  Or alternatively, maybe the topic is valuable, but the content is uninviting, which grinds the engagement to a halt.  With a more sophisticated ability to describe content behavior, we can consider alternative explanations and scenarios.

The study also opens up the issue of whether content should conform to typical behavior, or whether content should try to encourage a more efficient behavior.  If How To content involves idle periods, should the content be designed so that people can start and stop reading it easily?  Or should the content be designed so that the viewer knows everything they need to do before they begin (perhaps by watching a video that drills how to do the critical steps), so they can complete the task without interruption?   I’m sure many people already have opinions about this issue.   More precise analytics can allow those opinions to become testable hypotheses.

The big opportunity is the ability to compare data between content professionals, something that’s not possible with qualitative feedback.    Today, we have conferences where different people present case studies.  But it is hard to compare the learnings of these case studies because there are no common metrics to compare.  Case studies can also be hard to generalize, because they tend to focus on process rather than on common features of content.  Two people can follow the same process, but have different outcomes, if the features of their content are different.

Like the field of usability, content strategy has the opportunity to build a set of evidence-based best practices.  For example, how does having a summary paragraph at the start of an article influence whether the entire article is read?  Different content professionals, looking at different content topics, could each test such a question and compare their results.  That could lead to evidence-backed advice concerning how audiences will likely react to a summary.  

The first step toward realizing this vision is having standard protocols for common analytics tools like Google Analytics, so that different website data are comparable.  It’s a fascinating opportunity for someone in the content strategy community to move forward with.  I’m too deep in the field of metadata to be able to work on it myself, but I hope others will become interested in developing a common analytics framework.

— Michael Andrews