
Predicting Content Attention and Behavior

Audiences, at times, seem inscrutable. We want to know how audiences will respond to content, but audiences don’t behave consistently. Sometimes they skim content; sometimes they read it closely. Even if we wish it were otherwise, we can’t escape the messy reality that audiences, in many respects, don’t behave in a single consistent way. But do they behave predictably? Could we predict what kind of content will engage online audiences if we accounted for the known variables? To date, progress untangling this problem has been limited. But we have reason to be optimistic that it won’t always be this way. A more data-centric approach to content strategy could help us understand which variables influence audience behavior.

The biggest weakness in content strategy today is that it lacks predictive explanatory power. Whenever someone advances a proposition about what audiences want or will do, it is easy to find counterexamples where it doesn’t hold. Nearly all categorical assertions about what people want from content fail to survive even minimal scrutiny. Do audiences want more content, or less? Do they want simple explanations, or detailed ones? Do they want to engage with the content, or get it in the most digestible form possible? Such binary questions seem reasonable to ask, and seem to call for reasonable answers in return. But they too often prompt simplistic answers that promise a false sense of certainty. Content behavior is complex, just like human behavior in general. Yet that doesn’t mean it is impossible to learn some deeper truths: truths that may not be complete and exhaustive, but that are nonetheless accurate and robust. What we need is better data that can explain complexity.

To provide predictive explanatory power, content strategy guidelines should be based on empirical data that can be reproduced by others: data that covers a breadth of situations and has a depth of description. That’s why I was so excited to read the new study presented last week at the 2018 World Wide Web Conference by Nir Grinberg of Northeastern University, entitled “Identifying Modes of User Engagement with Online News and Their Relationship to Information Gain in Text.” The research provides a rare large-scale empirical analysis of content, revealing hidden dimensions that will be useful to apply and build on. I encourage you to read the study, though I caution that it can be dense at times, filled with academic and statistical terminology. I will summarize some of its highlights, and how they can be useful to content strategy practitioners.

Grinberg’s study looked at “a large, client-side log dataset of over 7.7 million page views (including both mobile and non-mobile devices) of 66,821 news articles from seven popular news publishers.” By looking at content on such a large scale (nearly 8 million page views), we can transcend the quirks of the content we deal with in our own projects. We want to understand whether the features of our content are typical of content generally, or are characteristics that apply to only some kinds of content.

The study focused on content from news websites that specialize in different topics. It does not represent the full spectrum of content that content strategy professionals address, but it does cover a range of genres that are commonly discussed. The study covered seven distinct genres:

  • Financial news
  • Technology
  • How To
  • Science
  • Women
  • Sports
  • Magazine features

Grinberg was motivated by a desire to improve the value of content: “Post-hoc examination of the extent to which readers engaged with articles can enable editors to better understand their audience interests, and inform both the coverage and writing style of future articles.”

Why do analytics matter? Content that audiences use is content that audiences value. The question is how to measure audience use of content after they click on a link. Page views are not a meaningful metric, since many views “bounce.” Other metrics draw controversy. Is a long time on a page desirable or not? With simple metrics, the metric can become hostage to one’s own ideological worldview about what’s best for users, instead of being a resource for learning what users are really trying to accomplish.

First, how can we measure attention?  The study considered six metrics available in analytics relating to attention:

  1. Depth — how far scrolled in an article, a proxy for how much of the content was viewed or read
  2. Dwell time — total user time on a page (good for non-reading engagement such as watching a video)
  3. Engagement — how much interaction happens on a page (for example, cursor movements, highlighting)
  4. Relative depth — how much of an article was visible on a user’s screen
  5. Speed — speed of scrolling, a proxy for how quickly readers “read” the content
  6. Normalized engagement — engagement relative to article length

The metrics that are “relative” and “normalized” attempt to control for differences between the absolute values of shorter and longer content.  
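
To make these definitions concrete, here is a minimal sketch of how such metrics might be computed from client-side logs. The event fields, units, and the ScrollEvent structure are my own assumptions for illustration; the study’s actual instrumentation is described in the paper and differs in its details.

```python
from dataclasses import dataclass

@dataclass
class ScrollEvent:
    """One hypothetical client-side log record (fields are assumptions, not the study's schema)."""
    timestamp: float          # seconds since the page loaded
    scroll_bottom_px: float   # lowest pixel position reached so far
    interactions: int         # cursor moves, highlights, etc. since the previous event

def attention_metrics(events: list[ScrollEvent], article_length_px: float) -> dict:
    """Compute rough analogues of the six attention metrics for a single page view."""
    if not events:
        return {}
    dwell_time = events[-1].timestamp - events[0].timestamp      # total time on the page
    depth = max(e.scroll_bottom_px for e in events)              # absolute scroll depth
    engagement = sum(e.interactions for e in events)             # raw interaction count
    return {
        "depth": depth,
        "dwell_time": dwell_time,
        "engagement": engagement,
        "relative_depth": min(depth / article_length_px, 1.0),   # share of the article exposed
        "speed": depth / dwell_time if dwell_time else 0.0,      # pixels scrolled per second
        "normalized_engagement": engagement / article_length_px, # interaction per unit of length
    }

# Example: three log events for one article that is 6,000 pixels long.
events = [ScrollEvent(0.0, 600, 2), ScrollEvent(30.0, 2400, 9), ScrollEvent(95.0, 5200, 4)]
print(attention_metrics(events, article_length_px=6000))
```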

Next, what might these metrics say about audience behavior?  Through a cluster analysis, the study found these indicators interact to form five content engagement patterns:

  • Shallow  (not getting far in an article)
  • Idle (short period of activity followed by period of inactivity, followed by more activity)
  • Scan (skimming an article quickly)
  • Read (reading the article for comprehension)
  • Long read (engaging with supplementary materials such as comments)
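
To picture how patterns like these can emerge from raw metrics, here is a rough sketch of a cluster analysis over per-page-view metric vectors. It uses k-means from scikit-learn on random placeholder data purely as an illustration; the paper’s actual clustering procedure and preprocessing differ, and the five labels above reflect the authors’ interpretation of their clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Each row is one page view: [depth, dwell_time, engagement,
#                             relative_depth, speed, normalized_engagement].
# Random numbers stand in for real analytics logs.
rng = np.random.default_rng(0)
X = rng.random((1000, 6))

# Standardize so no single metric dominates, then look for five clusters,
# mirroring the five engagement patterns reported in the study.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_scaled)

# Inspect the cluster centers to interpret each cluster: for example, low
# relative depth and short dwell time would suggest a "shallow" pattern.
print(kmeans.cluster_centers_)
print(np.bincount(kmeans.labels_))  # how many page views fall into each cluster
```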

So how do specific behaviors relate to engagement patterns? The study showed how each indicator ranks across the five engagement patterns:

Depth  (ranked from low to high depth of scrolling)

  1. Shallow
  2. Idle
  3. Scan
  4. Read
  5. Long read

Dwell time (ranked from short to long dwell time)

  1. Scan
  2. Read 
  3. Long read 
  4. Idle
  5. Shallow

Engagement (ranked from low to high engagement)

  1. Shallow
  2. Scan
  3. Idle
  4. Read
  5. Long read

Relative depth (ranked from low to high relative depth)

  1. Shallow
  2. Idle
  3. Scan
  4. Read
  5. Long read

Speed (ranked from fast to slow)

  1. Scan
  2. Read
  3. Long read
  4. Idle
  5. Shallow

Normalized engagement (ranked from low to high)

  1. Shallow
  2. Idle
  3. Scan
  4. Read
  5. Long read

So what does this mean for different kinds of content? “We found substantially more scanning in Sports, more idling in “How To”, and more extensive reading for long-form magazine content.” That may not sound like a profound conclusion, but it feels valid, and it’s backed by real-world data. This gives us markers to plan with. We have patterns to compare. Is your content more like sports coverage, a how-to, or a magazine feature?

For sports, readers scan, often just checking scores or other highlights rather than reading the full text. They are looking for specific information, rather than a complete explanation. Sports is a genre closely associated with scanning. When sports is presented in long form, as it was on the now-defunct Grantland website, it appeals only to a niche. ESPN found Grantland unprofitable. Grantland violated the expectations of the genre.

Magazine features were the most likely to be read shallowly, where only the first few sentences are read, as well as the most likely to be read thoroughly, where even the comments are read. This suggests that readers make an investment decision about whether the content looks sufficiently interesting to read in depth. They may leave a tab open, hoping to get back to the article, but never do. Sometimes a preview summary, such as an abstract, provides sufficient detail for most people, and only some will want to read the entire text.

The study found a “relatively high percent of Idle engagements in How To articles. The few articles we examined from this site gave instructions for fixing, making, or doing something in the physical world. It is therefore plausible that people disengage from their digital devices to follow instructions in the physical world.”

How the Study Advances our Practice

The study considers how reading characteristics converge into common reading patterns, and how different genres are related to distinct reading patterns.  

The study brings a more sophisticated use of metrics to infer content attention. It shows how features of content influence attention and behavior. For example, “total dwell time on a page is associated with longer articles, having more images and videos.” Not all content is text. How to measure the use of video and images, or the exploration of data, is an important consideration.

We have concrete parameters to define engagement patterns. We may casually talk about skimming, but what does that mean exactly? Once we define it and have a way to measure it, we can test whether content is skimmable, and compare it with less skimmable content.
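
As a thought experiment, an operational definition of skimming could be as simple as a rule over the metrics described earlier. The thresholds below are invented placeholders, not values from the study; real cut-offs would have to come from your own baseline data or from cluster boundaries like those identified by Grinberg.

```python
def label_view(m: dict) -> str:
    """Assign a rough engagement label to one page view.
    All thresholds are hypothetical placeholders, not values from the study."""
    if m["relative_depth"] < 0.25:
        return "shallow"
    if m["speed"] > 1500:                                  # fast scrolling (px/sec)
        return "scan"
    if m["dwell_time"] > 600 and m["speed"] < 100:         # long stay, little movement
        return "idle"
    if m["relative_depth"] > 0.95 and m["engagement"] > 50:
        return "long read"
    return "read"

sample = {"relative_depth": 0.4, "speed": 2100, "dwell_time": 45, "engagement": 3}
print(label_view(sample))  # -> "scan"
```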

Solid, detailed data helps us separate what is happening from why it may be happening. Slow reading speed is not necessarily an indication that the material is difficult to read. Fast reading speed doesn’t necessarily indicate the topic is boring. Readers may be involved with other activities. They may already have knowledge that allows them to skim. Instead of debating what is happening, we can focus on the more interesting question of why it might be happening, and how to address it. And with benchmark data, teams can test alternative content designs and see how performance changes.
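
To sketch what such a benchmark test could look like: once page views carry pattern labels, comparing two content designs becomes a comparison of distributions. The example below uses a chi-square test from scipy with made-up counts.

```python
from scipy.stats import chi2_contingency

# Page views by engagement pattern for two versions of the same article.
# Order: shallow, idle, scan, read, long read.  Numbers are illustrative only.
original = [420, 110, 300, 140, 30]
redesign = [310, 120, 280, 230, 60]

chi2, p_value, dof, expected = chi2_contingency([original, redesign])
print(f"chi2={chi2:.1f}, p={p_value:.4f}")
# A small p-value suggests the redesign shifted how readers engage,
# though it does not by itself say whether the shift is desirable.
```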

How Content Strategy can build on the study

The study shows that more robust analytics allow us to directly compare the utilization characteristics of content from different sources, and across different genres and formats. Standardized data makes such comparisons possible.

The study suggests more sophisticated ways to measure attention, and shows that attention patterns can depend on the genre of content. It also identified six attention metrics and five engagement patterns that could be useful for classifying content utilization. These elements could contribute to a more rigorous approach to using analytics to assess audience content needs.

A framework using detailed metrics and patterns can help us baseline what’s actually happening, and compare it with what might be desirable.

For example, what kinds of content elicit shallow engagement? Is shallow engagement ever good, or at least an opportunity? Perhaps people start and then abandon an article because it is the wrong time for them to view it. Maybe they’d benefit from a “save for later” feature. Or perhaps the topic is valuable but the content is uninviting, which grinds engagement to a halt. With a more sophisticated ability to describe content behavior, we can consider alternative explanations and scenarios.

The study also opens up the issue of whether content should conform to typical behavior, or whether content should try to encourage a more efficient behavior.  If How To content involves idle periods, should the content be designed so that people can start and stop reading it easily?  Or should the content be designed so that the viewer knows everything they need to do before they begin (perhaps by watching a video that drills how to do the critical steps), so they can complete the task without interruption?   I’m sure many people already have opinions about this issue.   More precise analytics can allow those opinions to become testable hypotheses.

The big opportunity is the ability to compare data between content professionals, something that’s not possible with qualitative feedback. Today, we have conferences where different people present case studies. But it is hard to compare the learnings from these case studies because there are no common metrics. Case studies can also be hard to generalize, because they tend to focus on process rather than on common features of content. Two people can follow the same process but have different outcomes if the features of their content are different.

Like the field of usability, content strategy has the opportunity to build a set of evidence-based best practices.  For example, how does having a summary paragraph at the start of an article influence whether the entire article is read?  Different content professionals, looking at different content topics, could each test such a question and compare their results.  That could lead to evidence-backed advice concerning how audiences will likely react to a summary.  
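
As a sketch of how such a shared test might be pooled and compared, the example below applies a two-proportion z-test (via statsmodels) to a hypothetical “read-through” rate, defined here as reaching 95% depth. The counts, the completion threshold, and the pooling are all assumptions for illustration.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical pooled results from teams testing the same question:
# how many page views reached the end of the article (depth >= 95%)
# for articles with a lead summary versus articles without one?
completions = [812, 644]     # with summary, without summary
page_views = [5000, 5000]

z_stat, p_value = proportions_ztest(count=completions, nobs=page_views)
print(f"z={z_stat:.2f}, p={p_value:.4f}")
# Shared definitions (e.g., "read through" = depth >= 95%) are what make
# results from different sites comparable in the first place.
```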

The first step toward realizing this vision is having standard protocols for common analytics tools like Google Analytics, so that data from different websites are comparable. It’s a fascinating opportunity for someone in the content strategy community to move forward with. I’m too deep in the field of metadata to work on it myself, but I hope others will become interested in developing a common analytics framework.
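
As a hypothetical illustration of what such a protocol might standardize, here is a sketch of a common record format for one page view. The field names, genre values, and pattern vocabulary are my own invention, not an existing standard or a Google Analytics feature.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AttentionRecord:
    """One page view summarized in a shared vocabulary, so that data from
    different sites and analytics tools can be pooled and compared.
    This schema is hypothetical, not an established standard."""
    site: str
    genre: str                    # e.g. "sports", "how-to", "magazine-feature"
    article_word_count: int
    dwell_time_sec: float
    relative_depth: float         # 0.0 to 1.0
    scroll_speed_px_per_sec: float
    engagement_events: int
    pattern: str                  # "shallow" | "idle" | "scan" | "read" | "long-read"

record = AttentionRecord(
    site="example.com", genre="how-to", article_word_count=950,
    dwell_time_sec=310.0, relative_depth=0.82,
    scroll_speed_px_per_sec=45.0, engagement_events=12, pattern="idle",
)
print(json.dumps(asdict(record)))
```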

— Michael Andrews


Content & Decisions: A Unified Framework

Many organizations face a chasm between what they say they want to do and what they do in practice. Many say they want to transition toward digital strategy. In practice, most still rely on measuring the performance of individual web pages, using the same basic approach that’s been around for donkey’s years. They have trouble linking the performance of their digital operations to their high-level goals. They are missing a unified framework that would let them evaluate the relationship between content and decisions.

Why is a Unified Framework important?

Organizations, when tracking how well they are doing, tend to focus on web pages: abandonment rates, clicks, conversions, email open rates, likes, views, and so on. Such granular measurements don’t reveal the bigger picture of how content is performing for the publishing organization. Even multi-page measurements such as funnels are little more than an arbitrary linking of discrete web pages.

Tracking the performance of specific web pages is necessary, but not sufficient. Because each page is potentially unique, summary metrics for different pages don’t explain variations in performance. Page-level metrics tell us how specific pages perform, but they don’t address important variables that transcend individual pages, such as which content themes are popular, or which design features are being adopted.

Explaining how content fits into digital business strategy is a bit like trying to describe an elephant without being able to see the entire animal. Various people within an organization focus on different digital metrics. How all these metrics interact gets murky.  Operational staff commonly track lower level variables about specific elements or items. Executives track metrics that represent higher level activities and events, which have resource and revenue implications that don’t correspond to specific web pages.

Metadata can play an important role in connecting information about various activities and events, and in transcending the limitations of page-level metrics. But first, organizations need a unified framework to see the bigger picture of how their digital strategy relates to their customers.

Layers of Activities and Decisions

To reveal how content relates to other decisions, we need to examine content at different layers. Think of these layers as a stack. One layer consists of the organization publishing content.  Another layer comprises the customers of the organization, the users of the organization’s content and products.  At the center is the digital interface, where organizations interact with their users.

We also need to identify how content interacts with other kinds of decisions within each layer.  Content always plays a supporting role.  The challenge is to measure how good a job it is doing supporting the goals of various actors.

Diagram: relationships between organizations, their digital assets, and users/customers, and the interaction between content and platforms.

First let’s consider what’s happening within the organization that is publishing content.  The organization makes business decisions that define what the business sells to its customers, and how it services its customers.  Content needs to support these decisions.  The content strategy needs to support the business strategy.  As a practical matter, this means that the overall publishing activity (initiatives, goals, resources) needs to reflect the important business decisions that executives have made about what to emphasize and accomplish.  For example, publishing activity would reflect marketing priorities, or branding goals.  Conversely, an outsider could view the totality of an organization’s content, by viewing their website, and should get a sense of what’s important to that organization.  Publishing activity reveals an organization’s brand and priorities.

The middle layer is composed of assets that the organization has created for their customers to use. This layer has two sides: the stock of content that’s available, and the digital platforms customers access. The stock of content reflects the organization’s publishing activity. The digital platforms reflect the organization’s business decisions. Digital platforms are increasingly an extension of the products and services the organization offers. Customers need to access the digital platforms to buy the product or service, to use the product or service, and to resolve any problems after purchase. Content provides the communications that customers need to access the platform. Because of this relationship, the creation of content assets and the designs for digital platforms are commonly coordinated during their implementation.

Within the user layer, the customer accesses content and platforms.  They choose what content to view, and make decisions about how to buy, use, and maintain various products and services.  The relationship between content activity and user decisions is vital, and will be discussed shortly.  But its importance should not overshadow the influence of the other layers.  The user layer should not be considered in isolation from other decisions and activities that an organization has made.

Feedback loops Between and Within Layers

Let’s consider how the layers interact.  Each layer has a content dimension, and a platform dimension, at opposite ends.  Content dimensions interact with each other within feedback loops, as do platform dimensions.  The content and platform dimensions ultimately directly interact with each other in a feedback loop within the user layer.
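
One compact way to hold the stack in mind is to write it out, with each layer paired to its content side and its platform side. This is just shorthand for the concepts in this post, not an established model; a minimal sketch:

```python
# The three layers of the stack, each with a content dimension and a platform
# dimension.  Labels are shorthand for the concepts in this post.
stack = {
    "organization": {
        "content": "publishing activity (initiatives, goals, resources)",
        "platform": "business decisions (what to sell, how to serve customers)",
    },
    "assets": {
        "content": "the stock of published content",
        "platform": "the digital platforms customers access",
    },
    "user": {
        "content": "the content customers choose to view",
        "platform": "decisions to buy, use, and maintain products and services",
    },
}
```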

On the content side, the first feedback loop, the publishing operations loop, relates to how publishing activity affects the stock of content.  The organization decides the broad direction of its publishing. For many organizations, this direction is notional, but more sophisticated organizations will use structured planning to align their stock of content with the comprehensive goals they’ve set for the content overall.  This planning involves not only the creation of new content, but the revision of the existing stock of content to reflect changes in branding, marketing, or service themes.   The stock of content evolves as the direction of overall publishing activity changes.  At the same time, the stock of content reflects back on the orientation of publishing activity.  Some content is created or adjusted outside of a formal plan.  Such organic changes may be triggered in response to signals indicating how customers are using existing content. Publishers can compare their plans, goals, and activities, with the inventory of content that’s available.

The second content feedback loop, the content utilization loop, concerns how audiences are using content. Given the stock of content available, publishers must decide what content to prioritize. They make choices about how to promote content (such as where to position links to items), and how to deliver content (such as which platforms to make available for customers to access information). At the same time, audiences are making their own choices about what content to consume. These choices collectively suggest preferences for certain kinds of content within the available stock.

When organizations consider the interaction between the two loops of feedback, they can see the connection between overall publishing activity, and content usage activity.  Is the content the organization wants to publish the content that audiences want to view?
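
To illustrate what linking the two content loops might look like in practice, here is a small sketch comparing the share of publishing effort planned for each theme against the share of audience attention each theme actually receives. The themes and numbers are invented for illustration.

```python
# Planned publishing emphasis versus observed audience attention, by theme.
# Values are shares that sum to 1.0; all figures are hypothetical.
planned_share = {"product news": 0.40, "how-to guides": 0.25,
                 "thought leadership": 0.25, "support updates": 0.10}
attention_share = {"product news": 0.18, "how-to guides": 0.47,
                   "thought leadership": 0.09, "support updates": 0.26}

for theme in planned_share:
    gap = attention_share[theme] - planned_share[theme]
    print(f"{theme:18} planned {planned_share[theme]:.0%} "
          f"vs attention {attention_share[theme]:.0%} (gap {gap:+.0%})")
```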

Two feedback loops are at work on the platform side as well.  The first, the business operations loop, concerns how organizations define and measure goals for their digital platforms.  Product managers will have specific goals, reflecting larger business priorities, and these goals get embodied in digital platforms for customers to access.  Product metrics on how customers access the platform provide feedback for adjusting goals, and inform the architectural design of platforms to realize those goals.

The second platform loop, the design optimization loop, concerns how the details of platform designs are adjusted.  For example, designs may be composed of different reusable web components, which could be tied to specific business goals.  Design might, as an example, feature a chatbot that provides a cost savings or new revenue opportunity. The design optimization loop might look at how to improve the utilization of that chatbot functionality.  How users adopt that functionality will influence the optimization (iterative evolution) of its design. The architectural decision to introduce a chatbot, in contrast, would have happened within the business operations loop.

As with the content side, the two feedback loops on the platform side can be linked, so that the relationship between business decisions and user decisions is clearer.  User decisions may prompt minor changes within the design optimization loop, or if significant, potentially larger changes within the business operations loop.  Like content, a digital platform is an asset that requires continual refinement to satisfy both user and business goals.

The two parallel sides, content and design, meet at the user layer. User decisions are shaped both by the design of the platforms users are accessing and by the content they are consuming on those platforms. Users need to know what they can do, and want to do it. Designs need to support users’ access to the content they need when making a decision. That content needs to provide users with the knowledge and confidence for their decision.

The relationship between content and design can sometimes seem obvious when looking at a web page.  But in cases where content and design don’t support each other, web pages aren’t necessarily the right structure to fix problems.  User experiences can span time and devices.  Some pages will be more about content, and other pages more about functionality. Relevant content and functionality won’t always appear together.  Both content and designs are frequently composed from reusable components.  Many web pages may suffer from common problems stemming from faulty components, or the wrong mix of components. The assets (content and functionality) available to customers may be determined by upstream decisions that can’t be fixed on a page level. Organizations need ways to understand larger patterns of user behavior, to see how content and designs support each other, or fail to.

Better Feedback

Content and design interact across many layers of activities and decisions. Organizations must first decide what digital assets to create and offer customers, and then must refine these so that they work well for users.  Organizations need more precise and comprehensible feedback on how their customers access information and services.  The content and designs that customers access are often composed from reusable components that appear in different contexts. In such cases, page-level metrics are not sufficient to provide situational insights.  Organizations need usage feedback that can be considered at the strategic layer.  They need the ability to evaluate global patterns of use to identify broad areas to change.
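
One concrete way to move beyond page-level metrics is to roll usage up by the metadata attached to components and themes rather than by URL. The sketch below uses pandas with made-up fields and values; the point is the grouping, not the particular schema.

```python
import pandas as pd

# Each row is one page view, tagged with metadata about the content theme and
# the reusable component featured on the page.  All fields are illustrative.
views = pd.DataFrame([
    {"url": "/plans/basic",  "theme": "pricing", "component": "comparison-table", "dwell_sec": 95,  "converted": 1},
    {"url": "/plans/pro",    "theme": "pricing", "component": "comparison-table", "dwell_sec": 120, "converted": 0},
    {"url": "/help/setup",   "theme": "support", "component": "chatbot",          "dwell_sec": 310, "converted": 0},
    {"url": "/help/billing", "theme": "support", "component": "chatbot",          "dwell_sec": 45,  "converted": 1},
])

# Aggregate by component and by theme instead of by page, so that patterns
# spanning many URLs become visible.
print(views.groupby("component")[["dwell_sec", "converted"]].mean())
print(views.groupby("theme")[["dwell_sec", "converted"]].mean())
```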

In a future post, I will draw on this framework to return to the topic of how descriptive, structural, technical and administrative metadata can help organizations develop deeper insights into the performance of both their content and their designs.  If you are not already familiar with these types of metadata, I invite you to learn about them in my recent book, Metadata Basics for Web Content, available on Amazon.

— Michael Andrews