
Don’t build your personalization on data exhaust

A lot of content that looks like it's just for you isn't just for you.  You are instead seeing content aimed at a category segment in which you have been placed.  Such targeting is a useful and effective approach for marketers, but it shouldn't be confused with personalization.  The choice of what people see rests entirely with the content provider.

When providers both rely exclusively on their own judgments, and base those judgments on how they read the behaviors of groups of people, they are prone to error.  Despite sophisticated statistical techniques and formidable computational power, content algorithms can appear to individuals as clueless and unconcerned.  To understand why the status quo is not good enough, we first need to understand the limitations of current approaches based on web usage mining.

Targeting predefined outcomes

Increasingly, different people see different views of content.  To offer such variation, backend systems use rules to decide what to present.  The goal is a simple one: to increase the likelihood that the content presented will be clicked on.  It is assumed that if the content is clicked on, everyone is happy.  But depending on the nature of the content, the provider may benefit more from the click than the viewer does, and may as a consequence present content that has only a minor chance of being clicked.
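To make the mechanics concrete, here is a minimal sketch of how rules-based content variation typically works.  The segments, rules, and variant names are hypothetical, not drawn from any particular product:

```python
# A minimal sketch of rules-based content targeting. The segments and
# variants are hypothetical; real systems externalize these rules, but
# the logic is the same: match visitor attributes, serve a variant.

def choose_variant(visitor: dict) -> str:
    """Return the content variant a visitor should see."""
    if visitor.get("industry") == "retail":
        return "retail-case-study"       # variant tuned for this segment
    if visitor.get("returning"):
        return "product-update-banner"   # variant for repeat visitors
    return "default-homepage"            # fallback when no rule matches

print(choose_variant({"industry": "retail"}))  # -> retail-case-study
print(choose_variant({"visits": 1}))           # -> default-homepage
```

Note that the viewer appears only as a bundle of attributes; nothing in the rules records what the viewer has asked to see.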

A business user viewing a vendor sales website may see specific content, based on the vendor's ability to recognize the user's IP address.  The vendor could decide to present content about how the business user's competitor is using the vendor's product.  The targeted user is in a segment: a sales prospect in a certain industry.  Such a content presentation reflects the targeting of a type of customer based on their characteristics.  It may or may not be relevant to the viewer coming to the site (the viewer may be looking for something else, and not care about what's being presented).  The content presentation does not reflect any declared preference by the site visitor.  Indeed, officially, the site visitor is anonymous; it is only through the IP address, combined with database information from a product such as Demandbase, that the inference of who is visiting is made.  This is a fairly common situation: guessing who is looking for content, and then guessing what they want, or at least what they might be willing to notice.

Targeted ads are often described as personalized, but a targeted ad is simply a content variation that is presented when the viewer matches certain characteristics.  Even when the ad you see tested better with others in a segment of people like you, the ad is merely optimized (the option that scored highest), not personalized to reflect your preferences.  In many respects it is silly to talk about advertising as personalized, since it is rare for individuals to state advertising preferences.

The behavioral mechanisms behind content targeting resemble in many respects other content ranking and filtering techniques used to prioritize search results and make recommendations.  These techniques, whether they involve user-user collaborative filtering or page ranking, aim to prioritize content based on other people's use of it.  They employ web usage mining to guess what will get clicked the most.
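User-user collaborative filtering illustrates the logic well.  Here is a minimal sketch, with invented ratings data: to score an item the user hasn't seen, weight other users' ratings of it by how similar their rating histories are to the user's own.

```python
# Minimal user-user collaborative filtering sketch (invented data).
# An unseen item is scored by averaging other users' ratings of it,
# weighted by the similarity of their rating histories to the user's.
from math import sqrt

ratings = {                     # user -> {item: rating}
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5, "d": 4},
    "cat": {"a": 1, "b": 5, "d": 2},
}

def cosine(u: dict, v: dict) -> float:
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    return dot / (sqrt(sum(x * x for x in u.values()))
                  * sqrt(sum(x * x for x in v.values())))

def predict(user: str, item: str) -> float:
    peers = [(cosine(ratings[user], r), r[item])
             for name, r in ratings.items()
             if name != user and item in r]
    total = sum(w for w, _ in peers)
    return sum(w * r for w, r in peers) / total if total else 0.0

print(round(predict("ann", "d"), 2))  # ann's predicted rating for "d"
```

Everything in the prediction comes from other people's behavior; at no point does the user state what they actually want.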

What analytics measure

It is important to bear in mind that analytics measure actions that matter to brands, not actions that matter to individuals.  The analytics discipline tends to give a behavior the most generous interpretation that matches the story the brand wants to hear, rather than the story the audience member experiences.  Take the widely embraced premise that every click is an expression of interest.  Many people may click on a link, but quickly abandon the page they are taken to.  The brand will think: they are really interested in what we have, but the copy was bad so they left; we need to improve the copy.  The audience may think: that was a misleading link title and the brand wasted my time; it needs to be more honest.  The link was clicked, but we can't be sure of the intent behind the click, so we don't know what the interest was.

Even brands that practice self-awareness are susceptible to misreading analytics.  The signals analyzed are by-products of activity, but the individual's mind is a black box.  More extensive tracking and data won't reliably deliver to individuals what they seek when individual preferences are ignored.

Why behavioral modeling can be tenuous

Behavioral data has several important limitations: it can be thin, misleading, flattened, or noisy.

Thin data

One of the major weaknesses of behavioral data is that there often isn't sufficient data on which to base content prioritization or recommendations.  Digital platforms are supposed to enable access to the "long tail" of content, the millions of items that physical media couldn't cope with.  But discovery of that content is a problem unsolved by behavioral data, since most of it has little or no history of activity by people similar to any one individual.  If only 20 per cent of content accounts for 80 per cent of activity, then 80 per cent of content has little activity on which to base recommendations.  It may nonetheless be of interest to individuals.  Significantly, the content most likely to matter to an individual may be what is most unique to them, since special interests strongly define the identity of the individual.  But what matters most to an individual can be precisely what matters least to the crowd overall.  Content providers try to compensate for thin data by aggregating categories and segments at even higher levels, but the results are often wide of the mark.

Misleading signals

Even when there is sufficient data, it can be misleading.  The analytics discipline confuses matters by equating traffic volume with "popularity."  Content that is most consumed is not necessarily most popular, if we take popularity to mean liked rather than used.  A simple scroll through YouTube confirms this.  Some widely viewed videos draw strong negative comments because they are controversial.  Others may get a respectable number of views but little reaction in the form of likes or dislikes.  And sometimes a highly personal video, say a clip of someone's wedding, will appeal to only a small segment but will get an enthusiastic response from its viewers.

Analytics professionals may automatically assume that content that is not consumed is not liked, but that isn’t necessarily true.  Behavioral data can tell us nothing about whether someone will like content when a backend system has no knowledge of it having been consumed previously.  We don’t know their interests, only their behavior.

Past behavior does not always indicate current intent.  Log into Google and search intensively on a topic, and you may find Google keeps offering content results you no longer want, because it prioritizes items similar to ones you have viewed previously.  The person's interests and goals have evolved faster than the algorithm's ability to adapt to those changes.

Perversely, sometimes people consume content they are not satisfied with because they've been unable to find anything better.  The data signal assumes they are happy with it, but they may in fact want something more specific.  This problem will become more acute as content consumption becomes increasingly driven by automatic feeds.

Flattened data

People get “averaged” when they are lumped into segment categories.  Their profile is flattened in the process — the data is mixed with other people’s data to the point that it doesn’t reflect the individual’s interests.  Not only can their individual interests be lost, but spurious interests can be presumed of them.

Whether segmentation is demographic or behavioral, individuals are grouped into segments that share characteristics.  Sometimes people with shared characteristics will be more likely to share common interests and content preferences.  But there is plenty of room to make mistaken assumptions.  That luxury car owners over-index on interest in golf does not translate into a solid recommendation for an individual.  Some advertisers have explored the relationship between music tastes and other preferences.  For example, country music lovers have a stronger than average tendency to be Republican voters in the United States.  But it can be very dangerous for a brand to present potentially loaded assumptions to individuals when there's a reasonable chance they're wrong.
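A back-of-the-envelope calculation shows why over-indexing is weak evidence about any one person.  The numbers here are made up purely for illustration:

```python
# Made-up numbers showing why a segment that over-indexes on an
# interest still says little about any individual member.
baseline_interest = 0.10   # share of general population interested in golf
over_index = 2.0           # the segment is twice as likely as the baseline

segment_interest = baseline_interest * over_index
print(f"Interested in golf: {segment_interest:.0%} of the segment")
print(f"Not interested:     {1 - segment_interest:.0%} of the segment")
# Even at a 2x over-index, 80% of the segment does not care about golf,
# so a golf-themed recommendation to any individual is usually wrong.
```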

Even people who exhibit the same content behaviors may have different priorities.  Many people check the weather, but not all care about the same level of detail.  As screens proliferate, the intensity of engagement diminishes, as attention gets scattered across different devices.  Observable behavior becomes a weaker signal of actual attention and interest.  Tracking what someone does does not tell us whether to give them more or less content, so the system assumes the quantity is right.

Noisy social data

Social media connections are a popular way to score users, and social media platforms argue that people who are connected are similar, like similar things, and influence each other.  Unfortunately, these assumptions are more true of in-person relationships than online ones.  People have too many connections to other people in social channels for there to be a high degree of correlation of interests, or influence, between them.  There is of course some, but it isn't as strong as the models would hope.  These models mistake tendencies observable at an aggregate level for predictability at the level of an individual.

Social grouping can be a basis for inferring the interests of a specific individual, provided the people you know share your interests to a high degree, so that you will want to view things they have viewed or recommended.  That is most true for common, undifferentiated interests.  Some social groups, notably among teens, can have a strong tendency toward herd behavior.  But the strength and relevance of social ties cannot be assumed without knowing the context of the relationship.  One's poker buddies won't necessarily share one's interests in religion or music.  Unless both the basis of the group and the topic of content are the same, it can be hard to assume an overlap.  And even when interests are similar, the intensity of interest can vary.

Social targeting of content considers the following:

  • how much you interact with a social connection
  • how widely viewed an item is, especially for people deemed similar to you
  • what actions your social connections take with respect to different kinds of content
  • what actions you take relating to a source of content

While it is obvious that these kinds of information can be pertinent, they are often only weakly suggestive of what an individual wants to view.  It is easy for unrelated inputs to be summed together to prioritize content that has no intrinsic basis for being relevant: your social connection “liked” this photo of a cat, and you viewed several photos last week and talk often to your friend, so you are seeing this cat photo.
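A sketch of what such summing can look like may help.  The signals and weights below are hypothetical, not any platform's actual formula:

```python
# Hypothetical additive scoring of social signals. Each input is
# plausible on its own, but summing them can surface content with no
# intrinsic relevance to the individual.
weights = {
    "friend_interaction": 0.4,  # how much you interact with the connection
    "item_popularity":    0.3,  # how widely viewed the item is
    "friend_liked":       0.2,  # your connection "liked" the item
    "source_affinity":    0.1,  # your past actions toward the source
}

def score(signals: dict) -> float:
    return sum(weights[k] * signals.get(k, 0.0) for k in weights)

cat_photo = {"friend_interaction": 0.9, "item_popularity": 0.7,
             "friend_liked": 1.0, "source_affinity": 0.6}
print(round(score(cat_photo), 2))  # a high score, yet nothing here
                                   # measures whether *you* like cat photos
```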

At the level of personalization, it's flawed to assume that one's friends' interests are the same as one's own.  There can be a correlation, but in many cases it will be a very weak one.  Social behavioral researchers are now exploring the concept of social affinity, instead of social distance, to strengthen the correlation.  But the weakness of predicting what you want according to who your acquaintances are will remain.

Mind-reading is difficult

The most recent hope for reading the minds of individuals involves contextualization.  The assumption behind contextualization is that if everything is known about an individual, then their preferences for content can be predicted.  Not surprisingly, this paradigm is presented in a way that highlights the convenience of having information you need readily available.  It is, of course, perfectly possible to take contextual information and use it against the interests of an individual.  Office workers are known to ask their bosses for urgent decisions precisely when the boss is on her way to a meeting and has no time to provide a more considered analysis.  Any opportunistic use of contextual information about an individual by someone else is clearly an example of the individual losing control.

Contextual information can be wrong or unhelpful.  The first widespread example of contextual content was the now-infamous Microsoft Clippy, which offered "It looks like you're writing a letter…"  Clippy was harmless, but hated, because people felt they had no control over when it appeared.

Even with the best of intentions, brands have ample room to misjudge the intentions of an individual.

Can content preferences be predicted?

The problem with relying on behavior to predict individual content preferences comes down to time frame.  Because targeting treats individuals as members of a category of people, it ignores the specific circumstances that time introduces.  People may be interested in content on a topic, but not necessarily at the time the provider presents it.  The provider responds by trying again, or trying some other topic, but in either case may have missed an opportunity to understand the individual's real interest in the content presented.  People may pass on viewing content they have a general interest in.  They think "not now" (it's not the best time) or "not yet" (I have more urgent priorities).  Oftentimes readiness comes down to the mood of the individual, which even contextualization can't factor in.  Over time a person may desire content about something, yet not care to click when the provider is offering it to them.

If the viewer doesn’t have a choice over what they see, it’s not personalized.

A better way

There are better approaches to personalization.  The big data approach of aggregating lots of behavioral data has been widely celebrated as mining gold from "data exhaust."  Data exhaust can have some value, but it is a poor basis for a brand's relationship with its customers.  People need to feel some control, and not feel as if they are being tracked for their exhaust.  Brands need an alternative approach to personalization, not only to build better relationships, but to increase their understanding of their audiences so they can serve them more profitably.  In the following post, I will discuss how to put the person back into personalization.

— Michael Andrews


Seven examples of content behavior design

Content behavior design promotes the discovery of content.  It is different from information architecture, which focuses on global information organization and navigation, and on offering users tools to specify what they are seeking.  Content behavior design anticipates what content might interest users, and decides what to display based on that.  It assumes the user may not be consciously looking for a piece of information, but would be happy to have it available if it were relevant to the core content she is viewing.

In some cases, content behavior design can help people discover things they were not seeking.  In other cases, additional content provides more clarity.  Effective designs give audiences more context, making the content richer.  Here are seven examples to illustrate how content behavior can work for audiences.

Real time content aggregation (Kayak)

[Screenshot: Kayak flight status]

Many bits of information are associated with a single label (a flight number) representing a single object (a plane).  This example brings together real-time information about the flight, showing information about three locations (departure, current position, and arrival) and timing information about events associated with these locations.  The aggregation of many pieces of real-time information makes this powerful.  Real-time information is compelling because it changes and gives audiences a reason to check for updates.  One could imagine this example being even more useful if it included weather-related information affecting the flight, especially any weather conditions at the arrival destination that could impact the projected arrival time.
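One way to picture the content model behind such a view is as a single flight identifier keying many independently updated facts.  The field names below are guesses for illustration, not Kayak's actual schema:

```python
# A guess at the kind of content model behind such a view; field names
# are illustrative, not Kayak's actual schema.
from dataclasses import dataclass

@dataclass
class FlightStatus:
    flight_number: str       # the single label that keys everything
    departure_airport: str
    departure_time: str      # scheduled and actual departure
    current_position: str    # where the plane is right now
    arrival_airport: str
    estimated_arrival: str   # updates in real time

status = FlightStatus("UA 829", "ORD", "10:05 (departed 10:19)",
                      "over Lake Erie", "EWR", "13:12 (est.)")
print(f"{status.flight_number}: arriving {status.arrival_airport} "
      f"at {status.estimated_arrival}")
```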

Content about related actions (ESPN)

[Screenshot: ESPN game page with ticket panel]

In interaction design, it is helpful to highlight a next action, instead of making the user look elsewhere for it.  In this example from ESPN, the column on the far right allows the user to order tickets for a basketball game.  But instead of simply saying “order tickets,” it provides information about how many seats are available and the costs.  Incorporating this content is successful for two reasons: 1. It gives people interested in ordering tickets an idea of their availability; and 2. It gives people not interested in attending the game in person a sense of how anticipated the game is in terms of attendance.  Based on the number of tickets sold, and the prices of tickets, do fans expect an exciting game?

Tracking components of collections (Paprika)

[Screenshot: Paprika combined shopping list]

Digital content curation is an important development.  People collect content items that have associated metadata.  As they assemble items into collections, the metadata can be combined as well.  In this example from the recipe manager app Paprika, the ingredients from two recipes are combined into one shopping list, so that the user knows how many eggs in total he needs to make both dishes.  The content is smart enough to anticipate a need of the user, and performs that task without prompting.  Another example is the app Delicious Library, which can track the replacement costs of books one owns.  Designers use content behavior for applications focused on the "quantified self," the collection of information about yourself.  For example, a design could tell the user what night of the week she typically sleeps best.
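The combination step is simple to sketch.  Here is a minimal illustration with invented recipes, where quantities keyed by ingredient are summed when collections merge:

```python
# Combining ingredient metadata from two recipes into one shopping
# list. The recipes and quantities are invented for illustration.
from collections import Counter

pancakes = Counter({"eggs": 2, "flour_cups": 1.5, "milk_cups": 1})
quiche   = Counter({"eggs": 4, "flour_cups": 1,   "cream_cups": 0.5})

shopping_list = pancakes + quiche   # Counter addition sums quantities
for ingredient, quantity in sorted(shopping_list.items()):
    print(f"{ingredient}: {quantity}")
# eggs: 6 -- the total the user needs to make both dishes
```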

Audience activity insights (Economist)

[Screenshot: Economist reader comments keyword cloud]

What audiences are interested in is itself interesting.  The Economist has adapted the concept of a tag cloud to listen to reader comments on its articles.  The program listens for keywords, newsworthy proper nouns, or significant phrases, and shows their relative frequency and the extent to which they coincide.  It's a variation of the "most commented" article list, but shifts the focus to the discussion itself.  Audiences can see what topics specifically are being discussed, and can note any relationships between topics.  For example, Apple is being discussed in the context of China, rather than in the context of Samsung.  Users can hover over a term to see the actual comments.  It provides a discovery mechanism for seeing the topicality of the week's news, and provides enough ambiguity to tempt the reader to explore more to understand why something unfamiliar is being discussed.
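The underlying mechanics can be sketched roughly as frequency and co-occurrence counting across comments.  The keyword list and comments below are invented:

```python
# Rough sketch of keyword frequency and co-occurrence across reader
# comments. The keyword list and comments are invented.
from collections import Counter
from itertools import combinations

keywords = {"apple", "china", "samsung", "tariffs"}
comments = [
    "apple is betting heavily on china",
    "china tariffs could hurt apple",
    "samsung had a strong quarter",
]

freq, pairs = Counter(), Counter()
for comment in comments:
    found = sorted(keywords & set(comment.split()))
    freq.update(found)                    # relative frequency of topics
    pairs.update(combinations(found, 2))  # which topics coincide

print(freq.most_common())   # e.g. apple and china dominate the discussion
print(pairs.most_common())  # ('apple', 'china') discussed together most
```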

Data on content facets (Bloomberg)

[Screenshot: Bloomberg interview with contextual data panels]

Content can have many facets.  Faceted navigation, which takes the user to other content related to a facet, is a well-established navigation technique.  This example from Bloomberg, in contrast, brings the content to the user.  As the interview is happening, users can get more information about things mentioned in it.  Without leaving the interview, the user can get more context, viewing real-time information about stock prices discussed, or browsing headlines about companies or industries mentioned.  The viewer can even see how often the person speaking has appeared on the show previously, to get a sense of their credibility or expertise.  Even though some of this information is hidden by collapsible menus, the user does not need to ask the system to pull it in; it is provided by default.

Data-driven leaderboards (IMDb)

[Screenshot: IMDb animation leaderboard]

Lists are a helpful navigation tool, but they are more valuable when they have interesting data behind them.  Unlike tables of data, which require users to sort, leaderboards provide automatic ranking by key criteria.  In this example from IMDb, animation series and titles are ranked by user rating and gross revenue.  The ranking gives the casual viewer a chance to gauge relative popularity before clicking on a title for more information, while the core fan might check the list to see if a favorite film has moved up in the rankings.
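The mechanics amount to pre-sorting content by a key criterion so users don't have to.  A minimal sketch, with invented titles and figures:

```python
# A leaderboard is content pre-sorted by a key criterion. The titles
# and figures here are invented for illustration.
titles = [
    {"title": "Title A", "rating": 8.1, "gross_millions": 420},
    {"title": "Title B", "rating": 8.6, "gross_millions": 310},
    {"title": "Title C", "rating": 7.9, "gross_millions": 515},
]

by_rating = sorted(titles, key=lambda t: t["rating"], reverse=True)
for rank, entry in enumerate(by_rating, start=1):
    print(f"{rank}. {entry['title']} ({entry['rating']})")
```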

Content recommendations (Netflix)

[Screenshot: Netflix recommendation with rationale]

There are growing numbers of content recommendation engines, covering articles, books, music, videos and even data.  They rely on different inputs, such as user ratings, user consumption, peer ratings, peer consumption, and imputed content similarity.  In many respects, content recommendation engines represent the holy grail of content behavior design.  The chief problem for users is understanding and trusting the algorithm.  Why am I being told I would like this?  Netflix provides a rationale, based on prior activity by the user.  It’s probably a simplification of the actual algorithm, but it provides some basis for the user to accept or reject the recommendation.  I expect recommendation engines will evolve further to provide better signals that suggestions are a good fit (no risk), and that they aren’t too narrow (the filter bubble problem).
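A toy sketch of a recommendation that carries its own rationale, in the spirit of "because you watched X."  The titles and similarity scores are invented, not Netflix's actual algorithm:

```python
# Toy recommendation with a rationale attached. The titles and
# similarity scores are invented, not Netflix's actual algorithm.
similar_to = {   # item -> [(similar item, similarity score)]
    "Sherlock": [("Luther", 0.8), ("Broadchurch", 0.7)],
    "Planet Earth": [("Blue Planet", 0.9)],
}
watched = ["Sherlock", "Planet Earth"]

scores, reasons = {}, {}
for seen in watched:
    for candidate, sim in similar_to.get(seen, []):
        if candidate not in watched and sim > scores.get(candidate, 0.0):
            scores[candidate] = sim
            reasons[candidate] = seen   # keep the best-matching rationale

for candidate in sorted(scores, key=scores.get, reverse=True):
    print(f"{candidate} (because you watched {reasons[candidate]})")
```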

Ideas for thinking about behavior

In choosing what content to present, it helps to ask:

  • what else might someone find helpful that is related to what is being presented?
  • what aspects of content are notable, changing, and newsworthy, and how can you highlight these aspects?
  • how can you present content elements so they are interesting, rather than simply informative?
  • if you are trying to encourage audiences to act, how can real time content be used to support that?
  • how do different audiences relate to the content, and can you provide something that appeals to different segments?
  • what content could the system automatically provide that is laborious for someone to do themselves?

Designing content behavior is central to content engagement.  Try out ideas, and test them to see what works for your audiences.

— Michael Andrews