Categories
Content Effectiveness

Don’t build your personalization on data exhaust

Much content that looks like it’s just for you isn’t just for you.  You are instead seeing content for a category segment in which you have been placed.  Such targeting is a useful and effective approach for marketers, but it shouldn’t be confused with personalization.  The choice of what people see rests entirely with the content provider.

When providers rely exclusively on their own judgments, and base those judgments on how they read the behaviors of groups of people, they are prone to error.  Despite sophisticated statistical techniques and truly formidable computational power, content algorithms can appear to individuals as clueless and unconcerned.  To understand why the status quo is not good enough, we first need to understand the limitations of current approaches based on web usage mining.

Targeting predefined outcomes

Increasingly, different people see different views of content.  To produce this variation, backend systems use rules to decide what to present.  The goal is a simple one: to increase the likelihood that the content presented will be clicked.  It is assumed that if the content is clicked, everyone is happy.  But depending on the nature of the content, the provider may benefit more from a click than the viewer does, and so may be willing to present content that has only a small chance of being clicked.

A business user viewing a vendor’s sales website may see specific content, based on the vendor’s ability to recognize the user’s IP address.  The vendor could decide to present content about how the business user’s competitor is using the vendor’s product.  The targeted user is in a segment: a sales prospect in a certain industry.  Such a content presentation reflects the targeting of a type of customer based on their characteristics.  It may or may not be relevant to the viewer coming to the site (the viewer may be looking for something else and not care about what’s being presented).  The content presentation does not reflect any declared preference by the site visitor.  Indeed, officially, the site visitor is anonymous; it is only through the IP address, combined with database information from a product such as Demandbase, that the inference of who is visiting is made.  This is a fairly common situation: guessing who is looking for content, and then guessing what they want, or at least what they might be willing to notice.

Targeted ads are often described as personalized, but a targeted ad is simply a content variation that is presented when the viewer matches certain characteristics.  Even when the ad you see tested better with others in a segment of people like you, that ad is merely optimized (the option that scored highest), not personalized to reflect your preferences.  In many respects it is silly to talk about advertising as personalized, since individuals rarely state advertising preferences.

The behavioral mechanisms behind content targeting resemble in many respects other content ranking and filtering techniques used for prioritizing search results and making recommendations.  These techniques, whether they involve user-user collaborative filtering or page ranking, prioritize content based on other people’s use of it.  They employ web usage mining to guess what will be clicked most.
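The crowd-based logic can be made concrete with a minimal sketch of user-user collaborative filtering. It assumes a hypothetical click log and uses cosine similarity between users’ click sets, one common choice among several; it is not how any particular provider implements ranking. Nothing in the score comes from the target user’s stated preferences: an item ranks highly only because similar users clicked it.

```python
from math import sqrt

# Hypothetical click log: user -> set of item IDs that user clicked.
clicks = {
    "ana": {"a", "b", "c"},
    "ben": {"a", "b", "d"},
    "cara": {"b", "c", "d", "e"},
    "dev": {"f"},  # a user whose only interest nobody else shares
}


def similarity(u: str, v: str) -> float:
    """Cosine similarity between two users' click sets (treated as binary vectors)."""
    overlap = len(clicks[u] & clicks[v])
    if overlap == 0:
        return 0.0
    return overlap / sqrt(len(clicks[u]) * len(clicks[v]))


def recommend(user: str, k: int = 2) -> list[str]:
    """Rank items the user hasn't clicked by the summed similarity of users who clicked them."""
    scores: dict[str, float] = {}
    for other in clicks:
        if other == user:
            continue
        sim = similarity(user, other)
        for item in clicks[other] - clicks[user]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:k] if score > 0]


print(recommend("ana"))  # ['d', 'e']
print(recommend("dev"))  # []
```

The hypothetical user `dev`, whose one interest overlaps with no one else’s, gets no recommendations at all, which anticipates the thin-data problem discussed below.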

What analytics measure

It is important to bear in mind that analytics measure actions that matter to brands, and not actions that matter to individuals.  The analytics discipline tends to provide the most generous interpretation of a behavior to match the story the brand wants to hear, rather than the story the audience member experiences.  Take the widely embraced premise that every click is an expression of interest.  Many people may click on a link but quickly abandon the page they are taken to.  The brand will think: they are really interested in what we have, but the copy was bad so they left, so we need to improve the copy.  The audience may think: that was a misleading link title and the brand wasted my time; it needs to be more honest.  The link was clicked, but we can’t be sure of the intent behind the click, so we don’t know what the interest was.

Even brands that practice self-awareness are susceptible to misreading analytics.  The signals analyzed are by-products of activity, but the individual’s mind is a black box.  More extensive tracking and data won’t reliably deliver to individuals what they seek when individual preferences are ignored.

Why behavioral modeling can be tenuous

There are several important limitations of behavioral data.  The behavioral data can be thin, misleading, flattened, or noisy.

Thin data

A major weakness of behavioral data is that there often isn’t enough of it on which to base content prioritization or recommendations.  Digital platforms are supposed to enable access to the “long tail” of content, the millions of items that physical media couldn’t cope with.  But discovery of that content is a problem unsolved by behavioral data, since most of it has little or no history of activity by people similar to any one individual.  If 20 per cent of content accounts for 80 per cent of activity, then the remaining 80 per cent of content shares only 20 per cent of activity, leaving little on which to base recommendations.  That content may nonetheless be of interest to individuals.  Significantly, the content that is most likely to matter to an individual may be what is most unique to them, since special interests strongly define the identity of the individual.  But what matters most to an individual can be precisely what matters least to the crowd overall.  Content providers try to compensate for thin data by aggregating categories and segments at even higher levels, but the results are often wide of the mark.
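The long-tail arithmetic can be checked with a small simulation. The catalog below is hypothetical: item popularity is assumed to follow a Zipf-like curve (the item at popularity rank r gets about 10,000 / r interactions), and the minimum of 20 interactions needed before an item can be scored is an illustrative threshold, not an industry standard.

```python
# Hypothetical catalog of 1,000 items. The item at popularity rank r receives
# about 10_000 / r interactions: a Zipf-like long tail, not measured data.
N_ITEMS = 1_000
interactions = [10_000 // rank for rank in range(1, N_ITEMS + 1)]

# Hypothetical minimum number of interactions before a recommender
# has enough behavioral data to score an item.
MIN_SUPPORT = 20

total = sum(interactions)
head_share = sum(interactions[: N_ITEMS // 5]) / total  # share of activity in the top 20% of items
thin = sum(1 for n in interactions if n < MIN_SUPPORT)  # items below the threshold

print(f"Top 20% of items receive {head_share:.0%} of all activity")
print(f"{thin} of {N_ITEMS} items fall below {MIN_SUPPORT} interactions")
```

Under these assumptions the head of the catalog captures roughly four fifths of the activity, close to the 80/20 split described above, while half of all items never gather enough data to be recommended on behavior alone.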

Misleading signals

Even when there is sufficient data, it can be misleading.  The analytics discipline confuses matters by equating traffic volume with “popularity.”  Content that is most consumed is not necessarily most popular, if we take popularity to mean liked rather than used.  A simple scroll through YouTube confirms this.  Some widely viewed videos draw strong negative comments due to their controversy.  Others may get a respectable number of views but little reaction from likes or dislikes.  And sometimes a highly personal video, say a clip of someone’s wedding, will appeal to only a small segment but will get an enthusiastic response from its viewers.

Analytics professionals may automatically assume that content that is not consumed is not liked, but that isn’t necessarily true.  Behavioral data can tell us nothing about whether someone will like content that the backend system has no record of having been consumed.  We don’t know their interests, only their behavior.

Past behavior does not always indicate current intent.  Log into Google and search intensively about a topic, and you may find Google wants to keep offering content results you no longer want, because it prioritizes items similar to ones you have viewed previously.  The person’s interests and goals have evolved faster than the algorithm’s ability to adapt to those changes.

Perversely, sometimes people consume content they are not satisfied with because they’ve been unable to find anything better.  The data signal assumes they are happy with it, but they may in fact want something more specific.  This problem will be more acute as content consumption becomes increasingly driven by automatic feeds.

Flattened data

People get “averaged” when they are lumped into segment categories.  Their profile is flattened in the process — the data is mixed with other people’s data to the point that it doesn’t reflect the individual’s interests.  Not only can their individual interests be lost, but spurious interests can be presumed of them.

Whether segmentation is demographic or behavioral, individuals are grouped into segments that share characteristics.  Sometimes people with shared characteristics will be more likely to share common interests and content preferences.  But there is plenty of room to make mistaken assumptions.  That luxury car owners over-index on interest in golf does not translate into a solid recommendation for an individual.  Some advertisers have explored the relationship between music tastes and other preferences.  For example, country music lovers have a stronger than average tendency to be Republican voters in the United States.  But it can be very dangerous for a brand to present potentially loaded assumptions to individuals when there’s a reasonable chance they’re wrong.
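The golf example can be worked through with numbers. The figures here are hypothetical: suppose 10 per cent of all adults are interested in golf and 15 per cent of luxury car owners are. Using the common audience-research convention that an index of 100 means a segment matches the general population, the segment over-indexes clearly.

```python
def over_index(segment_pct: int, population_pct: int) -> int:
    """Index of 100 means the segment matches the population; above 100 means it over-indexes."""
    return segment_pct * 100 // population_pct


# Hypothetical figures, for illustration only (not survey data).
population_pct = 10  # per cent of all adults interested in golf
segment_pct = 15     # per cent of luxury car owners interested in golf

index = over_index(segment_pct, population_pct)  # 150: a strong over-index
missed_pct = 100 - segment_pct                   # 85: segment members a golf recommendation misses

print(f"index {index}; wrong for {missed_pct}% of the segment")
```

An index of 150 looks like a strong signal, yet the same figures imply that a golf recommendation would be wrong for 85 per cent of the individuals in the segment.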

Even people who exhibit the same content behaviors may have different priorities.  Many people check the weather, but not all care about the same level of detail.  As screens proliferate, the intensity of engagement diminishes, as attention gets scattered across different devices.  Observable behavior becomes a weaker signal of actual attention and interest.  Tracking what someone does doesn’t tell us whether to give that individual more or less content, so the system assumes the quantity is right.

Noisy social data

Social media connections are a popular way to score users, and social media platforms argue that people who are connected are similar, like similar things, and influence each other.  Unfortunately, these assumptions are more true for in-person relationships than for online ones.  People have too many connections to other people in social channels for there to be a high degree of correlation of interests, or influence between them.  There is of course some, but it isn’t as strong as the models would hope.  These models mistake tendencies observable at an aggregate level for predictability at the level of an individual.

Social grouping can be a basis for inferring the interests of a specific individual, provided the people you know share your interests closely enough that you will want to view what they have viewed or recommended.  That is most true for common, undifferentiated interests.  Some social groups, notably among teens, can have a strong tendency toward herd behavior.  But the strength and relevance of social ties cannot be assumed without knowing the context of the relationship.  One’s poker buddies won’t necessarily share one’s interests in religion or music.  Unless the basis of the group and the topic of the content are the same, it can be hard to assume an overlap.  And even when interests are similar, the intensity of interest can vary.

Social targeting of content considers the following:

  • how much you interact with a social connection
  • how widely viewed an item is, especially for people deemed similar to you
  • what actions your social connections take with respect to different kinds of content
  • what actions you take relating to a source of content

While it is obvious that these kinds of information can be pertinent, they are often only weakly suggestive of what an individual wants to view.  It is easy for unrelated inputs to be summed together to prioritize content that has no intrinsic basis for being relevant: your social connection “liked” this photo of a cat, and you viewed several photos last week and talk often to your friend, so you are seeing this cat photo.
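The cat photo example describes a score built by adding up loosely related signals. A minimal sketch of that pattern, assuming a simple weighted sum with invented weights (real platforms do not publish theirs and use far more signals), shows how an item can outrank one the person actually cares about without any input recording what that person wants.

```python
# Hypothetical weights for the kinds of social signals listed above.
WEIGHTS = {
    "friend_liked_item": 2.0,        # a connection reacted to this item
    "interaction_with_friend": 1.5,  # how often you interact with that connection
    "affinity_for_format": 1.0,      # you recently viewed items of this type (e.g. photos)
    "item_popularity": 0.5,          # how widely viewed it is among people deemed similar
}


def social_score(signals: dict[str, float]) -> float:
    """Weighted sum of signals, each assumed to be normalized to the range 0..1."""
    return sum(WEIGHTS[name] * value for name, value in signals.items())


# Your friend liked a cat photo; you view photos and talk to this friend often.
cat_photo = {
    "friend_liked_item": 1.0,
    "interaction_with_friend": 0.8,
    "affinity_for_format": 0.9,
    "item_popularity": 0.6,
}

# An article on a niche topic you care about, which none of your connections touched.
# There is no signal anywhere for "the person said they want this topic".
niche_article = {
    "friend_liked_item": 0.0,
    "interaction_with_friend": 0.0,
    "affinity_for_format": 0.2,
    "item_popularity": 0.1,
}

print(round(social_score(cat_photo), 2))      # 4.4
print(round(social_score(niche_article), 2))  # 0.25
```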

At the level of personalization, it’s flawed to assume that one’s friends’ interests are the same as one’s own.  There can be a correlation, but in many cases it will be a very weak one.  Social behavioral researchers are now exploring a concept of social affinity instead of social distance to strengthen the correlation.  But the weakness of predicting what you want according to who your acquaintances are will remain.

Mind-reading is difficult

The most recent hope for reading into the minds of individuals involves contextualization.  The assumption behind contextualization is that if everything is known about an individual, then their preferences for content can be predicted.  Not surprisingly, this paradigm is presented in a way that highlights the convenience of having the information you need readily available.  It is, of course, perfectly possible to take contextual information and use it against the interests of an individual.  Office workers are known to ask their bosses for urgent decisions when they know the boss is on her way to a meeting and has no time for a more considered analysis.  Any opportunistic use of contextual information about an individual by someone else is clearly an example of the individual losing control.

Contextual information can be wrong or unhelpful.  The first widespread example of contextual content was the now infamous Microsoft Clippy, which asked “it looks like you are about to write a letter…”   Clippy was harmless, but hated, because people felt a lack of control over his appearance.

Even with the best of intentions, brands have ample room to misjudge the intentions of an individual.

Can content preferences be predicted?

The problem with relying on behavior to predict individual content preferences comes down to time frame.  Because targeting treats individuals as members of a category of people, it ignores the specific circumstances that time introduces.  People may be interested in content on a topic, but not necessarily at the time the provider presents it.  The provider responds by trying again, or trying some other topic, but in either case may have missed an opportunity to understand the individual’s real interest in the content presented.  People may pass on viewing content they have a general interest in.  They think “not now” (it’s not the best time) or “not yet” (I have more urgent priorities).  Often readiness comes down to the mood of the individual, which even contextualization can’t factor in.  Over time a person may want content about something, but not care to click when the provider happens to offer it to them.

If the viewer doesn’t have a choice over what they see, it’s not personalized.

A better way

There are better approaches to personalization.  The big data approach of aggregating lots of behavioral data has been widely celebrated as mining gold from “data exhaust.”  Data exhaust can have some value, but is a poor basis for a brand’s relationship with its customers.  People need to feel some control, and not as if they are being tracked for their exhaust.  Brands need an alternative approach to personalization not only to build better relationships, but to increase their understanding of their audiences so they can serve them more profitably.  In the following post, I will discuss how to put the person back into personalization.

— Michael Andrews

Categories
Personalization

The urgency of genuine personalization

Personalization may be the most misused phrase relating to content.  It’s not hard to understand why people want to talk about personalization: it’s appealing to think you’ll see exactly what you want, especially as we get deluged with content.  But paradoxically, most techniques of personalization actually involve tailoring content based on what other people do, rather than on your own interests.  As a result, people miss out on content they might most want to view.

To get a sense of why personalization is so urgent, and so troublesome, consider the situation of Facebook.

Mark Zuckerberg said last year that Facebook wants to be “the best personalized newspaper in the world.”  Notwithstanding Facebook’s popularity, its users complain about the lack of relevance of much of the content they see.  Facebook is optimized for promoting content sharing, not for filtering content based on individual preferences.  Those two goals are to a large extent in conflict.  Facebook has chosen to generate its revenue through advertising and related services, which account for about 90% of the total.  Brands want their content viewed and shared and their ads seen, so keeping them happy is a huge priority.  These various pressures come to a head with Facebook’s News Feed.  On average, a person may have 1500 potential messages Facebook considers relevant to them, and Facebook needs to prioritize these into a manageable quantity (they’ve decided that’s about 300).  “The News Feed algorithm responds to signals from you,” Facebook explains.  But many signals seem to have little to do with the individual, and more to do with other parties: the interests of friends, strangers, and advertisers.  Factors influencing ranking include “number of comments, who posted the story, and what type of post it is (ex: photo, video, status update, etc.)” and “promoted posts.”  Facebook routinely revises its algorithm based on what tests show people in general like best.  But the options for individuals to choose what they specifically want are few.  Facebook decides which types of content, and which items of content, are most relevant, and individuals don’t get much choice in the matter.

What do we mean by personalization?

In the digital content arena, personalization lacks any widely agreed definition.  As Aria Haghighi of Prismatic notes: “personalization is really young. I still think we don’t all agree necessarily on what personalization means.”  Part of the lack of agreement involves how to implement personalization on a technical level, but it also reflects a lack of common vision among providers about what they want personalization to offer.

Digital marketers generally refer to personalization as behavioral targeting, and often use the terms interchangeably.  Big data researchers typically see personalization as adapting content results on the basis of machine learning.  Curiously, the protagonist of the story, the person seeking content, is missing from the personalization discussion.  Instead, the discussion is centered on how to improve click-through rates.

Serving people better should be the core reason for personalization, and it’s important to build a commonsense definition, rather than a mathematical one.  Merriam-Webster defines personalize as “to mark (something) in a way that shows it belongs to a particular person.”  The key elements are that it is individual to a person, and that the person owns that something.  Ownership implies having control.

My definition of personalization is when a person gets unique content that reflects their individual preferences.  Targeting, in contrast, is when a person gets non-unique content based on characteristics they share with others.  In both cases, the content provider prioritizes what content is delivered, but in the first case it is based on first-hand knowledge of what an individual is interested in, whereas in the second case it is based on second-hand assumptions about what seems relevant to the individual.
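The distinction between targeting and personalization can be expressed as two selection functions. This is a toy sketch under stated assumptions: the segment label, topics, and catalog items are hypothetical, and real systems blend many signals. Two visitors placed in the same segment receive identical targeted content, while personalized selection follows what each has said they want.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Person:
    segment: str  # inferred by the provider, e.g. from an IP address lookup
    stated_topics: set[str] = field(default_factory=set)  # declared by the person


@dataclass
class Item:
    title: str
    topics: set[str]
    target_segments: set[str]


def targeted(person: Person, catalog: list[Item]) -> list[str]:
    """Targeting: selection depends only on the segment, so everyone in it gets the same list."""
    return [item.title for item in catalog if person.segment in item.target_segments]


def personalized(person: Person, catalog: list[Item]) -> list[str]:
    """Personalization in the sense defined above: selection driven by stated preferences."""
    return [item.title for item in catalog if item.topics & person.stated_topics]


catalog = [
    Item("Competitor case study", {"case-study"}, {"retail-prospect"}),
    Item("API migration guide", {"api"}, set()),
    Item("Pricing FAQ", {"pricing"}, {"retail-prospect"}),
]
alice = Person("retail-prospect", {"api"})
bo = Person("retail-prospect", {"pricing"})

print(targeted(alice, catalog) == targeted(bo, catalog))  # True
print(personalized(alice, catalog))  # ['API migration guide']
print(personalized(bo, catalog))     # ['Pricing FAQ']
```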

Personalization is based on explicit individual preferences, not assumptions

To understand personalization, we need to separate two dimensions:

  • whether the “signal” is about the individual himself, or about the interests of a crowd who are assumed to be similar to the individual
  • whether the “signal” is an explicit expression of interest, or an implicit assumption based on prior behaviors

The following table shows how different signals can be aligned with either individual or crowd, and can be either explicit or implicit:

Table showing how different kinds of explicit and implicit preferences and behaviors can influence content delivery

I’ve simplified the classic approaches of Facebook, Amazon, and Google to highlight elements that are most salient in their respective approaches.  In practice, each uses a mixture of signals to rank and filter content, crunching hundreds of different, largely behavioral, signals.  These high-volume content providers, and others that are far smaller, give individuals the impression that the results are personalized (described as being “for you”) when they are primarily based on the aggregation of data across users, rather than individual feedback.  While these aggregation techniques improve general relevance (fewer inappropriate items), I don’t believe these behavior-driven approaches are sufficient to give individuals what’s most relevant to them personally.

How to implement personalization

Personalization matters because it is the only way individuals will be able to cope with the volume of content they face now and in the future.  Too much information is the problem, and genuine personalization needs to be the solution.  Brands need to help individuals connect with the content they most want, and not simply content that’s an adequate fit.  To do that, they need to ask questions, and not just make assumptions.

Lots of content providers talk about offering personalization, but the techniques they rely on have big weaknesses.  In future posts, I will discuss why big data approaches can’t solve personalization, and why small data using individual feedback is essential.

—Michael Andrews