Don’t build your personalization on data exhaust

A lot of content that looks like it’s just for you isn’t just for you. You are instead seeing content for a category segment in which you have been placed. Such targeting is a useful and effective approach for marketers, but it shouldn’t be confused with personalization. The choice of what people see rests entirely with the content provider.

When providers rely exclusively on their own judgments, and base those judgments on how they read the behaviors of groups of people, they are prone to error. Despite sophisticated statistical techniques and truly formidable computational power, content algorithms can appear to individuals as clueless and unconcerned. To understand why the status quo is not good enough, we first need to understand the limitations of current approaches based on web usage mining.

Targeting predefined outcomes

Increasingly, different people see different views of content. Backend systems use rules to decide what to present to each viewer. The goal is a simple one: to increase the likelihood that the content presented will be clicked on. It is assumed that if the content is clicked on, everyone is happy. But depending on the nature of the content, the provider may get more benefit from a click than the viewer does, and as a consequence may present content that has only a minor chance of being clicked.

A business user who is viewing a vendor sales website may see specific content, based on the vendor’s ability to recognize the user’s IP address. The vendor could decide to present content about how the business user’s competitor is using the vendor’s product. The targeted user is in a segment: a sales prospect in a certain industry. Such a content presentation reflects the targeting of a type of customer based on their characteristics. It may or may not be relevant to the viewer coming to the site (the viewer may be looking for something else, and does not care about what’s being presented). The content presentation does not reflect any declared preference by the site visitor. Indeed, officially, the site visitor is anonymous, and it is only through the IP address combined with database information from a product such as Demandbase that the inference of who is visiting is made. This is a fairly common situation: guessing who is looking for content, and then guessing what they want, or at least, what they might be willing to notice.

Targeted ads are often described as personalized, but a targeted ad is simply a content variation that is presented when the viewer matches certain characteristics. Even when the ad you see tested better with others in a segment of people who are like you, that ad is merely optimized (the option that scored highest), not personalized (a reflection of your preferences). In many respects it is silly to talk about advertising as personalized, since it is rare for individuals to state advertising preferences.

The behavioral mechanisms behind content targeting resemble in many respects other content ranking and filtering techniques used for prioritizing search results and making recommendations. These techniques, whether they involve user-user collaborative filtering, or page-ranking, aim to prioritize the content based on other people’s use of the content. They employ web usage mining to guess what will get most clicked.

What analytics measure

It is important to bear in mind that analytics measure actions that matter to brands, and not actions that matter to individuals. The analytics discipline tends to provide the most generous interpretation of a behavior to match the story the brand wants to hear, rather than the story the audience member experiences. Take the widely embraced premise that every click is an expression of interest. Many people may click on a link, but quickly abandon the page they are taken to. The brand will think: they are really interested in what we have, but the copy was bad so they left, so we need to improve the copy. The audience may think: that was a misleading link title and the brand wasted my time; it needs to be more honest. The link was clicked, but we can’t be sure of the intent of the clicking, so we don’t know what the interest was.

Even brands that practice self-awareness are susceptible to misreading analytics. The signals analyzed are by-products of activity, but the individual’s mind is a black box. More extensive tracking and data won’t reliably deliver to individuals what they seek when individual preferences are ignored.

Why behavioral modeling can be tenuous

There are several important limitations of behavioral data. The behavioral data can be thin, misleading, flattened, or noisy.

Thin data

One of the major weaknesses of behavioral data is that there often isn’t enough of it on which to base content prioritization or recommendations. Digital platforms are supposed to enable access to the “long tail” of content, the millions of items that physical media couldn’t cope with. But discovery of that content is a problem unsolved by behavioral data, since most of it has little or no history of activity by people similar to any one individual. If only 20 per cent of content accounts for 80 per cent of activity, then the remaining 80 per cent of content has little activity on which to base recommendations. It may nonetheless be of interest to individuals. Significantly, the content that is most likely to matter to an individual may be what is most distinctive to them, since special interests strongly define the identity of the individual. But what matters most to an individual can be precisely what matters least to the crowd overall. Content providers try to compensate for thin data by aggregating categories and segments at even higher levels, but the results are often wide of the mark.
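
The 80/20 arithmetic can be made concrete with a short sketch. The catalog size and view count below are hypothetical figures chosen only for illustration, and the 80/20 split is the rule of thumb from the text rather than a measured distribution.

```python
def activity_per_item(total_items, total_events, head_item_pct=20, head_event_pct=80):
    """Average events per item in the popular 'head' and in the long 'tail'.

    All inputs are illustrative assumptions, not measurements.
    """
    head_items = total_items * head_item_pct // 100
    tail_items = total_items - head_items
    head_events = total_events * head_event_pct // 100
    tail_events = total_events - head_events
    return head_events / head_items, tail_events / tail_items


# Hypothetical catalog: 1,000,000 items and 10,000,000 recorded views.
head_avg, tail_avg = activity_per_item(1_000_000, 10_000_000)
print(f"head: {head_avg:.1f} views per item")
print(f"tail: {tail_avg:.1f} views per item")
```

Under these assumptions a head item averages 40 views while a tail item averages 2.5, a sixteenfold gap. Any recommendation that needs a behavioral history has far less to work with for the great majority of the catalog.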

Misleading signals

Even when there is sufficient data, it can be misleading. The analytics discipline confuses matters by equating traffic volume with “popularity.” Content that is most consumed is not necessarily most popular, if we take popularity to mean liked rather than used. A simple scroll through YouTube confirms this. Some widely viewed videos draw strong negative comments due to their controversy. Others may get a respectable number of views but little reaction in the form of likes or dislikes. And sometimes a highly personal video, say a clip of someone’s wedding, will appeal to only a small segment but will get an enthusiastic response from its viewers.

Analytics professionals may automatically assume that content that is not consumed is not liked, but that isn’t necessarily true. Behavioral data can tell us nothing about whether someone will like content when a backend system has no knowledge of it having been consumed previously. We don’t know their interests, only their behavior.

Past behavior does not always indicate current intent. Log into Google and search intensively about a topic, and you may find Google wants to keep offering content results you no longer want, because it prioritizes items similar to ones you have viewed previously. The person’s interests and goals have evolved faster than the algorithm’s ability to adapt to those changes.

Perversely, sometimes people consume content they are not satisfied with because they’ve been unable to find anything better. The data signal assumes they are happy with it, but they may in fact be wanting something more specific. This problem will be more acute as content consumption becomes increasingly driven by automatic feeds.

Flattened data

People get “averaged” when they are lumped into segment categories. Their profile is flattened in the process — the data is mixed with other people’s data to the point that it doesn’t reflect the individual’s interests. Not only can their individual interests be lost, but spurious interests can be presumed of them.

Whether segmentation is demographic or behavioral, individuals are grouped into segments that share characteristics. Sometimes people with shared characteristics will be more likely to share common interests and content preferences. But there is plenty of room to make mistaken assumptions. That luxury car owners over-index on interest in golf does not translate into a solid recommendation for an individual. Some advertisers have explored the relationship between music tastes and other preferences. For example, country music lovers have a stronger than average tendency to be Republican voters in the United States. But it can be very dangerous for a brand to present potentially loaded assumptions to individuals when there’s a reasonable chance those assumptions are wrong.
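
A small sketch shows how a segment can over-index on an interest while the segment-level inference still fails for many of its members. The scores and the population baseline are invented for illustration; nothing here is real survey data.

```python
from statistics import mean

# Hypothetical golf-interest scores (0 = none, 1 = avid) for six luxury car owners.
segment_golf_interest = [0.9, 0.8, 0.0, 0.1, 0.9, 0.0]
population_baseline = 0.30  # assumed average interest across all people

segment_average = mean(segment_golf_interest)
over_index = segment_average / population_baseline

# Members for whom a golf recommendation would miss (interest below baseline).
misses = [score for score in segment_golf_interest if score < population_baseline]

print(f"segment over-indexes by {over_index:.2f}x")
print(f"{len(misses)} of {len(segment_golf_interest)} members have below-baseline interest")
```

In this made-up segment the average sits well above the baseline, yet half the members have less interest in golf than a typical person. The average describes the group, not any one person in it.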

Even people who exhibit the same content behaviors may have different priorities. Many people check the weather, but not all care about the same level of detail. As screens proliferate, the intensity of engagement diminishes, as attention gets scattered across different devices. Observable behavior becomes a weaker signal of actual attention and interest. Tracking what one does cannot tell us whether to give an individual more or less content, so the system assumes the quantity is right.

Noisy social data

Social media connections are a popular way to score users, and social media platforms argue that people who are connected are similar, like similar things, and influence each other. Unfortunately, these assumptions are more true for in-person relationships than for online ones. People have too many connections to other people in social channels for there to be a high degree of correlation of interests, or influence between them. There is of course some, but it isn’t as strong as the models would hope. These models mistake tendencies observable at an aggregated level for predictability at the level of an individual.

Social grouping can be a basis for inferring the interests of a specific individual, provided people you know share your interests to a high degree, so you will want to view things they have viewed or recommend viewing. That is most true for common, undifferentiated interests. Some social groups, notably among teens, can have a strong tendency toward herd behavior. But the strength and relevance of social ties cannot be assumed without knowing the context of the relationship. One’s poker buddies won’t necessarily share one’s interests in religion or music. Unless both the basis of the group and the topic of content are the same, it can be hard to assume an overlap. And even when interests are similar, the intensity of interest can vary.

Social targeting of content considers the following:

how much you interact with a social connection
how widely viewed an item is, especially for people deemed similar to you
what actions your social connections take with respect to different kinds of content
what actions you take relating to a source of content

While it is obvious that these kinds of information can be pertinent, they are often only weakly suggestive of what an individual wants to view. It is easy for unrelated inputs to be summed together to prioritize content that has no intrinsic basis for being relevant: your social connection “liked” this photo of a cat, and you viewed several photos last week and talk often to your friend, so you are seeing this cat photo.
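
The way unrelated inputs get summed into a ranking can be sketched as follows. The signal names, the 0 to 1 scaling, and the equal weights are all hypothetical; this is not any platform’s actual scoring formula, only an illustration of the pattern.

```python
from dataclasses import dataclass


@dataclass
class SocialSignals:
    """Hypothetical inputs, each scaled 0..1, mirroring the four factors listed above."""
    interaction_with_connection: float  # how much you interact with the connection
    views_among_similar_people: float   # how widely viewed the item is
    connection_action_on_item: float    # e.g. the connection "liked" it
    your_actions_on_source: float       # your past activity with the source


# Assumed equal weights; a real system would presumably tune these.
WEIGHTS = (0.25, 0.25, 0.25, 0.25)


def social_score(s: SocialSignals) -> float:
    """Weighted sum of the signals. No term asks what the viewer actually wants."""
    values = (
        s.interaction_with_connection,
        s.views_among_similar_people,
        s.connection_action_on_item,
        s.your_actions_on_source,
    )
    return sum(w * v for w, v in zip(WEIGHTS, values))


# A friend's cat photo versus an article on a niche topic nobody you know has seen.
cat_photo = SocialSignals(0.9, 0.6, 1.0, 0.5)
niche_article = SocialSignals(0.0, 0.1, 0.0, 0.2)

ranked = sorted(
    {"cat photo": cat_photo, "niche article": niche_article}.items(),
    key=lambda pair: social_score(pair[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

The cat photo ranks first purely because the social inputs add up. The niche article could be exactly what the viewer is looking for, but the score has no way to know that.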

At the level of personalization, it’s flawed to assume that one’s friends’ interests are the same as one’s own. There can be a correlation, but in many cases it will be a very weak one. Social behavioral researchers are now exploring a concept of social affinity instead of social distance to strengthen the correlation. But the weakness of predicting what you want according to who your acquaintances are will remain.

Mind-reading is difficult

The most recent hope for reading into the minds of individuals involves contextualization. The assumption behind contextualization is that if everything is known about an individual, then their preferences for content can be predicted. Not surprisingly, this paradigm is presented in a way that highlights the convenience of having information you need readily available. It is, of course, perfectly possible to take contextual information and use this against the interests of an individual. Office workers are known to ask for urgent decisions from their bosses knowing their boss is on her way to a meeting and can’t wait to provide a more considered analysis. Any opportunistic use of contextual information about an individual by someone else is clearly an example of the individual losing control.

Contextual information can be wrong or unhelpful. The first widespread example of contextual content was the now infamous Microsoft Clippy, which asked “it looks like you are about to write a letter…” Clippy was harmless, but hated, because people felt a lack of control over his appearance.

Even with the best of intentions, brands have ample room to misjudge the intentions of an individual.

Can content preferences be predicted?

The problem with relying on behavior to predict individual content preferences comes down to time frame. Because targeting treats individuals as members of a category of people, it ignores the specific circumstances that time introduces. People may be interested in content on a topic, but not necessarily at the time the provider presents it. The provider responds by trying again, or trying some other topic, but in either case may have missed an opportunity to understand the individual’s real interest in the content presented. People may pass on viewing content they have a general interest in. They think “not now” (it’s not the best time) or “not yet” (I have more urgent priorities). Often readiness comes down to the mood of the individual, which even contextualization can’t factor in. Over time a person may desire content about something, but they don’t care to click when the provider is offering it to them.

If the viewer doesn’t have a choice over what they see, it’s not personalized.

A better way

There are better approaches to personalization. The big data approach of aggregating lots of behavioral data has been widely celebrated as mining gold from “data exhaust.” Data exhaust can have some value, but is a poor basis for a brand’s relationship with its customers. People need to feel some control, and not as if they are being tracked for their exhaust. Brands need an alternative approach to personalization not only to build better relationships, but to increase their understanding of their audiences so they can serve them more profitably. In the following post, I will discuss how to put the person back into personalization.

— Michael Andrews