Monthly Archives: January 2014

Putting choice into personalization

In previous posts, I noted that the deluge of content makes personalization a necessity for individuals, but that big data approaches that aggregate segment data can’t deliver personalization successfully.  People are moving away from hunting for content toward expecting it to be delivered to them through a feed.  Brands need to offer content that reflects the actual interests of individuals.

Fortunately, there is growing evidence that some content providers are considering individual needs, not just averaged needs.  Most content providers continue to assume they know what individuals want more than individuals themselves.  But some recent services are giving individuals a chance to express their interests directly, instead of hoping that big data will be smart enough to do that on its own.  The emerging paradigm will involve computers responding to your needs: you train the provider to give you what you really want.

Why precision matters again

For content to have value, it can’t be treated as a commodity.

Accessing content through a service such as DIALOG could cost $200 an hour when I started working professionally with online content in the 1980s.  Due to the cost, precision was important.  One would create an SDI (selective dissemination of information) query profile to deliver content on topics that one was specifically interested in, without having to spend much time online.  SDI was effective for someone familiar with how to construct complex Boolean queries, but was not a viable option for untrained users.

When the World Wide Web made information free, content publishers focused on getting the most traffic, either through search engine rankings, or later, social media.  The needs of specific individuals became secondary to the pursuit of traffic rankings.  If a person didn’t find what she wanted, she could spend more time “surfing” for it.  Over time surfing lost its luster, as most individuals would only look at content results that were most easily accessible.  Content providers correctly noted that individuals didn’t want to spend effort trying to “pull” content, so they offered more channels that “push” content to people, based on big data.  But the move to pushing content has not reduced the effort for individuals, because they still find themselves having to filter through extraneous content.  The cost to individuals of “free” content is that their attention is depleted and time wasted.

Now more genuine personalization is becoming a reality, thanks to the rise of mobile and tablet apps that are centered on personal usage, and cloud-based data management.

The rise of personal curation

In recent years, content services and applications have appeared that enable individuals to curate content themselves, so that their feed of content matches their specific interests.  The Pandora music service was one of the first major examples.  Tim Westergren of Pandora told the late Bill Moggridge: “We learned that because of Pandora’s personalization capabilities, it causes people to interact with it a lot.  You get rewarded for going in and thumbing songs, engaging with the listening.  And as a result people come back steadily, about six times an hour, to do something: whether it’s to create a new station, thumb a song, skip a song, add a new artist, or find out about an artist they’ve heard but don’t know.” (Moggridge, Designing Media, p. 145)  Pandora’s thumbs up or down approach has been used 35 billion times, which provides a lot of feedback.

A notable approach to personalization comes from Trapit, a consumer content discovery iPad app that was briefly available before it shut down earlier this month.  “Trapit’s AI-driven approach goes completely counter to the dominant trend in news curation today, which emphasizes the power of social networking and collaborative filtering” one news story explained. “You can also train Trapit manually by clicking on the thumbs-up or thumbs-down buttons—and the more you do this, the faster the software will learn your preferences.”  Commenting on the end of the consumer service, Trapit’s CEO noted “We challenged this belief — our mantra: ‘You are not the crowd.’ We are all individuals with our own beliefs, tastes, and principles.”

Most recently, National Public Radio (NPR), a leader in content innovation, is preparing a new personalization app.  NPR hopes to present its content “to people in different ways so people can pick and choose based on what they’re doing.”  An innovation will be a DVR-like feature to enable time shifting, so that the stream of content can be paused and picked up when the individual wants to use it.

While these examples differ in their specifics, they are part of a growing wave of personalization efforts that give individuals genuine choice over what content they receive.

Feedback is the basis of choice

Content providers that tout the powers of big data presume to know the best interests of the audience.  To some ordinary individuals, this presumption may feel like a rationalization for collecting all the data involved.  Many platforms have business drivers that involve getting users to make recommendations or expand their range of activity, and as a result, they promote doing these things in the name of users’ self-interest.

Even if big data is not as magical as it is presented, it has a role in personalization provided it is coupled with data on the choices made by individuals.  But among big data’s promoters, the concept of soliciting individual input to shape content personalization is widely resisted.  I have seen a range of objections, most of which are unconvincing.  I’ll paraphrase some objections I’ve seen content providers make:

  • viewers don’t know what they really want, and they say they want more than they use
  • viewers don’t want the burden of having to articulate what they want
  • providing feedback is kludgy and ruins the user experience
  • viewer preferences aren’t reliable indicators of what they actually use
  • machine learning can tell people things they don’t realize they would like
  • viewer feedback is unnecessary, because social recommendations provide the same data

Objections like these treat the viewer as lazy and lacking self-awareness, and the data-rich content provider as wise and concerned.  I don’t want to underplay individuals’ limited ability to state unambiguously what they want.  We are human, after all.  But the bigger risk here is of devaluing individuals by not asking them to express their choices.  And some of the problems cited reflect old or poorly done implementations of content choice, not current best practices.

In general, intelligent data should make it easier for people to express their interests, and be aware of what they want.  Even the basic act of declaring topics of interest is made easier through linked data, such as that used in Google’s Knowledge Graph, and as a result, people don’t need to be as precise or complete in saying what they want.

The range of individual signals of content preferences available now to content providers is unprecedented, thanks to the app economy.  There are three main ways an individual can express what they want to see:

  1. the specific interests they declare
  2. feedback on what they see
  3. how they manage defaults, such as links to other services

Many content apps now let individuals choose what topics or themes they’d like to follow.  It may involve creating your own magazine or radio station, then indicating a mix of topics, artists, or sources of interest.  These can be changed at any point if they aren’t serving the individual’s needs.  But selections become richer through micro feedback on specific content items.  Examples of such micro feedback include:

  • mute or skip
  • reorder prioritization of content streams
  • likes / dislikes (or more like this / less like this)
  • now / later prioritization (viewed now versus read later)
  • most saved articles or videos

These user signals, by themselves, aren’t sufficient to find all pertinent content, and need to be combined with conventional secondary data found through social, segment and collective usage.  Incorporating user signals fine-tunes the individual relevance of the content.  Sometimes the relevance to individuals can be about the qualities of the content, rather than whether the content is on-topic.  People interested in the same topic can differ in their tastes, such as the content’s style (the way it is presented), point of view on a topic, and specific themes addressed.  These subtle dimensions are hard for individuals to articulate, but easy for individuals to notice and react to.  By listening to what individuals say about these dimensions, brands can learn much about their emotional preferences.

How brands can benefit

In the early days of the Web, the concept of “intelligent agents” was a popular approach researchers hoped would help individuals find what they wanted.  In a representative article called “How to Personalize the Web,”  IBM researchers expressed optimism that “agents can personalize otherwise impersonal computational systems.”    Interest in agents faded because most users at the time were anonymous, and no one could figure out how to profit from agents when content was treated as a disposable commodity. Today content is king, and individuals consume their media on their personal devices.

The rise of streaming content, and the desire to control the fire hose it offers, has renewed attention to the need for reader-defined discovery and filtering of content.  Brands can capitalize on this.

Agents are making a stealth reemergence in the form of personal content aggregation apps.  As people aggregate content based on their own interests, they make statements about their preferences that can be used to offer content that matches their preferences.  Brands are also aggregating content through curation.  Such content curation can be an effective approach for connecting with audiences, but it is often based on hunches and crude analytics.  Insights into the actual interests of individuals, what they feel about content as expressed through their micro feedback, would be more effective.

The other promising area for micro feedback is discovery.  Content providers and consumers both recognize that discovering new content one wasn’t consciously seeking is difficult to do well.  Big data can potentially offer some insights, but people want to feel they, not the machine, are driving the discovery.  Showing new things to people who have not shown prior interest in something is risky, and involves a lot of trial and error that looks clumsy to people.  People may push back on being typecast, or feel that such content reflects the provider’s interests rather than their own.  It violates the idea that the individual has control over the content they view.  So brands that present discovery well, by introducing serendipity in a measured way that doesn’t seem forced, will earn credibility with audiences.  Individuals want the same choice and control over discovery.  Providing opportunities for micro feedback on suggestions is doubly important.

The convergence of curation, discovery and personalization presents many opportunities for brands.  An obvious opportunity would be to offer apps that focus on specific topics of interest to customers, and enable individuals to curate content from different sources, including the brand’s.  Such deep knowledge of a person’s interests is highly valuable for brands.  They can learn much about their customers at the same time they make their customers feel valued as individuals.  By putting choice at the center of the experience, the brand makes its customer the hero.

— Michael Andrews

Don’t build your personalization on data exhaust

A lot of content that looks like it’s just for you, isn’t just for you.  You are instead seeing content for a category segment in which you have been placed.  Such targeting is a useful and effective approach for marketers, but it shouldn’t be confused with personalization.   The choice of what people see rests entirely with the content provider.

When providers rely exclusively on their own judgments, and base those judgments on how they read the behaviors of groups of people, they are prone to error.  Despite sophisticated statistical techniques and truly formidable computational powers, content algorithms can appear to individuals as clueless and unconcerned.  To understand why the status quo is not good enough, we first need to understand the limitations of current approaches based on web usage mining.

Targeting predefined outcomes

Increasingly, different people see different views of content.  Backend systems use rules to decide what content to present in order to offer such variation.  The goal is a simple one: to increase the likelihood that the content presented will be clicked on.  It is assumed that if the content is clicked on, everyone is happy.  But depending on the nature of the content, the provider may benefit more from the click than the viewer does, and as a consequence present content with only a minor chance of being clicked.

A business user who is viewing a vendor sales website may see specific content, based on the vendor’s ability to recognize the user’s IP address.  The vendor could decide to present content about how the business user’s competitor is using the vendor’s product.  The targeted user is in a segment: a sales prospect in a certain industry.  Such a content presentation reflects the targeting of a type of customer based on their characteristics.  It may or may not be relevant to the viewer coming to the site (the viewer may be looking for something else, and does not care about what’s being presented).  The content presentation does not reflect any declared preference by the site visitor.  Indeed, officially, the site visitor is anonymous, and it is only through the IP address combined with database information from a product such as Demandbase that the inference of who is visiting is made.  This is a fairly common situation: guessing who is looking for content, and then guessing what they want, or at least, what they might be willing to notice.

Targeted ads are often described as personalized, but a targeted ad is simply a content variation that is presented when the viewer matches certain characteristics.  Even when the ad you see tested better with others in a segment of people who are like you, the ad you see is merely optimized (the option that scored highest), not personalized to reflect your preferences.  In many respects it is silly to talk about advertising as personalized, since it is rare for individuals to state advertising preferences.

The behavioral mechanisms behind content targeting resemble in many respects other content ranking and filtering techniques used for prioritizing search results and making recommendations.  These techniques, whether they involve user-user collaborative filtering, or page-ranking, aim to prioritize the content based on other people’s use of the content. They employ web usage mining to guess what will get most clicked.
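For readers unfamiliar with the mechanics, here is a bare-bones illustration of user-user collaborative filtering.  The data and similarity measure are invented for the example; the point to notice is that the prediction for “you” is computed entirely from other people’s clicks, and your stated preferences never enter the calculation.

```python
# Illustrative user-user collaborative filtering: predict whether "you"
# will click an item purely from the behavior of similar users.

import math

def cosine(u, v):
    """Cosine similarity between two users' click vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Rows are users, columns are content items; 1 = clicked, 0 = not.
clicks = {
    "you":   [1, 1, 0, 0],   # your history; the third item is what we predict
    "alice": [1, 1, 1, 0],
    "bob":   [0, 0, 0, 1],
}

target_item = 2  # predict "you" on the third item

# Weight each neighbor's vote on the item by their similarity to "you".
num = den = 0.0
for user, row in clicks.items():
    if user == "you":
        continue
    sim = cosine(clicks["you"], row)
    num += sim * row[target_item]
    den += sim

prediction = num / den if den else 0.0
print(round(prediction, 2))  # → 1.0, borrowed entirely from alice's behavior
```

Because alice’s clicks overlap with yours and bob’s don’t, alice’s vote dominates, and the system confidently predicts a click you never asked for.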

What analytics measure

It is important to bear in mind that analytics measure actions that matter to brands, and not actions that matter to individuals.  The analytics discipline tends to provide the most generous interpretation of a behavior to match the story the brand wants to hear, rather than the story the audience member experiences.  Take the widely embraced premise that every click is an expression of interest.  Many people may click on a link, but quickly abandon the page they are taken to.  The brand will think: they are really interested in what we have, but the copy was bad so they left, so we need to improve the copy.  The audience may think: that was a misleading link title and the brand wasted my time; it needs to be more honest.  The link was clicked, but we can’t be sure of the intent of the clicking, so we don’t know what the interest was.

Even brands that practice self awareness are susceptible to misreading analytics.  The signals analyzed are by-products of activity, but the individual’s mind is a black box.  More extensive tracking and data won’t reliably deliver to individuals what they seek when individual preferences are ignored.

Why behavioral modeling can be tenuous

There are several important limitations of behavioral data.  The behavioral data can be thin, misleading, flattened, or noisy.

Thin data

One of the major weaknesses of behavioral data is when there isn’t sufficient data on which to base content prioritization or recommendations.  Digital platforms are supposed to enable access to the “long tail” of content, the millions of items that physical media couldn’t cope with.  But discovery of that content is a problem unsolved by behavioral data, since most of it has little or no history of activity by people similar to any one individual.  If only 20 per cent of content accounts for 80 per cent of activity, then 80 per cent of content has little activity on which to base recommendations.  It may nonetheless be of interest to individuals. Significantly, the content that is most likely to matter to an individual may be what is most unique to them, since special interests strongly define the identity of the individual.  But what matters most to an individual can be precisely what matters least to the crowd overall.  Content providers try to compensate for thin data by aggregating categories and segments at even higher levels, but the results are often widely off the mark.

Misleading signals

Even when there is sufficient data, it can be misleading.  The analytics discipline confuses matters by equating traffic volume with “popularity.”  Content that is most consumed is not necessarily most popular, if we take popularity to mean liked rather than used.  A simple scroll through YouTube confirms this.  Some widely viewed videos draw strong negative comments due to their controversy.  Others may get a respectable number of views but little reaction from likes or dislikes.  And sometimes a highly personal video, say a clip of someone’s wedding, will appeal to only a small segment but will get an enthusiastic response from its viewers.

Analytics professionals may automatically assume that content that is not consumed is not liked, but that isn’t necessarily true.  Behavioral data can tell us nothing about whether someone will like content when a backend system has no knowledge of it having been consumed previously.  We don’t know their interests, only their behavior.

Past behavior does not always indicate current intent.  Log into Google and search intensively about a topic, and you may find Google wants to keep offering content results you no longer want, because it prioritizes items similar to ones you have viewed previously.  The person’s interests and goals have evolved faster than the algorithm’s ability to adapt to those changes.

Perversely, sometimes people consume content they are not satisfied with because they’ve been unable to find anything better.  The data signal assumes they are happy with it, but they may in fact be wanting something more specific.  This problem will be more acute as content consumption becomes increasingly driven by automatic feeds.

Flattened data

People get “averaged” when they are lumped into segment categories.  Their profile is flattened in the process — the data is mixed with other people’s data to the point that it doesn’t reflect the individual’s interests.  Not only can their individual interests be lost, but spurious interests can be presumed of them.

Whether segmentation is demographic or behavioral, individuals are grouped into segments that share characteristics.  Sometimes people with shared characteristics will be more likely to share common interests and content preferences.   But there is plenty of room to make mistaken assumptions.  That luxury car owners over-index on interest in golf does not translate into a solid recommendation for an individual.  Some advertisers have explored the relationship between music tastes and other preferences.  For example, country music lovers have a stronger than average tendency to be Republican voters in the United States.  But it can be very dangerous for a brand to present potentially loaded assumptions to individuals when there’s a reasonable chance they’re wrong.

Even people who exhibit the same content behaviors may have different priorities.  Many people check the weather, but not all care about the same level of detail.  As screens proliferate, the intensity of engagement diminishes, as attention gets scattered across different devices.  Observable behavior becomes a weaker signal of actual attention and interest.  Tracking what one does, does not tell us whether to give an individual more or less content, so the system assumes the quantity is right.

Noisy social data

Social media connections are a popular way to score users, and social media platforms argue that people who are connected are similar, like similar things, and influence each other.  Unfortunately, these assumptions are more true for in-person relationships than for online ones.  People have too many connections to other people in social channels for there to be a high degree of correlation of interests, or influence between them.  There is of course some, but it isn’t as strong as the models would hope.  These models mistake tendencies observable at an aggregated level for predictability at the level of an individual.

Social grouping can be a basis for inferring the interests of a specific individual, provided people you know share your interests to a high degree, so you will want to view things they have viewed or recommend viewing.  That is most true for common, undifferentiated interests.  Some social groups, notably among teens, can have a strong tendency toward herd behavior.  But the strength and relevance of social ties cannot be assumed without knowing the context of the relationship.  One’s poker buddies won’t necessarily share one’s interests in religion or music.  Unless both the basis of the group and the topic of content are the same, it can be hard to assume an overlap.  And even when interests are similar, the intensity of interest can vary.

Social targeting of content considers the following:

  • how much you interact with a social connection
  • how widely viewed an item is, especially for people deemed similar to you
  • what actions your social connections take with respect to different kinds of content
  • what actions you take relating to a source of content

While it is obvious that these kinds of information can be pertinent, they are often only weakly suggestive of what an individual wants to view.  It is easy for unrelated inputs to be summed together to prioritize content that has no intrinsic basis for being relevant: your social connection “liked” this photo of a cat, and you viewed several photos last week and talk often to your friend, so you are seeing this cat photo.
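A toy sketch shows how easily this happens.  The signal names and weights below are invented, but the logic mirrors the description above: unrelated social signals are summed, so the cat photo outranks content that is actually on-topic for the individual.

```python
# Illustrative weighted sum of social signals, of the kind described above.
# The signal names and weights are invented for the example.

WEIGHTS = {
    "friend_liked_item": 2.0,
    "interaction_with_friend": 1.5,
    "viewed_similar_recently": 1.0,
}

def social_score(signals):
    """Sum each present signal's weight; none of them says the item itself is relevant."""
    return sum(WEIGHTS[s] for s in signals)

# The cat photo: your friend liked it, you talk to that friend often,
# and you viewed some photos last week.
cat_photo = ["friend_liked_item", "interaction_with_friend", "viewed_similar_recently"]

# An article squarely on a topic you care about, but with no social signals attached.
on_topic_article = []

print(social_score(cat_photo))         # → 4.5
print(social_score(on_topic_article))  # → 0
```

Each input is individually plausible, but nothing in the sum measures the individual’s intrinsic interest in the item being scored.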

At the level of personalization, it’s flawed to assume that one’s friends’ interests are the same as one’s own.  There can be a correlation, but in many cases it will be a very weak one.  Social behavioral researchers are now exploring a concept of social affinity instead of social distance to strengthen the correlation.  But the weakness of predicting what you want according to who your acquaintances are will remain.

Mind-reading is difficult

The most recent hope for reading into the minds of individuals involves contextualization.  The assumption behind contextualization is that if everything is known about an individual, then their preferences for content can be predicted.  Not surprisingly, this paradigm is presented in a way that highlights the convenience of having information you need readily available.  It is, of course, perfectly possible to take contextual information and use this against the interests of an individual.  Office workers are known to ask for urgent decisions from their bosses knowing their boss is on her way to a meeting and can’t wait to provide a more considered analysis.  Any opportunistic use of contextual information about an individual by someone else is clearly an example of the individual losing control.

Contextual information can be wrong or unhelpful.  The first widespread example of contextual content was the now infamous Microsoft Clippy, which asked “it looks like you are about to write a letter…”   Clippy was harmless, but hated, because people felt a lack of control over his appearance.

Even with the best of intentions, brands have ample room to misjudge the intentions of an individual.

Can content preferences be predicted?

The problem with relying on behavior to predict individual content preferences comes down to time frame.  Because targeting treats individuals as members of a category of people, it ignores the specific circumstances that time introduces.  People may be interested in content on a topic, but not necessarily at the time the provider presents it.  The provider responds by trying again, or trying some other topic, but in either case may have missed an opportunity to understand the individual’s real interest in the content presented.  People may pass on viewing content they have a general interest in.  They think “not now” (it’s not the best time) or “not yet” (I have more urgent priorities).  Oftentimes readiness comes down to the mood of the individual, which even contextualization can’t factor in.  Over time a person may desire content about something, but they don’t care to click when the provider is offering it to them.

If the viewer doesn’t have a choice over what they see, it’s not personalized.

A better way

There are better approaches to personalization.  The big data approach of aggregating lots of behavioral data has been widely celebrated as mining gold from “data exhaust.”  Data exhaust can have some value, but is a poor basis for a brand’s relationship with its customers.  People need to feel some control, and not as if they are being tracked for their exhaust.  Brands need an alternative approach to personalization not only to build better relationships, but to increase their understanding of their audiences so they can serve them more profitably.  In the following post, I will discuss how to put the person back into personalization.

— Michael Andrews

The urgency of genuine personalization

Personalization may be the most misused phrase relating to content.  It’s not hard to understand why people want to talk about personalization: it’s appealing to think you’ll see exactly what you want, especially as we get deluged with content.  But paradoxically most techniques of personalization actually involve tailoring content based on what other people do, rather than your own interests.  As a result, people miss out on content they might most want to view.

To get a sense of why personalization is so urgent, and so troublesome, consider the situation of Facebook.

Mark Zuckerberg said last year that Facebook wants to be “the best personalized newspaper in the world.”  Notwithstanding Facebook’s popularity, its users complain about the lack of relevancy for much of the content they see.  Facebook is optimized for promotion of content sharing, not for filtering of content based on individual preferences.  Those two goals to a large extent are in conflict with each other.  Facebook has chosen to fund its revenues through advertising and related services, which  accounts for about 90% of total revenues.  Brands want their content viewed and shared and their ads seen, so keeping them happy is a huge priority.   These various pressures come to a head with Facebook’s News Feed.  On average, a person may have 1500 potential messages Facebook considers relevant to them, and Facebook needs to prioritize these into a manageable quantity (they’ve decided that’s about 300).  “The News Feed algorithm responds to signals from you” Facebook explains.  But many signals seem to have little to do with the individual, and more to do with other parties: the interests of friends, strangers, and advertisers.   Factors influencing ranking include “number of comments, who posted the story, and what type of post it is (ex: photo, video, status update, etc.)” and “promoted posts.”  Facebook routinely revises its algorithm based on what tests show people in general like best.  But the options for the individual to choose what he or she wants specifically are few.  Facebook decides what types of content, and items of content, are most relevant, and individuals don’t get much choice in the matter.

What do we mean by personalization?

In the digital content arena, personalization lacks any widely agreed definition. As Aria Haghighi of Prismatic notes: “personalization is really young. I still think we don’t all agree necessarily on what personalization means.”   Part of the lack of agreement involves how to implement personalization on a technical level, but also it reflects a lack of common vision from providers about what they want personalization to offer.

Digital marketers generally refer to personalization as behavioral targeting, and often use the terms interchangeably.  Big data researchers typically see personalization as adapting content results on the basis of machine learning.  Curiously, the protagonist of the story, the person seeking content, is missing from the personalization discussion.  Instead, the discussion is centered on how to improve click through rates.

Serving people better should be the core reason for personalization, and it’s important to build a commonsense definition, rather than a mathematical one.  Merriam-Webster defines personalize as “to mark (something) in a way that shows it belongs to a particular person.”  The key elements are that it is individual to a person, and that the person owns that something.  Ownership implies having control.

My definition of personalization is when a person gets unique content that reflects their individual preferences.  Targeting, in contrast, is when a person gets non-unique content based on characteristics they share with others.  In both cases, the content provider prioritizes what content is delivered, but in the first case it is based on first-hand knowledge of what an individual is interested in, whereas in the second case, it is based on second-hand assumptions about what seems relevant to the individual.

Personalization is based on explicit individual preferences, not assumptions

To understand personalization, we need to separate two dimensions:

  • whether the “signal” is about the individual himself, or about the interests of a crowd who are assumed to be similar to the individual
  • whether the “signal” is an explicit expression of interest, or an implicit assumption based on prior behaviors

The following table shows how different signals can be aligned with either individual or crowd, and can be either explicit or implicit:

Table showing how different kinds of explicit and implicit preferences and behaviors can influence content delivery

I’ve simplified the classic approaches of Facebook, Amazon, and Google to highlight elements that are most salient in their respective approaches.  In practice, each uses a mixture of signals to rank and filter content, crunching hundreds of different largely behavioral signals.  These high volume content providers, and others who are far smaller,  offer individuals the impression that the results are personalized (described as being “for you”) when they are primarily based on the aggregation of data across users, rather than individual feedback.  While these aggregation techniques improve general relevance (fewer inappropriate items), I don’t believe these behavior-driven approaches are sufficient to give individuals what’s most relevant to them personally.
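One way to see the distinction is to treat the two dimensions as a simple 2x2 classification.  The example signals below are my own groupings, not taken from the table, and the test encodes the definition given earlier: only explicit, individual signals qualify as personalization.

```python
# A sketch of the two dimensions as a 2x2 classification of example signals.
# The groupings of example signals are illustrative assumptions.

SIGNALS = {
    # (source, kind): example signals
    ("individual", "explicit"): ["declared topics of interest", "thumbs up/down"],
    ("individual", "implicit"): ["your own click history"],
    ("crowd", "explicit"):      ["friends' likes and shares"],
    ("crowd", "implicit"):      ["aggregate click-through rates", "collaborative filtering"],
}

def is_personalization(source, kind):
    """Under the definition above, only explicit individual signals qualify."""
    return source == "individual" and kind == "explicit"

for (source, kind), examples in SIGNALS.items():
    label = "personalization" if is_personalization(source, kind) else "targeting/inference"
    print(f"{source:10s} {kind:8s} -> {label}: {', '.join(examples)}")
```

Laid out this way, it becomes clear how small a share of the hundreds of behavioral signals these providers crunch actually falls in the personalization quadrant.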

How to implement personalization

Personalization matters because it is the only way individuals will be able to cope with the volume of content they face now and in the future.  Too much information is the problem, and genuine personalization needs to be the solution.  Brands need to help individuals connect with the content they most want, and not simply content that’s an adequate fit.  To do that, they need to ask questions, and not just make assumptions.

Lots of content providers talk about offering personalization, but the techniques they rely on have big weaknesses.  In future posts, I will discuss why big data approaches can’t solve personalization, and why small data using individual feedback is essential.

—Michael Andrews