Categories
Content Engineering

Lumping and Splitting in Taxonomy

Creating a taxonomy (noting which distinctions matter) often seems more art than science.  I’ve been interested in how to think about taxonomy more globally, instead of looking at it as a case-by-case judgment call.  Part of my interest here is a spin-off from my interest in birding.  I’m no ornithologist, but I try to learn what I can about the nature of birds.  And species of birds, of course, are classified according to a taxonomy.

The taxonomy for birds is among the most rigorous out there.  It is debated and litigated, sometimes over decades.  The process involves a progression of “lumps” and “splits” that recalibrate which distinctions are considered significant.  Recently the taxonomy underwent a major revision that reordered the kingdom of birds. 

In the mid-2010s, scientists changed the classification of birds to consider not only anatomical features, but DNA.  In the new ordering, eagles and falcons are not as closely related as was previously assumed. Eagles are closer to vultures, while falcons are closer to parrots.  And pigeons and flamingos are more closely related than previously thought.  Appearance alone is not a sufficient basis for judging similarity.

[Image caption: Pigeons and flamingos are more closely related than you might think (both produce milk to feed their young)]

Taxonomy and Information Technology

Taxonomy doesn’t receive the attention it deserves in the IT world.  It seems subjective: vague, hard to predict, potentially the source of arguments.  Taxonomy resembles content: it may be necessary, but it is something to work around — “place taxonomy here when ready.”

But taxonomy can’t be avoided. Even though semantic technologies are becoming richer in describing the characteristics of entities, the properties of entities alone may not be enough to distinguish between types of entities.  Many entities share common properties, and even common values, so it becomes important to be able to indicate what type of entity something is.  We can describe something in terms of its physical properties such as weight, height, color and so on, and still have no idea what it is we are describing.  It can resemble the parlor game of twenty questions: a prolonged discourse that’s prone to howlers.

Classification is the bedrock of algorithms: it drives automated decisions.  Yet taxonomies are human designed.  Taxonomies lack the superficial impartiality of machine-oriented linked data or machine learning classification.  But taxonomies are useful because of their perceived limitations. They require human attention and human judgment.  That helps make data more explainable.

Humans decide taxonomies, even when machines provide assistance finding patterns of similarity. Users of taxonomies need to understand the basis of similarity.  No matter how experienced the taxonomist or sophisticated the text analysis, the basis of a taxonomy should ideally be explainable and repeatable.  Machine-driven clustering approaches lack these qualities.

To be durable, a taxonomy needs a reasoned basis and justification.  Business taxonomies can borrow ideas from scientific taxonomies.   

Four approaches can help us decide how to classify categories:

  1. Homology
  2. Analogy
  3. Differentia
  4. Interoperability

Homology and analogy deal with “lumping” — finding commonality among different items.  Differentia and interoperability help define “splitting” — where to break out similar things.

Homology: Discovering shared origins

Homology is a term taxonomists use to describe when features, while appearing different, have a common origin and original intent.  For example, mammals have limbs, but the limb could be manifested as an arm or as a flipper.

Homology refers to cases where things start the same but go in different directions.  It can get at the core essence of a feature: what it enables, without worrying so much how it appears or precisely what it does.  Homology is helpful to find larger categories that link together different things.

There are two ways we can use homology when creating a taxonomy. 

First, we can look at the components or features of items.  We look for what they share in common that might suggest a broader capability to pay attention to.  Lots of devices have embedded microprocessors, even though these devices play different roles in our lives.  Microprocessors provide a common set of capabilities that even allow different kinds of items to interact with one another, such as in the case of the Internet of Things (IoT).  Homology is not limited to physical items.  Many business models get copied and modified by different industries, but they share common origins and drivers. We can speak of a class of businesses using an online subscription model, for example.

Second, we can consider whole items and how they are used.  Homology can be useful when a distinct thing has more than one use, especially when it doesn’t have a single primary purpose.  Baking soda is advertised as having many purposes and some consumers like products that contain baking soda.  Here we have a category of baking soda-derived products.  In the kitchen, there are many small appliances that have a rotator on which one can attach implements.  They may be called a food processor, a blender, a mixer, or some trademarked proprietary name.  What can they do?  Many tasks: chopping vegetables, making dough, making soups, smoothies, spreads…the list is endless.  But most seem to be about pulverizing and mixing ingredients.  It’s a broad class of gadgets that share many capabilities, though they scatter in what they offer as they seek to differentiate themselves.

But there’s another approach to lumping things: analogy.    

Analogy: Discovering shared functions

We use analogies all the time in our daily conversation.  Taxonomists focus on what analogies reveal.  

Analogy helps identify things that are functionally similar, and might share a category as a result.

Analogy is the opposite of homology. With analogy, two things start from a different place, but produce a similar result.  For example, the wings of bees and wings of birds are analogous.  They are similar in their function, but different in their origin and details.  Analogies capture common affordances: where different things can be used in similar ways.

Analogies are most useful when defining mental categories, such as devices to watch video, or places to go on a first date.  It’s the most subjective kind of taxonomy: different people need to hold similar views in order for these categories to be credible.

Contrasting homology and analogy, we can see two opposite movements.  Analogy represents convergence (from differences to similarity), while homology represents divergence (from similarity to differences).
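
To make the contrast concrete, here is a minimal Python sketch that lumps the same items two ways: by shared origin (homology) and by shared function (analogy).  The `Feature` class, the `lump` helper, and the origin and function labels are illustrative assumptions of mine, not terms from any formal taxonomy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    origin: str    # where the feature comes from (illustrative label)
    function: str  # what the feature is used for (illustrative label)


FEATURES = [
    Feature("human arm", origin="vertebrate limb", function="grasping"),
    Feature("whale flipper", origin="vertebrate limb", function="swimming"),
    Feature("bird wing", origin="vertebrate limb", function="flying"),
    Feature("bee wing", origin="insect wing", function="flying"),
]


def lump(features, by):
    """Group feature names by the given attribute ('origin' or 'function')."""
    groups = defaultdict(list)
    for feature in features:
        groups[getattr(feature, by)].append(feature.name)
    return dict(groups)


# Homology: same origin, different outcomes (divergence).
homologous = lump(FEATURES, by="origin")

# Analogy: different origins, same outcome (convergence).
analogous = lump(FEATURES, by="function")
```

The bird wing lands in both a homology group (with the arm and the flipper) and an analogy group (with the bee wing), which is why the two perspectives are worth keeping separate.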

The other end of taxonomy is not about lumping things into broader categories, but splitting them into smaller ones.

Differentia: Defining Segments

Taxonomists talk about differentia (Latin for difference), which is broadly similar to what marketers refer to as segmentation.

Aristotle defined humans as animals capable of articulated speech. His formulation provided a structural pattern still used in taxonomy today:

  • A species equals a genus plus differentia

That is, the differences within a genus define individual species.  

To put it in more general terms: 

  • A segment is a group plus its distinguishing characteristics (its epithet)

A group gets divided into segments based on distinguishing characteristics.  The differentia separates members from other members.  

One of the most popular marketing segmentations relates to generational differences. In the United States, people born after the Second World War are segmented into four groups by birth year.  Other countries use similar segments, but it is not a universal segmentation so I will focus specifically on US nationals.  A common segmentation (with the exact years sometimes varying slightly) is:

  • Generation W (aka “Boomers”): American nationals born between 1946 and 1964
  • Generation X: American nationals born between 1965 and 1980
  • Generation Y (aka “Millennials”): American nationals born between 1981 and 1996
  • Generation Z: American nationals born since 1997

Such segmentation has the virtue of creating category segments that are comprehensive (no item is without a category) and mutually exclusive (no item belongs to more than one category).  It’s clean, though it is not necessarily correct, in the sense that the categories identify what matters most.
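
The "group plus differentia" pattern and the two properties just mentioned can be checked mechanically.  The following Python sketch is one possible way to do that; the `Segment` class and the helper names are hypothetical, and the year ranges are simply copied from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    name: str
    first_year: int
    last_year: Optional[int]  # None means the segment is open-ended

    def contains(self, birth_year: int) -> bool:
        """The differentia: does this birth year fall in the segment?"""
        if birth_year < self.first_year:
            return False
        return self.last_year is None or birth_year <= self.last_year


# The group is US nationals born after the Second World War.
GENERATIONS = [
    Segment("Generation W (Boomers)", 1946, 1964),
    Segment("Generation X", 1965, 1980),
    Segment("Generation Y (Millennials)", 1981, 1996),
    Segment("Generation Z", 1997, None),
]


def segments_for(birth_year: int) -> list:
    """Return the name of every segment that matches the birth year."""
    return [s.name for s in GENERATIONS if s.contains(birth_year)]


def is_clean(segments, start: int, end: int) -> bool:
    """True if each year from start to end falls into exactly one segment."""
    return all(
        sum(s.contains(year) for s in segments) == 1
        for year in range(start, end + 1)
    )
```

A check like `is_clean` only confirms that the segmentation is tidy.  It says nothing about whether birth year is the distinction that matters.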

Segments won’t be valuable if the distinctions on which they are based aren’t that important.  A segment could comprise things with a common characteristic that are otherwise quite diverse.  It’s possible for a segment to be designed around an incidental characteristic that makes different things seem similar.

The point of differentia is to represent a defining characteristic. Differentia is valuable when it helps us think through which distinctions matter and are valid.  For example, we might segment people by eye color.  But that hardly seems an important way to segment people. Such segmentation encourages us to refine the group we are segmenting.  Eye color is of interest to makers of tinted contact lenses.  But even then, eye color is not a defining characteristic of a potential contact lens customer, even if it were a relevant one.

While differentia can be hard to define durably, it can play a useful role in taxonomies.  It seems reasonable to segment aircraft according to the number of passengers they carry, for example.  It can capture one key aspect that represents many important issues.

Interoperability: Distinctions within commonality

A related issue is deciding when things are similar enough to say they are the same, and when we can say they are related but different.

Our final perspective comes from nature. The similarity of species is partly defined by their ability to mate.  Some closely related species of birds, for example, will cross breed.  Other pairs of less similar species lack that ability.  

A similar situation exists with languages.  Where are the distinctions and boundaries between similar languages? And when are differences just dialects and not actually different languages?  In language, mutual intelligibility plays a role.  (Language also involves convergence and divergence, but we’ll consider interoperability here.)

The presence or absence of connection between distinct things is associated with two overlapping but distinct concepts: 

  1. Interoperability 
  2. Substitution

Both these concepts address ways in which distinct things might be considered the “same.”

Interoperability is most often associated with technology, though it can also be applied to other areas, for example, cultural norms such as religions.  The presence of interoperability (the ability of distinct things to connect together easily because they follow a common standard or code of operation) is an indication of their similarity.  If things interoperate, meaning they require no change in setup to work together, then they belong to the same “family,” even if the things come from different sources. The absence of interoperability is a sign that these things may not belong together and need to be split.

Being part of the same family does not imply they are the same.   Any distinctions would relate to the role of each thing in the family (same family, different roles).   Things that follow the same standard may be similar (same role), or they may be complements (different roles).  

If things can be substituted (they are interchangeable but require a different setup to use), they may belong to the same category, but that category may need to be broken down further.  Windows, Linux and macOS computers can be substituted for one another because they serve the same role, so they belong to the broader personal computer category (same role, different families).  But they are separate categories because they don’t interoperate.
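
The family and role distinctions above can be expressed as a small decision rule.  This Python sketch is only one reading of them: the `Thing` class, the `relate` function, and the USB devices are hypothetical examples I have added, while the Windows and Linux case comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    name: str
    standard: str  # the common standard or code of operation it follows
    role: str      # the job it performs


def relate(a: Thing, b: Thing) -> str:
    """Describe how two distinct things relate, by family and by role."""
    same_family = a.standard == b.standard  # they interoperate
    same_role = a.role == b.role            # they can stand in for each other
    if same_family and same_role:
        return "similar (same family, same role)"
    if same_family:
        return "complements (same family, different roles)"
    if same_role:
        return "substitutes (same role, different families)"
    return "unrelated"


windows_pc = Thing("Windows computer", standard="Windows", role="personal computer")
linux_pc = Thing("Linux computer", standard="Linux", role="personal computer")

# Hypothetical devices, added only to illustrate the "complements" case.
usb_keyboard = Thing("USB keyboard", standard="USB", role="input device")
usb_drive = Thing("USB drive", standard="USB", role="storage")
```

Here `relate(windows_pc, linux_pc)` reports substitutes, which signals a category that should be split further, while `relate(usb_keyboard, usb_drive)` reports complements within one family.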

The value of taxonomies

Defining taxonomies is not easy.  Interpretation is needed to spot the differences that make a difference. We can improve the discovery process by using heuristic perspectives for lumping and splitting. 

Taxonomy is valuable because it can provide a succinct way to express the significance of an entity in relation to other entities.  Sometimes we need a quick summary to boil down the essence of a thing: what’s distinctive about it, so we can see how it relates to a given situation.  Taxonomies help us overcome the fragmentation of information.

— Michael Andrews


Designing Multi-Purpose Content

Publishers can do more with content when content is able to serve more than one purpose.  This post will provide a short introduction to how to structure content so that it’s multi-purpose. 

First let’s define what multi-purpose means. Multi-purpose refers to when core information supports more than one content type. A content type is the structure of content relating to a specific purpose.  Each content type should have a distinct structure reflecting its unique purpose. But often certain essential information may be relevant to different content types. A simple example would be a company address.  The address is a content element used in many different content types such as an “About Us” profile or an event announcement about a meetup hosted by the company.  The same content element can be used in different content types. The address is a multi-purpose content element.

Scenarios where purposes overlap

Publishers have many opportunities to use the same content for different purposes. Another simple scenario can show us how this would work.

Imagine a company is about to release a new product to the market. The product is currently in beta.  The company wants to build awareness of the forthcoming product. There are three audience segments who are interested in the product:

  1. Existing customers of the company
  2. People who follow the sector the company is in, such as journalists, industry analysts, or Wall Street analysts
  3. People who are not current customers of the company but who may be interested in knowing about the company’s future plans

All these groups might be interested in information about the new product.  But each of these three groups has a slightly different reason for being interested in the information.  Even though they will all want to see mostly the same content, they each want to see something different as well.  By breaking content into components, we can separate which audience purposes are identical, and which are similar but different.  

Modeling commonalities 

One use of a content model is to indicate what information is delivered to which audience segment. For some aspects of a topic, audiences will see the same information, while for other aspects different audience segments see information that is specific to them.   

A close relationship exists between the segment for whom the content is designed, and the content type which represents the purpose of the content.   A prospective buyer of a product is probably not interested in a troubleshooting page, but an owner of the product might be.  

Even when different audience segments gravitate toward different content types, they may still share common interests and be seeking some of the same information.  

Different audience segments may have different reasons for being interested in the same basic information.  They may need to see slightly different versions because of differences in their motivations, which could influence the messages framing the significance of the information to the audience segment, and differences in the actions they may want to take.

Content teams can plan around what different audience segments want to do after reading the content. 

In our example, the same basic content about the forthcoming product release can be used in three different content types. It can be used in a customer announcement, in a press release, and in a blog post. The descriptive body of each of these will be the same, conveying basic information about the forthcoming product.

Three different content types drawing on a common, multi-purpose content element

Identifying motivations and managing these as components

When designing content, content teams should have a clear idea who is interested in this information and why.

In our example, the content presented to each segment has a different call-to-action at the end of the body. The customer announcement will include a sign-up call-to-action so that customers can try out the beta version. The press release would include a point of contact, which would provide a name, an email and a telephone number that journalists and others could reach.  The blog post wouldn’t include an active call-to-action, but it might embed social media discussion on Twitter concerning the forthcoming product release, perhaps tweets from beta customers crowing about how marvelous the new product is.
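
As a rough illustration, the example could be modeled along the following lines in Python.  The class names, field names, and placeholder text are all assumptions of mine rather than the schema of any particular CMS, and the pairing of each content type with an audience segment is my own reading of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentElement:
    kind: str
    text: str


@dataclass(frozen=True)
class ContentType:
    name: str
    audience: str
    body: ContentElement     # shared, multi-purpose element
    closing: ContentElement  # segment-specific element

    def render(self) -> str:
        """Assemble the shared body and the segment-specific closing."""
        return f"{self.body.text}\n\n{self.closing.text}"


# One body element, written once and reused in every content type.
product_description = ContentElement(
    kind="description",
    text="Placeholder description of the forthcoming product.",
)

customer_announcement = ContentType(
    name="customer announcement",
    audience="existing customers",
    body=product_description,
    closing=ContentElement("call-to-action", "Sign up to try the beta version."),
)
press_release = ContentType(
    name="press release",
    audience="people who follow the sector",
    body=product_description,
    closing=ContentElement("point of contact", "Contact: name, email, telephone."),
)
blog_post = ContentType(
    name="blog post",
    audience="interested non-customers",
    body=product_description,
    closing=ContentElement("social embed", "Embedded social media discussion."),
)
```

Because all three content types point at the same `product_description` object, a correction to the product description is made once and appears everywhere, while each closing element can change independently.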

The motivations of each audience segment can also be managed with distinct content elements in the content model.  Content teams can use content elements to plan and manage specific actions or considerations pertaining to different audience segments.

Thinking about purpose globally

Content teams tend to plan content around tasks. But when content is planned individually to support individual tasks, content teams can miss the opportunity to design the content more efficiently and effectively.  They may create content that addresses a specific audience segment and specific task.  But they’ve created single-purpose content that is difficult to manage and optimize.  

Tasks and information are related but not always tightly coupled.  Different audience segments may have common tasks, even though the information they need to support those tasks could vary in coverage or detail.  In such cases, why different segments are interested in a task could be different, or else their level of knowledge or interest could be different.  The instructions describing how to complete a task could be global, but the supporting background content would be unique for different audience segments.

Conversely, different audience segments may rely on the same information to support different tasks, as in our example.  

Content teams have an opportunity to plan the design of content using a common content model, built around common components that could be descriptions, explanations, or actions.    A key aspect of designing multi-purpose content is to separate what information everyone is interested in from information that only certain segments are interested in.  Content will need to adjust to different audience segments depending on the motivations of a segment, and the opportunity the segment offers the organization publishing the content.

The design of content should consider two dimensions affecting multi-purpose content elements:

  1.  What brings these readers to view the content?  (The framing of elements that define the content type where information appears) 
  2.  What do these readers want to do next?  (The framing of the call-to-action or task instructions)

When the answers to those questions are specific to a segment, they will be a unique element within the content type.  When several segments share common motivations, the component they view will be the same.
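
One simple way to find the shared and unique elements is to compare what each segment needs.  The Python sketch below assumes each segment's needs have already been listed as a set of element names; the `split_elements` helper and the element names are hypothetical.

```python
def split_elements(needs):
    """Split element names into those every segment needs and the rest.

    `needs` maps a segment name to the set of element names it wants.
    Returns (shared, specific), where `specific` maps each segment to
    the elements that are not shared by every segment.
    """
    shared = set.intersection(*needs.values())
    specific = {segment: wanted - shared for segment, wanted in needs.items()}
    return shared, specific


needs = {
    "existing customers": {"product description", "beta sign-up"},
    "sector followers": {"product description", "point of contact"},
    "interested non-customers": {"product description", "social discussion"},
}

shared, specific = split_elements(needs)
```

In this example `shared` holds only the product description, which is the multi-purpose part, and `specific` holds the one element unique to each segment.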

In summary, the same content can be useful to different audiences and in different situations.   Multi-purpose content can be considered the flip-side of personalization. We can separate what everyone needs to know (the multi-purpose part) from what only some people need to know (custom-purpose part).  To design multi-purpose content, one is looking for common elements to share with different segments. In personalization one is looking for specific elements targeted at specific segments.  The design of multi-purpose content considers in close detail what different segments need or want to view, and why.

— Michael Andrews 


User Centric Content Models

Content doesn’t organize itself.  That’s why we have content models.  

A lot of advice about creating content models misses an important dimension: how the user fits in. Many content models are good at describing content.  But not many are very user centric.  I want to suggest some simple steps to help make content models more centered on user needs.

Two popular ways of thinking about content models are (1) that the content model is like a database for content (the technical approach), or (2) that the content model is a structural representation of a massive document (the structured authoring approach).  When combined, these approaches transform a content model into a picture of documents-as-a-database.   

Content models generally focus on showing what information is relevant to various topics.  Some models can be very sophisticated at representing the publisher’s perspective, and all the details it might want to manage.  But even in sophisticated models, the needs and motivations of audiences are hard to see.

Content models show numerous fields and values.  Each topic could become a screen that could be configured in various ways.  One CMS vendor says of content modeling: “it’s very similar to database modeling.” 

But actually, designing content to support user goals is very different from designing a database to store records.  Databases are a bad analogy for how to model content.

Audiences don’t want to read a database, even if they are interested in the topic.  A database is fine for scanning for short bits of information to get quick answers. It’s less good for integrating different fragments of information into a meaningful whole. People need support bridging different fragments of information.

A content model should aim to do more than show a picture of how topics can be broken into chunks.  

Nor are content models about navigation paths, as if they were a site map.   True, different chunks, when linked together, can allow users to click between them.  It’s nice when users can jump between topics.  But it’s not clear why users are looking at this content to begin with. Many models may look like a collection of linked Wikipedia articles about baseball teams, baseball players, and pennant races.  It’s a model of what we could call brochure-ware.  It’s a database of different articles that reference one another.  The connections between chunks are just hyperlinks. There’s no obvious task associated with the content.

What Users Need to Know

Most explanations of content models advise publishers to model stuff that people might want to know about. I call this the stuff-to-know-about perspective. It’s a good starting point.  But it should not be the end point of the content model, as it often is.

When we look at stuff people might want to know about we start with topics. We identify topics of interest and then look at how these topics are connected to each other.

Suppose you and I are going to take a trip to a place we have never visited. Let’s imagine we are going to Yerevan in Armenia. We’d want to consult a website that presents content about the city. What might the content model look like?  As a thought experiment, we are going to simultaneously think about this situation both as content modeler and as a prospective tourist.  We’ll see if we can blend both these perspectives together.  (This technique is known as wearing two hats: switching roles, just like we do all the time in real life.)   

As content modelers, we will start with stuff we as tourists will need to know about.  We’ve never been to Yerevan and so we need to know some very basic information.

If we are going to travel there, we will want to:

  • Find a place to stay, probably a hotel 
  • Take transport within the city
  • Find restaurants to eat at
  • Visit tourist sites
  • Check out local entertainment

These user needs provide the basis for the content model. We can see five different topics that need to be covered. There needs to be profiles of:

  • Hotels
  • Transport options 
  • Tourist sites
  • Restaurants
  • Entertainment venues  

Each profile will break out specific aspects of the topic that are of most interest to readers.  Someone will need to figure out if each hotel profile will mention whether or not a pillow menu is available.  But for the moment, we will assume each profile for each topic covers  important information users are looking for, such as opening times.  

We have some topics to make into content types. But the relationship between them isn’t yet clear.

As modelers, we have identified a bunch of stuff that tourists want to know about.  But it’s not obvious how these topics are connected to one another.  It’s like we have several piles of tourism brochures: a pile on hotels, a pile on tourist sites, and so on, each stacked side by side, but separate from each other.  If you’ve ever walked into the tourist information center in a city you are visiting, and walked out with a pile of brochures, you know that this experience is not completely ideal.  There’s loads of material to sort through, and decisions to coordinate.  

Modeling to Help Users Make Decisions

If we only adopt a topic perspective, we don’t always see how topics relate to one another from the users’ perspective. It’s important for the model not only to represent stuff people need to know about. We also need the model to account for how audiences will use the content. To do this we need to dig a little deeper. 

As modelers, we need to look at the choices that users will be making when consulting the content. What decisions will users make? On what basis will users make these decisions?  We need to account for our decision criteria in the content model.

As a prospective tourist, I’ve decided that three factors influence my choices. I want to do things that are the best value, the best experience, and the most convenient.  This translates into three criteria: price, ratings, and location. 

It turns out that these factors are dimensions of almost all of the topics.  As a result, information about these dimensions can connect the topics together.

I want to go to places that are convenient to where I am staying or spending time.  All the different venues have a location.  Different venues are related to one another through their location. But we don’t have any content that talks about locations in general. This suggests a new content type: one on neighborhoods.  This content type can help to integrate content about different topics, revealing what they have in common. People both want to know what’s nearby and get a sense of what a neighborhood feels like based on what’s there.
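
To show how a shared location dimension can connect otherwise separate profiles, here is a minimal Python sketch.  The `Venue` class and the `build_neighborhood_profiles` helper are hypothetical names, and any venue or neighborhood values passed in would be placeholders rather than real data about Yerevan.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Venue:
    name: str
    content_type: str   # e.g. "hotel", "restaurant", "tourist site"
    neighborhood: str


def build_neighborhood_profiles(venues):
    """Return {neighborhood: {content_type: [venue names]}}."""
    profiles = defaultdict(lambda: defaultdict(list))
    for venue in venues:
        profiles[venue.neighborhood][venue.content_type].append(venue.name)
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {area: dict(by_type) for area, by_type in profiles.items()}


# Placeholder data for illustration only.
venues = [
    Venue("Hotel A", "hotel", "Neighborhood 1"),
    Venue("Restaurant B", "restaurant", "Neighborhood 1"),
    Venue("Museum C", "tourist site", "Neighborhood 2"),
    Venue("Restaurant D", "restaurant", "Neighborhood 2"),
]

profiles = build_neighborhood_profiles(venues)
```

Each neighborhood profile is derived from content that already exists in the hotel, restaurant, and tourist site profiles.  The new content type adds a connection, not a new pile of brochures.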

The user’s decision criteria help to identify additional content types, and to form connections

Many venues also have ratings and prices. This information also presents an opportunity to connect different types of content. We can create a new content type for the “best of” highlights in the city.   It can show the top rated restaurants according to price category. It can also show the top rated attractions. This could be a list that links to the more detailed profiles.   We now have a way to decide how to prioritize things to do.  This content type helps users compare information about different items.
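
A “best of” list like this can be generated from the ratings and prices already held in the profiles.  The Python sketch below is one possible approach; the `Restaurant` class, the `best_of` function, the price category symbols, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restaurant:
    name: str
    price_category: str  # e.g. "$", "$$", "$$$"
    rating: float


def best_of(restaurants, top_n=3):
    """Return the top-rated restaurant names within each price category."""
    by_price = {}
    # Visit restaurants from highest to lowest rating.
    for r in sorted(restaurants, key=lambda r: r.rating, reverse=True):
        picks = by_price.setdefault(r.price_category, [])
        if len(picks) < top_n:
            picks.append(r.name)
    return by_price


# Placeholder data for illustration only.
restaurants = [
    Restaurant("Restaurant A", "$", 4.5),
    Restaurant("Restaurant B", "$", 4.8),
    Restaurant("Restaurant C", "$$", 4.1),
    Restaurant("Restaurant D", "$", 3.9),
]

highlights = best_of(restaurants, top_n=2)
```

The names in each list would link back to the detailed profiles, so the “best of” content type stays a thin comparison layer over content that is maintained elsewhere.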

Modeling to Help Users Act

As tourists, we now know what we want to do. But are we able to do it? 

Remember, we’ll be in Armenia. We don’t know if the familiar apps on our phones will work there. Neither of us speaks Armenian, so making phone calls seems intimidating. We need a way to make sure we can actually do the things we’ve decided we want to do.

For the content to really support our visit, we want the content to give us peace of mind about the risks and disappointments we worry about. We don’t want to waste our time unnecessarily or, worse, find that we can’t do things that we had planned on doing.  We want to avoid a long queue at the museum. We want to make sure that we can get a table at a well-known restaurant.  We want to go to a show, without having to visit the venue beforehand to buy the ticket.

When we consider the actions users want to take after consulting the content, we can find additional points of integration.

These needs suggest additional features that can be added to the content model. We want the ability to buy tickets after we decide to visit a museum or club. We want to be able to make reservations for a restaurant.  We want a booking widget.  A tourist website can create a widget that connects to outside services that enable these actions.  In some cases, the website can pull content from other sources to give readers the ability to see whether or not an option is available at a particular point in time.

Helping people act sometimes entails thinking about content beyond the content you’ve created yourself.   It can involve integrating with partners.

The Three Steps of Content Modeling

This post is necessarily a very high-level and incomplete overview of content modeling.  There are many more possibilities that could be added, such as including a calendar of events and special offers.  But my goal here has been to provide some simple guidance about how to model content.

The three steps to creating a user centric content model are:

  1. Identify the topics that users need to know about, and what specifically about those topics matters to users
  2. Identify the criteria that users have when making decisions while consulting this content
  3. Identify what actions users want to take after consulting the content, and what additional information or features can be added to help them

This process can surface connections between different chunks of information, and help to ensure that the content model supports the customer’s journey.
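
The three steps can also serve as a checklist for each topic in the model.  The following Python sketch assumes a very simple record per topic; the `TopicModel` class and its field names are hypothetical, and the example values are drawn from the Yerevan scenario.

```python
from dataclasses import dataclass, field


@dataclass
class TopicModel:
    topic: str
    attributes: list = field(default_factory=list)  # step 1: what users need to know
    criteria: list = field(default_factory=list)    # step 2: how users decide
    actions: list = field(default_factory=list)     # step 3: what users do next

    def gaps(self):
        """Name the steps that have not been covered for this topic yet."""
        missing = []
        if not self.attributes:
            missing.append("what users need to know")
        if not self.criteria:
            missing.append("decision criteria")
        if not self.actions:
            missing.append("user actions")
        return missing


restaurants = TopicModel(
    topic="Restaurants",
    attributes=["opening times"],
    criteria=["price", "ratings", "location"],
    actions=["make a reservation"],
)

# A topic modeled only from the stuff-to-know-about perspective.
transport = TopicModel(topic="Transport options", attributes=["routes"])
```

Calling `transport.gaps()` flags that decision criteria and user actions are still missing, which is the situation a purely topic-based model tends to end up in.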

— Michael Andrews