Categories
Content Engineering

Taxonomy makes the world go round

Taxonomy, like the money supply or the ozone layer, is a stealthy — yet fundamental — concept in our lives that, unfortunately, only a few people know much about.  Taxonomy makes the world go around: all kinds of things would stop moving without it.  But only a small number of people — either inside or outside of the world of the web that manages our lives online — can explain what a taxonomy actually does. It’s generally a vague concept to designers and developers.  Business people often have little idea what it is or why it matters. Even many information architects, and other people with backgrounds in library science, tend to ignore the social and economic significance of taxonomies.

Taxonomy matters because it is the basis of numerous decisions that affect us. But it can be misunderstood when seen as being primarily about making individual decisions: specifically, looking up information.  Taxonomy’s true influence comes from how it supports collective, repeatable decisions and enables the aggregation of data and actions.  Rule-based decisions, whether applied by humans or machines, depend on taxonomies.  

Taxonomies shape many of the most important decisions in our world — those that are made again, and again.  Four taxonomies in particular influence the basic dimensions of our lives:

  1. What we eat
  2. How money is made and used
  3. What goods and services are produced
  4. How we evaluate our impact on the environment

All these taxonomies share a common focus on process changes as forming the basis for categorization. Why? When items change state or are susceptible to change, distinctions emerge that are useful to classify.  

Codex Alimentarius: Classifying what we eat

Eating is an elemental need: it supplies nourishment.  Something so basic seems far removed from the abstraction of taxonomy.  And few would expect a taxonomy with a Latin name to be important to the modern world economy.  

The Codex Alimentarius means “food code.” It is defined by a United Nations organization based in Rome (a joint effort of the FAO and WHO, in case you care about the acronyms instead of the Latin). The “Codex” provides a “classification of foods and feeds.”

Please don’t get turned off by the jargon — the stakes here are important.

If you have ever wondered how we can safely eat food items produced from around the world, it is thanks to the Codex Alimentarius. It defines in great detail criteria for food commodities, specifying their composition and safety.   The Codex provides a detailed set of guidelines relating to food covering:

  • Standards covering processed, semi-processed or unprocessed foods 
  • Hygienic and technological codes of practice
  • Food additives 
  • Maximum levels for pesticide residues 
  • Guidelines for contaminants

It promotes the harmonization of national regulations and encourages food producers to adopt common standards.  

The scope of food products is enormous.  To provide useful guidance, the Codex should be able to identify the universe of products needing standards.  The Codex Classification of Foods and Animal Feeds provides this needed classification.   It divides food into classes based on three dimensions:

  • How basic or changed is it? Primary v Processed
  • Who is it for? Food for people v Feed for animals
  • What’s it made of? Plant-origin v Animal-origin

In all, five classes exist:

  1. Primary food commodities of plant-origin 
  2. Primary food commodities of animal-origin
  3. Primary animal feed commodities
  4. Processed food of plant-origin
  5. Processed food of animal-origin

Each of the five classes is divided into types that are “based on physical characteristics and traditional use and to a lesser extent on botanical or zoological associations.”  

Types are subdivided into groups and subgroups “whose members show similarities in their behavior with respect to residues and in the nature of the agricultural practices to which they are subjected.”  

Finally, within each group, different commodities are enumerated with a code identifier consisting of a two-letter prefix followed by a four-digit number.

The standards themselves are narrative documents. But what the standards address is determined by the taxonomic classification.  They can be relevant to individual commodities — but also higher-level classifications.  

Let’s explore how the Codex’s classification works by reviewing how it classifies anise, one of my favorite flavorings, an herb that has been used since ancient Roman times. Anise commodities belong to “Class A: Primary Food Commodities of Plant Origin.” 

Anise commodities more specifically belong, within Class A, to Type 05: “Herbs and Spices.”  

Multiple anise-related commodities exist, which get classified in various ways:

  • Herbs (group 27), whose commodities have a code prefix of “HH”
    • 027A Herbs (herbaceous plants), broken into commodities with six-character codes, for example:
      • HH 3191 Anise, leaves
    • 027B Leaves of woody plants (leaves of shrubs and trees)
    • 027C Edible flowers
  • Spices (group 28), whose commodities have a code prefix of “HS”
    • 028A Spices, seeds
      • HS 0771 Anise, seed
    • 028B Spices, fruit or berry
      • HS 3303 Anise pepper
    • 028C Spices, bark
    • 028D Spices, root or rhizome
    • 028E Spices, buds
    • 028F Flower or stigma
    • 028G Spices, aril
    • 028H Citrus peel
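
The structure of these codes can be sketched in a few lines of Python. This is only an illustration, assuming that every commodity code is two uppercase letters followed by four digits, as in the anise examples above. The names `parse_code`, `Commodity`, and `GROUP_BY_PREFIX` are hypothetical and are not part of the Codex itself.

```python
import re
from typing import NamedTuple

# Lookup from the two-letter prefix to its group, using only the two
# groups from the anise example above.
GROUP_BY_PREFIX = {
    "HH": ("027", "Herbs"),
    "HS": ("028", "Spices"),
}

# Assumed format: two uppercase letters, an optional space, four digits.
CODE_PATTERN = re.compile(r"^([A-Z]{2})\s?(\d{4})$")


class Commodity(NamedTuple):
    prefix: str
    number: str
    group_number: str
    group_name: str


def parse_code(code: str) -> Commodity:
    """Split a commodity code such as 'HS 0771' into prefix, number, and group."""
    match = CODE_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a recognised commodity code: {code!r}")
    prefix, number = match.groups()
    if prefix not in GROUP_BY_PREFIX:
        raise ValueError(f"unknown prefix in this sketch: {prefix!r}")
    group_number, group_name = GROUP_BY_PREFIX[prefix]
    return Commodity(prefix, number, group_number, group_name)
```

Calling `parse_code("HS 0771")` places anise seed in group 028 (Spices), while `parse_code("HH 3191")` places anise leaves in group 027 (Herbs). The prefix alone is enough to route a commodity to the standards that apply to its group.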

The Codex refers to its enumeration as a classification rather than as a taxonomy.  Within the Codex, the term “taxonomy” is reserved for biological descriptions of living plants, animals, and microbes rather than for non-living or man-made food items. Biological taxonomies are also based on identifying distinctions arising from the process of change — distinctions which are increasingly determined through DNA analysis.

Because the Codex is based on science, it is constantly evolving. Experts must account for a growing range of synthetic inputs and outputs that become available, such as lab-grown “meat.”  The classification undergoes revision as needed.

The Codex has been in place for over half a century, achieving something remarkable: it’s made possible the confident trade of food around the world.  We can buy imported food that’s labeled clearly and meets agreed standards. It’s enriched us all, whether we get access to better and less expensive food, or simply have more varied and interesting options.     

The GAAP taxonomy: Is a company profitable and viable?

If you work for a company or have invested your retirement savings in companies, you want to know if a company is profitable.  

And you need to know their statements are reliable. Several high-flying firms, including Wirecard in Germany, Greensill Capital in the UK, and Evergrande in China, have collapsed quite suddenly in recent months.  People need to trust financial data.  

The GAAP (Generally Accepted Accounting Principles) has been the accounting standard for many decades. But only in 2013 was the GAAP transformed into a taxonomy: the US GAAP Financial Reporting Taxonomy (UGT), based on the Extensible Business Reporting Language (XBRL) markup.

If the GAAP has been around for decades, why is it suddenly a big deal that it’s now a taxonomy?

This development has promoted much greater transparency and global harmonization in financial reporting.    

Accounting, fundamentally, is about the categorization of income and expenses, and of assets and liabilities.  These can be split or lumped according to the granularity sought.  Accounting is taxonomic in its orientation.  Providing precision enables traceability. 

The GAAP taxonomy, when encoded with markup, transforms difficult-to-analyze documents into structured data that can be compared.  

Bob Vause, in his book the Guide to Analyzing Companies, notes the power of the taxonomy to describe individual financial items.  “This involves ‘tags’ — tagging all the individual items of information appearing in financial reports.”

The GAAP taxonomy is being adopted globally, sometimes unofficially, which is promoting a global convergence in approaches. This is an important development, as different countries sometimes use divergent terminology when describing financial events. Those inconsistencies have hindered the comparison of companies operating in different markets.

Today, the differences in terminology are less important than in the past because the taxonomy tags provide a standard way to present financial data.  Vause notes: “There will be an international standardised form of presentation; language or terminology will no longer be a barrier to analysis.” He adds: “Moves towards the standardisation of financial report information are leading to significant improvements in the quality, value and accessibility of corporate financial information.”
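
To see why tagging makes reports comparable, consider a deliberately simplified Python sketch. It assumes each reported figure has been reduced to a (company, tag, value) triple; real XBRL filings are XML documents with contexts, periods, and units, which this sketch ignores. The two companies and their numbers are invented placeholders, and the tag strings stand in for concepts in the taxonomy.

```python
from collections import defaultdict

# Each fact is (company, taxonomy tag, value). The companies and values
# are invented placeholders, not real financial data.
facts = [
    ("Company A", "us-gaap:Revenues", 1_000),
    ("Company A", "us-gaap:NetIncomeLoss", 100),
    ("Company B", "us-gaap:Revenues", 4_000),
    ("Company B", "us-gaap:NetIncomeLoss", 200),
]


def by_tag(facts):
    """Index facts by tag so the same item can be compared across companies."""
    table = defaultdict(dict)
    for company, tag, value in facts:
        table[tag][company] = value
    return table


def net_margin(table, company):
    """Net income divided by revenue, computed purely from tagged values."""
    income = table["us-gaap:NetIncomeLoss"][company]
    revenue = table["us-gaap:Revenues"][company]
    return income / revenue
```

With `table = by_tag(facts)`, the expression `net_margin(table, "Company A")` gives 0.1 and `net_margin(table, "Company B")` gives 0.05. Nothing in the calculation depends on what each company calls these items in its own narrative report; the tag carries the meaning.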

The GAAP taxonomy helps financial analysts track inputs and outputs, and follow transfers and allocations.  

NAICS taxonomy: what’s the composition of the economy?

It’s difficult to know what’s important in the economy, such as how big a sector is or how fast it is growing.  Some sectors seem to get more media attention than others.  And it’s not always clear what is different about sectors.  For example, is fintech different from banking?  Companies and governments need to understand changes in the economy when making investment decisions.  

Industrial classification taxonomies help analysts dissect the complex composition of an economy.  Since 1997, the United States, Canada, and Mexico have agreed to follow a common system to classify businesses called the NAICS (the North American Industry Classification System). The EU uses a similar classification called the NACE.  Governments define these taxonomies and use them to collect all kinds of data.  By being structured as a  taxonomy, with broader and narrower classifications, it’s possible to aggregate and dissect data about production, orders, hiring, and other activities.  

According to the official NAICS Manual, “NAICS divides the economy into 20 sectors. Industries within these sectors are grouped according to the production criterion.”  But industries could be services as well as manufacturing. For “activities where human capital is the major input… each [is] defined by the expertise and training of the service provider.” 

The NAICS uses hierarchical numeric codes, an approach referred to in the field of taxonomy as “expressive notation.” These codes don’t look like the more familiar style of taxonomy that uses text labels. But codes provide many benefits. They provide a persistent identifier that allows the description to be revised when necessary. And they enable expansion of the code from two to six digits to get the appropriate level of granularity. They allow the retrieval of either root classes or subordinate classes (subclasses), and the comparison of equally ranked classes. For example, the taxonomy supports retrieval of data on all performing arts groups, or it can allow the comparison of data for different kinds of performing arts groups (theater companies, dance companies, or music groups).
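
Because each additional digit narrows the class, aggregation reduces to prefix matching. The Python sketch below illustrates the idea with the performing arts example. The function names `total_for` and `breakdown` are hypothetical, and the establishment counts are made-up placeholders rather than real statistics.

```python
from typing import Dict

# Six-digit NAICS industries with placeholder counts (not real statistics).
establishments: Dict[str, int] = {
    "711110": 10,  # Theater companies and dinner theaters
    "711120": 20,  # Dance companies
    "711130": 30,  # Musical groups and artists
    "711211": 40,  # Sports teams and clubs (outside performing arts companies)
}


def total_for(prefix: str, data: Dict[str, int]) -> int:
    """Aggregate every industry whose code falls under a shorter code."""
    return sum(count for code, count in data.items() if code.startswith(prefix))


def breakdown(prefix: str, digits: int, data: Dict[str, int]) -> Dict[str, int]:
    """Compare equally ranked subclasses by truncating codes to `digits` digits."""
    result: Dict[str, int] = {}
    for code, count in data.items():
        if code.startswith(prefix):
            key = code[:digits]
            result[key] = result.get(key, 0) + count
    return result
```

Here `total_for("7111", establishments)` returns 60, the total for all performing arts companies, while `breakdown("7111", 5, establishments)` returns `{"71111": 10, "71112": 20, "71113": 30}`, the side-by-side comparison of theater, dance, and music. Moving up to the three-digit code `"711"` pulls in the sports entry as well, for a total of 100.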

The NAICS taxonomy

When you look at the NAICS, you might disagree with how industries are categorized.  You may think the balance or emphasis is wrong.  But the NAICS isn’t based on any single person’s opinion or even a card sort by a group of people.  It’s a highly governed taxonomy.  Persistence in categories is necessary to allow data to be tracked over time.  Even when a sector is losing importance, it is still useful to understand how it has declined.  And sometimes, obscure sectors can be more important than non-specialists realize.  They aren’t scoring headlines in the news, but they still matter.  

The NAICS is the product of a well-defined methodology. It classifies sectors according to how products or services are created.  “Economic units that have similar production processes are classified in the same industry, and the lines drawn between industries demarcate, to the extent practicable, differences in production processes. This supply-based, or production-oriented, economic concept was adopted for NAICS.”  Much like the GAAP, the NAICS allows analysts to track inputs and outputs.  

The Green taxonomy: classifying climate impacts

Lastly, we turn our attention to another thing humans produce, but one that’s less desirable: greenhouse emissions.  

As I write this, world leaders will soon gather in Glasgow for COP26, the UN Climate Change Conference. Countries and companies are pledging to be carbon-neutral or even carbon-negative by specific dates. But what do these commitments really mean? How will emissions and offsets be calculated? And how will everyone agree on how these commitments are measured?

These important questions will require a slightly longer explanation. 

Over the past year, momentum has been growing for climate change-focused taxonomies. These initiatives classify energy usage according to the emissions it produces. They are discussed using various terms:

  • Green taxonomy
  • Climate taxonomy 
  • Sustainable finance taxonomy   

But how can a taxonomy influence the climate?  In many of the same ways that taxonomies influence other human activities such as food, finance, and the economy.  Taxonomies can help individuals understand processes relating to what we produce and consume. And they can help institutions follow common standards to track and direct energy usage behavior.  

A major motivation for green taxonomies is capital allocation: both the capital used in market debt financing of infrastructure projects and the institutional equity investments into energy-consuming enterprises. Power plants and grids are expensive. Converting existing infrastructure to greener forms is expensive as well. The aspiration is to create new financial products such as green bonds or sustainable index funds to promote such investments.

EU environmental taxonomy

Multiple green taxonomy initiatives are underway globally, which vary in their designs.  At the moment, the bulk of media attention has been focused on the EU’s environmental taxonomy, which is currently in draft.  Some reports suggest it may be finalized by the end of 2021 and that China might join the EU initiative in some capacity.  The ASEAN group of Southeast Asian countries is working on their own taxonomy that is heavily modeled on the EU one.

The EU’s environmental taxonomy is based on the NACE, the EU’s industrial classification system. It looks at various economic sectors and identifies “sustainable” activities within each sector.

The Commission states: “A common language and a clear definition of what is ‘sustainable’ is needed. This is why the action plan on financing sustainable growth called for the creation of a common classification system for sustainable economic activities, or an ‘EU taxonomy’.”

“The Taxonomy sets performance thresholds (referred to as ‘technical screening criteria’) for economic activities.”

Member states and financial institutions within the EU will be required to report on the sustainability of projects they sponsor.  Those activities deemed sustainable will presumably be more attractive to long-term investors.    

The Commission’s “green list” of those activities it has decided are sustainable has created some controversy.

Notably, the Commission has been unable to decide whether nuclear power and natural gas are environmentally desirable.

In the United States, interest in green taxonomies has also been growing.  Compared with the EU, it is more driven by interest from institutional investors, though US government regulators have also called for better and more consistent standards.

The EU’s approach to a green taxonomy is prescriptive: it indicates what activities are sustainable. The US approach, in contrast, is more descriptive: it classifies a range of activities that could support or hinder sustainability and promotes reporting of harms as well as benefits.

The SASB taxonomy in the United States

In the United States, green taxonomies fall under the broader umbrella of “ESG” (Environmental, Social, and Governance), that is, non-financial information relating to company performance. US investors increasingly favor ESG-positive firms and funds. But ESG indicators haven’t been standardized across companies, which has triggered concerns from the US Government Accountability Office. The ESG landscape has emerged organically, which partly explains the indicators’ uneven application. But some enterprises and investment funds have been faulted by environmentalists and financial regulators for using imprecise standards and metrics to engage in “greenwashing.”

“Investors are using ESG-related information to make investment decisions and to allocate capital more than ever before. They are increasingly looking for sustainable investments, albeit investors have different thoughts about what ‘sustainability’ means.” 

Securities and Exchange Commissioner Caroline A. Crenshaw

Commissioner Crenshaw adds: “To be useful to investors, disclosures need to be meaningful. That’s particularly true for ESG-related disclosures, as they are too often inconsistent and incomparable. What we should be working toward is a clear disclosure regime that yields consistent, comparable, reliable, and understandable ESG disclosures to investors.”

ESG is broader in scope than sustainability.  But environmental metrics so far have been the major driver of ESG interest.

Following a period of competing initiatives in the US, a unified approach has coalesced under the auspices of the Sustainability Accounting Standards Board (SASB), a nonprofit organization that cooperates with several other like-minded organizations.  

The SASB has chosen a different approach to define a green taxonomy. It has re-imagined the NAICS to focus it on environmental performance, creating what it calls the Sustainable Industry Classification System® (SICS®). “The differences between SICS® and traditional industry classification systems can be categorized in three types: (1) new thematic sectors; (2) new industries with unique sustainability profiles; and (3) industries classified in different sectors.” The SICS classifies 77 industries, grouped into 11 sectors.

The SASB taxonomy

“Unlike other industry classification systems—which use common financial and market characteristics— SICS® uses sustainability profiles to group similar companies within industries and sectors.” 

“SASB Standards identify the subset of environmental, social, and governance issues most relevant to financial performance in each of 77 industries.”

In addition to drawing inspiration from the NAICS, the SASB also modeled its work on the GAAP taxonomy.  

It worked with PwC to convert SASB’s taxonomy into the XBRL format (the “SASB XBRL taxonomy”) to allow data exchange between reporting companies and investors and regulators evaluating the reporting. 

As a recent news report concludes: “SASB Standards Taxonomy in XBRL format…make[s] digital reporting simpler for issuers of environmental, social, and governance (ESG) disclosures and to improve data aggregation and analytics for investors.”

“Having the taxonomy in XBRL, an open standard used in business reporting, will enable reported metrics to be machine-readable via digital tags, and improve usefulness and comparability of ESG reports.”

The next emerging area of focus for green taxonomies is looking at what are called “scope 3” emissions, which consider emissions across the value chain of products.  Instead of looking at individual industries or companies, this work will evaluate the emissions associated with the entire lifecycle of production, consumption, and disposal.  Companies will be more accountable for the actions of their suppliers and customers.  

Taxonomies support exchange

I learned about the value of taxonomies in the late 1980s during my first full-time job after leaving grad school.  I worked as an online researcher at the US Commerce Department.  This was before Tim Berners-Lee launched the World Wide Web — there were no browsers or Google then.  But already online information providers recognized the need to categorize information to make it accessible.  Accessing information at that time was extremely expensive.

Taxonomy is a form of infrastructure, much like the undersea fiber optic cables that transmit internet traffic. The scholar Elizabeth Cullen Dunn talks about the role of infrastructure by introducing a Greek word, oikodomi, which she defines as “the infrastructures from which standards emerge and that shape the way they actually effect production.” Taxonomy is oikodomi, in the sense that it shapes decisions.

Many people lately are talking about the importance of “systems” (or “systemic” influence) to understand how things work now and how they should in the future.  That’s exactly what taxonomy is doing: defining systems.

Taxonomy standards facilitate global exchange by:

  • Promoting harmonization between stakeholders in different places
  • Connecting descriptions between different fields  

Taxonomy standards connect our diverse world. They can thread together different domains: food, which influences health, companies, the economy, and the environment. They provide the transparency that can reveal the relationships between these dimensions through their precise classifications.

— Michael Andrews


Lumping and Splitting in Taxonomy

Creating a taxonomy, which means noting which distinctions matter, often seems more art than science. I’ve been interested in how to think about taxonomy more globally, instead of looking at it as a case-by-case judgment call. Part of my interest here is a spin-off from my interest in birding. I’m no ornithologist, but I try to learn what I can about the nature of birds. And species of birds, of course, are classified according to a taxonomy.

The taxonomy for birds is among the most rigorous out there.  It is debated and litigated, sometimes over decades.  The process involves a progression of “lumps” and “splits” that recalibrate which distinctions are considered significant.  Recently the taxonomy underwent a major revision that reordered the kingdom of birds. 

In the mid-2010s, scientists changed the classification of birds to consider not only anatomical features, but DNA.  In the new ordering, eagles and falcons are not as closely related as was previously assumed. Eagles are closer to vultures, while falcons are closer to parrots.  And pigeons and flamingos are more closely related than thought previously.  Appearance alone is not enough on which to base similarity.

Pigeons and flamingos: more closely related than you might think (both produce milk to feed their young)

Taxonomy and Information Technology

Taxonomy doesn’t receive the attention it deserves in the IT world.  It seems subjective: vague, hard to predict, potentially the source of arguments.  Taxonomy resembles content: it may be necessary, but it is something to work around — “place taxonomy here when ready.”

But taxonomy can’t be avoided. Even though semantic technologies are becoming richer in describing the characteristics of entities, the properties of entities alone may not be enough to distinguish between types of entities.  Many entities share common properties, and even common values, so it becomes important to be able to indicate what type of entity something is.  We can describe something in terms of its physical properties such as weight, height, color and so on, and still have no idea what it is we are describing.  It can resemble the parlor game of twenty questions: a prolonged discourse that’s prone to howlers.

Classification is the bedrock of algorithms: it drives automated decisions. Yet taxonomies are human designed. Taxonomies lack the superficial impartiality of machine-oriented linked data or machine learning classification. But taxonomies are useful because of their perceived limitations. They require human attention and human judgment. That helps make data more explainable.

Humans decide taxonomies, even when machines provide assistance finding patterns of similarity. Users of taxonomies need to understand the basis of similarity. No matter how experienced the taxonomist or sophisticated the text analysis, the basis of a taxonomy should ideally be explainable and repeatable. Machine-driven clustering approaches lack these qualities.

To be durable, a taxonomy needs a reasoned basis and justification.  Business taxonomies can borrow ideas from scientific taxonomies.   

Four approaches can help us decide how to classify categories:

  1. Homology
  2. Analogy
  3. Differentia
  4. Interoperability

Homology and analogy deal with “lumping” — finding commonality among different items.  Differentia and interoperability help define “splitting” — where to break out similar things.

Homology: Discovering shared origins

Homology is a term taxonomists use for features that, while appearing different, have a common origin and original intent. For example, mammals have limbs, but the limb could be manifested as an arm or as a flipper.

Homology refers to cases where things start the same but go in different directions.  It can get at the core essence of a feature: what it enables, without worrying so much how it appears or precisely what it does.  Homology is helpful to find larger categories that link together different things.

There are two ways we can use homology when creating a taxonomy. 

First, we can look at the components or features of items. We look for what they share in common that might suggest a broader capability to pay attention to. Lots of devices have embedded microprocessors, even though these devices play different roles in our lives. Microprocessors provide a common set of capabilities that even allow different kinds of items to interact with one another, such as in the case of the Internet of Things (IoT). Homology is not limited to physical items. Many business models get copied and modified by different industries, but they share common origins and drivers. We can speak of a class of businesses using an online subscription model, for example.

Second, we can consider whole items and how they are used. Homology can be useful when a distinct thing has more than one use, especially when it doesn’t have a single primary purpose. Baking soda is advertised as having many purposes and some consumers like products that contain baking soda. Here we have a category of baking soda-derived products. In the kitchen, there are many small appliances that have a rotator on which one can attach implements. They may be called a food processor, a blender, a mixer, or some trademarked proprietary name. What can they do? Many tasks: chopping vegetables, making dough, making soups, smoothies, spreads…the list is endless. But most seem to be about pulverizing and mixing ingredients. It’s a broad class of gadgets that share many capabilities, though they scatter in what they offer as they seek to differentiate themselves.

But there’s another approach to lumping things: analogy.    

Analogy: Discovering shared functions

We use analogies all the time in our daily conversation.  Taxonomists focus on what analogies reveal.  

Analogy helps identify things that are functionally similar, and might share a category as a result.

Analogy is the opposite of homology. With analogy, two things start from a different place, but produce a similar result. For example, the wings of bees and wings of birds are analogous. They are similar in their function, but different in their origin and details. Analogies capture common affordances: where different things can be used in similar ways.

Analogies are most useful when defining mental categories, such as devices to watch video, or places to go on a first date.  It’s the most subjective kind of taxonomy: different people need to hold similar views in order for these categories to be credible.

Contrasting homology and analogy, we can see that they represent opposite notions: homology reflects divergence (from similarity to differences), while analogy reflects convergence (from differences to similarity).

The other end of taxonomy is not about lumping things into broader categories, but splitting them into smaller ones.

Differentia: Defining Segments

Taxonomists talk about differentia (Latin for difference), which is broadly similar to what marketers refer to as segmentation.

Aristotle defined humans as animals capable of articulated speech. His formulation provided a structural pattern still used in taxonomy today:

  • A species equals a genus plus differentia

That is, the differences within a genus define individual species.  

To put it in more general terms: 

  • A segment is a group plus its distinguishing characteristics (its epithet)

A group gets divided into segments based on distinguishing characteristics.  The differentia separates members from other members.  

One of the most popular marketing segmentations relates to generational differences. In the United States, people born after the Second World War are segmented into four groups by age. Other countries use similar segments, but it is not a universal segmentation so I will focus specifically on US nationals. A common segmentation (with the exact years sometimes varying slightly) is:

  • Generation W (aka “Boomers”): American nationals born between 1946 and 1964
  • Generation X: American nationals born between 1965 and 1980
  • Generation Y (aka “Millennials”): American nationals born between 1981 and 1996
  • Generation Z: American nationals born since 1997

Such segmentation has the virtue of creating category segments that are comprehensive (no item is without a category) and mutually exclusive (no item belongs to more than one category).  It’s clean, though it is not necessarily correct — in the sense that the categories identify what most matters.  
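
The two properties of a clean segmentation can be checked mechanically. The following Python sketch encodes the birth-year ranges listed above and tests whether every year in a span falls into exactly one segment. The helper names `segments_for` and `is_clean` are hypothetical, and the sketch assumes that birth year is the only differentia.

```python
from typing import List

# Birth-year ranges from the segmentation above; None means open-ended.
GENERATIONS = [
    ("Generation W", 1946, 1964),
    ("Generation X", 1965, 1980),
    ("Generation Y", 1981, 1996),
    ("Generation Z", 1997, None),
]


def segments_for(birth_year: int) -> List[str]:
    """Return every segment whose range contains the birth year."""
    return [
        name
        for name, start, end in GENERATIONS
        if birth_year >= start and (end is None or birth_year <= end)
    ]


def is_clean(first_year: int, last_year: int) -> bool:
    """True if each year in the span belongs to exactly one segment."""
    return all(
        len(segments_for(year)) == 1
        for year in range(first_year, last_year + 1)
    )
```

For the post-war population, `is_clean(1946, 2020)` is true: no year is left without a segment and no year lands in two. Extending the span back to 1940 makes the check fail, because `segments_for(1940)` is empty. The check confirms only that the segmentation is tidy; it says nothing about whether birth year is the distinction that matters most.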

Segments won’t be valuable if the distinctions on which they are based aren’t that important. A segment could comprise things with a common characteristic that are otherwise quite diverse. It’s possible for a segment to be designed around an incidental characteristic that makes different things seem similar.

The point of differentia is to represent a defining characteristic. Differentia is valuable when it helps us think through which distinctions matter and are valid. For example, we might segment people by eye color. But that hardly seems an important way to segment people. Such segmentation encourages us to refine the group we are segmenting. Eye color is of interest to makers of tinted contact lenses. But even then, eye color is not a defining characteristic of a potential contact lens customer, even if it were a relevant one.

While differentia can be hard to define durably, it can play a useful role in taxonomies.  It seems reasonable to segment aircraft according to the number of passengers they carry, for example.  It can capture one key aspect that represents many important issues.

Interoperability: Distinctions within commonality

A related issue is deciding when things are similar enough to say they are the same, and when we can say they are related but different.

Our final perspective comes from nature. The similarity of species is partly defined by their ability to mate.  Some closely related species of birds, for example, will cross breed.  Other pairs of less similar species lack that ability.  

A similar situation exists with languages.  Where are the distinctions and boundaries between similar languages? And when are differences just dialects and not actually different languages?  In language, mutual-intelligibility plays a role.  (Language also involves convergence and divergence — but we’ll consider their interoperability here).

The presence or absence of connection between distinct things is associated with two overlapping but distinct concepts: 

  1. Interoperability 
  2. Substitution

Both these concepts address ways in which distinct things might be considered the “same.”

Interoperability is most often associated with technology, though it can be applied to other areas as well, for example cultural norms such as religions. Interoperability is the ability of distinct things to connect together easily because they follow a common standard or code of operation, and its presence is an indication of their similarity. If things interoperate, meaning they require no change in setup to work together, then they belong to the same “family,” even if the things come from different sources. The absence of interoperability is a sign that these things may not belong together and need to be split.

Being part of the same family does not imply that things are the same. Any distinctions would relate to the role of each thing in the family (same family, different roles). Things that follow the same standard may be similar (same role), or they may be complements (different roles).

If things can be substituted (they are interchangeable but require a different setup to use), they may belong to the same category, but that category may need to be broken down further. Windows, Linux and macOS computers can be substituted for one another because they serve the same role, so they belong to the broader personal computer category (same role, different families). But they form separate categories within it because they don’t interoperate.
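The family and role distinctions above can be sketched as a small decision rule. This is an illustrative model only: the `Thing` class, its `family` and `role` fields, and the USB examples are assumptions introduced for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    name: str
    family: str  # the standard or code of operation the thing follows
    role: str    # the job the thing performs


def relationship(a: Thing, b: Thing) -> str:
    """Classify two things using the family/role heuristic."""
    if a.family == b.family:
        # Same family: the things interoperate without a change in setup.
        return "similar" if a.role == b.role else "complements"
    if a.role == b.role:
        # Same role, different families: interchangeable, but not interoperable.
        return "substitutes"
    return "unrelated"


windows_pc = Thing("Windows PC", family="Windows", role="personal computer")
linux_pc = Thing("Linux PC", family="Linux", role="personal computer")
usb_port = Thing("USB port", family="USB", role="host")
usb_keyboard = Thing("USB keyboard", family="USB", role="peripheral")
usb_mouse = Thing("USB mouse", family="USB", role="peripheral")

print(relationship(windows_pc, linux_pc))     # substitutes
print(relationship(usb_port, usb_keyboard))   # complements
print(relationship(usb_keyboard, usb_mouse))  # similar
```

Real things rarely have a single family or role, so a fuller model would probably need sets of each. The sketch only shows how the two concepts combine into lump-or-split decisions.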

The value of taxonomies

Defining taxonomies is not easy.  Interpretation is needed to spot the differences that make a difference. We can improve the discovery process by using heuristic perspectives for lumping and splitting. 

Taxonomy is valuable because it can provide a succinct way to express the significance of an entity in relation to other entities. Sometimes we need a quick summary to boil down the essence of a thing: what’s distinctive about it, so we can see how it relates to a given situation. Taxonomies help us overcome the fragmentation of information.

— Michael Andrews

Categories
Content Experience

Three Perspectives on Content Identity

If you put two things side by side, what do they have in common? The answer depends on the point of view. Alternative viewpoints mold content identity differently. Designers of content experiences, such as content strategists and information architects, can use these viewpoints to surface different kinds of content relationships.

Three actors shape the identity of content: the author or curator; the audience; and the thing or things discussed in the content. Each brings its own perspective to what content is about:

  • Content identity as interpreted by an author or curator
  • Content identity as interpreted by the audience
  • Content about things that reveal dimensions of themselves

Each perspective plays a different role in framing the content experience.

Scene setting: the Curatorial Perspective

Scene setting lets people understand common themes in content that aren’t obvious. An author or curator draws on their unique knowledge to construct a theme that unifies different content items. Such themes set expectations about the relationship of content to other content. Scene setting is didactic in orientation.

A common label used to announce a theme is the series: for instance, a TV series or a narrative trilogy. Sometimes the series is just a way to divide something into smaller parts while keeping them connected: an article becomes a two-part article. A content series can express how different items are related according to the intentions of the author or the interpretation of a curator. It can be a sequence of items presented on a common theme. The series may present the evolution of an item over time, such as its versions. A building architect might show a series of images starting with a sketch, then a foam model, and finally a photo of the finished building.

A series presents a collection of items and shows how they belong together. The author/curator draws on their intimate knowledge of the content to point out connections between different content items, which may not be self-evident. We find this in the museum world: an item presented is said to have originally belonged with other items that have since been dispersed. A curator might indicate how several items embody a common theme, such as when similar paintings express a recurrent motif.

Art curators identify series of related Van Gogh paintings (via Wikipedia). These three are more similar than others he painted on the same subject.

Any time items are defined by the values and judgments of the author (or curator), the audience must be willing to accept that valuation as relevant.  So if a curator identifies items as “new and notable,” then the intended audience needs to buy that labeling.

Mirroring: the Audience Perspective

When mirroring, content reflects themes as seen by the audience. It represents concepts the way audiences think about them, which helps attract audiences to the content. Mirroring is different from the authorial perspective, which expresses the content’s intention. The audience perspective expresses how content is imagined.

Brand names are perhaps the purest example of imagined content. Brands have no intrinsic identity: they depend entirely on the perceptions of customers to define what they mean. Even a conglomerate that sells many brand products can’t dictate how consumers view these brands. The French brand house LVMH, which sells numerous luxury brand products, can’t control whether consumers consider Dior more similar to Givenchy or to Louis Vuitton, even though it owns all three brands. In reality, Chinese consumers may have different opinions about these relationships than Italian consumers would.

Part of a dendrogram showing perceived similarities between different luxury brands, from a study at Woosuk University in Korea

High-level concepts that are meaningful to audiences should reflect how audiences perceive them. For example, people associate different kinds of experiences with different vacation activities. Is bungee jumping active-fun, adventurous, or extreme? It is best to work with the audiences’ framework of values, rather than trying to impose one on them. Card sorting is useful for eliciting subjective perceptions about the identity of things.  Yet card sorting is less reliable when defining the identity of concrete things, since it shifts the attention away from the object’s specific properties. Better, more empirical approaches are available to classify concrete items.
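One common way to summarize card-sort results is to count how often participants place two cards in the same group. The sketch below assumes three hypothetical participants sorting four vacation activities; the data is invented purely for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical card sorts: each participant splits the cards into
# groups of their own choosing. The data is made up for illustration.
card_sorts: List[List[Set[str]]] = [
    [{"bungee jumping", "skydiving"}, {"hiking", "kayaking"}],
    [{"bungee jumping", "skydiving", "kayaking"}, {"hiking"}],
    [{"bungee jumping"}, {"skydiving", "hiking", "kayaking"}],
]


def co_occurrence(sorts: List[List[Set[str]]]) -> Dict[Tuple[str, str], float]:
    """Share of participants who placed each pair of cards in the same group."""
    counts: Counter = Counter()
    for sort in sorts:
        for group in sort:
            # Sort the names so each pair has a single canonical key.
            for pair in combinations(sorted(group), 2):
                counts[pair] += 1
    return {pair: n / len(sorts) for pair, n in counts.items()}


similarity = co_occurrence(card_sorts)
for pair, share in sorted(similarity.items(), key=lambda kv: -kv[1]):
    print(pair, round(share, 2))
```

Pairs that never appear together are simply absent from the result. Perceived-similarity scores of this kind are the usual input to the hierarchical clustering behind a dendrogram.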

Discovery: Perspectives based on Item Properties

Features of items can suggest themes. Object-defined themes let the things featured in the content speak for themselves. This involves more showing, and less telling. Properties can define identities, and reveal commonalities between different items. This approach promotes discovery of content relationships.

Faceted search interfaces, such as those found on e-commerce sites, are the most familiar implementation of property-driven identification. People choose values for various facets (properties) of items, and get a list of items matching these values. Using properties to identify items is especially valuable for non-text content. Some Digital Asset Management systems allow people to find images that match a certain shade of a color, regardless of what the subject of the image is. Properties can identify similarities and relationships that might not be expected from a higher-level label. They can support more criteria-based consideration of identity. For example, when we think of travel items (things to pack) we generally have standard things in mind: toiletries, articles of clothing, etc. But if we start with properties, the universe of travel items expands. We might define travel items as things that are both small and lightweight. We discover small and lightweight versions of things we might not ordinarily pack for travel, but might enjoy having once we become aware of the option.
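The travel-item example can be sketched as a simple facet filter. The catalog, the facet names (`size`, `weight`, `category`) and their values are hypothetical. A real faceted search system would sit on a search index rather than a Python list.

```python
from typing import Any, Dict, List

# Hypothetical catalog; items and facet values are invented for illustration.
CATALOG: List[Dict[str, Any]] = [
    {"name": "toothbrush", "size": "small", "weight": "light", "category": "toiletries"},
    {"name": "folding kettle", "size": "small", "weight": "light", "category": "kitchen"},
    {"name": "cast-iron pan", "size": "medium", "weight": "heavy", "category": "kitchen"},
    {"name": "pocket tripod", "size": "small", "weight": "light", "category": "photography"},
    {"name": "hardcover atlas", "size": "large", "weight": "heavy", "category": "books"},
]


def facet_search(items: List[Dict[str, Any]], **facets: Any) -> List[Dict[str, Any]]:
    """Return the items whose properties match every requested facet value."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in facets.items())
    ]


# Define "travel items" by properties rather than by a category label.
travel_items = facet_search(CATALOG, size="small", weight="light")
names = [item["name"] for item in travel_items]
print(names)
```

Filtering on properties surfaces the folding kettle and the pocket tripod, items that a "toiletries and clothing" label for travel goods would never include.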

Generative classification of objects according to properties by P Harni, screenshot via Aalto University

Leveraging Diverse Viewpoints

There’s more than one way to define the relationship between items of content. I sometimes see people try to make a single hierarchical taxonomy serve as both an authoritative or objective classification of content, and a user-centric classification that reflects the subjective perceptions of users. They don’t realize they are forcing together different kinds of content identities: one relatively stable, the other contextual and subject to change.

Content can be considered objectively as it is; authoritatively as it is intended; and subjectively as it seems to various audiences. These differences offer thematic lenses for looking at content. They can be used to help audiences connect different items of content together in different ways: setting the scene for audiences so they understand relationships better, reflecting their existing attitudes to promote attraction to items of interest, and helping them discover things they didn’t know.

— Michael Andrews