Categories
Content Engineering

All models reflect a point of view

We sometimes talk about having a “bird’s eye” perspective of a place or about a topic.  We can see how details connect to form a larger whole.  The distance highlights the patterns that may be hard to see up close.  In many ways, a content model is a bird’s eye view of content addressing topics or activities.  It’s a map of our content landscape. 

This year, many of us — having been shut indoors involuntarily and had our travel limited —  have marveled at the freedom of travel that birds enjoy.  Bird watching has become a popular pastime during the pandemic, encouraging people to consult content about birds to understand what they are newly noticing.

Content about birds provides an excellent view into the structure of content — even for people not interested in birds. Nearly 20 years ago JoAnne Hackos discussed what field guides to birds can teach us about structuring content in her book, Content Management for Dynamic Web Delivery.  Information about birds offers a rich topic to explore content structure.

Last year, I conducted a workshop exploring content structuring decisions by comparing how field guides describe bird species.  I have a collection of field guides about Indian birds from my time living in India.  They all broadly cover the same material, but the precise coverage of each varies.  Some talk about habitats; others discuss diet.  They go into different levels of detail.

Various guides to Indian birds

As someone interested in birds, I noticed how these field guides differed when discussing the same bird species.  Even the images of a bird varied greatly: whether they were paintings or photos, whether they showed the bird in context, and whether distinguishing features were described in the text or marked as annotations.  Image choices have profound implications for what information is conveyed.

All this variation reveals something that’s obvious when you see it: there’s no one way to describe a bird.  People make editorial choices when they structure content.  That’s a very different way of thinking about content structure than the notion of domain modeling, which assumes that things we describe have intrinsic characteristics that can be modeled objectively.  Domain modeling presumes there’s a platonic ideal that we can discover to describe things in our world. It can provide some conceptual scaffolding for identifying the relationships between concrete facts associated with a topic. Domain modeling is best approached from the viewpoint of different personas.  Why do they care about any of these facts?  And importantly, might they care about different facts?

Without doubt, it’s valuable to get outside of our subjective ways of seeing to consider other viewpoints.  But that doesn’t imply there’s a single viewpoint that is definitive or optimal.  Content modeling is not the same thing as data modeling, just as content isn’t simply data.  The limitation of domain modeling is that it doesn’t provide any means to distinguish more important facts from less important ones.  And it treats content as data, where facts are readily reduced to a few concrete objective statements rather than involving descriptive interpretations or analysis.  With a domain model, it’s not obvious why people care about any of the data.  Before we can build experiences from content, we need the basic content to be interesting and relevant. Relevance and interest have been overlooked in many discussions about content modeling.   

While I see richness in small editorial decisions among different field guides, a person not interested in birding may find these distinctions unimportant.  To a casual viewer, all field guides look similar.  There’s a generic template that field guides seem to follow to describe birds.  Here, the editorial decisions have emerged over time as a common framework that’s widely accepted and expected.  Some people will view the task of structuring content as one of finding an existing framework that’s known, and copying it.  Instead of searching for the intrinsic properties of things that can be described, the search is for patterns already used in the content.  Adopting vernacular structural patterns has much to commend it.  These patterns are familiar, and in many cases work fine.  But relying on habit can also cause us to miss opportunities to enrich how we describe things.  Such patterns may satisfy a basic need, but they don’t necessarily do so optimally.

The most disruptive development to influence field guides is the smartphone app.  These apps shake up the conventional field guide approach to birds and can incorporate image and audio recognition to aid in identifying a species.  They can be marvelously clever in what they can do, but they too represent editorial decisions: a focus on a transactional task rather than a deeper look into context and comparison.  The more that content exists to support a repetitive transactional task, the more tightly prescribed the content will be in a model.  If you consider content as existing only to support narrow transactional needs, the structure of the model might seem obvious, because what else would someone care about?  Such a model presumes zero motivation on the part of the reader: they only want to view content that is necessary for completing a task.

Field guides exist to aid the identification of birds.  Deciding what information is important — or is readily available — to aid the identification of birds still involves an editorial judgment.  

Topics don’t have intrinsic structures.  The structure depends in part on the tasks associated with the topic.  And the more that a topic requires the motivation of the reader — their interest, preferences, and choices outside of a narrow task supported within the UI — the more important the editorial dimensions of the content model become.    The model needs to express elements that tell readers why they would care about the topic: what they should learn or see as important and relevant.

Importantly, the discussion of a topic is not limited to a specific genre.  Species of birds can be discussed in ways other than field guides.  Within a genre,  you can change the structure associated with it.  But you can switch genres as well, and embrace a different structure entirely.

When we consider structures within genres, we may be inclined to confuse a genre’s form with its intent.  The importance of genre is that it provides a point of view to elucidate a topic.

Picture books provide an alternative genre to present content about birds.  They involve the tight juxtaposition of words and images, often to provide a richer narrative about a topic.  Beyond that, the structure involved is not standardized or well defined.  The genre is most often associated with children’s books and exemplified by outstanding writer-illustrators such as Maurice Sendak, Dr. Seuss, and Quentin Blake.

But picture books are not limited to children’s entertainment.  They can potentially be used for any sort of topic and any audience.

This spring, a new picture book, What it’s like to be a Bird, was released and became an instant bestseller.  It was produced by David Sibley, a renowned ornithologist and illustrator.  “My original idea, in the early 2000s, was to produce a bird guide for kids. Then I started thinking about it as a bird guide for beginners of any age.  But having created a comprehensive North American bird guide, the concept of a ‘simplified’ guide never clicked for me. Instead, I wanted to make a broader introduction to birds.”

His goal is to “give readers some sense of what it’s like to be a bird…My growing sense as I worked on this book is that instinct must motivate a bird by feelings — of satisfaction, anxiety, pride, etc…how else do we explain the complex decision that birds make everyday…”  Sibley wants to capture the bird’s experience making decisions on its life journey.  It’s a wonderful backdrop as we think about the reader’s experience on the journey through his book.

“Each essay focuses on one particular detail…they are meant to be read individually, not necessarily in sequence — everything is interconnected, and there are frequent cross-references suggesting which essay to read next.”    We can see how Sibley has planned a content model for his material.

Bolstered by his talents as a subject expert, writer, and illustrator, he’s been able to rethink how to present content about birds.  While he’s also written a conventional field guide to birds, his new book explores species through their behavior.  His is not the only recent book looking at bird behavior, but the perspective he offers is unique.  Rather than organizing content around behavior themes such as mating, he focuses the content on species of birds and then talks about two or three key behaviors they have that are interesting.  The birds act as protagonists in stories about their life situation.  

Sibley’s book profiles various species of birds, something field guides do as well. But Sibley’s profiles explain the bird from its own point of view, instead of from an external viewpoint of concrete properties such as feather markings or song calls.  The stories of these species encapsulate actions that unfold over time and involve a motivation and an outcome.  They aren’t data.

He introduces species of birds by presenting two or three stories about each.  Each story is a short essay explaining an illustration of an activity the bird is engaged in.  Frequently, the illustration is puzzling, prompting the reader to want to understand it.  In some cases, he breaks the story into several small paragraphs, each with its own illustration, when he wants to describe a sequence of events over time.

A simplified content model will show how Sibley explains birds.  The diagram reveals a highly connected structure.  It doesn’t look like the hierarchical structure of a book. There’s no table of contents or index.  Though the content is manifested as a book, it could be delivered to alternative platforms and channels.  

A simplified content model for Sibley’s What it’s like to be a bird

On the far left of the diagram, we see themes about birdlife that are explored.  These themes may be broken into sub-themes.  For example, the theme of survival has two sub-themes, each of which has even more specific sub-themes:

  • Survival
    • Birds and weather
      • Keeping cool
      • Keeping warm
    • Avoiding predators
      • Be inconspicuous
      • Be alert
      • Create a distraction

Each theme or sub-theme presents a range of related factual statements.  For example, he presents a series of facts about how birds create distractions.  These facts represent some important highlights about a theme: an index of knowledge that offers a range of perspectives.  Each fact points to a profile of a bird species, where a story essay provides context about the statement and makes it more understandable.

In this example, we see how the theme of how birds use smell is revealed through a series of facts that point to essays about how different species use smell.

Thematically grouped factual highlights about birdlife (source: David Sibley’s What it’s like to be a Bird)

When we visit a profile of a bird species, we encounter several stories, each combining a picture and an essay.  The illustration shows a starling holding a cigarette in its beak.  The situation has the makings of a story: we want to know more.  The essay tells us.

Illustration and essay providing a story relating to a behavior of a bird species (source: David Sibley’s What it’s like to be a Bird)

Even if the story is enjoyable, a part of us may wonder if it’s just an entertaining yarn. Picture books are most often associated with fantasy, after all.  And unlike a nature program on TV, we don’t see a museum expert in a talking head interview to make it seem more credible.  Instead, we get a list of recent scientific references relating to the issue.  All these references are arranged thematically, like our facts.  They provide an overview of recent scientific research about birds, giving us a sense of how much scientists are still learning about these ubiquitous creatures that are as old as dinosaurs.

Source references of recent discoveries from scientific research relating to birds, arranged thematically (source: David Sibley’s What it’s like to be a Bird)

The structuring of the content helps us to understand and explore.  The stories make each bird more real: something living we can become interested in.  We can understand their dilemmas and how they seek to solve them.  We are up-close.  But we can also step back and understand the broader behaviors of birds that influence their lives.  

While each species is described through stories, we are not limited to those.  How do other birds work with smell?  What have we learned recently about smell and birds?  These other pathways allow us to follow our interests and find different connections.  
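One way to see how these pathways work is to write the model down as data. The Python sketch below is a hypothetical rendering of the simplified content model, not Sibley’s own schema: the class names (`Theme`, `Fact`, `Story`, `SpeciesProfile`), the `stories_for_theme` helper, and the placeholder entries are assumptions made for illustration. Themes nest into sub-themes, facts hang off themes, and each fact links to a story inside a species profile, so a reader (or an app) can walk from a theme to every story that explains it.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    """An illustration paired with the short essay that explains it."""
    illustration: str
    essay: str
    see_also: list["Story"] = field(default_factory=list)  # cross-references: which essay to read next


@dataclass
class SpeciesProfile:
    """A species is introduced through two or three stories."""
    species: str
    stories: list[Story] = field(default_factory=list)


@dataclass
class Fact:
    """A thematic highlight that points to the story giving it context."""
    statement: str
    story: Story


@dataclass
class Theme:
    """A theme about birdlife, optionally broken into more specific sub-themes."""
    name: str
    subthemes: list["Theme"] = field(default_factory=list)
    facts: list[Fact] = field(default_factory=list)


def stories_for_theme(theme: Theme) -> list[Story]:
    """Collect every story reachable from a theme, including through its sub-themes."""
    found = [fact.story for fact in theme.facts]
    for sub in theme.subthemes:
        found.extend(stories_for_theme(sub))
    return found


# Placeholder content for illustration only; theme names come from the survival list above.
distraction_story = Story(illustration="(illustration)", essay="(essay)")
profile = SpeciesProfile(species="(example species)", stories=[distraction_story])

survival = Theme("Survival", subthemes=[
    Theme("Birds and weather", subthemes=[Theme("Keeping cool"), Theme("Keeping warm")]),
    Theme("Avoiding predators", subthemes=[
        Theme("Be inconspicuous"),
        Theme("Be alert"),
        Theme("Create a distraction",
              facts=[Fact("(a fact about creating a distraction)", distraction_story)]),
    ]),
])

print(len(stories_for_theme(survival)))  # 1
```

Because the same `Story` can be reached from a fact, from its species profile, and from other stories’ `see_also` lists, the structure is a graph rather than a tree, which is what allows readers to enter through a theme or through a species.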

Sibley’s model shows how to transcend both the top-down hierarchies that dictate how we learn about a topic and the bottom-up collections of random facts that leave us with no structure to guide us.  Sibley’s model of content can be approached in different ways, but it is always deliberate.  There’s no feeling of being lost in hyperlinks.

Content models should reflect what readers want to get and how they might want to get it.  They are more than a technical specification.  They are an essential tool in editorial planning.  Developing a content model can be a creative act.

— Michael Andrews


Lumping and Splitting in Taxonomy

Creating a taxonomy — noting which distinctions matter — often seems more art than science.  I’ve been interested in how to think about taxonomy more globally, instead of looking at it as a case-by-case judgment call.  Part of my interest here is a spin-off from my interest in birding.  I’m no ornithologist, but I try to learn what I can about the nature of birds.  And species of birds, of course, are classified according to a taxonomy.

The taxonomy for birds is among the most rigorous out there.  It is debated and litigated, sometimes over decades.  The process involves a progression of “lumps” and “splits” that recalibrate which distinctions are considered significant.  Recently, the taxonomy underwent a major revision that reordered the class of birds.

In the mid-2010s, scientists changed the classification of birds to consider not only anatomical features, but DNA.  In the new ordering, eagles and falcons are not as closely related as was previously assumed. Eagles are closer to vultures, while falcons are closer to parrots.  And pigeons and flamingos are more closely related than thought previously.  Appearance alone is not a sufficient basis for judging similarity.

More closely related than you might think (Both produce milk to feed their young)

Taxonomy and Information Technology

Taxonomy doesn’t receive the attention it deserves in the IT world.  It seems subjective: vague, hard to predict, potentially the source of arguments.  Taxonomy resembles content: it may be necessary, but it is something to work around — “place taxonomy here when ready.”

But taxonomy can’t be avoided. Even though semantic technologies are becoming richer in describing the characteristics of entities, the properties of entities alone may not be enough to distinguish between types of entities.  Many entities share common properties, and even common values, so it becomes important to be able to indicate what type of entity something is.  We can describe something in terms of its physical properties such as weight, height, color and so on, and still have no idea what it is we are describing.  It can resemble the parlor game of twenty questions: a prolonged discourse that’s prone to howlers.

Classification is the bedrock of algorithms: they drive automated decisions.  Yet taxonomies are human designed.  Taxonomies lack the superficial impartiality of machine-oriented linked data or machine learning classification.  But taxonomies are useful because of their perceived limitations. They require human attention and human judgment.  That helps make data more explainable.  

Humans decide taxonomies, even when machines provide assistance finding patterns of similarity. Users of taxonomies need to understand the basis of similarity.  No matter how experienced the taxonomist or sophisticated the text analysis, the basis of a taxonomy should ideally be explainable and repeatable.  Machine-driven clustering approaches lack these qualities.

To be durable, a taxonomy needs a reasoned basis and justification.  Business taxonomies can borrow ideas from scientific taxonomies.   

Four approaches can help us decide how to classify categories:

  1. Homology
  2. Analogy
  3. Differentia
  4. Interoperability

Homology and analogy deal with “lumping” — finding commonality among different items.  Differentia and interoperability help define “splitting” — where to break out similar things.

Homology: Discovering shared origins

Homology is a term taxonomists use to describe when features, while appearing different, have a common origin and original intent.  For example, mammals have limbs, but the limb could be manifested as an arm or as a flipper.

Homology refers to cases where things start the same but go in different directions.  It can get at the core essence of a feature: what it enables, without worrying so much how it appears or precisely what it does.  Homology is helpful to find larger categories that link together different things.

There are two ways we can use homology when creating a taxonomy. 

First, we can look at the components or features of items.  We look for what they share in common that might suggest a broader capability to pay attention to.  Lots of devices have embedded microprocessors, even though these devices play different roles in our lives.  Microprocessors provide a common set of capabilities that even allow different kinds of items to interact with one another, such as in the case of the Internet of Things (IoT).  Homology is not limited to physical items.  Many business models get copied and modified by different industries, but they share common origins and drivers. We can speak of a class of businesses using an online subscription model, for example.

Second, we can consider whole items and how they are used.  Homology can be useful when a distinct thing has more than one use, especially when it doesn’t have a single primary purpose.  Baking soda is advertised as having many purposes, and some consumers like products that contain baking soda.  Here we have a category of baking soda-derived products.  In the kitchen, there are many small appliances that have a rotator on which one can attach implements.  They may be called a food processor, a blender, a mixer, or some trademarked proprietary name.  What can they do?  Many tasks: chopping vegetables, making dough, making soups, smoothies, spreads…the list is endless.  But most seem to be about pulverizing and mixing ingredients.  It’s a broad class of gadgets that share many capabilities, though they scatter in what they offer as they seek to differentiate themselves.

But there’s another approach to lumping things: analogy.    

Analogy: Discovering shared functions

We use analogies all the time in our daily conversation.  Taxonomists focus on what analogies reveal.  

Analogy helps identify things that are functionally similar, and might share a category as a result.

Analogy is the opposite of homology. With analogy, two things start from a different place, but produce a similar result.  For example, the wings of bees and wings of birds are analogous.  They are similar in their function, but different in their origin and details.  Analogies capture common affordances: where different things can be used in similar ways.

Analogies are most useful when defining mental categories, such as devices to watch video, or places to go on a first date.  It’s the most subjective kind of taxonomy: different people need to hold similar views in order for these categories to be credible.

Contrasting homology and analogy, we can see two opposing notions: divergence (from similarity to differences) in the case of homology, and convergence (from differences to similarity) in the case of analogy.
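The contrast can be reduced to a rule of thumb, sketched here in Python under the assumption that each feature is recorded with an origin and a function. The `Feature` type, the `relation` function, and the labels are illustrative, not a standard taxonomic tool: a shared origin points to homology (divergence), while different origins with a shared function point to analogy (convergence).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    origin: str    # where the feature comes from (illustrative label)
    function: str  # what the feature is used for (illustrative label)


def relation(a: Feature, b: Feature) -> str:
    """Classify two features as homologous, analogous, or unrelated."""
    if a.origin == b.origin:
        # Same starting point, possibly different directions: divergence.
        return "homologous"
    if a.function == b.function:
        # Different starting points, similar result: convergence.
        return "analogous"
    return "unrelated"


arm = Feature("arm", origin="vertebrate forelimb", function="grasping")
flipper = Feature("flipper", origin="vertebrate forelimb", function="swimming")
bird_wing = Feature("bird wing", origin="vertebrate forelimb", function="flight")
bee_wing = Feature("bee wing", origin="insect thorax", function="flight")

print(relation(arm, flipper))         # homologous
print(relation(bird_wing, bee_wing))  # analogous
```

The verdict depends entirely on how origins are labeled. Because a bird’s wing and a human arm are both recorded here as vertebrate forelimbs, `relation(arm, bird_wing)` also returns `"homologous"`; choosing that level of detail is exactly the judgment a taxonomist has to make.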

The other end of taxonomy is not about lumping things into broader categories, but splitting them into smaller ones.

Differentia: Defining Segments

Taxonomists talk about differentia (Latin for difference), which is broadly similar to what marketers refer to as segmentation.

Aristotle defined humans as animals capable of articulated speech. His formulation provided a structural pattern still used in taxonomy today:

  • A species equals a genus plus differentia

That is, the differences within a genus define individual species.  

To put it in more general terms: 

  • A segment is a group plus its distinguishing characteristics (its epithet)

A group gets divided into segments based on distinguishing characteristics.  The differentia separates the members of one segment from the other members of the group.

One of the most popular marketing segmentations relates to generational differences. In the United States, people born after the Second World War are segmented into four groups by age.  Other countries use similar segments, but the segmentation is not universal, so I will focus specifically on US nationals.  A common segmentation (with the exact years sometimes varying slightly) is:

  • Generation W (aka “Boomers”): American nationals born between 1946 and 1964
  • Generation X: American nationals born between 1965 and 1980
  • Generation Y (aka “Millennials”): American nationals born between 1981 and 1996
  • Generation Z: American nationals born since 1997

Such segmentation has the virtue of creating category segments that are comprehensive (no item is without a category) and mutually exclusive (no item belongs to more than one category).  It’s clean, though not necessarily correct: clean categories do not guarantee that they identify what matters most.
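A segmentation like this follows the genus-plus-differentia pattern: the group is US nationals born after the Second World War, and the differentia is the birth-year range. The Python sketch below encodes the ranges listed above and tests the two properties just described, comprehensiveness and mutual exclusivity. The names `SEGMENTS`, `segments_for`, and `check_segmentation` are hypothetical, and passing the check says nothing about whether birth year is a distinction that matters.

```python
import math

# Segment name -> (first birth year, last birth year), inclusive.
# Years follow the common segmentation above; exact cut-offs vary by source.
SEGMENTS = {
    "Generation W (Boomers)": (1946, 1964),
    "Generation X": (1965, 1980),
    "Generation Y (Millennials)": (1981, 1996),
    "Generation Z": (1997, math.inf),  # open-ended: born since 1997
}


def segments_for(birth_year: int) -> list[str]:
    """Return every segment whose birth-year range contains the given year."""
    return [name for name, (start, end) in SEGMENTS.items() if start <= birth_year <= end]


def check_segmentation(first_year: int, last_year: int) -> None:
    """Raise if any year in the span falls in no segment or in more than one."""
    for year in range(first_year, last_year + 1):
        matches = segments_for(year)
        if not matches:
            raise ValueError(f"{year} falls in no segment (not comprehensive)")
        if len(matches) > 1:
            raise ValueError(f"{year} falls in {matches} (not mutually exclusive)")


check_segmentation(1946, 2024)  # passes silently for the ranges above
print(segments_for(1975))       # ['Generation X']
```

Years before 1946 return an empty list, which is expected: they fall outside the group being segmented, not outside its segments.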

Segments won’t be valuable if the distinctions on which they are based aren’t that important.  A segment could comprise things with a common characteristic that are otherwise quite diverse.  It’s possible for a segment to be designed around an incidental characteristic that makes different things seem similar.

The point of differentia is to represent a defining characteristic. Differentia is valuable when it helps us think through which distinctions matter and are valid.  For example, we might segment people by eye color.  But that hardly seems an important way to segment people. Such segmentation encourages us to refine the group we are segmenting.  Eye color is of interest to makers of tinted contact lenses.  But even then, eye color is not a defining characteristic of a potential contact lens customer, even if it is a relevant one.

While differentia can be hard to define durably, it can play a useful role in taxonomies.  It seems reasonable to segment aircraft according to the number of passengers they carry, for example.  It can capture one key aspect that represents many important issues.

Interoperability: Distinctions within commonality

A related issue is deciding when things are similar enough to say they are the same, and when we can say they are related but different.

Our final perspective comes from nature. The similarity of species is partly defined by their ability to mate.  Some closely related species of birds, for example, will cross breed.  Other pairs of less similar species lack that ability.  

A similar situation exists with languages.  Where are the distinctions and boundaries between similar languages? And when are differences just dialects and not actually different languages?  In language, mutual intelligibility plays a role.  (Language also involves convergence and divergence, but we’ll consider interoperability here.)

The presence or absence of connection between distinct things is associated with two overlapping but distinct concepts: 

  1. Interoperability 
  2. Substitution

Both concepts address ways in which distinct things might be considered the “same.”

Interoperability is most often associated with technology, though it can be applied to other areas as well, such as cultural norms like religions.  The presence of interoperability — the ability of distinct things to connect together easily because they follow a common standard or code of operation — is an indication of their similarity.  If things interoperate — they require no change in set up to work together — then they belong to the same “family,” even if the things come from different sources. The absence of interoperability is a sign that these things may not belong together and need to be split.

Being part of the same family does not imply they are the same.   Any distinctions would relate to the role of each thing in the family (same family, different roles).   Things that follow the same standard may be similar (same role), or they may be complements (different roles).  

If things can be substituted (they are interchangeable but require a different set up to use), they may belong to the same category, but that category may need to be broken down further.  Windows, Linux and macOS computers can substitute for one another; they serve the same role, so they belong to the broader personal computer category (same role, different families).  But they are separate categories because they don’t interoperate.
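The two tests can be combined into a small decision rule. In this Python sketch, each item records the standard it follows (its family) and the role it plays; the `Item` type, the `compare` function, and the example values (including the USB peripherals, which are not from the text) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    standard: str  # common standard or code of operation followed (defines the "family")
    role: str      # the job the item does for its user


def compare(a: Item, b: Item) -> str:
    """Relate two items using interoperability (shared standard) and substitution (shared role)."""
    same_family = a.standard == b.standard
    same_role = a.role == b.role
    if same_family and same_role:
        return "same family, same role"         # similar: may share one category
    if same_family:
        return "same family, different roles"   # complements that work together
    if same_role:
        return "same role, different families"  # substitutes: shared category, split below it
    return "unrelated"


windows = Item("Windows PC", standard="Windows", role="personal computer")
mac = Item("Mac", standard="macOS", role="personal computer")
keyboard = Item("USB keyboard", standard="USB", role="text input")
mouse = Item("USB mouse", standard="USB", role="pointing")

print(compare(windows, mac))     # same role, different families
print(compare(keyboard, mouse))  # same family, different roles
```

The rule is only as good as the labels fed into it: deciding what counts as the same standard or the same role is still the interpretive step.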

The value of taxonomies

Defining taxonomies is not easy.  Interpretation is needed to spot the differences that make a difference. We can improve the discovery process by using heuristic perspectives for lumping and splitting. 

Taxonomy is valuable because it can provide a succinct way to express the significance of an entity in relation to other entities.  Sometimes we need a quick summary to boil down the essence of a thing: what’s distinctive about it, so we can see how it relates to a given situation.  Taxonomies help us overcome the fragmentation of information.

— Michael Andrews