Categories
Intelligent Content

Content Structure and JavaScript

How audiences view content has radically changed since the introduction of HTML5 around five years ago. JavaScript is playing a significant role in how content is accessed, and this has implications for content structure. Content is shifting from being document-centric to application-centric. Content strategy needs to reconsider content from an application-centric perspective.

The Standards Consensus: Separate Content Structure from Content Behavior

In the first decade of the new millennium, the web community formed a consensus around the importance of web standards. Existing web standards were inadequate, so solid standards were needed. And a widely accepted idea was that content structure, content behavior, and content presentation should all be separate from each other. This idea was sometimes expressed as the “separation of concerns.” As a practical matter, it meant making sure CSS and JavaScript don’t impact the integrity of your content.

“Just like the CSS gurus of old taught us there should be a separation of layout from markup, there should be a separation of behavior from markup. That’s HTML for the content and structure of the document, CSS for the layout and style, and Unobtrusive JavaScript for behavior and interactivity. Simple.”

— Treehouse blog January 2014

The advice to keep content structure separate from content behavior continues today. The pillars of separating behavior from structure are unobtrusive JavaScript and progressive enhancement. A W3C tutorial advises: “Once you’ve made these scriptless pages you have created a basic layer that will more or less work in any browser on any device.”

Google says similar things: “If you’re starting from scratch, a good approach is to build your site’s structure and navigation using only HTML. Then, once you have the site’s pages, links, and content in place, you can spice up the appearance and interface with AJAX. Googlebot will be happy looking at the HTML, while users with modern browsers can enjoy your AJAX bonuses.”  Google’s advice here considers JavaScript as supporting presentation, rather than affecting content.

A Microsoft writer argues: “The idea is to create a Web site where basic content is available to everyone while more advanced content and functionality are accessible to those with more capability, more bandwidth or more advanced tools.” While it’s not clear what the distinction is between basic and advanced content, the core idea is similar: that important content shouldn’t be dependent on JavaScript behavior.

The web standards consensus was driven by an awareness that browsers varied, that JavaScript was sometimes unreliable, and that separation meant that persons using assistive technology were not disadvantaged.  That consensus is now eroding.  Some developers argue that it no longer matches the reality of current technical capabilities, and that the evolution of standards is solving prior issues that necessitated separation.  These developers are fusing content behavior and structure together.

The New Reality: JavaScript Driven Content

“The separation of structure, presentation and behavior is dead. It has been dead for a while. Still, this golden rule of web design sticks around. It lives on like Elvis and we need to address it.”

— Treehouse blog January 2012

Over the past five years, the big change in the web world has been the adoption of HTML5, with its heavy focus on applications, in contrast to the more document focused XHTML it replaced.  The emphasis among developers has been more about enhancing application behavior, and less about enhancing content structure.   HTML5 killed the unpopular proposed XHTML2 spec that emphasized greater structure in content, and developers have been seeking ways to remove XML-like markup where possible.

Silicon Valley veteran David Rosenthal, an Internet engineer at Stanford, describes the change this way: “The key impact of HTML5 is that, in effect, it changes the language of the Web from HTML to JavaScript, from a static document description language to a programming language.” He notes: “The communication between the browser and the application’s back-end running in the server will be in some application-specific, probably proprietary, and possibly even encrypted format.” And adds: “HTML5 allows content owners to implement a semi-effective form of DRM for the Web.”

The emphasis on application behavior has resulted in new interaction capabilities and enhanced user experiences. Rather than view a succession of webpages, users can interact with content continuously. This has resulted in what’s called the Single Page Application, where “the web page is constructed by loading chunks of HTML fragments and JSON data.”

This shift has also been referred to as the “app-ification” of the web, where “a single page app typically feels much more responsive to user actions.”  “Single Page Applications work by loading a single HTML page to the user’s browser and subsequently never navigating away from this page. Instead, content, functional buttons, and actions are implemented as JavaScript actions.”

People are now thinking about content as apps.  An article entitled “The Death of the Web Page” declares: a “Single Page can produce much slicker, more customized and faster experiences for content consumption just as it can for web apps.”

JavaScript increasingly shapes the web’s building blocks. Even semantic markup identifying the meaning of pieces of content, which has customarily been expressed in XML-flavored syntax (e.g., RDF), is now being expressed through scripts. JSON-LD, an implementation of the JavaScript Object Notation that is being used for some Schema descriptions of web content, relies on an embedded script, rather than markup that’s independent of the browser.
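To make the contrast concrete, here is a minimal sketch of how a JSON-LD description is embedded in a page. The headline and author values come from this post; the helper name `jsonld_script` is hypothetical, and the choice of the schema.org `Article` type is an assumption made only for illustration.

```python
import json

def jsonld_script(data: dict) -> str:
    """Wrap a JSON-LD description in the script element used to embed it in HTML."""
    # Note: a production version would also escape any "</" inside the JSON body.
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'

# Illustrative description of this post (assumed type: schema.org Article).
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Content Structure and JavaScript",
    "author": {"@type": "Person", "name": "Michael Andrews"},
}

print(jsonld_script(article))
```

The point to notice is that the semantic description travels as data inside a script element, rather than as attributes woven into the visible markup.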

Risks Associated with Content On Demand

The rise of the Single Page Application is the most recent stage in the evolution of an approach I’ll call content on demand.

Content on demand means that content is hidden from view, and can only be discovered through intensive interrogation. JavaScript libraries such as AngularJS determine the display of content in the client’s browser. Server side content decisions are also being guided by browser interactions. Even prior to the rise of the current generation of Single Page Applications, the use of AJAX meant that users were specifying many parameters for content, especially on ecommerce sites. “Entity-oriented deep-web sites are very common and represent a significant portion of the deep-web sites. Examples include, among other things, almost all online shopping sites (e.g., ebay.com, amazon.com, etc), where each entity is typically a product that is associated with rich information like item name, brand name, price, and so forth. Additional examples of entity-oriented deep-web sites include movie sites, job listings, etc.,” noted a team of Google researchers. Such sites are hard for bots to crawl.

Google may not know what’s on your website if a database needs to return specific content.  If you have a complex system of separate product, customer and content databases feeding content to your visitors, it’s possible you are not entirely certain what content you have.  The Internet Archive’s Wayback Machine has trouble archiving the growing amount of content that is dependent on JavaScript.  There are now companies specializing in the crawling and scraping of “deep web” content to try to figure out what’s there.

Content on demand can sometimes be fragmented, and hard to manage.  Traditional server driven ecommerce sites manage their content using product information management databases, and can run reports on different content dimensions.  The same isn’t true of newer Single Page Applications, which may talk to content repositories that have little structure to them. JavaScript often manipulates content based on numeric IDs that may be arbitrary and do not represent semantic properties of content.  Content with idiosyncratic IDs obviously can’t be reused in other contexts easily.

Dynamic, constantly refreshing content can be relevant and engaging for users. But it doesn’t always meet their needs, especially when the implementing technology assumes audiences will want the existing paradigm exactly as it is.

JavaScript rendered content presumes the use of browsers for audience interaction. That’s a good bet for many use cases, but it’s not a safe bet. Audiences may choose to access their content through a simple API (perhaps an RSS feed or an email update sent to Evernote) that doesn’t allow them to interrogate the content. In practice, the proportion of content being delivered through traditional browsers seems to be declining as new platforms and channels emerge.

Forcing users to interrogate content consistently could pose problems with the emerging category of multimodal devices. To access content, audiences may depend on different input types such as gestures, speech recognition and voice search. Content needs to be available in non-browser contexts on phones and handheld devices, home appliances, intelligent autos, and medical devices. But input implementations are not uniform, and are often proprietary. Consider the hottest new form of interaction: speech input. Chrome allows speech input, but other browsers can’t use Google’s proprietary technology, and x-webkit-speech only supports speech interaction for some form input types.

When viewable content is determined by a sequence of user interactions, it can become an exercise in “guess what’s here” because content is hidden behind buttons, menus and gestures. Often, the presence of these controls only provides the illusion of choice. In older page-based systems, users might choose many terms and be led to pages with different URLs that had the same content. Now, with “stateless” content, users might not even be sure of how they got to what they are seeing, and have no way to backtrace their journey through a history or bookmarks.

The risk of the content on demand approach is that content may lose its portability when it is optimized for certain platforms. We might want to believe that everyone is now following the same standards, but that wouldn’t be wise. While tremendous progress has been made harmonizing standards for the web, the relentless innovations mean that different players such as Google, Apple, and Microsoft are being pulled in different directions. Even Android devices, all nominally following the same approach, implement things differently, so that the browser on an Amazon Kindle will not display the same as a browser on a Samsung tablet. The more JavaScript embedded in one’s content, the less easily it can be adapted to new platforms and services.

Some kinds of content hidden in the Deep Web (via Wikipedia)

  • Dynamic content: dynamic pages which are returned in response to a submitted query or accessed only through a form, especially if open-domain input elements (such as text fields) are used; such fields are hard to navigate without domain knowledge.

  • Contextual Web: pages with content varying for different access contexts (e.g., ranges of client IP addresses or previous navigation sequence).

  • Scripted content: pages that are only accessible through links produced by JavaScript.

Know Your Risks, Prioritize What’s Important

My goal is not to criticize the app-ification of the web.  It has brought many benefits to audiences and to brands.  But it is important not to be intoxicated by these benefits to the point of underestimating associated costs.

Google, which has a big interest in the rise of JavaScript rendered content, recently noted:

“When pages that have valuable content rendered by JavaScript started showing up, we weren’t able to let searchers know about it, which is a sad outcome for both searchers and webmasters. In order to solve this problem, we decided to try to understand pages by executing JavaScript. It’s hard to do that at the scale of the current web, but we decided that it’s worth it. We have been gradually improving how we do this for some time.”

It’s fair to say JavaScript-rendered content is here to stay.  But if it’s hard for a bot to click on every JavaScript element to find hidden content, think about the effort it takes for an ordinary user.  Just because content is rendered quickly doesn’t mean the user doesn’t have to do a lot of work to swipe their way through it.  My advice: use JavaScript intelligently, and only when it really benefits the content.

Functionality should support choices significant to the user, and not mandate interactions.  There is an unfortunate tendency among some cosmetically focused front-end developers to provide gratuitous interactions because they seem cool.  Rather than be motivated by the goal of reducing friction, they present widgets for their spectacle effects rather than their necessity to support the user journey.  Is that slider really necessary, or was too much content presented to begin with which required the filtering?

We should consider limiting the number of parameters for dynamic content.  In the name of choice, or because we don’t know what audiences want, we sometimes provide them with countless parameters they can fiddle with.   Too many parameters can be overwhelming to users, and make content unnecessarily complex.  When Google studied ecommerce sites several years ago, they discovered that the numerous different results returned by searching product databases actually aligned to a limited number of product facets.  The combination of these facets represented “a more tractable way to retrieve a similar subset of entities than enumerating the space of all input value combinations in the search form.”    In other words, instead of considering content in terms of user selected contingencies, one can often discover that content has inherent structure that can be worked with.

A big consideration with content on demand is understanding what entities have an enduring presence. As content moves toward being more adaptive and personalized, it is important to know and manage the core content. There can be a danger when stringing together various and changing HTML fragments via continuous XMLHttpRequests on a single page that neither the audience nor the brand can be sure what was presented at a given point in time. This is not just a concern for the legal compliance officer working at a bank: it’s important to all content owners in all organizations. For audiences, it is hugely frustrating to be unable to retrieve content they have seen previously because they are unable to recreate the original sequence of steps that produced that view.

A core content entity should be a destination that is not dependent on a series of interactions to reach it.   Google has long advocated the use of canonical URLs instead of database generated ones.  But stateless app-like web pages lack any persistent identity.  Is that really necessary?  The BBC manages a vast database of changing content while providing persistent URLs.  Notably, they use specific URIs for their core content that allow content to be shared and re-used.  They do this without requiring the use of JavaScript.  To me, it seems impressive, and I encourage you to read about it.

What’s the Future of Structure?

An approach that decomposes content into unique URIs could provide the benefits of dynamic content with the benefits of persistence. Each unique entity gets a unique URI, and entities are determined through the combination of relevant facets. URIs are helpful for linking content to content hosted elsewhere. One could layer personalization or modifications around the core content, and reference these through a sub path linked to the parent URI. Such an approach requires more planning, but would enable content to be ready for any device or platform without scripting dependencies. I can’t speak authoritatively concerning the effort required, any implementation limitations, or how readily such an approach could be used in different contexts. This kind of approach isn’t being done much, but it leverages thinking from linked data about making content atoms that communicate with each other. I would like to see developers review and explore the practicalities of URI-defined content as content strategists think through the organizational and audience use cases.
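As a rough illustration of the idea, the sketch below derives a stable URI from a combination of facets and hangs a personalized variant off the parent URI as a sub path. Everything named here is hypothetical: the base address, the function names `entity_uri` and `variant_uri`, the `variants` path segment, and the sample facets are assumptions for the sake of the example, not a description of any existing system.

```python
import re

BASE = "https://example.org/content"  # hypothetical base URI

def slug(value: str) -> str:
    """Lowercase a value and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")

def entity_uri(entity_type: str, facets: dict) -> str:
    """Build a stable URI from an entity type and its defining facets.

    Facets are sorted by name, so the same combination always yields the
    same URI regardless of the order in which a user selected them.
    """
    parts = [f"{slug(name)}/{slug(value)}" for name, value in sorted(facets.items())]
    return "/".join([BASE, slug(entity_type), *parts])

def variant_uri(parent: str, variant: str) -> str:
    """Reference a personalized or modified view as a sub path of the parent URI."""
    return f"{parent}/variants/{slug(variant)}"

core = entity_uri("product", {"category": "Road Bikes", "brand": "Acme"})
print(core)
print(variant_uri(core, "Returning Customer"))
```

Because the URI depends only on the facets and not on the sequence of interactions, the same entity can be bookmarked, linked, and retrieved later without scripting.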

Content strategists often advocate XML-like markup for structure, but I see few signs that is gaining widespread traction in the developer world, where XML is loathed. XML markup seems to be in retreat in the web world, while JSON is king. How do we express structured content in the context of a programming language rather than a documentation language? We need collectively to figure out how to make structure the friend of development, rather than a hindrance.

Content strategists can no longer presume content will be represented by static HTML pages that are unaffected by JavaScript behavior. JavaScript rendered content is already a reality. The full implications of these changes are still not clear, and neither are realistic best practices. We need to discover how to balance the value of persistent content having a coherent identity with the value of dynamic, adaptive and personalized content that may never be the same twice.

— Michael Andrews

Wine, Content, and Domain Models

Suppose your organization wants to become the preeminent source of information about a topic. It aims to give audiences the ability to look at any dimension of a topic they might be interested in. How would you offer this?

To deliver informationally rich content, numerous content items need to be associated to one another. Content needs to be modular, with components that work together. But how do these things relate to each other? Where does one start?

Content models define how units of content should interact. Content modelling can be difficult to grasp and practice, partly because it is not a single uniform method. It encompasses a spectrum of related approaches that can be adapted to different needs.

People sometimes start to model their content before they know all the content they really need. They focus on what content has already been created, and do not explore what content is not yet available that might be of interest to users.

Content models are often more robust when they are backed by a domain model. A domain model enables content designers to untangle a messy topic and explore and define requirements and design solutions.

The role of content modelling

A content model is the end goal of a domain model. Rachel Lovinger has been instrumental in developing and advocating the practice of content modelling, so I will rely on her definition. She states: “A content model documents all the different types of content you will have for a given project. It contains detailed definitions of each content type’s elements and their relationships to each other.” She recommends using content models to bridge perspectives on a team.

“A content model helps clarify requirements and encourages collaboration between the designers, the developers creating the CMS, and the content creators.” — Rachel Lovinger

In addition to facilitating project delivery, content models improve how content is delivered to audiences. Content models can enable personalization, adaptive content, and content APIs. Cleve Gibbon, a collaborator of Rachel Lovinger in evangelizing content models, notes: “Great APIs are founded upon solid models. So if you’re building a Content API, be sure to create a content model FIRST that conveys the required level of structure and meaning.”

The spectrum of content modelling

Models can represent different dimensions of a topic: either conceptual, or formal and structural. Content models can indicate how to assemble content components. But first you need to know how solidly your content types are defined.

On one end of the spectrum, you may have well defined, fixed content. In such cases, one can develop what Deane Barker calls a relational content model. He defines it as “the concept of how different, separately-managed pieces of content relate to each other. (This is distinct from ‘discrete content modeling,’ which is how you structure a single piece of content.)” He explains the goal as “the idea of taking multiple discrete content objects (articles, sections, issues) and ‘rolling them up’ into a more complex content object (publication).”

On the other end of the spectrum, you may have fluid content, where the exact requirements are still emerging and many different hubs of content are possible. In such cases, a domain focused, ontology based form of modelling can be helpful. This approach has been used by the BBC for several large projects. Mike Atherton emphasizes the importance of the domain of the topic in content models: “A content model maps our subject domain, not our website structure.” He advises: “Concentrate on modelling real (physical and metaphysical) things not web pages.”

One way to consider the differences between a content model and a domain model is the metadata they emphasize. Rachel Lovinger states: “The Content Model is primarily concerned with structural metadata, while the Domain Model is largely concerned with descriptive metadata.”

A domain model and content model are complementary. A domain model helps you describe things that will be represented by content, while the content model helps you structure the content. Using both allows you to understand the relationship of a real world entity with a content entity.
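One way to picture the distinction is to write both models down as simple data structures. The sketch below is only an assumption about how the two might be expressed: the class names `WineEntity` and `WineProfile`, their fields, and the function `profile_for` are all hypothetical, and the sample values are illustrative rather than taken from any real catalog.

```python
from dataclasses import dataclass, field

@dataclass
class WineEntity:
    """Domain model: a real-world thing, described with descriptive metadata."""
    name: str
    producer: str
    grape_variety: str
    geographic_indication: str
    vintage: int

@dataclass
class WineProfile:
    """Content model: a content type, defined by its structural elements."""
    title: str
    summary: str
    body_sections: list = field(default_factory=list)

def profile_for(wine: WineEntity) -> WineProfile:
    """Create a content item whose structural elements are filled from the entity."""
    return WineProfile(
        title=f"{wine.producer} {wine.name} {wine.vintage}",
        summary=f"{wine.grape_variety} from {wine.geographic_indication}",
    )

# Illustrative values only; the producer and label are made up.
wine = WineEntity("Riserva", "Example Estate", "Sangiovese", "Chianti Classico", 2010)
print(profile_for(wine))
```

The entity records what is true of the thing in the world, while the profile records how a piece of content about it is assembled. Keeping them separate lets one entity feed several content types.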

A domain model is a useful place to start when content does not yet exist, or one is looking for a fresh redesign of content. Domain models may be considered as the prequel to content models. By focusing on entities in the real world, and the relationships between these entities, one can see opportunities to develop content associated with these entities, and what elements would be needed for that content. The correspondence of domain entity type, and content type, is illustrated in the table.

The relationship between a domain model and content model

Domain models in the real world: Italian wine

Domain models can clarify one’s understanding of a topic, and offer insights into how different items of information relate to each other. Domain modelling emerged as a strategy in software development to bridge analysis and design of complex business domains by using a shared verbal and visual language between experts, end users and developers. Domain models can be especially useful for complicated and messy topics. They would seem perfect for understanding Italian wine.

When you live in Italy, as I do, understanding Italian wine is a practical problem. Wine is ubiquitous, but understanding Italian wine is not self-evident. Walk into an Italian wine store and you are confronted with walls of bottles whose contents are largely unrecognizable. It’s not that all wine is difficult to understand. When I lived in New Zealand, I had a good idea what different wines were about. It’s Italian wine that is the challenge.

The famous wine critic Hugh Johnson once wrote: “the already bewildering complexity of Italian wines has become tangled enough to drive a critic to drink.” Italian wine is particularly hard to understand because of its heterogeneity. Even the imposition of standardized nomenclature to designate where a wine is from results in a bewildering array of non-standard implementations of these standards. Idiosyncratic traditions, politics, and rogue approaches mean that wines are described in great detail, but in richly differing ways.

At the core of why Italian wine is difficult to decipher is its product architecture: how specific wines are labelled. Consumers need an easy way to know the basic characteristics of a wine based on its label. Do consumers think of wine in terms of where it’s from (Burgundy) or what grapes it is made from (Chardonnay)?[1]  High volume wine producers have attempted to solve the product architecture problem by promoting brand awareness of a grape variety or a region. What happens when consumers are not familiar with either the origin name or the grape name?

Unfortunately, Italian wine labels are unusually difficult to decipher. Italian labels will show the producer + (grape variety and/or geographic indication) + year. That seems reasonable enough, until the consumer realizes that the only items on the label they might recognize are the digits of the year. Even if they have familiarity with another proper name on the label, that is not sufficient to make a selection decision.

The most significant piece of information about the kind of wine is indicated by the grape variety and/or the geographic indication (a regional designation similar to an appellation in France). Between these two items, there are nearly 1000 different varietals and zones that indicate the basic composition of the wine. [2]  To get a sense of how good the wine is, the most reliable information is the producer, and the year of vintage. Yet there are many thousands of wine producers in Italy of varying abilities, and the correlation of product quality to year of vintage is very specific to the variety of wine and where it was produced.

The complexity of Italian wine would seem tailored for digital content. But existing digital-only information sources on the web tend to be shallow — both in terms of their range of attributes, and their selective coverage.

Good information about wines, producers, and regions is available from several well known printed guides, such as those by L’Espresso, Gambero Rosso, Touring Club, Bibenda, and Slow Food. Despite the editorial quality of the content, the information is not as usable as it could be. Depending on the specific organization of the book, the information is stovepiped in one way or another. The editors of each guide assume a fixed path of entry that generally leads to a producer profile. Users are expected to think like the editors to uncover information of interest to them.

In some cases there are iPad versions of these printed guides, but they don’t feel natively digital, and require lots of tapping to move around from screen to screen. They are less usable than the print version, because they are slower to move through, and one’s orientation can get lost when hopping between screens. The content, while structured editorially, is not structured digitally with digital metadata. There is no ability to move laterally through the content: navigation is hierarchical. Unfortunately shovelware that ports a printed product and dumps it into a tablet format is too common, due to the false promises embedded in Adobe InDesign.

What users need is not simply a catalog of items, but a way to make sense of the bigger picture, in addition to exploring the detail. The heavy focus on profiles means that the user doesn’t see easily how these items relate to other things. They also miss seeing collective behaviors of similar items, which is possible when one digitally aggregates items sharing the same metadata. Thinking through these relationships and behaviors is one benefit of domain modelling.

Understanding the domain

How do people think about a subject? Mike Atherton suggests: “Experts map the world, users mark points of interest.” It helps to know how experts think about a topic like wine, and then during design, figure out what more typical users consider high priority goals. What aspects of wine do people consider significant? How might different aspects be pulled together into interesting items of content?

The topic of wine is distinctive because many people want to become experts, in contrast to other products. Getting information about the product is rarely a perfunctory task, but a connoisseurial pastime. Some people want to develop a broad knowledge about all styles of wine, while other people want to have a deep knowledge about a few specific producers or product areas, perhaps tied to places they go on holiday. Many things people might be interested in are non-obvious. For example, soil characteristics can influence how a grape variety tastes. Others may be interested in the environmental credentials of a producer.

How to break things down so they can be managed

The most important task when developing a domain model is to identify appropriate entities. An entity is a thing, either tangible or conceptual, with a distinct identity. It’s not the same as an existing item of content — the content may not exist yet. Entities, to use the words of Cleve Gibbon, are “first class citizens in the business domain” — they are the actors in the drama on the stage.

Entities have attributes — characteristics. Attributes do not necessarily become a field in the content, but they often do. That decision needs to be made when the content is designed. Taste is certainly an attribute of wine, but is not necessarily a field in a description of a wine.

Once entities have been identified, it is necessary to determine where to put attributes, and whether to break entities into smaller units. Often, one discovers intermediate zones that straddle two entities. The horticultural characteristics of the vineyard reflect the interaction of the producer and the wine produced. The interplay between region and varietal defines the vintage for a given year. These intermediate areas may not deserve to be entities themselves, but one should consider how to make sure their role remains visible.

What a domain model for Italian wine looks like

It is helpful to first consider the relationships between entities, then examine the attributes associated with each entity.

When looking at entities, two things are important. First, how many instances are there for each entity type? The entity map shows that for most of the entities, there are hundreds or even thousands of instances. This large number suggests that establishing meaningful relationships between entities will be important if users are to be successful navigating through such a large volume of content. Second, what is the essential character of relationships between entities? We want to know how many connections there are between entities: the more connections to other entities, the richer the potential interaction of information. We also want to know if the relationship between entities is a one-to-one relationship, a one-to-many relationship, or a many-to-many relationship. The “crow’s feet” in our entity map indicate numerous many-to-many relationships. That may make the design of content a bit more challenging, but it also indicates many interesting connections. Our content is a valuable resource when it’s not easy to see these connections in one’s head.

Relationships between different entity types associated with the domain of Italian wine.
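A many-to-many relationship can be sketched with two lookup tables, one for each direction. The example below assumes a relationship between wines and grape varieties purely for illustration; the wine labels are placeholders, and the function name `related_wines` is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (wine label, grape variety) pairs.
# A wine can blend several varieties and a variety appears in many wines,
# so the relationship is many-to-many.
wine_grapes = [
    ("Wine A", "Sangiovese"),
    ("Wine A", "Canaiolo"),
    ("Wine B", "Sangiovese"),
    ("Wine C", "Nebbiolo"),
]

# Index the pairs in both directions so either entity can be the entry point.
grapes_by_wine = defaultdict(set)
wines_by_grape = defaultdict(set)
for wine, grape in wine_grapes:
    grapes_by_wine[wine].add(grape)
    wines_by_grape[grape].add(wine)

def related_wines(wine: str) -> set:
    """Move laterally: other wines that share at least one grape variety."""
    related = set()
    for grape in grapes_by_wine[wine]:
        related |= wines_by_grape[grape]
    related.discard(wine)
    return related

print(related_wines("Wine A"))
```

Indexing in both directions is what allows a user to start from a wine, a grape, or any other entity and still reach the connected items, instead of following a single fixed path.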

Next, explore the attributes associated with each entity. The goal is to identify and associate attributes of entities. Each entity has a number of attributes. Some will be short fields, others will involve longer text descriptions. There is no right number of attributes, provided all attributes are meaningful. The number of attributes to implement in design will depend on both business and design decisions. There will be a business decision concerning the cost of acquiring the information related to the attribute, and the usefulness to consumers of that information. There will also be a design decision relating to which attributes to expose to which audiences.

Typical attributes of each entity type relating to the domain of Italian wine

Our model shows attributes that are commonly associated with the domain of Italian wine. For example, it can be interesting to know the number of bottles produced of a wine. That can indicate how widely available the wine is to buy, or perhaps its scarcity (suggesting that one needs to reserve a purchase). Some wine guides will indicate the total number of bottles according to producer, while others will indicate the total number of bottles by label. This difference means that one can answer different questions, such as who is the largest producer within a geographic indication zone, or who is the largest producer of a specific kind of wine. Ideally, one would like data at both the producer and product levels, but that may not be easy to obtain for all producers.
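
To make the producer-level versus label-level distinction concrete, here is a small Python sketch. It assumes bottle counts are recorded per label, which lets them be rolled up to the producer. The Label fields, the function names, and every figure are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    """One wine label with a few attributes (hypothetical field names)."""
    name: str
    producer: str
    zone: str
    wine_type: str
    bottles: int  # bottles produced per year


# Invented sample data, for illustration only.
LABELS = [
    Label("Label 1", "Producer A", "Zone X", "red", 40_000),
    Label("Label 2", "Producer A", "Zone X", "white", 10_000),
    Label("Label 3", "Producer B", "Zone X", "red", 45_000),
]


def bottles_by_producer(labels, zone: Optional[str] = None,
                        wine_type: Optional[str] = None) -> Counter:
    """Roll label-level bottle counts up to the producer, with optional filters."""
    totals = Counter()
    for label in labels:
        if zone is not None and label.zone != zone:
            continue
        if wine_type is not None and label.wine_type != wine_type:
            continue
        totals[label.producer] += label.bottles
    return totals


def largest_producer(labels, **filters) -> Optional[str]:
    """Return the producer with the most bottles, or None if nothing matches."""
    totals = bottles_by_producer(labels, **filters)
    if not totals:
        return None
    return totals.most_common(1)[0][0]


print(largest_producer(LABELS, zone="Zone X"))     # largest within a zone
print(largest_producer(LABELS, wine_type="red"))   # largest for one kind of wine
```

With these sample numbers the two questions have different answers: Producer A is the largest in the zone overall, while Producer B makes the most red wine. Producer-only totals could answer the first question but not the second.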

Lessons from domain modelling

Even though domain modelling attempts to represent the real world, reality is often less orderly than we would like it to be.

Not everything can be easily expressed as a regularized attribute. Audiences will want to know: What does the wine taste like? It would be wonderful to provide a reliable, easy-to-understand way to explain taste that allows easy comparison between wines, zones, and producers. Sadly, taste is — surprise — a bit subjective. Different experts will say different things about the same wine, even when they agree on an overall judgment. Terminology is not standard either. The same words can mean different things. Critics may use the word “cherry” to describe a taste as “spicy black cherry” or as “cherry rhubarb.” There is no controlled vocabulary for wine, no limited set of descriptors with precisely defined and agreed meanings.

Example of geographic designation zones within a single region.  Screenshot from a certification body website.

By their nature, models simplify reality. The geographic indication signifies where a wine is made, and the criteria by which it is made. Whereas most geographical entities are based on either political administrative geography or physical geography, geographic indications exist outside these frameworks. A geographic indication can straddle two administrative regions. It can exist in two different, discontinuous locations. Some geographic indication zones have subzones. Wine producers also can behave in complex ways. Sometimes a wine producer is a brand “house” that has vineyards in several locations, or a consortium that sources from different vineyards. The informational details associated with these exceptions may not be important to users, and can add design complexity.

The identity of items can be constructed in several ways. One needs to be able to distinguish one entity from others belonging to the same entity type — items need to be uniquely identified. Despite the challenges of deciphering Italian wine, specific entities fortunately are identified with meaningful, human readable names, rather than numeric product codes. The domain model can use existing identifiers, which are based on several approaches:

  • Collectively defined names (the names of regions, geographic indications, and grape varieties), though some producers use alternate names for grape varieties
  • Self-described names (the name of the producer), though sometimes producers choose to use both a house and proprietor name
  • Inherited identity (the environmental profile for a producer)
  • Names composed of compound attributes, such as dry sparkling rosato as a wine category entity
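
Two of these identifier approaches can be sketched in Python: resolving alternate grape names to a collectively defined one, and composing an identifier from compound attributes. The function names and the alias table are assumptions made for this example. The two aliases shown are well-known local names for Sangiovese, but a real table would need to come from an authoritative list.

```python
import re

# Illustrative alias table: alternate grape names mapped to a canonical name.
GRAPE_ALIASES = {
    "prugnolo gentile": "Sangiovese",
    "brunello": "Sangiovese",
}


def canonical_grape(name: str) -> str:
    """Map an alternate grape name to its canonical name, if one is known."""
    cleaned = name.strip()
    return GRAPE_ALIASES.get(cleaned.lower(), cleaned)


def compound_id(*attributes: str) -> str:
    """Build an identifier from several attribute values.

    Runs of characters other than a-z and 0-9 become a single hyphen,
    so accented letters are dropped in this simplified version.
    """
    parts = [re.sub(r"[^a-z0-9]+", "-", a.lower()).strip("-") for a in attributes]
    return "-".join(p for p in parts if p)


print(canonical_grape("Prugnolo Gentile"))
print(compound_id("Dry", "Sparkling", "Rosato"))
```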

Thinking about design

The domain model can support early design discussions. Many questions that are interesting to audiences will span two or more different entities. For example:

  • What year produced the best wine from a region?
  • What geographic indication commands the highest average prices?
  • What grape varieties produce the most wine?
  • What wines for a given year and geographic designation are ready to drink?

Some answers require computations of structured data. Questions of interest to audiences need to be translated into content types that will be represented in the content model.
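
As a hedged sketch of what such a computation could look like, the Python below answers two of the questions above by grouping wine records on one entity and averaging an attribute. The record fields, function names, and all values are invented. In particular, treating the "best" year as the one with the highest average rating is an assumption, not the only reasonable definition.

```python
from collections import defaultdict
from statistics import mean

# Invented sample records, for illustration only.
WINES = [
    {"label": "Label 1", "zone": "Zone X", "vintage": 2010, "price": 20.0, "rating": 88},
    {"label": "Label 2", "zone": "Zone X", "vintage": 2011, "price": 30.0, "rating": 92},
    {"label": "Label 3", "zone": "Zone Y", "vintage": 2010, "price": 40.0, "rating": 90},
    {"label": "Label 4", "zone": "Zone Y", "vintage": 2011, "price": 50.0, "rating": 90},
]


def average_by(records, group_key, value_key):
    """Average one attribute across records, grouped by another field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {group: mean(values) for group, values in groups.items()}


def top_group(records, group_key, value_key):
    """Return the group with the highest average for the given attribute."""
    averages = average_by(records, group_key, value_key)
    return max(averages, key=averages.get)


# Which geographic indication commands the highest average prices?
print(top_group(WINES, "zone", "price"))

# Which year produced the best wine, taking "best" as highest average rating?
print(top_group(WINES, "vintage", "rating"))
```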

In addition to supporting interesting exploration, the design needs to support common tasks. The domain model helps to identify information available to support common tasks. Some common points of entry audiences will seek when exploring wine include:

  • By rating
  • By price
  • By category
  • By variety

Users often focus on one specific criterion when starting the process of seeking information. In some cases, these criteria are entities; in others, they are attributes. Considering task starting points can help identify potential groupings of content elements. Depending on the depth of content, these groupings may not be manageable for users without providing additional parameters to narrow the pool of candidate content. The most salient criterion is not the only factor that’s important to the user.

In contrast to starting points, another perspective is to consider the end goal of the task. Examining the end goal, the content designer can consider the orientation of different users. Users of wine information may be:

  • Bottle centric — interested in the characteristics of specific bottles of wine
  • Producer centric — interested in the story of the producer, perhaps with an intention of visiting them
  • Food centric — mostly interested in wine styles as a complement to food dishes.

Domain depth and domain scope

The depth of a domain reflects both the number of attributes for an entity type, and the quantity of items. Both aspects can impact the design. The quantity of items will influence content types that present lists and links. The number of attributes will impact content type structures for content items.

Content designers decide how much of the domain model to present to users. A fixed content type may show all attributes as part of the content type. With a flexible content type, attributes may be optionally available, or have several variations. Designers may choose progressive disclosure of content that hides details, which are revealed only when wanted. Or they may implement an adaptive approach, where different variations of content types are shown depending on the interests of an audience segment, or device formats.
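
The difference between a fixed content type and progressive disclosure can be sketched in a few lines of Python. This is only one possible reading of those terms. The attribute names, the split into summary and detail fields, and the function names are all assumptions made for this example.

```python
# Invented sample item; visiting_hours is unknown for this producer.
PRODUCER = {
    "name": "Producer A",
    "zone": "Zone X",
    "bottles_per_year": 50_000,
    "history": "Longer text description of the estate.",
    "visiting_hours": None,
}

SUMMARY_FIELDS = ("name", "zone")
DETAIL_FIELDS = ("bottles_per_year", "history", "visiting_hours")


def select(item, fields, keep_empty):
    """Pick the named attributes, optionally dropping those with no value."""
    return {f: item.get(f) for f in fields if keep_empty or item.get(f) is not None}


def fixed_view(item):
    """Fixed content type: every attribute has a slot, even when empty."""
    return select(item, SUMMARY_FIELDS + DETAIL_FIELDS, keep_empty=True)


def progressive_view(item, expanded=False):
    """Progressive disclosure: summary first, details only when requested."""
    fields = SUMMARY_FIELDS + DETAIL_FIELDS if expanded else SUMMARY_FIELDS
    return select(item, fields, keep_empty=False)


print(fixed_view(PRODUCER))
print(progressive_view(PRODUCER))
print(progressive_view(PRODUCER, expanded=True))
```

An adaptive approach could follow the same pattern, choosing the field lists according to audience segment or device rather than an expanded flag.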

The other aspect of the domain model, thus far unmentioned, is how it might connect with other domains. The domain model offers the possibility of enlarging the scope addressed by considering related domains. Different variations of content may draw on common content, while including different content as well (see diagram). Three different apps may share common core content. But they provide different functionality depending on their focus (touring vineyards, pairing wine with food, or knowledge enhancement of wine). The domain model can also be used to guide the planning of releases of content and functionality.

Relationship between the depth of a domain, and its scope. Content can be deep, covering many attributes. And content can be wide, connecting with other domains.

Relating entities: Comparisons to other approaches

Domain modelling is not the only approach to sorting through complex content. Before closing this discussion, it is worth talking about two other well known approaches that look and behave similarly, but have some differences.

Faceted search, an approach popular in library science and information architecture, allows users to locate specific content by filtering on facets. Facets can be attributes or entities. The idea is that users can locate content that has the qualities of A & B & C. Faceted search is a popular technique, common on ecommerce sites, and is often helpful. The utility of the technique rests on several assumptions. First, faceted search assumes users know the two to four most important criteria, and will get a manageable set of results. If the set of results is large, users generally take a satisficing approach, happy with the first result encountered that is minimally acceptable. Second, faceted search presumes that the facets are independent of one another, which in the case of wine isn’t true. It is possible to get null sets if facets aren’t deep. While faceted search has been implemented on some wine ecommerce sites, it is not an effective approach for helping users discover content they might be interested in but do not know about, and it tends to focus on a limited range of aspects.
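
The A & B & C logic, and the way dependent facets can produce a null set, can be shown with a minimal Python sketch. The catalog entries, facet names, and function names are invented for illustration and do not describe any particular ecommerce implementation.

```python
from collections import Counter

# Invented sample catalog, for illustration only.
CATALOG = [
    {"name": "Wine 1", "color": "red", "style": "still", "zone": "Zone X"},
    {"name": "Wine 2", "color": "white", "style": "sparkling", "zone": "Zone X"},
    {"name": "Wine 3", "color": "red", "style": "still", "zone": "Zone Y"},
    {"name": "Wine 4", "color": "rosato", "style": "sparkling", "zone": "Zone Y"},
]


def facet_filter(items, **selected):
    """Keep only the items that match every selected facet value (A & B & C)."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of a facet."""
    return Counter(item[facet] for item in items)


reds = facet_filter(CATALOG, color="red")
print([wine["name"] for wine in reds])

# Facets are not independent: once "red" is chosen, only "still" remains.
print(facet_counts(reds, "style"))

# Choosing a combination that does not exist yields a null set.
print(facet_filter(CATALOG, color="red", style="sparkling"))
```

In this sample catalog no red wine is sparkling, so a user who picks those two facets independently gets nothing back, even though both values look valid on their own.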

Linked data is an approach to modelling content that has close associations to domain modelling, thanks to the BBC’s integration of the two approaches. To simplify, linked data allows users to find content with characteristic A that has B, which has C. Organizing content using a linked data approach has both benefits and drawbacks. One drawback is that queries can be path dependent. Whether results appear promising or discouraging depends on how you construct the query. Linked data queries are generally more open ended than predefined structured queries that answer fixed questions with predictable sets of results. A bigger concern is that linked data treats all aspects of an entity as other entities, and each entity gets its own page. But not all attributes are meaningful entities — things worthy of their own content destination. On the positive side, linked data is good for what-else questions. One can link outside of a domain to other domains, such as to geophysical data.
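
The "A that has B, which has C" pattern, and the path dependence of queries, can be sketched with plain subject-predicate-object triples in Python. This is a toy illustration rather than a real linked data stack. The predicate names (producedBy, madeFrom, locatedIn, grownIn) and all data are made up for the example.

```python
# Invented triples of (subject, predicate, object), for illustration only.
TRIPLES = [
    ("Label 1", "producedBy", "Producer A"),
    ("Label 1", "madeFrom", "Grape 1"),
    ("Producer A", "locatedIn", "Zone X"),
    ("Grape 1", "grownIn", "Zone X"),
    ("Grape 1", "grownIn", "Zone Y"),
]


def follow(start, *predicates, triples=TRIPLES):
    """Walk a chain of predicates from a starting entity, one hop at a time."""
    current = {start}
    for predicate in predicates:
        current = {
            obj for subj, pred, obj in triples
            if subj in current and pred == predicate
        }
    return current


# Two routes from the same label to "its" zones give different answers.
print(follow("Label 1", "producedBy", "locatedIn"))
print(follow("Label 1", "madeFrom", "grownIn"))
```

The first path reaches only Zone X, while the second also reaches Zone Y, because the grape is grown in more places than the producer operates. Which result looks right depends on how the query was constructed.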

Model behavior

Models aren’t reality, according to the cliche. Domain models may appear esoteric to some people, given that they aren’t actually something implemented directly, but are an input to other deliverables. To get buy-in for a domain model, it may be best to use it as a discussion document, and note that it will evolve into the content model. While it lacks the appeal of being code-ready, a domain model can play an important role on a project. It can uncover hidden requirements and opportunities, help align different stakeholders around a common vision, and accelerate the design process.

— Michael Andrews


  1. Chardonnay grapes originated in Burgundy. Even though most people associate Burgundy with red wine, there are also white Burgundy wines made from Chardonnay.  ↩
  2. A canonical list of varietals and zones is available from the databases of the intergovernmental wine organization OIV http://www.oiv.int/oiv/info/enbasededonneesIG  ↩