
Paradata: where analytics meets governance

Organizations aspire to make data-informed decisions. But can they confidently rely on their data? What does that data really tell them, and how was it derived? Paradata, a specialized form of metadata, can provide answers.

Many disciplines use paradata

You won’t find the word paradata in a household dictionary, and the concept is largely unknown in the content profession. Yet paradata is highly relevant to content work. It provides context showing how the activities of writers, designers, and readers can influence each other.

Paradata provides a unique and missing perspective. A forthcoming book on paradata defines it as “data on the making and processing of data.” Paradata extends beyond basic metadata — “data about data.” It introduces the dimensions of time and events. It considers the how (process) and the what (analytics).

Think of content as a special kind of data that has a purpose and a human audience. Content paradata can be defined as data on the making and processing of content.

Paradata can answer:

  • Where did this content come from?
  • How has it changed?
  • How is it being used?

Paradata differs from other kinds of metadata in its focus on the interaction of actors (people and software) with information. It provides context that helps planners, designers, and developers interpret how content is working.

Paradata traces activity during various phases of the content lifecycle: how it was assembled, interacted with, and subsequently used. It can explain content from different perspectives:

  • Retrospectively 
  • Contemporaneously
  • Predictively

Paradata provides insights into processes by highlighting the transformation of resources in a pipeline or workflow. By recording the changes, it becomes possible to reproduce those changes. Paradata can provide the basis for generalizing the development of a single work into a reusable workflow for similar works.
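
By way of illustration, pipeline paradata for a single content item might record each transformation step as an event log. This is a minimal sketch; the field names, actors, and steps are invented:

{
  "item": "quarterly-report",
  "steps": [
    { "order": 1, "action": "drafted", "actor": "staff writer", "timestamp": "2024-03-01T10:00:00Z" },
    { "order": 2, "action": "converted to HTML", "actor": "build pipeline", "timestamp": "2024-03-02T09:30:00Z" },
    { "order": 3, "action": "localized to fr-FR", "actor": "translation service", "timestamp": "2024-03-05T14:00:00Z" }
  ]
}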

Some discussions of paradata refer to it as “processual meta-level information on processes” (processual here refers to the process of developing processes). Knowing how activities happen provides the foundation for sound governance.

Contextual information facilitates reuse. Paradata can enable the cross-use and reuse of digital resources. A key challenge for reusing any content created by others is understanding its origins and purpose. It’s especially challenging when wanting to encourage collaborative reuse across job roles or disciplines. One study of the benefits of paradata notes: “Meticulous documentation and communication of contextual information are exceedingly critical when (re)users come from diverse disciplinary backgrounds and lack a shared tacit understanding of the priorities and usual practices of obtaining and processing data.”

While paradata isn’t currently utilized in mainstream content work, a number of content-adjacent fields use paradata, pointing to potential opportunities for content developers. 

Content professionals can learn from how paradata is used in:

  • Survey and research data
  • Learning resources
  • AI
  • API-delivered software

Each discipline looks at paradata through different lenses and emphasizes distinct phases of the content or data lifecycle. Some emphasize content assembly, while others emphasize content usage. Some emphasize both, building a feedback loop.

Conceptualizing paradata
Different perspectives of paradata. Source: Isto Huvila

Content professionals should learn from other disciplines, but they should not expect others to talk about paradata in the same way.  Paradata concepts are sometimes discussed using other terms, such as software observability. 

Paradata for surveys and research data

Paradata is most closely associated with developing research data, especially statistical data from surveys. Survey researchers pioneered the field of paradata several decades ago, aware of the sensitivity of survey results to the conditions under which they are administered.

The National Institute of Statistical Sciences describes paradata as “data about the process of survey production” and as “formalized data on methodologies, processes and quality associated with the production and assembly of statistical data.”  

Researchers realize that how information is assembled can influence what can be concluded from it. In a survey, a confounding factor could be a glitch in a form or a leading question that prompts a disproportionate share of respondents to answer in a given way.

The US Census Bureau, which conducts a range of surveys of individuals and businesses, explains: “Paradata is a term used to describe data generated as a by-product of the data collection process. Types of paradata vary from contact attempt history records for interviewer-assisted operations, to form tracing using tracking numbers in mail surveys, to keystroke or mouse-click history for internet self-response surveys.”  For example, the Census Bureau uses paradata to understand and adjust for non-responses to surveys. 

Paradata for surveys
Source: NDDI 
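
To make the idea concrete, here is a hypothetical sketch of the paradata a web survey platform might capture for a single response session (all field names and values are invented for illustration):

{
  "responseId": "session-0042",
  "mode": "internet self-response",
  "device": "mobile",
  "contactAttempts": 2,
  "startedAt": "2024-03-02T18:04:11Z",
  "submittedAt": "2024-03-02T18:19:47Z",
  "breakoffs": 1,
  "questionTimingsSeconds": { "q1": 8.2, "q2": 41.5 }
}

None of this appears in the survey answers themselves; it describes the conditions under which the answers were produced.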

As computers become more prominent in the administration of surveys, they become actors influencing the process. Computers can record an array of interactions between people and software.

Why should content professionals care about survey processes?

Think about surveys as a structured approach to assembling information about a topic of interest. Paradata can indicate whether users could submit survey answers and under what conditions people were most likely to respond. Researchers use paradata to measure user burden. Paradata helps illuminate the work required to provide information – a topic relevant to content professionals interested in the authoring experience of structured content.

Paradata supports research of all kinds, including UX research. It’s used in archaeology and archives to describe the process of acquiring and preserving assets and changes that may happen to them through their handling. It’s also used in experimental data in the life sciences.

Paradata supports reuse. It provides information about the context in which information was developed, improving its quality, utility, and reusability.

Researchers in many fields are embracing what is known as the FAIR principles: making data Findable, Accessible, Interoperable, and Reusable. Scientists want the ability to reproduce the results of previous research and build upon new knowledge. Paradata supports the goals of FAIR data.  As one study notes, “understanding and documentation of the contexts of creation, curation and use of research data…make it useful and usable for researchers and other potential users in the future.”

Content developers similarly should aspire to make their content findable, accessible, interoperable, and reusable for the benefit of others. 

Paradata for learning resources

Learning resources are specialized content that needs to adapt to different learners and goals. How resources are used and changed influences the outcomes they achieve. Some education researchers have described paradata as “learning resource analytics.”

Paradata for instructional resources is linked to learning goals. “Paradata is generated through user processes of searching for content, identifying interest for subsequent use, correlating resources to specific learning goals or standards, and integrating content into educational practices,” notes a Wikipedia article. 

Data about usage isn’t represented in traditional metadata. A document prepared for the US Department of Education notes: “Say you want to share the fact that some people clicked on a link on my website that leads to a page describing the book. A verb for that is ‘click.’ You may want to indicate that some people bookmarked a video for a class on literature classics. A verb for that is ‘bookmark.’ In the prior example, a teacher presented resources to a class. The verb used for that is ‘taught.’ Traditional metadata has no mechanism for communicating these kinds of things.”

“Paradata may include individual or aggregate user interactions such as viewing, downloading, sharing to other users, favoriting, and embedding reusable content into derivative works, as well as contextualizing activities such as aligning content to educational standards, adding tags, and incorporating resources into curriculum.” 

Usage data can inform content development.  One article expresses the desire to “establish return feedback loops of data created by the activities of communities around that content—a type of data we have defined as paradata, adapting the term from its application in the social sciences.”

Unlike traditional web analytics, which focuses on web pages or user sessions and doesn’t consider the user context, paradata focuses on the user’s interactions in a content ecosystem over time. The data is linked to content assets to understand their use. It resembles social media metadata that tracks the propagation of events as a graph.

“Paradata provides a mechanism to openly exchange information about how resources are discovered, assessed for utility, and integrated into the processes of designing learning experiences. Each of the individual and collective actions that are the hallmarks of today’s workflow around digital content—favoriting, foldering, rating, sharing, remixing, embedding, and embellishing—are points of paradata that can serve as indicators about resource utility and emerging practices.”

Paradata for learning resources utilizes Activity Streams JSON, which can track the interactions between actors and objects according to predefined verbs (an “Activity Schema”) that can be measured. The approach can be applied to any kind of content.
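
For illustration, a single learning-resource paradata record in the style of Activity Streams 2.0 JSON might look like the following sketch (the actor, resource, and collection are invented):

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "A teacher bookmarked a video for a class on literature classics",
  "type": "Add",
  "actor": { "type": "Person", "name": "Example Teacher" },
  "object": { "type": "Video", "url": "https://example.org/videos/classic-lit" },
  "target": { "type": "Collection", "name": "Literature Classics resources" },
  "published": "2024-05-01T09:00:00Z"
}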

Paradata for AI

AI has a growing influence over content development and distribution. Paradata is emerging as a strategy for producing “explainable AI” (XAI).  “Explainability, in the context of decision-making in software systems, refers to the ability to provide clear and understandable reasons behind the decisions, recommendations, and predictions made by the software.”

The Association for Intelligent Information Management (AIIM) has suggested that a “cohesive package of paradata may be used to document and explain AI applications employed by an individual or organization.” 

Paradata provides a manifest of the AI training data. AIIM identifies two kinds of paradata: technical and organizational.

Technical paradata includes:

  • The model’s training dataset
  • Versioning information
  • Evaluation and performance metrics
  • Logs generated
  • Existing documentation provided by a vendor

Organizational paradata includes:

  • Design, procurement, or implementation processes
  • Relevant AI policy
  • Ethical reviews conducted

Paradata for AI
Source: Patricia C. Franks
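
As a sketch, a paradata manifest for an AI application might bundle technical and organizational elements into one record. The field names and values below are hypothetical:

{
  "model": "support-content-classifier",
  "version": "2.3.0",
  "trainingDataset": "internal-kb-snapshot-2024-01",
  "evaluationMetrics": { "f1": 0.87, "biasAudit": "passed" },
  "vendorDocumentation": "https://example.com/model-card",
  "procurementProcess": "RFP-2023-114",
  "applicablePolicies": [ "AI usage policy v4" ],
  "ethicalReviews": [ { "date": "2024-02-12", "outcome": "approved" } ]
}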

The provenance of AI models and their training has become a governance issue as more organizations use machine learning models and LLMs to develop and deliver content. AI models tend to be “black boxes” that users are unable to untangle and understand.

How AI models are constructed has governance implications, given their potential to be biased or contain unlicensed copyrighted or other proprietary data. Developing paradata for AI models will be essential if models are to gain wide adoption.

Paradata and document observability

Observing how behavior unfolds helps teams debug problems and make systems more resilient.

Fabrizio Ferri-Benedetti, whom I met some years ago in Barcelona at a Confab conference, recently wrote about a concept he calls “document observability” that has parallels to paradata.

Content practices can borrow from software practices. As software becomes more API-focused, firms are monitoring API logs and metrics to understand how various routines interact, a field called observability. The goal is to identify and understand unanticipated occurrences. “Debugging with observability is about preserving as much of the context around any given request as possible, so that you can reconstruct the environment and circumstances that triggered the bug.”

Observability utilizes a profile called MELT: Metrics, Events, Logs, and Traces. MELT is essentially paradata for APIs.
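
As a rough sketch, a single trace record in a MELT pipeline might look like this (field names vary by tool; these are invented):

{
  "traceId": "a1b2c3d4",
  "span": "GET /api/articles/42",
  "status": 200,
  "durationMs": 87,
  "timestamp": "2024-04-10T12:00:05Z",
  "attributes": { "cache": "miss", "region": "eu-west" }
}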

Software observability pattern. Source: Karumuri, Solleza, Zdonik, and Tatbul

Content, like software, is becoming more API-enabled. Content can be tapped from different sources and fetched interactively. The interaction of content pieces in a dynamic context showcases the content’s temporal properties.

When things behave unexpectedly, systems designers need the ability to reverse engineer that behavior. An article in IEEE Software states: “One of the principles for tackling a complex system, such as a biochemical reaction system, is to obtain observability. Observability means the ability to reconstruct a system’s internal state from its outputs.”

Ferri-Benedetti notes, “Software observability, or o11y, has many different definitions, but they all emphasize collecting data about the internal states of software components to troubleshoot issues with little prior knowledge.”  

Because documentation is essential to the software’s operation, Ferri-Benedetti  advocates treating “the docs as if they were a technical feature of the product,” where the content is “linked to the product by means of deep linking, session tracking, tracking codes, or similar mechanisms.”

He describes document observability (“do11y”) as “a frame of mind that informs the way you’ll approach the design of content and connected systems, and how you’ll measure success.”

In contrast to observability, which relies on incident-based indexing, paradata is generally defined by a formal schema. A schema allows stakeholders to manage and change the system instead of merely reacting to it and fixing its bugs. 

Applications of paradata to content operations and strategy

Why introduce a new concept that most people have never heard of? Because content professionals must expand their toolkit.

Content is becoming more complex. It touches many actors: employees in various roles, customers with multiple needs, and IT systems with different responsibilities. Stakeholders need to understand the content’s intended purpose, its actual use in practice, and whether the two diverge. Do people need to adapt content because the original does not meet their needs? Should people be adapting existing content, or should that content be easier to reuse in its original form?

Content continuously evolves and changes shape, acquiring emergent properties. People and AI customize, repurpose, and transform content, making it more challenging to know how these variations affect outcomes. Content decisions involve more people over extended time frames. 

Content professionals need better tools and metrics to understand how content behaves as a system. 

Paradata provides contextual data about the content’s trajectory. It builds on two kinds of metadata that connect content to user action:

  • Administrative metadata capturing the actions of the content creators or authors, intended audiences, approvers, versions, and when last updated
  • Usage metadata capturing the intended and actual uses of the content, both internal (asset role, rights, where item or assets are used) and external (number of views, average user rating)
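
Combined in a single record, these two kinds of metadata might look like the following sketch for one content item (all fields are hypothetical):

{
  "id": "article-981",
  "administrative": {
    "author": "J. Writer",
    "approvedBy": "M. Editor",
    "intendedAudience": "existing customers",
    "version": "1.4",
    "lastUpdated": "2024-06-01"
  },
  "usage": {
    "assetRole": "how-to guide",
    "rights": "web and app",
    "usedIn": [ "help center", "mobile app" ],
    "views": 15200,
    "averageUserRating": 4.2
  }
}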

Paradata also incorporates newer forms of semantic and blockchain-based metadata that address change over time:

  • Provenance metadata
  • Actions schema types

Provenance metadata has become essential for image content, which can be edited and transformed in multiple ways that change what it represents. Organizations need to know the source of the original and what edits have been made to it, especially with the rise of synthetic media. Metadata can indicate on what an image was based or derived from, who made changes, or what software generated changes. Two corporate initiatives focused on provenance metadata are the Content Authenticity Initiative and the Coalition for Content Provenance and Authenticity.

Actions are an established — but underutilized — dimension of metadata. The widely adopted schema.org vocabulary has a class of actions that address both software interactions and physical world actions. The schema.org actions build on the W3C Activity Streams standard, which was upgraded in version 2.0 to semantic standards based on JSON-LD types.
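
For example, a bookmarking event could be expressed with schema.org’s BookmarkAction type in JSON-LD (the agent and object shown are invented):

{
  "@context": "https://schema.org",
  "@type": "BookmarkAction",
  "agent": { "@type": "Person", "name": "Example Reader" },
  "object": { "@type": "Article", "url": "https://example.com/articles/42" },
  "startTime": "2024-04-10T12:00:00Z"
}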

Content paradata can clarify common issues such as:

  • How can content pieces be reused?
  • What was the process for creating the content, and can one reuse that process to create something similar?
  • When and how was this content modified?

Paradata can help overcome operational challenges such as:

  • Content inventories where it is difficult to distinguish similar items or versions
  • Content workflows where it is difficult to model how distinct content types should be managed
  • Content analytics, where the performance of content items is bound up with channel-specific measurement tools

Implementing content paradata must be guided by a vision. The most mature application of paradata – for survey research – has evolved over several decades, prompted by the need to improve survey accuracy. Other research fields are adopting paradata practices as research funders insist that data be “FAIR.” Change is possible, but it doesn’t happen overnight. It requires having a clear objective.

It may seem unlikely that content publishing will embrace paradata anytime soon. However, the explosive growth of AI-generated content may provide the catalyst for introducing paradata elements into content practices. The unmanaged generation of content will be a problem too big to ignore.

The good news is that online content publishing can take advantage of existing metadata standards and frameworks that provide paradata. What’s needed is to incorporate these elements into content models that manage internal systems and external platforms.

Online publishers should introduce paradata into systems they directly manage, such as their digital asset management system or customer portals and apps. Because paradata can encompass a wide range of actions and behaviors, it is best to prioritize tracking actions that are difficult to discern but likely to have long-term consequences. 

Paradata can provide robust signals to reveal how content modifications impact an organization’s employees and customers.  

– Michael Andrews


What’s the value of content previews?

Content previews let you see how your content will look before it’s published.  CMSs have long offered previews, but preview capabilities are becoming more varied as content management is increasingly decoupled from UI design and channel delivery. Preview functionality can introduce unmanaged complexity to content and design development processes.  

Discussions about previews can spark strong opinions.

Are content previews:

  1. Helpful?
  2. Unnecessary?
  3. A crutch used to avoid fixing existing problems?
  4. A source of follow-on problems?
  5. All of the above?

Many people would answer that previews are helpful because they personally like seeing previews. Yet whether previews are helpful depends on more than individual preferences. In practice, all of the above can be true.

It may seem paradoxical that a feature like previews can be good and bad. The contradiction exists only if one assumes all users and preview functionality are the same. Users have distinct needs and diverging expectations depending on their role and experience. How previews are used and who is impacted by them can vary widely. 

Many people assume previews can solve major problems authors face. Previews are popular because they promise to bring closure to one’s efforts. Authors can see how their content will look just before publishing it. Previews offer tangible evidence of one’s work. They bring a psychic reward. 

Yet many factors beyond psychic rewards shape the value of content previews. 

What you see while developing content and how you see it can be complicated. Writers are accustomed to word processing applications where they control both the words and their styling. But in enterprise content publishing, many people and systems become involved with wording and presentation. How content appears involves various perspectives. 

Content teams should understand the many sides of previews, from the helpful to the problematic.  These issues are becoming more important as content becomes uncoupled from templated UI design. 

Previews can be helpful 

Previews help when they highlight an unanticipated problem with how the content will be rendered when it is published. Consider situations that introduce unanticipated elements. Often, these situations involve people who are either new to the content team or who interact with the team infrequently. Employees less familiar with the CMS can be encouraged to view the preview to confirm everything is as expected. Such encouragement allows the summer intern, who may not realize the need to add an image to an article, to check the preview to spot a gap.

Remember that previews should never be your first line of defense against quality problems. Unfortunately, that’s often how previews are used: to catch problems that were invisible to authors and designers when developing the content or the design.

Previews can be unnecessary 

Previews aren’t really necessary when writers create routine content that’s presented the same way each time. Writers shouldn’t need to do a visual check of their writing and won’t feel the need to do so provided their systems are set up properly to support them. They should be able to see and correct issues in their immediate work environment rather than toggling to a preview. Content should align with the design automatically. It should just work.

In most cases, it’s a red flag if writers must check the visual appearance of their work to determine if they have written things correctly. The visual design should accommodate the information and messages rather than expect them to adapt to the design. Any constraints on available space should be predefined rather than having writers discover in a preview that the design doesn’t permit enough space. Writers shouldn’t be responsible for ensuring the design can display their content properly.

The one notable exception is UX writing, where the context in which discrete text strings appear can sometimes shape how the wording needs to be written. UX writing is unique because the content is highly structured but infrequently written and revised, meaning that writers are less familiar with how the content will display. For less common editorial design patterns, previews help ensure the alignment of text and widgets. However, authors shouldn’t need previews routinely for highly repetitive designs, such as those used in e-commerce.

None of the above is to say a preview shouldn’t be available; only that standard processes shouldn’t rely on checking the preview. If standard content tasks require writers to check the preview, the CMS setup is not adequate. 

Previews can be a crutch 

Previews are a crutch when writers rely on them to catch routine problems with how the content is rendered. They become a risk management tool and force writers to play the role of risk manager. 

Many CMSs have clunky, admin-like interfaces that authors have trouble using. Vendors, after all, win tenders by adding features to address the RFP checklist, and enterprise software is notorious for its bad usability (conferences are devoted to this problem).  The authoring UI becomes cluttered with distracting widgets and alerts.  Because of all the functionality, vendors use “ghost menus” to keep the interface looking clean, which is important for customer demos. Many features are hidden and thus easy for users to miss, or they’ll pop up and cover over text that users need to read.  

The answer to the cluttered UI or the phantom menus is to offer previews. No matter how confusing the experience of defining the content may be within the authoring environment, a preview will provide a pristine view of how the content will look when published.  If any problems exist, writers can catch them before publication. If problems keep happening, it becomes the writer’s fault for not checking the preview thoroughly and spotting the issue.

At its worst, vendors promote previews as the solution to problems in the authoring environment. They conclude writers, unlike their uncomplaining admin colleagues, aren’t quite capable enough to use UIs and need to see the visual appearance. They avoid addressing the limitations of the authoring environment, such as:

  • Why simple tasks take so many clicks 
  • Why the UI is so distracting that it is hard to notice basic writing problems
  • Why it’s hard to know how long text should be or what dimensions images should have

Writers deserve a “focus” mode in which secondary functionality is placed in the background while writers do essential writing and editing tasks. But previews don’t offer a focus mode – they take writers away from their core tasks. 

Previews can cause follow-on problems

Previews can become a can of worms when authors use them to change things that impact other teams. The preview becomes the editor and sometimes a design tool. Unfortunately, vendors are embracing this trend.

Potential problems compound when the preview is used not simply to check for mistakes but as the basis for writing decisions, which can happen when:

  1. Major revisions happen in previews
  2. Writers rely on previews to change text in UI components 
  3. Writers expect previews to guide how to write content appearing in different devices and channels 
  4. Writers use previews to change content that appears in multiple renderings
  5. Writers use previews to change the core design substantially and undermine the governance of the user experience 

Pushing users to revise content in previews. Many vendors rely on previews to hide usability problems with the findability and navigation of their content inventory. Users complain they have difficulty finding the source content that’s been published and want to navigate to the published page to make edits. Instead of fixing the content inventory, vendors encourage writers to directly edit in the preview. 

Editing in a preview can support small corrections and updates. But editing in previews creates a host of problems when used for extensive revisions or multi-party edits, because the authoring interface functionality is bypassed. These practices change the context of the task. Revisions are no longer part of a managed workflow. Previews don’t display field validation or contextual cues about versioning and traceability. It’s hard to see what changes have been made, who has made them, or where assets or text items have come from. Editing in context undermines content governance.

Relying on previews to change text in UI components. Previews become a problem when they don’t map to the underlying content. More vendors are promoting what they call “hybrid” CMSs (a multi-headed hydra) that mix visual UI components with content-only components – confusingly, both are often called “blocks.” Users don’t understand the rendering differences in these different kinds of components. They check the preview because they can’t understand the behavior of blocks within the authoring tool. 

When some blocks have special stylings and layouts while others don’t, it’s unsurprising that writers wonder if their writing needs to appear in a specific rendering. Their words become secondary to the layout, and the message becomes less important than how it looks. 

Expecting previews to guide how to write content appearing in different devices and channels. A major limitation of previews occurs when they are relied upon to control content appearing in different channels or sites. 

In the simplest case, the preview shows how content appears on different devices. It may offer a suggestive approximation of the appearance but won’t necessarily be a faithful rendering of the delivered experience to customers. No one, writers especially, can rely on these previews to check the quality of the designs or how content might need to change to work with the design.

Make no mistake: how content appears in context in various channels matters. But the place to define and check this fit is early in the design process, not on the fly, just before publishing the content. Multi-channel real-time previews can promote a range of bad practices for design operations.

Using previews to change content that appears in multiple renderings. One of the benefits of a decoupled design is that content can appear in multiple renderings. Structured writing interfaces allow authors to plan how content will be used in various channels. 

We’ve touched on the limitations of previews of multiple channels already.  But consider how multi-channel previews work with in-context editing scenarios.  Editing within a preview will  focus on a single device or channel and won’t highlight that the content supports multiple scenarios. But any editing of content in one preview will influence the content that appears in different sites or devices. This situation can unleash pandemonium.

When an author edits content in a preview but that content is delivered to multiple channels, the author has no way of knowing how their changes to content will impact the overall design. Authors are separated from the contextual information in the authoring environment about the content’s role in various channels. They can’t see how their changes will impact other channels.

Colleagues may find content that appears in a product or website they support has been changed without warning by another author who was editing the content in a preview of a different rendering, unaware of the knock-on impact. They may be tempted to use the same preview editing functionality to revert to the prior wording. Because editing in previews undermines content governance, staff face an endless cycle of “who moved my cheese” problems. 

Using previews to substantially change the core design. Some vendors have extended previews to allow not just the editing of content but also the changing of UI layout and design. The preview becomes a “page builder” where writers can decide the layout and styling themselves. 

Unfortunately, this “enhancement” is another example of “kicking the can” so that purported benefits become someone else’s problem. It represents the triumph of adding features over improving usability.

Writers wrest control over layout and styling decisions that they dislike. And developers celebrate not having to deal with writers requesting changes. But page building tries to fix problems after the fact. If the design isn’t adequate, why isn’t it getting fixed in the core layout? Why are writers trying to fix design problems?

Previews as page builders can generate many idiosyncratic designs that undermine UX teams. UI designs should be defined in a tool like Figma, incorporated in a design system, and implemented in reusable code libraries available to all. Instead of helping design systems mature and promoting design consistency, page builders hurt brand consistency and generate long-term technical debt.

Writers may have legitimate concerns about how the layout has been set up and want to change it. Page builders aren’t the solution. Instead, vendors must improve how content structure and UI components interoperate in a genuinely decoupled fashion. Every vendor needs to work on this problem.

Some rules of thumb

  • Previews won’t fix significant quality problems.
  • Previews can be useful when the content involves complex visual layouts in certain situations where content is infrequently edited. They are less necessary for loosely structured webpages or frequently repeated structured content.
  • The desire for previews can indicate that the front-end design needs to be more mature. Many design systems don’t address detailed scenarios; they only cover superficial, generic ones. If content routinely breaks the design, then the design needs refinement.
  • Previews won’t solve problems that arise when mixing a complex visual design with highly variable content. They will merely highlight them. Both the content model and design system need to become more precisely defined.
  • Previews are least risky when limited to viewing content and most risky when used to change content.
  • Preview issues aren’t new, but their role and behavior are changing. WYSIWYG desktop publishing metaphors that web CMS products adopted don’t scale. Don’t assume what seems most familiar is necessarily the most appropriate solution.

– Michael Andrews


Orchestrating the assembly of content

Structured content enables online publishers to assemble pieces of content in multiple ways.  However, the process by which this assembly happens can be opaque to authors and designers. Read on to learn how orchestration is evolving and how it works.

To many people, orchestration sounds like jargon or a marketing buzzword. Yet orchestration is no gimmick. It is increasingly vital to developing, managing, and delivering online content. It transforms how publishers make decisions about content, bringing flexibility and learning to a process hampered in the past by short-term planning and jumbled, ad-hoc decisions.  

Revealing the hidden hand of orchestration

Orchestration is both a technical term in content management and a metaphor. Before discussing the technical aspects of orchestration, let’s consider the metaphor.  Orchestration in music is how you translate a tune into a score that involves multiple instruments that play together harmoniously. It’s done by someone referred to as an arranger, someone like Quincy Jones. As the New Yorker once wrote: “Everyone knows Quincy Jones’s name, even if no one is quite sure what he does. Jones got his start in the late nineteen-forties as a trumpeter, but he soon mastered the art of arranging jazz—turning tunes and melodies into written music for jazz ensembles.”

Much like music arranging, content orchestration happens off stage, away from the spotlight. It doesn’t get the attention given to UI design. Despite its stealthy profile, numerous employees in organizations become involved with orchestration, often through small-scale A/B testing by changing an image or a headline. 

Orchestration typically focuses on minor tweaks to content, often cosmetic changes. But orchestration can also address how to assemble content on a bigger scale. The emergence of structured content makes intricate, highly customized orchestration possible.

Content assembly requires design and a strategy. Few people consider orchestration when planning how content is delivered to customers. They generally plan content assembly by focusing on building individual screens or a collection of web pages on a website. The UI design dictates the assembly logic and reflects choices made at a specific time.  While the logic can change, it tends to happen only in conjunction with changes to the UI design. 

Orchestration allows publishers to specify content assembly independently of its layout presentation. It does so by approaching the assembly process abstractly: evaluating content pieces’ roles and purposes that address specific user scenarios.

Assembly logic is becoming distributed. Content assembly logic doesn’t happen in one place anymore. Originally, web teams created content for assembly into web pages using templates defined by a CMS on the backend. In the early 2000s, frontend developers devised ways to change the content of web pages presented in the browser using an approach known initially as Ajax, a term coined by the information architect Jesse James Garrett. Today, content assembly can happen at any stage and in any place. 

Assembly is becoming more sophisticated. At first, publishers focused on selecting the right web page to deliver. The pages were preassembled – often hand-assembled. Next, the focus shifted to showing or hiding parts of that web page by manipulating the DOM (document object model).  

Nowadays, content is much more dynamic. Many web pages, especially in e-commerce, are generated programmatically and have no permanent existence. “Single page applications” (SPAs) have become popular, and their content morphs continuously.

The need for sophisticated approaches for assembling content has grown with the emergence of API-accessible structured content. When content is defined semantically, rather than as web pages, the content units are more granular. Instead of simply matching a couple of web page characteristics, such as a category tag and a date, publishers now have many more parameters to consider when deciding what to deliver to a user.

Orchestration logic is becoming decoupled from applications. While orchestration can occur within a CMS platform, it is increasingly happening outside the CMS to take advantage of a broader range of resources and capabilities. With APIs playing a growing role in coordinating web content, much content assembly now occurs in a middle layer between the back-end storing the content and the front-end presenting it. The logic driving assembly is becoming decoupled from both the back-end and front-end.

Publishers have a growing range of options outside their CMS for deciding what content to deliver.  Tools include:

  • Digital experience, composition, and personalization orchestration engines (e.g., Conscia, Ninetailed)
  • Graph query tools (e.g., PoolParty)
  • API federation management tools (e.g., Apollo Federation)

These options vary in their aims and motivations, and they differ in their implementations and features. Their capabilities are sometimes complementary, which means they can be used in combination. 

Orchestration inputs that frame the content’s context

Content structuring supports extensive variation in the types of content to present and what that content says. 

Orchestration involves more than retrieving a predefined web page.  It requires considering many kinds of inputs to deliver the correct details. 

Content orchestration will reflect three kinds of considerations:

  1. The content’s intent – the purpose of each content piece
  2. The organization’s operational readiness to satisfy a customer’s need
  3. The customer or user’s intent – their immediate or longer-term goal

Content characteristics play a significant role in assembly. Content characteristics define variations among and within content items. An orchestration layer will account for characteristics of available content pieces, such as:

  • Its editorial role and purpose, such as headings, explanations, or calls to action
  • Topics and themes, including specific products or services addressed
  • Intended audience or customer segment
  • Knowledge level such as beginner or expert
  • Intended journey or task stage
  • Language and locale
  • Date of creation or updating
  • Author or source
  • Size, length, or dimensions
  • Format and media
  • Campaign or announcement cycle
  • Product or business unit owner
  • Location information, such as cities or regions that are relevant or mentioned
  • Version 

Each of these characteristics can be a variable and useful when deciding what content to assemble. They indicate the compatibility between pieces and their suitability for specific contexts.

Other information in the enterprise IT ecosystem can help decide what content to assemble that will be most relevant for a specific context of use. This information is external to the content but relevant to its assembly.

Business data is also an orchestration input. Content addresses something a business offers. The assembled content should link to business operations to reflect what’s available accurately.

The assembled content will be contextually relevant only if the business can deliver to the customer the product or services that the content addresses. Customers want to know which pharmacy branches are open now or which items are available for delivery overnight.  The assembled content must reflect what the business can deliver when the customer seeks it.

The orchestration needs to combine content characteristics from the CMS with business data managed by other IT systems. Many factors can influence what content should be presented, such as:

  • Inventory management data
  • Bookings and orders data
  • Facilities’ capacity or availability
  • Location hours
  • Pricing information, promotions, and discount rules
  • Service level agreement (SLA) rules
  • Fulfillment status data
  • Event or appointment schedules
  • Campaigns and promotions schedule
  • Enterprise taxonomy structure defining products and operating units

Business data have complex rules managed by the IT system of record, not the CMS or the orchestration layer.  For content orchestration, sometimes it is only necessary to provide a “flag,” checking whether a condition is satisfied to determine which content option to show.
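
For instance, rather than replicating inventory or scheduling records, the orchestration layer might receive only a compact set of flags derived from those systems. A hypothetical sketch:

{
  "branchId": "pharmacy-12",
  "openNow": true,
  "overnightDeliveryAvailable": false,
  "promotionActive": true
}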

Customer context is the third kind of orchestration input. Ideally, the publisher will tailor the content to the customer’s needs – the aim of personalization.  The orchestration process must draw upon relevant known information about the customer: the customer’s context.

The customer context encompasses their identity and their circumstances. A customer’s circumstances can change, sometimes in a short time.  And in some situations, the customer’s circumstances dictate the customer’s identity. People can have multiple identities, for example, as consumers, business customers at work, or parents overseeing decisions made by their children.

Numerous dimensions will influence a customer’s opinions and needs, which in turn will influence the most appropriate content to assemble. Some common customer dimensions include:

  • Their location
  • Their personal characteristics, which might include their age, gender, and household composition, especially when these factors directly influence the relevance of the content, for example, with some health topics
  • Things they own, such as property or possessions, especially for content relating to the maintenance, insurance, or buying and selling of owned things
  • Their profession or job role, especially for content focused on business and professional audiences
  • Their status as a new, loyal, or churned customer
  • Their purchase and support history

The chief challenge in establishing the customer context is having solid insights.  Customers’ interactions on social media and with customer care provide some insights, but publishers can tap a more extensive information store.  Various sources of customer data could be available:

  • Self-disclosed information and preferences to the business (zero-party data or 0PD)
  • The history of a customer’s interactions with the business (first-party data or 1PD) 
  • Things customers have disclosed about themselves in other channels such as social media or survey firms (second-party data or 2PD)
  • Information about a cohort they are categorized as belonging to, using aggregated data originating from multiple sources (third-party data or 3PD)

Much of this information will be stored in a customer data platform (CDP), but other data will be sourced from various systems.  The data is valid only to the extent it is up-to-date and accurate, which is only sometimes a safe assumption.
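
Pulled together, the customer context passed to an orchestration layer might resemble the following sketch, combining zero-party and first-party data (all values invented):

{
  "customerId": "customer-7731",
  "segment": "existing customer",
  "locale": "en-GB",
  "declaredPreferences": { "contactChannel": "email", "interests": [ "home insurance" ] },
  "interactionHistory": { "lastPurchase": "2024-02-18", "openSupportTickets": 1 }
}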

Content behavior can shape the timing and details assembled in orchestration. Users signal their intent through the decisions they make while interacting with content. Some behavior variables include:

  • Source of referral 
  • Previously viewed content 
  • Expressed interest in topics or themes based on prior content consumed
  • Frequency of repeat visits 
  • Search terms used 
  • Chatbot queries submitted
  • Subscriptions chosen or online events booked
  • Downloads or requests for follow-up information
  • The timing of their visit in relation to an offer 

The most valuable and reliable signals will be specific to the context. Many factors can shape intent, so many potential factors will not be relevant to individual customers. Just because some factors could be relevant in certain cases does not imply they will be applicable in most cases. 

Though challenging, leveraging customer intent offers many opportunities to improve the relevance of content. A rich range of possible dimensions is available. Selecting the right ones can make a difference. 

Don’t rely on weak signals to overdetermine intent. When the details about individual content behavior or motivations are scant, publishers sometimes rely on collective behavioral data to predict individual customer intentions.  While occasionally useful, predictive inputs about customers can be based on faulty assumptions that yield uneven results. 

Note the difference between tailoring content to match an individual’s needs and the practice of targeting. Targeting differs from personalization because it aims to increase average uptake rather than satisfy individual goals. It can risk alienating customers who don’t want the proffered content.

Draw upon diverse sources of input. By utilizing a separate layer to manage orchestration, publishers, in effect, create a virtual data tier that can federate and assess many distinct and independent sources of information to support decisions relating to content delivery. 

An orchestration layer gives publishers greater control over choosing the right pieces of content to offer in different situations. Publishers gain direct control over parameters to select,  unlike many AI-powered “decision engines” that operate like a black box and assume control over the content chosen.

The orchestration score

If the inputs are the notes in orchestration, the score is how they are put together – the arrangement. A rich arrangement will sometimes be simple but often will be sophisticated. 

Orchestration goes beyond web search and retrieval. In contrast to an ordinary web search, which retrieves a list of relevant web pages, orchestration queries must address many more dimensions.

In a web search, there’s a close relationship between what is requested and what is retrieved. Typically, only a few terms need matching. Web search queries are often loose, and the results can be hit or miss. The user is both specifying and deciding what they want from the results retrieved.

In orchestration, what is requested needs to anticipate what will be relevant and exclude what won’t be. The request may refer to metadata values or data parameters that aren’t presented in the content that’s retrieved. The results must be more precise. The user will have limited direct input into the request for assembled content and limited ability to change what is provided to them.

Unlike a one-shot web search process, in orchestration, content assembly involves a multistage process.  

The orchestration of structured content is not just choosing a specific web page based on a particular content type.  It differs in two ways:

  1. You may be combining details from two (or more) content types.  
  2. Instead of delivering a complete web page associated with each content type (and potentially needing to hide parts you don’t want to show), you select specific details from content items to deliver as an iterative procedure.

Unpacking the orchestration process. Content orchestration consists of three stages:

  1. FIND stage: Choose which content items have relevant material to support a user scenario
  2. MATCH stage: Combine content types that, if presented together, provide a meaningful, relevant experience
  3. SELECT and RETURN stage: Choose which elements within the content items will be most relevant to deliver to a user at a given point in time

Find relevant content items. Generally, this involves searching metadata tags such as taxonomy terms or specific values such as dates. Sometimes, specific words in text values are sought. If we have content about events, and all the event descriptions have a field with the date, it is a simple query to retrieve descriptions for events during a specified time period.

Typically, a primary content type will provide most of the essential information or messages. However, we’ll often also want to draw on information and messages from other content types to compose a content experience. We must associate different types of items to be able to combine their details.

Match companion content types. What other topics or themes will provide more context to a message? The role of matching is to associate related topics or tasks so that complementary information and messages can be included together.

Graph queries are a powerful approach to matching because they allow one to query “edges” (relationships) between “nodes” (content types). For example, if we know a customer is located in a specific city, we might want to generate a list of sponsors of events happening in that city. The event description will have a field indicating the city. It will also reference another content type that provides a profile of event sponsors. It might look like this in a graph query language like GQL, with the content types in round brackets and the relationships in square brackets.

MATCH (:Event WHERE location="My City")-[:SponsoredBy]->(:SponsorProfile)

We have filtered events in the customer’s city (fictitiously named My City) and associated content items about sponsors who have sponsored those events. Note that this query hasn’t indicated what details to present to users. It only identifies which content types would be relevant so that various types of details can be combined.

Unlike in a common database query, what we are looking for and want to show are not the same. 

Select which details to assemble. We need to decide which details within a relevant content type will be of greatest interest to a user. Customers want enough details for the pieces to provide meaningful context. Yet they probably won’t want to see everything available, especially all at once – that’s the old approach of delivering preassembled web pages and expecting users to hunt for relevant information themselves.

Different users will want different details, necessitating decisions about which details to show. This stage is sometimes referred to as experience composition because the focus is on which content elements to deliver. We don’t have to worry about how these elements will appear on a screen, but we will be thinking about what specific details should be offered.

GraphQL, a query language used in APIs, is very direct in allowing you to specify what details to show. The GraphQL query mirrors the structure of the content so that one can decide which fields to show after seeing which fields are available. We don’t want to show everything about a sponsor, just their name, logo, and how long they’ve been sponsoring the event.  A hypothetical query named “local sponsor highlights” would extract only those details about the sponsor we want to provide in a specific content experience.

query LocalSponsorHighlights {
  # the enclosing field depends on how the schema exposes sponsor content
  sponsors {
    ... on SponsorProfile {
      name
      logo
      sponsorSince
    }
  }
}
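
Because a GraphQL response mirrors the shape of the query, the assembled details come back as JSON ready for the front end to present. A sketch of what the hypothetical query above might return (values invented):

{
  "data": {
    "sponsors": [
      {
        "name": "Example Coffee Co.",
        "logo": "https://example.com/logos/example-coffee.png",
        "sponsorSince": "2019"
      }
    ]
  }
}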

The process of pulling out specific details will be repeated iteratively as customers interact with the content.

Turning visions into versions

Now that we have covered the structure and process of orchestration, let’s look at its planning and design. Publishers enjoy a broad scope for orchestrating content. They need a vision for what they aim to accomplish. They’ll want to move beyond the ad hoc orchestration of page-level optimization and develop a scenario-driven approach to orchestration that’s repeatable and scalable.

Consider what the content needs to accomplish. Content can have a range of goals. It can explicitly encourage a reader to do something immediately or in the future. Or it can implicitly encourage a reader’s behavior by showing goodwill and being helpful enough that the customer wants to act without being told what to do.

| Content goal | Immediate (action outcome) | Consequent (stage outcome) |
| --- | --- | --- |
| Explicit (stated in the content) | CTA (call to action) conversion | Contact sales or visit a retail outlet |
| Implicit (encouraged by the content) | Resolve an issue without contacting customer support | Renew their subscription |

Content goals must be congruent with the customer’s context. If customers have an immediate goal, then the content should be action-oriented. If their goal is longer-term, the content should focus on helping the customer move from one stage to another.

Orchestration will generate a version of the content representing the vision of what the pieces working together aim to accomplish.

Specify the context.  Break down the scenario and identify which contextual dimensions are most critical to providing the right content. The content should adapt to the user context, reflect the business context, and provide users with viable options. The context includes:

  • Who is seeking content (the segment, especially when the content is tailored for new or existing customers, or businesses or consumers, for example)
  • What they are seeking (topics, questions, requests, formats, and media)
  • When they are seeking it (time of day, day of the week, month, season, or holiday, all can be relevant)
  • Where they are seeking it (region, country, city, or facility such as an airport if relevant)
  • Why (their goal or intent as far as can be determined)
  • How (where they started their journey, channels used, how long they have been pursuing the task)

Perfecting the performance: testing and learning

Leonard Bernstein conducts the New York Philharmonic in a Young People’s Concert. Image: Library of Congress


An orchestral performance is perfected through rehearsal. The performance realized is a byproduct of practice and improvisation.

Pick the correct parameters. With hundreds of parameters that could influence the optimal content orchestration, it is essential that teams not lock themselves into a few narrow ones. The learning will arise from determining which factors deliver the right experience and results in which circumstances. 

Content parameters can be of two kinds:

  1. Necessary characteristics tell us what values are required 
  2. Contingency characteristics indicate values to try to find which ones work best

| | Specifies in the orchestration | Determines in the content | Outcome expected |
| --- | --- | --- | --- |
| Necessary characteristics (tightly defined scenarios) | What values are required in a given situation | Which categorical version or option the customer gets | The right details to show to a given customer in a given situation |
| Contingency characteristics (loosely defined scenarios) | What values are allowed in a given situation | Which versions could be presented | Candidate options to present to learn which most effectively matches the customer’s needs |

The two approaches are not mutually exclusive. More complex orchestration (sometimes referred to as “multihop” queries) will involve a combination of both approaches.

Necessary characteristics reflect known and fixed attributes of the customer or business context that will affect the correct content to show. For example, if the customer has a particular characteristic, then a specific content value must be shown. The goal should be to test that the orchestration is working correctly – that the assumptions about the context are correct, with none wrong or missing. This dimension is especially important for aspects that are fixed and non-negotiable. The content needs to adapt to these circumstances, not ignore them.

Contingency characteristics reflect uncertain or changeable attributes relating to the customer’s context. For example, if the customer has had any one of several characteristics now or in the past, try showing any one of several available content values to see which works best given what’s known. The orchestration will prioritize variations randomly or in some ranked order based on what’s available to address the situation.

You can apply the approach to other situations involving uncertainty. When there are information gaps or delays, contingency characteristics can apply to business operations variables and to the content itself.  The goal of using contingency characteristics is to try different content versions to learn what’s most effective in various scenarios.  

Be clear on what content can influence. We have mostly looked at the customer’s context as an input into orchestration. Customers will vary widely in their goals, interests, abilities, and behaviors. A large part of orchestration concerns adapting content to the customer’s context. But how does orchestration impact the customer? In what ways might the customer’s context be the outcome of the content?

Consider how orchestration supports a shift in the customer’s context. Orchestration can’t change the fixed characteristics of the customer. It can sway ephemeral characteristics, especially content choices, such as whether the customer has requested further information.  And the content may guide customers toward a different context. 

Context shifting involves using content to meet customers where they are so they can get where they want to be. Much content exists to change the customer’s context by enabling them to resolve a problem or encouraging them to take action on something that will improve their situation. 

The orchestration of content needs to connect to immediate and downstream outcomes.  Testing orchestration entails looking at its effects on online content behavior and how it influences interactions with the business in other areas. Some of these interactions will happen offline.  

The task of business analytics is to connect orchestration outputs with customer outcomes. The migration of orchestration to an API layer should open more possibilities for insights and learning. 

– Michael Andrews