The Benefits of Hacking Your Own Content

How can content strategy help organizations break down the silos that bottle up their content?  The first move may be to encourage organizations to hack their own content.

Silos are the villains of content strategists. To slay the villain, the hero or heroine must follow three steps to enlightenment:

  1. Transcend organizational silos that hinder the coordination and execution of content
  2. Adopt an omnichannel approach that provides customers with content wherever and however they need it, so that they aren’t hostage to incoherent internal organizational processes and separately managed channels that fragment their journey and experience
  3. Reuse content across the organization to achieve a more cost-effective and revenue-enhancing utilization of content

The path that connects these steps is structured content. Each of these rationales is a powerful argument to change fractured activities.  Taken together, they form a compelling motivation to de-silo content.

“Content silo trap: Situation created by authors working in isolation from other authors within the organization. Walls are erected among content areas and even within content areas, which leads to content being created and recreated and recreated, often with changes or differences in each iteration.” Ann Rockley and Charles Cooper, in Managing Enterprise Content: A Unified Content Strategy.

The definition of a content silo trap emphasizes the duplication of effort. But the problems can manifest in other ways. When groups don’t share content with each other, the result is a divide between the content haves and the have-nots. Those who must create content with finite resources need to prioritize what content to create. They may forgo providing their target audiences with content relating to a facet of a topic, if it involves more work than the staff available can handle. Often organizational units devote most of their time to revising existing content rather than creating new content, so what they offer to audiences is highly dependent on what they already have. Even when it seems like a good idea to incorporate content related to one’s own area of responsibility that’s being used elsewhere, it can be difficult to get it in a timely manner. It may not be clear whether it is worth the effort to reproduce this content oneself.

What Silos Look Like from the Inside

Let’s imagine a fictional company that serves two kinds of customers: consumers, and businesses.  The products that the firm offers to consumers and businesses are nearly identical, but are packaged differently, with slightly different prices, sales channels, warranties, etc.  Importantly, the consumer and B2B businesses are run as separate operating units, each responsible for their own expenses and revenues.  The consumer unit has a higher profit margin and is growing faster, and decided a couple of years ago to upgrade its CMS to a new system that’s not compatible with the legacy system the entire company had used.  The B2B division is still on the old CMS, hoping to upgrade in the near future.

A while ago, a product manager in the B2B division asked her counterpart in the consumer division if she’d be able to get some of the punchy creative copy that the consumer division’s digital agency was producing.  It seemed like it could enhance the attractiveness of the B2B offering as well.   Obviously only parts were relevant, but the product manager asked to receive the consumer product copy as it was being produced, so it could be incorporated into the B2B product pages.  After some discussion, the consumer division product manager realized that sharing the content involved too much work for his team.  It would suck up valuable time from his staff, and hinder his team’s ability to meet its objectives.  In fact, making the effort to do the laborious work of sending each item of content on a regular basis wouldn’t bring any tangible benefit to his team’s performance metrics.

This scenario may seem like a caricature of a dysfunctional company.  But many firms face these kinds of internal frictions, even if the most prevalent cases happen more subtly.

Many organizations know on a visceral level that silos are a burden and hinder their capability to serve customers and grow revenues. But they may not have a vivid understanding of what specific frictions exist, and the costs associated with these frictions. Sometimes they’ve outlined a generic high-level business case for adopting structured content across their organization that talks in terms of big themes such as delivery to mobile devices and personalization.  But they often don’t have a granular understanding of what exact content to prioritize for structuring.

The Dilemma of Moving to Structured Content

Many organizations that try to adopt structured content in a wholesale manner find the process more involved than they anticipated. It can be complex and time-consuming, involving much organizational process change, and can seem to jeopardize their ability to meet other, more immediate goals. Some early, earnest attempts at structured content failed when the enthusiasm for a game-changing future collided with the enormity of the task. De-siloing projects also run the risk of being ruthlessly de-scoped and scaled back, to the point where the original goal loses its potency. When the effort involved comes to the foreground, the benefits may seem abstract and distant, receding to the background. Consultant Joe Pairman speaks about “structured content management project failure” as a problem that arises when the expectations driving the effort are fuzzy.

Achieving a unified content strategy based on coordinated, structured content involves a fundamental dilemma.  Firms  with the most organizational complexity and that stand to benefit most are the ones that have the most silos to overcome.  They frequently have the most difficulty transitioning to a unified structured content approach.  The more diverse your content, the more challenging it is to do a total redesign of it based on modular components.

“The big bang approach can be difficult,” Rebecca Schneider, President of Azzard Consulting, noted during the panel discussion [at the Content Strategy Applied conference]. “But small successes can yield broad results,” according to a Content Science blog post.

Content Hacking as an Alternative to Wholesale Restructuring

If wholesale content restructuring is difficult to do quickly in a complex organization, what is the alternative?  One approach is to borrow ideas from the Create Once, Publish Everywhere (COPE) paradigm by using APIs to get content to more places.

Over the past two years, a number of new tools have emerged that make shifting content easier.  First, there are simple web scraping tools, some browser-based, that can lift content from sections of a page.  Second, there are build-your-own API services such as IFTTT and Zapier that require little or no programming knowledge.

Particularly interesting are newer services such as Import.IO and Kimono that combine web scraping with API creation. Both these services suggest that programming is not required, though the services of a competent developer are useful to get their full benefits. Whereas previously developers needed to hand-code using, say, PHP, to scrape a web page and then translate these results into an API, now much of this background work can be done by third-party services. That means that scraping and republishing content is now easier, faster and cheaper. This opens up new applications.

Screenshots of Kimono (via Kimono Labs)

Lowering the Barriers to Sharing Content

The goal for the B2B division product manager is to be able to reuse content from the consumer division without having to rely on that division’s staff, or on access to their systems.  Ideally, she wants to be able to scrape the parts she needs, and insert them in her content.  Tools that combine web scraping and API creation can help.

Generic process of web scraping/content extraction and API tools

The process for scraping content involves highlighting sections of pages you want to scrape, labeling these sections, then training the scraper to identify the same sorts of items on related pages you want to scrape.  The results are stored in a simple database table.  These results are then available to an API that can be created to pull elements and insert them onto other pages.  The training can sometimes be fiddly, depending on the original content characteristics.  But once the content is scraped, it can be filtered and otherwise refined (such as given a defined data type) before republishing.  The API can specify what content to use and its source in a range of coding languages compatible with different content delivery set-ups.
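The label, store, and serve steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a hand-written rule (matching a CSS class) in place of the trained scraper that services such as Kimono provide, and the class names, field labels, URL, and sample page are all hypothetical. It also assumes each labeled element contains plain text with no nested tags.

```python
import json
from html.parser import HTMLParser

# Hypothetical mapping from a CSS class on the source page to the label
# the reusing team assigns to that element.
LABELS = {"product-name": "name", "tagline": "tagline", "blurb": "description"}


class LabeledScraper(HTMLParser):
    """Collects text from elements whose class attribute matches a label."""

    def __init__(self, labels):
        super().__init__()
        self.labels = labels
        self.current = None  # label of the element being read, if any
        self.row = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.labels:
            self.current = self.labels[css_class]
            self.row[self.current] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.row[self.current] += data.strip()

    def handle_endtag(self, tag):
        # Assumes labeled elements are flat (no nested tags inside them).
        self.current = None


def scrape(pages):
    """Return one row per page: a stand-in for the simple database table."""
    table = []
    for url, html in pages.items():
        parser = LabeledScraper(LABELS)
        parser.feed(html)
        parser.close()
        table.append({"source": url, **parser.row})
    return table


def api_response(table, fields):
    """Serve only the requested fields, plus the source, as JSON."""
    rows = [{k: r[k] for k in ("source", *fields) if k in r} for r in table]
    return json.dumps(rows)


# Hypothetical consumer-division page, inlined so the sketch runs offline.
PAGES = {
    "https://example.com/consumer/widget": (
        '<h1 class="product-name">Widget</h1>'
        '<p class="tagline">Small but mighty.</p>'
        '<p class="legal">Terms apply.</p>'
    ),
}

table = scrape(PAGES)
print(api_response(table, ["name", "tagline"]))
```

The reusing team decides which elements matter (here the name and tagline, but not the legal line) without touching the source system, which is the point of the approach.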

The scrape + API approach mimics some of the behavior of structured content.  The party needing the content identifies what they need, and essentially tags it.  They define the meaning of specific elements.   (The machine learning in the background still needs the original source to have some recognizable, repeating markup or layout to learn the elements to scrape, even if it doesn’t yet know what the elements represent.)

While a common use case would be scraping content from another organizational unit, it might also have applications to reuse content within one’s own organizational unit.  If a unit publishing content doesn’t have well-defined content themselves, they are likely having trouble reusing their own content in different contexts.  They may want to reuse elements for content that address different stages of a customer journey, or different audience variations.

Benefits of Content Hacking

This approach can benefit a party that needs to use content published elsewhere in the organization.  It can help bridge organizational silos, technical silos, and channel silos that customers encounter when accessing content.  The approach can even be used to jump across the boundaries that separate different firms.  The creators of Import.IO, for example, are targeting app developers who make price comparison apps.  While scraping and republishing other firms’ content without permission may not be welcomed, there could be cases where two firms agree to share content as part of a joint business project, and a scraping + API approach could be a quick and pragmatic way to amplify a common message.

As a fast, cheap, and dirty method, the scrape + API approach excels at highlighting what content problems need to be solved in a more rigorous way, with true content structuring and a common, well-defined governance process.  One of the biggest hurdles to adopting a unified, structured approach to content is knowing where to start, and knowing what the real value of the effort will be.  By prototyping content reuse through a scrape + API approach, organizations can get tangible data on the potential scope and utilization of content elements.  APIs make it possible for content elements to be sprinkled in different contexts.  One can test if content additions enhance outcomes: for example, driving more conversions. One can A/B test content with and without different elements to learn their value to different segments in different scenarios.

Ultimately, prototyping content reuse can provide a mapping of what elements should be structured, and prioritize when to do that.  It can identify use cases where content reuse (and supporting content structure) is needed, which can be associated with specific audience segments (revenue-generating customers) and internal organizational sponsors (product owners).

Why Content Hacking is a Tactic and not a Strategy

If content hacking sounds easy, then why bother with a more methodical and time-consuming approach to formal content structuring?  The answer is that though content hacking may provide short-term benefits, it can be brittle — it’s a duct tape fix.  Relying on it too much can eventually cause issues.  It’s not a best practice: it’s a tactic, a way to use “lean” thinking to cut through the Gordian knot of siloed content.

Content hacking may not be efficient for content that needs frequent, quick revision, since it needs to go through extra steps of being scraped and stored. It also may not be efficient if multiple parties need the same content but want to do different things with the content, since a single API might not serve all stakeholder needs. Unlike semantically structured content, scraped content doesn’t enable semantic manipulation, such as the advanced application of business logic against metadata, or detailed analytics tracking of semantic entities. And importantly, even a duct tape approach requires coordination between the content producer and the person who reuses the content, so that the party reusing content doesn’t get an unwelcome surprise concerning the nature and timing of content available.

But as a tactic, content hacking may provide the needed proof of value for content reuse to get your organization to embark on dismantling silos and embracing a unified approach.

— Michael Andrews

Four approaches to content reuse

How organizations approach reusing content impacts their publishing efficiency, and their ability to serve audience needs. Four distinct approaches to content reuse exist, each of which focuses on different goals. Due to specialization in the content profession, content professionals may be familiar with only some content reuse approaches. To support broader organizational objectives effectively, content strategists should become familiar with all four alternative approaches to reuse, since each offers unique benefits.

Why content reuse matters

While content reuse is a topic of active discussion in the content profession, no one definition for content reuse adequately captures its various meanings. In practice, there are four distinct types of content reuse:

  • Ad hoc reuse of assets
  • The planned reuse of content components
  • Enabling reuse of content across channels
  • Selective reuse through adaptive content

Nearly everyone agrees reusing content is a good thing. Content professionals sometimes invoke the phrase “single sourcing” to suggest the notion that one “source” can serve all needs, both internally and for audiences. What is being reused, exactly? Is the source a database? A file? A finished piece of content?

Many different specialties work with content. Each specialty is working to solve an aspect of reuse and will tend to promote its approach as a solution to the core problems associated with poor content reuse. But specialists are not always aware of the larger-picture needs of complex organizations or multidimensional audiences. Solution advocacy can sometimes create its own silo problems!

When discussing content reuse, it is important to distinguish between reusing as-is content, recycling (repurposing) content, and providing on-demand, customized content. Is the source granular or whole? For example, is the source a whole video recording, or a collection of video snippets? Is the source a document, or a library of documents?

Different reuse approaches reflect different goals. All are valid, but none are complete. At present, no one approach will address all needs faced by enterprise scale publishers.

Specifying content

The term content is abstract and fuzzy, open to various interpretations. Content may be raw or finished, partial or complete. We need to understand different levels or states of content. Fortunately, we can draw on insights from library science to distinguish different levels of specificity by using a model called FRBR. [1]

The FRBR model provides levels to analyze content, divided according to how explicit the description of the content is. The key levels of concern to us are work, expression, and manifestation. If the content item is a book, it might be described as follows:

  • Work (Bible)
  • Expression (King James translation)
  • Manifestation (1994 Oxford University Press edition)

The work is the raw content, the underlying intellectual property. It might be a class of content such as a novel or symphony. It describes the content or asset.

The expression identifies a version of the content.

The manifestation specifies the content’s specific revision or a rendition, for example, the edition, format, mode of access, or date of publication.

The table below illustrates the hierarchy, with rough equivalents in content strategy.

FRBR Concept  | Level of Identification | Rough Equivalent in Content Strategy                 | Example
Work          | Described by a Title    | Assets relating to a topic                           | Long, unedited video file
Expression    | Uniquely ID’ed          | Collection of content components relating to a topic | Tagged video clip highlights
Manifestation | Versioned               | Finished content about topic                         | Linked series of transcript-captioned video segments

Different levels of content reflect different frequencies of change and target audiences. Assets don’t change; they are repurposed. Components can be revised, but there will only be one version of a component at a given time. Content composites seen by audiences may come in multiple versions, which can exist simultaneously.

Rather than describe everything as content, it is more helpful to separate different notions:

  • content (items audiences consume)
  • content components (recurring elements incorporated in audience facing content)
  • assets (intellectual property used to create finished content)
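One way to make the three notions concrete is a small data model. The sketch below is an assumption-laden illustration, not a schema taken from FRBR or from any particular system: the class and field names are hypothetical, chosen only to show that assets are not revised, a component has a single current version, and finished content can exist in several versions at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Raw material, e.g. a long, unedited video file. Repurposed, not revised."""
    title: str


@dataclass
class Component:
    """Recurring element with a unique ID; only one version exists at a time."""
    component_id: str
    source: Asset
    text: str

    def revise(self, new_text):
        # Overwrite in place: there are no parallel versions of a component.
        self.text = new_text


@dataclass(frozen=True)
class Content:
    """Finished item for an audience; several versions can coexist."""
    topic: str
    version: str
    component_ids: tuple


raw = Asset("Product launch recording")
clip = Component("clip-1", raw, "Highlights of the demo")

# Two versions of the finished content exist side by side and share a component.
web = Content("launch", "web", ("clip-1",))
email = Content("launch", "email", ("clip-1",))

clip.revise("Highlights of the live demo")
```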

Delivering equivalent content to different platforms: COPE

As content channels have multiplied, publishers have needed to make their content available to different devices and different kinds of content customers. The approach known as COPE (Create Once, Publish Everywhere) addresses the issue. Rather than recreate multiple versions of the same content for different devices or platforms, publishers can use standards and structure to provide the same content through an API that can be accessed by a variety of applications. The same content is used in multiple contexts, often distributed simultaneously. Since reuse can imply using the same content at different points in time, the notion of “content once” being published everywhere may be better thought of as multi-use content distribution.

One goal of COPE is the wide dissemination of content across different channels. COPE started as a technology solution to address point-of-failure concerns when publishing to multiple parties from a single database of content. Over time, it has evolved into an approach to syndicate content to other parties.

What COPE does

In the COPE approach, a central content database provides multiple versions of the same content to different people and devices. The original idea didn’t foresee revisions to the content (hence: create once), and also presumed that the core essence of content items pushed to different endpoints would be essentially the same. Different technical packages (formats and associated metadata) allow end users to consume the version of content they want. Technical end users (content partners and third-party app developers) are able to choose which content items they want, but generally lack the ability to request specific components of content from within an item. The API disseminates a large, structured chunk, but not finely defined, reconfigurable chunks. Content consumers choose which content host to use to access the content. They might use their local radio station’s website, or NPR’s own app to access the same content.
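A minimal sketch of the idea, assuming a hypothetical story record and three made-up output formats: one stored item is packaged differently for each kind of consumer, and the caller can choose the item and the format but cannot request a single component from inside the body. The field names and renderers are illustrative and are not taken from NPR's API.

```python
import json
from dataclasses import dataclass, asdict
from html import escape


@dataclass(frozen=True)
class Story:
    # Hypothetical fields; a real content API would define its own schema.
    story_id: str
    title: str
    teaser: str
    body: str


def as_json(story):
    """Package for partners and third-party apps calling the API."""
    return json.dumps(asdict(story))


def as_html(story):
    """Package for a website template."""
    return (f"<article><h1>{escape(story.title)}</h1>"
            f"<p>{escape(story.body)}</p></article>")


def as_text(story):
    """Package for a channel that only accepts plain text."""
    return f"{story.title}\n\n{story.teaser}"


RENDERERS = {"json": as_json, "html": as_html, "text": as_text}


def publish(story, fmt):
    """Return the whole item in the requested format."""
    return RENDERERS[fmt](story)


story = Story(
    "s1",
    "Budget vote delayed",
    "Council postpones decision.",
    "The council will revisit the budget next month.",
)

for fmt in RENDERERS:
    print(publish(story, fmt))
```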

Benefits and limitations of COPE

COPE is an effective approach to disseminate articles to multiple partners and platforms. Because of its push orientation, it is not optimized to offer personalized content that responds to specific requests from content consumers. As originally conceived, the body of the content is static.

Reusing common elements in different content products: the DITA model

While COPE is largely focused on formats and metadata, another reuse approach is focused on reusing components of content within the body-field of an item.

Publishers of technical content have championed reusing specific content components in different items of content. Technical documentation is repetitive. Much writing is redundant, where the same text is being repeated in many places. Technical writers sometimes speak about the ideal of WOOO: Write Once and Once Only.

Component reuse is closely associated with an approach called DITA (Darwin Information Typing Architecture), an XML schema originally developed by IBM. DITA is designed to address specific publishing issues with user assistance for technical products, though many DITA proponents argue it can be successfully used for other kinds of content.

For the most part, the motivations behind DITA have been writing efficiency and consistency, rather than audience needs. Few individuals will ever read the many minor variations of content possible with a DITA document, and content variations are largely defined by topic variants rather than reflect audience preferences.

Reusing Components through Transclusion

Most approaches that reuse content components rely on transclusion. Transclusion is the process of incorporating content into an item of content from another source by use of a link to that source. In its most simple form, it is similar to when one embeds an item of content in another, such as embedding a slideshow or YouTube video hosted elsewhere in an article you’ve written. In DITA, the process is called a conref or content reference. Transclusion is a core concept not only in DITA but also in MediaWiki, which powers Wikipedia among other sites. Transclusion allows the same content to be used in multiple locations in Wikipedia.

Transclusion can be applied to any item of content: a word or phrase, a paragraph, or a large section.
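The mechanism can be sketched with a toy resolver. The `{{ref:id}}` marker below is a made-up syntax for illustration only; it is not DITA's conref notation or MediaWiki's template syntax, and the component store is hypothetical. The sketch shows that a reference can pull in anything from a phrase to a paragraph, and that references can nest.

```python
import re

# Hypothetical component store keyed by ID.
COMPONENTS = {
    "warranty": "Covered for {{ref:warranty-term}} from the date of purchase.",
    "warranty-term": "two years",
}

# Made-up reference marker, e.g. {{ref:warranty}}.
REF = re.compile(r"\{\{ref:([a-z0-9-]+)\}\}")


def transclude(text, components, _seen=()):
    """Replace each {{ref:id}} with the referenced component, recursively."""
    def resolve(match):
        ref_id = match.group(1)
        if ref_id in _seen:
            raise ValueError(f"circular reference: {ref_id}")
        if ref_id not in components:
            raise KeyError(f"unknown component: {ref_id}")
        return transclude(components[ref_id], components, _seen + (ref_id,))

    return REF.sub(resolve, text)


page = "Consumer widget. {{ref:warranty}}"
print(transclude(page, COMPONENTS))
# → Consumer widget. Covered for two years from the date of purchase.
```

Because the page stores only the reference, changing the `warranty-term` component once updates every page that includes it.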

A related approach is to show and hide components depending on certain criteria, perhaps intended audience segment. Business customers might see a certain paragraph, while consumers wouldn’t see that paragraph. The process of showing and hiding XML nodes is called profiling in DITA. It allows the output of multiple documents (variations on the master document) from a single source.
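Show-and-hide filtering of this kind can be illustrated as follows. This is a simplified sketch, assuming a hypothetical list of components with an optional audience attribute; it does not reproduce DITA's actual profiling attributes or processing.

```python
# Each component optionally lists the audiences allowed to see it.
# Components without an "audience" key are shown to everyone.
MASTER = [
    {"text": "The widget ships in two days."},
    {"text": "Volume discounts start at 50 units.", "audience": {"business"}},
    {"text": "Gift wrapping is available.", "audience": {"consumer"}},
]


def profile(master, audience):
    """Keep components with no filter, or whose filter includes the audience."""
    return [c["text"] for c in master
            if "audience" not in c or audience in c["audience"]]


print(" ".join(profile(MASTER, "business")))
# → The widget ships in two days. Volume discounts start at 50 units.
```

Every variation is decided in advance by whoever tags the master document, which is why the number of possible outputs grows quickly as more filter attributes are added.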

Benefits and limitations of Transclusion

Reusing components is effective when there is a repetition of messages, and regular variations among specific components. It can provide efficiencies and consistency for content that is highly regular and needs to be delivered in a uniform manner. If business requirements mandate that all customers see the same terms and conditions in the content regardless of what content they see, transclusion can be an effective approach.

The weakness of transclusion is that it is not very flexible. DITA, for example, assumes a linear flow of content from the publisher to the content consumer. It presupposes content elements can be planned and compiled into well-structured formats. That vision implies the presence of regular content entities and that one can anticipate the exact circumstances of when these entities are required by end users.

Embedding content through link referencing, or hiding content through profiling, is not very dynamic. The process can groan when the variations become complex. It is also difficult for the publisher to confidently say precisely what an audience wants, and so there is a tendency to deliver too much content because it is easy to include it. Transclusion, by itself, doesn’t adapt to specific audience demands for information, or marketers’ desire to change the messaging in response to CRM and real time analytics data. The motivation to write once only doesn’t accord with audience desires to pick and choose what content they want to see at a given time. It is not clear if the XML-based structure of DITA will be up to the demands of real time personalization associated with performance-based marketing.

Mark Baker noted recently some other shortcomings of transclusion:

“Reusing text where you would have been writing substantially the same text anyway is clearly the right thing to do. But taking all the various ways in which you might express an important idea and combining them into one expression is a bad idea. Your idea will have more impact and more reach if it is expressed in different ways and in different media for different audiences, different purposes, and different occasions.”

Asset Reuse: the DAM model

A third approach to content reuse relates to assets. Reusing assets allows organizations to exact more value from their intellectual property. It recognizes that rich assets can be potentially applicable to different contexts at different times. A systematic approach to asset reuse requires a centralized repository for the raw material that authors draw upon to create audience-facing content.

How Asset Reuse works

A growing number of web publishers — though still a minority — have repositories to hold digital assets that are used to create content for audiences. They may use:

  • A digital asset management (DAM) system for videos, audio, graphics and photos, including brand assets and templates
  • An enterprise content management (ECM) system for complex documents, such as legal documentation
  • A database or file server to store code or data files that can be repurposed

Such repositories differ in purpose from content management systems, which are geared toward the creation and management of content for audiences. Unlike a CMS, a DAM may contain content that is neither currently published, nor being readied for publication.

The varied types of assets that can be stored in a repository share certain characteristics. Assets frequently involve complex workflows. They may involve substantial editorial oversight, to produce and prepare for publication. Unique approvals may be required, such as for branding assets stored in a DAM, or legal copy stored in an ECM. Data, perhaps from a periodic customer survey, may be stored in databases that require running structured queries and reports before they can be made available for content authors to use. Photo archives may have permissions and licensing requirements that must be vetted before items are available for publication.

When considering asset reuse, it helps to know how stable the asset is. Elizabeth Keathley distinguishes between static assets and living assets.

Static assets are generally stable and don’t change often. If they do change, there will only be one version at a time, with a persistent ID. These assets may have associated use rights governing when and how they are used, and by whom. The asset creator may have an explicit goal of preventing derivative reuse, such as prohibiting unapproved modifications of brand assets.

Living assets can be repurposed to support different goals, and are sometimes converted into different formats. Living assets are commonly composed of compound asset parts and have elaborate workflows to produce them. They are not simply derivative of other assets but are substantially original. A living asset is broadly equivalent to a work in FRBR terminology. Other items of content are derived from a living asset, and these will have identities separate from the master asset. Because the structure of living assets is complex and irregular, they are not as readily broken into content components, especially if an exact need for elements in the asset cannot be predicted in advance. Also, the nature of repurposing content means that the approval process will be different than it is for content components involving planned reuse for defined purposes.

Benefits and limitations of DAMs

DAMs and other asset repositories can offer authors a richer library of content than available in CMSs. Unlike with a CMS, authors are not restricted to a narrow perspective where they only see and have access to currently published content.

DAMs have challenges as well. Unless actively managed, metadata descriptions can be poor, hindering asset retrieval. Some DAM systems are improving auto-tagging of assets to reduce the burden on contributors. Another limitation is that DAM assets are generally not directly accessible by audiences, so audience requirements for access to this content need to be understood and planned in advance.

A framework for content reuse

Figure: relationship between DAMs for digital assets, DITA, COPE, and adaptive content

The conceptual diagram reflects different content reuse activities according to their purpose. It is not meant to show specific platforms or systems, which vary considerably in practice. Only a few publishers perform all these activities as part of an integrated end-to-end process. The path from potential assets to ready-to-consume content resembles a waterfall: one is dependent on what content is available upstream.

The limits of specialized solutions

Relying on one approach entails various potential pitfalls. Not having a DAM means that potentially valuable content assets are siloed within different organizational departments and not available to authors. A failure to plan for modular reuse of content components hinders efficiency and consistency, and hurts the audience experience as well. Relying on responsive web design might be effective to reach immediate consumers, but won’t allow partners to reuse your content the way an API would allow, and might therefore reduce the total reach of your content.

Many aggravations arise from a poor conceptual understanding of the granularity of content, and how frequently different elements change and are used within the organization. Authors may try to reuse content that is actually a compound object made up of different assets and components. They may actually need to only reuse some parts of the content.

A core issue with reuse is whether the content continues to be up-to-date and accurate. Unfortunately, just because something is currently published does not indicate it should be reused elsewhere. A table that complements an article might be sufficiently current to stay on a website, but really shouldn’t be incorporated in new content without updating. Content created for one audience may seem to offer a good blueprint for new but similar content for another audience. But in the course of repurposing this content, the authors may conclude that revisions are needed for content that is being reused. What is sufficiently current is often a judgment call based on resources and mission importance.

Publishers face another challenge: the tension between content modularity and integration. While technical documentation can generally be disaggregated into modular components, other content is more powerful when tightly integrated. Ideally content elements should support one another, rather than simply be presented together. But cross-dependency among elements makes them less attractive candidates to manage as separate components. A reusable, adaptable template may be a better approach when elements tend to occur together in an integrated manner. Authors may want to reuse the structure of the body of the content without reusing the actual content components.

Adaptive content and reuse

The newest approach to content reuse is known as adaptive content. Unfortunately, there is no widely accepted definition of adaptive content, and content professionals tend to speak about adaptive content in different ways. The phrase provokes two obvious questions:

  1. What adapts?
  2. To what does it adapt?

Sometimes people will speak about “the content” adapting to “the device” the individual is using. That interpretation is not much different from responsive web design, and is not very ambitious. It should be possible to have the content itself change based on any number of criteria, such as contextual factors (location, time of day, user status), and various user preferences or behaviors. I would rather define adaptive content in terms of the goal it supports.

Adaptive content: content that changes what is presented to reflect the intentions of the content consumer.

How Adaptive Content works

Adaptive content relies on the use of algorithms and audience data to change the content. There are significant differences between preplanned content variations such as are specified in DITA, and enabling dynamic, on-demand variations associated with adaptive content. Adaptive content builds on transclusion and COPE, but extends it.

Content reuse to support adaptive content must accommodate on-demand access to content by individuals, to deliver content composed of components that reflect the interests and needs of an individual when they ask for them.
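The contrast with preplanned variations can be sketched as a selection step that runs at request time. Everything here is an assumption made for illustration: the component library, the topic tags, the interest weights (standing in for signals such as likes and shares), and the simple additive scoring rule. It is not a description of how NPR One or any other product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    component_id: str
    topics: frozenset
    text: str


@dataclass
class Request:
    # Hypothetical per-person signals: topic -> weight learned from behavior.
    interests: dict
    limit: int = 2


def score(component, request):
    """Sum the person's interest weights over the component's topics."""
    return sum(request.interests.get(t, 0.0) for t in component.topics)


def assemble(components, request):
    """Pick the highest-scoring relevant components when the request arrives."""
    ranked = sorted(components, key=lambda c: score(c, request), reverse=True)
    return [c for c in ranked if score(c, request) > 0][: request.limit]


LIBRARY = [
    Component("c1", frozenset({"pricing"}), "Plans start at a monthly rate."),
    Component("c2", frozenset({"setup", "support"}), "Setup help is available by chat."),
    Component("c3", frozenset({"warranty"}), "The warranty covers parts and labor."),
]

request = Request(interests={"support": 0.9, "pricing": 0.4})
for component in assemble(LIBRARY, request):
    print(component.component_id, component.text)
```

Unlike profiling, nothing here enumerates the possible outputs ahead of time; the combination shown depends on the data that accompanies each request.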

An early example of adaptive content is the NPR One app for audio content. Individuals indicate what kinds of programming they want, rather than having the publisher deciding that for them. NPR extends its API not only to content partners (local radio stations who add local content), but also to the end consumer of the content, giving them control over what content they receive through likes and shares. The app is adaptive, but not entirely a content on-demand solution, since it is based on streaming.

Benefits and limitations of adaptive content

Realizing the goal of having content components available on demand, responding to user preferences in real time, would remove the problems associated with publishers making wrong guesses about what someone wants to view. The limitation of this approach is the complexity it introduces for publishers. They need to think even harder about where the value of their content resides, based on actual use analytics, and structure the content elements to allow retrieval. Web searchers can now cherry-pick information in the search results to get the exact content items they want from articles marked up in schema.org. Such behavior provides a preview of how content will need to become adaptive to user needs.

Conclusion

Content reuse is rich with possibilities. Different content specializations are working to improve reuse. It is useful to understand different approaches. By combining approaches, one can support an integrated strategy that improves both internal goals such as efficiency and governance, and external goals such as personalization and engagement.

— Michael Andrews


  1. FRBR stands for Functional Requirements for Bibliographic Records. FRBR’s focus is on bibliographic records for long-form content such as books, sound recordings, and films. Its focus is different from that of content strategy, so it will not be exactly equivalent. It offers helpful insights as long as we don’t expect literal compliance with its terminology. My apologies to librarians if I run roughshod over these concepts.