Categories
Content Engineering

Taxonomies to track AI use in content development

AI tools are increasingly used in content development, for better and for worse. As AI use becomes more varied and pervasive, readers want to understand how AI has been used so they can assess the credibility and provenance of information. New taxonomies have emerged to bring transparency to the use of AI.

Most initiatives to track AI use in content development have focused on stamping content that has interacted with AI tools. For example, Microsoft’s “Project Origin” encodes “fingerprints” when its AI tools manipulate content.

While fingerprinting can be useful, it doesn’t, on its own, reveal at what stage or how the AI tool manipulated the content. It provides a vague signal that the content has been “tampered with,” but it doesn’t specify what has changed.
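Fingerprinting schemes generally rest on comparing cryptographic digests of content. A minimal sketch (a generic illustration, not Project Origin's actual mechanism) shows why a fingerprint alone flags *that* something changed without saying *what*:

```python
import hashlib

def fingerprint(text: str) -> str:
    """Return a content fingerprint (SHA-256 hex digest)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

original = "The study enrolled 120 participants."
edited = "The study enrolled 210 participants."

# The fingerprints differ, so we know the content changed...
print(fingerprint(original) == fingerprint(edited))  # False

# ...but the digest reveals nothing about what changed: a typo fix
# and a fabricated finding produce equally opaque mismatches.
```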

Readers would like a more robust taxonomy to explain how content has changed, so they can determine whether these changes are a problem.

The field of online scientific and technical publishing has been pioneering such taxonomies. Other online publishers can learn from what is underway to prepare for potential issues they may need to address in the future.

A group called the Committee on Publication Ethics (COPE) has developed a taxonomy that explores the nuances of data manipulation. Their taxonomy addresses a range of issues, including consent and copyright. Their breakdown highlights the risks AI tools could pose to content integrity.

The COPE case taxonomy covers issues other than data, but its coverage of data is especially relevant to content development:

  • Data fabrication: Making up research details/findings/documents.
  • Data falsification: Altering research details/findings/documents.
  • Data integrity: Cases of data falsification or fabrication, as well as mistakes or problems that compromise the data.
  • Data manipulation: Issues to do with handling and changing of data.
  • Data misappropriation/theft
  • Data ownership
  • Data, selective/misleading reporting/interpretation
  • Data or information omitted/misreported to mislead/fit a theory, desired outcome, etc.
  • Data, sharing
  • Data, unauthorized use
  • Image manipulation: Includes all changes to original images, whether appropriate or inappropriate; also, image duplication.
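A controlled vocabulary like this is straightforward to model as a data structure that reviewers can tag content with. A hypothetical sketch (the category names follow the COPE list above; the tagging structures are invented for illustration):

```python
from dataclasses import dataclass, field
from enum import Enum

class DataIssue(Enum):
    # Controlled vocabulary adapted from the COPE case taxonomy above
    FABRICATION = "data fabrication"
    FALSIFICATION = "data falsification"
    INTEGRITY = "data integrity"
    MANIPULATION = "data manipulation"
    MISAPPROPRIATION = "data misappropriation/theft"
    SELECTIVE_REPORTING = "selective/misleading reporting"
    UNAUTHORIZED_USE = "unauthorized use"
    IMAGE_MANIPULATION = "image manipulation"

@dataclass
class ContentFlag:
    """A reviewer's annotation on a piece of content."""
    issue: DataIssue
    note: str

@dataclass
class Review:
    flags: list[ContentFlag] = field(default_factory=list)

    def issues_found(self) -> set[str]:
        # A richer vocabulary lets us separate, for example, incorrect
        # content (falsification) from stolen content (theft).
        return {flag.issue.value for flag in self.flags}

review = Review()
review.flags.append(ContentFlag(DataIssue.FALSIFICATION, "figure altered"))
review.flags.append(ContentFlag(DataIssue.MISAPPROPRIATION, "dataset reused without consent"))
print(review.issues_found())
```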

While the COPE taxonomy, unlike the Microsoft Project Origin fingerprint, relies on human review, it offers a richer vocabulary for discussing problems. It allows us to distinguish between incorrect content and stolen content, for example.

Much of COPE’s focus relates to distortions in the content’s original meaning. We must also acknowledge that AI tools, when used appropriately, can enhance source content.

Another taxonomy can help bring transparency to how AI tools support content development.

The Generative AI Delegation Taxonomy (GAIDeT) addresses the growing use of AI agents in information development. Its developers note: “Readers can better understand how much of the work was supported by AI and in what way, which helps them to interpret findings with the right context…it can strengthen their trust in the research and reassure them that AI has been used responsibly.”

The GAIDeT taxonomy examines how AI tools are involved in core tasks, from conceptualization to fact-checking. The taxonomy is comprehensive, as it is intended for scientific researchers, but the basic framework applies to anyone developing original content.

Conceptualization tasks include:

  • Idea generation
  • Defining the research objective
  • Formulating research questions and hypotheses
  • Feasibility assessment and risk evaluation
  • Preliminary hypothesis testing

Tasks for researching existing content include:

  • Literature search and systematization
  • Writing the literature review
  • Analysis of market trends and/or patent environment
  • Evaluation of the novelty of the research and identification of gaps

Methodological planning tasks for assessing information include:

  • Research design
  • Development of experimental or research protocols
  • Selection of research methods

Software development tasks used to produce or refine information include:

  • Code generation
  • Code optimization
  • Process automation
  • Creation of algorithms for data analysis

Data and information management tasks:

  • Data collection
  • Validation
  • Data cleaning
  • Data curation and organization
  • Data analysis
  • Visualization
  • Reproducibility testing

Writing and editorial tasks:

  • Text generation
  • Proofreading and editing
  • Summarizing text
  • Formulation of conclusions
  • Adapting and adjusting emotional tone
  • Translation
  • Reformatting
  • Preparation of press releases and outreach materials

Ethics review tasks:

  • Bias analysis and potential discrimination assessment
  • Ethical risk analysis
  • Monitoring compliance with ethical standards
  • Data confidentiality monitoring

Quality oversight:

  • Quality assessment
  • Trend identification
  • Identification of limitations
  • Recommendations
  • Publication support

While many of these details may not be relevant to the kinds of content you work with, they illustrate the expansive range of tasks that AI agents can be involved with.

We can see that AI agents can introduce efficiencies — and potentially problems — at many stages of the content development process.

Having a controlled vocabulary to track these issues will be valuable as AI agents become embedded in content processes. It can provide readers with more context on how AI has been used.
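One way such a controlled vocabulary could surface to readers is as a machine-readable declaration accompanying published content. A hypothetical sketch (the field names and format are invented for illustration; GAIDeT itself does not prescribe this structure):

```python
import json

# Hypothetical AI-use declaration keyed by GAIDeT-style task areas;
# each entry records which tasks were delegated to an AI tool.
declaration = {
    "taxonomy": "GAIDeT",
    "delegated_tasks": {
        "research_of_existing_content": ["literature search and systematization"],
        "writing_and_editorial": ["proofreading and editing", "translation"],
    },
    "human_review": True,
}

# Serialized alongside the article, this gives readers context on
# how, and at which stages, AI was used.
print(json.dumps(declaration, indent=2))
```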

— Michael Andrews


AI meets content orchestration

When Generative AI emerged, interest in structured content went into hibernation. It seemed like LLMs could generate text on demand, tailored to any specific scenario. All you needed was a bank of prompts to cover the scenarios you had in mind.

Alas, the real world turns out to be more complicated. LLMs don’t have enough context to provide highly tailored content, and the information sources are too diverse for them to understand what to draw on.

As organizations drop magical thinking about LLMs and embrace greater realism, they are returning to the concept of orchestration.

I wrote about orchestration a couple of years ago. I noted that “structured content enables online publishers to assemble pieces of content in multiple ways.” But the challenge is knowing how to assemble those pieces, the job of orchestration. It involves matching the user’s intent, the content’s intent, and the organization’s readiness to address the user’s needs.

Orchestration is challenging for many reasons. It involves many types of inputs to consider and a mix of hard and soft rules for deciding how to assemble the right content.
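A minimal sketch of what mixing hard and soft assembly rules might look like (the intents, content pieces, and scoring weights are all invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Piece:
    id: str
    topic: str        # the content's intent
    audience: str
    freshness: float  # 0..1, how current the piece is

def assemble(pieces: list[Piece], user_topic: str, user_audience: str) -> list[Piece]:
    # Hard rule: the topic must match the user's intent.
    candidates = [p for p in pieces if p.topic == user_topic]

    # Soft rules: prefer audience fit and fresher content.
    def score(p: Piece) -> float:
        return (1.0 if p.audience == user_audience else 0.0) + 0.5 * p.freshness

    return sorted(candidates, key=score, reverse=True)

catalog = [
    Piece("a", "pricing", "developer", 0.9),
    Piece("b", "pricing", "executive", 0.4),
    Piece("c", "onboarding", "developer", 1.0),
]
print([p.id for p in assemble(catalog, "pricing", "developer")])  # ['a', 'b']
```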

AI — not just LLMs, but also predictive and semantic AI — offers a range of tools to support decision-making in orchestration. What previously seemed too complex is now more possible.

Growing adoption of the Model Context Protocol (MCP) is helping connect LLMs with the information sources they need to draw on and with the applications required for decision-making and action. Whereas previously a handful of firms were positioning themselves to be a proprietary connection layer, now connections are becoming open.

The most promising new orchestration tool is Activepieces, an open-source tool that enables organizations to integrate data, content, LLMs, and workflow tools. It already integrates with Drupal, for example, allowing organizations to pull content from that CMS, combine it with database data, and leverage LLMs to generate content. These tasks can be orchestrated through workflow rules. The tool is open-ended, allowing for a range of orchestration possibilities.

Tools like Activepieces will support the development of AI-native content. We’ll find that providing the right content to the right person at the right time requires a diverse array of content, data, rules, policies, and design decisions. Orchestrators will need to consider context from many perspectives.

— Michael Andrews