
Ghost-busting Generative AI

People are increasingly tempted to outsource their thinking to chatbots. Chatbots seem to provide answers without the hassle of wading through individual articles.   

But unlike content written by human authors, chatbots aren’t accountable for what they write. Chatbots place a greater burden on readers to determine the veracity of online statements. Yet, for a variety of reasons, it is difficult for readers to assess where information in chatbot responses comes from, how complete it is, and whether it is reliable. 

This post explores the trust deficit inherent in chatbot responses. It will identify common weaknesses in the quality of responses and suggest countermeasures that users of AI tools can take to obtain more reliable information.

The rise of phantom authorship

If you read a best-selling book by someone famous, chances are it was written by a ghostwriter whose role is not acknowledged. To be clear, I’m not referring to books by renowned authors; few individuals become famous because of a book they’ve written. I mean people who were already famous for something else.

Camera-preening media personalities seldom write books.  But their name recognition sells books. Actors, sports stars, politicians, TV hosts, influencers, and other celebrities license their names to books produced by others.  In such licensing deals, the party that develops the product is a mystery. The putative author pretends to have written the book, while the reader engages in “suspension of disbelief,” pretending that the story they are reading is genuine.

A similar phenomenon is happening with online content: it’s becoming unclear who wrote it.  News stories, research findings, regulatory notices, cooking recipes, and health advice get reinterpreted by AI robots as “responses” to user prompts.

An example of how pervasive chatbots are becoming is the new wave of health bots. Last month, both Microsoft and Perplexity announced health chatbots.

Generative AI is similar to ghostwriting: in both cases, the writer is anonymous and lacks firsthand knowledge of the topic.

In fact, GenAI is beginning to cannibalize traditional ghostwriting. “Today, ghostwriting websites must work to advertise why they could perform their writing-for-hire services better than a machine,” notes Emily Hodgson Anderson at USC.

Now, machines do the ghostwriting.  Online content is becoming dominated by phantom authorship.

While some users are willing to accept chatbot responses as genuine, many have nagging doubts about the trustworthiness of the outputs.

The growing supply and demand for machine-synthesized content

Ghostwriting and generative content are manifestations of a broader “explainer” economy, where one party makes money by explaining the work of others.  Explainers promise to improve clarity and reduce the time required to understand topics.  The benefit they sell is productivity.

Explainers are not new.  Students have long used CliffsNotes to get the essence of classic novels — without the effort of reading them. More recently, a range of apps, such as Blinkist and Shortform, promise 15-minute summaries of current bestsellers.

Generative AI has made it possible to turn any content — audio, video, PDFs, PowerPoints, web pages, Word docs — into a short explanation.

Talks, interviews, legal documents, scientific studies, blog posts, and academic articles, each created by individuals or small teams of collaborators, can all be regenerated by AI chatbots.

Generative AI rewrites and repackages original content into various products:

  • As a summary of a single source from a single author
  • As a retelling of discrete cases, anecdotes, or incidents into a common story arc
  • As the synthesis of multiple sources from different authors

The supply of synthesized content is now endless.  It’s easy, cheap, and profitable to repackage original content. GenAI tools rely on “word spinners” that rephrase and restructure the original content to make it more “engaging” and avoid plagiarism and copyright infringement.

And users can’t seem to get enough of this effort-saving convenience.  Historically, summaries such as abstracts were descriptions used to preview content that might be read in full.  Now, AI summaries have become the end goal.

An anonymous AI-generated summary increasingly acts as a substitute for reading the original source, such as an interview transcript, a research document, or a book. The summary will often be more convenient to read or access, especially in the browser or editor the user relies on.  The original source content no longer seems necessary.

So, what’s the problem?

[Screenshot caption: The little “labs” icon on the right is a clue.]

Trust in machine-generated responses

Because of the impersonal quality of online communications, trust has always been top of mind for users.  AI-generated content adds another complication to users’ concerns about trust because it is even more impersonal than articles on websites.

Traditionally, written content embodied a social contract between the writer and their readers.  The writer addressed readers, who, in turn, expected the writer to reveal themselves through their prose.  Both sides aspired to believe they knew and understood each other on some level.

Corporate online content, being more prosaic, often lacks a byline that identifies an individual as the author. In such cases, the company is promoted to the role of author.  The brand’s reputation guarantees the accuracy of the content.  Ideally, the web page includes a link or contact information so customers can reach a real person if they need information clarified.  While users didn’t know who specifically wrote the content, they at least knew a live human being was accountable for it.  The involvement of a biological entity having a heartbeat was presumed — until the advent of chatbots.

Responses generated by AI platforms lack any discernible connection to a real person.  Readers no longer have a relationship with anyone.  Users are left to imagine what is driving the dynamic between them and the responses they encounter.

These anonymous responses seem suspect and untrustworthy because no person seems accountable for what’s said.  It’s unclear who’s behind the statements; asking the bot doesn’t clarify the issue.  Readers are haunted by nagging doubts. Who is saying this, really: the bot’s vendor or the firm that the customer wants to do business with?  Is the message spin or manipulation — is it generated to make me feel a certain way or to placate me?

Chatbots can seem evasive: hard to pin down, too obsequious to trust, alternately vague and overconfident.

And when users feel forced to rely on a bot, they can feel their time is being wasted.  As one comment in an online forum put it: “If it wasn’t worth a human’s time or effort to write, it’s not worth the time or effort for a human to read.”

Such hesitations are not limited to bots.  People are generally spooked by events that can’t be attributed to anyone.  For example, many are unnerved by the pervasive influence of so-called “dark money” in the United States: anonymous donors funding political campaigns and operations.

With bots, missing accountability is not a choice or a bug but an inherent feature of the product.  Few users understand the mechanics of bot training, but they can sense that no one wants to take the blame if the bot behaves badly.

While chatbots can trigger low trust, they can also induce false confidence. Some users will take bot responses at face value, to their detriment.

Boobytraps in AI-generated content

A boobytrap is an apparently harmless object that is no such thing.  Despite its benign exterior, it can exact a painful toll.  AI responses can be boobytraps.

AI responses appear reassuring, with friendly wording, each sentence dedicated to a single idea. The responses seem the antithesis of the convoluted double-talk of human rhetoricians. The wording of responses is optimized to disarm skepticism.

I don’t want to make people paranoid about bot responses.  Many are genuinely useful. Yet it’s prudent not to take them at face value and to question their consequences and intent.

Let’s look at some less obvious boobytraps.

Boobytrap 1: Untraceable original information

With authored content, any information not cited as originating from someone else is presumed to be developed by the author. But with AI-generated content, it can be unclear who contributed the information and who is accountable for it.

Even when phantom authors credit sources, they may still have added original content of their own or misstated the original content.  There’s the possibility of embellishments — statements that sound credible but aren’t entirely accurate. AI-generated responses are like a movie that’s “based on a true story.” The user can’t be sure how much of the package is true.

The concern is not necessarily obvious hallucinations involving wholesale fabrications. The problems may involve real information that doesn’t belong in the context because it isn’t related to the sources or topic covered.  A common occurrence is when generated content conflates people or events because it “borrows” information from extraneous sources.

Boobytrap 2: Inaccurate interpretations

When you can’t read the original content, a summary’s faithfulness to the original meaning becomes important.  Translators who render the words of an author into another language have a responsibility for fidelity.  But AI-generated content can take liberties with phrasing and argumentation.

Chatbot responses often reflect a fast-casual writing style.  Friendly short-form outputs convey familiarity and reassurance, even though the concepts involved may be more complex and nuanced than portrayed.  Chatbots behave as if they can explain nuclear physics to a ten-year-old.

Chatbots are especially prone to misrepresenting original ideas in two areas.

First, chatbots will adopt simplified terminology that can erase the distinctive meaning of the original terminology. Words have specific meanings in particular contexts, but chatbot responses tend to use common words that may not precisely reflect the intent of the original terminology. The facile substitution of word-spinning can destroy the meaning of concepts discussed in the original sources.

Second, chatbot responses tend to disaggregate and nullify the argumentation used in original sources.  Instead of maintaining the sequencing and transitions of explanatory arguments, chatbot responses tend to break thematic ideas into discrete sentences, often strung together as a list of bullets.  To the user, the laundry list of statements reads like a wall of declarations, devoid of coherence.

Responses that seem easier to read are not necessarily easier to understand or more informative.

Boobytrap 3: Third-hand information 

LLMs don’t convey knowledge.  They merely repeat messages in a modified form.  And LLMs can’t distinguish primary sources from secondary ones.  All text is of equal value, regardless of its provenance. 

By the time a user sees an AI chatbot response, they are seeing third-hand information. The chatbot is regenerating text from other sources.  Most of this text comes from secondary sources that explain and summarize what subject-matter experts know.  Only a small portion of the texts that LLMs rely on are primary sources written by the experts themselves.  LLMs depend on large quantities of content, even when the quality of that content is suspect.

It’s easy to see how informational accuracy degrades as a secondary source explains what an expert says, then a chatbot reinterprets what the secondary source says. 

Yet the myth persists that AI chatbots can “reason” and, in doing so, verify knowledge.  But the reality is that, while AI chatbots explain, they have no understanding of what they explain. 

Given their parroting tendencies, LLMs are prone to reproducing at scale the phenomenon known as truthiness — statements that sound true but are substantively empty or even false.  This happens with memes and other folk wisdom that many people believe to be true, and that is widely disseminated online. 

Boobytrap 4: Hidden agendas of unnamed writers

Human readers notice an author’s point of view and their voice. They detect an author’s agenda, which is often spelled out explicitly in a foreword or introduction. The author wants to convince you of something and crafts an argument or narrative to that end.

Sometimes the author’s agenda is not explicit, but it can be inferred. If the author is employed by someone or invests in some financial enterprise, we infer they have a vested interest in promoting those interests.

The phantom writer’s agenda is hidden. Human ghostwriters are hired to burnish the image of celebrity clients.  Chatbots are also expected to advance their funders’ goals.

Bots are designed to appear free of self-interest, presenting themselves as loyal, tireless servants of the user. At times, bots compliment the user on their questions. Yet, as anyone with a dog will know, even a loyal companion has an agenda of its own. Man’s best friend learns to manipulate its owner to obtain desired rewards.

Unlike a dog, lifeless AI bots lack emotions and a sense of right and wrong.  LLMs are amoral; at most, they have ethical guardrails introduced by developers. Guardrails can also be imposed to censor LLM outputs, as shown in several authoritarian countries.

The agenda of AI platforms is largely commercial and not necessarily aligned with the user.  Like social media and gaming applications, AI chatbots are designed to offer users feedback and rewards to encourage continued use.  Chatbots make users feel good through flattery and convenience. Users marvel at how much more productive and accomplished they’ve become.  Mastery of generative AI tools is a status reward, separating the workers of the future from those left behind.

But the promotion of engagement by AI platforms is subtly different from early waves of online products.  AI platforms envision themselves managing everything in your life and, consequently, are motivated to make users dependent on their products, if for no other reason than the worry that users might start using a competitor.  Platforms are battling in a winner-takes-all competition.

AI capabilities are everywhere because firms want users to get into the habit of relying on AI tools as their default behavior.  As this happens, the user’s ability to use alternative AI products, or none at all, becomes restricted.

While platforms promote user dependence on their products, they also seek to maximize revenues and profits.  The pricing of AI usage is opaque, and users may not be aware of how their experience with AI responses is shaped by the platform’s financial objectives. Users should never assume that the response they get is the best one possible.

Several platforms are exploring advertising in chatbots.  Whether such marketing promotion seeps into chatbot responses remains to be seen.  One can easily imagine product placements, such as those that routinely occur in movies, appearing in chatbot responses.

The value of the user for the platform shapes how many tokens the platform is willing to expend to generate a response.  If the user isn’t a high-value prospect, the platform will truncate the response by offering the fastest and easiest response to generate, rather than the most complete one.

Countermeasures to phantom authorship

Despite the problems associated with phantom authorship, I’m not advocating that you refrain from using AI.  Chatbots can be a valuable tool when used judiciously.

GenAI requires a new kind of information literacy, one that builds on best practices of the search era, yet extends them to address both new problems and opportunities.

The most basic principle to keep in mind is that chatbot responses are not answers or data.  They are best thought of as pointers to actual information rather than as vetted knowledge.

Here are some tips to counter bots’ tendency to sound unjustifiably authoritative. 

Countermeasure 1: Authenticate the source

When reviewing a response, start with the question: Who said that? Track down where the information supposedly comes from and look at the source directly.

Some AI platforms provide links to the sources used — following these links provides more context for what the response has pulled.  You can check what exactly was said and the framing of the original discussion.  You can evaluate the source’s authority and credibility on the topic, dimensions that chatbots don’t evaluate.  Be aware that some platforms have been known to generate “ghost references” to sources that don’t exist.
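
For responses with many citations, checking links by hand gets tedious. Below is a minimal sketch of a link checker in Python, assuming you have pasted the response into a hypothetical file named response.txt; it extracts URLs and flags any that don’t resolve. A broken link isn’t proof of a ghost reference (pages expire), but it tells you where to look harder.

```python
import re
import requests

def check_citations(response_text: str) -> None:
    """Flag URLs in a chatbot response that do not resolve."""
    urls = re.findall(r'https?://[^\s)\]">]+', response_text)
    for url in urls:
        try:
            # HEAD is cheap; some servers reject it, so fall back to GET.
            reply = requests.head(url, allow_redirects=True, timeout=10)
            if reply.status_code >= 400:
                reply = requests.get(url, allow_redirects=True, timeout=10)
            status = "ok" if reply.status_code < 400 else f"broken ({reply.status_code})"
        except requests.RequestException as exc:
            status = f"unreachable ({type(exc).__name__})"
        print(f"{status}: {url}")

# Hypothetical file holding a pasted chatbot response.
with open("response.txt") as f:
    check_citations(f.read())
```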

If a link isn’t provided in the response, note if any source document is mentioned, and search to locate that document.  If the source document is behind a paywall, an archival version may be available. 

Sometimes the source document is overwhelming in its size or complexity. In such cases, don’t be afraid to dig deeper into the content.  Browsers now have built-in AI chatbots that let users explore the source material directly, rather than relying on an AI platform’s initial response that synthesized multiple sources. 

A potential red flag is when AI responses don’t cite anyone specifically for a statement.  Maybe the response reflects a consensus opinion, which could be either right or wrong. Or maybe the AI platform only provided a superficial response to a complex issue.  Oversimplification is a common boobytrap in AI responses.  AI platforms ration the tokens used to generate responses. 

Unless the question is strictly factual, the issue may involve nuances that aren’t reflected.  It’s a good idea to ask more specific questions to drill into potential nuances, and to seek alternative responses from other AI platforms to ensure you are covering all relevant perspectives.

Countermeasure 2: When scanning for information, try more than one platform

AI platforms don’t want people to browse the way they used to, by scanning links in search results.  Platforms synthesize information from diverse sources so users don’t have to.

But diversity of information is a good thing, not a burden. It provides a richer picture of a topic.

Don’t feel pressured to stick with a single platform — don’t be captured by a subscription plan. Be a showroom shopper who is “just looking” at the responses that a platform has to offer, and be willing to check out other platforms.

AI platforms want an exclusive relationship with you.  But for now at least, users have many suitors.  Most platforms offer at least a basic tier for free, allowing users to ask the same question on different platforms.  While doing so will yield some overlap in responses, it will also surface new avenues to explore. 
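
For those comfortable scripting, the comparison can be automated. The sketch below sends one question to two platforms through their official Python clients (the openai and anthropic packages); the model names are examples, and API keys are assumed to be set as environment variables.

```python
from openai import OpenAI
from anthropic import Anthropic

QUESTION = "What are the screening guidelines for colorectal cancer?"  # example prompt

openai_client = OpenAI()        # reads OPENAI_API_KEY from the environment
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ask OpenAI (the model name is an example; use whatever tier you have access to).
openai_reply = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": QUESTION}],
)
print("OpenAI:", openai_reply.choices[0].message.content)

# Ask Anthropic with the same wording, then compare the two responses
# for overlap, omissions, and contradictions.
anthropic_reply = anthropic_client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": QUESTION}],
)
print("Anthropic:", anthropic_reply.content[0].text)
```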

Countermeasure 3: Ask the core question multiple ways

Chatbots are known for providing slightly different responses to the same prompt. You can turn this property to your advantage: embrace the indeterminacy of bots to widen the range of responses and surface useful ones.

There’s no such thing as a perfect prompt, especially when exploring a topic you are unfamiliar with.  You might not be sure how source content discusses a topic, or how the bot will interpret your prompt.  Experimentation helps to clarify these relationships.

Chatbots can be highly sensitive to the terminology used in prompts.  If the bot seems to misinterpret what you are seeking, try different terminology.  Sometimes, more concrete terminology helps; other times, more abstract or general terms work better.

Similarly, broadening and narrowing the requirements expressed in prompts can change the utility of responses. 
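
As an illustration, the variants in the sketch below rephrase one underlying question at different levels of breadth and terminology. The phrasings are placeholders of my own, not a recommended set; each would be submitted as a separate prompt.

```python
# One core question, rephrased at different levels of breadth and abstraction.
# All phrasings are illustrative placeholders.
variants = [
    "What are the side effects of statins?",                                 # baseline
    "What are the most common side effects of statins in adults over 65?",  # narrower
    "What adverse effects are associated with cholesterol-lowering drugs?", # broader terminology
    "What do clinical trials report about the side effects of statins?",    # pushes toward primary sources
]

for prompt in variants:
    print(prompt)  # submit each as a separate prompt and compare the responses
```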

Countermeasure 4: Be mindful of the changeability of a topic

Users don’t know what content the LLM was trained on, or how old it is.

Chatbot responses can be biased toward reflecting the dominant views of topics, especially when the prompts are general. What view is dominant will depend on how much it is discussed within the corpus of text on which the LLM was trained.

Some topics are anchored in old information.  A plethora of dated information crowds out nascent information that hasn’t been written about extensively. The old consensus dominates. 

Other topics aren’t covered because they are too old.  Most LLMs are trained primarily on post-1995 web pages, meaning that pre-Internet content is unknown to them. (Anthropic is an exception, as they have scanned millions of paper books to feed into their training.)  Even internet content more than a decade old has often been taken down and has disappeared forever. 

Internet content, upon which LLMs depend, is sensitive to fads and fashions.  Publishers create content based on current reader interests, not long-term interests.  Many topics and perspectives lack comprehensive coverage because they were never part of a popularity wave. 

Be sure to include the timeframe of information you are seeking. Decide what timeframe will yield the most useful insights. If you know that medical guidelines have changed recently, be sure to ask for the most recent guidelines. But in other cases, prioritizing the most recent information will skew responses toward the latest controversies on a topic that may be of little importance to the user. 

Ask how people viewed an issue prior to a certain date, or what the major issues were at different time frames relating to a topic.  Doing so can bring additional perspectives and overcome the recency bias in chatbot responses. It might also reveal what the LLM can’t address, which is also valuable feedback. 
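
A few timeframe-qualified phrasings of one question might look like the sketch below (the topic and dates are placeholder assumptions):

```python
# Timeframe-scoped variants of one question; the topic and dates are placeholders.
prompts = [
    "What are the current (post-2023) guidelines for treating hypertension?",
    "How was hypertension typically treated before 2010?",
    "What were the major debates about hypertension treatment between 2010 and 2020?",
]

for prompt in prompts:
    print(prompt)  # each surfaces a different slice of the training corpus
```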

Countermeasure 5: Probe terminology

The closer the response draws from sources developed by experts, the more accurate and reliable the information is likely to be. But experts, while knowledgeable, tend to speak in jargon.

Chatbot responses tend to translate both the experts’ information and their wording simultaneously, which can result in a loss of detail or inaccuracies. For example, if a word represents a fundamental concept in a domain like law or medicine, but a synonym is used, the specific meaning of the underlying concept might not be conveyed. 

Rather than have the chatbot translate both the experts’ information and their wording simultaneously, break these into separate steps.   Ask how the expert would describe the issue, then follow up by asking what unfamiliar terms mean in that context. 

LLMs have been trained on an astronomical amount of text that sheds light on what words mean in specific contexts.  Ask bots what a term means in a specific context and why a concept is important in a specific context.
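
A two-step exchange might look like the following sketch, again using the openai package as an example; the prompts and the model name are illustrative assumptions. The first turn asks for the expert framing, jargon included; the second asks the bot to gloss the terms it just used, in context.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Step 1: ask for the expert framing, keeping the expert terminology.
history = [{"role": "user", "content": (
    "How would an endocrinologist describe insulin resistance? "
    "Keep the expert terminology."
)}]
first = client.chat.completions.create(model="gpt-4o-mini", messages=history)
expert_answer = first.choices[0].message.content
print(expert_answer)

# Step 2: ask what the unfamiliar terms mean in this specific context.
history += [
    {"role": "assistant", "content": expert_answer},
    {"role": "user", "content": "Define each technical term you just used, "
                                "as it is used in endocrinology."},
]
second = client.chat.completions.create(model="gpt-4o-mini", messages=history)
print(second.choices[0].message.content)
```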

Ghost-busting Generative AI

Andrej Karpathy, co-founder of OpenAI, has said: “Today’s LLMs are like ghosts.” 

Because so much about chatbot mechanics is invisible, users must infer how responses are generated. They shouldn’t take responses at face value. 

Users need to bring a skeptical mindset to AI tools and be prepared to challenge their responses. 

— Michael Andrews


Trust, advice, and sourcing: what information is canonical?

In this series, I’ve examined how misinformation can creep into chatbot responses. It can be hard to trust these answers because the information’s provenance is unclear and potentially unreliable. 

What sources should AI bots rely on?  One suggestion is that bots should use canonical content.  If it’s canonical, it should be reliable. As elegant as that solution may sound, the concept of canonical content is waffly.  

This post will define canonical content more precisely, helping us determine when and whether it can guide chatbots’ use of scraped online content.

SEO concepts have lost their authority

Search engine optimization practices, which dominated the website era, developed a series of axiomatic platitudes about the value of content: authoritative content was trustworthy, and trustworthy content was canonical. Lofty words without much meaning. 

In SEO practice, a canonical tag is no more than a self-declaration to bots that a given piece of content is the primary version: a page such as example.com/product?ref=123 adds <link rel="canonical" href="https://example.com/product"> to tell crawlers which URL to index. Such self-declarations are question-begging: canonical compared to what, exactly?  Search canonicalization applies only to one’s own content. Telling a search engine which page on your website to prioritize doesn’t imply that your page is more important than those on another website. This kind of implementation won’t help chatbots decide which sources to draw upon for answers.

It’s time to retire SEO notions of canonical content and instead develop a new approach for the AI era.

Deciding what belongs in the canon involves distinguishing the genuine from the fake or flawed. 

Historically, canons are sacred books or genuine works of the highest quality. Scholars debate whether writings belong in Shakespeare’s canon or in the canon of the greatest works of poetry. Theologians debate canon law.  IT folks talk about canonical data – the sources of record that systems can rely on. What is common in all these domains is that interpretation is involved in deciding what belongs in the canon.  

Both humans and bots need a clear definition of what canonical means and clarity about who makes the decision. Canonical can’t simply be a matter of individual belief.  For the concept to work in practice, various people and machines need a common understanding of what content is canonical. 

Who said that? The importance of the role of information sources

A large portion of online content is crowd-sourced. Bots crawl this content indiscriminately and can’t distinguish the roles of contributors or determine who is responsible for decisions and information accuracy.  Few chatbot users realize the Wild West rodeo that’s corralling the information fed into the answers they see.

Scraped online content is often of mysterious provenance.  As I have been writing this series, the New York Times reported on the legal actions that the crowd-sourced platform Reddit is taking against AI platforms such as OpenAI, Anthropic, and Perplexity. The reality of today’s AI ecosystems is that AI platforms are scraping other online platforms for content written by unidentified individuals who may not even have direct knowledge of what they are posting.

AI platforms can’t interpret the roles and responsibilities of the sources they crawl.  A chatbot might decide that Wikipedia is a more trusted source than the social media platform X (formerly Twitter), but it can’t say why.  

Both Wikipedia and X contain statements by people who convey information. The difference is that Wikipedia articles should never be written by someone directly involved, whereas an X post can be. Wikipedia is always third-hand information, while X posts sometimes are first-hand. A statement on X by a famous person about a decision they made or action they took will be more authoritative than a Wikipedia article that footnotes a news article that cites the original post.  

The X versus Wikipedia example highlights the differences in the role of sources.  It’s important to drill down into the details of the different roles and their relationships to information.

We can categorize sources into three tiers: first-party, second-party, and third-party. Each tier involves different types of information creators, who have varying degrees of direct knowledge of what they write about.  

1st party content

  • All content and data developed and published by the party responsible for deciding policies and specifications (prices, rules, service availability, performance, etc.)

2nd party content

  • Statements, postings, and advice offered by partners, distributors, and paid influencers

3rd party content

  • Crowd-sourced information 
  • User-generated content 
  • Republishers of information 
  • Summarizers of 1st party information

First parties offer content that figuratively comes “straight from the horse’s mouth.”  First-party sources are far fewer than third-party sources, which cover a wide range of online content.  Second parties sometimes look like first parties, but aren’t really.

Only first-party information can be canonical

Only first parties have a direct relationship to the knowledge they write about. Consequently, only first-party content can be canonical. That means any source that writes about what others are doing can’t be canonical.  

For example, only a manufacturer of a product is the canonical source of information about that product.  Others writing about the product – whether resellers, customers, reviewers, or news reporters – can’t be considered canonical sources of information about it.  They may have valuable insights about the product, but nothing they say will be a definitive statement.  Unless their role as a source is clearly identified in chatbot answers, people may believe that these second and third-party views are definitive. 

Role: 1st party
Relationship to knowledge: the party responsible for deciding policies and specifications
Examples of content and parties: all content and data developed and published by the deciding party (prices, rules, service availability, performance, etc.); government departments; manufacturers; insurers

Role: 2nd party
Relationship to knowledge: a party financially or organizationally affiliated with a deciding party, but not responsible for decisions about policies or specifications
Examples of content and parties: statements, postings, and advice offered by partners, distributors, guest posters, and paid or incentivized influencers

Role: 3rd party
Relationship to knowledge: an unaffiliated party that is not financially dependent on the 1st party, such as a user, customer, competitor, news organization, or automated platform
Examples of content and parties: crowd-sourced information (aggregated from multiple sources); user-generated content (statements by users, customers, citizens, or non-affiliated contributors, which may be posted to a single platform or to distributed platforms); republishers of information (unaffiliated curators and aggregators of articles and data from various sources); summarizers of 1st party information

It’s important to differentiate first-party information from the concept of trusted information. Despite considerable overlap, these are distinct concepts, and it’s important to keep them separate.

A product manufacturer is the first-party source of information about its products.  A trusted review publication such as Consumer Reports isn’t.  

The first party provides the baseline information that others will evaluate.  Most often, the first-party information is accurate as far as it goes, though it may be incomplete. That’s one reason second and third-party sources are valuable. 

First-party information is generally accurate, since the organization is legally responsible for the policies and specifications in the content. Readers presume the primary source knows best.  Even so, first-party information might contain errors, omissions, out-of-date facts, or even willful distortions. But because first parties are responsible for legally binding claims, they are considered the authoritative source. Only they can correct the record.

But manufacturers are not always first-party information sources. A manufacturer may compare its products to competitors’ products; the information it offers about those competing products is third-party. The notion of a first party is linked to a source and its role. The source alone won’t tell us whether information is first-party; we need to know what the source is writing about and its relationship to that topic.
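
To make the role-relative nature of first-party status concrete, the tiers can be modeled as structured data. Below is a minimal sketch in Python; the class and field names are my own invention, not an established schema.

```python
from dataclasses import dataclass
from enum import Enum

class SourceRole(Enum):
    FIRST_PARTY = 1   # decides the policies or specifications being described
    SECOND_PARTY = 2  # financially or organizationally affiliated with the first party
    THIRD_PARTY = 3   # unaffiliated: users, customers, competitors, news organizations

@dataclass
class SourcedStatement:
    text: str
    publisher: str
    subject: str      # what the statement is about
    role: SourceRole  # the publisher's role relative to this subject, not in general

    def may_be_canonical(self) -> bool:
        # Only a first party writing about its own decisions can be canonical.
        return self.role is SourceRole.FIRST_PARTY

# A manufacturer describing its own product is first-party...
spec = SourcedStatement("Battery life: 10 hours", "Acme Corp", "Acme laptop",
                        SourceRole.FIRST_PARTY)
# ...but the same manufacturer describing a competitor's product is third-party.
rival = SourcedStatement("Rival laptops last 6 hours", "Acme Corp", "Rival laptop",
                         SourceRole.THIRD_PARTY)
print(spec.may_be_canonical(), rival.may_be_canonical())  # True False
```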

Third-party information is broad and diverse.  It includes user-generated posts such as product reviews and help questions, as well as news reporting and machine-generated content.

Only some first-party statements are canonical

Not all statements by first parties are canonical. Even when a party is writing about itself, what it says is not necessarily definitive. Whether the statement is canonical depends on whether it is declarative or interpretive.

  • Declarative statements refer to factual assertions
  • Interpretive statements relate to what something means to the writer or for the reader; they are not legally binding claims

Declarative statements include product specifications, pricing, customer warranties, and so on.  They represent promises about what the first party will provide (or won’t provide, because it is the customer’s or someone else’s responsibility).

First-party interpretive statements aren’t canonical because they aren’t factual or legally binding.  Instead, they are “official” endorsements and advice on how or why others should do something. Most marketing and non-contractual customer care content is interpretive rather than declarative. These statements aren’t absolute directives that would void a warranty if not followed, but rather recommendations that customers are responsible for interpreting and following. Because the importance of these instructions is unclear, it is common for second and third-party advisors to offer their own advice on the same topics.

The table below shows the kinds of declarative and interpretive statements offered by primary sources (first parties), surrogates (second parties), and outsiders (third parties).

Declarative statements (what is said)
  • 1st party (the primary source of information): canonical statements from the decider of specifications or policies
  • 2nd party (an affiliated party, contributing what they believe): surrogate restatements and rewordings
  • 3rd party (an unaffiliated party, contributing what they know or think): outsider understandings (what it means to them)

Interpretive statements (what it means)
  • 1st party: primary-source justifications, i.e., how the decider conveys their decisions (why)
  • 2nd party: surrogate perspectives (what’s best for most)
  • 3rd party: outsider opinions (what’s best for them)

First parties use surrogates as message multipliers to extend coverage and reach.  If a customer asks how to fix a problem not addressed in the customer help content, or wants advice on a choice not covered by marketing, a second party might volunteer their own advice independently of the first party.

Because of their affiliation with the first party, surrogates are often perceived as more trustworthy than unaffiliated outsiders.  However, second-party information is seldom approved or verified by the first party and often addresses edge cases that the first party hasn’t covered.  Second-party statements are never canonical, even when they address factual information.

Both surrogates and outsiders sometimes restate the first party’s factual statements.  For example, a tax preparer might restate an IRS rule in layman’s language that is easier to understand.  But such a restatement, despite its factual nature, will not be canonical because it isn’t issued by the IRS, which is the decision authority on the rule.

Accurate information depends on clear provenance 

Indicating where information comes from is not just a matter of supplying a link to a source, since that source itself may be a compilation of sources.

As soon as the chain of attribution gets complicated, the provenance of the information becomes murky.

Both people and bots need a simple yet robust framework for evaluating how the source of information influences its expected accuracy.  If it’s confusing for people, it’s likely to be confusing for bots, too.

Two factors influence the likely accuracy of the content: the reliability of the information source and its timeliness.  People and bots need these dimensions to be traceable and clear. If evaluating these dimensions gets complex, then people and bots will tend to ignore them altogether. 

Does the information come from the original source that would have decided the information, or is it a pastiche of assertions from random people?  Is the information fresh, or was it cobbled together at different times? 
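
As a sketch of what traceable provenance could look like in practice, the hypothetical record below captures both dimensions: who the source is relative to the topic, and when the statement was made. The field names are my own; a statement whose attribution chain can’t fill them in is, by definition, of murky provenance.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ProvenanceRecord:
    statement: str
    source: str             # who said it
    source_role: str        # "1st party", "2nd party", or "3rd party" for this topic
    published: date | None  # when it was said; None means timeliness is unknown
    derived_from: list["ProvenanceRecord"] = field(default_factory=list)  # attribution chain

    def is_traceable(self) -> bool:
        # Traceable = an identifiable source at an identifiable time,
        # with every link in the attribution chain also traceable.
        return (bool(self.source)
                and self.published is not None
                and all(r.is_traceable() for r in self.derived_from))

# A first-party post, a news story citing it, and a summary citing the news story.
post = ProvenanceRecord("We are recalling model X", "Acme CEO on X", "1st party",
                        date(2025, 1, 5))
news = ProvenanceRecord("Acme recalls model X", "Daily News", "3rd party",
                        date(2025, 1, 6), [post])
wiki = ProvenanceRecord("Acme recalled model X", "Wiki article", "3rd party",
                        None, [news])
print(news.is_traceable(), wiki.is_traceable())  # True False (no date on the summary)
```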

The central question becomes: who owns the information and takes responsibility for its accuracy?  AI platforms that spider the entire web don’t do that. In some cases, they don’t want to know the origin of the information because it may expose them to potential legal liability for copyright infringement. 

While canonical information provides the benchmark for reliability, online users can’t rely solely on published canonical information. There are too many questions that canonical sources do not answer online. Outside sources can fill these gaps, though they must be scrutinized for accuracy. For example, the New York Times does not normally make the news (acting as a canonical source for stories about itself), but it is often a good source for reporting news that newsmakers don’t publish online themselves. 

Even when the information is not canonical, it’s still possible to evaluate its accuracy, provided it comes from an identifiable source at an identifiable time.  We can assess how intimate and complete the source’s knowledge is, and whether events occurred before, during, or after the content was published.

How, then, can one evaluate the accuracy of crowd-sourced information? Much online information consists of posts from individuals who add facts and observations about topics that otherwise don’t get much coverage.

Crowd-sourced information tends to be most accurate when everyone reports the same thing at the same time. When various people report different things, we need to know whether those differences correlate with differing timeframes. Otherwise, we can’t tell whether everyone’s circumstances changed, or whether different people were simply in different circumstances, at the same time or at different times.

What’s wickedly challenging to evaluate is the accuracy of information from a mix of sources developed at different times.  It’s not easy to untangle this information, and, as with Gresham’s law, bad information can drive out trust in good information.

Crowd-sourced content will contain misleading information. Not only is the information not from a clearly identifiable single source that can be traced, but it tends to be composed of contributions made at different times, making it unclear which parts are current. This caution isn’t to imply that crowd-sourced content doesn’t contain valuable information. But finding, evaluating, and contextualizing that information requires sustained attention.  A cursory reading or bot crawl won’t be able to separate the wheat from the chaff.

Accountability in content is essential for AI applications

AI platforms have been happy to crawl crowd-sourced information, with little concern about its provenance. This represents the biggest vulnerability of chatbots to misinformation.

As bots, rather than people, become key readers of crowd-sourced content, we must jettison the nostalgic belief in the “wisdom of the crowd” and the hope that user-generated content is self-correcting because users will spot and fix others’ errors. In practice, such correction is far from routine.

Even Wikipedia, the gold standard for crowd-contributed content, where edits are debated and revised for accuracy, can be bedeviled by misinformation that persists for a considerable time before it is corrected – if it ever is.  Unlike most user-generated content, Wikipedia has an established editorial review process, but like all other forms of user-generated content, it relies on the goodwill and time of volunteer contributors, who are stretched too thin to correct more than the most high-profile errors. These systems have been under severe strain in recent years, and the fabled reliability of Wikipedia may not be something to take for granted going forward.

Past confidence in the democratization of information has eroded alongside changes in online user behavior, as people shifted from active information seekers to passive receivers. They’ve disengaged, developing shorter attention spans and reducing their interest in reading. They’ve decided that swiping left or right is the most effort they are willing to expend. 

Bots look like the answer to lazy interaction. Indeed, bots can correct simple errors – even Wikipedia relies on bots for basic content maintenance. But bots can’t replace active editorial oversight. Bots excel at learning patterns but don’t make critical judgments, despite claims to the contrary. 

AI platforms promise convenience.  But as bots increasingly substitute for people online, the solutions create their own problems – an example of iatrogenic progress.  Once platforms began aggregating reviews and making each review less informative in the process, bots began writing reviews themselves, hiding within the crowd that platforms summarize.  Now, users face a second-order set of problems, where answers might be based on bots harvesting reviews written by other bots.

AI platforms won’t earn credibility until they cultivate and support the sources they use to supply answers. Yet AI platforms seem to be moving in the opposite direction.  Elon Musk is promoting an AI-generated encyclopedia called Grokipedia to replace Wikipedia. The sources of information get more opaque, and their quality more dubious.

While the risk of misinformation is growing on third-party AI platforms, chatbots can provide accurate answers when implemented sensibly. The most reliable chatbots will be those that draw on clear and traceable information. The most direct way to do that is for publishers to develop their own AI platforms, rather than rely on third-party ones.  

– Michael Andrews