
Ghost-busting Generative AI

Chatbot responses shouldn’t be taken at face value. They can be unreliable, and verification is necessary.

People are increasingly tempted to outsource their thinking to chatbots. Chatbots seem to provide answers without the hassle of wading through individual articles.   

But unlike human authors, chatbots aren’t accountable for what they write. That places a greater burden on readers to determine the veracity of online statements. Yet, for a variety of reasons, it is difficult for readers to assess where the information in chatbot responses comes from, how complete it is, and whether it is reliable.

This post explores the trust deficit inherent in chatbot responses. It will identify common weaknesses in the quality of responses and suggest countermeasures that users of AI tools can take to obtain more reliable information.

The rise of phantom authorship

If you read a best-selling book by someone famous, chances are that book was written by a ghostwriter, whose role is not acknowledged. To be clear: I’m not referring to books by renowned authors; few people become famous because of a book they’ve written.

Camera-preening media personalities seldom write books.  But their name recognition sells books. Actors, sports stars, politicians, TV hosts, influencers, and other celebrities license their names to books produced by others.  In such licensing deals, the party that develops the product is a mystery. The putative author pretends to have written the book, while the reader engages in “suspension of disbelief,” pretending that the story they are reading is genuine.

A similar phenomenon is happening with online content: it’s becoming unclear who wrote it.  News stories, research findings, regulatory notices, cooking recipes, and health advice get reinterpreted by AI robots as “responses” to user prompts.

Generative AI is similar to ghostwriting: in both cases, the writer is anonymous and lacks firsthand knowledge of the topic.

In fact, GenAI is beginning to cannibalize traditional ghostwriting. “Today, ghostwriting websites must work to advertise why they could perform their writing-for-hire services better than a machine,” notes Emily Hodgson Anderson at USC.

Now, machines do the ghostwriting.  Online content is becoming dominated by phantom authorship.

While some users are willing to accept chatbot responses as genuine, many have nagging doubts about the trustworthiness of the outputs.

The growing supply and demand for machine-synthesized content

Ghostwriting and generative content are manifestations of a broader “explainer” economy, where one party makes money by explaining the work of others.  Explainers promise to improve clarity and reduce the time required to understand topics.  The benefit they sell is productivity.

Explainers are not new.  Students have long used CliffsNotes to get the essence of classic novels — without the effort of reading them. More recently, a range of apps, such as Blinkist and Shortform, promise 15-minute summaries of current bestsellers.

Generative AI has made it possible to turn any content — audio, video, PDFs, PowerPoints, web pages, Word docs — into a short explanation.

Talks, interviews, legal documents, scientific studies, blog posts, and academic articles, each created by individuals or small teams of collaborators, can all be regenerated by AI chatbots.

Generative AI rewrites and repackages original content into various products:

  • As a summary of a single source from a single author
  • As a retelling of discrete cases, anecdotes, or incidents into a common story arc
  • As the synthesis of multiple sources from different authors

The supply of synthesized content is now endless.  It’s easy, cheap, and profitable to repackage original content. GenAI tools rely on “word spinners” that rephrase and restructure the original content to make it more “engaging” and avoid plagiarism and copyright infringement.

And users can’t seem to get enough of this effort-saving convenience.  Historically, summaries such as abstracts were descriptions used to preview content that might be read in full.  Now, AI summaries have become the end goal.

An anonymous AI-generated summary increasingly acts as a substitute for reading the original source, such as an interview transcript, research document, or a book. The summary will often be more convenient to read or access, especially in the browser or editor the user relies on.  The original source content is no longer necessary.

So, what’s the problem?

Trust in machine-generated responses

Because of the impersonal quality of online communications, trust has always been top of mind for users.  AI-generated content adds another complication to users’ concerns about trust because it is even more impersonal than articles on websites.

Traditionally, written content embodied a social contract between the writer and their readers.  The writer addressed readers, who, in turn, expected the writer to reveal themselves through their prose.  Both sides aspired to believe they knew and understood each other on some level.

Corporate online content, being more prosaic, often lacks a byline that identifies an individual as the author. In such cases, the company is promoted to the role of author.  The brand’s reputation guarantees the accuracy of the content.  Ideally, the web page includes a link or contact information so customers can reach a real person if they need information clarified.  While users didn’t know who specifically wrote the content, they at least knew a live human being was accountable for it.  The involvement of a biological entity having a heartbeat was presumed — until the advent of chatbots.

Responses generated by AI platforms lack any discernible connection to a real person.  Readers no longer have a relationship with anyone.  Users are left to imagine what is driving the dynamic between them and the responses they encounter.

These anonymous responses seem suspect and untrustworthy because no person seems accountable for what’s said.  It’s unclear who’s behind the statements; asking the bot doesn’t clarify the issue.  Readers are haunted by nagging doubts. Who is saying this, really: the bot’s vendor or the firm that the customer wants to do business with?  Is the message spin or manipulation — is it generated to make me feel a certain way or to placate me?

Chatbots can seem evasive: hard to pin down, too obsequious to trust, alternately vague or over-confident.

And when users feel forced to rely on a bot, they can feel their time is being wasted.  As one comment in an online forum put it: “If it wasn’t worth a human’s time or effort to write, it’s not worth the time or effort for a human to read.”

Such hesitations are not limited to bots.  People are generally spooked by things that happen that can’t be attributed.  For example, many are unnerved by the pervasive influence of so-called “dark money” funding of political campaigns and operations in the United States by anonymous donors.

With bots, missing accountability is not a choice or a bug but an inherent feature of the product.  Few users understand the mechanics of bot training, but they can sense that no one wants to take the blame if the bot behaves badly.

While chatbots can trigger low trust, they can also induce false confidence. Some users will take bot responses at face value, to their detriment.

Boobytraps in AI-generated content

A boobytrap is an apparently harmless object that is no such thing.  Despite its benign exterior, it can inflict painful consequences.  AI responses can be boobytraps.

AI responses appear reassuring, with friendly wording, each sentence dedicated to a single idea. The responses seem the antithesis of the convoluted double-talk of human rhetoricians. The wording of responses is optimized to disarm skepticism.

I don’t want to make people paranoid about bot responses.  Many are genuinely useful. Yet it’s prudent not to take them at face value, and to question both their intent and their consequences.

Let’s look at some less obvious boobytraps.

Boobytrap 1: Untraceable original information

With authored content, any information not cited as originating from someone else is presumed to be developed by the author. But with AI-generated content, it can be unclear who contributed the information and who is accountable for it.

Even when phantom authors credit sources, they may still add material of their own or misstate what the sources said.  There’s the possibility of embellishments — statements that sound credible but aren’t entirely accurate. AI-generated responses are like a movie that’s “based on a true story.” The user can’t be sure if the whole package is true.

The concern is not necessarily obvious hallucinations involving wholesale fabrications. The problems may involve real information that doesn’t belong in the context because it isn’t related to the sources or topic covered.  A common occurrence is when generated content conflates people or events because it “borrows” information from extraneous sources.

Boobytrap 2: Inaccurate interpretations

When you can’t read the original content, a summary’s faithfulness to the original meaning becomes important.  Translators who render the words of an author into another language have a responsibility for fidelity.  But AI-generated content can take liberties with phrasing and argumentation.

Chatbot responses often reflect a fast-casual writing style.  Friendly short-form outputs convey familiarity and reassurance, even though the concepts involved may be more complex and nuanced than portrayed.  Chatbots behave as if they can explain nuclear physics to a ten-year-old.

Chatbots are especially prone to misrepresenting original ideas in two areas.

First, chatbots will adopt simplified terminology that can erase the distinctive meaning of the original terminology. Words have specific meanings in particular contexts, but chatbot responses tend to use common words that may not precisely reflect the intent of the original terminology. The facile substitution of word-spinning can destroy the meaning of concepts discussed in the original sources.

Second, chatbot responses tend to disaggregate and nullify the argumentation used in original sources.  Instead of maintaining the sequencing and transitions of explanatory arguments, chatbot responses tend to break concepts into discrete sentences, often strung together as a list of bullets.  To the user, the laundry list of statements reads like a wall of declarations, devoid of coherence.

Responses that seem easier to read are not necessarily easier to understand or more informative.

Boobytrap 3: Third-hand information 

LLMs don’t convey knowledge.  They merely repeat messages in a modified form.  And LLMs can’t distinguish primary sources from secondary ones.  All text is of equal value, regardless of its provenance. 

By the time a user sees an AI chatbot response, they are seeing third-hand information. The chatbot is regenerating text from other sources.  Most of this text will be secondary sources that explain and summarize what subject-matter experts know.  Only a small portion of the texts that LLMs rely on are primary sources written by the experts themselves.  LLMs depend on large quantities of content, even if the quality of that content is suspect. 

It’s easy to see how informational accuracy degrades as a secondary source explains what an expert says, then a chatbot reinterprets what the secondary source says. 

Yet the myth persists that AI chatbots can “reason” and, in doing so, verify knowledge.  But the reality is that, while AI chatbots explain, they have no understanding of what they explain. 

Given their parroting tendencies, LLMs are prone to reproducing at scale the phenomenon known as truthiness — statements that sound true but are substantively empty or even false.  This happens with memes and other folk wisdom that many people believe to be true and that are widely disseminated online.

Boobytrap 4: Hidden agendas of unnamed writers

Human readers notice an author’s point of view and their voice. They detect an author’s agenda, which authors often spell out explicitly in a foreword or introduction. The author wants to convince you of something and crafts an argument or narrative to that end. Sometimes the author’s agenda is not explicit, but it can be inferred.  If the author is employed by someone or invests in some financial enterprise, we infer they have a vested interest in promoting those interests.

The phantom writer’s agenda is hidden. Human ghostwriters are hired to burnish the image of celebrity clients.  Chatbots are also expected to advance their funders’ goals.

Bots are designed to appear free of self-interest, presenting themselves as loyal, tireless servants of the user. At times, bots compliment the user on their questions. Yet, as anyone with a dog will know, even a loyal companion has an agenda of its own. Man’s best friend learns to manipulate its owner to obtain desired rewards.

Unlike a dog, lifeless AI bots lack emotions and a sense of right and wrong.  LLMs are amoral; at most, they have ethical guardrails introduced by developers. Guardrails can also be imposed to censor LLM outputs, as seen in several authoritarian countries.

The agenda of AI platforms is largely commercial and not necessarily aligned with the user.  Like social media and gaming applications, AI chatbots are designed to offer users feedback and rewards to encourage continued use.  Chatbots make users feel good through flattery and convenience. Users marvel at how much more productive and accomplished they’ve become.  Mastery of generative AI tools is a status reward, separating the workers of the future from those left behind.

But the promotion of engagement by AI platforms is subtly different from early waves of online products.  AI platforms envision themselves managing everything in your life and, consequently, are motivated to make users dependent on their products, if for no other reason than the worry that users might start using a competitor.  Platforms are battling in a winner-takes-all competition.

AI capabilities are everywhere because firms want users to get into the habit of relying on AI tools as their default behavior.  As this happens, the user’s ability to use alternative AI products, or none at all, becomes restricted.

While platforms promote user dependence on their products, they also seek to maximize revenues and profits.  The pricing of AI usage is opaque, and users may not be aware of how their experience with AI responses is shaped by the platform’s financial objectives. Users should never assume that the response they get is the best one possible.

Several platforms are exploring advertising in chatbots.  Whether such marketing promotion seeps into chatbot responses remains to be seen.  One can easily imagine product placements, such as those that routinely occur in movies, appearing in chatbot responses.

The value of the user for the platform shapes how many tokens the platform is willing to expend to generate a response.  If the user isn’t a high-value prospect, the platform will truncate the response by offering the fastest and easiest response to generate, rather than the most complete one.

Countermeasures to phantom authorship

Despite the problems associated with phantom authorship, I’m not advocating that you refrain from using AI.  Chatbots can be a valuable tool when used judiciously.

GenAI requires a new kind of information literacy, one that builds on best practices of the search era, yet extends them to address both new problems and opportunities.

The most basic principle to keep in mind is that chatbot responses are not answers or data.  They are best thought of as pointers to actual information rather than as vetted knowledge.

Here are some tips to counter bots’ tendency to sound unjustifiably authoritative. 

Countermeasure 1: Authenticate the source

When reviewing a response, start with the question: Who said that? Track down where the information supposedly comes from and look at the source directly.

Some AI platforms provide links to the sources used — following these links helps provide more context for what the response has pulled.  You can check what exactly was said and the framing of the original discussion.  You can evaluate the source’s authority and credibility on the topic, dimensions that chatbots don’t evaluate.  Be aware that some platforms have been known to generate “ghost references” to sources that don’t exist.

If a link isn’t provided in the response, note if any source document is mentioned, and search to locate that document.  If the source document is behind a paywall, an archival version may be available. 

Sometimes the source document is overwhelming in its size or complexity. In such cases, don’t be afraid to dig deeper into the content.  Browsers now have built-in AI chatbots that let users explore the source material directly, rather than relying on an AI platform’s initial response that synthesized multiple sources. 

A potential red flag is when AI responses don’t cite anyone specifically for a statement.  Maybe the response reflects a consensus opinion, which could be either right or wrong. Or maybe the AI platform only provided a superficial response to a complex issue.  Oversimplification is a common boobytrap in AI responses, because AI platforms ration the tokens used to generate responses.

Unless the question is strictly factual, the issue may involve nuances that aren’t reflected.  It’s a good idea to ask more specific questions to drill into potential nuances, and to seek alternative responses from other AI platforms to ensure you are covering all relevant perspectives.

Countermeasure 2: When scanning for information, try more than one platform

AI platforms don’t want people to browse the way they used to, by scanning links in search results.  Platforms synthesize information from diverse sources so users don’t have to.

But diversity of information is a good thing, not a burden. It provides a richer picture of a topic.

Don’t feel pressured to stick with a single platform — don’t be captured by a subscription plan. Be a showroom shopper who is “just looking” at the responses that a platform has to offer, but be willing to check out other platforms.

AI platforms want an exclusive relationship with you.  But for now at least, users have many suitors.  Most platforms offer at least a basic tier for free, allowing users to ask the same question on different platforms.  While doing so will yield some overlap in responses, it will also surface new avenues to explore. 
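
For readers comfortable with a bit of scripting, the comparison can also be automated. The sketch below is illustrative only and not a recommendation of any vendor: it sends one question to two platforms through their official Python SDKs (openai and anthropic) and prints the answers side by side. The sample question and the model names are placeholders chosen for illustration; pasting the same prompt into two chat windows accomplishes the same thing.

    # Illustrative sketch: one question, two platforms, answers side by side.
    # Assumes the openai and anthropic packages are installed and that
    # OPENAI_API_KEY and ANTHROPIC_API_KEY are set in the environment.
    from openai import OpenAI
    import anthropic

    QUESTION = "What are the main criticisms of intermittent fasting?"  # placeholder question

    openai_reply = OpenAI().chat.completions.create(
        model="gpt-4o-mini",                        # placeholder model name
        messages=[{"role": "user", "content": QUESTION}],
    ).choices[0].message.content

    anthropic_reply = anthropic.Anthropic().messages.create(
        model="claude-3-5-sonnet-latest",           # placeholder model name
        max_tokens=500,
        messages=[{"role": "user", "content": QUESTION}],
    ).content[0].text

    for name, reply in (("Platform A", openai_reply), ("Platform B", anthropic_reply)):
        print(f"--- {name} ---\n{reply}\n")

Reading the two answers together makes the overlap and the gaps obvious, which is the point of the exercise.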

Countermeasure 3: Ask the core question multiple ways

Chatbots are known for providing slightly different responses to the same prompt. Turn this property to your advantage: embrace the indeterminacy of bots to widen the range of responses and surface useful ones.

There’s no such thing as a perfect prompt, especially when exploring a topic you are unfamiliar with.  You might not be sure how source content discusses a topic, or how the bot will interpret your prompt.  Experimentation helps to clarify these relationships.

Chatbots can be highly sensitive to the terminology used in prompts.  If the bot seems to misinterpret what you are seeking, try different terminology.  Sometimes, more concrete terminology helps; other times, more abstract or general terms work better.

Similarly, broadening and narrowing the requirements expressed in prompts can change the utility of responses. 
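
For those who prefer to script the experiment, here is a minimal sketch of a prompt sweep. The phrasings and the model name are placeholder examples of my own, not a prescribed list; the same exploration works just as well typed by hand.

    # Illustrative sketch: ask one core question several ways and compare replies.
    # Requires the openai package and an OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    phrasings = [  # placeholder variants of one underlying question
        "What does 'fiduciary duty' require of a financial advisor?",
        "How do fiduciary and suitability standards differ for financial advisors?",
        "In what common scenarios do advisors breach fiduciary duty?",
    ]

    for prompt in phrasings:
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        print(f"PROMPT: {prompt}\n{reply}\n" + "=" * 60)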

Countermeasure 4: Be mindful of the changeability of a topic

Users don’t know what content the LLM was trained on, or how old it is.

Chatbot responses can be biased toward reflecting the dominant views of topics, especially when the prompts are general. What view is dominant will depend on how much it is discussed within the corpus of text on which the LLM was trained.

Some topics are anchored in old information.  A plethora of dated information crowds out nascent information that hasn’t been written about extensively. The old consensus dominates. 

Other topics aren’t covered because they are too old.  Most LLMs are trained primarily on post-1995 web pages, meaning that pre-Internet content is unknown to them. (Anthropic is an exception, as they have scanned millions of paper books to feed into their training.)  Even internet content more than a decade old has often been taken down and has disappeared forever. 

Internet content, upon which LLMs depend, is sensitive to fads and fashions.  Publishers create content based on current reader interests, not long-term interests.  Many topics and perspectives lack comprehensive coverage because they were never part of a popularity wave. 

Be sure to specify the timeframe of the information you are seeking. Decide what timeframe will yield the most useful insights. If you know that medical guidelines have changed recently, ask for the most recent guidelines. But in other cases, prioritizing the most recent information will skew responses toward the latest controversies on a topic, which may be of little importance to the user.

Ask how people viewed an issue prior to a certain date, or what the major issues were at different time frames relating to a topic.  Doing so can bring additional perspectives and overcome the recency bias in chatbot responses. It might also reveal what the LLM can’t address, which is also valuable feedback. 

Countermeasure 5: Probe terminology

The closer the response draws from sources developed by experts, the more accurate and reliable the information is likely to be. But experts, while knowledgeable, tend to speak in jargon.

Chatbot responses tend to translate both the experts’ information and their wording simultaneously, which can result in a loss of detail or inaccuracies. For example, if a word represents a fundamental concept in a domain like law or medicine, but a synonym is used, the specific meaning of the underlying concept might not be conveyed. 

Rather than have the chatbot translate both the experts’ information and their wording simultaneously, break these into separate steps.   Ask how the expert would describe the issue, then follow up by asking what unfamiliar terms mean in that context. 

LLMs have been trained on an astronomical amount of text that sheds light on what words mean in specific contexts.  Ask bots what a term means in a specific context and why a concept is important in a specific context.
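
The two-step habit can be sketched as a short scripted conversation. The topic, the probed term, and the model name below are hypothetical examples chosen for illustration.

    # Illustrative sketch of the two-step probe: first ask for the expert
    # framing, then ask what a specific term means within that framing.
    from openai import OpenAI

    client = OpenAI()      # reads OPENAI_API_KEY from the environment
    MODEL = "gpt-4o-mini"  # placeholder model name

    history = [{"role": "user", "content":
                "How would an epidemiologist describe the limits of observational studies?"}]
    framing = client.chat.completions.create(
        model=MODEL, messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": framing})

    # The follow-up stays in the same conversation, so the definition is
    # anchored to the expert framing rather than a generic dictionary gloss.
    history.append({"role": "user", "content":
                    "In that context, what does 'confounding' mean, and why does it matter?"})
    definition = client.chat.completions.create(
        model=MODEL, messages=history,
    ).choices[0].message.content

    print(framing + "\n---\n" + definition)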

Ghost-busting Generative AI

Andrej Karpathy, co-founder of OpenAI, has said: “Today’s LLMs are like ghosts.” 

Because so much about chatbot mechanics is invisible, users must infer how responses are generated. They shouldn’t take responses at face value. 

Users need to bring a skeptical mindset to AI tools and be prepared to challenge their responses. 

— Michael Andrews