As people become more familiar with AI bots, they are recognizing how vulnerable these bots are to failure.
AI hallucinations have triggered a moral panic. We’ve read alarming stories of AI bots instructing people to cook a pizza using glue as an ingredient. AI gibberish can be dangerous if not controlled properly. How can we rein in the bots?
Feedback alone won’t solve AI misinformation
Misinformation is a broader problem than nonsensical information. But the preoccupation with AI nonsense encourages us to view bot shortcomings as behavioral problems: the bots need fixing, or at least better coaching.
Some countermeasures to AI quality problems center on better prompts, such as warning bots not to suggest doing anything dangerous. Other approaches attempt to reform bots.
If chatbots exhibit undesirable behaviors, we should train them to behave properly. Researchers approach chatbot vulnerabilities as if disciplining a naughty child: tweak the feedback a chatbot receives, and its bad behavior will stop.
Pigeons can be trained with feedback, and so too can chatbots, the thinking goes.
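In caricature, that feedback loop looks something like the toy sketch below. Everything in it is illustrative; real reinforcement pipelines are vastly more elaborate, but the logic of reward-shaped behavior is the same.

```python
# Toy sketch of learning from feedback (the pigeon analogy in code):
# responses that earn positive feedback become more likely over time.
import random

weights = {"helpful answer": 1.0, "dangerous suggestion": 1.0}

def respond():
    """Pick a response with probability proportional to its weight."""
    r = random.uniform(0, sum(weights.values()))
    for response, weight in weights.items():
        r -= weight
        if r <= 0:
            return response
    return response

for _ in range(1000):
    choice = respond()
    reward = 1.0 if choice == "helpful answer" else -0.5  # trainer's feedback
    weights[choice] = max(0.01, weights[choice] + 0.1 * reward)

print(weights)  # the "dangerous suggestion" weight has shrunk toward zero
```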
The reeducation of chatbots may fix hallucinations, but that alone won’t make chatbots reliable. The weakness lies not in the bot itself, but in what it can ascertain about the vast quantity of details it needs to address.
In principle, AI can learn to think correctly. It can be guided by more precise instructions and get reinforcement for its conclusions. Hallucinations are a technical problem undergoing active research.
But feedback won’t solve chatbots’ biggest vulnerability: their susceptibility to promoting misinformation by offering authoritative-sounding answers to questions that require nuanced explanation.
The most common problem isn’t that the bot is making up answers but that it is distorting them. While hallucinations may be fixed over time, distortions in answers seem resistant to fixing.
False information can be rooted out. For example, prompt instructions can tell bots to avoid guessing and to explain their rationale. However, misinformation is harder to spot because it is not inherently wrong, but rather potentially misleading in many circumstances.
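As a rough illustration, such prompt instructions might look like the sketch below, written against OpenAI's Python client. The model name and the instruction wording are placeholder assumptions, not a vetted recipe.

```python
# Minimal sketch of prompt-level guardrails: tell the bot not to guess
# and to explain its rationale. Wording and model are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_INSTRUCTIONS = (
    "If you are not confident in an answer, say you don't know rather than guess. "
    "For any answer you do give, briefly explain your rationale and note "
    "any assumptions you are making."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": SYSTEM_INSTRUCTIONS},
        {"role": "user", "content": "Is it safe to use glue as a pizza ingredient?"},
    ],
)
print(response.choices[0].message.content)
```

Instructions like these can suppress obvious guessing, but they operate on the bot's behavior, not on the quality of the information behind its answers.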
Origins of misinformation
Bots often get some details right but other details wrong, and consequently, it’s hard to know how accurate or complete an answer is. The bot’s answer sounds plausible, seemingly grounded in fact, but it may omit information or skew toward generic answers that fail to account for critical nuances. Bots act overconfidently and are prone to distorting information.
Most chatbots rely on text that’s scraped from sources they don’t control. OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, and similar AI platforms take online content that was developed by others. That content was not developed with chatbots in mind. Chatbots are an unintended audience.
Online content that’s scraped by bots was intended for human readers visiting a specific website, who could see the context that established the text’s intent and meaning.
The scraped text that’s been fed into a chatbot has been stripped of its original context. As a consequence, chatbots are inherently prone to misunderstanding the sources they scrape.
Third-party AI platforms rely almost exclusively on crawled information from other sources they haven’t vetted for accuracy or timeliness. Harvesting online content created by others is their business model. Their value is providing aggregation and convenience.
Although AI web crawling is similar to search indexing, AI platforms combine information from various sources instead of collecting a list of links to them. Distinctions among the sources disappear. The context of the original information is lost. As a consequence, extracted information becomes less reliable because the original intent and meaning aren’t visible. Extracted information is more likely to be misleading outside its original context.
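A toy sketch of what that loss of context looks like: once snippets from different pages are pooled into bare text, the provenance that signaled each snippet's intent is gone. The sources and snippets below are entirely made up for illustration.

```python
# Toy sketch of context loss during aggregation (hypothetical data).
# Each snippet originally carried provenance that signaled its intent;
# pooling the text alone discards those signals.
scraped = [
    {"source": "satire-site.example", "context": "parody recipe column",
     "text": "Add glue to your pizza sauce for extra tackiness."},
    {"source": "food-safety.example", "context": "safety advisory",
     "text": "Non-food adhesives must never be used as ingredients."},
]

# What the model effectively sees after extraction: bare text, merged.
pooled_context = " ".join(item["text"] for item in scraped)
print(pooled_context)
# The parody and the advisory now read as two equally weighted "facts"
# about the same topic; the cues that separated them are gone.
```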
When AI platforms encounter conflicting information, they are unable to reconcile explanations from alternative sources. Rather than acknowledging that different sources may address different aspects of a common yet complex issue, bots treat them as having different opinions about a simple, straightforward one.
Misinformation is a systemic problem for AI platforms. They depend on outside sources of information they don’t control. Critically, misinformation is not a technical problem that can be solved with further research. It is a quality problem arising from the business model of AI platforms.
Misinformation is always present in online content owing to the ambiguity of human communication. People know to read online text carefully to ensure they aren’t misled. But bots harvesting online content lack that judgment.
AI platforms try to control the quality of their information inputs by various means, such as favoring certain sources of information or looking for signals of agreement among sources.
However, these techniques are limited to popular topics with high volumes of online content. They are subject to sampling biases (what content bot crawlers can access) and confirmation biases (what information is most popular, regardless of its accuracy).
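In skeletal form, those quality signals might resemble the hypothetical sketch below, where sources carry trust weights and answers are ranked by weighted agreement. The sources, weights, and answers are invented for illustration; real platforms' signals are proprietary.

```python
# Hypothetical sketch of two common quality signals: per-source trust
# weights and agreement (majority) among sources. Both degrade when only
# a handful of sources cover a topic.
from collections import Counter

TRUST = {"gov.example": 0.9, "wiki.example": 0.7, "forum.example": 0.3}

def weighted_consensus(claims):
    """claims: list of (source, answer) pairs. Returns the answer with
    the highest total trust weight, a stand-in for agreement signals."""
    scores = Counter()
    for source, answer in claims:
        scores[answer] += TRUST.get(source, 0.1)  # unknown sources count little
    return scores.most_common(1)[0]

# On a popular topic with many sources, the signal is meaningful.
print(weighted_consensus([
    ("gov.example", "A"), ("wiki.example", "A"), ("forum.example", "B"),
]))

# On an obscure topic, one forum post may be the only "evidence", yet
# the same machinery still returns a confident winner.
print(weighted_consensus([("forum.example", "B")]))
```

With many sources, weighted agreement is informative; with one obscure source, the machinery still produces a confident answer, which is exactly the long-tail problem.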
Bots are especially unreliable when dealing with the “long tail” of online information, where users need very specific answers to specific questions.
Bot misinformation is the byproduct of the incentives that drive how online information is developed and exploited. AI platforms feed off of online content, which for them is cost-free. Apart from rare cases, AI platforms are unwilling to pay for information they harvest. As commercial incentives to develop reliable information disappear, less reliable online sources are used.
AI platforms promote the illusion that their bots create the information, when in reality they are rarely the original source. AI platforms treat the marginal cost of producing information as zero: because the content is free to them, they behave as if it costs nothing to develop. Unsurprisingly, the information they exploit (especially user-generated content) is often not maintained, because the original sources have no incentive to keep content up-to-date while third-party platforms exploit it.
The quality deficiencies of the information used by AI platforms are baked into their model. However much platforms tweak their algorithms to find and use higher-quality information, they won’t succeed when people have little incentive to contribute quality information online.
– Michael Andrews