Misinformation in AI platforms is a systemic problem

As people become more familiar with AI bots, they are recognizing how vulnerable these bots are to failure.  

AI hallucinations have triggered a moral panic. We’ve read alarming stories of AI bots instructing people to cook a pizza using glue as an ingredient. AI gibberish can be dangerous if not controlled properly. How can we rein in the bots?

Feedback alone won’t solve AI misinformation

Misinformation is a broader problem than nonsensical information. But the preoccupation with AI nonsense encourages us to view bot shortcomings as behavioral problems: bots need fixing, or better coaching.

Some countermeasures to AI quality problems center on better prompts, such as warning bots not to suggest doing anything dangerous. Other approaches attempt to reform the bots themselves.

If chatbots exhibit undesirable behaviors, we should train them to behave properly. Researchers approach chatbot vulnerabilities as if they were disciplining a naughty child: somehow, tweaking the feedback a chatbot gets will stop its bad behavior.

Pigeons can be trained with feedback, and so too can chatbots, the thinking goes.  

The reeducation of chatbots may fix hallucinations, but that alone won’t make chatbots reliable. The weakness lies not in the bot itself, but in what it can ascertain about the vast quantity of details it needs to address. 

In principle, AI can learn to think correctly. It can be guided by more precise instructions and receive reinforcement for its conclusions. Hallucinations are a technical problem undergoing active research.

But feedback won’t solve chatbots’ biggest vulnerability: their susceptibility to promoting misinformation by offering authoritative-sounding answers to questions that require nuanced explanation. 

The most common problem isn’t that the bot is making up answers but that it is distorting them. While hallucinations may be fixed over time, distortions in answers seem resistant to fixing. 

False information can be rooted out.  For example, prompt instructions can tell bots to avoid guessing and to explain their rationale. However, misinformation is harder to spot because it is not inherently wrong, but rather potentially misleading in many circumstances. 
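To make this concrete, here is a minimal sketch of such a guardrail prompt, written against the OpenAI Python client. The model name, the prompt wording, and the sample question are illustrative assumptions, not a tested recipe.

```python
# A minimal sketch of an anti-guessing guardrail prompt.
# Assumptions: the OpenAI Python client and a placeholder model name.
from openai import OpenAI

client = OpenAI()

GUARDRAIL_PROMPT = (
    "Answer only when you are confident the answer is well supported. "
    "If you are unsure, say so explicitly rather than guessing. "
    "For every answer, briefly explain the rationale behind it."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any chat model would do
    messages=[
        {"role": "system", "content": GUARDRAIL_PROMPT},
        {"role": "user", "content": "What are the museum's opening hours?"},
    ],
)
print(response.choices[0].message.content)
```

Instructions like these can suppress outright guessing, but they do nothing about the subtler problem of answers that are technically grounded yet misleading.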

Origins of misinformation

Bots often get some details right but other details wrong, and consequently, it’s hard to know how accurate or complete an answer is. The bot’s answer sounds plausible, seemingly grounded in fact, but it may omit information or skew toward generic answers that fail to account for critical nuances. Bots act overconfidently and are prone to distorting information. 

Most chatbots rely on text that’s scraped from sources they don’t control. OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, and similar AI platforms take online content that was developed by others.  That content was not developed with chatbots in mind.  Chatbots are an unintended audience. 

Online content that’s scraped by bots was written for human readers on a specific website, where the surrounding context made the text’s intent and meaning clear.

The scraped text that’s been fed into a chatbot has been stripped of its original context. As a consequence, chatbots are inherently prone to misunderstanding the sources they scrape. 

Third-party AI platforms rely almost exclusively on crawled information from other sources they haven’t vetted for accuracy or timeliness.  Harvesting online content created by others is their business model. Their value is providing aggregation and convenience. 

Although AI web crawling is similar to search indexing, AI platforms combine information from various sources instead of collecting a list of links to them.  Distinctions among the sources disappear.  The context of the original information is lost. As a consequence, extracted information becomes less reliable because the original intent and meaning aren’t visible. Extracted information is more likely to be misleading outside its original context.
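A toy sketch in Python, with invented sources, illustrates the difference. Indexing preserves a list of distinct sources; fusing scraped text into one context erases the provenance of each claim.

```python
# Toy illustration (not any platform's actual pipeline) of how merging
# scraped passages into one context discards source distinctions.
scraped = [
    {"source": "museum-official.example", "text": "Open daily 10:00-18:00."},
    {"source": "travel-forum.example", "text": "Closed for renovation since May."},
]

# Search indexing keeps sources separate, as a list of links:
links = [page["source"] for page in scraped]

# An AI platform instead fuses the text into a single context string,
# so conflicting claims arrive with no visible provenance:
context = " ".join(page["text"] for page in scraped)
print(context)  # "Open daily 10:00-18:00. Closed for renovation since May."
```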

When AI platforms encounter conflicting information, they are unable to reconcile explanations from alternative sources.  Rather than acknowledging that different sources may address different aspects of a common yet complex issue, bots treat them as having different opinions about a simple, straightforward one.

Misinformation is a systemic problem for AI platforms. They depend on outside sources of information they don’t control. Critically, misinformation is not a technical problem that can be solved with further research. It is a quality problem arising from the business model of AI platforms.

Misinformation is always present in online content owing to the ambiguity of human communication. People know to read online text carefully to ensure they aren’t misled.  But bots harvesting online content lack that judgment.

AI platforms try to control the quality of their information inputs by various means, such as favoring certain sources of information or looking for signals of agreement among sources.
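A hypothetical sketch of these signals, with invented trust scores and claims, shows how trust weighting and agreement counting might work, and why both reward popularity rather than accuracy.

```python
# Hypothetical quality signals: weight each source by an assumed trust
# score, and count how many sources agree on a claim. All values invented.
from collections import Counter

TRUST = {"gov-site.example": 0.9, "blog.example": 0.4, "forum.example": 0.3}

claims = [
    ("gov-site.example", "buses run hourly"),
    ("blog.example", "buses run hourly"),
    ("forum.example", "buses stopped running in 2022"),
]

# Agreement signal: how often each claim appears across sources.
agreement = Counter(text for _, text in claims)

# Trust-weighted score per claim (illustrative arithmetic only).
scores: dict[str, float] = {}
for source, text in claims:
    scores[text] = scores.get(text, 0.0) + TRUST.get(source, 0.1)

best = max(scores, key=scores.get)
print(best, agreement[best])  # the popular claim wins, even if it is outdated
```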

However, these techniques are limited to popular topics with high volumes of online content. They are subject to sampling biases (what content bot crawlers can access) and confirmation biases (what information is most popular, regardless of its accuracy).  

Bots are especially unreliable when dealing with the “long tail” of online information, where users need very specific answers to specific questions.

Bot misinformation is a byproduct of the incentives that drive how online information is developed and exploited. AI platforms feed off online content, which for them is cost-free. Apart from rare cases, AI platforms are unwilling to pay for the information they harvest. As commercial incentives to develop reliable information disappear, bots draw on less reliable online sources.

AI platforms promote the illusion that their bots create the information, when in reality they are rarely the original source. AI platforms treat the marginal cost of producing information as zero: because content is free to them, they act as if it costs nothing to develop. Unsurprisingly, the information they exploit (especially user-generated content) is often not maintained, because the original sources have no incentive to keep content up-to-date while third-party platforms exploit it.

The quality deficiencies of the information used by AI platforms are baked into their model. However much platforms tweak their algorithms to find and use higher-quality information, they won’t succeed while people have little incentive to contribute quality information online.

– Michael Andrews


How everyday misinformation pervades online content

Online misinformation is a hidden problem.  We tend to be on guard for disinformation, outright lies such as fake reviews. Because disinformation intends to mislead, we treat it as dangerous. We learn to avoid sources of disinformation. 

Misinformation is more subtle. The content wasn’t created to deceive, but it still misleads, due to its ambiguity. 

Anyone can post misleading information online, even if they don’t intend to. Misinformation is not a problem of bad actors. We can’t divide information between misleading and non-misleading sources, because most sources can mislead someone at some point.  

Content can be misleading without being deceptive.  Human communication is inherently ambiguous, and once statements are posted online, it’s hard to tighten up the meaning of what’s already been said.

Readers are expected to exercise judgment and care before taking any statement at face value. They ask themselves what details have not been mentioned or might have changed, and they verify the content’s credibility and relevance from their personal perspective.

We evaluate online information as a matter of habit, sifting through content for what seems relevant and reliable and skipping over what seems off target.

People online are jaded: they scan answers with a skeptical eye and filter out misleading information.  As a consequence, they typically won’t notice how much of the content online is potentially misleading. They encounter misleading content only when they get a “bum steer” from Google, telling them something is relevant when it isn’t, or else find that no online sources seem to answer the question they have.  

Red flags in online content

On a recent vacation, I encountered how common misinformation is online. I was visiting Romania for the first time, and my guidebooks didn’t cover many of the details I needed to know. 

Vacations are full of microdecisions: where to eat, which places to visit, and how to get from point A to point B. Each of these decisions depends on reliable information. The traveler’s primary goal is to find factual information to assess, not to sample subjective opinions.

Much online travel advice comes from crowd-sourced information, such as travelers’ forums or reviews. Other information is aggregated from a mix of sources, such as reservation consolidators. Unknown sources often supply the facts about schedules, options, or feasibility. Travelers need to check various sources to determine which information and advice is “best” for them. However, even hyper-vigilance and cross-checking do not ensure reliability.

Travelers start with a question, such as whether they can visit a venue tomorrow. On the surface, it’s a simple question about the hours the venue is open. But in practice, the question rests on the assumption that opening hours are the only relevant factor. I discovered that opening hours can be misleading. Bots list venue opening hours even for venues that are out of business or temporarily closed for renovation or relocation. And they won’t disclose hidden availability factors, such as access being restricted during peak periods that no source mentions.

Another situation I faced was how to get to a neighboring town.  Google displayed bus routes that were no longer in operation. It apparently lifted the timetables from another timetable aggregator website, which had not updated them. The information may have been accurate at some point, but it is misleading now. 

I searched online for how to pay for city buses. While each town I visited had broadly similar buses, the payment process varied, involving some combination of paper tickets and passes, transit cards, credit cards, or apps. Yet online answers did not note any of these differences. Instead, Google generated generic answers that weren’t necessarily accurate for the town in question. Generic answers proved misleading.

Another case of wrong answers arose from restaurant searches. I was searching for vegetarian restaurant options, but Google was giving me many “bum steers”.

My searches often yielded false positives, where the information suggested something was available when it wasn’t. Both search engines and chatbots rely on keywords that can act as false signals. An online reviewer might mention that a restaurant had no vegetarian options, yet a chatbot presents the restaurant as a vegetarian-friendly option.

Yet false negatives, misleading indications that nothing is available, are equally a problem. Other restaurants with many vegetarian options remain invisible because reviewers have not discussed them online. When crowd-sourced content stays silent about something that exists in the real world, it shows the limits of crowd-sourced information in staying current.
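A toy sketch, with invented reviews, shows both failure modes of naive keyword matching at once: a false positive for a restaurant whose review mentions the keyword negatively, and a false negative for one that reviewers never describe.

```python
# Toy example of the keyword trap: substring matching flags a review as
# "vegetarian-friendly" even when it says the opposite. Reviews invented.
reviews = {
    "Restaurant A": "Lovely place, but it had no vegetarian options at all.",
    "Restaurant B": "Great vegetarian menu with many choices.",
    "Restaurant C": "Hearty traditional grill.",  # serves veg dishes, but
                                                  # no reviewer mentions it
}

matches = [name for name, text in reviews.items() if "vegetarian" in text.lower()]
print(matches)  # ['Restaurant A', 'Restaurant B']: a false positive for A,
                # and a false negative for C
```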

Crowd-sourced content can be misleading because it may contain outdated information or fail to incorporate recent developments.   It fails to keep pace with the realities on the ground.  

But why highlight online misinformation if it isn’t a new problem? Because human review is increasingly bypassed. Humans can spot misleading information, but AI bots can’t. AI bots don’t understand the ambiguities within information that might make it misleading, and they don’t understand the context of the information they draw from.

AI agents promise to offload numerous decisions from customers. Travel is one such domain that AI agents promise to simplify and streamline for travelers. No longer will travelers need to worry about pesky details; the bot will take care of them.

Yet rather than liberating us from the chore of chasing down information, AI bots impose the risk that the information they supply is incomplete and misleading. And when we can’t rely on that information, we face even more work trying to verify or augment what bots tell us.

Online misinformation is a larger problem than is generally recognized. It presents a big risk to the performance of AI bots. 

– Michael Andrews