
Why is using AI so exhausting? 

Users of AI tools can have conflicted feelings. AI can at once seem seductive and enticing, while feeling draining and dispiriting. While the convenience of AI is obvious, its effects on user energy are not. This post explores the subtle dynamics behind why using AI can be draining. 

I want to explore two mysteries surrounding the use of AI:

  • If AI is supposed to make work easier, why does it end up exhausting us?
  • Why do people of diverse backgrounds find using AI exhausting?

AI offers unprecedented convenience for users to do tasks far beyond what they’d be capable of doing by themselves. Yet it’s a mistake to equate convenience with ease. Using AI is not always easy, especially when the user has specific goals in mind.

Even though exhaustion does not affect every user or happen all the time, it is nonetheless widespread and endemic. Despite its stated promise, AI too often fails to empower users. 

The topic of “AI exhaustion” or “AI fatigue” is actively discussed online, in forums, blogs, and research papers. Advice columns offer tips on how to combat burnout from using AI (take breaks!). There are debates about the consequences of “overreliance” on AI. The Boston Consulting Group coined the phrase “AI brain fry” to describe the mental fatigue “from excessive use or oversight of AI tools beyond one’s cognitive capacity”.

Large-scale studies involving tens of thousands of users over several years indicate the problem of exhaustion is genuine and not a media vibe. Yet the commentary about AI fatigue offers surprisingly little substantive analysis of why it occurs. Instead, commentators focus on people’s readiness to use AI tools. 

A representative article on AI fatigue, this one focused on coders. Screen cap: Scientific American

Whether AI seems fatiguing depends on the task, not on the user.  For some tasks, AI is a real time saver. But all kinds of users, whether pro-AI or anti-AI, newbies or experts, can find AI exhausting. It depends on what AI is doing for users. 

AI task delegation works well for trivial (routine, non-complex) tasks. Moreover, AI outputs can seem impressive for tasks that far exceed the user’s abilities or expertise. But if either of those conditions isn’t met, then the process of using AI becomes fatiguing. AI outputs are never quite right, forcing the user to put in extra work. 

Vexingly, tasks that humans find intrinsically draining and tedious, due to their intricacy and need for oversight, are often ones for which AI is least reliable.  The more complex the task, and the more knowledgeable the user about the right answer, the more likely the user will find AI outputs unsatisfactory. These tasks can still be draining, even when using AI.

Between repetitive situations conducive to simple automation, and completely novel situations, where users have little idea how to get answers, are the everyday problems we expect Generative AI to help with. 

For everyday knowledge tasks, few users find using AI to be low effort. Those who extol AI’s effortlessness tend to be indifferent to outcomes, cheerfully generating workslop.

Workslop may feel effortless to create but exacts a toll on the organization. What a sender perceives as a loophole becomes a hole the recipient needs to dig out of.

Harvard Business Review

But even when fatigue is acknowledged as a byproduct of AI use, it is often dismissed as a transient problem. 

The misdiagnosis: fatigue is a training problem

Skeptics believe AI fatigue is nothing that a bit of training can’t overcome. The solution looks obvious until you notice that training advocates don’t agree on what kind of training is needed to solve the problem.  They also have different views on what’s required from AI users.

One remedy to AI fatigue might be called “train the bot.” Pundits complain that fatigue stems from people using AI incorrectly. Users are crafting prompts incorrectly or omitting some context.  Getting AI to behave is like taming a wild horse; there’s a secret to be gleaned through deep communion. Such blaming and shaming of the user makes the problem seem like a skills deficiency, rather than a systemic issue.

An alternative fix might be called “train the user.” This view holds that we get tired when using AI because we aren’t yet accustomed to it. Users must “build muscles” to get comfortable using new AI tools. The viewpoint embraces a bootcamp-like stoicism: when using AI, if there’s no pain, there’s no gain. Fatigue is normal: get over it.

Both perspectives are superficial. Even power users find AI tiring, so it’s difficult to attribute fatigue to training. 

The root cause of fatigue: the conflated roles of the AI ‘prosumer.’

AI differs from most technologies and work skills because it is rooted in a role conflict. Users play two roles simultaneously.

The user is never sure whether Generative AI acts as the subject doing the work or the object reflecting work that’s been done. 

AI users find themselves managing two sides at once: as producers of AI outputs and consumers of those outputs. Users are “prosumers”.

The prosumer role can be enervating because user attention is split between initiating and responding to outputs. Users are partially responsible for what’s created and must decide whether to use it. It’s not obvious when outputs are good enough to use, or whether additional work guiding the generation will result in more useful output to consume.  

Both producing outputs and consuming them can be sources of fatigue, so there’s never a state where the user can sit back and relax.  Instead, they must constantly switch between roles, each time encountering different stress factors.

Producing AI outputs

AI tools generate outputs, seemingly with little effort.  But while outputs are easy to generate, they are not necessarily useful. The gap between the attempt and the outcome defines the effort required.  

Generative AI is stochastic or probabilistic – more colloquially, its generation relies on “vibes”.  What you get will contain an element of surprise.  Users talk about vibe coding or vibe writing. In its most romantic formulation, the creation process is improvisational – users challenge the LLM to raise its game, and the LLM responds with a volley. Improvisation, however, requires cooperation between partners who can read each other’s actions and motives.  

Vibe coding is not the same experience as playing in a jazz combo. More often than not, the LLM seems to have a will of its own, and the user is fighting to contain it. LLMs don’t understand the user’s intent correctly, either underwhelming them with regurgitated material or presumptively moving in an unwanted direction. 

In most situations, users don’t want AI to improvise.  They want AI to deliver clear and certain outputs. But that’s not what Generative AI is designed to do. 

Users have expectations about what the final output should look like. Those expectations reflect their perceived ownership of what’s developed: does it look like something they did? Generative AI hijacks the user’s focus away from the creative expression of their expertise and imagination toward managing the bot’s generation. The emphasis shifts from creation to administration.

AI has redefined knowledge work as coordinating, reviewing, and making decisions about bot actions and outputs. 

AI can make work robotic and difficult to pay attention to. AI is making many tasks boring for users. Users are less engaged as a result.

A researcher in Singapore notes that monitoring bots can be stressful: “When AI starts ‘acting for us’, humans do not necessarily get to rest. In many cases, we simply shift from ‘doing the work ourselves’ to ‘supervising AI doing the work’. And monitoring a system that may make mistakes or misunderstand instructions becomes a new psychological burden in itself.”

When I am coding now, I am mostly just an observer, not a creator anymore. I can see that even for the observer role, I might not be needed.

– Claude user, Anthropic survey

AI forces users to become quality control police. An Anthropic survey of users found “unreliability was the most common concern — 27% worry that AI won’t do what it’s supposed to.”

As one Claude user noted in the survey: “An assistant that sounds sure but is often wrong forces you to treat everything as suspect. Instead of freeing attention, it creates a permanent ‘fact-check tax.’”  

This vigilance is taxing and tiring. Siddhant Khare, an AI agent developer, notes: “For a perfectionist, this is torture. Because ‘almost right’ is worse than ‘completely wrong.’ Completely wrong, you throw away and start over. Almost right, you spend an hour tweaking. And tweaking AI output is uniquely frustrating because you’re fixing someone else’s design decisions – decisions that were made by a system that doesn’t share your taste, your context, or your standards.”

I ended up copying [AI] code and doing joyless trial-and-error fixes — it gave me real despair. A life you don’t direct is like watching a boring movie you can’t turn off. 

– Claude user, Anthropic survey

Unlike friends and colleagues, bots are unknown quantities. Khare observes: “The cruel irony is that AI-generated code requires more careful review than human-written code. When a colleague writes code, I know their patterns, their strengths, their blind spots. I can skim the parts I trust and focus on the parts I don’t. With AI, every line is suspect.”

Many users report that fixing AI errors is more tiring than doing the task themselves without AI.

AI use tends to intensify work. A comment on Cursor’s user forum notes: “The machines we now have can do stuff faster than we humans have evolved to handle. It’s going to be hard.”  Research studies back up that perception. 

In a study of AI users, Aruna Ranganathan and Xingqi Maggie Ye at the UC Berkeley Haas Business School concluded that “AI tools didn’t reduce work, they consistently intensified it.”  

Once the excitement of experimenting fades, workers can find that their workload has quietly grown and feel stretched from juggling everything that’s suddenly on their plate. That workload creep can in turn lead to cognitive fatigue, burnout, and weakened decision-making. The productivity surge enjoyed at the beginning can give way to lower quality work, turnover, and other problems.

– Ranganathan and Ye, Harvard Business Review

Ranganathan and Ye note several drivers of work intensification:

  • Task expansion: “Because AI can fill in gaps in knowledge, workers increasingly stepped into responsibilities that previously belonged to others.” 
  • More multitasking: “Many workers noted that they were doing more at once—and feeling more pressure—than before they used AI, even though the time savings from automation had ostensibly been meant to reduce such pressure.” 
  • Blurred boundaries between work and non-work: “Because AI made beginning a task so easy—it reduced the friction of facing a blank page or unknown starting point—workers slipped small amounts of work into moments that had previously been breaks.”

While the effects of work intensification are most acute among workers engaged in open-ended knowledge work, those who focus on closed-ended tasks also experience intensification.  These workers are most susceptible to replacement by AI because their tasks are more routine.  Their tasks have clear-cut goals and a clearly defined process.  The prompts they use will be simpler and may have even been developed by others.

Workers with closed-ended tasks can get more done in less time, which frees up time.  But a three-year study of these workers by ActivTrak also concluded that “AI does not reduce workloads.” The study concluded that even AI for routine work can be tiring because “the workday isn’t just shorter — it’s denser”:

  • Employees reported feeling “disengaged” – “a growing number are chronically under-challenged”
  • They use multiple AI tools and use standard software tools more intensively than before
  • Multitasking increased 
  • “Focused efficiency” (time doing work uninterrupted) dropped as AI adoption increased

AI can reset expectations for the time needed to do work. When the pace of work accelerates and becomes the new norm, organizational expectations readjust to the faster pace. Workers feel pressure to maintain the new, elevated pace.

Consuming AI outputs

Everyone is now an AI consumer, whether they actively choose to be or not.  AI is not just for knowledge workers in the enterprise. 

AI tools have become pervasive online. Up to half of all online content is now AI-generated. Online users can suddenly find themselves to be AI consumers.  People who use AI in their personal lives face distinctive sources of AI fatigue. 

Facebook is the canary in the coal mine. Screen cap: EY

The first challenge is that the readability and appeal of text online are declining. As AI tools generate more online content, the information is often more verbose. People have to read “conversational” content that’s meandering and short on concrete substance.

EY notes: “The digital landscape is showing noticeable signs of content fatigue. Readers are increasingly quick to spot AI-generated content at first sight by its predictable patterns and overly polished tone.” 

Empty AI-generated content is crowding out useful human-created content online. While some AI tools can produce acceptable content, many do not.  Consumers don’t have a say in how the content they encounter is generated. 

In addition to being fed more AI content, consumers find themselves taking on more “chores” because of AI.  These chores involve tasks that were previously done by outside professionals. AI is delegating work to end users. 

Consumers find themselves relying on AI tools to bypass expensive experts or because firms they do business with require them to use AI as part of self-service. 

“A.I. is now extending the chore economy into territory that once required years of training,” notes Oxford economist Carl Benedikt Frey.

The A.I. revolution involves a huge transfer of labor — not from worker to machine but from worker to consumer. The ability to do everything ourselves may be satisfying, but it can gradually overload us with busywork without our noticing. Tasks that we used to delegate will still be done. They will simply move out of the work force and into the household as new forms of invisible, unpaid labor.

— Carl Benedikt Frey

Frey describes what he calls “the A.I. trade-off: greater access but thinner expertise.” AI tools are widely available, but are far from a perfect substitute for professional help.

He notes how elective task creep can gradually consume more of the user’s time.  “No single act of self-service feels like a major burden. We notice the accountant’s fee we didn’t pay. We rarely notice the evening we spent doing her job. There is a name for this: opportunity cost neglect — the well-documented tendency to overlook the value of what we give up when the cost is time rather than money.”

AI tools enter homes by making the same promise they did in the enterprise: they will save users time.  But the reality is that AI tools can suck up more time. 

Sources of AI fatigue

The permutations of AI fatigue can be gleaned by looking at power users. 

A study published last year in Annals of Neurosciences noted that “long-term AI use was significantly associated with mental exhaustion, attention strain, and information overload (r = 0.905), and inversely associated with decision-making self-confidence (r = −0.360).” 

The study suggests that negative effects emerge through prolonged AI use. For short-term AI use, users realize benefits. As a result of this asymmetrical relationship, users often start with a positive experience interacting with AI, and as they come to rely on it, the negative effects become more dominant.

Source: Annals of Neurosciences

The study shows the correlation between mental exhaustion and more specific cognitive factors.  Mental exhaustion is an umbrella term that encompasses different qualities that can sound alike but have distinct properties.

We can think about the effects of AI in terms of what users need to notice, remember, and decide.  It’s often not apparent how AI makes stealth mental demands on the user.  AI doesn’t explicitly ask the user to notice, recall, or decide something.  But the user is forced to do these tasks to ensure AI outputs match their needs.

AI reorients the object of attention toward more fragmentary data. Whereas in the past, knowledge workers were synthesizers of information, now AI is responsible for framing the bigger picture (the meaning), leaving users to attend to the details. 

AI narrows the scope of what users focus on. Users must judge the correctness of fragments of information, often without the benefit of knowing the context from which these fragments were derived. The details in the outputs have been decontextualized.

If the user is unsure of the appropriateness of a detail, they must reconstruct where it came from. It might have been derived from a myriad of sources or initiatives. Reviewers of AI work lack the social context associated with human-developed work. The reviewer no longer knows who was responsible for the information or what their motivations were, and is unable to ask that person for clarification. The user works in isolation within an opaque system, where information sources and provenance are unclear.

Paul Leonardi at UC Santa Barbara identifies the problem of inference fatigue, “the mental work of constantly interpreting ambiguous digital communication.” He notes: “We get a snippet of data, and it’s not quite enough to tell us the whole picture. So, we have to fill in the blanks, and that takes effort.” AI outputs are often vague and general, with essential context missing.

AI reshapes the nature of attention, forcing users to focus on more issues of lesser consequence. AI can generate huge quantities of information about details that may be far removed from the user’s immediate knowledge. Because AI increases the scope of issues that users can address, it makes users custodians of more and more details. Users may have little intrinsic interest in these details, yet they find themselves responsible for overseeing an increasing number of machine-generated factual assertions. Users find they have been delegated ownership of results for issues they didn’t choose.

Knowledge work switches from realizing a state of flow – defining what matters and how to achieve it – to maintaining vigilance over a machine. The user’s job is not creation, but the coordination of AI outputs.

Vigilance – monitoring your work in real-time – is exhausting.  Many knowledge workers discovered how tiring being continually visible online can be after experiencing Zoom fatigue.  In the case of AI, users are monitoring AI outputs they feel responsible for checking before they move forward.  

Users must monitor AI outputs to check whether any included details shouldn’t be there and determine what’s missing that should be there.  They must be on guard for presumptive insertions of statements or decisions by AI that go beyond what was intended or even allowed.

The user is thrust into a reactive mode, having to keep track of multiple details that they were not responsible for developing.

AI reshapes the focus of decisions toward quality checks. Knowledge workers now focus on monitoring and evaluating outputs, changing the kinds of decisions they make.  

Bots now set the pace of work. Their outputs demand constant evaluation, which can lead to cognitive fatigue: thinking continually without rest. The user must understand what the stream of AI outputs means. But that thinking is generally not deep thinking about meaningful issues. It is shallower.

The emphasis shifts to fixing the outputs, which requires numerous small decisions (microdecisions) that can cumulatively lead to decision fatigue. There’s no opportunity to set aside the decision to sleep on it, because the output needs immediate fixing in order to move forward.

A neurological journal described the focus on fixing AI outputs as resulting in “attenuated user agency” – a fancy, formal way of saying the user doesn’t feel they are in charge of the process.

Patterns of AI burden shifting

AI is driving us to do more work. 

Granted, AI can be a helpful tool, but it is a flawed helper.  AI-related stress can be traced to the need to compensate for its deficiencies.

Let’s return to the most obvious case of when AI isn’t helpful: workslop. 

Workslop uniquely uses machines to offload cognitive work to another human being. When coworkers receive workslop, they are often required to take on the burden of decoding the content, inferring missed or false context.

– Kate Niederhoffer et al, Harvard Business Review

Workslop is an example of shifting the burden from one user to another. 

All AI use involves shifting burdens to individuals to some degree.  We may not be cognizant that burdens have been dumped on us, only that the task seems more tiring than it should be.

We can see three patterns for how AI shifts the burden to individuals to decipher and compensate for bot outputs:

  • From one person who uses AI to another human receiving their output
  • From an AI bot to an AI user
  • From an organization to its customer via an AI self-service bot

In each of these cases, the party that’s expecting the output to be read evades responsibility for ensuring that the output is usable and useful. 

With workslop, the initiator ducks responsibility for the AI’s limitations and passes the problem to the consumer. 

Bots evade responsibility by tasking humans for feedback. Bots aren’t independent; they are (metaphorically) codependent on the human user.  

AI consumers are still far from experiencing reliable outputs from a single declarative prompt.  They must invest in iterative prompts or develop elaborate instructions. 

Organizations sidestep responsibility by forcing customers to use bots to do things themselves.

These burdens deserve a cost accounting.

Measuring the mental tokens of AI use

AI is turning knowledge tasks into factory operations. Rather than prioritizing the quality of knowledge generated (the best ideas), AI’s focus is on the efficiency of its generation (how cheaply decisions can be rendered). 

With Generative AI, outputs trump outcomes. That bias will dominate until AI can learn on its own without needing humans to fix and improve what it does. 

It’s hard for humans to measure the quality of AI outputs at scale. So, people tend to focus on what they can easily see and measure: the quantity of those outputs. 

As AI gets more complex, organizations are turning attention to AI costs.  The nascent field of tokenomics seeks to measure the processing effort required to produce AI outputs.  AI token usage can be a crude proxy for the effort humans must expend to produce the right outputs, since token usage reflects how AI instructions are processed.

Token processing consumes lots of energy. Data centers need additional power plants to support all the work being generated.

But it’s equally important to monitor the human energy that goes into AI outputs.  We need to recognize and weigh the tokens of human mental energy that AI devours.  

If human knowledge, motivation, and talent are wasted on checking and fixing AI outputs, that energy isn’t available to develop new knowledge. 

AI engineering fails to recognize an essential truth: Knowledge work is not a time management problem. Saving time does not lead to better outcomes. 

As long as humans must stay in the loop, they need to conserve their energy so they can create original knowledge.  Agency – being in charge of AI processes rather than being defined by them –  starts with awareness of how one expends energy.

– Michael Andrews


Audio AI is hacking your brain

In previous posts, I have explored the behavioral dynamics of text-centric chatbots. Yet AI is also changing the way people relate to audio. The impact of audio AI on user experience is, in many ways, even more profound than textual AI.

Listening to audio is more passive than either reading text or watching videos. It can be less obvious how audio content influences user behavior, because clicks and eye movements aren’t involved. 

AI audio generation uses LLMs in combination with text-to-speech and speech-to-speech technologies. It’s having a far-reaching impact on user experiences, influencing attention, emotions, and decision-making. 

The rising consumption of personalized audio content

People are binging on digital audio content.  Smartphone and computer users have become “pod people”, constantly listening to audio while plugged into headphones, lost in their own world. 

Audio can distract people and detach them from their environment. We encounter individuals helmeted with headphones, walking in front of moving vehicles, or unaware that the barista is asking them a question. They’ve become withdrawn, absorbed in some content whose topic or purpose we can only speculate about.

The Washington Post recently published an article on how listening to digital audio is becoming addictive.

Screen cap: Washington Post

Much has been made of the addictive nature of screens and of screen time as a metric of technology overuse. But audio use, too, captures many people’s waking hours, yet it hasn’t been studied nearly as much. – Washington Post

People are listening more to audio and thinking less in the process. 

The Post described what happens “if you listen to so much on-demand audio that you don’t want to stop.”

One interviewee observed that they no longer have thinking time: “That mental processing time — I don’t really have that now, because I have always got a podcast on.”

Audio has become more personalized and inward-looking. Public audio, such as listening to the radio, can be pro-social: it can be heard by others and be a shared experience. Multiple people hear and can discuss the same content.  Digital audio, by contrast, is consumed privately. People withdraw from the world by listening to their personal audio stream. The feed is more user-specific and reflective of individual habits.  It’s anti-social: saying something to an individual wearing buds interrupts and annoys them.  

Another person interviewed said that listening to audio “makes it a little less tolerable, I think, to be around actual people. The on-demand, no-expectations social engagement with these parasocial personalities — I think it’s just a little addicting.”

And the addition of AI to digital audio is accelerating this trend.

Generative speech synthesis: turn content into audio

Text-to-speech (TTS) synthesis has been around for decades, but in recent years it has improved in quality, with more lifelike voices and prosody. Its adoption has been turbocharged by combining it with LLMs.

Rather than simply reading text aloud, AI adapts written texts for audio by making the content more conversational in format and tone.

The paradigm shift occurred in 2024 when Google introduced a tool called Deep Dive that allowed anyone to convert a web article into podcast-like audio. Now incorporated as the “Audio Overview” in NotebookLM, the tool provides a chatty summary featuring two voices that seem to discuss the contents of an article.

Other vendors have followed Google’s lead.

Huxe, built by developers of Google’s NotebookLM, embodied the boldest vision of how AI audio will change user behavior. The app suggested that a lack of time or expertise would no longer be a barrier to consuming content.

Screen cap: Huxe

Huxe promised to revolutionize how users consumed content:

  • “We gave millions of people the ability to turn their documents into podcasts— to finally ‘read’ that 200-page report while walking their dog. Teachers, lawyers, students all said the same thing: ‘I’m finally keeping up.’”
  • “This is radio made for you.”
  • “Turn any curiosity into a personal podcast. Whether it’s ‘Why is everyone talking about this new social app?’ or ‘Tell me the history of that building I walk past every day,’ you get a personalized, clear, audio explanation.”

Spoiler: Huxe shut down abruptly as I was writing this post, less than a year after launch. The field of audio AI apps has already become crowded.

AI audio’s possibilities are vast. But before getting too excited by its potential, we should understand how it’s being adopted in practice. Unfortunately, audio AI is being used indiscriminately. The combination of ease of creation and anticipated demand has prompted a slurry of AI-generated audio.

Bloomberg discusses the growing phenomenon of “podslop”.  It describes “the modern era of podcasting in which thousands of new shows are released into the world every day, with a sizable portion likely being AI-generated.”

Screen cap: Bloomberg

Interactive audio: Speech-to-speech synthesis

Interactive audio is made possible by speech-to-speech (STS) technology. Instead of simply reading existing text to the user, STS lets the user ask questions aloud and get audio replies. STS makes audio conversational.

Conversing with early speech-to-speech bots, like Siri or Alexa, felt stilted.  The conversation wasn’t open-ended, as with a human conversation.  Earlier implementations required combining speech-to-text, dialog management, and text-to-speech, which introduced latency and limited the scope of what could be addressed. Generative AI makes the orchestration between user and bot more seamless. 

Recent speech-to-speech applications are changing how people interact with voice-based solutions.

Huxe in particular promised greater interactivity than offered by traditional streams:

  • Jump in anytime — ask, react, or go deeper as you listen.
  • The audio “listens back” – Interrupt and say “wait, explain that differently” or “give me more technical detail” or “actually, what about this other thing?”

Huxe’s interactivity seemed appealing, putting the user in control. But to be successful, it needed to explain the content domain effectively and let the user control how they receive information.

Delivering a truly seamless experience is challenging in practice. Voicebots and conversational design have always been limited by predefined scripts.  Even generative AI relies on scripts: bots need to recognize the user’s intent accurately and rely on prototypical conversational patterns to do so. These can simulate a human conversation, but don’t supply the same range of freedom. 

A more realistic guide to how STS will emerge is how large enterprises implement it to support self-service.

Microsoft says its Azure AI Voice Live API is “ideal for scenarios where voice-driven interactions improve user experience,” such as:

  • Customer support, product catalog navigation, and self-service solutions
  • Voice-enabled learning companions and virtual tutors for interactive training
  • Administrative queries and public service information
  • Voice-enabled tools for employee support, career development, and training

Microsoft’s vision for the next-gen STS doesn’t seem all that different from the previous generation that lacked AI.  Adding AI doesn’t redefine the customer’s relationship with the organization.  It’s still about saving the organization money, and avoiding having customers talk to a human. 

STS voicebots are likely to employ the same operating framework that enterprises use for other forms of customer experience: the funnel. The goal will be to get as many people as possible to a specific conversion event. Customers may be told they can have a conversation with the bot, but the direction of the bot’s conversation will be predetermined. 

AI and attention

AI summaries let people read less.  AI audio makes reading optional. And attention spans are short-circuiting. 

Because we can listen at times we can’t read, listening is taking up a larger share of our time.  Many people are binging on audio. This trend pre-dates AI, but AI is amplifying it manyfold.

AI exists to accelerate the pace of activity, to squeeze more activity into available time. It’s affecting how users pay attention.  

Audio AI removes user decisions about content consumption. The stream feed decides what merits your attention. A reinforcing feedback loop tends to narrow choices.  Agentic skills may force choices on users they might not otherwise make.

Audio AI lulls users into not paying close attention.  Listening typically requires less effort than reading and can result in lower retention. Unlike audio, reading allows for previewing and reviewing text, broadening reflection on the content. Choosing the least-effort mode can result in less cognitive engagement.

But for important topics, audio AI can force users to maintain attention to an exhausting degree.  If the content is critical to the user, audio is often the wrong medium. While “sit back and listen” sounds relaxing, it can be tiring over long stretches when it involves important material. 

Audio doesn’t pause when the user encounters distractions. There’s no chance to re-read what was missed.  AI audio may lack the informational redundancy associated with natural human conversations.

Audio AI crowds out free attention, leaving no time for reflection. The audio paces the listener. The audio’s autoplay makes it harder to pause to think about what’s being said. It’s unlike a classroom discussion where points are elaborated or discussed from various perspectives.

Audio has become a new intrusion.  Previously, users might monitor their screen time.  Now, plugged into pods and earbuds (and even smart glasses), they get real-time audio notifications.

AI audio is a voice whispering inside your head.

Sonification of emotions 

Voicebots now have personalities. Their designers work hard to make sure they don’t sound robotic.

Screen cap: Speechify

Voice has been a longstanding alternative mode of interaction with computers, one often dreaded by users. It’s tempting to view the incorporation of AI into audio as yet another stage in the evolution of accessibility. Such a view would overlook the emotional dimension of audio.

TTS is no longer a utilitarian technology. Early in my career, I volunteered at the Royal National Institute for the Blind and got my first exposure to how visually impaired people use screen readers.  I observed the utilitarian character of the computer-synthesized voice.  Users would zip through synthesized utterances at a blazing speed, attentively waiting to hear what they were seeking.  To my ears, the utterances were too quick and high-pitched to understand easily.  But those who relied on screen readers did not use voice at a conversational pace.

When TTS screen readers served as assistive technology, their focus was on offering parity between interaction modes (visual scanning and audio scanning).  Plain voices read functional text: menus, options, headings, text bodies. Accents existed to foster intelligibility, not likeability.

The screen reader experience has morphed into making voicebots companions.  The design goal now is for computers to have a conversation with you, and in so doing, persuade you.

I elevate your text into impactful speech with deep meaning. “People will forget your words, but they will always remember, how those forgotten words made them feel.” – AIPRM Voicebot

Voicebots are now as focused on conveying a feeling as they are on conveying information.

Few people like to feel they can be emotionally manipulated.  Yet synthetic voice offers invisible pathways for brands to manipulate customers.  Even if you dismiss the idea that AI voices can change how you feel, there are countless vendors who are betting otherwise.

How do synthetic voices in your head affect you?

The first dimension is relatability. The voice and tone of the content are more visceral in audio than in written text. Users often remark that voicebots have overly enthusiastic voices. These bots aim to foster trust. Synthetic actors can cultivate a false sense of intimacy. The avatar can spoof the personality of the content’s real author (if there is one).

The second dimension is behavior. Designers are just starting to explore how user behavior may be different when interacting with audio rather than a screen. Audio offers new opportunities for brands to nudge users. It becomes easier to prompt users to say or repeat a phrase to prime them toward taking a desired action. 

Voicebots are seeking to optimize user trust. How do voicebots earn trust?

Voicebots prioritize enthusiasm over dispassion and objectivity. Voicebots are known for enthusiasm that borders on cultishness. But compared with other user situations, user defenses are lower when listening to bots. 

In real-life encounters, excessive enthusiasm accompanied by exaggerated body language may make users wary. The same is true with stilted TV commercials that try too hard to compress emotions into 30 seconds.  

Voicebots rely only on voice, and not visuals or gestures, to convey messages. Voice alone offers fewer telltale signs that a message might be inauthentic. And the message won’t necessarily arrive in a short burst. Voice content can run for a long session and be repeated, so users acclimate to the enthusiasm and accept it as normal. 

Voicebots optimize their likability and resonance. Chatbots have become surrogate friends for many users.  The more a voice sounds and talks like your alter ego, the more likely you are to trust what it says.

It doesn’t take much imagination to foresee the development of voicebots that are tailored to a user’s personality. How the bot sounds will reflect what a user likes. Yet the danger is that the user will trust the synthetic personality rather than assign trust based on the voicebot’s content.

Voicebots encourage users to surrender control by leading them. Voicebots position themselves as advisors and guides. The Microsoft Voice API, for example, promotes its ability to allow developers to build “learning companions.”

The promise of a coach is appealing. Language learning apps are an early example of how voicebot coaching can be both helpful and overbearing. Voicebots can provide directed feedback to users, correcting their mistakes. They can also become menacing nags, pestering users with passive-aggressive reminders when they fail to keep up with the bot’s schedule. 

Unlike language learning, most voicebot content won’t involve drilling questions with right-or-wrong answers.  Real-life topics will involve nuances where being told what to do might lead the user to the wrong choice. Audio AI lacks the contextual affordances of text, which allow users to explore background, dig into concepts not understood, compare different perspectives, and see alternative pathways. 

Over-listening to the cult of productivity

Voicebots want to become advisors and emotional companions, ready to tell us what we need to hear, when we need it. Unlike text chatbots, voicebots can be used hands-free and without looking at a screen.  

Voicebots seem like the answer to coping with busy lives. You can multitask, learning a new skill or doing an online chore while exercising, driving, walking your dog, or watching your kids at the playground.

Without question, many people have busy lives. But voicebots can perversely make people even busier, encouraging them to squeeze yet more into their daily routines. 

As the Washington Post article noted, overstimulation, whether from screens or audio, takes a toll on our judgment and mental health. People need mental quiet: “When we’re not involved in some kind of cognitive task, the part of the brain called the default mode network takes over, [which] helps us regulate our emotions, helps us make sense. It’s how we form internal narratives about ourselves.”  

Audio AI can supply ready-made narratives, but users need to form their own. 

– Michael Andrews