Abstract: Conventional Voice Assistants (VAs) rely on traditional language models to discern user intent and respond to their queries, leading to interactions that often lack a broader contextual understanding, an area in which Large Language Models (LLMs) excel. However, current LLMs are largely designed for text-based interactions, thus making it unclear how user interactions will evolve if their modality is changed to voice. In this work, we investigate whether LLMs can enrich VA interactions via an exploratory study with participants (N=20) using a ChatGPT-powered VA for three scenarios (medical self-diagnosis, creative planning, and discussion) with varied constraints, stakes, and objectivity. We observe that LLM-powered VA elicits richer interaction patterns that vary across tasks, showing its versatility. Notably, LLMs absorb the majority of VA intent recognition failures. We additionally discuss the potential of harnessing LLMs for more resilient and fluid user-VA interactions and provide design guidelines for tailoring LLMs for voice assistance.

What problem does this paper attempt to address?

Current voice assistants (VAs) rely on traditional language models, which are weak at understanding user intent and at maintaining coherent multi-turn conversations. Large language models (LLMs) perform well at text generation and context understanding, but they are designed mainly for text-based interaction. It is therefore unclear how interaction between users and LLM-driven voice assistants will evolve when the modality changes from text to voice. The paper explores two questions:

1. **New interaction modes**: When users interact with LLM-driven voice assistants, do new interaction modes emerge that differ from single-turn inquiries?
2. **Reducing errors and conversation interruptions**: Can the context-understanding ability of LLMs help reduce the errors and conversation interruptions common in current commercial voice assistants?

To answer these questions, the researchers conducted an exploratory study. Participants (N = 20) used a ChatGPT-driven voice assistant to complete tasks in three scenarios (medical self-diagnosis, creative planning, and discussion), and the researchers observed their interaction patterns and any conversation interruptions.

### Research background

Traditional voice assistants such as Alexa and Siri rely on traditional language models and mainly use rule-based keyword recognition to determine user intent. This makes it difficult for them to maintain coherent multi-turn conversations and leaves them vulnerable to errors such as transcription errors and intent recognition errors.

In contrast, LLMs can generate coherent, context-aware text and show great potential in text-centered applications such as healthcare, education, and collaborative writing. However, empirical research on interaction between users and LLM-driven voice assistants is still limited.

### Research method

The researchers first integrated ChatGPT into an Alexa skill and designed a conversation framework to handle ChatGPT API latency and Alexa timeout issues. They then conducted an exploratory qualitative study in which 20 participants interacted with this ChatGPT-driven voice assistant. The tasks were medical self-diagnosis, creative travel planning, and a discussion with an opinionated AI. Through thematic analysis, the researchers identified both common and scenario-specific interaction patterns.

### Main contributions

1. **Interaction patterns**: The paper demonstrates the diverse ways people interact with LLM-driven voice assistants across scenarios and presents conversation recovery patterns initiated by both the voice assistant and the user.
2. **Opportunities and challenges**: It discusses the advantages of LLM-driven voice assistants (such as context retention, adaptability, and fewer conversation interruptions) and their limitations (such as repetitiveness, over-sharing, and mismatched mental models).
3. **Design guidelines**: It provides guidelines for adapting text-centered LLMs to voice interaction, such as adopting a hierarchical response structure, redesigning voice assistant prompts, and balancing advantages against challenges.

Through this study, the authors hope to provide valuable insights for understanding and improving future LLM-driven voice assistants.
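
The research method mentions a conversation framework that copes with ChatGPT API latency and Alexa timeouts, but this summary does not say how it works. The following is only a minimal sketch of one possible approach, not the authors' implementation. If the model has not answered within a deadline, the assistant speaks a short filler prompt and picks up the pending answer on the next turn. Every name here (`DeadlineResponder`, `fake_llm`, `deadline_s`, the filler text) is hypothetical, and `fake_llm` stands in for a real model call; no Alexa or OpenAI API is used.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fake_llm(user_text):
    """Hypothetical stand-in for a slow LLM API call."""
    time.sleep(0.3)  # simulated network and generation latency
    return "echo: " + user_text


class DeadlineResponder:
    """Reply within a deadline, or speak a filler and finish on a later turn."""

    def __init__(self, llm_call, deadline_s,
                 filler="I'm still thinking. Should I continue?"):
        self.filler = filler
        self._llm_call = llm_call
        self._deadline_s = deadline_s
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None  # future for a request that has not finished yet

    def handle_turn(self, user_text):
        # Start a new request only if no earlier one is still outstanding.
        # When resuming, the new user_text (e.g. "yes") is deliberately ignored.
        if self._pending is None:
            self._pending = self._pool.submit(self._llm_call, user_text)

        # Block for at most deadline_s seconds.
        done, _not_done = wait([self._pending], timeout=self._deadline_s)
        if not done:
            # Deadline passed: keep the request running, keep the session open.
            return self.filler

        # Request finished: clear the slot, then return (or re-raise) its result.
        future, self._pending = self._pending, None
        return future.result()

    def close(self):
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    responder = DeadlineResponder(fake_llm, deadline_s=0.05)
    print(responder.handle_turn("plan a weekend trip"))  # too slow: filler
    time.sleep(0.5)                                      # user takes a moment to reply
    print(responder.handle_turn("yes"))                  # pending answer is ready now
    responder.close()
```

The design choice is to trade one extra conversational turn for a session that stays alive. A real skill would also need to cap how many filler turns it allows and decide what to do when the user says something other than a confirmation.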

User Interaction Patterns and Breakdowns in Conversing with LLM-Powered Voice Assistants

Human and LLM-Based Voice Assistant Interaction: An Analytical Framework for User Verbal and Nonverbal Behaviors

Understanding User Experience in Large Language Model Interactions

Situated Understanding of Errors in Older Adults' Interactions with Voice Assistants: A Month-Long, In-Home Study

Task Supportive and Personalized Human-Large Language Model Interaction: A User Study

Exploring Interaction Patterns for Debugging: Enhancing Conversational Capabilities of AI-assistants

Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses

Enabling Conversational Interaction with Mobile UI using Large Language Models

Enhancing user experience and trust in advanced LLM-based conversational agents

Intelligent Virtual Assistants with LLM-based Process Automation

Understanding Large-Language Model (LLM)-powered Human-Robot Interaction

Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs

Towards Proactive Interactions for In-Vehicle Conversational Assistants Utilizing Large Language Models

Enhancing Pipeline-Based Conversational Agents with Large Language Models

"Mango Mango, How to Let The Lettuce Dry Without A Spinner?": Exploring User Perceptions of Using An LLM-Based Conversational Assistant Toward Cooking Partner

Rethinking Conversational Agents in the Era of LLMs: Proactivity, Non-collaborativity, and Beyond

A General-Purpose Device for Interaction with LLMs

Can Large Language Models Be Good Companions? An LLM-Based Eyewear System with Conversational Common Ground

Ethical Challenges in the Development of Virtual Assistants Powered by Large Language Models

Model-Enhanced LLM-Driven VUI Testing of VPA Apps