Human and LLM-Based Voice Assistant Interaction: An Analytical Framework for User Verbal and Nonverbal Behaviors

Szeyi Chan, Shihan Fu, Jiachen Li, Bingsheng Yao, Smit Desai, Mirjana Prpa, Dakuo Wang
2024-09-03
Abstract: Recent progress in large language model (LLM) technology has significantly enhanced the interaction experience between humans and voice assistants (VAs). This project aims to explore a user's continuous interaction with LLM-based VA (LLM-VA) during a complex task. We recruited 12 participants to interact with an LLM-VA during a cooking task, selected for its complexity and the requirement for continuous interaction. We observed that users show both verbal and nonverbal behaviors, though they know that the LLM-VA cannot capture those nonverbal signals. Despite the prevalence of nonverbal behavior in human-human communication, there is no established analytical methodology or framework for exploring it in human-VA interactions. After analyzing 3 hours and 39 minutes of video recordings, we developed an analytical framework with three dimensions: 1) behavior characteristics, including both verbal and nonverbal behaviors; 2) interaction stages (exploration, conflict, and integration) that illustrate the progression of user interactions; and 3) stage transitions throughout the task. This analytical framework identifies key verbal and nonverbal behaviors that provide a foundation for future research and practical applications in optimizing human and LLM-VA interactions.
Human-Computer Interaction
### What problem does this paper attempt to solve?

This paper explores how users behave when they interact with voice assistants based on large language models (LLM-VAs) during a complex task, and it proposes an analytical framework for studying users' verbal and nonverbal behaviors. Specifically, the authors analyze the verbal and nonverbal behaviors of 12 participants who completed a cooking task with an LLM-VA, and they develop a framework with three dimensions:

1. **Behavioral Characteristics**: verbal behaviors (such as spoken requests and inquiries) and nonverbal behaviors (such as eye contact, gestures, and changes in tone).
2. **Interaction Stages**: the Exploration, Conflict, and Integration stages, which describe how user interactions progress.
3. **Stage Transitions**: the dynamic changes between stages, such as moving from the Conflict stage to the Integration stage or regressing from the Integration stage back to the Conflict stage.

The framework is intended as a foundation for future research on, and practical improvements to, human-LLM-VA interaction. The study also highlights the role of nonverbal behavior in human-computer interaction: participants displayed it even though they knew the LLM-VA could not capture those signals, and no established framework existed for analyzing such behavior in human-VA interaction. The framework itself was derived from an analysis of 3 hours and 39 minutes of video recordings. Ultimately, the paper aims to inform LLM-VA design by identifying these behaviors and stage dynamics, so that assistants can serve users more naturally and efficiently.
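The stage-transition dimension can be made concrete with a small sketch. It assumes that each video segment has already been hand-coded with a participant ID, a start time, a modality, a behavior label, and one of the three stages; the record fields, function names, and example segments below are hypothetical illustrations, not the authors' coding scheme or data. Counting transitions then reduces to walking each participant's segments in time order and tallying every change of stage, with an Integration-to-Conflict change counted as a regression.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    # The three interaction stages named in the framework.
    EXPLORATION = "exploration"
    CONFLICT = "conflict"
    INTEGRATION = "integration"


@dataclass(frozen=True)
class CodedSegment:
    """One hypothetical annotated video segment (all fields are illustrative)."""
    participant: str
    start_s: float   # segment start time, in seconds
    modality: str    # "verbal" or "nonverbal"
    behavior: str    # free-text behavior code
    stage: Stage


def stage_transitions(segments):
    """Count stage-to-stage transitions per participant, in time order.

    Consecutive segments in the same stage are collapsed, so only actual
    changes of stage are counted.
    """
    by_participant = {}
    for seg in segments:
        by_participant.setdefault(seg.participant, []).append(seg)

    counts = Counter()
    for segs in by_participant.values():
        previous = None
        for seg in sorted(segs, key=lambda s: s.start_s):
            if previous is not None and seg.stage != previous:
                counts[(previous, seg.stage)] += 1
            previous = seg.stage
    return counts


def regressions(counts):
    """Number of transitions from Integration back to Conflict."""
    return counts[(Stage.INTEGRATION, Stage.CONFLICT)]


# Invented example segments for one participant (not data from the study).
segments = [
    CodedSegment("P1", 0.0, "verbal", "asks a question", Stage.EXPLORATION),
    CodedSegment("P1", 42.0, "nonverbal", "looks toward device", Stage.CONFLICT),
    CodedSegment("P1", 55.0, "verbal", "rephrases request", Stage.CONFLICT),
    CodedSegment("P1", 80.0, "verbal", "follows instruction", Stage.INTEGRATION),
    CodedSegment("P1", 130.0, "nonverbal", "gestures", Stage.CONFLICT),
]
counts = stage_transitions(segments)
print(sum(counts.values()), regressions(counts))  # prints "3 1"
```

Collapsing runs of the same stage is a deliberate choice in this sketch: it counts changes of stage rather than coded segments, so a long Conflict episode containing many coded behaviors still contributes only one transition.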