One Agent Too Many: User Perspectives on Approaches to Multi-agent Conversational AI

Christopher Clarke, Karthik Krishnamurthy, Walter Talamonti, Yiping Kang, Lingjia Tang, Jason Mars
2024-01-14
Abstract: Conversational agents have been gaining increasing popularity in recent years. Influenced by the widespread adoption of task-oriented agents such as Apple Siri and Amazon Alexa, these agents are being deployed into various applications to enhance user experience. Although these agents promote "ask me anything" functionality, they are typically built to focus on a single or finite set of expertise. Given that complex tasks often require more than one expertise, this results in the users needing to learn and adopt multiple agents. One approach to alleviate this is to abstract the orchestration of agents in the background. However, this removes the option of choice and flexibility, potentially harming the ability to complete tasks. In this paper, we explore these different interaction experiences (one agent for all) vs (user choice of agents) for conversational AI. We design prototypes for each, systematically evaluating their ability to facilitate task completion. Through a series of conducted user studies, we show that users have a significant preference for abstracting agent orchestration in both system usability and system performance. Additionally, we demonstrate that this mode of interaction is able to provide quality responses that are rated within 1% of human-selected answers.
Human-Computer Interaction, Computation and Language
What problem does this paper attempt to address?
This paper addresses how to optimize the user interaction experience in multi-agent conversational AI systems. Specifically, the authors compare two interaction modes: one in which a single interface handles every request and orchestrates the underlying agents in the background (one agent for all), and one in which the user chooses which agent handles each request (user choice of agents). As conversational agents such as Apple Siri, Google Assistant, and Amazon Alexa are deployed across more applications, users often have to learn and switch between several agents to complete a complex task, which adds cognitive burden and decision overload. The study therefore designs and evaluates two prototype systems, "One For All" and "Agent Select", to determine which mode better supports user interaction and gives the best overall performance.

To achieve this goal, the authors carried out the following work:

1. **Design and Implementation**: Developed two conversational agent prototypes, One For All and Agent Select. One For All sends the user's query to multiple agents and returns the response with the highest semantic relevance, while Agent Select lets the user pick the specific agent that handles the query.
2. **User Experience Research**: Recruited 19 participants for a series of user studies in which they used both prototypes across different task scenarios, and collected their feedback on system usability and performance.
3. **Data Analysis**: Analyzed the study data, including System Usability Scale (SUS) scores, task-completion accuracy, and user satisfaction, to evaluate the strengths and weaknesses of the two interaction modes.
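The two interaction modes described above can be illustrated with a minimal sketch. It is not the authors' implementation: the agent callables, the `DEMO_AGENTS` registry, and the function names are hypothetical, and a simple bag-of-words cosine similarity stands in for whatever semantic relevance measure the paper actually uses.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, Tuple

# Hypothetical type: an agent is any callable that maps a query to a text reply.
Agent = Callable[[str], str]


def _bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query: str, response: str) -> float:
    """Cosine similarity of token counts (assumed stand-in for semantic relevance)."""
    q, r = _bag_of_words(query), _bag_of_words(response)
    dot = sum(q[token] * r[token] for token in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in r.values())
    )
    return dot / norm if norm else 0.0


def one_for_all(query: str, agents: Dict[str, Agent]) -> Tuple[str, str]:
    """Send the query to every agent and return (agent name, most relevant reply)."""
    responses = {name: agent(query) for name, agent in agents.items()}
    best = max(responses, key=lambda name: relevance(query, responses[name]))
    return best, responses[best]


def agent_select(query: str, agents: Dict[str, Agent], chosen: str) -> str:
    """Send the query only to the agent the user picked."""
    return agents[chosen](query)


# Toy agents with canned replies, for illustration only.
DEMO_AGENTS: Dict[str, Agent] = {
    "weather": lambda q: "The weather in Paris is sunny today",
    "music": lambda q: "Playing your jazz playlist",
}
```

In this sketch, One For All pays the cost of querying every agent but hides the choice from the user, whereas Agent Select makes a single call and leaves the routing decision to the user.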
The results show that users significantly prefer the single-agent interface (One For All), which scores higher on both system usability and system performance, and that its responses are rated within 1% of human-selected answers. The study also identifies several key challenges in designing and deploying multi-agent interaction systems and proposes improvements. In conclusion, the paper's main contribution is empirical evidence that, in multi-agent conversational systems, a single interface that abstracts agent orchestration outperforms user-driven agent selection, along with insights and recommendations for future research and practical applications.
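For readers unfamiliar with the System Usability Scale mentioned above, the following sketch shows the standard SUS scoring rule for a single questionnaire. The function name `sus_score` is hypothetical, and the summary does not say whether the authors used exactly this procedure; the sketch assumes the usual 10-item questionnaire with ratings from 1 to 5.

```python
from typing import Sequence


def sus_score(ratings: Sequence[int]) -> float:
    """Compute a 0-100 SUS score from ten 1-5 ratings, given in item order."""
    if len(ratings) != 10:
        raise ValueError("SUS requires exactly 10 item ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")

    total = 0
    for item, rating in enumerate(ratings, start=1):
        if item % 2 == 1:
            # Odd items are positively worded: contribution is rating - 1.
            total += rating - 1
        else:
            # Even items are negatively worded: contribution is 5 - rating.
            total += 5 - rating
    # Scale the 0-40 raw total to the 0-100 range.
    return total * 2.5
```

A per-system score is then typically obtained by averaging `sus_score` over all participants who rated that system.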