Abstract: This paper presents the design and evaluation of a novel multi-level LLM interface for supermarket robots to assist customers. The proposed interface allows customers to convey their needs through both generic and specific queries. While state-of-the-art systems like OpenAI's GPTs are highly adaptable and easy to build and deploy, they still face challenges such as increased response times and limitations in strategic control of the underlying model for tailored use-case and cost optimization. Driven by the goal of developing faster and more efficient conversational agents, this paper advocates for using multiple smaller, specialized LLMs fine-tuned to handle different user queries based on their specificity and user intent. We compare this approach to a specialized GPT model powered by GPT-4 Turbo, using the Artificial Social Agent Questionnaire (ASAQ) and qualitative participant feedback in a counterbalanced within-subjects experiment. Our findings show that our multi-LLM chatbot architecture outperformed the benchmarked GPT model across all 13 measured criteria, with statistically significant improvements in four key areas: performance, user satisfaction, user-agent partnership, and self-image enhancement. The paper also presents a method for supermarket robot navigation by mapping the final chatbot response to correct shelf numbers, enabling the robot to sequentially navigate towards the respective products, after which lower-level robot perception, control, and planning can be used for automated object retrieval. We hope this work encourages more efforts into using multiple, specialized smaller models instead of relying on a single powerful, but more expensive and slower model.
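The idea of routing queries to specialized models by specificity can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how queries are classified, so a keyword check stands in for the classifier, and the two handler functions are hypothetical stubs in place of the fine-tuned LLMs.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the fine-tuned specialized models; the actual
# models and prompts are not described in the abstract.
def handle_specific(query: str) -> str:
    return f"[specific-model] looking up: {query}"

def handle_generic(query: str) -> str:
    return f"[generic-model] suggesting items for: {query}"

# Assumed cue words that mark a high-level (generic) request.
GENERIC_CUES = ("dinner", "party", "recipe", "ideas", "recommend")

def classify(query: str) -> str:
    """Toy specificity classifier: a keyword match, not an LLM."""
    text = query.lower()
    return "generic" if any(cue in text for cue in GENERIC_CUES) else "specific"

HANDLERS: Dict[str, Callable[[str], str]] = {
    "generic": handle_generic,
    "specific": handle_specific,
}

def route(query: str) -> str:
    """Dispatch the query to the handler for its predicted level."""
    return HANDLERS[classify(query)](query)
```

In a system of the kind the abstract describes, each handler would presumably wrap a smaller fine-tuned model, so only the model suited to the query runs for a given request.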

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper aims to design and evaluate a new multi-tiered large language model (LLM) interface to enhance the customer interaction capabilities of supermarket robots. Specifically, the paper attempts to address the following issues:

1. **Response Time and Performance Optimization**: Existing advanced systems like OpenAI's GPT, while easy to build and deploy, have limitations in response time, cost optimization, and strategic control for specific use cases. The paper proposes using multiple smaller, specialized LLMs to handle different types of user queries to improve response speed and efficiency.
2. **Diversity and Complexity of User Intentions**: Supermarket robots need to handle a variety of user needs, from simple product inquiries to complex high-level intentions (such as recommending dinner menus or items needed for a party). Existing single powerful models may experience hallucinations and errors when dealing with such diversity, affecting user trust.
3. **User Experience and Satisfaction**: The paper evaluates the performance of the multi-tiered LLM architecture versus a customized model based on GPT-4 Turbo in terms of user satisfaction, user-agent partnership, and self-image enhancement to verify the effectiveness of the new approach.
4. **Navigation and Automated Task Execution**: The paper also proposes a method to map the final chatbot responses to the correct shelf numbers, enabling the robot to sequentially navigate to the corresponding product locations and use low-level robot perception, control, and planning for automated object retrieval.

### Main Contributions

- **Multi-tiered LLM Architecture**: Designed a multi-tiered LLM architecture that improves system response speed and efficiency by using multiple specialized LLMs to handle different types of user queries.
- **Experimental Evaluation**: Conducted comparative experiments using the Artificial Social Agent Questionnaire (ASAQ) and qualitative feedback to verify the superiority of the multi-tiered LLM architecture across multiple evaluation metrics.
- **Navigation and Automation**: Proposed a method that combines chatbot responses with robot navigation to achieve automated task execution.

### Research Question

The main research question of the paper is: How does our novel multi-tiered LLM conversational agent perform in the Artificial Social Agent Questionnaire (ASAQ) and qualitative evaluations compared to a customized advanced GPT model?
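The navigation step mentioned in the summary, mapping a final chatbot response to shelf numbers that the robot then visits in sequence, might look roughly like the sketch below. The product-to-shelf table, the substring matching, and the ascending visiting order are all assumptions made for illustration; the summary does not specify how the paper performs the mapping or orders the stops.

```python
from typing import Dict, List

# Hypothetical product-to-shelf table; real shelf numbers would come from
# the store layout.
SHELF_OF: Dict[str, int] = {
    "pasta": 4,
    "tomato sauce": 4,
    "parmesan": 9,
    "basil": 2,
}

def shelves_for_response(response: str,
                         shelf_of: Dict[str, int] = SHELF_OF) -> List[int]:
    """Return the shelves holding products named in the response.

    Shelves are deduplicated and sorted in ascending order, which is an
    assumed visiting order for sequential navigation.
    """
    text = response.lower()
    found = {shelf for product, shelf in shelf_of.items() if product in text}
    return sorted(found)

reply = "For dinner, try pasta with tomato sauce, basil and parmesan."
print(shelves_for_response(reply))  # prints [2, 4, 9]
```

The resulting list could then be handed to a navigation stack one shelf at a time, with object retrieval at each stop left to the lower-level perception, control, and planning the summary refers to.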

Enhancing Supermarket Robot Interaction: A Multi-Level LLM Conversational Interface for Handling Diverse Customer Intents
