Bidirectional Human Interactive AI Framework for Social Robot Navigation

Tuba Girgin, Emre Girgin, Yigit Yildirim, Emre Ugur, Mehmet Haklidir
2024-05-04
Abstract: Trustworthiness is a crucial concept in the context of human-robot interaction. Cooperative robots must be transparent regarding their decision-making process, especially when operating in a human-oriented environment. This paper presents a comprehensive end-to-end framework aimed at fostering trustworthy bidirectional human-robot interaction in collaborative environments for the social navigation of mobile robots. In this framework, the robot communicates verbally while the human guides with gestures. Our method enables a mobile robot to predict the trajectory of people and adjust its route in a socially-aware manner. In case of conflict between human and robot decisions, detected through visual examination, the route is dynamically modified based on human preference while verbal communication is maintained. We present our pipeline, framework design, and preliminary experiments that form the foundation of our proposition.
Robotics
What problem does this paper attempt to address?
The main problem this paper addresses is **achieving trustworthy two-way human-robot interaction in collaborative environments, specifically for the social navigation of mobile robots**. The paper focuses on the following aspects:

1. **Enhancing trust**: In human-robot interaction (HRI), especially when robots operate in human-oriented environments, transparency and trustworthiness are crucial. Collaborative robots need to be transparent about their decision-making processes so that humans can trust them.
2. **Two-way interaction in social navigation**: Many existing studies overlook the importance of two-way interaction and communication. This paper proposes a framework in which the robot explains its actions verbally and adjusts its path according to human gesture feedback, with the aim of improving long-term cooperation and trust.
3. **Predicting and adjusting trajectories**: The robot predicts the trajectories of surrounding people and dynamically adjusts its own path according to those predictions, to avoid conflicts and respect social norms. When human and robot decisions conflict, the robot detects the conflict through visual examination and modifies its route according to the human's preference.
4. **Explaining the decision-making process**: To establish long-term trust, the robot must not only perform tasks but also explain its decisions to humans. This includes explaining why a specific path was chosen and how it was adjusted in response to human feedback.

### Core contributions of the paper

- **Social navigation architecture based on a Graph Attention Network (GAT)**: The architecture predicts trajectories from the relationships between individuals in the environment.
- **Trustworthy artificial intelligence module**: This module explains the robot's decisions based on visual feedback and predicted trajectories.
- **Two-way human-robot interaction**: By recognizing hand gestures and responding verbally, the robot explains its decision-making process, with the aim of maintaining long-term cooperation and trust with humans.

### Method overview

1. **Human detection and localization**: RGB images are fused with point clouds generated by a 2D LIDAR, and weak perspective projection together with an instance segmentation algorithm is used to detect and locate humans.
2. **Trajectory prediction**: A pre-trained LSTM encoder encodes trajectories into a dense graph, and a GAT then predicts future positions.
3. **Path planning and adjustment**: The robot's path is adjusted dynamically according to the predicted trajectories and visual feedback, and the robot explains its behavior verbally.
4. **Gesture recognition**: Five gesture classes (wait, turn left, turn right, continue, unknown) are recognized with the MediaPipe model, and the path is adjusted according to the recognized gesture.

### Preliminary experiments

So far, the authors have completed preliminary experiments on some components, including human localization, trajectory encoding, and gesture classification. Future plans include collecting data in a smart factory environment, verifying the performance of the GAT architecture, and evaluating the trustworthiness and ease of use of the system through user surveys.

In conclusion, this paper aims to enhance the trustworthiness and interaction ability of mobile robots in social navigation by introducing a two-way audio-visual interaction framework, thereby promoting trust and comfort in human-robot collaborative environments.
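To make the gesture-driven conflict handling described above more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the planner emits one of a few hypothetical action strings (`"stop"`, `"turn_left"`, `"turn_right"`, `"go_straight"`) and that a gesture classifier has already produced one of the five classes named in the summary. The `Gesture`, `Decision`, and `resolve` names, the reading of "continue" as "keep the current plan", and the wording of the spoken messages are all illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Gesture(Enum):
    """The five gesture classes listed in the summary."""
    WAIT = "wait"
    TURN_LEFT = "turn left"
    TURN_RIGHT = "turn right"
    CONTINUE = "continue"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Decision:
    action: str      # hypothetical planner command that will be executed
    conflict: bool   # True when the gesture overrode the robot's plan
    utterance: str   # sentence the robot would speak to explain itself


# Assumed mapping from gestures to the action the human is requesting.
# CONTINUE and UNKNOWN request nothing new, so they are absent here.
_REQUESTED = {
    Gesture.WAIT: "stop",
    Gesture.TURN_LEFT: "turn_left",
    Gesture.TURN_RIGHT: "turn_right",
}


def resolve(planned: str, gesture: Gesture) -> Decision:
    """Combine the robot's planned action with a recognized gesture.

    On a conflict, the human's preference wins and the robot says so.
    """
    requested: Optional[str] = _REQUESTED.get(gesture)
    if requested is None:
        if gesture is Gesture.CONTINUE:
            msg = f"Understood, I will keep my plan: {planned}."
        else:
            msg = f"I could not recognize the gesture, so I will keep my plan: {planned}."
        return Decision(planned, False, msg)
    if requested == planned:
        return Decision(planned, False, f"We agree, I will {planned}.")
    msg = f"I planned to {planned}, but I will follow your gesture: {requested}."
    return Decision(requested, True, msg)


if __name__ == "__main__":
    print(resolve("turn_left", Gesture.TURN_RIGHT).utterance)
```

Returning the spoken explanation together with the chosen action keeps the verbal channel tied to every decision, which mirrors the summary's emphasis on transparency. A real system would replace the string actions with planner goals and send the utterance to a text-to-speech component.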