Abstract: Accurately pouring drinks into various containers is an essential skill for service robots. However, drink pouring is a dynamic process that is difficult to model. Traditional deep imitation learning techniques for autonomous robotic pouring suffer from an inherent black-box effect and require a large amount of demonstration data for model training. To address these issues, this paper proposes an Explainable Hierarchical Imitation Learning (EHIL) method that enables a robot to learn high-level general knowledge and execute low-level actions across multiple drink pouring scenarios. Moreover, EHIL constructs a logical graph for task execution. This graph makes the decision-making process behind action generation explainable to users and allows the causes of failure to be traced. Based on the logical graph, the framework can be manipulated to achieve different targets, and adaptability to unseen scenarios is achieved in an explainable manner. A series of experiments was conducted to verify the effectiveness of the proposed method. The results indicate that EHIL outperforms the traditional behavior cloning method in terms of success rate, adaptability, manipulability, and explainability.

Note to Practitioners: Pouring liquids is a common activity in people's daily lives and in all wet-lab industries. The dynamics of drink pouring are difficult to model, and accurate perception of liquid flow is challenging. Deep imitation learning allows a robot to learn under unknown dynamics by observing human demonstrations. To address the limitations of traditional deep neural networks, this paper proposes an Explainable Hierarchical Imitation Learning (EHIL) method. The proposed method enables the robot to learn a sequence of reasonable pouring phases for performing the task, rather than simply executing the task via traditional behavior cloning. In this way, explainability and safety can be ensured. Manipulability is achieved by reconstructing the logical graph. The goal of this research is to learn pouring dynamics and to pour drinks precisely and quickly from source containers into various target containers with reliable performance, adaptability, manipulability, and explainability.
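The abstract describes the logical graph only at a high level. The following Python sketch is a rough illustration of how a graph of phases can make execution traceable. It chains three pouring phases (`tilt`, `pour`, `return`). Each phase has a low-level action and a completion test, and if the task fails, the sketch reports the unfinished phase as the cause. The phase names, the toy flow model, the thresholds, and the functions `build_graph` and `execute` are all hypothetical. This is not the EHIL implementation, and nothing here is learned.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class State:
    """Toy state (hypothetical): container tilt in degrees, poured volume in mL."""
    angle: float = 0.0
    volume: float = 0.0


@dataclass
class Phase:
    """One node of a hypothetical logical graph of pouring phases."""
    name: str
    action: Callable[[State], None]  # low-level action applied at each step
    done: Callable[[State], bool]    # completion test that triggers the transition
    next_phase: Optional[str]        # successor phase; None marks the end of the task


def toy_flow(s: State) -> float:
    # Hypothetical flow model: liquid leaves only beyond a 30-degree tilt.
    return max(0.0, s.angle - 30.0) * 0.5


def build_graph(target_ml: float) -> Dict[str, Phase]:
    """Build the phase graph; a new target only changes the 'pour' completion test."""

    def tilt(s: State) -> None:
        s.angle += 10.0

    def hold(s: State) -> None:
        s.volume += toy_flow(s)

    def untilt(s: State) -> None:
        s.angle = max(0.0, s.angle - 15.0)

    return {
        "tilt": Phase("tilt", tilt, lambda s: s.angle >= 60.0, "pour"),
        "pour": Phase("pour", hold, lambda s: s.volume >= target_ml, "return"),
        "return": Phase("return", untilt, lambda s: s.angle <= 0.0, None),
    }


def execute(graph: Dict[str, Phase], start: str,
            max_steps: int = 50) -> Tuple[bool, List[str]]:
    """Run the graph on the toy state; return (success, human-readable trace)."""
    s = State()
    trace: List[str] = []
    current: Optional[str] = start
    for step in range(max_steps):
        if current is None:
            break
        phase = graph[current]
        phase.action(s)
        if phase.done(s):
            trace.append(f"step {step}: '{phase.name}' done "
                         f"(angle={s.angle:.0f} deg, volume={s.volume:.1f} mL)")
            current = phase.next_phase
    if current is not None:
        # The phase that never completed is reported as the cause of failure.
        trace.append(f"failed: phase '{current}' did not complete within {max_steps} steps")
        return False, trace
    return True, trace


if __name__ == "__main__":
    for target in (100.0, 1000.0):
        ok, trace = execute(build_graph(target), "tilt")
        print(f"target={target:.0f} mL, success={ok}")
        for line in trace:
            print("  " + line)
```

In this toy setup, changing the pouring target only rebuilds the graph with a new completion test. This loosely mirrors the idea of achieving manipulability by reconstructing the logical graph. In EHIL itself, the robot learns the sequence of pouring phases from demonstrations. With the 1000 mL target, the trace ends by naming `pour` as the phase that failed to complete.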

GMM Enabled by Multimodal Information Fusion Network for Detection and Motion Planning of Robotic Liquid Pouring

Robust Robotic Pouring using Audition and Haptics

Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring

MFF-Net: Towards Efficient Monocular Depth Completion With Multi-Modal Feature Fusion

PourIt!: Weakly-supervised Liquid Perception from a Single Image for Visual Closed-Loop Robotic Pouring

A robot grasping detection network based on flexible selection of multi-modal feature fusion structure

Robot Gaining Accurate Pouring Skills through Self-Supervised Learning and Generalization

A Robust Hand Gesture Sensing and Recognition Based on Dual-Flow Fusion with FMCW Radar

Pouring Dynamics Estimation Using Gated Recurrent Units

Accurate Robotic Pouring for Serving Drinks

Visual-Tactile Sensing for Real-time Liquid Volume Estimation in Grasping

Learning to Pour

Visual-and-Language Multimodal Fusion for Sweeping Robot Navigation Based on CNN and GRU

Explainable Hierarchical Imitation Learning for Robotic Drink Pouring

To Stir or Not to Stir: Online Estimation of Liquid Properties for Pouring Actions

Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation

Vision-Language Model-based Physical Reasoning for Robot Liquid Perception

Bilateral Cross-Modal Fusion Network for Robot Grasp Detection

Multimodal Remote Sensing Data Classification Based on Gaussian Mixture Variational Dynamic Fusion Network

Learning Multimodal Confidence for Intention Recognition in Human-Robot Interaction