Abstract: Recognising intent in collaborative human-robot tasks can improve team performance and human perception of robots. Intent can differ from the observed outcome in the presence of mistakes, which are likely in physically dynamic tasks. We created a dataset of 1227 throws of a ball at a target from 10 participants and observed that 47% of throws were mistakes, with 16% completely missing the target. Our research leverages facial images capturing the person's reaction to the outcome of a throw to predict when the resulting throw is a mistake, and then we determine the actual intent of the throw. The approach we propose for outcome prediction performs 38% better than the two-stream architecture used previously for this task on front-on videos. In addition, we propose a 1-D CNN model which is used in conjunction with priors learned from the frequency of mistakes to provide an end-to-end pipeline for outcome and intent recognition in this throwing task.

What problem does this paper attempt to address?

This paper addresses the problem of distinguishing human intent from the actual outcome in a physically dynamic throwing task. Specifically, it aims to predict the thrower's true intent when a throw goes wrong, using surface cues such as the thrower's facial reaction to the outcome.

### Main problems of the paper

1. **Challenges in intention recognition**:
   - In human-robot collaborative tasks, recognizing intent can improve team performance and human perception of robots.
   - Intent can differ from the observed outcome, especially in physically dynamic tasks where mistakes are likely.
   - Traditional methods usually ignore intent recognition in error-prone situations, which reduces their effectiveness in the real world.
2. **Dataset and experimental design**:
   - A dataset of 1,227 throws from 10 participants was created; 47% of the throws were mistakes and 16% missed the target completely.
   - The dataset was captured from three perspectives (front, side, and oblique-side views) to ensure data diversity and comprehensiveness.
3. **Proposed method**:
   - Facial images capturing the thrower's reaction are used to predict whether a throw is a mistake and then to determine the thrower's actual intent.
   - A 1-D CNN model is proposed and combined with priors learned from the frequency of mistakes, providing an end-to-end pipeline for outcome and intent recognition in the throwing task.
   - For outcome prediction on front-on videos, the proposed approach performs 38% better than the two-stream architecture used previously.

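
The summary does not say how the priors learned from mistake frequencies are combined with the outcome prediction. One simple reading, shown here purely as an illustration and not as the authors' method, is a conditional frequency table: count how often each intended target led to each observed outcome, and for a throw flagged as a mistake pick the most frequent other intent for that outcome. The function names, target labels, and counts below are hypothetical and are not taken from the dataset.

```python
from collections import Counter, defaultdict


def learn_intent_priors(throws):
    """Estimate P(intent | outcome) by counting (intent, outcome) pairs."""
    counts = defaultdict(Counter)
    for intent, outcome in throws:
        counts[outcome][intent] += 1
    priors = {}
    for outcome, intents in counts.items():
        total = sum(intents.values())
        priors[outcome] = {intent: n / total for intent, n in intents.items()}
    return priors


def infer_intent(outcome, is_mistake, priors):
    """Return the likely intended target for one throw.

    If the throw is not flagged as a mistake, intent equals the outcome.
    Otherwise choose the most frequent intent, other than the outcome itself,
    that produced this outcome in the training counts.
    """
    if not is_mistake:
        return outcome
    candidates = {k: p for k, p in priors.get(outcome, {}).items() if k != outcome}
    if not candidates:
        # No recorded mistakes for this outcome: fall back to the outcome.
        return outcome
    return max(candidates, key=candidates.get)


# Hypothetical (intent, outcome) pairs for targets labelled "A", "B", "C".
example_throws = (
    [("A", "A")] * 5 + [("A", "B")] * 3 + [("C", "B")] * 1 + [("B", "B")] * 6
)
example_priors = learn_intent_priors(example_throws)
```

In this sketch the `is_mistake` flag stands in for the output of the facial-reaction outcome predictor; any classifier that returns a boolean could be substituted.
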
### Formula representation

- **Scoring formula** for determining the throwing frame (i.e., the frame at the end of the acceleration phase):

  \[ \text{score}_i = \frac{\|v_i\|^2 - \max(V)}{\max(V)} \]

  where \(v_i\) is the velocity of the throwing wrist joint in the \(i\)-th frame, and \(V\) is the vector of \(L_2\) norms of the velocities over the entire throwing sample.
- **Filter application**: To reduce noise, a Savitzky-Golay filter is used, fitting a second-order polynomial with a convolution window length of 11 steps.

### Conclusion

This paper addresses the recognition of a thrower's intent in the presence of mistakes by using facial reactions and other surface cues. The method improves outcome prediction over the previous two-stream approach and offers a starting point for future human-robot interaction research.
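
The scoring formula and Savitzky-Golay smoothing described under "Formula representation" can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not say which signal is filtered or how the score selects the frame, so the sketch assumes the filter is applied to each wrist-velocity component over time and that the throwing frame is the highest-scoring frame. The function name `throwing_frame` and the input layout are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def throwing_frame(wrist_velocity):
    """Return (index of the highest-scoring frame, per-frame scores).

    wrist_velocity: array of shape (n_frames, n_dims) with n_frames >= 11.
    """
    # Assumption: smooth each velocity component along the time axis
    # (window length 11, second-order polynomial, as stated in the summary).
    smoothed = savgol_filter(wrist_velocity, window_length=11, polyorder=2, axis=0)
    # V: per-frame L2 norm of the wrist velocity over the whole sample.
    V = np.linalg.norm(smoothed, axis=1)
    v_max = V.max()
    # Score as printed in the summary: (||v_i||^2 - max(V)) / max(V).
    scores = (V ** 2 - v_max) / v_max
    # Assumption: the throwing frame is the frame with the highest score.
    return int(np.argmax(scores)), scores
```

Because the score increases monotonically with the wrist speed, the selected frame is simply the frame of peak smoothed speed, which matches the description of the throwing frame as the end of the acceleration phase.
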

That was not what I was aiming at! Differentiating human intent and outcome in a physically dynamic throwing task

Accurate Real-Time Ball Trajectory Estimation with Onboard Stereo Camera System for Humanoid Ping-Pong Robot

Real-time Accurate Ball Trajectory Estimation with "asynchronous" Stereo Camera System for Humanoid Ping-Pong Robot.

Where Are You Throwing the Ball? I Better Watch Your Body, Not Just Your Arm!

FauxThrow: Exploring the Effects of Incorrect Point of Release in Throwing Motions

TossingBot: Learning to Throw Arbitrary Objects With Residual Physics

Prediction of Intentions Behind a Single Human Action: an Application of Convolutional Neural Network

Acquisition and transfer of models of visuo-motor uncertainty in a throwing task

Advancing robots with greater dynamic dexterity: A large-scale multi-view and multi-modal dataset of human-human throw&catch of arbitrary objects

Dynamic Handover: Throw and Catch with Bimanual Hands

Whole-Body Dynamic Throwing with Legged Manipulators

Multi-camera tracking of mechanically thrown objects for automated in-plant logistics by cognitive robots in Industry 4.0

Shifting Perspective to See Difference: A Novel Multi-View Method for Skeleton Based Action Recognition

What Will I Do Next? The Intention from Motion Experiment

Enhancing Robotic Collaborative Tasks Through Contextual Human Motion Prediction and Intention Inference

Intelligent Tracking of Mechanically Thrown Objects by Industrial Catching Robot for Automated In-Plant Logistics 4.0

A Solution to Adaptive Mobile Manipulator Throwing

The Selecting Optimal Ball-Receiving Body Parts Using Pose Sequence Analysis and Sports Biomechanics

A novel approach for automatic detection and identification of inappropriate postures and movements of table tennis players

Throwing Objects into A Moving Basket While Avoiding Obstacles

Visual Reaction: Learning to Play Catch With Your Drone