That was not what I was aiming at! Differentiating human intent and outcome in a physically dynamic throwing task

Vidullan Surendran, Alan R. Wagner
DOI: https://doi.org/10.1007/s10514-022-10074-5
2024-10-27
Abstract: Recognising intent in collaborative human robot tasks can improve team performance and human perception of robots. Intent can differ from the observed outcome in the presence of mistakes which are likely in physically dynamic tasks. We created a dataset of 1227 throws of a ball at a target from 10 participants and observed that 47% of throws were mistakes with 16% completely missing the target. Our research leverages facial images capturing the person's reaction to the outcome of a throw to predict when the resulting throw is a mistake and then we determine the actual intent of the throw. The approach we propose for outcome prediction performs 38% better than the two-stream architecture used previously for this task on front-on videos. In addition, we propose a 1-D CNN model which is used in conjunction with priors learned from the frequency of mistakes to provide an end-to-end pipeline for outcome and intent recognition in this throwing task.
Robotics, Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses how to distinguish human intent from the actual outcome in a physically dynamic throwing task. Specifically, it predicts the thrower's true intent when mistakes occur by using surface cues such as the thrower's facial reaction to the outcome.

### Main problems of the paper

1. **Challenges in intention recognition**:
   - In human-robot collaborative tasks, recognizing intent can improve team performance and human perception of robots.
   - Intent can differ from the observed outcome, especially in physically dynamic tasks where mistakes are likely.
   - Existing methods often overlook intent recognition in error-prone settings, which limits their effectiveness in the real world.
2. **Dataset and experimental design**:
   - A dataset of 1,227 throws from 10 participants was created; 47% of the throws were mistakes and 16% missed the target completely.
   - The throws were captured from three viewpoints (front, side, and oblique-side) to provide varied coverage of each throw.
3. **Proposed method**:
   - Facial images capturing the thrower's reaction are used to predict whether a throw was a mistake and then to determine the thrower's actual intent.
   - A 1-D CNN model is combined with priors learned from the frequency of mistakes, giving an end-to-end pipeline for outcome and intent recognition in the throwing task.
   - On front-view videos, the proposed outcome prediction performs 38% better than the two-stream architecture previously used for this task.
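One way to picture how mistake-frequency priors could feed intent inference is sketched below. This is an illustrative assumption, not the authors' model: the zone labels, the `learn_intent_priors` and `infer_intent` helpers, and the toy throw records are all hypothetical. The idea is that when a throw is flagged as a mistake, the observed outcome is replaced by the intended target that most often preceded that outcome among mistaken throws in the training data.

```python
from collections import Counter, defaultdict


def learn_intent_priors(throws):
    """Estimate P(intended zone | observed zone) from mistaken throws only.

    `throws` is an iterable of (intended_zone, observed_zone) pairs.
    """
    counts = defaultdict(Counter)
    for intended, observed in throws:
        if intended != observed:  # a mistake: the outcome differs from the intent
            counts[observed][intended] += 1

    priors = {}
    for observed, intents in counts.items():
        total = sum(intents.values())
        priors[observed] = {zone: n / total for zone, n in intents.items()}
    return priors


def infer_intent(observed, is_mistake, priors):
    """Return the most likely intended zone for one throw."""
    # If the throw was not a mistake (or the outcome was never seen as a
    # mistake during training), the outcome itself is taken as the intent.
    if not is_mistake or observed not in priors:
        return observed
    candidates = priors[observed]
    return max(candidates, key=candidates.get)


# Hypothetical training records: (intended zone, observed zone).
training_throws = [
    ("A", "A"), ("A", "B"), ("A", "B"),
    ("C", "B"), ("B", "miss"), ("B", "B"),
]
priors = learn_intent_priors(training_throws)

intent_when_mistake = infer_intent("B", True, priors)
intent_when_correct = infer_intent("B", False, priors)
```

In this toy data, two of the three mistaken throws that landed in zone B were aimed at zone A, so a throw that lands in B and is flagged as a mistake is attributed to A. In a full pipeline the `is_mistake` flag would come from the outcome-prediction model rather than from ground truth.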
### Formula representation

- **Scoring formula** for locating the throw frame (the frame at the end of the acceleration phase):

  \[ \text{score}_i = \frac{\|v_i\|_2 - \max(V)}{\max(V)} \]

  where \(v_i\) is the velocity of the throwing wrist joint in the \(i\)-th frame, and \(V\) is the vector of \(L_2\) norms of the velocities over the entire throw sample.
- **Filter application**: To reduce noise, a Savitzky-Golay filter is applied, fitting a second-order polynomial over a convolution window of 11 steps.

### Conclusion

By using facial reactions and other surface cues, the paper addresses the problem of recognizing a thrower's intent when mistakes occur. The approach improves outcome prediction over the earlier two-stream architecture and offers ideas for future human-machine interaction research.
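The throw-frame scoring and smoothing described under "Formula representation" can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the numerator is read as the \(L_2\) norm of the wrist velocity (consistent with the definition of \(V\)), the filter is assumed to be applied to the per-frame speed, the throw frame is assumed to be the frame with the highest score, and the function names and synthetic velocity profile are hypothetical. The Savitzky-Golay filter is written out with NumPy here; `scipy.signal.savgol_filter` provides the same kind of filter.

```python
import numpy as np


def savgol_smooth(signal, window=11, order=2):
    """Savitzky-Golay smoothing of a 1-D signal; `window` must be odd."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Least-squares polynomial fit over the window: row 0 of the pseudo-inverse
    # holds the weights that evaluate the fitted polynomial at the window centre.
    design = np.vander(offsets, order + 1, increasing=True)
    weights = np.linalg.pinv(design)[0]
    # Repeat the edge samples so the output has the same length as the input.
    padded = np.pad(np.asarray(signal, dtype=float), half, mode="edge")
    return np.convolve(padded, weights[::-1], mode="valid")


def throw_frame_scores(wrist_velocity, window=11, order=2):
    """Score each frame as (speed_i - max speed) / max speed.

    `wrist_velocity` has shape (frames, dims). Assumes a non-zero peak speed.
    """
    velocity = np.asarray(wrist_velocity, dtype=float)
    speed = np.linalg.norm(velocity, axis=1)      # V: L2 norm per frame
    speed = savgol_smooth(speed, window, order)   # assumption: smooth the speeds
    peak = speed.max()
    return (speed - peak) / peak


# Synthetic wrist motion: speed rises to a peak at frame 30, then falls.
frames = np.arange(60)
speed_profile = 5.0 * np.exp(-((frames - 30) / 8.0) ** 2)
wrist_velocity = np.stack([speed_profile, np.zeros(60), np.zeros(60)], axis=1)

scores = throw_frame_scores(wrist_velocity)
throw_frame = int(np.argmax(scores))  # assumption: highest score marks the throw frame
```

With this reading, scores are at most 0, and the frame with peak wrist speed scores exactly 0, which makes the end of the acceleration phase easy to pick out regardless of how fast a particular participant throws.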