An Integrated in Situ Image Acquisition and Annotation Scheme for Instance Segmentation Models in Open Scenes with a Human–Robot Interaction Approach

Liang Gong, Zhiyu Yang, Yihang Yao, Binhao Chen, Wenjie Wang, Xiaofeng Du, Yidong He, Chengliang Liu
DOI: https://doi.org/10.1109/thms.2022.3222021
2023-01-01
IEEE Transactions on Human-Machine Systems
Abstract: A large amount of data acquisition and annotation work is required to train a supervised machine learning model for open scenes. However, traditional manual approaches are inefficient. Here, a method is proposed for on-site image acquisition and semiautomatic annotation based on eye-tracking. This method combines the recognition capabilities of humans with the computational advantages of machines to improve annotation efficiency, overcoming the bottleneck of AI-based approaches to the natural scenery understanding of field robots. The proposed method contains three advancements. First, we design a head-mounted display with a built-in pose measurement module to achieve first-person teleoperated data acquisition, in which a pseudoframe interpolation algorithm overcomes the latency problem in immersive remote data transmission and enables efficient field data acquisition. Second, we propose an adaptive superpixel segmentation algorithm to reduce human–machine interactions based on eye-tracking. Third, because the annotation process traditionally cannot provide feedback to the acquisition process, which results in a low conversion rate, we propose a new conversion rate index, denoting the rate at which collected data are transformed into valid data, to quantify the acquisition quality in real time. While achieving an annotation quality of 0.964 per the Dice index, which is approximately equal to that of the manual method, the proposed method improves the annotation efficiency by more than 3 times. Finally, agricultural field experiments in a real-life robotic tomato-picking scene verified that the proposed method based on human–computer interaction can make full use of human perception and recognition intelligence.
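For readers unfamiliar with the two quantities the abstract reports, the following is a minimal Python sketch, not the authors' implementation. The Dice index is the standard overlap measure 2|A∩B| / (|A| + |B|) between two binary masks. The function `conversion_rate` is a hypothetical reading of the abstract's "rate of transforming collected data into valid data" as a simple ratio; the paper's exact definition is not given here. All names and the toy masks are illustrative assumptions.

```python
def dice_index(pred, ref):
    """Dice index between two binary masks given as nested lists of 0/1.

    Computes 2 * |pred AND ref| / (|pred| + |ref|).
    """
    inter = pred_sum = ref_sum = 0
    for row_p, row_r in zip(pred, ref):
        for p, r in zip(row_p, row_r):
            p, r = bool(p), bool(r)
            inter += p and r
            pred_sum += p
            ref_sum += r
    total = pred_sum + ref_sum
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * inter / total


def conversion_rate(num_valid, num_collected):
    """Assumed definition: fraction of collected images that become valid data."""
    if num_collected == 0:
        return 0.0
    return num_valid / num_collected


# Toy 3x3 masks: a semiautomatic annotation versus a manual reference.
semi_auto = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
manual = [[0, 1, 1],
          [0, 1, 0],
          [0, 0, 0]]

print(round(dice_index(semi_auto, manual), 3))
print(conversion_rate(45, 60))
```

A Dice index of 1.0 means the two masks coincide exactly, so the reported 0.964 indicates that the semiautomatic masks overlap almost completely with the manual ones.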