CADDY Underwater Stereo-Vision Dataset for Human-Robot Interaction (HRI) in the Context of Diver Activities

Arturo Gomez Chavez, Andrea Ranieri, Davide Chiarella, Enrica Zereik, Anja Babić, Andreas Birk
DOI: https://doi.org/10.48550/arXiv.1807.04856
2018-07-13
Abstract: In this article we present a novel underwater dataset collected from several field trials within the EU FP7 project "Cognitive autonomous diving buddy (CADDY)", where an Autonomous Underwater Vehicle (AUV) was used to interact with divers and monitor their activities. To our knowledge, this is one of the first efforts to collect a large dataset in underwater environments targeting object classification, segmentation and human pose estimation tasks. The first part of the dataset contains stereo camera recordings (~10K) of divers performing hand gestures to communicate and interact with an AUV in different environmental conditions. These gesture samples serve to test the robustness of object detection and classification algorithms against underwater image distortions, i.e., color attenuation and light backscatter. The second part includes stereo footage (~12.7K) of divers free-swimming in front of the AUV, along with synchronized measurements from IMUs located throughout the diver's suit (DiverNet), which serve as ground truth for human pose and tracking methods. In both cases, these rectified images allow investigation of 3D representation and reasoning pipelines from low-texture targets commonly present in underwater scenarios. In this paper we describe our recording platform, the sensor calibration procedure, the data format and the utilities provided to use the dataset.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
The paper addresses the shortage of data for developing and evaluating object classification, segmentation, and human pose estimation methods in underwater environments, particularly for human-robot interaction (HRI) between divers and autonomous underwater vehicles (AUVs). It introduces an underwater stereo-vision dataset named CADDY, which targets the following challenges:

1. **Underwater image distortion**: Underwater images are degraded by color attenuation and light backscatter, which lowers image quality. Algorithms therefore need to be tested for robustness under these conditions.
2. **Lack of large-scale underwater datasets**: Underwater data collection is difficult and costly, so existing underwater datasets are few and not representative enough. The CADDY dataset helps fill this gap by providing a large number of stereo images.
3. **Multi-modal data fusion**: The dataset contains not only stereo vision data but also synchronized inertial measurement unit (IMU) data, which can be used to explore methods that fuse 2D and 3D information.
4. **Human pose estimation and tracking**: IMU measurements from sensors placed throughout the diver's suit (DiverNet) serve as ground truth, so the dataset supports the development and testing of vision-based human pose estimation and tracking algorithms, especially for low-texture targets.

### Specific content of the dataset

- **Gesture recognition part**: Approximately 10,000 annotated stereo image pairs of divers performing hand gestures under different environmental conditions. These samples are used to test the robustness of gesture recognition algorithms.
- **Pose estimation part**: Approximately 12,700 stereo image pairs of free-swimming divers, accompanied by synchronized IMU measurements that serve as ground truth for human pose and tracking methods.

### Application scenarios of the dataset

- **Object detection and classification**: Evaluate how algorithms perform on images affected by underwater distortion.
- **Human pose estimation**: Use IMU data and stereo images for pose estimation and tracking.
- **2D and 3D information fusion**: Explore how to combine multi-modal data to improve algorithm performance.

### Summary

By introducing the CADDY dataset, the paper helps address the shortage of data for human-robot interaction research in underwater environments and provides a resource for research in related fields.
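
### Illustrative sketch: color attenuation and backscatter

The two distortions named above, color attenuation and light backscatter, are often approximated with a simplified image formation model, I = J * t + B * (1 - t), where t = exp(-beta * d) is the per-channel transmission over distance d. The sketch below applies that model to a single RGB pixel. It is not taken from the paper or from the dataset utilities: the function names and the coefficient values in `BETA` and `VEIL` are hypothetical and chosen only so that red fades fastest.

```python
import math

# Hypothetical per-channel attenuation coefficients (1/m).
# Illustrative values only; red is set to attenuate fastest.
BETA = {"r": 0.60, "g": 0.12, "b": 0.08}

# Hypothetical veiling-light (backscatter) color, values in [0, 1].
VEIL = {"r": 0.05, "g": 0.35, "b": 0.45}


def degrade_pixel(rgb, distance_m):
    """Apply I = J * t + B * (1 - t), with t = exp(-beta * d), to one RGB pixel."""
    out = []
    for value, channel in zip(rgb, "rgb"):
        t = math.exp(-BETA[channel] * distance_m)  # transmission for this channel
        out.append(value * t + VEIL[channel] * (1.0 - t))
    return tuple(out)


def degrade_image(image, distance_m):
    """Degrade an image given as rows of (r, g, b) tuples with values in [0, 1]."""
    return [[degrade_pixel(px, distance_m) for px in row] for row in image]


if __name__ == "__main__":
    white = (1.0, 1.0, 1.0)
    for d in (0.0, 1.0, 3.0):
        print(d, [round(c, 3) for c in degrade_pixel(white, d)])
```

At zero distance the pixel is unchanged, and as the distance grows each channel tends toward the veiling-light color. A robustness test could run a detector on images degraded at increasing distances and record where its accuracy drops, although real underwater footage such as CADDY's contains effects this simplified model does not capture.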