SeafloorAI: A Large-scale Vision-Language Dataset for Seafloor Geological Survey

Kien X. Nguyen, Fengchun Qiao, Arthur Trembanis, Xi Peng
2024-11-01
Abstract: A major obstacle to the advancement of machine learning models in marine science, particularly in sonar imagery analysis, is the scarcity of AI-ready datasets. While there have been efforts to make AI-ready sonar image datasets publicly available, they suffer from limitations in terms of environment setting and scale. To bridge this gap, we introduce SeafloorAI, the first extensive AI-ready dataset for seafloor mapping across 5 geological layers, curated in collaboration with marine scientists. We further extend the dataset to SeafloorGenAI by incorporating a language component in order to facilitate the development of both vision- and language-capable machine learning models for sonar imagery. The dataset consists of 62 geo-distributed data surveys spanning 17,300 square kilometers, with 696K sonar images, 827K annotated segmentation masks, 696K detailed language descriptions and approximately 7M question-answer pairs. By making our data processing source code publicly available, we aim to engage the marine science community to enrich the data pool and inspire the machine learning community to develop more robust models. This collaborative approach will enhance the capabilities and applications of our datasets within both fields.
Computer Vision and Pattern Recognition, Machine Learning
### What problems does this paper attempt to solve?

This paper addresses a major obstacle to developing machine-learning models for marine science, and for sonar image analysis in particular: the lack of large-scale, high-quality, AI-ready datasets. Specifically, existing public sonar image datasets have the following limitations:

1. **Limited environmental settings and scale**: Existing datasets are usually collected in specific environments (for example, water-tank experiments) and cannot accurately reflect complex ocean conditions, or they are too small to support training machine-learning models over a wide geographical area.
2. **Inconsistent labeling**: The naming of geological attributes is not unified across datasets, making it difficult to integrate them into a single large-scale dataset.
3. **Lack of multi-modal data**: The absence of tasks that combine visual and language understanding limits the development of generative vision-language models.

To solve these problems, the authors introduce two datasets, SeafloorAI and SeafloorGenAI:

- **SeafloorAI**: A large-scale, geographically distributed, multi-purpose sonar image dataset for mapping seabed geological layers. It contains 696,515 sonar images and 827,220 segmentation masks, covering 17,300 square kilometers. The dataset includes five geological layers (sediment, geomorphic area, habitat, fault, and fold), and all data are standardized to keep the labeling consistent.
- **SeafloorGenAI**: An extended version of SeafloorAI with an added language component to support the development of generative vision-language models. It contains 696,000 detailed language descriptions and about 7 million question-answer pairs, enabling models to be queried through text and to provide clear, understandable explanations.

Through these two datasets, the authors hope to advance research in both marine science and machine learning, enable more efficient automated seabed mapping, and improve the robustness and generalization ability of models.

### Markdown representation of formulas

Although the paper does not involve complex formulas, two key concepts it mentions can be written as follows:

- **Slope**:
  \[ \text{Slope} = \frac{\Delta z}{\Delta d} \]
  where \(\Delta z\) is the elevation change and \(\Delta d\) is the distance change.
- **Rugosity**:
  \[ \text{Rugosity} = \frac{\text{Surface area}}{\text{Planar area}} \]

These formulas help clarify how the topographic features involved in data processing are calculated.
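
To make the slope and rugosity definitions concrete, here is a minimal sketch of how they might be computed on a gridded depth raster. It is an illustration under stated assumptions, not the paper's processing code: the function names `slope` and `rugosity`, the square-cell grid, and the gradient-based surface-area approximation are all choices made here for the example.

```python
import numpy as np


def slope(depth: np.ndarray, cell_size: float) -> np.ndarray:
    """Per-cell slope magnitude |dz/dd| (rise over run, unitless).

    Assumes `depth` is a 2D grid with square cells of side `cell_size`,
    in the same length unit as the depth values (hypothetical layout).
    """
    # np.gradient returns the derivative along axis 0 (rows), then axis 1 (cols).
    dz_dy, dz_dx = np.gradient(depth, cell_size)
    return np.hypot(dz_dx, dz_dy)


def rugosity(depth: np.ndarray, cell_size: float) -> float:
    """Ratio of surface area to planar area over the whole grid.

    Assumption: the surface area of z(x, y) is approximated per cell by
    sqrt(1 + (dz/dx)^2 + (dz/dy)^2) times the planar cell area, so the
    grid-wide ratio is the mean of that factor.
    """
    dz_dy, dz_dx = np.gradient(depth, cell_size)
    area_factor = np.sqrt(1.0 + dz_dx**2 + dz_dy**2)
    return float(area_factor.mean())


if __name__ == "__main__":
    # Synthetic ramp: depth rises by 1 unit per column, constant along rows.
    ramp = np.tile(np.arange(5, dtype=float), (4, 1))
    print(slope(ramp, cell_size=1.0))     # rise over run for each cell
    print(rugosity(ramp, cell_size=1.0))  # surface area / planar area
```

A flat seabed gives a slope of 0 everywhere and a rugosity of exactly 1; rougher terrain pushes rugosity above 1. If slope is wanted as an angle, `np.degrees(np.arctan(slope(depth, cell_size)))` converts the rise-over-run value to degrees.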