Abstract: Recent advancements in computer vision have significantly improved image analysis tasks. Yet, deep learning models often struggle when applied to domains outside their training distribution, such as in geosciences, where domain-specific data can be scarce. This study investigates the classification, segmentation, and interpretability of CT-scan images of rock samples, focusing on the application of modern computer vision techniques to geoscientific tasks. We compare a range of segmentation methods to assess their efficacy, efficiency, and adaptability in geological image analysis. The methods evaluated include Otsu thresholding, clustering techniques (K-means, fuzzy C-means), a supervised machine learning approach (Random Forest), and deep learning models (UNet, ResNet152, and DINOv2), using ten binary sandstone datasets and three multi-class calcite datasets. DINOv2 was selected for its promising results in feature extraction and its potential applicability in geoscientific tasks, prompting further assessment of its interpretability and effectiveness in processing CT-scanned rock data. For classification, a non-fine-tuned DINOv2 demonstrates strong performance in classifying rock images, even when the CT-scans are outside its original training set. In segmentation tasks, thresholding and clustering techniques, though computationally efficient, produce subpar results despite preprocessing efforts. In contrast, supervised methods achieve better performance. While deep learning methods demand greater computational resources, they require minimal intervention and offer superior generalization. A LoRA fine-tuned DINOv2, in particular, excels in out-of-distribution segmentation and outperforms other methods in multi-class tasks, even with limited data. Notably, the segmentation masks generated by DINOv2 often appear more accurate than the original targets, based on visual inspection.
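Otsu thresholding, the simplest baseline in the comparison, picks the gray level that maximizes the variance between the two resulting classes. The sketch below is an illustration, not the paper's code: it assumes a flat list of integer 8-bit gray values, and the function names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes Otsu's between-class variance.

    Pixels <= t form one class and pixels > t form the other.
    Assumes integer gray values in [0, levels - 1].
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # between-class variance scaled by total**2 (the scale does not move the argmax)
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Label each pixel 1 if it is brighter than the threshold, else 0."""
    return [1 if p > t else 0 for p in pixels]


# Toy example: a dark phase around 20-30 and a bright phase around 180-200.
pixels = [20, 22, 25, 30] * 10 + [180, 190, 200] * 10
t = otsu_threshold(pixels)  # t == 30: every pixel <= 30 is the dark phase
mask = binarize(pixels, t)
```

Because the rule uses only the global histogram, overlapping or noisy gray-level distributions directly blur the split, which is consistent with the abstract's observation that thresholding underperforms on these datasets.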
What problem does this paper attempt to address?
The paper addresses how to effectively classify, segment, and interpret CT-scan images of rock samples in geological image analysis. Specifically:
1. **Classification problem**: Using CT-scan images of different rock samples (such as sandstones), evaluate whether a model can accurately distinguish between rock types.
2. **Segmentation problem**: For both binary sandstone datasets and multi-class calcite datasets, determine how to accurately separate the different material components in CT-scan images (for example, crude oil, brine, and rock matrix in the multi-class case).
3. **Interpretability problem**: Explore how applicable and interpretable modern computer vision techniques are for geoscientific tasks, particularly how the DINOv2 model performs in this domain.
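Judging a segmentation requires comparing a predicted label mask against a target mask. The summary does not say which metric the paper reports; per-class intersection-over-union (IoU) is a common choice and is assumed here. Masks are flattened lists of integer labels, and the label encoding in the example (0 = rock matrix, 1 = brine, 2 = oil) is hypothetical.

```python
import math


def per_class_iou(pred, target, num_classes):
    """Intersection-over-union for each class label in two flat label masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        # a class absent from both masks has no defined IoU
        ious.append(inter / union if union else float("nan"))
    return ious


def mean_iou(ious):
    """Average IoU over classes that actually occur in either mask."""
    valid = [v for v in ious if not math.isnan(v)]
    return sum(valid) / len(valid)


# Hypothetical encoding: 0 = rock matrix, 1 = brine, 2 = oil
target = [0, 0, 0, 1, 1, 2, 2, 2]
pred = [0, 0, 1, 1, 1, 2, 2, 0]
ious = per_class_iou(pred, target, num_classes=3)  # [0.5, 2/3, 2/3]
```

Reporting IoU per class, rather than only overall pixel accuracy, keeps a small phase such as oil from being hidden by a dominant rock-matrix class.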
### Background challenges
- **Data scarcity**: Geological data is often scarce, especially for multi-class datasets.
- **Noise and complexity**: Geological images typically contain high noise levels and complex gray-level distributions, which make processing harder.
- **Limitations of existing methods**: Traditional thresholding and clustering methods are computationally efficient but perform poorly on complex, noisy data; supervised learning methods perform better but depend on labeled training data; and deep-learning methods, although they generalize better, demand greater computational resources.
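The clustering baselines group pixels by intensity alone. As an illustration of that idea, here is a minimal K-means (Lloyd's algorithm) over scalar gray values; the evenly spaced initialization and the names `kmeans_1d` and `assign` are assumptions for this sketch, not details taken from the paper.

```python
def assign(values, centers):
    """Index of the nearest center for every value."""
    return [min(range(len(centers)), key=lambda c: abs(v - centers[c])) for v in values]


def kmeans_1d(values, k, iters=50):
    """Cluster scalar intensities into k groups; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # deterministic init: spread the centers evenly over the intensity range
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        labels = assign(values, centers)
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # an empty cluster keeps its previous center
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return centers, assign(values, centers)


# Three gray-level phases
values = [10, 12, 11, 100, 102, 98, 200, 205, 195]
centers, labels = kmeans_1d(values, k=3)  # centers [11.0, 100.0, 200.0]
```

Each pixel is labeled only by its own intensity, with no spatial context, so a noisy pixel whose gray value drifts toward another phase's center is simply mislabeled. This is one reason such fast baselines struggle on noisy CT data.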
### Research objectives
By comparing multiple segmentation methods (Otsu thresholding, K-means, fuzzy C-means, Random Forest, UNet, ResNet152, and DINOv2), the paper evaluates their effectiveness, efficiency, and adaptability in geological image analysis. It pays special attention to DINOv2's potential for feature extraction and for processing geological CT-scan data, aiming to:
- Verify that DINOv2 classifies rock images well without fine-tuning.
- Explore how DINOv2 performs in segmentation tasks, especially when data is scarce and noisy.
- Provide empirical guidance to help geologists make better use of foundation models such as DINOv2 for image analysis.
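Classifying without fine-tuning typically means keeping the backbone frozen, extracting one embedding per image, and fitting a lightweight rule on top. The summary does not say which classifier head the paper uses; the nearest-centroid rule with cosine similarity below is an assumption, the short vectors stand in for real DINOv2 embeddings (which have hundreds of dimensions), and the class names `rock_A` and `rock_B` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def fit_centroids(embeddings, labels):
    """Mean L2-normalized embedding per class; the backbone itself is never updated."""
    sums, counts = {}, {}
    for e, y in zip(embeddings, labels):
        e = normalize(e)
        if y not in sums:
            sums[y] = [0.0] * len(e)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], e)]
        counts[y] += 1
    return {y: normalize([s / counts[y] for s in sums[y]]) for y in sums}


def predict(centroids, embedding):
    """Return the class whose centroid has the highest cosine similarity."""
    e = normalize(embedding)
    # cosine similarity equals the dot product of unit vectors
    return max(centroids, key=lambda y: sum(a * b for a, b in zip(centroids[y], e)))


# Hypothetical frozen-backbone embeddings for two rock types
train = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
labels = ["rock_A", "rock_A", "rock_B", "rock_B"]
centroids = fit_centroids(train, labels)
print(predict(centroids, [0.8, 0.0, 0.1]))  # rock_A
```

Only the per-class centroids are learned here, which is why this style of evaluation is cheap and is a direct test of how well the frozen features already separate the rock types.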
### Main contributions
- **Verifying the effectiveness of DINOv2**: The results show that DINOv2 performs strongly in rock image classification and segmentation. After LoRA fine-tuning in particular, it excels in out-of-distribution segmentation and outperforms the other methods on multi-class tasks, even with limited data.
- **Improved segmentation quality**: Based on visual inspection, the segmentation masks generated by DINOv2 often appear more accurate than the original target masks, highlighting its advantages for geological images.
- **Advancing geological image analysis**: By bringing advanced computer vision models such as DINOv2 to geological image segmentation, the work aims to make that segmentation more reliable and repeatable, supporting fields such as digital rock physics.
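LoRA (low-rank adaptation) freezes a pretrained weight matrix W and learns only a low-rank correction, so a linear layer computes W x + (alpha / r) B A x with B initialized to zero. The sketch below shows that forward pass for a single layer in pure Python; the class name `LoRALinear` and the rank and alpha values are illustrative, and it does not reproduce where the paper inserts adapters into DINOv2 or how it trains them.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r, alpha, seed=0):
        rng = random.Random(seed)
        self.W = W  # frozen, shape (d_out, d_in)
        d_out, d_in = len(W), len(W[0])
        self.r, self.scale = r, alpha / r
        # A starts small and random, B starts at zero, so the adapted layer
        # initially reproduces the frozen layer exactly
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # (r, d_in)
        self.B = [[0.0] * r for _ in range(d_out)]  # (d_out, r)

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        """Only A and B are trained: r * (d_in + d_out) values."""
        return self.r * (len(self.W) + len(self.W[0]))


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1, alpha=2.0)
print(layer.forward([1.0, 1.0]))  # [3.0, 7.0]: B is zero, so the output equals W x
```

The saving grows with layer size: for a 768 × 768 projection with r = 8, the adapter trains 12,288 values instead of 589,824, which is what makes fine-tuning a large backbone practical on small datasets.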
In conclusion, the paper addresses classification, segmentation, and interpretability problems in geological image analysis, particularly when data is scarce and noisy, and explores and verifies the application potential of the DINOv2 model.