Semi-Symmetrical, Fully Convolutional Masked Autoencoder for TBM Muck Image Segmentation

Ke Lei, Zhongsheng Tan, Xiuying Wang, Zhenliang Zhou
DOI: https://doi.org/10.3390/sym16020222
2024-02-13
Symmetry
Abstract: Deep neural networks are effectively utilized for the instance segmentation of muck images from tunnel boring machines (TBMs), providing real-time insights into the surrounding rock condition. However, the high cost of obtaining quality labeled data limits the widespread application of this method. Addressing this challenge, this study presents a semi-symmetrical, fully convolutional masked autoencoder designed for self-supervised pre-training on extensive unlabeled muck image datasets. The model features a four-tier sparse encoder for down-sampling and a two-tier sparse decoder for up-sampling, connected via a conventional convolutional neck, forming a semi-symmetrical structure. This design enhances the model's ability to capture essential low-level features, including geometric shapes and object boundaries. Additionally, to circumvent the trivial solutions in pixel regression that the original masked autoencoder faced, Histogram of Oriented Gradients (HOG) descriptors and Laplacian features have been integrated as novel self-supervision targets. Testing shows that the proposed model can effectively discern essential features of muck images in self-supervised training. When applied to subsequent end-to-end training tasks, it enhances the model's performance, increasing the prediction accuracy of Intersection over Union (IoU) for muck boundaries and regions by 5.9% and 2.4%, respectively, outperforming the enhancements made by the original masked autoencoder.
multidisciplinary sciences
What problem does this paper attempt to address?
The paper addresses the scarcity of high-quality labeled data for segmenting muck images from tunnel boring machines (TBMs). Specifically:

1. **High cost of high-quality labeled data**: Deep neural networks perform well at instance segmentation, but they need large amounts of high-quality labeled data for training. Obtaining these labels is expensive, which limits wider use of the method.
2. **Degraded, unclear muck images**: Dust and fog in the construction environment, together with the fast-moving conveyor belt, produce images with poor quality and indistinct boundaries, so traditional image segmentation algorithms struggle to give satisfactory results.
3. **Dense and complex muck morphology**: Muck particles are usually tightly packed or overlapping, forming complex textures that make segmentation harder.

To address these problems, the authors propose a semi-symmetrical, fully convolutional masked autoencoder (SS-FCMAE), which uses self-supervised pre-training to reduce the dependence on labeled data and improve segmentation performance. Specific improvements include:

1. **Multi-layer sparse decoder**: The lightweight decoder of the original MAE is replaced with a multi-layer sparse convolutional decoder that better captures the spatial relationships among low-level features, improving segmentation accuracy.
2. **HOG descriptors and Laplacian features as targets**: Histogram of Oriented Gradients (HOG) descriptors and Laplacian features serve as new self-supervision targets, which keeps the model from converging to a trivial solution and strengthens its learning of contrast relationships and boundary features.
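The two self-supervision targets named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 4-neighbour Laplacian stencil, the 8-pixel cell size, the 9 orientation bins, and the function names `laplacian_target` and `hog_target` are assumptions chosen for illustration, and the HOG variant is simplified (per-cell normalisation only, no overlapping blocks).

```python
import numpy as np

def laplacian_target(img: np.ndarray) -> np.ndarray:
    """4-neighbour discrete Laplacian of a 2-D grayscale image (edge-padded).

    The stencil is an illustrative assumption, not taken from the paper.
    """
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def hog_target(img: np.ndarray, cell: int = 8, bins: int = 9) -> np.ndarray:
    """Simplified HOG: one histogram of unsigned gradient orientations per
    cell, weighted by gradient magnitude and L2-normalised per cell.

    Cell size and bin count are assumed defaults for this sketch.
    """
    img = img.astype(np.float64)
    gy, gx = np.gradient(img)                 # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)

    n_rows, n_cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_rows, n_cols, bins))
    for i in range(n_rows):
        for j in range(n_cols):
            window = (slice(i * cell, (i + 1) * cell),
                      slice(j * cell, (j + 1) * cell))
            hist[i, j] = np.bincount(bin_idx[window].ravel(),
                                     weights=mag[window].ravel(),
                                     minlength=bins)
    norm = np.linalg.norm(hist, axis=-1, keepdims=True)
    return hist / (norm + 1e-6)

# Usage: a synthetic 16x16 image with a single vertical edge.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
lap = laplacian_target(img)   # non-zero only in the two columns beside the edge
hog = hog_target(img)         # shape (2, 2, 9): a 2x2 grid of cells, 9 bins each
```

Both functions return zero on a constant image and respond only where intensity changes, so a network regressing them cannot score well by reproducing average brightness alone.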
With these changes, SS-FCMAE can be pre-trained in a self-supervised manner on large unlabeled muck image datasets and improves the segmentation model in downstream end-to-end training. The reported results show intersection over union (IoU) gains of 5.9% for muck boundaries and 2.4% for muck regions, exceeding the gains obtained with the original masked autoencoder.
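For reference, IoU between a predicted and a ground-truth binary mask is the ratio of their overlap to their combined area. The sketch below shows that standard definition. The function name `iou` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, and the paper's exact evaluation protocol is not described here.

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union of two binary masks of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks count as a perfect match
    return float(np.logical_and(pred, target).sum() / union)

# Usage: two 4x4 masks that overlap on one row.
pred = np.zeros((4, 4), dtype=bool)
pred[:2, :] = True            # rows 0-1
target = np.zeros((4, 4), dtype=bool)
target[1:3, :] = True         # rows 1-2
score = iou(pred, target)     # 4 shared pixels / 12 covered pixels = 1/3
```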