A Detection and Classification Method of Asphalt Pavement Crack based on Vision Transformer

Chen Guo, Yulong Yang, Bingxin Yang, Chen Zuo
DOI: https://doi.org/10.1145/3656766.3656970
2023-11-24
Abstract: Road surface distress detection is an important component of smart maintenance of transportation infrastructure, and cracking is a common distress type with far-reaching consequences. Detecting road cracks accurately and quickly is of great significance for preventive road maintenance. In recent years, deep learning techniques, represented by convolutional neural networks (CNNs), have been widely used in road damage detection. However, a CNN relies on a stack of convolutional layers to extract visual features from training images. Because its ability to capture long-range correlations is limited, it is challenging to apply CNNs to transverse and longitudinal cracks in real-world scenarios. As an emerging deep learning architecture, the Transformer has received considerable attention in natural language processing and computer vision. In this paper, we employ the Vision Transformer (ViT) for road damage classification. First, histogram equalization is applied as image preprocessing to enhance image contrast and eliminate the influence of illumination variation. Second, ViT, ResNet, DenseNet, and EfficientNet are each implemented, and the training dataset is expanded with data augmentation. Third, classification quality is evaluated in terms of accuracy, F1-score, and recall, using a group of datasets of varying size to examine the deep neural networks. The experimental results indicate that ViT outperforms the CNN models in classification quality, and that long-range crack structures are reasonably identified by a fine-tuned ViT model.
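The histogram equalization step named in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the authors use global or adaptive equalization, or which library they rely on, so the following assumes plain global equalization of an 8-bit grayscale image in NumPy. The function name `equalize_histogram` is hypothetical and is not taken from the paper.

```python
import numpy as np


def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization for an 8-bit grayscale image.

    Assumption: a global (not adaptive) variant; the paper does not
    specify which variant it uses.
    """
    if gray.dtype != np.uint8:
        raise ValueError("expected an 8-bit grayscale image (dtype uint8)")

    # Count how many pixels fall on each of the 256 intensity levels.
    hist = np.bincount(gray.ravel(), minlength=256)

    # Cumulative distribution of intensities.
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest level present
    total = gray.size

    # A constant image has nothing to stretch; return it unchanged.
    if total == cdf_min:
        return gray.copy()

    # Map each level so that the output CDF is approximately linear.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)

    # Apply the lookup table to every pixel.
    return lut[gray]


# Usage: a synthetic low-contrast image whose intensities span 100..131.
low_contrast = np.tile(np.arange(100, 132, dtype=np.uint8), (8, 1))
enhanced = equalize_histogram(low_contrast)
```

After equalization the narrow intensity band is stretched over the full 0 to 255 range, which is the contrast enhancement the abstract refers to. An adaptive variant would instead equalize local tiles, which may handle uneven illumination better, but that is outside what the abstract states.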
Engineering, Computer Science