Vision transformer-based autonomous crack detection on asphalt and concrete surfaces

Elyas Asadi Shamsabadi, Chang Xu, Aravinda S. Rao, Tuan Nguyen, Tuan Ngo, Daniel Dias-da-Costa
DOI: https://doi.org/10.1016/j.autcon.2022.104316
IF: 10.3
2022-08-01
Automation in Construction
Abstract: Previous research has shown the high accuracy of convolutional neural networks (CNNs) in asphalt and concrete crack detection in controlled conditions. Yet, human-like generalisation remains a significant challenge for industrial applications where the range of conditions varies significantly. Given the intrinsic biases of CNNs, this paper proposes a vision transformer (ViT)-based framework for crack detection on asphalt and concrete surfaces. With transfer learning and the differentiable intersection over union (IoU) loss function, the encoder-decoder network equipped with ViT could achieve an enhanced real-world crack segmentation performance. Compared to the CNN-based models (DeepLabv3+ and U-Net), TransUNet with a CNN-ViT backbone achieved up to ~61% and ~3.8% better mean IoU on the original images of the respective datasets with very small and multi-scale crack semantics. Moreover, ViT assisted the encoder-decoder network to show a robust performance against various noisy signals where the mean Dice score attained by the CNN-based models significantly dropped (<10%).
construction & building technology; engineering, civil
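
The abstract names a differentiable IoU loss and reports mean IoU and Dice scores, but it does not give their formulas. As a rough illustration only, the sketch below assumes the commonly used "soft" formulation, in which predicted crack probabilities replace hard 0/1 labels so that the overlap ratio stays smooth in the predictions. The function names, the flat-list input layout, and the `eps` smoothing term are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def soft_iou(probs: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Soft IoU between per-pixel crack probabilities and a binary mask.

    Assumed formulation: intersection = sum(p * t),
    union = sum(p + t - p * t). `eps` avoids division by zero
    when both the prediction and the mask are empty.
    """
    if len(probs) != len(target):
        raise ValueError("probs and target must have the same length")
    intersection = sum(p * t for p, t in zip(probs, target))
    union = sum(p + t - p * t for p, t in zip(probs, target))
    return (intersection + eps) / (union + eps)


def soft_iou_loss(probs: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Loss to minimise: 1 - soft IoU (0 for a perfect prediction)."""
    return 1.0 - soft_iou(probs, target, eps)


def dice_score(probs: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Soft Dice score: 2 * intersection / (sum(p) + sum(t))."""
    if len(probs) != len(target):
        raise ValueError("probs and target must have the same length")
    intersection = sum(p * t for p, t in zip(probs, target))
    total = sum(probs) + sum(target)
    return (2.0 * intersection + eps) / (total + eps)


if __name__ == "__main__":
    # Hypothetical 4-pixel example: one true crack pixel, two predicted.
    predicted = [1.0, 1.0, 0.0, 0.0]
    mask = [1.0, 0.0, 0.0, 0.0]
    print(soft_iou(predicted, mask))
    print(soft_iou_loss(predicted, mask))
    print(dice_score(predicted, mask))
```

Because every term is a sum or product of the predicted probabilities, the same expression can be written in an automatic-differentiation framework and used directly as a training objective. The plain-Python version here only shows the arithmetic.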