An average pooling designed Transformer for robust crack segmentation

Zhaohui Chen, Elyas Asadi Shamsabadi, Sheng Jiang, Luming Shen, Daniel Dias-da-Costa
DOI: https://doi.org/10.1016/j.autcon.2024.105367
IF: 10.3
2024-06-01
Automation in Construction
Abstract: Convolutional Neural Networks (CNNs) and Transformers have achieved impressive accuracy in crack detection for civil infrastructure. However, practical deployments demand models that are not only highly accurate and robust but also efficient. This paper presents PoolingCrack, a novel and efficient Transformer-based model that leverages a hierarchical structure to capture local and global information in visual data, enabling accurate recovery of crack maps in various conditions. The encoder incorporates an average pooling design that is more computationally efficient than the traditional self-attention modules in Transformers, whereas the decoder deploys feature alignment, which improves the accuracy of feature fusion. Asphalt, concrete, and masonry crack segmentation results show that the proposed model reaches 0.4% to 6.8% higher mDS than the representative models while requiring 36–62% fewer parameters. It is also more robust, with up to 52% higher mDS under noise and other artifacts.
Categories: construction & building technology; engineering, civil
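
The abstract states that the encoder replaces self-attention with an average pooling design but does not give the layer definition. As a rough illustration only, the sketch below assumes a PoolFormer-style token mixer (average pooling minus the input, stride 1, 3 x 3 window); this is an assumption, not the PoolingCrack layer itself. The function names `avg_pool2d_same`, `pooling_token_mixer`, and `attention_params` are hypothetical. The point it shows is that such a mixer has no learnable weights, whereas a standard self-attention layer carries four C x C projection matrices.

```python
import numpy as np


def avg_pool2d_same(x, k=3):
    """Average-pool a (C, H, W) feature map with a k x k window, stride 1.

    Zero padding keeps the H x W size; border averages divide by the number
    of real (non-padded) pixels inside the window.
    """
    _, h, w = x.shape
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))
    valid = np.pad(np.ones((h, w)), ((p, p), (p, p)))  # 1 = real pixel
    total = np.zeros(x.shape, dtype=float)
    count = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            total += padded[:, dy:dy + h, dx:dx + w]
            count += valid[dy:dy + h, dx:dx + w]
    return total / count


def pooling_token_mixer(x, k=3):
    """Assumed PoolFormer-style mixer: local average minus the input.

    It has no learnable parameters, and its cost grows linearly with H * W.
    """
    return avg_pool2d_same(x, k) - x


def attention_params(channels):
    """Weights in a standard self-attention layer (Q, K, V, output), no biases."""
    return 4 * channels * channels


# Hypothetical feature map: 64 channels on a 32 x 32 grid.
features = np.random.default_rng(0).normal(size=(64, 32, 32))
mixed = pooling_token_mixer(features)
print(mixed.shape, "pooling mixer weights: 0,",
      "self-attention weights:", attention_params(64))
```

In a full block the mixer output would be added back to the input through a residual connection, usually after normalization; those parts are omitted here to keep the sketch short.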