Vision transformer based classification of sewer defects weighted loss model
Chunhou Ji, Zhiqiang Xie, Rong Li, Zhibing Yang, ZhiQun Hou
DOI: https://doi.org/10.1016/j.tust.2024.106271
IF: 6.9
2024-12-02
Tunnelling and Underground Space Technology
Abstract: Sewers are a critical component of urban infrastructure, and detecting their defects efficiently remains a significant challenge. Currently, Convolutional Neural Networks (CNNs) are widely applied to detect sewer defects in CCTV inspections. However, images of these defects commonly suffer from dataset imbalance, and CNNs can only capture local image features, so they lack a comprehensive understanding of the global image and generalize poorly across imbalanced datasets. Therefore, we introduce SViT/CE, a self-attention-based classification model for sewers tailored for imbalanced datasets, and propose the SViT/WCE and SViT/FL models. On the test set, the SViT/CE model achieves precision, recall, and F1-score of 53.64%, 58.51%, and 55.97%, respectively; the SViT/WCE model achieves 55.87%, 59.53%, and 57.64%; and the SViT/FL model achieves 54.13%, 62.91%, and 58.19%. In comparison, the best-performing baseline model scores 41.56%, 42.86%, and 42.02% in precision, recall, and F1-score, respectively, so the SViT series models outperform the baseline models on all three metrics. Relative to the SViT/CE model, the SViT/WCE model improves precision, recall, and F1-score by 2.06, 1.02, and 1.67 percentage points, and the SViT/FL model improves them by 0.49, 4.40, and 2.22 percentage points, respectively. This study provides an efficient self-attention-based model for classifying defects in sewer images, supporting the safe and stable operation of urban drainage infrastructure.
construction & building technology,engineering, civil
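
The abstract does not spell out the loss functions behind the three variants. Judging from the title and the model suffixes, CE, WCE, and FL presumably stand for cross-entropy, weighted cross-entropy, and focal loss. The sketch below shows the textbook forms of these losses for a single sample, assuming single-label softmax classification. The function names, the inverse-frequency weighting scheme, and the default `gamma` are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(class_counts):
    """Hypothetical weighting scheme: w_c = N / (K * n_c).

    Rare classes receive larger weights; a perfectly balanced
    dataset yields a weight of 1.0 for every class.
    """
    total = sum(class_counts)
    k = len(class_counts)
    return [total / (k * n) for n in class_counts]


def cross_entropy(logits, target, weights=None):
    """Cross-entropy for one sample; pass `weights` for the weighted form."""
    p_t = softmax(logits)[target]
    w = 1.0 if weights is None else weights[target]
    return -w * math.log(p_t)


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample: -alpha_t * (1 - p_t)^gamma * log(p_t).

    The (1 - p_t)^gamma factor shrinks the loss of well-classified
    samples; gamma = 0 reduces it to (weighted) cross-entropy.
    """
    p_t = softmax(logits)[target]
    a = 1.0 if alpha is None else alpha[target]
    return -a * (1.0 - p_t) ** gamma * math.log(p_t)


# Illustrative class counts only (not taken from the paper).
example_weights = inverse_frequency_weights([900, 90, 10])
rare_class_loss = cross_entropy([2.0, 0.5, 0.1], target=2, weights=example_weights)
```

Under these assumptions, weighted cross-entropy scales the penalty of each sample by the rarity of its class, while focal loss down-weights samples the model already classifies confidently. Both are common ways to counter class imbalance.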