Construction material classification on imbalanced datasets using Vision Transformer (ViT) architecture

Maryam Soleymani,Mahdi Bonyani,Hadi Mahami,Farnad Nasirzadeh
DOI: https://doi.org/10.48550/arXiv.2108.09527
2022-09-07
Abstract:This research proposes a reliable model for identifying different construction materials with the highest accuracy, which is exploited as an advantageous tool for a wide range of construction applications such as automated progress monitoring. In this study, a novel deep learning architecture called Vision Transformer (ViT) is used for detecting and classifying construction materials. The robustness of the employed method is assessed by utilizing different image datasets. For this purpose, the model is trained and tested on two large imbalanced datasets, namely Construction Material Library (CML) and Building Material Dataset (BMD). A third dataset is also generated by combining CML and BMD to create a more imbalanced dataset and assess the capabilities of the utilized method. The achieved results reveal an accuracy of 100 percent in evaluation metrics such as accuracy, precision, recall rate, and f1-score for each material category of three different datasets. It is believed that the suggested model accomplishes a robust tool for detecting and classifying different material types. To date, a number of studies have attempted to automatically classify a variety of building materials, which still have some errors. This research will address the mentioned shortcoming and proposes a model to detect the material type with higher accuracy. The employed model is also capable of being generalized to different datasets.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?