Improving classification of road surface conditions via road area extraction and contrastive learning

Linh Trinh, Ali Anwar, Siegfried Mercelis
2024-07-19
Abstract: Maintaining roads is crucial to economic growth and citizen well-being because roads are a vital means of transportation. In various countries, road surfaces are still inspected manually; to automate this process, research interest has turned to detecting road surface defects from visual data. However, previous research has focused on deep learning methods that tend to process the entire image, which leads to heavy computational cost. In this study, we focus on improving classification performance while keeping the computational cost of our solution low. Instead of processing the whole image, we introduce a segmentation model so that the downstream classification model focuses only on the road surface in the image. Furthermore, we employ contrastive learning during model training to improve road surface condition classification. Our experiments on the public RTK dataset show that our proposed method significantly improves on previous works.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses road surface condition classification. Specifically, the authors aim to improve classification performance while keeping the computational cost low. Traditional deep-learning methods usually process the entire image, which incurs a high computational cost and may include content irrelevant to the road (such as buildings and vehicles) that can reduce classification accuracy. To address these problems, the authors propose a method with two main steps:

1. **Road area extraction**: A segmentation model extracts only the road area from the original image, reducing interference from irrelevant content.
2. **Contrastive learning**: Contrastive learning is introduced during training to make the semantic embedding features more consistent, thereby improving classification performance.

### Method overview

**1. Road area extraction.** An encoder-decoder model performs binary segmentation, dividing the image into road and non-road areas. The segmentation model is trained with the binary cross-entropy loss \( L_{\text{seg}} \):

\[ L_{\text{seg}} = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log P_i + (1 - y_i) \log(1 - P_i) \right) \]

where \( N \) is the number of pixels in the training batch, and \( y_i \) and \( P_i \) are the labeled and predicted confidences of pixel \( i \), respectively.

**2. Classification model.** The extracted road area is fed to the classification model, which assigns it to one of the categories \( C_1, C_2, \ldots, C_n \). During training, contrastive learning is used to improve the classification task.
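The road-area extraction step can be sketched in NumPy as follows: `segmentation_bce_loss` mirrors \( L_{\text{seg}} \) term by term, and `extract_road_area` shows one possible way of passing only road pixels to the classifier. Both function names, the 0.5 threshold, and zeroing out non-road pixels are illustrative assumptions rather than the authors' stated implementation; in practice the loss would be computed inside a deep-learning framework on the segmenter's output.

```python
import numpy as np


def segmentation_bce_loss(y_true: np.ndarray, p_pred: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy L_seg, averaged over all N pixels in the batch.

    y_true: ground-truth road labels y_i (1 = road, 0 = non-road), any shape.
    p_pred: predicted road confidences P_i, same shape as y_true.
    """
    # Clip predictions away from 0 and 1 so that log() stays finite.
    p = np.clip(p_pred, eps, 1.0 - eps)
    y = y_true.astype(np.float64)
    n = y.size  # N: total number of pixels in the batch
    return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)) / n)


def extract_road_area(image: np.ndarray, p_road: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Keep road pixels and zero out everything else (hypothetical helper).

    image: H x W x C array; p_road: H x W road confidences from the segmenter.
    The threshold and zero-masking are assumptions, not the paper's stated procedure.
    """
    mask = (p_road >= threshold).astype(image.dtype)  # 1 = road, 0 = non-road
    return image * mask[..., np.newaxis]  # broadcast the mask over the channels


# Usage on a toy 2 x 2 batch of pixels.
labels = np.array([[1, 0], [1, 1]])
probs = np.array([[0.9, 0.2], [0.8, 0.6]])
print(round(segmentation_bce_loss(labels, probs), 3))  # prints 0.266
road_only = extract_road_area(np.ones((2, 2, 3)), probs)
```

Here the loss is the mean of \( -\log 0.9 \), \( -\log 0.8 \), \( -\log 0.8 \) and \( -\log 0.6 \), and the masked image keeps every pixel except the top-right one, whose predicted confidence (0.2) falls below the threshold.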
For a pair of samples \( (x_i, x_j) \) with embedding features \( p_i \) and \( p_j \), the contrastive loss \( L_{\text{ct}} \) is defined as

\[ L_{\text{ct}}(x_i, x_j) = -\log\frac{\exp\left(\frac{\text{SIM}(p_i, p_j)}{\tau}\right)}{\sum_{k=1}^{K} I(x_i, x_k) \cdot \exp\left(\frac{\text{SIM}(p_i, p_k)}{\tau}\right)} \]

where \( \tau \) is a temperature hyperparameter and \( I(x_i, x_k) \) is an indicator function:

\[ I(x_i, x_k) = \begin{cases} 0 & \text{if } x_i \text{ and } x_k \text{ are in the same class} \\ 1 & \text{otherwise} \end{cases} \]

The indicator therefore restricts the denominator to samples from other classes. The similarity function \( \text{SIM}(p_i, p_j) \) is the cosine similarity of the two embeddings:

\[ \text{SIM}(p_i, p_j) = \cos(p_i, p_j) = \frac{p_i^{\top} p_j}{\lVert p_i \rVert \, \lVert p_j \rVert} \]

**3. Total loss function.** The total loss \( L \) combines the classification cross-entropy loss \( L_{\text{ce}} \) and the contrastive loss \( L_{\text{ct}} \), weighted by \( \lambda \):

\[ L = L_{\text{ce}} + \lambda L_{\text{ct}} \]
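A minimal NumPy sketch of \( \text{SIM} \), \( L_{\text{ct}} \) and the combined loss \( L \), written to follow the equations term by term. It assumes the \( K \) embeddings of one batch are stacked in a \( K \times D \) array and that \( x_j \) shares \( x_i \)'s class; the function names and the default values of \( \tau \) and \( \lambda \) are hypothetical placeholders. A training implementation would vectorize the loop and run in an autodiff framework.

```python
import numpy as np


def cosine_similarity(p_i: np.ndarray, p_j: np.ndarray) -> float:
    """SIM(p_i, p_j) = p_i^T p_j / (||p_i|| * ||p_j||)."""
    return float(p_i @ p_j / (np.linalg.norm(p_i) * np.linalg.norm(p_j)))


def contrastive_loss(embeddings: np.ndarray, labels: np.ndarray, i: int, j: int, tau: float = 0.1) -> float:
    """L_ct(x_i, x_j) for one pair, following the equation term by term.

    embeddings: K x D array holding p_1, ..., p_K for the K samples in the batch.
    labels: length-K class labels, used by the indicator I(x_i, x_k).
    i, j: indices of the pair; x_j is assumed to share x_i's class.
    tau: temperature; 0.1 is a placeholder, not a value from the paper.
    """
    numerator = np.exp(cosine_similarity(embeddings[i], embeddings[j]) / tau)
    denominator = 0.0
    for k in range(len(embeddings)):
        indicator = 0.0 if labels[k] == labels[i] else 1.0  # I(x_i, x_k)
        denominator += indicator * np.exp(cosine_similarity(embeddings[i], embeddings[k]) / tau)
    if denominator == 0.0:
        raise ValueError("no other-class samples in the batch; the denominator is zero")
    return float(-np.log(numerator / denominator))


def total_loss(ce_loss: float, ct_loss: float, lam: float = 0.5) -> float:
    """L = L_ce + lambda * L_ct; lam = 0.5 is a placeholder weight."""
    return ce_loss + lam * ct_loss


# Usage: two class-0 samples with identical embeddings and one class-1 sample.
emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
lab = np.array([0, 0, 1])
l_ct = contrastive_loss(emb, lab, 0, 1, tau=1.0)
print(round(l_ct, 6))  # prints -1.0
print(round(total_loss(1.2, l_ct), 6))  # prints 0.7
```

In the usage example the positive pair has similarity 1 and the single other-class sample has similarity 0, so the loss is \( -\log(e^{1} / e^{0}) = -1 \). Because same-class samples are excluded from the denominator, \( L_{\text{ct}} \) as written can be negative, unlike the standard InfoNCE form whose denominator also contains the positive term.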