Role of Data Analytics in Infrastructure Asset Management: Overcoming Data Size and Quality Problems

S. Madeh Piryonesi, Tamer E. El-Diraby
DOI: https://doi.org/10.1061/jpeodx.0000175
2020-06-01
Abstract: This study explores the performance of different classification algorithms applied to the analysis of asphalt pavement deterioration data. The aim is to examine how different algorithms deal with the typically limited and low-quality data sets in the infrastructure asset management domain, and whether better configurations of relevant algorithms help overcome these limitations. Furthermore, the emphasis on choosing the most affordable attributes (e.g., temperature and precipitation levels) makes the results reproducible by smaller municipalities. The analysis used more than 3,000 examples of road sections retrieved from the Long-Term Pavement Performance (LTPP) database. The algorithms examined include two types of decision tree, a naïve Bayes classifier, naïve Bayes coupled with kernel estimates, logistic regression, k-nearest neighbors (k-NN), random forest, and gradient boosted trees. All were applied to predict the deterioration of the pavement condition index (PCI); their performance is compared, and their strengths and weaknesses are discussed. Of particular importance is the positive role of ensemble learning: the study shows how the higher efficiency gained through ensemble learning can compensate for data shortcomings. The accuracy of some of the models in predicting the PCI after 3 years exceeded 90%. Suggestions are made to improve the performance of some algorithms; for instance, coupling the naïve Bayes classifier with kernel estimates is shown to increase its accuracy dramatically. In addition, the study examines the impact of data segmentation by dividing the data into four climatic regions. Prediction accuracy was sufficiently high after segmentation, with the highest accuracy in the dry, nonfreeze zone and the lowest in the region with a wet, freezing climate.
Transportation Science & Technology; Engineering, Civil
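The abstract reports that coupling the naïve Bayes classifier with kernel estimates improved its accuracy, but it does not state how the kernels were configured. The following is a minimal pure-Python sketch of the general mechanism only, assuming Gaussian kernels, a normal-reference rule-of-thumb bandwidth, and synthetic one-feature data. The class labels "A"/"B", the data generator, and all function names are hypothetical and are not taken from the paper or the LTPP data. A plain naïve Bayes model summarises each class-conditional feature with a single Gaussian, which flattens a bimodal attribute. A kernel density estimate keeps its shape.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


class NaiveBayes:
    """Naive Bayes for numeric features (illustrative sketch, not the paper's model).

    use_kernel=False: each class-conditional feature density is one Gaussian.
    use_kernel=True:  each density is a Gaussian kernel density estimate (KDE)
                      with an assumed normal-reference rule-of-thumb bandwidth.
    """

    def __init__(self, use_kernel=False):
        self.use_kernel = use_kernel

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors, self.params = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.priors[c] = len(rows) / len(X)
            per_feature = []
            for j in range(len(X[0])):
                col = [r[j] for r in rows]
                n = len(col)
                mean = sum(col) / n
                std = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
                bandwidth = 1.06 * std * n ** (-1 / 5)  # rule-of-thumb bandwidth
                per_feature.append((col, mean, std, bandwidth))
            self.params[c] = per_feature
        return self

    def _density(self, value, col, mean, std, bandwidth):
        if self.use_kernel:
            # KDE: average of Gaussian bumps centred on every training value.
            return sum(gaussian_pdf(value, v, bandwidth) for v in col) / len(col)
        return gaussian_pdf(value, mean, std)

    def predict(self, x):
        # Choose the class maximising log prior + sum of per-feature log densities
        # (the "naive" conditional-independence assumption).
        def log_posterior(c):
            score = math.log(self.priors[c])
            for value, p in zip(x, self.params[c]):
                score += math.log(max(self._density(value, *p), 1e-300))
            return score

        return max(self.classes, key=log_posterior)


def make_data(rng, n_per_class):
    """Hypothetical synthetic data: class 'A' is bimodal, class 'B' is unimodal."""
    X, y = [], []
    for _ in range(n_per_class):
        X.append([rng.gauss(rng.choice([-2.0, 2.0]), 0.3)])
        y.append("A")
        X.append([rng.gauss(0.0, 2.5)])
        y.append("B")
    return X, y


def accuracy(model, X, y):
    return sum(model.predict(x) == label for x, label in zip(X, y)) / len(y)


rng = random.Random(0)
X_train, y_train = make_data(rng, 200)
X_test, y_test = make_data(rng, 200)

gaussian_acc = accuracy(NaiveBayes(use_kernel=False).fit(X_train, y_train), X_test, y_test)
kernel_acc = accuracy(NaiveBayes(use_kernel=True).fit(X_train, y_train), X_test, y_test)
print(f"Gaussian naive Bayes accuracy: {gaussian_acc:.2f}")
print(f"Kernel naive Bayes accuracy:   {kernel_acc:.2f}")
```

On toy data like this, where one class is bimodal, the single-Gaussian variant should typically score noticeably lower than the kernel variant. No particular accuracy is implied for the pavement data. The bandwidth rule is a simple assumption. A data-driven choice, such as cross-validation, may suit real attributes better.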