The Influence of Data Preparation on Outlier Detection in Driveability Data

Andreas Ramsauer, Petra Martina Baumann, Cornelia Lex
DOI: https://doi.org/10.1007/s42979-021-00607-7
2021-04-20
SN Computer Science
Abstract: Outlier detection in multivariate data is an important topic across various disciplines, especially when dealing with large amounts of data. This publication focuses on the practical impact of data preparation techniques on outlier detection in driveability data. Driveability (also referred to as drive quality) is a decisive factor for the marketability of a vehicle, as the final decision to buy a vehicle is mostly made after a test drive. During the vehicle development process, driveability targets are constantly monitored by tracking objective performance indicators derived from sensor signals and/or simulation models. Because the variables of interest for driveability evaluation differ greatly in magnitude, data scaling methods, also referred to as data normalization methods, are applied and their impact on the outlier detection is discussed. Specifically, three different data preparation techniques suitable for multivariate data are applied to three selected datasets. After scaling the data, the outlier detection is performed by the well-established DBSCAN algorithm. Parameters of the investigated techniques are varied and the effect on the detected outliers is discussed in detail. The discussion is further aided by a statistical exploration of the outliers identified by DBSCAN with the different scaling techniques and by a comparison with human-detected outliers. The results demonstrate advantages and disadvantages of the three investigated approaches.
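To illustrate why scaling matters before a distance-based method such as DBSCAN, the following is a minimal sketch and not the authors' implementation. The abstract does not name the three data preparation techniques, so z-score standardization is used here purely as an assumed example. The six data points (engine speed in rpm, acceleration in m/s²), the `eps` values, and `min_pts` are all hypothetical, and the DBSCAN routine is a simplified pure-Python version written for readability.

```python
import math

def zscore(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id >= 0, or -1 for noise (outlier)."""
    n = len(points)
    # Neighborhoods include the point itself, as in the usual DBSCAN definition.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1          # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # expand only through core points
    return labels

# Hypothetical data: [engine speed in rpm, acceleration in m/s^2].
# The last point is unusual only in the small-magnitude variable.
data = [[2000, 1.0], [2010, 1.1], [1990, 0.9],
        [2005, 1.0], [1995, 1.05], [2002, 3.0]]

# Assumed parameters; eps has to be chosen separately for each scaling.
raw_labels = dbscan(data, eps=25.0, min_pts=3)
scaled_labels = dbscan(zscore(data), eps=1.0, min_pts=3)

print("outliers without scaling:", [i for i, l in enumerate(raw_labels) if l == -1])
print("outliers with z-score:   ", [i for i, l in enumerate(scaled_labels) if l == -1])
```

In this toy example the unscaled distances are dominated by the rpm column, so the unusual acceleration value is absorbed into the single cluster, whereas after standardization the same point is labeled as noise. Other scaling methods or parameter choices can shift which points are flagged, which is the kind of sensitivity the paper investigates.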