Outlier Detection Model Based on Autoencoder and Data Augmentation for High-Dimensional Sparse Data.

Haitao Zhang, Wenhai Ma, Qilong Han, Zhiqiang Ma
DOI: https://doi.org/10.1007/978-981-99-5968-6_14
2023-01-01
Abstract: This paper addresses three problems in anomaly detection for high-dimensional data: data imbalance, complex parameter tuning, and low accuracy. It proposes SEAOD, an anomaly detection model for high-dimensional sparse data based on an autoencoder and data augmentation. First, the model handles imbalanced data by using the weighted SMOTE algorithm and the ENN algorithm to supplement the minority-class samples and generate a new dataset. Then, an attention mechanism is used to calculate feature similarity and determine the structure of the neural network, so that the model can learn the data features. Finally, an autoencoder reduces the dimensionality of the data, mapping the sparse high-dimensional data to a low-dimensional space for anomaly detection and overcoming the impact of the curse of dimensionality on detection algorithms. Experimental results on 15 public datasets show that the model outperforms the comparison algorithms. The model was also validated on industrial air quality datasets, where it achieved the expected results, demonstrating its practicality.
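The oversampling step mentioned in the abstract can be illustrated with a minimal sketch. It shows only the plain SMOTE interpolation idea (a synthetic point placed on the segment between a minority sample and one of its nearest minority neighbours), not the weighted variant used in the paper, and it omits the ENN cleaning, attention, and autoencoder stages. The function name `smote_like_oversample`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (plain SMOTE idea,
    not the paper's weighted variant)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with three features (hypothetical values).
minority = [[0.0, 0.0, 1.0], [0.2, 0.0, 0.8], [0.1, 0.3, 1.0], [0.0, 0.1, 0.9]]
synthetic = smote_like_oversample(minority, n_new=6)
```

Each synthetic point is a convex combination of two existing minority samples, so it stays inside the region the minority class already occupies. A weighted scheme, as the abstract describes, would presumably bias which base samples or neighbours are chosen; the abstract does not give those details.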