A Clustering-based Framework for Highly Imbalanced Fault Detection with the Applications on High-Speed Trains

Min Qian, Yan-Fu Li
DOI: https://doi.org/10.1109/qrs-c55045.2021.00112
2021-01-01
Abstract: In recent years, a growing share of the world's population has come to rely on high-speed trains (HST) for daily travel. Mission-critical subsystems of HST, such as the braking system, play a central role in travel safety. However, braking systems are so reliable that defects in them are rare. As a result, large volumes of operation data contain very few failure records, and the normal/fault ratio of the dataset is highly imbalanced. To address highly imbalanced classification in HST, we propose a clustering-based classification framework that comprises a deep imbalanced clustering (DIC) algorithm and a newly designed iterative clustering-based sample elimination (ICSE) algorithm specialized for imbalanced classification. DIC uses a purpose-built objective function to concentrate the minority samples as much as possible, which improves the performance of the subsequent ICSE undersampling. ICSE removes majority samples based on the clustering results without losing much of the majority-sample information. Empirical validation is conducted on a real-world HST dataset and includes comparisons with seven popular imbalanced classification methods. The experimental results show that the proposed undersampling framework can greatly improve the performance of a variety of imbalanced classification methods and is an effective preprocessing method for highly imbalanced datasets.
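The abstract does not give the DIC objective function or the ICSE elimination rule, so neither can be reproduced here. The general idea of clustering-based undersampling can still be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: plain k-means stands in for DIC, a single non-iterative pass stands in for ICSE, and the names `kmeans`, `cluster_undersample`, and `keep_per_pure_cluster` are hypothetical. The sketch keeps every minority (fault) sample, keeps all majority samples in clusters that also contain faults, and thins each fault-free cluster down to the few samples nearest its centroid.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (a stand-in, NOT the paper's DIC).

    Returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every sample to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    # Final assignment so labels match the returned centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centroids


def cluster_undersample(X, y, k=8, keep_per_pure_cluster=5, seed=0):
    """Return sorted indices of the samples to keep.

    Assumed labelling: y == 1 is fault (minority), y == 0 is normal
    (majority). This is a one-pass illustration, NOT the paper's ICSE.
    """
    labels, centroids = kmeans(X, k, seed=seed)
    keep = np.zeros(len(X), dtype=bool)
    keep[y == 1] = True  # never discard a fault record
    for j in range(k):
        in_cluster = labels == j
        if not in_cluster.any():
            continue
        if (y[in_cluster] == 1).any():
            # Mixed cluster: majority samples near faults are informative,
            # so keep the whole cluster.
            keep |= in_cluster
        else:
            # Fault-free cluster: keep only the samples nearest the centroid
            # as representatives of this region of normal operation.
            idx = np.flatnonzero(in_cluster)
            d2 = ((X[idx] - centroids[j]) ** 2).sum(axis=1)
            keep[idx[np.argsort(d2)[:keep_per_pure_cluster]]] = True
    return np.flatnonzero(keep)
```

The returned indices can be used to subset `X` and `y` before training any downstream classifier, which is the preprocessing role the abstract assigns to the framework. Keeping a few representatives per fault-free cluster is one simple way to avoid discarding a whole region of normal operation; the criterion the paper actually uses may differ.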