Research on Oversampling Algorithm for Imbalanced Datasets Based on ARIMA Model

Gang Chen, Xiaomei Guo
DOI: https://doi.org/10.1109/ccdc52312.2021.9602084
2021-01-01
Abstract: Imbalanced data classification is an important research topic in machine learning. Conventional classification algorithms are not well suited to imbalanced datasets. In this paper, we propose an oversampling algorithm based on a time series forecasting model. Based on the randomness of the data, the minority class data are transformed into time series. Because time series forecasting places specific requirements on its input, a series of pre-modeling tests is then performed on the minority class data to ensure that the converted sequences conform to the principles of time series modeling. The minority class data are then oversampled using the fitted ARIMA model so that the dataset becomes balanced. Finally, on eight datasets selected from the UCI and KEEL repositories, the proposed algorithm is compared with other oversampling algorithms, with a decision tree classifier used for the classification experiments. Experimental results show that the proposed algorithm is more effective than the other algorithms.
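The pipeline the abstract describes (convert minority samples to a series, fit a forecasting model, generate new samples from it) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each feature column of the minority class, taken in the given sample order, is treated as one series, and it swaps the full ARIMA model for its simplest special case, ARIMA(1,0,0), i.e. an AR(1) model fitted by least squares. The function names `fit_ar1` and `oversample_minority` are hypothetical, the pre-modeling tests mentioned in the abstract are omitted, and the clamp on the AR coefficient is an added safeguard to keep the generated series stationary.

```python
import random
import statistics


def fit_ar1(series):
    """Fit x[t] = c + phi * x[t-1] + e[t] by least squares (ARIMA(1,0,0))."""
    if len(series) < 2:
        raise ValueError("need at least two observations to fit AR(1)")
    x_prev, x_next = series[:-1], series[1:]
    mean_prev = statistics.fmean(x_prev)
    mean_next = statistics.fmean(x_next)
    var_prev = sum((a - mean_prev) ** 2 for a in x_prev)
    if var_prev == 0:
        phi = 0.0  # constant predictor: fall back to the mean
    else:
        cov = sum((a - mean_prev) * (b - mean_next)
                  for a, b in zip(x_prev, x_next))
        phi = cov / var_prev
    # Assumption of this sketch: clamp phi so the simulated series stays stationary.
    phi = max(-0.99, min(0.99, phi))
    c = mean_next - phi * mean_prev
    residuals = [b - (c + phi * a) for a, b in zip(x_prev, x_next)]
    sigma = statistics.pstdev(residuals)
    return c, phi, sigma


def oversample_minority(minority, n_new, seed=0):
    """Generate n_new synthetic minority rows by continuing each feature series."""
    rng = random.Random(seed)
    n_features = len(minority[0])
    columns = []
    for j in range(n_features):
        # Assumption: sample order defines the time index of the series.
        series = [row[j] for row in minority]
        c, phi, sigma = fit_ar1(series)
        last = series[-1]
        generated = []
        for _ in range(n_new):
            # One-step simulation: model forecast plus Gaussian residual noise.
            last = c + phi * last + rng.gauss(0.0, sigma)
            generated.append(last)
        columns.append(generated)
    # Transpose feature columns back into rows (one row per synthetic sample).
    return [list(row) for row in zip(*columns)]


minority = [[1.0, 5.2], [1.3, 4.9], [0.9, 5.5], [1.1, 5.0], [1.2, 5.1]]
synthetic = oversample_minority(minority, n_new=3)
```

To balance a dataset, `n_new` would be set to the difference between the majority and minority class counts, and the synthetic rows appended to the minority class before training the classifier. A full ARIMA fit with differencing and moving-average terms would need a time series library in place of `fit_ar1`.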