Proceedings of the IJCAI 2017 Workshop on Learning in the Presence of Class Imbalance and Concept Drift (LPCICD'17)

Shuo Wang, Leandro L. Minku, Nitesh Chawla, Xin Yao
DOI: https://doi.org/10.48550/arXiv.1707.09425
2017-07-29
Abstract: With the wide application of machine learning algorithms to the real world, class imbalance and concept drift have become crucial learning issues. Class imbalance happens when the data categories are not equally represented, i.e., at least one category is a minority compared to other categories. It can cause learning bias towards the majority class and poor generalization. Concept drift is a change in the underlying distribution of the problem, and is a significant issue especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Class imbalance and concept drift can significantly hinder predictive performance, and the problem becomes particularly challenging when they occur simultaneously. This challenge arises from the fact that one problem can affect the treatment of the other. For example, drift detection algorithms based on the traditional classification error may be sensitive to the degree of imbalance and become less effective; and class imbalance techniques need to be adaptive to changing imbalance rates, otherwise the class receiving the preferential treatment may not be the correct minority class at the current moment. Therefore, the mutual effect of class imbalance and concept drift should be considered during algorithm design. The aim of this workshop is to bring together researchers from the areas of class imbalance learning and concept drift in order to encourage discussions and new collaborations on solving the combined issue of class imbalance and concept drift. It provides a forum for international researchers and practitioners to share and discuss their original work on addressing new challenges and research issues in class imbalance learning, concept drift, and the combined issues of class imbalance and concept drift. The proceedings include 8 papers on these topics.
Machine Learning
What problem does this paper attempt to address?
This paper attempts to solve the problem of how to design a robust linear classifier in the case of class imbalance. Specifically, the paper focuses on how to improve linear discriminant analysis (LDA) to enhance classification performance under heteroscedasticity, that is, when the covariance matrices of the different classes are not equal. Traditional LDA assumes that every class has the same covariance matrix. This assumption often fails in practical applications, especially in the presence of class imbalance, and when it fails classification performance declines.

### Main contributions of the paper

1. **Proposed a new optimization method**:
   - The paper proposed a method for designing a linear classifier based on the Bayesian optimal principle, which optimizes the classifier parameters by minimizing the misclassification probability.
   - The authors derived the first-order and second-order optimality conditions of this problem and proposed an efficient gradient-descent optimization algorithm based on these conditions.
2. **Dealing with the class imbalance problem**:
   - The paper paid special attention to the impact of class imbalance on classifier performance. Under class imbalance, traditional LDA and other methods may be biased towards the majority class, resulting in poor classification performance for the minority class.
   - The proposed algorithm still performs well under class imbalance and can improve recognition of the minority class while maintaining overall classification performance.
3. **Experimental verification**:
   - The paper verified the effectiveness of the proposed method through experiments on an artificial data set and five real-world data sets.
   - The experimental results show that, compared with existing heteroscedastic LDA methods, linear support vector machines (SVM), and other traditional methods, the proposed method performs very well in terms of classification accuracy and the area under the ROC curve (AUC).

### Formula summary

- **Misclassification probability**:
  \[ p_e = \pi_1 P(y < w_0 \mid C_1) + \pi_2 P(y \geq w_0 \mid C_2) \]
  where \(\pi_1\) and \(\pi_2\) are the prior probabilities of the two classes.
- **First-order optimality conditions**:
  \[ \frac{\partial p_e}{\partial w} = 0, \quad \frac{\partial p_e}{\partial w_0} = 0 \]
- **Update of weight vector and threshold**:
  \[ w_{i+1} = w_i - \alpha \frac{\partial p_e}{\partial w_i} \]
  \[ w_{0,i+1} = w_{0,i} - \alpha \frac{\partial p_e}{\partial w_{0,i}} \]
- **Calculation of optimal \(s\)**:
  \[ s^* = \frac{\sigma_1^* z_2^* - \sigma_2^* z_1^*}{\sigma_1^* z_2^* - \sigma_2^* z_1^*} \]
  where \(\sigma_1^* = w^{*T}\Sigma_1 w^*\), \(\sigma_2^* = w^{*T}\Sigma_2 w^*\), \(z_1^* = w_0^* - w^{*T}\bar{x}_1^*/\sigma_1^*\), \(z_2^* = w_0^* - w^{*T}\bar{x}_2^*/\sigma_2^*\).

### Conclusion

This paper addresses the design of a robust linear classifier under class imbalance and heteroscedasticity by proposing a new optimization method. The experimental results show that the proposed method performs very well on multiple data sets. In particular, when dealing with class imbalance, it can significantly improve the classification performance of the minority class.
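
The misclassification probability and the gradient-descent updates from the formula summary can be illustrated with a small numerical sketch. It is not the authors' implementation and rests on several assumptions made here for illustration: the class-conditional densities are Gaussian; the projection \(y = w^T x\) is standardized with the standard deviation \(\sqrt{w^T \Sigma w}\); and the gradients are approximated by central finite differences rather than by the analytical expressions the paper derives. The priors, means, covariances, step size, and starting point are hypothetical toy values.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def project(w, mean, cov):
    """Mean and standard deviation of y = w^T x for x ~ N(mean, cov)."""
    n = len(w)
    m = sum(w[i] * mean[i] for i in range(n))
    var = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    return m, math.sqrt(var)

def misclassification_probability(w, w0, classes):
    """p_e = pi_1 P(y < w0 | C_1) + pi_2 P(y >= w0 | C_2), Gaussian assumption."""
    (pi1, mu1, cov1), (pi2, mu2, cov2) = classes
    m1, s1 = project(w, mu1, cov1)
    m2, s2 = project(w, mu2, cov2)
    miss_c1 = normal_cdf((w0 - m1) / s1)        # C_1 sample falls below threshold
    miss_c2 = 1.0 - normal_cdf((w0 - m2) / s2)  # C_2 sample falls at or above it
    return pi1 * miss_c1 + pi2 * miss_c2

def gradient_descent(w, w0, classes, alpha=0.5, steps=200, eps=1e-6):
    """Fixed-step gradient descent on p_e; gradients by central differences
    (an assumption of this sketch, standing in for analytical gradients)."""
    params = list(w) + [w0]
    for _ in range(steps):
        grad = []
        for k in range(len(params)):
            up, down = params[:], params[:]
            up[k] += eps
            down[k] -= eps
            diff = (misclassification_probability(up[:-1], up[-1], classes)
                    - misclassification_probability(down[:-1], down[-1], classes))
            grad.append(diff / (2.0 * eps))
        # w_{i+1} = w_i - alpha * dp_e/dw, and likewise for the threshold w_0
        params = [p - alpha * g for p, g in zip(params, grad)]
    return params[:-1], params[-1]

# Hypothetical toy problem: (prior, mean, covariance) per class.
classes = [
    (0.2, [3.0, 0.0], [[0.25, 0.0], [0.0, 0.25]]),  # C_1: minority, tight
    (0.8, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),    # C_2: majority, wider
]
w_init, w0_init = [1.0, 0.0], 1.5  # threshold at the midpoint of the means
p_init = misclassification_probability(w_init, w0_init, classes)
w_opt, w0_opt = gradient_descent(w_init, w0_init, classes)
p_opt = misclassification_probability(w_opt, w0_opt, classes)
print(f"p_e before: {p_init:.4f}, after: {p_opt:.4f}")
```

Because the two classes have different covariances, the midpoint threshold is not the best choice in this toy problem, and the descent moves the threshold away from it. Note that \(p_e\) is unchanged when \(w\) and \(w_0\) are scaled by the same positive factor, so only the ratio between them matters.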