Impact Analysis of Classification Performance for Cross-Validation of Imbalance Splitting Data

Zhao Cunxiu, Wang Ruibo, Li Jihong
2013-01-01
Abstract: Cross-validation is widely used to estimate the generalization error of a model. In particular, 2-fold cross-validation is widely used to compare classification models. This paper applies 2-fold cross-validation to a logistic regression model whose features (independent variables) take the values 0 or 1, and studies how the model's estimated performance behaves. The results show that the deviation of the 2-fold cross-validation estimates of precision, recall, F value, and accuracy is smallest when the class distributions of the two folds are the same or similar, and that the deviation grows as the difference between the class distributions of the two folds increases. The estimate of the model's performance degrades significantly when the class distributions of the two folds diverge. Therefore, when splitting data for cross-validation, the class distribution of each fold should be kept as consistent as possible with that of the full sample.
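The recommendation above, keeping each fold's class distribution consistent with the full sample, corresponds to what is usually called a stratified split. The following is a minimal sketch of a stratified 2-fold split using only the Python standard library. It is not the authors' procedure: the function names `stratified_two_fold` and `positive_rate` and the toy label vector (20 positives, 80 negatives) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_two_fold(labels, seed=0):
    """Split sample indices into two folds with near-identical class proportions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    fold_a, fold_b = [], []
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        half = len(idxs) // 2
        # Each class is halved separately, so both folds keep the class ratio.
        # When a class has an odd count, the extra sample goes to fold_b.
        fold_a.extend(idxs[:half])
        fold_b.extend(idxs[half:])
    return sorted(fold_a), sorted(fold_b)


def positive_rate(labels, fold):
    """Fraction of samples in the fold whose label is 1."""
    return sum(labels[i] for i in fold) / len(fold)


# Toy imbalanced data set (assumed for illustration): 20% positives.
labels = [1] * 20 + [0] * 80
fold_a, fold_b = stratified_two_fold(labels)
rate_a = positive_rate(labels, fold_a)
rate_b = positive_rate(labels, fold_b)
print(rate_a, rate_b)
```

Because each class is shuffled and halved on its own, both folds keep the 20% positive rate of the full sample. A plain random split of the 100 indices gives no such guarantee, which is the situation the abstract associates with larger deviation in the performance estimates.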