An Improved Method for Combination Feature Selection in Web Click-Through Data Mining

Hongwei Zhao, Yongfeng Huang
DOI: https://doi.org/10.1109/isise.2012.92
2012-01-01
Abstract: An important way to analyze web click-through data is to build a two-class linear classifier and select the key subset of user features that mainly determines the hit result. In many circumstances, however, the fitting accuracy is poor because the model considers only the original features. Combination features, which are products of the original features, are often added to the classifier model to improve accuracy. These combination features cause a serious problem: they dramatically increase the number of features, a phenomenon called "feature dimension explosion". Traditional algorithms can hardly cope with this because they require all features as input at the start of processing. The Grafting method offers an incremental solution, adding only one feature at a time. However, Grafting is very inefficient when the feature space is huge and sparse. In this paper, we propose an adaptive Grafting algorithm and a PV filter method to solve the feature dimension explosion problem. Our algorithm significantly improves computational efficiency by reducing the number of model optimization steps, and it shrinks the feature space by applying a very simple filter strategy so that the algorithm works effectively. Our experiments on real data show that combination features can be easily generated and selected with the adaptive Grafting algorithm and PV filter method, which significantly raises the fitting accuracy of the model.
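To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It shows (a) how pairwise product features grow quadratically with the number of original features, and (b) one step of the standard Grafting selection rule, which picks the candidate feature whose logistic-loss gradient at zero weight is largest and adds it only if that gradient exceeds the L1 penalty. The function names, the dictionary layout of `candidates`, and the penalty `lam` are assumptions introduced for illustration. The adaptive variant and the PV filter are not shown, because the abstract does not specify them.

```python
from itertools import combinations
from math import comb, exp


def pairwise_products(row):
    """Combination features for one sample: x_i * x_j for every pair i < j."""
    return [row[i] * row[j] for i, j in combinations(range(len(row)), 2)]


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def grafting_candidate(X_active, w, candidates, y, lam):
    """One standard Grafting step (hypothetical helper, not the paper's adaptive variant).

    X_active   : list of rows holding the features already in the model
    w          : weights of the current model, aligned with X_active columns
    candidates : dict mapping a feature name to its column of values
    y          : labels in {0, 1}
    lam        : L1 penalty; a feature is worth adding only if |gradient| > lam
    Returns the name of the best candidate, or None if none passes the test.
    """
    n = len(y)
    # Residual p - y of the current logistic model for every sample.
    residuals = [
        sigmoid(sum(wk * xk for wk, xk in zip(w, row))) - yi
        for row, yi in zip(X_active, y)
    ]
    best, best_grad = None, lam
    for name, col in candidates.items():
        # Gradient of the mean log-loss w.r.t. this feature's weight at weight 0.
        grad = abs(sum(r * c for r, c in zip(residuals, col)) / n)
        if grad > best_grad:
            best, best_grad = name, grad
    return best


# Dimension explosion: 1,000 original features already give 499,500 pairs.
n_pairs = comb(1000, 2)

# Toy data: an empty model, two candidate features, four samples.
y = [1, 1, 0, 0]
empty_rows = [[], [], [], []]
cands = {"a": [1, 1, 0, 0], "b": [1, 0, 1, 0]}
chosen = grafting_candidate(empty_rows, [], cands, y, lam=0.1)
```

In this toy run, feature `a` matches the labels exactly while `b` is uncorrelated with them, so only `a` has a gradient large enough to pass the penalty test. In a full Grafting loop the model would be re-optimized after each addition, which is the costly step the abstract says the adaptive algorithm reduces.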