A Block-Based Support Vector Machine Approach to the Protein Homology Prediction Task in KDD Cup 2004

Yan Fu, Ruixiang Sun, Qiang Yang, Simin He, Chunli Wang, Haipeng Wang, Shiguang Shan, Junfa Liu, Wen Gao
DOI: https://doi.org/10.1145/1046456.1046475
2004-01-01
Abstract: This paper describes our solution for the protein homology prediction task in the KDD Cup 2004 competition. The task is modeled as a supervised learning problem with multiple performance metrics. Several key characteristics make the problem both novel and challenging, including the concept of data blocks and the presence of large-scale, imbalanced training data. These characteristics make a naive application of traditional classification algorithms infeasible. Our approach focuses on making full use of the abundant information within the blocks and on developing a new technique for reducing and balancing the training data, so that the support vector machine becomes applicable to this kind of large-scale, imbalanced learning task.