Solving the Small Sample Size Problem in Protein Subcellular Localization Prediction

Tong Wang, Xiaoxia Cao, Tian Xia, Zhizhen Yang
DOI: https://doi.org/10.1109/bmei.2012.6513152
2012-01-01
Abstract: In this paper, a new system is proposed to improve the performance of protein subcellular localization prediction. First, protein sequences are represented as vectors in a high-dimensional space using an effective sequence encoding scheme. However, this representation gives rise to the small sample size problem, in which the data dimension is much larger than the number of samples. To address this problem, a new dimension reduction algorithm is introduced. It extracts the essential features from the high-dimensional feature space and does not suffer from the small sample size problem. Finally, an efficient classifier is employed to predict the subcellular localization of proteins from the reduced features.
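The abstract does not name the encoding scheme, the dimension reduction algorithm, or the classifier, so the sketch below is not the authors' method. It only illustrates the small sample size problem itself and one standard, well-known workaround swapped in for illustration: PCA followed by linear discriminant analysis (LDA). With n samples in c classes, the within-class scatter matrix S_w has rank at most n - c, so when the dimension d exceeds n - c, S_w is singular and classical LDA (which needs S_w^{-1}) breaks down. Projecting onto n - c principal components first makes S_w invertible in the reduced space. The synthetic data, the 400-feature vector size, and the nearest-centroid classifier are all assumptions chosen for the example.

```python
import numpy as np


def within_class_scatter(X, y):
    """S_w = sum over classes c of sum_i (x_i - mu_c)(x_i - mu_c)^T."""
    S_w = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        Xc = X[y == c] - X[y == c].mean(axis=0)
        S_w += Xc.T @ Xc
    return S_w


def between_class_scatter(X, y):
    """S_b = sum over classes c of n_c (mu_c - mu)(mu_c - mu)^T."""
    mu = X.mean(axis=0)
    S_b = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        diff = (X[y == c].mean(axis=0) - mu)[:, None]
        S_b += np.sum(y == c) * (diff @ diff.T)
    return S_b


def fit_pca_lda(X, y):
    """PCA down to n - c dimensions (so S_w becomes invertible), then LDA."""
    n = X.shape[0]
    c = len(np.unique(y))
    mu = X.mean(axis=0)
    # Principal directions from the SVD of the centred data.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    P = Vt[: n - c].T                          # d x (n - c)
    Z = (X - mu) @ P
    S_w, S_b = within_class_scatter(Z, y), between_class_scatter(Z, y)
    # Solve S_b w = lambda S_w w by whitening with the Cholesky factor of S_w.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    _, evecs = np.linalg.eigh(L_inv @ S_b @ L_inv.T)   # eigenvalues ascending
    W = L_inv.T @ evecs[:, ::-1][:, : c - 1]   # top c - 1 discriminant directions
    return mu, P @ W                           # d x (c - 1) overall projection


def nearest_centroid(F_train, y_train, F_test):
    """Hypothetical stand-in classifier: assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([F_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(F_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


# Synthetic data (assumption): 30 proteins, 400 features per protein,
# 3 localization classes. A real sequence encoding would replace this.
rng = np.random.default_rng(0)
n_per_class, d, n_classes = 10, 400, 3
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.normal(size=(n_per_class * n_classes, d))
for c in range(n_classes):
    X[y == c, 10 * c: 10 * c + 10] += 5.0      # shift each class on its own features

# In the raw space S_w is 400 x 400 but has rank at most n - c = 27: singular.
rank_sw = np.linalg.matrix_rank(within_class_scatter(X, y))
print("rank of S_w in raw space:", rank_sw, "of", d)

mu, W = fit_pca_lda(X, y)
F = (X - mu) @ W                               # reduced features, 30 x 2
pred = nearest_centroid(F, y, F)
print("training accuracy:", np.mean(pred == y))
```

The Cholesky-based whitening is used instead of inverting S_w and calling a general eigensolver because it keeps the eigenproblem symmetric, so `np.linalg.eigh` returns real eigenvectors. Training accuracy is printed only as a sanity check; it says nothing about performance on unseen proteins.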
What problem does this paper attempt to address?