PCA for predicting quaternary structure of protein

Tong Wang, Hongbin Shen, Lixiu Yao, Jie Yang, Kuochen Chou
DOI: https://doi.org/10.1007/s11460-008-0084-5
2008-01-01
Frontiers of Electrical and Electronic Engineering in China
Abstract: The number and arrangement of subunits that form a protein are referred to as its quaternary structure. Knowing the quaternary structure of an uncharacterized protein provides clues to its biological function and to how it interacts with other molecules in a biological system. With the explosion of protein sequences generated in the Post-Genomic Age, it is vital to develop an automated method for predicting quaternary structure. To explore this problem, we adopted an approach in which each protein sample is represented by the pseudo position-specific score matrix (Pse-PSSM) descriptor proposed by Chou and Shen. The Pse-PSSM descriptor is advantageous in that it combines evolutionary information with sequence-correlated information. However, incorporating all these effects into one descriptor may cause a ‘high dimension disaster’. To overcome this problem, Chou and Shen adopted a fusion approach. Here a completely different approach is introduced: principal component analysis (PCA), a linear dimensionality reduction algorithm, is used to extract key features from the high-dimensional Pse-PSSM space. The resulting dimension-reduced descriptor vector is a compact representation of the original high-dimensional vector. The jack-knife test results indicate that the dimensionality reduction approach is efficient in coping with complicated problems in biological systems, such as predicting the quaternary structure of proteins.
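The dimensionality reduction step described in the abstract can be illustrated with a minimal PCA sketch. This is not the authors' implementation: the function name `pca_reduce`, the matrix sizes, and the number of retained components are hypothetical, and random numbers stand in for real Pse-PSSM descriptor vectors.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top n_components principal axes.

    X is an (n_samples, n_features) matrix; each row is one descriptor vector.
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # center every feature at zero
    # Thin SVD of the centered data: the rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    Z = Xc @ components.T  # dimension-reduced descriptors
    # Fraction of the total variance kept by the retained axes.
    explained = float((S[:n_components] ** 2).sum() / (S ** 2).sum())
    return Z, components, mean, explained


# Placeholder data (assumption): 50 samples with 200 features each.
# These are random values, not real Pse-PSSM descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))

Z, components, mean, explained = pca_reduce(X, n_components=10)
print(Z.shape, explained)
```

The reduced matrix `Z` would then replace the original descriptors as input to whatever classifier is used; the abstract does not name one, so none is shown here.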