Simplified amino acid alphabets based on deviation of conditional probability from random background.

Xin Liu, Di Liu, Ji Qi, Wei-Mou Zheng
DOI: https://doi.org/10.1103/PhysRevE.66.021906
IF: 2.707
2002-01-01
Physical Review E
Abstract: The primitive data for deducing the Miyazawa-Jernigan contact energy or the blocks substitution matrix (BLOSUM) consist of residue pair frequency counts. From these counts, each amino acid corresponds to a conditional probability distribution. Based on the deviation of this conditional probability from a random background, a scheme for reducing the amino acid alphabet is proposed. An evident discrepancy is observed between the reduced alphabets obtained from the raw Miyazawa-Jernigan residue pair counts and those obtained from the raw BLOSUM counts. Using the homologous sequence database SCOP40 as a test set, we detect homology with the resulting coarse-grained substitution matrices, and verify that the reduced alphabets preserve well the information contained in the original 20-letter alphabet.
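The reduction idea in the abstract (pair counts, then per-letter conditional distributions, then deviation from background, then merged letter groups, then a coarse-grained substitution matrix) can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state. The background p(b) is taken as the marginal letter frequency. The deviation is the difference vector P(b|a) - p(b). Groups are merged greedily by Euclidean distance between the deviation vectors of their pooled count rows. Coarse-grained scores use a BLOSUM-style log2 odds ratio. The names `TOY_COUNTS`, `background`, `group_deviation`, `reduce_alphabet` and `coarse_matrix` are hypothetical. The counts are invented toy data, not values from the Miyazawa-Jernigan or BLOSUM tables, and the paper's actual procedure may differ.

```python
import math
from itertools import combinations

# Hypothetical symmetric pair counts for a toy 4-letter alphabet
# (illustrative only; not taken from Miyazawa-Jernigan or BLOSUM data).
TOY_COUNTS = {
    "I": {"I": 40, "L": 30, "D": 5, "E": 5},
    "L": {"I": 30, "L": 45, "D": 5, "E": 6},
    "D": {"I": 5, "L": 5, "D": 35, "E": 25},
    "E": {"I": 5, "L": 6, "D": 25, "E": 40},
}


def background(counts, letters):
    """Assumed random background: marginal frequency p(b) over all pairs."""
    total = sum(counts[a][b] for a in letters for b in letters)
    return {b: sum(counts[a][b] for a in letters) / total for b in letters}


def group_deviation(group, counts, letters, p):
    """Vector P(b|G) - p(b): conditional distribution of the pooled rows of
    group G minus the background (an assumed deviation measure)."""
    row = [sum(counts[a][b] for a in group) for b in letters]
    n = sum(row)
    return [r / n - p[b] for r, b in zip(row, letters)]


def reduce_alphabet(counts, letters, k):
    """Greedily merge the two groups whose deviation vectors are closest
    (Euclidean distance, an assumption) until k groups remain."""
    p = background(counts, letters)
    groups = [(a,) for a in letters]
    while len(groups) > k:
        devs = [group_deviation(g, counts, letters, p) for g in groups]
        i, j = min(combinations(range(len(groups)), 2),
                   key=lambda ij: math.dist(devs[ij[0]], devs[ij[1]]))
        merged = groups[i] + groups[j]
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)]
        groups.append(merged)
    return sorted(tuple(sorted(g)) for g in groups)


def coarse_matrix(counts, letters, groups):
    """BLOSUM-style log-odds score log2(q(G,H) / (p(G) p(H))) between groups.
    Assumes every pooled pair count is nonzero."""
    total = sum(counts[a][b] for a in letters for b in letters)
    p = background(counts, letters)
    scores = {}
    for g in groups:
        for h in groups:
            q = sum(counts[a][b] for a in g for b in h) / total
            expected = sum(p[a] for a in g) * sum(p[b] for b in h)
            scores[(g, h)] = math.log2(q / expected)
    return scores


if __name__ == "__main__":
    groups = reduce_alphabet(TOY_COUNTS, "ILDE", 2)
    print(groups)  # → [('D', 'E'), ('I', 'L')]
    for pair, s in coarse_matrix(TOY_COUNTS, "ILDE", groups).items():
        print(pair, round(s, 3))
```

Pooling count rows, rather than averaging the letters' deviation vectors, keeps each merged group's conditional distribution a proper probability distribution, so the same deviation measure applies at every merge step. Published BLOSUM matrices additionally scale scores to fractional-bit units (half-bits for BLOSUM62) and round them to integers; the sketch keeps raw log2 values.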