Size Dependent Complexity of Sequences in Protein Families

J. Li, J. Wang, W. Wang
DOI: https://doi.org/10.1140/epjb/e2005-00333-x
2005-01-01
The European Physical Journal B
Abstract: The size-dependent complexity of protein sequences in various families in the FSSP database is characterized by sequence entropy, sequence similarity and sequence identity. As the average length L_f of the sequences in a family increases, the sequence entropy tends to increase while the sequence similarity and sequence identity tend to decrease. Once L_f exceeds 250, all three quantities saturate. This saturation of complexity is attributed to the saturation of the probability P_g of global (long-range) interactions in protein structures when L_f > 250. It is also found that the alphabet size of residue types describing the sequence diversity depends on L_f and saturates at 12.
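As a rough illustration of two of the measures named in the abstract, the sketch below computes a mean per-column Shannon entropy and a mean pairwise identity for a small set of aligned sequences. The abstract does not give the paper's exact definitions, so the formulas here are common textbook choices assumed for illustration. The function names and the toy sequences are hypothetical, and the input is assumed to be gap-free sequences of equal length.

```python
import math
from collections import Counter
from itertools import combinations


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_sequence_entropy(aligned):
    """Average the column entropies over all positions of the alignment.

    Assumption: every sequence in `aligned` has the same length and no gaps.
    """
    entropies = [column_entropy(col) for col in zip(*aligned)]
    return sum(entropies) / len(entropies)


def pairwise_identity(a, b):
    """Fraction of positions at which two aligned sequences agree."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def mean_identity(aligned):
    """Average pairwise identity over all distinct sequence pairs."""
    pairs = list(combinations(aligned, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Toy family of three aligned sequences (hypothetical data, not from FSSP).
family = ["ACDEF", "ACDKF", "GCDEF"]
entropy = mean_sequence_entropy(family)
identity = mean_identity(family)
```

Under these assumed definitions, a more diverse family yields a higher mean entropy and a lower mean identity, which is the direction of the trend the abstract reports as L_f grows.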