protr: R package for generating various numerical representation schemes of protein sequences

Nan Xiao, Qing-Song Xu
2015-01-01
Abstract: The protr package offers a unique and comprehensive toolkit for generating various numerical representation schemes of protein sequences. The descriptors it includes are extensively used in bioinformatics and chemogenomics research. The commonly used descriptors implemented in protr include amino acid composition, autocorrelation, CTD (composition, transition, and distribution), conjoint triad, quasi-sequence order, pseudo amino acid composition, and profile-based descriptors derived from the Position-Specific Scoring Matrix (PSSM). The descriptors for proteochemometric (PCM) modeling include scales-based descriptors derived by principal components analysis, factor analysis, multidimensional scaling, amino acid properties (AAindex), 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.), and BLOSUM/PAM matrix-derived descriptors. The protr package also provides parallelized similarity computation based on pairwise protein sequence alignment and on Gene Ontology (GO) semantic similarity measures. ProtrWeb, the web server built on protr, is located at: http://protr.org.
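To make the simplest of these descriptors concrete, here is a minimal Python sketch of amino acid composition and dipeptide composition. It is not protr's API (protr is an R package). It assumes the usual definitions: the composition of amino acid r is f(r) = N_r / N over the 20 standard amino acids, and the dipeptide composition is the fraction of each of the 400 ordered pairs among the N - 1 adjacent pairs of a length-N sequence. The function names and the alphabetical ordering of amino acids are illustrative choices, not taken from protr.

```python
from itertools import product

# The 20 standard amino acids in one-letter code; the ordering is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _validate(seq: str, min_len: int) -> str:
    """Uppercase seq and reject nonstandard residues or too-short input."""
    seq = seq.upper()
    if len(seq) < min_len or any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError(
            f"sequence must have at least {min_len} residues, "
            "all from the 20 standard amino acids"
        )
    return seq


def amino_acid_composition(seq: str) -> dict[str, float]:
    """20 descriptors: fraction of each amino acid, f(r) = N_r / N."""
    seq = _validate(seq, 1)
    n = len(seq)
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> dict[str, float]:
    """400 descriptors: fraction of each ordered pair among the N - 1 adjacent pairs."""
    seq = _validate(seq, 2)
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(seq) - 1):
        counts[seq[i : i + 2]] += 1
    total = len(seq) - 1
    return {pair: c / total for pair, c in counts.items()}


if __name__ == "__main__":
    aac = amino_acid_composition("MKTAYIAKQR")
    dpc = dipeptide_composition("MKTAYIAKQR")
    print(aac["A"], aac["K"], len(dpc))
```

For the 10-residue example, alanine and lysine each occur twice, so both get a composition of 0.2, and each of the nine distinct adjacent pairs (MK, KT, ..., QR) gets 1/9. Returning a fixed-key dictionary keeps the descriptor vector the same length for every sequence, which is what downstream models need.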