POSSUM: a Bioinformatics Toolkit for Generating Numerical Sequence Feature Descriptors Based on PSSM Profiles

Jiawei Wang, Bingjiao Yang, Jerico Revote, Andre Leier, Tatiana T. Marquez-Lago, Geoffrey Webb, Jiangning Song, Kuo-Chen Chou, Trevor Lithgow
DOI: https://doi.org/10.1093/bioinformatics/btx302
IF: 5.8
2017-01-01
Bioinformatics
Abstract: Evolutionary information in the form of a Position-Specific Scoring Matrix (PSSM) is a widely used and highly informative representation of protein sequences. Accordingly, PSSM-based feature descriptors have been successfully applied to improve the performance of various predictors of protein attributes. Even though a number of algorithms have been proposed in previous studies, there is currently no universal web server or toolkit available for generating this wide variety of descriptors. Here, we present POSSUM (Position-Specific Scoring matrix-based feature generator for machine learning), a versatile toolkit with an online web server that can generate 21 types of PSSM-based feature descriptors, thereby addressing a crucial need for bioinformaticians and computational biologists. We envisage that this comprehensive toolkit will be widely used as a powerful tool to facilitate feature extraction, selection, and benchmarking of machine learning-based models, thereby contributing to a more effective analysis and modeling pipeline for bioinformatics research.
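To make the idea of a PSSM-based feature descriptor concrete, the sketch below turns a variable-length PSSM (one row per residue, one column per amino acid type) into a fixed-length vector by averaging each column over all positions. This is only an illustration of the general idea, not POSSUM's own code; the function name `column_mean_descriptor` and the toy matrix are hypothetical, and the toy matrix uses 4 columns instead of the usual 20 to keep the example short.

```python
from typing import List


def column_mean_descriptor(pssm: List[List[float]]) -> List[float]:
    """Average every PSSM column over all sequence positions.

    The input has one row per residue, so its height varies with protein
    length; the output always has one value per column, which makes it
    usable as a fixed-length input for a machine learning model.
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one row")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all PSSM rows must have the same length")
    n_rows = len(pssm)
    return [sum(row[j] for row in pssm) / n_rows for j in range(n_cols)]


# Hypothetical toy matrix: 3 residues x 4 columns (a real PSSM has 20 columns).
toy_pssm = [
    [1.0, -2.0, 0.0, 3.0],
    [3.0, 0.0, -1.0, 1.0],
    [2.0, 2.0, 1.0, -1.0],
]

features = column_mean_descriptor(toy_pssm)
print(features)
```

Proteins of different lengths all map to vectors of the same size under this scheme, which is the property that lets such descriptors feed directly into standard classifiers.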