Abstract: Small proteins, encoded by small open reading frames, are only beginning to emerge with current advances in omics technology and bioinformatics. There is increasing evidence that small proteins play roles in diverse critical biological functions, such as adjusting cellular metabolism, regulating the activity of other proteins, controlling the cell cycle, and affecting disease physiology. In prokaryotes such as bacteria, the sequence space and functional groups of small proteins remain largely unexplored. Most bacterial species in a natural community cannot be easily isolated or cultured, so their peptides are better characterized through metagenomics. Bacterial peptides identified from metagenomic samples not only enrich the pool of known small proteins but also reveal community-specific microbial ecology from a small protein perspective. In this study, metaBP (Bacterial Peptides for metagenomic sample) has been developed as a comprehensive toolkit to explore the small protein universe in metagenomic samples. It takes raw sequencing reads as input, performs protein-level meta-assembly, and computes bacterial peptide homolog groups with sample-specific mutations. metaBP also integrates general protein annotation tools, as well as our small protein-specific machine learning module, metaBP-ML, to construct a full landscape of bacterial peptides. metaBP-ML shows advantages in discovering the functions of bacterial peptides in a microbial community and increases annotation yield by up to fivefold. The novelty of the metaBP toolkit lies in adopting protein-level assembly to discover small proteins, integrating a protein-clustering tool in the new and flexible RBiotools environment, and presenting, for the first time, a small protein landscape through metaBP-ML.
Taken together, metaBP and metaBP-ML profile functional bacterial peptides, along with their potentially diverse mutations, from metagenomic samples in order to depict the unique small protein landscape of a microbial community.

Profiling a Community-Specific Function Landscape for Bacterial Peptides Through Protein-Level Meta-Assembly and Machine Learning
