Highlights
• Identification of bioactive peptides is studied from an application-oriented standpoint
• Sequence encoding and binary classification techniques are systematically tested
• Individual functional classes require ad hoc encoding-classifier combinations
• Studied models, trained on relevant data, can be more accurate than available tools
• CICERON predicts the functions of peptides of interest across nine functional classes

Abstract
Bioactive peptides are short amino acid chains that possess biological activity and exert physiological effects relevant to human health. Despite their therapeutic value, their identification remains a major challenge, as it relies mainly on time-consuming in vitro tests. While bioinformatic tools for identifying bioactive peptides are available, they focus on specific functional classes and have not been systematically tested in realistic settings. To tackle this problem, we gathered bioactive peptide sequences and functions from a variety of databases to generate a unified collection of bioactive peptides from microbial fermentation. This collection was organized into nine functional classes, including both previously studied classes and unexplored ones such as immunomodulatory, opioid and cardiovascular peptides. After assessing their sequence properties, we tested four alternative encoding methods in combination with a multitude of machine learning algorithms, from basic classifiers such as logistic regression to advanced models such as BERT. Tests on a total of 171 models showed that, while some functions are intrinsically easier to detect, no single encoder-classifier combination worked well for all classes. For this reason, we combined the best individual model for each class into CICERON (Classification of bIoaCtive pEptides fRom micrObial fermeNtation), a tool for the functional classification of peptides.
State-of-the-art classifiers underperformed the models included in CICERON on our realistic benchmark dataset. Altogether, our work provides a tool for real-world peptide classification and can serve as a benchmark for future model development.
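The per-class selection strategy described above (testing several encoder-classifier pairs for each functional class, then combining the best pair per class into one tool) can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not CICERON's implementation: the two encoders (amino acid composition and a hypothetical three-feature physicochemical vector), the two simple classifiers (nearest centroid and 1-nearest-neighbour), and the use of the Matthews correlation coefficient (MCC) on a validation split are stand-ins chosen for brevity. The study itself used four encodings and a much broader range of algorithms, up to BERT.

```python
# Hypothetical sketch of per-class encoder/classifier selection; not CICERON's actual code.
from collections import Counter
from math import sqrt
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Predictor = Callable[[Vector], bool]
Split = List[Tuple[str, bool]]  # (peptide sequence, belongs to the class?)

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq: str) -> Vector:
    """Amino acid composition: fraction of each of the 20 standard residues."""
    counts = Counter(seq.upper())
    n = max(len(seq), 1)
    return [counts[a] / n for a in AMINO_ACIDS]


def physchem(seq: str) -> Vector:
    """Hypothetical toy descriptors: scaled length, hydrophobic fraction, crude net charge."""
    s = seq.upper()
    n = max(len(s), 1)
    hydrophobic = sum(s.count(a) for a in "AILMFVWY") / n
    charge = (s.count("K") + s.count("R") - s.count("D") - s.count("E")) / n
    return [len(s) / 50.0, hydrophobic, charge]


def _dist(a: Vector, b: Vector) -> float:
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(X: List[Vector], y: List[bool]) -> Predictor:
    """Predict membership if a vector is closer to the positive than to the negative centroid."""
    def centroid(rows: List[Vector]) -> Vector:
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = centroid([x for x, t in zip(X, y) if t])
    neg = centroid([x for x, t in zip(X, y) if not t])
    return lambda v: _dist(v, pos) < _dist(v, neg)


def one_nn(X: List[Vector], y: List[bool]) -> Predictor:
    """1-nearest-neighbour: copy the label of the closest training vector."""
    return lambda v: y[min(range(len(X)), key=lambda i: _dist(v, X[i]))]


def mcc(truth: List[bool], pred: List[bool]) -> float:
    """Matthews correlation coefficient for binary labels (0.0 when undefined)."""
    tp = sum(t and p for t, p in zip(truth, pred))
    tn = sum(not t and not p for t, p in zip(truth, pred))
    fp = sum(not t and p for t, p in zip(truth, pred))
    fn = sum(t and not p for t, p in zip(truth, pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


ENCODERS: Dict[str, Callable[[str], Vector]] = {"aac": aac, "physchem": physchem}
CLASSIFIERS: Dict[str, Callable[[List[Vector], List[bool]], Predictor]] = {
    "centroid": nearest_centroid, "1nn": one_nn}


def select_best(train: Split, val: Split) -> dict:
    """Try every encoder-classifier pair for one class; keep the highest validation MCC."""
    truth = [t for _, t in val]
    best = None
    for enc_name, enc in ENCODERS.items():
        X = [enc(s) for s, _ in train]
        y = [t for _, t in train]
        for clf_name, fit in CLASSIFIERS.items():
            model = fit(X, y)
            score = mcc(truth, [model(enc(s)) for s, _ in val])
            if best is None or score > best["mcc"]:
                # Default arguments bind the current encoder/model (avoids late binding).
                best = {"encoder": enc_name, "classifier": clf_name, "mcc": score,
                        "predict": lambda seq, e=enc, m=model: m(e(seq))}
    return best


def build_tool(data: Dict[str, Tuple[Split, Split]]):
    """Combine the best per-class models into one multi-label peptide classifier."""
    chosen = {cls: select_best(tr, va) for cls, (tr, va) in data.items()}

    def predict(seq: str) -> List[str]:
        return [cls for cls, b in chosen.items() if b["predict"](seq)]
    return chosen, predict
```

Given training and validation splits for each class, `build_tool` returns the pair chosen for every class together with a `predict` function that lists all classes whose model accepts a sequence, so one peptide can receive several functions. Because each pair is picked on the validation split, its validation MCC is optimistic; an unbiased comparison with other tools would need a separate held-out test set.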
