ABSTRACT Fungal secondary metabolites (SMs) contribute to the diversity of fungal ecological communities, niches, and lifestyles. Many fungal SMs have one or more medically and industrially important activities (e.g., antifungal, antibacterial, and antitumor). The genes necessary for fungal SM biosynthesis are typically located adjacent to one another in the genome, in groups known as biosynthetic gene clusters (BGCs). However, whether fungal SM bioactivity can be predicted from specific attributes of genes in BGCs remains an open question. We adapted to fungal BGC data machine learning models that had predicted SM bioactivity from bacterial BGC data with accuracies as high as 80%. We trained our models to predict the antibacterial, antifungal, and cytotoxic/antitumor bioactivity of fungal SMs on two data sets: (i) fungal BGCs only (314 BGCs) and (ii) fungal BGCs (314 BGCs) combined with bacterial BGCs (1,003 BGCs). We found that models trained on fungal BGCs had balanced accuracies between 51% and 68%, whereas models trained on bacterial and fungal BGCs had balanced accuracies between 56% and 68%. The low prediction accuracy for fungal SM bioactivities likely stems from the small size of the data set; this lack of data, coupled with our finding that including bacterial BGC data in the training data did not substantially change accuracies, currently limits the application of machine learning approaches to fungal SM studies. With >15,000 characterized fungal SMs, millions of putative BGCs in fungal genomes, and increased demand for novel drugs, efforts that systematically link fungal SM bioactivity to BGCs are urgently needed.

IMPORTANCE Fungi are key sources of natural products and iconic drugs, including penicillin and statins. DNA sequencing has revealed that there are likely millions of biosynthetic pathways in fungal genomes, but the chemical structures and bioactivities of >99% of natural products produced by these pathways remain unknown. We used artificial intelligence to predict the bioactivities of diverse fungal biosynthetic pathways. We found that the accuracies of our predictions were generally low, between 51% and 68%, likely because the natural products and bioactivities of only very few fungal pathways are known. With >15,000 characterized fungal natural products, millions of putative biosynthetic pathways present in fungal genomes, and increased demand for novel drugs, our study suggests that there is an urgent need for efforts that systematically identify fungal biosynthetic pathways, their natural products, and their bioactivities.
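
The balanced accuracies reported above are easier to interpret with a small worked example. The sketch below assumes the standard binary definition of balanced accuracy (the mean of sensitivity and specificity); the abstract does not state how the metric was computed, and the function names and toy labels are hypothetical illustrations, not data from the study. Under this definition, a classifier that ignores its input scores 50%, which is why values between 51% and 68% indicate limited predictive power.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Labels are assumed to be 1 (bioactive) or 0 (not bioactive).
    This is the standard textbook definition, used here as an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the active class
    specificity = tn / (tn + fp)  # recall on the inactive class
    return (sensitivity + specificity) / 2


# Hypothetical toy data: 2 active and 8 inactive clusters.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A trivial model that labels every cluster as inactive.
y_always_inactive = [0] * len(y_true)

plain = accuracy(y_true, y_always_inactive)
balanced = balanced_accuracy(y_true, y_always_inactive)
print(f"accuracy={plain:.2f}, balanced accuracy={balanced:.2f}")
```

On this imbalanced toy set the trivial model reaches a plain accuracy of 0.80 but a balanced accuracy of only 0.50, which illustrates why balanced accuracy is the more informative metric when active and inactive classes differ in size.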

A deep learning genome-mining strategy for biosynthetic gene cluster prediction

Deep-BGCpred: A unified deep learning genome-mining framework for biosynthetic gene cluster prediction

Predicting biological activity from biosynthetic gene clusters using neural networks

CSEL-BGC: A Bioinformatics Framework Integrating Machine Learning for Defining the Biosynthetic Evolutionary Landscape of Uncharacterized Antibacterial Natural Products

A Natural Product Chemist's Guide to Unlocking Silent Biosynthetic Gene Clusters

Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters

Deep learning-driven prediction of drug mechanism of action from large-scale chemical-genetic interaction profiles

Bioprospecting Through Cloning of Whole Natural Product Biosynthetic Gene Clusters

Navigating and expanding the roadmap of natural product genome mining tools

Supporting supervised learning in fungal Biosynthetic Gene Cluster discovery: new benchmark datasets

Genome mining of biosynthetic and chemotherapeutic gene clusters in Streptomyces bacteria

Discovery of microbial natural products by activation of silent biosynthetic gene clusters

Targeting Bacterial Genomes for Natural Product Discovery

Predicting fungal secondary metabolite activity from biosynthetic gene cluster data using machine learning

Prediction of gene cluster function based on transcriptional regulatory networks uncovers a novel locus required for desferrioxamine B biosynthesis

Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters

Expanding the genome information on for biosynthetic gene cluster discovery

Genome Mining Reveals Novel Biosynthetic Gene Clusters in Entomopathogenic Bacteria

An atlas of bacterial secondary metabolite biosynthesis gene clusters

HiFiBGC: an ensemble approach for improved biosynthetic gene cluster detection in PacBio HiFi-read metagenomes