Abstract: Bacterial and fungal secondary metabolism is a rich source of novel bioactive compounds with potential pharmaceutical applications as antibiotics, anti-tumor drugs or cholesterol-lowering drugs. To find new drug candidates, microbiologists are increasingly relying on sequencing genomes of a wide variety of microbes. However, rapidly and reliably pinpointing all the potential gene clusters for secondary metabolites in dozens of newly sequenced genomes has been extremely challenging, due to their biochemical heterogeneity, the presence of unknown enzymes and the dispersed nature of the necessary specialized bioinformatics tools and resources. Here, we present antiSMASH (antibiotics & Secondary Metabolite Analysis Shell), the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes (polyketides, non-ribosomal peptides, terpenes, aminoglycosides, aminocoumarins, indolocarbazoles, lantibiotics, bacteriocins, nucleosides, beta-lactams, butyrolactones, siderophores, melanins and others). It aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view. antiSMASH is available at http://antismash.secondarymetabolites.org.

What problem does this paper attempt to address?

This paper addresses the problem of quickly and accurately identifying, annotating, and analyzing secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. It faces three specific challenges:

1. **Chemical Diversity**: Secondary metabolites are chemically very diverse and span multiple compound classes, such as polyketides, non-ribosomal peptides, and terpenes.
2. **Existence of Unknown Enzymes**: Many gene clusters contain enzymes of unknown function, which makes identification and annotation harder.
3. **Tool Dispersity**: Existing bioinformatics tools and resources are scattered, and no single platform handles multiple types of secondary metabolite gene clusters.

To solve these problems, the authors developed antiSMASH (antibiotics & Secondary Metabolite Analysis Shell), a pipeline that covers the whole range of known secondary metabolite compound classes. Beyond identifying gene clusters, it compares the identified regions with related gene clusters in a database and integrates or cross-links all previously available secondary-metabolite-specific gene analysis methods in one interactive view. In this way, antiSMASH aims to accelerate the discovery of new drug candidates, especially antibiotics, anticancer drugs, and cholesterol-lowering drugs, and so to promote the medical application of microbial secondary metabolites.

### Formula Presentation

The paper does not involve complex mathematical formulas, but it gives simple expressions for some of its calculations. For example, when ClusterBlast compares gene clusters, the similarity score \( S \) is computed as:

\[ S = h + H + s + S + B \]

Note that \( S \) appears on both sides: on the left it denotes the total score, and on the right it denotes the term defined below. The terms are:

- \( h \) is the number of genes in the query cluster with significant matches.
- \( H \) is the number of core genes in the query cluster with significant matches.
- \( s \) is the number of gene pairs with conserved collinearity (synteny).
- \( S \) (right-hand side) is the number of gene pairs with conserved collinearity that involve core genes.
- \( B \) is the core gene bonus: 3 points are added if at least one core gene has a match in the target cluster.

This score measures the similarity between two gene clusters and helps researchers quickly evaluate the functional relationships between different gene clusters.
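The score described above can be illustrated with a minimal Python sketch. This is not antiSMASH's own code: the names `ClusterHitCounts`, `similarity_score`, and `rank_hits`, and the example counts, are hypothetical and chosen only for illustration. The sketch assumes the four counts have already been obtained from a prior comparison step, and that the bonus is 0 when no core gene has a match.

```python
from dataclasses import dataclass

# Bonus added when at least one core gene has a match (value taken from the text).
CORE_GENE_BONUS = 3


@dataclass(frozen=True)
class ClusterHitCounts:
    """Counts for one query/target cluster comparison (hypothetical container)."""
    genes_with_hits: int       # h: query-cluster genes with significant matches
    core_genes_with_hits: int  # H: core query genes with significant matches
    syntenic_pairs: int        # s: gene pairs with conserved collinearity
    core_syntenic_pairs: int   # S (right-hand side): such pairs involving core genes


def similarity_score(counts: ClusterHitCounts) -> int:
    """Sum the four counts and add the core gene bonus when H > 0."""
    bonus = CORE_GENE_BONUS if counts.core_genes_with_hits > 0 else 0
    return (
        counts.genes_with_hits
        + counts.core_genes_with_hits
        + counts.syntenic_pairs
        + counts.core_syntenic_pairs
        + bonus
    )


def rank_hits(hits: dict[str, ClusterHitCounts]) -> list[tuple[str, int]]:
    """Return (cluster name, score) pairs sorted from highest to lowest score."""
    scored = [(name, similarity_score(counts)) for name, counts in hits.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up counts for two target clusters.
example_hits = {
    "cluster_B": ClusterHitCounts(6, 0, 4, 0),  # 6 + 0 + 4 + 0 + 0 = 10
    "cluster_A": ClusterHitCounts(5, 2, 3, 1),  # 5 + 2 + 3 + 1 + 3 = 14
}
ranking = rank_hits(example_hits)
```

In this made-up example, `cluster_A` outranks `cluster_B` even though fewer of its genes have matches, because matches to core genes count twice (in \( h \) and \( H \)) and also trigger the bonus.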

antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences

antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters

antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline

Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters

antiSMASH 4.0—improvements in chemistry prediction and gene cluster boundary identification

antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation

The antiSMASH database version 3: increased taxonomic coverage and new query features for modular enzymes

Sequence-based classification of type II polyketide synthase biosynthetic gene clusters for antiSMASH

SYNTERUPTOR: mining genomic islands for non-classical specialised metabolite gene clusters

gutSMASH predicts specialized primary metabolic pathways from the human gut microbiota

Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences

Exploring Newer Biosynthetic Gene Clusters in Marine Microbial Prospecting

Genome Features and AntiSMASH Analysis of an Endophytic Strain Fusarium sp. R1

Computational Methods for Identification of Novel Secondary Metabolite Biosynthetic Pathways by Genome Analysis

A Scalable Platform to Discover Antimicrobials of Ribosomal Origin

Mini review: Genome mining approaches for the identification of secondary metabolite biosynthetic gene clusters in Streptomyces

Motif-independent de novo detection of secondary metabolite gene clusters—toward identification from filamentous fungi

Engineering fungal secondary metabolism: a roadmap to novel compounds

Synthetic Biology Tools for Novel Secondary Metabolite Discovery in Streptomyces

Analysis of the Genomic Sequences and Metabolites of Bacillus velezensis YA215