antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences

Marnix H. Medema, Kai Blin, Peter Cimermancic, Victor de Jager, Piotr Zakrzewski, Michael A. Fischbach, Tilmann Weber, Eriko Takano, Rainer Breitling
DOI: https://doi.org/10.1093/nar/gkr466
IF: 14.9
2011-06-14
Nucleic Acids Research
Abstract: Bacterial and fungal secondary metabolism is a rich source of novel bioactive compounds with potential pharmaceutical applications as antibiotics, anti-tumor drugs or cholesterol-lowering drugs. To find new drug candidates, microbiologists are increasingly relying on sequencing genomes of a wide variety of microbes. However, rapidly and reliably pinpointing all the potential gene clusters for secondary metabolites in dozens of newly sequenced genomes has been extremely challenging, due to their biochemical heterogeneity, the presence of unknown enzymes and the dispersed nature of the necessary specialized bioinformatics tools and resources. Here, we present antiSMASH (antibiotics & Secondary Metabolite Analysis Shell), the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes (polyketides, non-ribosomal peptides, terpenes, aminoglycosides, aminocoumarins, indolocarbazoles, lantibiotics, bacteriocins, nucleosides, beta-lactams, butyrolactones, siderophores, melanins and others). It aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view. antiSMASH is available at http://antismash.secondarymetabolites.org.
biochemistry & molecular biology
What problem does this paper attempt to address?
The paper addresses the problem of quickly and reliably identifying, annotating and analyzing secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. It faces three main challenges:

1. **Chemical diversity**: Secondary metabolites are chemically highly diverse and span many compound classes, such as polyketides, non-ribosomal peptides and terpenes.
2. **Unknown enzymes**: Many gene clusters contain enzymes of unknown function, which makes identification and annotation harder.
3. **Scattered tools**: The specialized bioinformatics tools and resources are dispersed, and no single platform handled all types of secondary metabolite gene clusters.

To address these problems, the authors developed antiSMASH (antibiotics & Secondary Metabolite Analysis Shell), a pipeline that covers the whole range of known secondary metabolite compound classes. Beyond identifying gene clusters, it aligns the identified regions to their nearest relatives in a database of known gene clusters, and it integrates or cross-links all previously available secondary-metabolite-specific gene analysis methods in one interactive view. In this way, antiSMASH aims to accelerate the discovery of new drug candidates such as antibiotics, anti-tumor drugs and cholesterol-lowering drugs, and thereby promote the medical use of microbial secondary metabolites.

### Formula Presentation

The article involves no complex mathematics, but it does use a simple formula in one place. When ClusterBlast compares a query gene cluster with a cluster from the database, it computes a cumulative similarity score:

\[ \text{Score} = h + H + s + S + B \]

where:

- \( h \) is the number of genes in the query cluster with significant matches.
- \( H \) is the number of core genes of the query cluster with significant matches.
- \( s \) is the number of gene pairs with conserved collinearity (synteny).
- \( S \) is the number of gene pairs with conserved collinearity that involve core genes.
- \( B \) is a core-gene bonus: 3 points if at least one core gene has a match in the target cluster, otherwise 0.

The score measures how similar the query cluster is to each cluster in the database, which helps researchers quickly assess functional relationships between gene clusters.
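The scoring rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not antiSMASH's own code: the `QueryCluster` class, the `clusterblast_score` function and the input format (each query gene mapped to the position of its best significant hit in one target cluster) are hypothetical, and counting a "collinear pair" as two neighbouring query genes that hit neighbouring target genes is an assumed reading of the definition.

```python
from dataclasses import dataclass


@dataclass
class QueryCluster:
    """Hypothetical model: gene names in genomic order plus the subset that are core genes."""
    genes: list[str]
    core_genes: set[str]


def clusterblast_score(query: QueryCluster, hits: dict[str, int]) -> int:
    """Return h + H + s + S + B for one target cluster.

    `hits` maps a query gene name to the 0-based position of its significant
    hit in the target cluster; genes without a significant hit are absent.
    Assumption: two neighbouring query genes form a collinear pair when their
    hits are also neighbours in the target cluster (either orientation).
    """
    # h: query genes with a significant hit
    h = sum(1 for g in query.genes if g in hits)
    # H: core genes with a significant hit
    H = sum(1 for g in query.genes if g in hits and g in query.core_genes)

    s = 0  # collinear neighbouring gene pairs
    S = 0  # collinear pairs involving at least one core gene
    for left, right in zip(query.genes, query.genes[1:]):
        if left in hits and right in hits and abs(hits[left] - hits[right]) == 1:
            s += 1
            if left in query.core_genes or right in query.core_genes:
                S += 1

    B = 3 if H > 0 else 0  # core-gene bonus
    return h + H + s + S + B


if __name__ == "__main__":
    query = QueryCluster(genes=["a", "b", "c", "d"], core_genes={"b"})
    print(clusterblast_score(query, {"a": 0, "b": 1, "c": 2}))
```

With query genes a to d, core gene b, and hits for a, b and c at target positions 0, 1 and 2, the sketch gives h = 3, H = 1, s = 2, S = 2 and B = 3, for a score of 11. Ranking candidate clusters then amounts to sorting them by this score in descending order.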