MGnify: the microbiome sequence data analysis resource in 2023

Lorna Richardson, Ben Allen, Germana Baldi, Martin Beracochea, Tony Burdett, Josephine Burgin, Juan Caballero-Pérez, Guy Cochrane, Tom Curtis, Alejandra Escobar-Zepeda, Varsha Kale, Anton Korobeynikov, Shriya Raj, Ekaterina Sakharova, Santiago Sanchez, Maxwell L Bileschi, Lucy J Colwell, Tatiana A Gurbich, Alexander B Rogers, Darren J Wilkinson, Robert D Finn
DOI: https://doi.org/10.1093/nar/gkac1080
IF: 14.9
2022-12-08
Nucleic Acids Research
Abstract: The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of environments. Over the past 3 years, MGnify has grown not only in the number of datasets it contains but also in the breadth of analyses it provides, such as the analysis of long-read sequences. The MGnify protein database now exceeds 2.4 billion non-redundant sequences predicted from metagenomic assemblies. This collection is now organised into a relational database, a major improvement that makes it possible to understand the genomic context of a protein by navigating back to its source assembly and sample metadata. To extend beyond the functional annotations already provided in MGnify, we have applied deep learning-based annotation methods. The technology underlying MGnify's Application Programming Interface (API) and website has been upgraded, and downstream analysis of MGnify data is now possible through a coupled Jupyter Lab environment. Overview of MGnify resources: the assembly and annotation of microbiome-derived sequences from a broad range of environments have given rise to new insights into microbial diversity and the functional repertoire that microbes encode.
biochemistry & molecular biology