Updates to the Alliance of Genome Resources Central Infrastructure

Suzanne A Aleksander, Anna V Anagnostopoulos, Giulia Antonazzo, Valerio Arnaboldi, Helen Attrill, Andrés Becerra, Susan M Bello, Olin Blodgett, Yvonne M Bradford, Carol J Bult, Scott Cain, Brian R Calvi, Seth Carbon, Juancarlos Chan, Wen J Chen, J Michael Cherry, Jaehyoung Cho, Madeline A Crosby, Jeffrey L De Pons, Peter D’Eustachio, Stavros Diamantakis, Mary E Dolan, Gilberto dos Santos, Sarah Dyer, Dustin Ebert, Stacia R Engel, David Fashena, Malcolm Fisher, Saoirse Foley, Adam C Gibson, Varun R Gollapally, L Sian Gramates, Christian A Grove, Paul Hale, Todd Harris, G Thomas Hayman, Yanhui Hu, Christina James-Zorn, Kamran Karimi, Kalpana Karra, Ranjana Kishore, Anne E Kwitek, Stanley J F Laulederkind, Raymond Lee, Ian Longden, Manuel Luypaert, Nicholas Markarian, Steven J Marygold, Beverley Matthews, Monica S McAndrews, Gillian Millburn, Stuart Miyasato, Howie Motenko, Sierra Moxon, Hans-Michael Muller, Christopher J Mungall, Anushya Muruganujan, Tremayne Mushayahama, Robert S Nash, Paulo Nuin, Holly Paddock, Troy Pells, Norbert Perrimon, Christian Pich, Mark Quinton-Tulloch, Daniela Raciti, Sridhar Ramachandran, Joel E Richardson, Susan Russo Gelbart, Leyla Ruzicka, Gary Schindelman, David R Shaw, Gavin Sherlock, Ajay Shrivatsav, Amy Singer, Constance M Smith, Cynthia L Smith, Jennifer R Smith, Lincoln Stein, Paul W Sternberg, Christopher J Tabone, Paul D Thomas, Ketaki Thorat, Jyothi Thota, Monika Tomczuk, Vitor Trovisco, Marek A Tutaj, Jose-Maria Urbano, Kimberly Van Auken, Ceri E Van Slyke, Peter D Vize, Qinghua Wang, Shuai Weng, Monte Westerfield, Laurens G Wilming, Edith D Wong, Adam Wright, Karen Yook, Pinglei Zhou, Aaron Zorn, Mark Zytkovicz, Alliance of Genome Resources Consortium
DOI: https://doi.org/10.1093/genetics/iyae049
IF: 4.402
2024-03-30
Genetics
Abstract: The Alliance of Genome Resources (Alliance) is an extensible coalition of knowledgebases focused on the genetics and genomics of intensively-studied model organisms. The Alliance is organized as individual knowledge centers with strong connections to their research communities and a centralized software infrastructure, discussed here. Model organisms currently represented in the Alliance are budding yeast, C. elegans, Drosophila, zebrafish, frog, laboratory mouse, laboratory rat, and the Gene Ontology Consortium. The project is in a rapid development phase to harmonize knowledge, store it, analyze it, and present it to the community through a web portal, direct downloads, and Application Programming Interfaces (APIs). Here we focus on developments over the last two years. Specifically, we added and enhanced tools for browsing the genome (JBrowse), downloading sequences, mining complex data (AllianceMine), visualizing pathways, full-text searching of the literature (Textpresso), and sequence similarity searching (SequenceServer). We enhanced existing interactive data tables and added an interactive table of paralogs to complement our representation of orthology. To support individual model organism communities, we implemented species-specific "landing pages" and will add disease-specific portals soon; in addition, we support a common community forum implemented in Discourse software. We describe our progress towards a central persistent database to support curation, the data modeling that underpins harmonization, and progress towards a state-of-the-art literature curation system with integrated Artificial Intelligence and Machine Learning (AI/ML).
genetics & heredity
What problem does this paper attempt to address?
This paper addresses the following problems:

1. **Data integration and standardization**: The paper describes how the Alliance of Genome Resources (Alliance) project integrates data from different model organism databases through a centralized software infrastructure to achieve data standardization and unification. This includes various types of data such as genes, diseases, Gene Ontology (GO), orthology, expression, alleles, variants, and genomic sequences in FASTA format.
2. **Improving the efficiency of data access and analysis**: To make it easier for researchers to access and analyze these data, the Alliance has developed a series of tools and functions, such as:
   - **JBrowse**: A tool for browsing genomes, which supports the download of DNA and amino acid sequences.
   - **AllianceMine**: An advanced search and retrieval tool that allows complex queries across species.
   - **SimpleMine**: A simplified search tool suitable for biologists without programming skills.
   - **Pathway Displays**: Tools for displaying metabolic pathways and signal transduction pathways, including GO Causal Activity Models (GO-CAMs) and Reactome pathway models.
3. **Enhancing data visualization and interactivity**: The paper introduces new features on the Alliance's gene pages, such as the **Paralogy** section, which shows the homologous relationships of genes; the **Sequence Detail Widget**, which provides detailed information on gene sequences; and the **Model Organism BLAST**, which improves the user experience of sequence alignment tools.
4. **Supporting the addition of new knowledge bases**: The paper specifically mentions the addition of Xenbase (the Xenopus knowledge base) and discusses the challenges and technical solutions encountered in the process of adding a new knowledge base, such as data upload, gene homology, and the handling of polyploid genes.
5. **Continuous improvement and expansion**: The Alliance is committed to continuously improving its data model and infrastructure to support more data types and functions. For example, the paper mentions that a more efficient literature annotation system is being developed, integrating artificial intelligence and machine-learning technologies to improve the accuracy and integrity of data.

In conclusion, this paper aims to solve the problem of data silos among model organism databases by constructing a unified, modular, and extensible platform that improves data availability and research efficiency, and thus supports research in comparative genomics and human biology, health, and disease.