Computational Strategies in Nutrigenetics: Constructing a Reference Dataset of Nutrition-Associated Genetic Polymorphisms

Giovanni Maria De Filippis, Maria Monticelli, Alessandra Pollice, Tiziana Angrisano, Bruno Hay Mele, Viola Calabrò
DOI: https://doi.org/10.1101/2023.08.04.23293659
2024-06-21
Abstract: Objective: This study aimed to build a comprehensive dataset of human genetic polymorphisms associated with nutrition by integrating data from multiple sources, including the LitVar database, PubMed, and the GWAS catalog. Such a resource could facilitate the exploration of genetic polymorphisms associated with nutrition-related traits. Methods: We developed a Python pipeline to streamline the integration and analysis of genetic polymorphism data associated with nutrition. We employed the MeSH ontology as a framework to aggregate relevant genetic data. The pipeline comprises five distinct modules that perform the following steps: data extraction from LitVar and PubMed articles, generation of a joint dataset by data merging, generation of comprehensive MeSH term lists, filtering of the joint dataset using the selected MeSH sets, lexical analysis and augmentation of the dataset with data from the GWAS catalog dataset. Results: We successfully aggregated a wide range of papers and data on genetic polymorphisms and nutrition-related traits into a single dataset. Cross-referencing with the GWAS catalog dataset provided information about possible effects or risk alleles associated with the identified genetic polymorphisms. The nutrigenetic dataset we developed is a tool for nutritionists and researchers, serving as a preliminary benchmark for personalized nutrition interventions based on genetic testing. Conclusion: The pipeline presented here consolidates and organizes information on genetic polymorphisms associated with nutrition, enabling comprehensive analysis and exploration of gene-diet interactions. Overall, the method contributes to advancing personalized nutrition interventions and nutrigenomics research. The flexible nature of the system allows its application to other investigations related to genetic polymorphisms.
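The merge, filter, and augment steps named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record layout, the function names (`merge_sources`, `filter_by_mesh`, `augment_with_gwas`), and the toy identifiers are all assumptions made for illustration, and the abstract does not describe the pipeline's actual schema or API.

```python
from dataclasses import dataclass


# Hypothetical record layout; the real pipeline's schema is not given in the abstract.
@dataclass(frozen=True)
class Annotation:
    rsid: str              # polymorphism identifier (toy placeholder values below)
    pmid: str              # article identifier
    mesh_terms: frozenset  # MeSH terms attached to the article


def merge_sources(litvar_pairs, pubmed_mesh):
    """Join LitVar-style (rsid, pmid) pairs with a PubMed-style pmid -> MeSH mapping."""
    merged = []
    for rsid, pmid in litvar_pairs:
        terms = pubmed_mesh.get(pmid)
        if terms is None:
            continue  # skip articles with no MeSH metadata in the mapping
        merged.append(Annotation(rsid, pmid, frozenset(terms)))
    return merged


def filter_by_mesh(records, selected_terms):
    """Keep records whose article carries at least one selected MeSH term."""
    selected = frozenset(selected_terms)
    return [r for r in records if r.mesh_terms & selected]


def augment_with_gwas(records, gwas_risk_alleles):
    """Attach a risk allele from a GWAS-style rsid -> allele mapping (None if absent)."""
    return [
        {
            "rsid": r.rsid,
            "pmid": r.pmid,
            "mesh_terms": sorted(r.mesh_terms),
            "risk_allele": gwas_risk_alleles.get(r.rsid),
        }
        for r in records
    ]


# Toy inputs with placeholder identifiers (not real data).
litvar_pairs = [("rs_demo_1", "pmid_1"), ("rs_demo_2", "pmid_2"), ("rs_demo_3", "pmid_9")]
pubmed_mesh = {"pmid_1": ["Diet", "Obesity"], "pmid_2": ["Neoplasms"]}
gwas_risk_alleles = {"rs_demo_1": "A"}

joint = merge_sources(litvar_pairs, pubmed_mesh)
nutrition = filter_by_mesh(joint, {"Diet"})
dataset = augment_with_gwas(nutrition, gwas_risk_alleles)
```

With these toy inputs, `dataset` holds a single record for `rs_demo_1`. Because the selected MeSH set is passed in as a parameter, the same steps could in principle be pointed at a different topic, which matches the flexibility the Conclusion describes.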