The tidyomics ecosystem: Enhancing omic data analyses

William J Hutchison, Timothy J Keyes, Helena L Crowell, Charlotte Soneson, Wancen Mu, Ji-Eun Park, Eric S Davis, Abdullah A Nahid, Ming Tang, Victor Yuan, Pierre-Paul Axisa, Jonathan W Kitt, Chi-Lam Poon, Noriaki Sato, Miha Kosmac, Jacques Serizay, Raphael Gottardo, Martin Morgan, Stuart Lee, Michael Lawrence, Stephanie C Hicks, Garry P Nolan, Kara L Davis, Anthony T Papenfuss, Michael I Love, Stefano Mangiola
DOI: https://doi.org/10.1101/2023.09.10.557072
2024-05-22
Abstract: The growth of omic data presents evolving challenges in data manipulation, analysis, and integration. Addressing these challenges, Bioconductor [1] provides an extensive community-driven biological data analysis platform. Meanwhile, tidy R programming [2] offers a revolutionary standard for data organisation and manipulation. Here, we present the tidyomics software ecosystem, bridging Bioconductor to the tidy R paradigm. This ecosystem aims to streamline omic analysis, ease learning, and encourage cross-disciplinary collaborations. We demonstrate the effectiveness of tidyomics by analysing 7.5 million peripheral blood mononuclear cells from the Human Cell Atlas [3], spanning six data frameworks and ten analysis tools.
Bioinformatics
What problem does this paper attempt to address?
The paper aims to address the following issues:

1. **Challenges in Data Manipulation and Analysis**: With the advancement of high-throughput technologies, the amount of data in fields such as genomics, epigenomics, transcriptomics, spatial analysis, and multi-omics has increased dramatically, bringing new challenges to data processing, exploration, analysis, integration, and interpretation.
2. **Interoperability between Existing Frameworks**: Current bioinformatics analysis frameworks (such as Bioconductor) are powerful, but they interoperate poorly with other data science ecosystems (such as the tidyverse) and lack the simplicity those ecosystems offer.
3. **Lowering the Learning Curve**: A unified standard for data representation and manipulation reduces the learning difficulty for researchers handling diverse omics data.
4. **Code Readability and Reusability**: More readable and reusable code lets researchers focus on the biological questions themselves rather than on technical details.

By developing the tidyomics ecosystem, the authors hope to combine the powerful functionality of Bioconductor with the intuitive data representation and manipulation of the tidyverse, yielding more efficient and user-friendly tools for omics data analysis.