m6AConquer: a Data Resource for Unified Quantification and Integration of m6A Detection Techniques

Xichen Zhao, Haokai Ye, Tenglong Li, Daniel J Rigden, Zhen Wei
DOI: https://doi.org/10.1101/2024.09.10.612173
2024-09-14
Abstract: N6-methyladenosine (m6A) is the most prevalent RNA modification in mammalian cells and the most extensively studied epitranscriptomic mark. More than 10 m6A detection techniques have been proposed to measure m6A stoichiometry at either the site or peak level. However, these detection techniques are processed through heterogeneous pipelines, using different computational filters and reference features, leading to difficulties in fully harnessing data integration and analysis across orthogonal m6A detection techniques. Our m6AConquer (Consistent Quantification of External m6A RNA Modification Data) tackles this challenge by establishing a consistent multi-omics data-sharing standard, summarizing quantitative m6A data from 10 detection techniques using a unified reference feature set. Furthermore, we standardize site calling and m6A count matrix normalization procedures across platforms through a computational framework that accounts for over-dispersion in m6A levels. Available m6A detection techniques can be categorized into four types: antibody-assisted, chemical-assisted, enzyme-assisted, and direct-RNA sequencing. We leverage this categorization to develop a reproducibility-based integration framework that enables the reliable detection of high-confidence m6A sites confirmed across orthogonal techniques. Empirical evaluations report that both the site-calling and the integration framework outperform common alternatives, enhancing biological relevance. We apply interpretable machine learning models on our integrated high-confidence sites, and the results consistently identify proximity to intron-exon junctions as the driving predictor of m6A site coordinates across different techniques, demonstrating the high quality of the data curated in m6AConquer. m6AConquer webserver is freely accessible under: http://rnamd.org/m6aconquer.
Bioinformatics
What problem does this paper attempt to address?
The main problem this paper addresses is the difficulty of integrating data across different m6A detection techniques. Specifically:

1. **Heterogeneous data pipelines**: More than 10 m6A detection techniques are currently used to measure m6A modification levels. However, their data are processed with different computational filters and reference features, which makes the results difficult to integrate and analyze.
2. **Lack of data standardization**: Existing m6A databases (such as m6A-Atlas2, MeT-DB2, RMBase2, and DirectRMD) lack a consistent and efficient data-sharing framework. Each database uses different statistical assumptions and filtering criteria, leading to inconsistent site features.
3. **Insufficient absolute quantitative data**: Existing databases collect very little absolute quantitative m6A data, such as the data provided by GLORI and eTAM-seq, which limits precise analysis of m6A modification levels.

To address these challenges, the authors developed a new database, m6AConquer, which works in the following ways:

- **Establish a unified data-sharing standard**: m6AConquer aggregates quantitative m6A data from 10 detection techniques using a unified set of reference features to ensure data consistency.
- **Standardize site calling and m6A count matrix normalization**: Through a computational framework, m6AConquer standardizes the site-calling and m6A count matrix normalization procedures across platforms, taking into account the over-dispersion of m6A levels.
- **A data-integration framework based on reproducibility**: Using the irreproducible discovery rate (IDR) method, m6AConquer can reliably detect high-confidence m6A sites confirmed across orthogonal techniques, enhancing biological relevance.
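The idea of confirming sites across orthogonal techniques can be illustrated with a deliberately simplified sketch. It is not the IDR procedure and not the authors' implementation; it only keeps sites that are called by techniques from at least two of the four categories named in the abstract. The technique names, site identifiers, and the `min_categories` threshold are hypothetical.

```python
from collections import defaultdict

# The four technique categories named in the abstract.
CATEGORIES = {
    "antibody-assisted",
    "chemical-assisted",
    "enzyme-assisted",
    "direct-RNA sequencing",
}


def orthogonal_sites(calls, category_of, min_categories=2):
    """Return sites called by techniques from at least `min_categories` categories.

    calls:        dict mapping technique name -> set of site identifiers
    category_of:  dict mapping technique name -> one of CATEGORIES
    """
    support = defaultdict(set)  # site -> set of categories that called it
    for technique, sites in calls.items():
        category = category_of[technique]
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        for site in sites:
            support[site].add(category)
    return {site for site, cats in support.items() if len(cats) >= min_categories}


# Hypothetical toy input: two antibody-assisted techniques and one chemical-assisted.
category_of = {
    "tech_A": "antibody-assisted",
    "tech_B": "antibody-assisted",
    "tech_C": "chemical-assisted",
}
calls = {
    "tech_A": {"chr1:100", "chr1:200"},
    "tech_B": {"chr1:200", "chr1:300"},
    "tech_C": {"chr1:100", "chr2:50"},
}
high_confidence = orthogonal_sites(calls, category_of)
```

In this toy input, a site seen by two techniques of the same category does not count as orthogonally confirmed, whereas a site seen by one antibody-assisted and one chemical-assisted technique does. A rank-based method such as IDR additionally uses the strength of evidence for each site rather than a plain overlap.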
Through these methods, m6AConquer aims to provide an efficient and consistent multi-omics data-sharing platform to promote the in-depth development of m6A modification research.
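Why over-dispersion matters for site calling can be shown with a small sketch. The summary states only that the framework accounts for over-dispersion; the beta-binomial model used here is an assumption, chosen as one common way to model extra-binomial variability in count data, and the read counts, background rate, and dispersion value `rho` are hypothetical.

```python
from math import exp, lgamma, log


def _log_choose(n, k):
    # log of the binomial coefficient C(n, k)
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def _log_beta(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binom_pmf(k, n, p):
    """Binomial probability of k modified reads out of n, with 0 < p < 1."""
    return exp(_log_choose(n, k) + k * log(p) + (n - k) * log(1 - p))


def betabinom_pmf(k, n, mean, rho):
    """Beta-binomial pmf parameterised by its mean and over-dispersion rho in (0, 1)."""
    a = mean * (1 - rho) / rho
    b = (1 - mean) * (1 - rho) / rho
    return exp(_log_choose(n, k) + _log_beta(k + a, n - k + b) - _log_beta(a, b))


def upper_tail(pmf, k, n, *params):
    """P(K >= k) under the given pmf."""
    return sum(pmf(i, n, *params) for i in range(k, n + 1))


# Hypothetical site: 15 modified reads out of 100, tested against a 5% background.
n, k = 100, 15
background, rho = 0.05, 0.05
p_binom = upper_tail(binom_pmf, k, n, background)
p_betabinom = upper_tail(betabinom_pmf, k, n, background, rho)
```

With the same mean, the over-dispersed model assigns a larger tail probability to the same count, so a site that looks strongly significant under a plain binomial test can be much less convincing once variability between samples is modelled.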