STAREG: Statistical replicability analysis of high throughput experiments with applications to spatial transcriptomic studies

Yan Li,Xiang Zhou,Rui Chen,Xianyang Zhang,Hongyuan Cao
DOI: https://doi.org/10.1371/journal.pgen.1011423
IF: 4.5
2024-10-04
PLoS Genetics
Abstract:Replicable signals from different yet conceptually related studies provide stronger scientific evidence and more powerful inference. We introduce STAREG, a statistical method for replicability analysis of high throughput experiments, and apply it to analyze spatial transcriptomic studies. STAREG uses summary statistics from multiple studies of high throughput experiments and models the the joint distribution of p -values accounting for the heterogeneity of different studies. It effectively controls the false discovery rate (FDR) and has higher power by information borrowing. Moreover, it provides different rankings of important genes. With the EM algorithm in combination with pool-adjacent-violator-algorithm (PAVA), STAREG is scalable to datasets with millions of genes without any tuning parameters. Analyzing two pairs of spatially resolved transcriptomic datasets, we are able to make biological discoveries that otherwise cannot be obtained by using existing methods. Irreplicable research wastes time, money, and/or resources. Approximately $28 billion is estimated to be spent on preclinical research that cannot be replicated every year in the United States alone. Possible causes of irreplicable research may include experimental design, laboratory practices, and data analysis. We focus on data analysis. The past two decades have witnessed the expansion and increased availability of genomic data from high-throughput experiments. Due to privacy concerns or logistic reasons, raw data can be difficult to access but summary data such as p -values are readily available. We introduce STAREG, which jointly analyzes p -values from multiple genomic datasets that target the same scientific question with different populations or different technologies. This allows us to have more convincing and robust findings. STAREG is computationally scalable with solid statistical analysis. Moreover, it is versatile, platform-independent, and only requires p -values as input. By analyzing data sets from spatially resolved transcriptomic studies, we make biological discoveries that otherwise cannot be obtained with existing methods.
genetics & heredity
What problem does this paper attempt to address?