Abstract:

INTRODUCTION: After several decades of development, meta-analysis has become a pillar of evidence-based medicine. However, heterogeneity remains a threat to the validity and quality of such studies. Currently, the Q test and its descendant, the I² (I-squared) statistic, are widely used as tools for evaluating heterogeneity. The core mission of this kind of test is to identify data sets drawn from similar populations and to exclude those that come from different populations. Although Q and I² are the default tools for heterogeneity testing, the work we present here demonstrates that the robustness of these two tools is questionable.

METHODS AND FINDINGS: We simulated a strictly normalized population S. The simulation successfully represents randomized controlled trial data sets, which fit the theoretical distribution perfectly (experimental group: p = 0.37; control group: p = 0.88). We then randomly generated research samples Si that fit the population with only tiny deviations. In short, these data sets are perfect and can be regarded as completely homogeneous data from exactly the same population. If Q and I² are truly robust tools, the Q and I² test results on our simulated data sets should not be positive. We then synthesized these trials using a fixed-effect model. The pooled results indicated that the mean difference (MD) corresponds closely to the true values and that the 95% confidence interval (CI) is narrow. However, when the number of trials and the sample size of the trials enrolled in the meta-analysis are substantially increased, the Q and I² values also increase steadily. This result indicates that I² and Q are suitable only for testing heterogeneity among trials with small sample sizes, and are not applicable when the sample sizes and the number of trials increase substantially.

CONCLUSIONS: Every day, meta-analyses that contain flawed data analysis emerge and are passed on to clinical practitioners as "updated evidence". Using evidence of this kind, which contains heterogeneous data sets, leads to wrong conclusions, creates chaos in clinical practice, and weakens the foundation of evidence-based medicine. We suggest a stricter application of meta-analysis: it should be applied only to synthesizing trials with small sample sizes. We call for the tools of evidence-based medicine to keep up to date with cutting-edge technologies in data science. Clinical research data should be made publicly available whenever a relevant article is published, so that the research community can conduct in-depth data mining, which is a better alternative to meta-analysis in many instances.
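To make the quantities discussed in the abstract concrete, the following is a minimal Python sketch of how Cochran's Q and I² are conventionally computed under a fixed-effect (inverse-variance) model, applied to two-arm trials simulated from a single normal population. It is not the authors' simulation code: the function names `simulate_trial` and `heterogeneity`, the seed, and all parameter values (50 participants per arm, 20 trials, a true mean difference of 1.0, standard deviation 1.0) are assumptions chosen purely for illustration.

```python
import random
import statistics


def simulate_trial(n, mu_exp, mu_ctl, sigma, rng):
    """Draw one two-arm trial; return (mean difference, variance of the MD)."""
    exp = [rng.gauss(mu_exp, sigma) for _ in range(n)]
    ctl = [rng.gauss(mu_ctl, sigma) for _ in range(n)]
    md = statistics.fmean(exp) - statistics.fmean(ctl)
    # Variance of a difference of two independent sample means.
    var = statistics.variance(exp) / n + statistics.variance(ctl) / n
    return md, var


def heterogeneity(effects, variances):
    """Return the fixed-effect pooled estimate, Cochran's Q, and I^2 in percent."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 = max(0, (Q - df) / Q), reported as a percentage.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical settings: 20 trials, 50 participants per arm, true MD = 1.0.
rng = random.Random(0)
trials = [simulate_trial(50, 1.0, 0.0, 1.0, rng) for _ in range(20)]
pooled, q, i2 = heterogeneity(*zip(*trials))
print(f"pooled MD = {pooled:.3f}, Q = {q:.2f}, I^2 = {i2:.1f}%")
```

Varying the per-arm sample size and the number of trials in the last few lines gives a simple way to examine how Q and I² respond to those two factors when every trial is drawn from the same population.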

Quantifying heterogeneity in a meta‐analysis

Altman DG. Measuring inconsistency in meta-analyses. 2003; 327.

The Dilemma of Heterogeneity Tests in Meta-Analysis: A Challenge from a Simulation Study

Performance of Between-study Heterogeneity Measures in the Cochrane Library.

Basics of meta‐analysis: I2 is not an absolute measure of heterogeneity. Research Synthesis Methods, 8 (1), 5–18

An alternative measure for quantifying the heterogeneity in meta-analysis

How to understand and report heterogeneity in a meta-analysis: The difference between I-squared and prediction intervals

Different meta-analysis methods can change judgements about imprecision of effect estimates: a meta-epidemiological study

Commentary: heterogeneity in meta-analysis should be expected and appropriately quantified

Evaluation of inconsistency in networks of interventions

Avoiding common mistakes in meta-analysis: Understanding the distinct roles of Q, I-squared, tau-squared, and the prediction interval in reporting heterogeneity

A comparison of heterogeneity variance estimators in simulated random‐effects meta‐analyses

Statistical heterogeneity in systematic reviews of clinical trials: a critical appraisal of guidelines and practice

Characteristics of a loop of evidence that affect detection and estimation of inconsistency: a simulation study

Inconsistency identification in network meta-analysis via stochastic search variable selection

Quantifying Replicability and Consistency in Systematic Reviews

Understanding variability: the role of meta-analysis of variance

GRADE guidelines: 7. Rating the quality of evidence—inconsistency