Author summary: It is well known that nonrandom selection in one-sample Mendelian Randomization (MR) can result in biased estimates and inflated type I error rates. In fact, two-sample MR analyses are more susceptible to nonrandom selection than one-sample MR analyses, because the two samples used for genome-wide association studies (GWAS) may each be selected from the source population under a different mechanism. Summary-level genetic association statistics in two-sample MR may also be derived from different study designs, such as case-control, case-only and cohort studies, which further affects the estimation of the causal effect of the exposure on the outcome. In this study, we first propose a theorem for causal effect invariance under different selection mechanisms. In the simulation, we design 49 combinations of nonrandom selection mechanisms in sample I and sample II, which are widespread in practical applications. The simulation results reveal that the selection mechanisms in sample II have a larger influence on biases and type I error rates than those in sample I. As an illustrative example, we find that nonrandom selection in sample II (coronary heart disease patients) can magnify the estimated causal effect of obesity on HbA1c levels.

Abstract: Nonrandom selection in one-sample Mendelian Randomization (MR) results in biased estimates and inflated type I error rates only when the selection effects are sufficiently large. In two-sample MR, different selection mechanisms in the two samples may affect causal effect estimation more seriously. First, we propose sufficient conditions for causal effect invariance under different selection mechanisms using two-sample MR methods. In the simulation study, we consider 49 possible selection mechanisms in two-sample MR, which depend on genetic variants (G), exposures (X), outcomes (Y) and their combinations. We further compare eight pleiotropy-robust methods under different selection mechanisms. The simulation results reveal that nonrandom selection in sample II has a larger influence on biases and type I error rates than nonrandom selection in sample I. Furthermore, selection depending on X+Y, G+Y, or G+X+Y in sample II leads to larger biases than other selection mechanisms. Notably, when selection depends on Y, the bias of the causal estimate is larger for a non-zero causal effect than for a null causal effect. In particular, the mode-based estimate has the largest standard errors among the eight methods. In the absence of pleiotropy, selection depending on Y or G in sample II yields nearly unbiased causal effect estimates when the causal effect is null. In the scenarios of balanced pleiotropy, all eight MR methods, especially MR-Egger, show large biases because nonrandom selection results in violation of the Instrument Strength Independent of Direct Effect (InSIDE) assumption. When directional pleiotropy exists, nonrandom selection has a severe impact on the eight MR methods. The application demonstrates that nonrandom selection in sample II (coronary heart disease patients) can magnify the estimated causal effect of obesity on HbA1c levels. In conclusion, nonrandom selection in two-sample MR exacerbates the bias of causal effect estimation for pleiotropy-robust MR methods.
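To make the effect of selection in sample II concrete, the following is a minimal simulation sketch, not the authors' simulation design. It assumes a single unmeasured confounder U, 20 independent variants with no pleiotropy, a logistic selection probability, and a plain inverse-variance weighted (IVW) estimator; all function names, effect sizes, and sample sizes are hypothetical choices made for illustration. Sample I is left unselected, and selection in sample II depends on X+Y.

```python
import numpy as np

def simulate_population(n, n_snps, beta, rng):
    """Simulate genotypes G, exposure X, outcome Y with an unmeasured confounder U."""
    maf = 0.3                       # assumed minor allele frequency for every variant
    G = rng.binomial(2, maf, size=(n, n_snps)).astype(float)
    alpha = np.full(n_snps, 0.15)   # assumed G -> X effects (no pleiotropy)
    U = rng.normal(size=n)
    X = G @ alpha + U + rng.normal(size=n)
    Y = beta * X + U + rng.normal(size=n)
    return G, X, Y

def select(X, Y, rng, on_x=0.0, on_y=0.0):
    """Keep each individual with a logistic probability driven by X and/or Y.

    With on_x = on_y = 0 every individual is kept with probability 0.5,
    i.e. selection is random.
    """
    z = on_x * (X - X.mean()) + on_y * (Y - Y.mean())
    p = 1.0 / (1.0 + np.exp(-z))
    return rng.random(len(X)) < p

def marginal_assoc(G, T):
    """Per-variant simple linear regression of trait T on each column of G."""
    Gc = G - G.mean(axis=0)
    Tc = T - T.mean()
    ssg = (Gc ** 2).sum(axis=0)
    b = Gc.T @ Tc / ssg
    resid = Tc[:, None] - Gc * b
    se = np.sqrt((resid ** 2).sum(axis=0) / (len(T) - 2) / ssg)
    return b, se

def ivw(bx, by, se_y):
    """Inverse-variance weighted causal estimate from summary statistics."""
    w = 1.0 / se_y ** 2
    return np.sum(w * bx * by) / np.sum(w * bx ** 2)

def two_sample_estimate(beta, on_x, on_y, seed=0, n=100_000, n_snps=20):
    """IVW estimate with an unselected sample I and a selected sample II."""
    rng = np.random.default_rng(seed)
    G1, X1, _ = simulate_population(n, n_snps, beta, rng)    # sample I: G-X associations
    G2, X2, Y2 = simulate_population(n, n_snps, beta, rng)   # sample II: G-Y associations
    keep = select(X2, Y2, rng, on_x=on_x, on_y=on_y)
    bx, _ = marginal_assoc(G1, X1)
    by, se_y = marginal_assoc(G2[keep], Y2[keep])
    return ivw(bx, by, se_y)

if __name__ == "__main__":
    # True causal effect is null in both runs.
    print("random selection in sample II:", two_sample_estimate(0.0, 0.0, 0.0))
    print("selection on X+Y in sample II:", two_sample_estimate(0.0, 1.0, 1.0))
```

Under these assumptions, the estimate with random selection should stay close to the true null effect, while selection on X+Y conditions on a common consequence of the exposure and the outcome and pulls the estimate away from zero. The size of that shift depends on the assumed selection strength and is not comparable to the values reported in the study.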

Interpretation of two-sample Mendelian randomization for binary exposures and outcome

Mendelian randomization with a binary exposure variable: interpretation and presentation of causal estimates

A robust two-sample Mendelian Randomization method integrating GWAS with multi-tissue eQTL summary statistics

Mendelian randomization analysis with pleiotropy-robust log-linear model for binary outcomes

MR-LDP: a Two-Sample Mendelian Randomization for GWAS Summary Statistics Accounting for Linkage Disequilibrium and Horizontal Pleiotropy

Powerful genome-wide design and robust statistical inference in two-sample summary-data Mendelian randomization

Bayesian Mendelian Randomization Analysis for Latent Exposures Leveraging GWAS Summary Statistics for Traits Co-Regulated by the Exposures

Reciprocal causation mixture model for robust Mendelian randomization analysis using genome-scale summary data

Mendelian Randomization: Concepts and Scope

Approximation of Bias and Mean-Squared Error in Two-Sample Mendelian Randomization Analyses.

Overlapping-sample Mendelian randomisation with multiple exposures: a Bayesian approach

An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings

Mendelian randomization: genetic anchors for causal inference in epidemiological studies

Interpretation of Mendelian randomization using a single measure of an exposure that varies over time

Mendelian Randomization Analysis in Observational Epidemiology

Unveiling challenges in Mendelian randomization for gene–environment interaction

Impact of nonrandom selection mechanisms on the causal effect estimation for two-sample Mendelian randomization methods

Mendelian randomization analysis using multiple biomarkers of an underlying common exposure

Mendelian randomization analysis of a time‐varying exposure for binary disease outcomes using functional data analysis methods

MR-BOIL: Causal Inference in One-Sample Mendelian Randomization for Binary Outcome with Integrated Likelihood Method.

MR-Corr2: a Two-Sample Mendelian Randomization Method That Accounts for Correlated Horizontal Pleiotropy Using Correlated Instrumental Variants