Abstract: Causal effect moderation investigates how the effect of interventions (or treatments) on outcome variables changes based on observed characteristics of individuals, known as potential effect moderators. With advances in data collection, datasets containing many observed features as potential moderators have become increasingly common. High-dimensional analyses often lack interpretability, with important moderators masked by noise, while low-dimensional, marginal analyses yield many false positives due to strong correlations with true moderators. In this paper, we propose a two-step method for selective inference on time-varying causal effect moderation that addresses the limitations of both high-dimensional and marginal analyses. Our method first selects a relatively smaller, more interpretable model to estimate a linear causal effect moderation using a Gaussian randomization approach. We then condition on the selection event to construct a pivot, enabling uniformly asymptotic semi-parametric inference in the selected model. Through simulations and real data analyses, we show that our method consistently achieves valid coverage rates, even when existing conditional methods and common sample splitting techniques fail. Moreover, our method yields shorter, bounded intervals, unlike existing methods that may produce infinitely long intervals.

What problem does this paper attempt to address?

The paper asks how to conduct inference on time-varying causal effect moderation in high-dimensional data while keeping that inference both valid and interpretable. Specifically, the authors seek a method that can pick out the important moderators from a large pool of candidates while avoiding two failure modes of existing approaches: noise that masks important moderators in high-dimensional analyses, and false positives in low-dimensional marginal analyses.

### Specific description of the problem

1. **Challenges in high-dimensional data analysis**: Modern data collection produces datasets with a large number of potential effect moderators. In such data, important moderators are easily masked by noise, which makes the resulting models hard to interpret.
2. **Limitations of low-dimensional marginal analysis**: Marginal analysis simplifies the problem, but strong correlations with the true moderators make it prone to false positives, that is, to flagging unrelated variables as moderators.
3. **Deficiencies in existing methods**: Existing conditional selective inference methods and common sample-splitting techniques fail to maintain valid coverage under certain parameter values and may produce infinitely long confidence intervals, which makes the resulting inference unreliable.

### The method proposed in the paper

To address these problems, the paper proposes a two-step method for selective inference:

1. **Step 1: Select a relatively small, more interpretable model**: Use a randomized LASSO to select important moderators from the high-dimensional set of candidates. Adding a Gaussian randomization term to the selection step means the selection does not use up all the information in the data, which improves the validity of the inference that follows.
2. **Step 2: Construct selective confidence intervals**: Build confidence intervals in the selected model. Conditioning on the selection event yields a pivot that enables uniformly asymptotic semi-parametric inference, so the intervals remain valid even though the model was chosen from the same data.

### Advantages of the method

- **Valid coverage**: In simulations and real data analyses, the method consistently achieves valid coverage rates, even when existing conditional methods and common sample-splitting techniques fail.
- **Shorter confidence intervals**: Compared with existing methods, the method produces shorter, bounded confidence intervals and avoids infinitely long intervals.
- **Flexibility**: Adjusting the size of the randomization term trades off model selection against inference, balancing prediction accuracy and inference reliability.

In conclusion, the paper provides a new selective inference method for high-dimensional time-varying causal effect moderation analysis that keeps the inference both valid and interpretable.
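
The selection step in Step 1 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain linear regression on toy data in place of the paper's time-varying moderation model, and it solves a randomized LASSO objective of the form 0.5 * ||y - Xb||^2 - w'b + lam * ||b||_1, with w drawn from N(0, tau^2 I), by coordinate descent. The names `randomized_lasso`, `soft_threshold`, `lam`, and `tau`, as well as the simulated data and tuning values, are hypothetical choices made for illustration. Step 2 (the conditional pivot) is not shown.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def randomized_lasso(X, y, lam, tau, rng, n_sweeps=200):
    """Coordinate descent for the (assumed) randomized LASSO objective
        0.5 * ||y - X b||^2 - w'b + lam * ||b||_1,   w ~ N(0, tau^2 I).
    Returns the coefficient vector b and the randomization draw w."""
    n, p = len(X), len(X[0])
    # Gaussian randomization term, one independent draw per coefficient.
    w = [rng.gauss(0.0, tau) for _ in range(p)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    b = [0.0] * p
    resid = list(y)  # equals y - X b, since b starts at zero
    for _ in range(n_sweeps):
        for j in range(p):
            # Inner product of column j with the partial residual
            # (residual with coordinate j added back), plus the randomization.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * b[j] + w[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b, w


# Toy data (hypothetical): 10 candidate moderators, only the first two matter.
rng = random.Random(0)
n, p = 200, 10
true_beta = [2.0, -1.5] + [0.0] * (p - 2)
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * true_beta[j] for j in range(p)) + rng.gauss(0.0, 1.0)
     for i in range(n)]

beta_hat, omega = randomized_lasso(X, y, lam=60.0, tau=5.0, rng=rng)
selected = [j for j, bj in enumerate(beta_hat) if bj != 0.0]
print("selected moderators:", selected)
```

In this sketch a larger `tau` makes the selected set noisier but leaves more information for inference after selection, which is the trade-off described under "Flexibility". Valid intervals for the coefficients in `selected` would still require the conditioning step of Step 2; refitting ordinary least squares on the selected columns and reporting the usual intervals would ignore the selection.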

Selective Inference for Time-Varying Effect Moderation

Estimating Causal Moderation Effects with Randomized Treatments and Non-Randomized Moderators

Estimation and inference for the mediation effect in a time-varying mediation model

Causal Inference for Continuous-Time Processes When Covariates Are Observed Only at Discrete Times

Estimating Time-Varying Direct and Indirect Causal Excursion Effects with Longitudinal Binary Outcomes

Causal Inference using Multivariate Generalized Linear Mixed-Effects Models with Longitudinal Data

Identification and Estimation of Causal Effects with Time-Varying Treatments and Time-Varying Outcomes

Causal Inference with High-dimensional Discrete Covariates

Efficient and flexible causal mediation with time-varying mediators, treatments, and confounders

Causal Inference for a Hidden Treatment

Estimating Time‐Varying Exposure Effects Through Continuous‐Time Modelling in Mendelian Randomization

Causal Inference With Two Versions of Treatment

Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects

Assessing Time-Varying Causal Effect Moderation in Mobile Health

Inference for Individual Mediation Effects and Interventional Effects in Sparse High-Dimensional Causal Graphical Models

Estimating Causal Effects With Partial Covariates For Clinical Interpretability

Composite Causal Effects for Time-Varying Treatments and Time-Varying Outcomes

A New Covariate Selection Strategy for High Dimensional Data in Causal Effect Estimation with Multivariate Treatments

Retrospective causal inference with multiple effect variables

Through the lens of causal inference: Decisions and pitfalls of covariate selection

Anytime-Valid Inference for Double/Debiased Machine Learning of Causal Parameters