Selective Inference for Time-Varying Effect Moderation

Soham Bakshi, Walter Dempsey, Snigdha Panigrahi
2024-11-25
Abstract: Causal effect moderation investigates how the effect of interventions (or treatments) on outcome variables changes based on observed characteristics of individuals, known as potential effect moderators. With advances in data collection, datasets containing many observed features as potential moderators have become increasingly common. High-dimensional analyses often lack interpretability, with important moderators masked by noise, while low-dimensional, marginal analyses yield many false positives due to strong correlations with true moderators. In this paper, we propose a two-step method for selective inference on time-varying causal effect moderation that addresses the limitations of both high-dimensional and marginal analyses. Our method first selects a relatively smaller, more interpretable model to estimate a linear causal effect moderation using a Gaussian randomization approach. We then condition on the selection event to construct a pivot, enabling uniformly asymptotic semi-parametric inference in the selected model. Through simulations and real data analyses, we show that our method consistently achieves valid coverage rates, even when existing conditional methods and common sample splitting techniques fail. Moreover, our method yields shorter, bounded intervals, unlike existing methods that may produce infinitely long intervals.
Methodology, Statistics Theory, Machine Learning
What problem does this paper attempt to address?
This paper asks how to conduct inference on time-varying causal effect moderation in high-dimensional data while keeping the inference both valid and interpretable. Specifically, the authors seek a method that picks out the important moderators from a large pool of candidates while avoiding two known pitfalls: important moderators masked by noise in high-dimensional analyses, and false positives in low-dimensional marginal analyses.

### Specific description of the problem

1. **Challenges in high-dimensional data analysis**:
   - Modern data collection produces datasets with a large number of potential effect moderators. In such high-dimensional data, important moderators are easily masked by noise, and the resulting models are hard to interpret.
2. **Limitations of low-dimensional marginal analysis**:
   - Marginal analysis simplifies the problem, but because candidate variables can be strongly correlated with the true moderators, it tends to produce many false positives, that is, it flags variables that are not moderators.
3. **Deficiencies in existing methods**:
   - Existing conditional selective inference methods and common sample splitting techniques can fail to maintain valid coverage under certain parameter values and may produce infinitely long confidence intervals, which makes the resulting inference unreliable.

### The method proposed in the paper

To address these problems, the paper proposes a two-step method for selective inference:

1. **Step 1: Select a relatively small and more interpretable model**:
   - Use a randomized LASSO to select important moderators from the high-dimensional data. The Gaussian randomization term keeps some of the information in the data from being used up by selection, leaving it available for the inference step.
2. **Step 2: Construct selective confidence intervals**:
   - Construct confidence intervals in the selected model. Conditioning on the selection event yields a pivot, which enables uniformly asymptotic semi-parametric inference in the selected model.

### Advantages of the method

- **Valid coverage**: In simulations and real data analyses, the method consistently achieves the target coverage rate, even when existing conditional methods and common sample splitting techniques fail.
- **Shorter confidence intervals**: Compared with existing methods, the method yields shorter, bounded confidence intervals and avoids infinitely long intervals.
- **Flexibility**: Adjusting the scale of the randomization term trades off model selection against inference, balancing prediction accuracy and inferential reliability.

In conclusion, the paper provides a new selective inference method for high-dimensional, time-varying causal effect moderation analysis that keeps the inference valid and interpretable.
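
The selection step described above can be illustrated with a minimal sketch of a randomized LASSO, in which a Gaussian draw `omega` enters the objective as a linear term. This is a simplified stand-in under stated assumptions, not the authors' estimator: it uses an ordinary linear model rather than the time-varying causal setting, a plain proximal gradient solver, and illustrative choices for the penalty `lam` and randomization scale `tau`. All names in the code are hypothetical. The conditional inference step (the pivot) is not implemented here.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def randomized_lasso(X, y, lam, omega, n_iter=1000):
    """Proximal gradient (ISTA) for the randomized LASSO objective

        0.5 * ||y - X b||^2 + lam * ||b||_1 - omega' b,

    where omega is a Gaussian randomization draw supplied by the caller.
    Assumes X has full column rank (n > p), so the objective is strongly convex.
    """
    # Step size 1 / L, with L the largest eigenvalue of X'X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) - omega  # gradient of the smooth part
        b = soft_threshold(b - step * grad, step * lam)
    return b


# Simulated data (illustrative only): 10 candidate moderators, 2 truly active.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:2] = [2.0, -2.0]
y = X @ beta + rng.normal(size=n)

# Assumed tuning choices, not taken from the paper.
lam = np.sqrt(2 * np.log(p)) * np.sqrt(n)  # penalty level
tau = 0.5 * np.sqrt(n)                     # randomization scale
omega = rng.normal(scale=tau, size=p)      # Gaussian randomization term

b_hat = randomized_lasso(X, y, lam, omega)
selected = np.flatnonzero(b_hat)           # indices of the selected model
print("selected columns:", selected)
```

A larger `tau` makes the selection noisier but leaves more information for the subsequent inference, while a smaller `tau` does the reverse; this is the trade-off the flexibility point refers to.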