Abstract: The estimation of the ratio of two probability density functions is of great interest in many fields of statistics, including causal inference. In this study, we develop an ensemble estimator of density ratios with a novel loss function based on super learning. We show that this novel loss function is qualified for building super learners. Two simulations corresponding to mediation analysis and longitudinal modified treatment policy in causal inference, where density ratios are nuisance parameters, are conducted to show our density ratio super learner's performance empirically.
What problem does this paper attempt to address?
This paper addresses the estimation of the ratio of two probability density functions. Specifically, the authors develop an ensemble estimator of density ratios based on super learning and propose a new loss function with which to build this super learner. Density ratios arise in many areas of statistics, including mediation analysis and longitudinal modified treatment policies (LMTPs) in causal inference. In such problems, the density ratio is usually a nuisance parameter: it is not the quantity of interest itself, but it must be estimated in order to estimate that quantity. The authors use simulation experiments to demonstrate the performance of the proposed density-ratio super learner in these two settings.
### Core problems of the paper
1. **Estimation of density ratio**:
   - The density ratio has a wide range of applications in statistics, especially in causal inference. For example, in mediation analysis, the density ratio is used to estimate the natural direct effect (NDE) and the natural indirect effect (NIE).
   - In longitudinal modified treatment policies, the density ratio is used to estimate the causal effect of the intervention.
2. **Construction of super learner**:
   - Super learning is a method for building an ensemble of machine-learning models in which the optimal combination of candidate models is determined by cross-validation.
   - The authors propose a new loss function designed specifically for density-ratio estimation, so that candidate density-ratio estimators can be compared and combined within the super learner.
3. **Simulation experiments**:
   - The authors evaluate their method in two simulation experiments. The first involves mediation analysis, and the second involves longitudinal modified treatment policies.
   - The experimental results showed that the proposed density-ratio super learner outperformed the baseline estimators in most cases, especially when the sample size was small.
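
The cross-validated selection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it builds a *discrete* super learner (it selects the single best candidate instead of a weighted combination), and it swaps in the standard least-squares density-ratio loss rather than the loss proposed in the paper. The two candidate estimators, the simulated Gaussian data, and all function names are hypothetical and chosen only for this example.

```python
import numpy as np

def lsif_loss(psi_num, psi_den):
    """Empirical least-squares density-ratio risk (assumed stand-in loss):
    0.5 * mean(psi^2 over denominator sample) - mean(psi over numerator sample)."""
    return 0.5 * np.mean(psi_den ** 2) - np.mean(psi_num)

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def fit_constant(x_num, x_den):
    """Hypothetical baseline candidate: always predicts a ratio of 1."""
    return lambda x: np.ones_like(x, dtype=float)

def fit_gaussian_plugin(x_num, x_den):
    """Hypothetical candidate: ratio of two fitted normal densities."""
    m1, s1 = x_num.mean(), x_num.std(ddof=1)
    m0, s0 = x_den.mean(), x_den.std(ddof=1)
    return lambda x: normal_pdf(x, m1, s1) / normal_pdf(x, m0, s0)

def cv_risks(x_num, x_den, candidates, n_folds=5, seed=0):
    """Average validation-fold risk of each candidate over K folds."""
    rng = np.random.default_rng(seed)
    folds_num = rng.permutation(len(x_num)) % n_folds
    folds_den = rng.permutation(len(x_den)) % n_folds
    risks = {name: 0.0 for name in candidates}
    for k in range(n_folds):
        tr_num, va_num = x_num[folds_num != k], x_num[folds_num == k]
        tr_den, va_den = x_den[folds_den != k], x_den[folds_den == k]
        for name, fit in candidates.items():
            psi_hat = fit(tr_num, tr_den)  # fit on the training folds only
            risks[name] += lsif_loss(psi_hat(va_num), psi_hat(va_den)) / n_folds
    return risks

# Simulated data (assumption): numerator N(0.5, 1), denominator N(0, 1).
rng = np.random.default_rng(1)
x_num = rng.normal(0.5, 1.0, size=5000)
x_den = rng.normal(0.0, 1.0, size=5000)

candidates = {"constant": fit_constant, "gaussian_plugin": fit_gaussian_plugin}
risks = cv_risks(x_num, x_den, candidates)
selected = min(risks, key=risks.get)  # discrete super learner: lowest CV risk
print(risks, selected)
```

A full super learner would instead search for the convex combination of candidates that minimizes the same cross-validated risk; the selector above is the simplest special case.
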
### Specific problems
- **Density-ratio parameter**:
  - It is defined as \(\psi_0(x_1, x_2)=\frac{p_0(x_1|x_2,\lambda = 1)}{p_0(x_1|x_2,\lambda = 0)}\), where \(x_1\) and \(x_2\) are values of random variables and \(\lambda\) is a discrete random variable whose values 1 and 0 index the numerator and denominator densities.
- **Applications in causal inference**:
  - **Mediation analysis**: Estimating the natural direct effect (NDE) and the natural indirect effect (NIE), where the density ratio is a key intermediate parameter.
  - **Longitudinal modified treatment policies**: Estimating the causal effect of the intervention, where the density ratio is also an important nuisance parameter.
- **Construction of super learner**:
  - Determine the optimal combination of candidate models through cross-validation, so that the super learner is chosen for its predictive performance on held-out data.
- **New loss function**:
  - The proposed loss function is given as \(L(O,\psi)=-I(\lambda = 1)\log\psi(x_1,x_2)+I(\lambda = 0)\log\psi(x_1,x_2)\), where \(O\) denotes an observation and \(I(\cdot)\) is the indicator function. According to the paper, this loss is valid for building super learners in the sense that the risk (the expected loss) is minimized at \(\psi=\psi_0\).
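
To make the density-ratio parameter \(\psi_0\) defined above concrete, the following Python sketch estimates a conditional density ratio by probabilistic classification and compares it with the known truth in a simulated model. This is a standard classification-based approach, not the estimator or the loss from the paper. It relies on Bayes' rule, which gives \(\psi_0(x_1,x_2)=\frac{P(\lambda=1|x_1,x_2)/P(\lambda=0|x_1,x_2)}{P(\lambda=1|x_2)/P(\lambda=0|x_2)}\). The data-generating model, the parameters `a`, `b`, `c`, and the helper `fit_logistic` are assumptions introduced only for this illustration.

```python
import numpy as np

def fit_logistic(features, labels, n_iter=25):
    """Logistic regression with intercept, fitted by Newton-Raphson."""
    X = np.column_stack([np.ones(len(labels)), features])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (labels - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

# Assumed simulation model (hypothetical parameters):
#   x2 ~ N(0, 1), lambda | x2 ~ Bernoulli(expit(a * x2)),
#   x1 | x2, lambda ~ N(b * lambda + c * x2, 1).
a, b, c = 0.8, 1.0, 0.5
rng = np.random.default_rng(0)
n = 20000
x2 = rng.normal(size=n)
lam = rng.binomial(1, 1.0 / (1.0 + np.exp(-a * x2)))
x1 = b * lam + c * x2 + rng.normal(size=n)

# Log-odds of lambda given (x1, x2) and given x2 alone.
beta_full = fit_logistic(np.column_stack([x1, x2]), lam)
beta_reduced = fit_logistic(x2, lam)
logit_full = beta_full[0] + beta_full[1] * x1 + beta_full[2] * x2
logit_reduced = beta_reduced[0] + beta_reduced[1] * x2

# Estimated log density ratio: difference of the two log-odds.
log_psi_hat = logit_full - logit_reduced

# True log ratio of N(b + c*x2, 1) to N(c*x2, 1) densities at x1.
log_psi_true = b * (x1 - c * x2) - 0.5 * b ** 2

mean_abs_error = np.mean(np.abs(log_psi_hat - log_psi_true))
print(mean_abs_error)
```

Both logistic models are correctly specified in this simulated setting, which is why a simple parametric classifier recovers the ratio well here; in practice that cannot be assumed, which is part of the motivation for combining several candidate estimators.
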
### Conclusion
The paper demonstrates the effectiveness of the proposed density-ratio super learner through theoretical analysis and simulation experiments. The method performs well in small samples and also provides reliable estimates in large samples. It can also be applied in other settings where a density ratio must be estimated, such as the covariate shift problem in machine learning.