Abstract: We construct a semiparametric estimator in case-control studies where the gene and the environment are assumed to be independent. A discrete or continuous parametric distribution of the genes is assumed in the model. A discrete distribution of the genes can be used to model the mutation or presence of a certain group of genes. A continuous distribution allows the distribution of the gene effects to be in a finite-dimensional parametric family and can hence be used to model the gene expression levels. We leave the distribution of the environment totally unspecified. The estimator is derived through calculating the efficient score function in a hypothetical setting where a close approximation to the samples is random. The resulting estimator is proved to be efficient in the hypothetical situation. The efficiency of the estimator is further demonstrated to hold in the case-control setting as well.
What problem does this paper attempt to address?
This paper aims to construct a semiparametrically efficient estimator of the effects of genes and the environment on disease status in a case-control study. Specifically, the author assumes that in the general population the occurrence of the disease (\(D = 1\)) follows a logistic model \(\text{logit}\{P(D = 1 \mid G, E)\}=m(G, E)\), where \(G\) represents an individual's genetic characteristics and \(E\) represents environmental factors. \(G\) and \(E\) are further assumed to be independent. The quantities of interest are the effects of the genes, the environment, and their interaction on disease status, so the model is written as \(m(g, e)=\beta_c+\beta_1g+\beta_2e+\beta_3ge\). The paper assumes that the parametric form of the gene distribution \(q(g,\beta_4)\) is known, where \(\beta_4\) is an unknown finite-dimensional parameter, while the environmental distribution \(\eta(e)\) is left completely unspecified.
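To make the setup concrete, here is a minimal simulation sketch of the model described above. It is an illustration, not the paper's procedure. Several choices are assumptions made only for the example: the function name `simulate_case_control`, a Bernoulli gene distribution with carrier probability `p_gene` standing in for \(q(g,\beta_4)\), a standard normal environment standing in for the unspecified \(\eta(e)\), and the finite population size `pop_size`. The sketch draws a population in which \(G\) and \(E\) are independent, generates \(D\) from the logistic model, and then samples fixed numbers of cases and controls.

```python
import numpy as np


def simulate_case_control(n_cases, n_controls, beta, p_gene,
                          pop_size=200_000, seed=0):
    """Draw a case-control sample from a simulated population.

    beta = (beta_c, beta_1, beta_2, beta_3) are the coefficients of
    m(g, e) = beta_c + beta_1*g + beta_2*e + beta_3*g*e.
    p_gene plays the role of beta_4 for an assumed Bernoulli gene model.
    """
    rng = np.random.default_rng(seed)
    beta_c, beta_1, beta_2, beta_3 = beta

    # Gene and environment are drawn independently of each other.
    g = rng.binomial(1, p_gene, size=pop_size)  # assumed form of q(g, beta_4)
    e = rng.normal(size=pop_size)               # assumed stand-in for eta(e)

    # Disease status follows the logistic model logit P(D=1 | G, E) = m(G, E).
    m = beta_c + beta_1 * g + beta_2 * e + beta_3 * g * e
    d = rng.binomial(1, 1.0 / (1.0 + np.exp(-m)))

    # Case-control sampling: fixed numbers of diseased and non-diseased subjects.
    case_idx = rng.choice(np.flatnonzero(d == 1), size=n_cases, replace=False)
    control_idx = rng.choice(np.flatnonzero(d == 0), size=n_controls, replace=False)
    idx = np.concatenate([case_idx, control_idx])
    return d[idx], g[idx], e[idx]


# Example: 500 cases and 500 controls with illustrative parameter values.
d, g, e = simulate_case_control(500, 500, beta=(-3.0, 0.5, 0.3, 0.2), p_gene=0.3)
```

Because cases and controls are sampled separately, the resulting observations are not a random sample from the population, which is why the paper works through a hypothetical setting in which the samples can be treated as random.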
The main contributions of the paper are as follows:
1. **Theoretical contribution**: The paper shows that classical semiparametric theory can be applied to case-control studies without re-deriving the theory or relying on results for samples that are not independent and identically distributed.
2. **Methodological contribution**: The paper calculates the semiparametric efficient score function and uses it to construct a semiparametric estimator. This estimator is proved to be efficient in the hypothetical setting and is shown to remain efficient in the case-control setting.
3. **Practical application**: Numerical examples illustrate the performance of the proposed estimator and compare it with existing methods. In particular, its performance in the discrete gene model is close to that of the estimator of Chatterjee and Carroll (2005).
Overall, this paper aims to provide an effective statistical method for dealing with gene-environment interactions in case-control studies, thereby improving the efficiency and accuracy of estimation.