Information Recovery in a Study with Surrogate Endpoints

SX Chen, DHY Leung, J Qin
DOI: https://doi.org/10.1198/016214503000000972
2003-01-01
Journal of the American Statistical Association
Abstract: Recently, there has been considerable interest in statistical methods for analyzing data with surrogate endpoints. In this article, we consider parameter estimation for a model that relates a variable Y to a set of covariates X in the presence of a surrogate S. We assume that the data consist of two random samples from the population: a validation set, in which (Y, X, S) is observed on every subject, and a nonvalidation set, in which only (X, S) is measured. We show how information from the nonvalidation set can be incorporated to improve upon the estimate of a parameter P obtained from the validation data alone. The proposed method does not require knowledge of the joint distribution of (Y, S) given X. It is based on a two-sample empirical likelihood that simultaneously combines the estimating equations from the validation set and the nonvalidation set. This nonparametric likelihood formulation has several attractive features for inference on P. First, the maximum empirical likelihood estimator is more efficient than the estimator based on the validation sample alone. Second, confidence regions can be constructed directly, without estimating the variance-covariance matrix. Finally, the coverage of these confidence regions can be further improved by an empirical Bartlett correction based on the bootstrap. Simulation studies show that the method performs favorably.
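As background for the empirical-likelihood machinery the abstract relies on, the following is a minimal sketch of the classical one-sample empirical likelihood for a mean, where the estimating equation is simply g(x; mu) = x - mu. It is not the authors' two-sample procedure: it ignores the nonvalidation sample, the surrogate S, and the bootstrap Bartlett correction, and the function names are hypothetical. It does illustrate the second feature described above: the confidence interval is obtained by calibrating -2 log R(mu) against a chi-square(1) quantile, with no variance estimate.

```python
import math

# 95% quantile of the chi-square distribution with 1 degree of freedom
CHI2_1_95 = 3.841458820694124


def el_log_ratio(x, mu, tol=1e-12, max_iter=200):
    """Return -2 log R(mu), the empirical likelihood ratio statistic for a mean.

    The optimal weights are p_i = 1 / (n * (1 + lam * z_i)) with z_i = x_i - mu,
    where lam solves sum z_i / (1 + lam * z_i) = 0.
    """
    z = [xi - mu for xi in x]
    if min(z) >= 0 or max(z) <= 0:
        return math.inf  # mu lies outside the convex hull of the data

    def score(lam):
        # Strictly decreasing in lam on the admissible interval
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # Admissible lam keeps every 1 + lam * z_i > 0 (all weights positive)
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)  # midpoints stay strictly inside (lo, hi)
        if score(mid) > 0:
            lo = mid  # root lies to the right
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


def el_confidence_interval(x, cutoff=CHI2_1_95, tol=1e-10):
    """Invert the EL ratio test: {mu : -2 log R(mu) <= cutoff}.

    The region is an interval containing the sample mean, so each endpoint
    is found by bisection between the mean (inside) and a data extreme (outside).
    """
    xbar = sum(x) / len(x)

    def boundary(inside, outside):
        for _ in range(200):
            mid = 0.5 * (inside + outside)
            if el_log_ratio(x, mid) <= cutoff:
                inside = mid
            else:
                outside = mid
            if abs(outside - inside) < tol:
                break
        return inside

    return boundary(xbar, min(x)), boundary(xbar, max(x))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(el_confidence_interval(data))
```

The statistic is zero at the sample mean and grows to infinity as mu approaches the edge of the data, which is why bisection between the mean and the sample extremes is enough to locate each endpoint.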