Addressing Multiple Detection Limits with Semiparametric Cumulative Probability Models

Yuqi Tian, Chun Li, Shengxin Tu, Nathan T. James, Frank E. Harrell, Bryan E. Shepherd
DOI: https://doi.org/10.1080/01621459.2024.2315667
IF: 4.369
2024-02-13
Journal of the American Statistical Association
Abstract: Detection limits (DLs), where a variable cannot be measured outside of a certain range, are common in research. DLs may vary across study sites or over time. Most approaches to handling DLs in response variables implicitly make strong parametric assumptions on the distribution of data outside DLs. We propose a new approach to deal with multiple DLs based on a widely used ordinal regression model, the cumulative probability model (CPM). The CPM is a rank-based, semiparametric linear transformation model that can handle mixed distributions of continuous and discrete outcome variables. These features are key for analyzing data with DLs because while observations inside DLs are continuous, those outside DLs are censored and generally put into discrete categories. With a single lower DL, CPMs assign values below the DL as having the lowest rank. With multiple DLs, the CPM likelihood can be modified to appropriately distribute probability mass. We demonstrate the use of CPMs with DLs via simulations and a data example. This work is motivated by a study investigating factors associated with HIV viral load 6 months after starting antiretroviral therapy in Latin America; 56% of observations are below lower DLs that vary across study sites and over time. Supplementary materials for this article are available online including a standardized description of the materials available for reproducing the work.
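The single lower DL case in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not cover the modified likelihood for multiple DLs. It only shows how every value below one DL can be treated as tied at the lowest rank, which is the ordering a rank-based model would see. The function name `ranks_with_lower_dl` and the example values are hypothetical.

```python
def ranks_with_lower_dl(values, dl):
    """Return average ranks, treating every value below `dl` as tied at the lowest rank.

    Illustrative only: assumes a single lower detection limit shared by all observations.
    """
    # Observations below the detection limit collapse into one common category.
    censored = [v < dl for v in values]
    # Sort key: the censored category comes first (all tied), then observed values.
    keys = [(0, 0.0) if c else (1, float(v)) for v, c in zip(values, censored)]
    order = sorted(range(len(values)), key=lambda i: keys[i])

    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of observations tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and keys[order[end + 1]] == keys[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average 1-based rank for the tie group
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


# Hypothetical viral load values with a lower DL of 50.
example_ranks = ranks_with_lower_dl([30, 500, 10, 1200, 45], dl=50)
```

In this example the three values below 50 share one rank, regardless of whatever number was recorded for them, while the two detected values keep their relative order.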
statistics & probability