Type I Tobit Bayesian Additive Regression Trees for censored outcome regression

Eoghan O’Neill
DOI: https://doi.org/10.1007/s11222-024-10434-4
IF: 2.3241
2024-05-26
Statistics and Computing
Abstract: Censoring occurs when an outcome is unobserved beyond some threshold value. Methods that do not account for censoring produce biased predictions of the unobserved outcome. This paper introduces Type I Tobit Bayesian Additive Regression Tree (TOBART-1) models for censored outcomes. Simulation results and real data applications demonstrate that TOBART-1 produces accurate predictions of censored outcomes. TOBART-1 provides posterior intervals for the conditional expectation and other quantities of interest. The error term distribution can have a large impact on the expectation of the censored outcome. Therefore, the error is flexibly modeled as a Dirichlet process mixture of normal distributions. An R package is available at https://github.com/EoghanONeill/TobitBART.
statistics & probability, computer science, theory & methods
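The abstract's remark that the error distribution can strongly affect the expectation of the censored outcome can be illustrated with a small sketch. It assumes left-censoring at a threshold `a`, so the observed outcome is `max(a, f + eps)`, where `f` stands in for the sum-of-trees prediction. The function names are hypothetical and are not part of the TobitBART package. A fixed two-component normal mixture is used as a simple stand-in for the Dirichlet process mixture described in the paper.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its cdf and pdf


def censored_mean_normal(f, sigma, a=0.0):
    """E[max(a, f + eps)] for eps ~ N(0, sigma^2), i.e. left-censoring at a."""
    z = (f - a) / sigma
    return a + (f - a) * STD_NORMAL.cdf(z) + sigma * STD_NORMAL.pdf(z)


def censored_mean_mixture(f, components, a=0.0):
    """Same expectation when eps follows a finite normal mixture.

    components: list of (weight, mean, sd) triples (hypothetical format).
    Each component shifts the location by its mean, so the result is a
    weighted sum of single-normal censored means.
    """
    return sum(w * censored_mean_normal(f + m, s, a) for w, m, s in components)


def censored_mean_mc(f, components, a=0.0, n=200_000, seed=1):
    """Monte Carlo check of censored_mean_mixture."""
    rng = random.Random(seed)
    weights = [w for w, _, _ in components]
    total = 0.0
    for _ in range(n):
        _, m, s = rng.choices(components, weights=weights)[0]
        total += max(a, f + rng.gauss(m, s))
    return total / n


# Two error distributions with mean 0 and variance 1.25 (assumed values):
# a single normal, and a right-skewed two-component mixture.
normal_error = [(1.0, 0.0, 1.25 ** 0.5)]
skewed_error = [(0.9, -1.0 / 3.0, 0.5), (0.1, 3.0, 0.5)]

f_latent = -1.0  # assumed latent mean, below the censoring threshold 0
mean_normal = censored_mean_mixture(f_latent, normal_error)
mean_skewed = censored_mean_mixture(f_latent, skewed_error)
print(f"normal error: {mean_normal:.4f}")
print(f"skewed error: {mean_skewed:.4f}")
```

Both error distributions share the same mean and variance, yet the expected censored outcome differs, which is the motivation the abstract gives for modeling the error flexibly rather than assuming normality.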
What problem does this paper attempt to address?
The paper introduces a new method called Type I Tobit Bayesian Additive Regression Trees (TOBART-1) for handling censored outcome regression. Censoring occurs when the true value of an outcome is not fully observed, typically because it falls below or above a certain threshold. Standard regression models can produce biased predictions when dealing with censored data.

### Key Points

- **Problem Addressed:** The authors address the issue of biased predictions in censored outcome regression. They propose TOBART-1, which combines the Bayesian Type I Tobit model with Bayesian Additive Regression Trees (BART).
- **Methodology:**
  - TOBART-1 models the latent (uncensored) outcome as a sum of trees, allowing for nonlinear relationships between predictors and the outcome.
  - The error term is modeled flexibly using a Dirichlet process mixture of normal distributions to accommodate non-normal error structures.
  - The method provides posterior intervals for various quantities of interest, such as conditional expectations and probabilities of censoring.
- **Advantages Over Existing Methods:**
  - TOBART-1 accounts for model uncertainty and can model the error term non-parametrically.
  - It outperforms other methods like Tobit gradient boosted trees (Grabit), Tobit Gaussian Process models, and standard linear Tobit models.
  - Unlike methods that rely on cross-validation for parameter tuning, TOBART-1 does not require extensive tuning and handles uncertainty in the error term vari