Abstract: Residuals in normal regression are used to assess a model's goodness-of-fit (GOF) and discover directions for improving the model. However, there is a lack of residuals with a characterized reference distribution for censored regression. In this paper, we propose to diagnose censored regression with normalized randomized survival probabilities (RSP). The key idea of RSP is to replace the survival probability of a censored failure time with a uniform random number between 0 and the survival probability of the censored time. We prove that RSPs always have the uniform distribution on $(0,1)$ under the true model with the true generating parameters. Therefore, we can transform RSPs into normally distributed residuals with the normal quantile function. We call such residuals normalized RSP (NRSP) residuals. We conduct simulation studies to investigate the sizes and powers of statistical tests based on NRSP residuals in detecting an incorrect choice of distribution family and non-linear effects in covariates. Our simulation studies show that, although the GOF tests with NRSP residuals are not as powerful as a traditional GOF test method, a non-linear test based on NRSP residuals has significantly higher power in detecting non-linearity. We also compare these model diagnostic methods on a breast cancer recurrence-free time dataset. The results show that the NRSP residual diagnostics successfully capture a subtle non-linear relationship in the dataset, which is not detected by the graphical diagnostics with Cox-Snell (CS) residuals or by existing GOF tests.
What problem does this paper attempt to address?
This paper addresses the lack of adequate diagnostic methods for regression models fitted to censored data in survival analysis. With censored observations, existing residuals (such as Cox-Snell residuals) no longer have a well-characterized reference distribution, which makes effective model diagnostics difficult. To overcome this problem, the authors propose diagnosing censored regression models with Normalized Randomized Survival Probabilities (NRSP).
### Main problems:
1. **Limitations of existing methods**: With censored data, the traditional Cox-Snell residuals are no longer exponentially distributed, so they lack a clear reference distribution for model diagnostics.
2. **Lack of effective diagnostic tools**: Existing diagnostic methods perform poorly in detecting non-linear covariate effects and other model misspecifications.
### Solutions:
1. **Introducing Randomized Survival Probabilities (RSP)**: By replacing the survival probability of the censoring time with a uniform random number between 0 and the survival probability of this censoring time, RSP always follows a uniform distribution on (0, 1) under the true model.
2. **Normalizing RSP (NRSP)**: Transform RSP into residuals of a standard normal distribution through the normal quantile function, so that rich normal regression diagnostic tools can be used for model diagnostics.
### Research methods:
1. **Theoretical proof**: It is proved that RSP follows a uniform distribution on (0, 1) under the true model, and the normalized NRSP residuals follow a standard normal distribution.
2. **Simulation study**: Simulations evaluate the size and power of statistical tests based on NRSP residuals in detecting an incorrect choice of distribution family and non-linear covariate effects.
3. **Practical application**: The NRSP residual diagnostic method is applied to the breast cancer recurrence time data set, demonstrating its effectiveness in actual data.
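The uniformity claim in item 1 can be motivated by a short calculation. This is only a sketch under assumptions that the summary above does not spell out: the survival function \( S_i \) is continuous and strictly decreasing, the censoring time \( C_i \) is independent of the latent failure time given the covariates, and \( U_i \) is independent of both. The symbols \( T_i^* \) (latent failure time), \( C_i \), \( s \) and \( u \) are notation introduced here for illustration, and the paper's own proof may proceed differently. Write \( S_R \) for the randomized survival probability, which equals \( S_i(T_i^*) \) for an uncensored observation and \( U_i S_i(C_i) \) for a censored one.

\[
\begin{aligned}
&S_i(T_i^*) \sim \text{Uniform}(0,1) \quad \text{(probability integral transform)},\\
&T_i^* > c \iff S_i(T_i^*) < s, \quad \text{where } s = S_i(c),\\
&P(S_R \le u \mid C_i = c)
 = P\big(s \le S_i(T_i^*) \le u\big) + P\big(S_i(T_i^*) < s\big)\,P(U_i s \le u)\\
&\qquad = \max(u - s,\, 0) + s \cdot \min\!\left(\tfrac{u}{s},\, 1\right)
 = \max(u - s,\, 0) + \min(u,\, s) = u, \qquad 0 < u < 1 .
\end{aligned}
\]

Since the result does not depend on \( c \), the randomized survival probability is Uniform(0, 1) unconditionally as well. Intuitively, a censored observation only reveals that \( S_i(T_i^*) \) lies somewhere below \( s \), and \( U_i s \) is a draw from exactly that conditional distribution.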
### Main conclusions:
1. **Effectiveness of NRSP residuals**: In the simulations, GOF tests based on NRSP residuals are less powerful than a traditional GOF test method, but a non-linearity test based on NRSP residuals has significantly higher power in detecting non-linear covariate effects.
2. **Effect in practical application**: In the breast cancer recurrence-free time dataset, NRSP residual diagnostics capture a subtle non-linear relationship that is not detected by graphical diagnostics with Cox-Snell residuals or by existing GOF tests.
### Formula presentation:
- **Randomized Survival Probabilities (RSP)**:
\[
S_R(T_i, d_i, U_i)=\begin{cases}
S_i(T_i) & \text{if } T_i \text{ is uncensored, i.e., } d_i = 1\\
U_iS_i(T_i) & \text{if } T_i \text{ is censored, i.e., } d_i = 0
\end{cases}
\]
where \( U_i \) is a uniform random number on (0, 1), and \( S_i(T_i) \) is the assumed survival function given covariates \( x_i \).
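The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `randomized_survival_probabilities`, the exponential toy model, and all rates and sample sizes are assumptions made here. The survival probabilities are evaluated under the true generating model, whereas in practice they would come from a fitted model.

```python
import numpy as np


def randomized_survival_probabilities(surv_prob, event, rng=None):
    """Compute S_R for each observation (hypothetical helper).

    surv_prob : S_i(T_i), the assumed survival probability at the observed time
    event     : d_i, 1 if the time is uncensored and 0 if it is censored
    """
    rng = np.random.default_rng() if rng is None else rng
    surv_prob = np.asarray(surv_prob, dtype=float)
    event = np.asarray(event, dtype=int)
    u = rng.uniform(0.0, 1.0, size=surv_prob.shape)  # U_i ~ Uniform(0, 1)
    # Uncensored: keep S_i(T_i). Censored: draw uniformly on (0, S_i(T_i)).
    return np.where(event == 1, surv_prob, u * surv_prob)


# Toy data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 2000
rate = 0.5                                      # true exponential failure rate
failure = rng.exponential(scale=1.0 / rate, size=n)
censor = rng.exponential(scale=1.0 / 0.3, size=n)
time = np.minimum(failure, censor)              # observed time T_i
event = (failure <= censor).astype(int)         # d_i
surv = np.exp(-rate * time)                     # S_i(T_i) under the true model

rsp = randomized_survival_probabilities(surv, event, rng)
print("censoring fraction:", 1 - event.mean())
print("mean of RSP (a Uniform(0, 1) variable has mean 0.5):", rsp.mean())
```

Because the values are randomized, repeated calls give different results for censored observations; passing a seeded generator makes a run reproducible.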
- **Normalized RSP residuals (NRSP)**:
\[
r_{\text{NRSP}}(T_i, d_i, U_i)=\Phi^{-1}(S_R(T_i, d_i, U_i))
\]
where \( \Phi^{-1} \) is the inverse cumulative distribution function of the standard normal distribution.
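As a rough illustration of the transformation, the sketch below computes NRSP residuals for simulated Weibull data and inspects them with a Shapiro-Wilk normality test. Everything specific here is an assumption made for the example: the helper name `nrsp_residuals`, the Weibull parameters, the censoring distribution, and the choice of Shapiro-Wilk as the check. The clipping step is a numerical safeguard added here so that \( \Phi^{-1} \) never returns an infinite value; it is not part of the formula above. The survival probabilities use the true generating parameters, while a real analysis would plug in estimates from a fitted model.

```python
import numpy as np
from scipy.stats import norm, shapiro, weibull_min


def nrsp_residuals(surv_prob, event, rng=None):
    """Normalized randomized survival probabilities (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    surv_prob = np.asarray(surv_prob, dtype=float)
    event = np.asarray(event, dtype=int)
    u = rng.uniform(size=surv_prob.shape)
    # RSP: S_i(T_i) if uncensored, U_i * S_i(T_i) if censored.
    rsp = np.where(event == 1, surv_prob, u * surv_prob)
    # Safeguard (an addition of this sketch): keep values strictly inside (0, 1).
    eps = np.finfo(float).eps
    return norm.ppf(np.clip(rsp, eps, 1.0 - eps))


# Simulated censored Weibull data (illustrative values only).
rng = np.random.default_rng(1)
n = 2000
shape, scale = 2.0, 1.5
failure = weibull_min.rvs(c=shape, scale=scale, size=n, random_state=rng)
censor = rng.exponential(scale=3.0, size=n)
time = np.minimum(failure, censor)
event = (failure <= censor).astype(int)

# Survival probabilities under the true model.
surv = weibull_min.sf(time, c=shape, scale=scale)
resid = nrsp_residuals(surv, event, rng)

stat, pvalue = shapiro(resid)
print("mean:", resid.mean(), "sd:", resid.std())
print("Shapiro-Wilk statistic:", stat, "p-value:", pvalue)
```

Under a correctly specified model the residuals should look approximately standard normal, so a normal Q-Q plot or a plot of residuals against a covariate can then be read as in ordinary normal regression.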
With these methods, the authors provide a new diagnostic tool for censored regression models whose main strength, according to the reported results, lies in detecting non-linear covariate effects.