Random Survival Forest for Censored Functional Data

Elvira Romano, Giuseppe Loffredo, Fabrizio Maturo
DOI: https://doi.org/10.48550/arXiv.2407.15340
2024-07-22
Abstract: This paper introduces a Random Survival Forest (RSF) method for functional data. The focus is specifically on defining a new functional data structure, the Censored Functional Data (CFD), for dealing with temporal observations that are censored due to study limitations or incomplete data collection. This approach allows for precise modelling of functional survival trajectories, leading to improved interpretation and prediction of survival dynamics across different groups. A medical survival study on the benchmark SOFA data set is presented. Results show good performance of the proposed approach, particularly in ranking the importance of predicting variables, as captured through dynamic changes in SOFA scores and patient mortality rates.
Methodology, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to handle censored functional data in survival analysis. Specifically, the authors propose a Random Survival Forest (RSF) method for functional data, built around a new data structure, Censored Functional Data (CFD), for temporal observations that are cut short by censoring. The method aims to improve the interpretation and prediction of survival dynamics across different groups by modelling functional survival trajectories accurately.

### Main problems:

1. **Handling of censored data**: In many clinical studies, temporal observations are censored because of study limitations or incomplete data collection. Traditional survival analysis methods struggle to handle such data effectively, especially when the time-varying process itself needs to be considered.
2. **Combining functional data analysis and survival analysis**: How to apply Functional Data Analysis (FDA) techniques to survival analysis so as to capture dynamic patterns in time-varying data and improve the accuracy of survival prediction.

### Solutions:

1. **Defining Censored Functional Data (CFD)**: The authors introduce the concept of Censored Functional Data, which reconstructs each curve only within its actual observation period rather than extending it across the whole follow-up period. This makes full use of the information available during the study period while avoiding extrapolation beyond the observed data.
2. **Using Functional Principal Component Analysis (FPCA)**: FPCA extracts the principal components of the functional data, capturing relationships and dynamic patterns in the curves and supporting more accurate and interpretable survival prediction.
3. **Constructing a Functional Random Survival Forest (FRSF)**: A random survival forest is built on the features extracted by FPCA to handle irregular functional data, especially in the presence of censoring. FRSF reduces the correlation between trees by drawing random subsets of observations and considering a random subset of features at each split node, which improves the generalization performance of the model.

### Application example:

The paper evaluates the proposed method on the benchmark SOFA data set. The results show that the method performs well, particularly in ranking the importance of predictor variables, as captured through dynamic changes in SOFA scores and patient mortality rates.

### Key contributions:

- **Innovative data structure**: Proposes a new functional data structure, Censored Functional Data (CFD).
- **Advanced analysis method**: Combines functional data analysis with random survival forests, providing a new method for handling censored functional data.
- **Practical application verification**: Demonstrates the effectiveness and practicality of the method on a real-world data set.

In conclusion, this paper addresses the modelling and prediction of censored functional data in survival analysis by introducing a new data structure and a corresponding analysis method, offering a useful tool for clinical research.
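
As a rough illustration of two ideas summarized above, the following Python sketch shows (a) a curve that is kept only over its observed window, with no extension to the full follow-up period, and (b) the two sources of randomness that decorrelate the trees of a forest. It is not the authors' implementation: the names `CensoredCurve`, `truncate_to_observed`, `bootstrap_rows`, and `candidate_features` are hypothetical, and the FPCA and tree-growing steps are deliberately left out.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class CensoredCurve:
    """Hypothetical container for one subject's trajectory.

    Only measurements inside the observed window are stored; nothing is
    extrapolated beyond the event or censoring time.
    """
    times: List[float]    # measurement times, ascending
    values: List[float]   # measured score at each time (e.g. a daily score)
    end_time: float       # time of event or censoring
    event: bool           # True = event observed, False = censored


def truncate_to_observed(times, values, end_time, event):
    """Keep only the measurements taken at or before end_time."""
    kept = [(t, v) for t, v in zip(times, values) if t <= end_time]
    return CensoredCurve(
        times=[t for t, _ in kept],
        values=[v for _, v in kept],
        end_time=end_time,
        event=event,
    )


def bootstrap_rows(n_subjects, rng):
    """Sample subject indices with replacement (assumed to happen once per tree)."""
    return [rng.randrange(n_subjects) for _ in range(n_subjects)]


def candidate_features(n_features, n_try, rng):
    """Pick n_try distinct feature indices (assumed to happen at each split node).

    In the setting summarized above, the features would be FPCA scores.
    """
    return sorted(rng.sample(range(n_features), n_try))


if __name__ == "__main__":
    rng = random.Random(0)
    # A subject followed on days 0..4 but censored at day 2.5.
    curve = truncate_to_observed([0, 1, 2, 3, 4], [5, 6, 7, 8, 9], 2.5, False)
    print(curve.times, curve.values)
    print(bootstrap_rows(5, rng), candidate_features(6, 2, rng))
```

Storing the window end and the event flag next to the truncated curve keeps the censoring information attached to each subject, so a downstream survival model can use both the curve features and the survival outcome.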