Abstract: Real-time individual endpoint prediction has always been a challenging task, but it is of great clinical utility for both patients and healthcare providers. Using 6,879 chronic kidney disease stage 4 (CKD4) patients as a use case, we explored the feasibility and performance of gated recurrent units with decay that model the Weibull probability density function (GRU-D-Weibull) as a semi-parametric longitudinal model for real-time individual endpoint prediction. GRU-D-Weibull reaches a maximum C-index of 0.77 at 4.3 years of follow-up, compared to 0.68 achieved by competing models. At the CKD4 index date, the L1-loss of GRU-D-Weibull is ~66% of that of XGB(AFT), ~60% of that of MTLR, and ~30% of that of the AFT model. The average absolute L1-loss of GRU-D-Weibull is around one year, with a minimum of 40% Parkes serious error after the index date. GRU-D-Weibull is not calibrated and significantly underestimates the true survival probability. Feature importance tests indicate that blood pressure becomes increasingly important during follow-up, while eGFR and blood albumin become less important. Most continuous features have a non-linear, roughly parabolic impact on predicted survival time, and the results are generally consistent with existing knowledge. As a semi-parametric temporal model, GRU-D-Weibull shows advantages in its built-in parameterization of missingness, native support for asynchronously arriving measurements, the capability to output both probability and point estimates at an arbitrary time point for an arbitrary prediction horizon, and improved discrimination and point estimate accuracy after incorporating newly arrived data. Further research on its performance with more comprehensive input features, and on in-process or post-process calibration, is warranted to benefit CKD4 patients and similar terminally ill patients.
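As an illustration of how a pair of Weibull parameters can yield both a survival probability and a point estimate for any horizon, the following is a minimal sketch using the standard two-parameter Weibull formulas. The function names and the parameter values are hypothetical, and the use of the median as the point estimate is an assumption; the abstract does not state which point estimate the model reports or how the network produces its parameters.

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Survival probability S(t) = exp(-(t / scale) ** shape) for t >= 0."""
    return math.exp(-((t / scale) ** shape))


def weibull_median(shape: float, scale: float) -> float:
    """Median survival time: the t at which S(t) = 0.5."""
    return scale * math.log(2.0) ** (1.0 / shape)


if __name__ == "__main__":
    # Hypothetical parameters standing in for one patient's model output
    # at one time step (not values from the paper).
    shape, scale = 1.5, 4.0

    for horizon_years in (1.0, 2.0, 5.0):
        p = weibull_survival(horizon_years, shape, scale)
        print(f"P(survive beyond {horizon_years:.0f} y) = {p:.3f}")

    print(f"Median survival time = {weibull_median(shape, scale):.2f} y")
```

Because the survival function is available in closed form, the same two parameters answer queries for any prediction horizon without re-running the model.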

Random Bits Regression: a Strong General Predictor for Big Data

Random Bits Forest: a Strong Classifier/Regressor for Big Data

A Non-Parametric Method for Building Predictive Genetic Tests on High-Dimensional Data

A Random-effects Approach to Regression Involving Many Categorical Predictors and Their Interactions

Stock Price Prediction Based on Optimized Random Forest Model.

ON BLOCKWISE AND REFERENCE PANEL-BASED ESTIMATORS FOR GENETIC DATA PREDICTION IN HIGH DIMENSIONS

X-RIM: Extreme Recurrent Independent Mechanisms for Noise-resistant and Interpretable Stroke Risk Prediction

The relation of sulfhydryl groups in ferritin to its vasodepressor activity.

On block-wise and reference panel-based estimators for genetic data prediction in high dimensions

Bank Financial Risk Prediction Model Based on Big Data

A Framework for Enhancing Stock Investment Performance by Predicting Important Trading Points with Return-Adaptive Piecewise Linear Representation and Batch Attention Multi-Scale Convolutional Recurrent Neural Network

Random feature baselines provide distributional performance and feature selection benchmarks for clinical and 'omic machine learning

Parallel matrix factorization for binary response

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Discrimination, calibration, and point estimate accuracy of GRU-D-Weibull architecture for real-time individualized endpoint prediction

Robust and efficient subsampling algorithms for massive data logistic regression

RNN-BOF: A Multivariate Global Recurrent Neural Network for Binary Outcome Forecasting of Inpatient Aggression

Logistic Regression Bias Correction for Large Scale Data with Rare Events

An Examination of On-Line Machine Learning Approaches for Pseudo-Random Generated Data

Optimization Algorithm of Intelligent Warehouse Management System Based on Reinforcement Learning

Stable Prediction Across Unknown Environments