Effective Learning of Probabilistic Models for Clinical Predictions from Longitudinal Data

Shuo Yang
DOI: https://doi.org/10.48550/arXiv.1811.00749
2018-11-02
Abstract: With the rapid advancement of information technologies, health-related data present unprecedented potential for medical and health discoveries, but at the same time pose significant challenges for machine learning techniques in terms of both size and complexity. These challenges include: structured data with various storage formats and value types arising from heterogeneous data sources; the uncertainty that pervades every aspect of medical diagnosis and treatment; the high dimensionality of the feature space; longitudinal medical records with irregular intervals between adjacent observations; and the rich relations among objects with similar genetic factors, locations, or socio-demographic backgrounds. This thesis aims to develop advanced Statistical Relational Learning approaches to effectively exploit such health-related data and facilitate discoveries in medical research. It presents work on cost-sensitive statistical relational learning for mining structured imbalanced data, the first continuous-time probabilistic logic model for predicting sequential events from longitudinal structured data, and hybrid probabilistic relational models for learning from heterogeneous structured data. It also demonstrates the outstanding performance of these proposed models, as well as of other state-of-the-art machine learning models, when applied to medical research problems and other real-world large-scale systems, and reveals the great potential of statistical relational learning for exploring structured health-related data to facilitate medical research.
Machine Learning