Temporal Pattern Mining for Analysis of Longitudinal Clinical Data: Identifying Risk Factors for Alzheimer's Disease

Annette Spooner, Gelareh Mohammadi, Perminder S. Sachdev, Henry Brodaty, Arcot Sowmya
DOI: https://doi.org/10.48550/arXiv.2209.04793
2022-09-11
Abstract: A novel framework is proposed for handling the complex task of modelling and analysis of longitudinal, multivariate, heterogeneous clinical data. This method uses temporal abstraction to convert the data into a more appropriate form for modelling, temporal pattern mining to discover patterns in the complex, longitudinal data, and machine learning models of survival analysis to select the discovered patterns. The method is applied to a real-world study of Alzheimer's disease (AD), a progressive neurodegenerative disease that has no cure. The patterns discovered were predictive of AD in survival analysis models with a Concordance index of up to 0.8. This is the first work that performs survival analysis of AD data using temporal data collections for AD. A visualisation module also provides a clear picture of the discovered patterns for ease of interpretability.
Machine Learning
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to effectively analyze complex longitudinal clinical data to identify risk factors for Alzheimer's disease (AD). Specifically, the researchers proposed a new framework, Clinical Temporal Pattern Mining (C-TPM), for processing and analyzing multivariate, heterogeneous longitudinal clinical data and discovering high-risk patterns related to Alzheimer's disease from it.

### The core challenges of the problem include:

1. **Data complexity**: Clinical data are usually multivariate, heterogeneous, and irregularly collected over time, resulting in sparse data points and uneven distribution.
2. **Data interpretability**: Any analysis results must be understandable and trusted by clinicians to ensure their practical application value.
3. **Rare pattern mining**: Since Alzheimer's disease is relatively rare, related patterns occur less frequently in the general population, and special methods are required to identify these rare but important patterns.

### Key elements of the solution:

1. **Temporal Abstraction**: Convert the original data into higher-level time concepts for better representation and comparison of different types of variables.
2. **Temporal Pattern Mining**: Use improved algorithms (such as C-TPM based on PrefixSpan) to mine patterns in time series, especially those with high relative risks.
3. **Survival Analysis Models**: Evaluate the predictive ability of the discovered patterns through machine-learning methods, especially in survival analysis.
4. **Visualisation Module**: Provide clear and easy-to-understand pattern displays to help clinicians understand the results.

### Application background:

This method was applied to two real-world Alzheimer's disease studies: the Sydney Memory and Ageing Study (MAS) and the Older Australian Twins Study (OATS).
Through these studies, C-TPM successfully discovered patterns predictive of Alzheimer's disease, with a Concordance index of up to 0.8, showing good predictive performance.

### Formula explanation:

- **Relative Risk (RR)**:
  \[ RR = \frac{P(\text{event} \mid \text{exposure})}{P(\text{event} \mid \text{no exposure})} \]
- **Odds Ratio (OR)**:
  \[ OR = \frac{\dfrac{P(\text{event} \mid \text{exposure})}{1 - P(\text{event} \mid \text{exposure})}}{\dfrac{P(\text{event} \mid \text{no exposure})}{1 - P(\text{event} \mid \text{no exposure})}} \]

These formulas are used to evaluate the strength of the association between patterns and Alzheimer's disease diagnosis, thereby helping to identify high-risk patterns.
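
As a minimal sketch of how the two measures above could be computed, the following Python snippet estimates each conditional probability from simple counts, treating "exposure" as a participant exhibiting a given temporal pattern and "event" as an AD diagnosis. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper or from the MAS/OATS data.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """RR = P(event | exposure) / P(event | no exposure), estimated from counts."""
    p_exposed = exposed_events / exposed_total
    p_unexposed = unexposed_events / unexposed_total
    if p_unexposed == 0:
        raise ValueError("RR is undefined when no unexposed participant has the event")
    return p_exposed / p_unexposed


def odds_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """OR = odds(event | exposure) / odds(event | no exposure), estimated from counts."""
    p_exposed = exposed_events / exposed_total
    p_unexposed = unexposed_events / unexposed_total
    if p_exposed == 1 or p_unexposed in (0, 1):
        raise ValueError("OR is undefined when a group has probability 0 or 1 here")
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


if __name__ == "__main__":
    # Hypothetical counts (illustration only, not study data):
    # 20 of 50 participants with the pattern were diagnosed,
    # 30 of 300 participants without the pattern were diagnosed.
    rr = relative_risk(20, 50, 30, 300)
    or_ = odds_ratio(20, 50, 30, 300)
    print(f"RR = {rr:.2f}")   # prints "RR = 4.00"
    print(f"OR = {or_:.2f}")  # prints "OR = 6.00"
```

With these made-up counts, P(event | exposure) = 0.4 and P(event | no exposure) = 0.1, so the pattern carries four times the risk, while the odds ratio is larger (6) because the event is not rare in the exposed group. The two measures only approximate each other when the event is rare in both groups.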