Clustering of timed sequences -- Application to the analysis of care pathways

Thomas Guyet, Pierre Pinson, Enoal Gesny
2024-10-18
Abstract: Improving the future of healthcare starts by better understanding the current actual practices in hospital settings. This motivates the objective of discovering typical care pathways from patient data. Revealing typical care pathways can be achieved through clustering. The difficulty in clustering care pathways, represented by sequences of timestamped events, lies in defining a semantically appropriate metric and clustering algorithms. In this article, we adapt two methods developed for time series to the clustering of timed sequences: the drop-DTW metric and the DBA approach for the construction of averaged time sequences. These methods are then applied in clustering algorithms to propose original and sound clustering algorithms for timed sequences. This approach is experimented with and evaluated on synthetic and real-world data.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the problem of clustering sequences of timestamped events (timed sequences) in order to analyze care pathways. Specifically, the authors focus on defining a semantically appropriate metric and matching clustering algorithms that reveal typical care pathways. This involves two main challenges:

1. **Defining an appropriate metric**: A care pathway is a sequence of timestamped events. Both the event types and their timing vary widely, and the number of events differs from one patient to another, so a fixed-length vector-space representation does not apply. The metric must account for both the order of events and the time intervals between them.
2. **Constructing an averaged timed sequence**: Centroid-based clustering algorithms need a way to compute the average of a set of timed sequences. The average is required for the algorithm to run, and it also provides a representative sequence for interpreting each cluster.

To address these challenges, the authors propose the following methods:

- **Adapting the drop-DTW metric**: Dynamic time warping (DTW) is extended into the drop-DTW metric, which allows events that have no meaningful counterpart to be ignored (dropped) during alignment. The adapted metric handles both the order of events and the time intervals between them.
- **An averaging algorithm based on drop-DTW**: An algorithm similar to DBA (DTW Barycenter Averaging) computes the average of a set of timed sequences, producing a representative sequence that preserves the temporal order of events.

With these methods, the authors aim to identify and interpret typical patterns in care pathways more accurately, as a step toward improving future healthcare practices.
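To make the idea of alignment with dropped events concrete, the following is a minimal sketch of a drop-enabled DTW distance between two timed sequences. It is not the authors' formulation: the function name `drop_dtw_distance`, the representation of events as `(symbol, timestamp)` pairs, the constant `drop_cost`, and the choice of an infinite cost for mismatched symbols are all assumptions made for illustration. It also uses a simplified single-table recurrence rather than the state-tracking tables of the published Drop-DTW algorithm.

```python
from math import inf


def drop_dtw_distance(a, b, drop_cost=1.0):
    """Simplified drop-enabled DTW between two timed sequences.

    a, b: lists of (symbol, timestamp) pairs, sorted by timestamp.
    drop_cost: hypothetical constant penalty for ignoring one event.
    Matching two events costs the absolute time gap when the symbols
    agree, and is forbidden (infinite cost) otherwise.
    """
    n, m = len(a), len(b)
    # D[i][j] = best cost of aligning the first i events of a
    # with the first j events of b.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        D[i][0] = i * drop_cost  # every event of the prefix of a is dropped
    for j in range(1, m + 1):
        D[0][j] = j * drop_cost  # every event of the prefix of b is dropped

    for i in range(1, n + 1):
        sym_a, t_a = a[i - 1]
        for j in range(1, m + 1):
            sym_b, t_b = b[j - 1]
            match = abs(t_a - t_b) if sym_a == sym_b else inf
            D[i][j] = min(
                # classic DTW moves: match the two current events
                match + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]),
                # drop moves: ignore the current event of a or of b
                drop_cost + D[i - 1][j],
                drop_cost + D[i][j - 1],
            )
    return D[n][m]


# Two toy pathways; the second one contains an extra event "X".
p1 = [("A", 0), ("B", 2), ("C", 5)]
p2 = [("A", 0), ("X", 1), ("B", 3), ("C", 5)]
print(drop_dtw_distance(p1, p2))
```

In this toy example the extra event `X` can only be dropped, and the two `B` events are matched despite a one-unit time gap. A distance of this kind could then be passed to a clustering algorithm that accepts a custom or precomputed dissimilarity; how the averaged sequence is built on top of it is left out of this sketch.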