Sequence Analysis as an approach to characterize variables that unfold over time: implementation and practical considerations for epidemiologists

Lucia Pacca, Kristina Van Dang, Leah R Koenig, Catherine dP Duarte, S. Amina Gaye, Amal Harrati, Anusha M Vable
DOI: https://doi.org/10.1101/2024.06.18.24308957
2024-06-19
Abstract: Characterizing longitudinal trajectories of social exposures or health outcomes is a persistent challenge, but it can be accomplished with sequence analysis, a data-driven approach that can differentiate the timing, order, and duration of events. We present practical guidance on implementing sequence analysis for epidemiologists, with the goal of providing clear advice on decision points and tradeoffs. We introduce the three main steps of sequence analysis: (1) coding longitudinal processes as trajectories of ordered events for a set of individuals, (2) measuring dissimilarity between individual trajectories, and (3) performing cluster analysis to group similar trajectories. Each of these steps presents researchers with several decision points, such as data cleaning rules, options for evaluating sequence dissimilarity, and choices of clustering algorithms to group trajectories. After outlining each of the sequence analysis steps, we provide an applied example in which we create and group transition-to-retirement trajectories from ages 51 to 75 for a sample of 9,189 Health and Retirement Study participants using self-reported employment information, then estimate the association between transition-to-retirement groups and self-rated health. Our paper seeks to guide epidemiologists through the analytic decisions and implementation challenges of sequence analysis as this approach is increasingly implemented and undergoes methodological advances.
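The three steps named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy sequences and their state codes (W = working, P = part-time, R = retired), the constant optimal matching costs (substitution 2, insertion/deletion 1), and the use of average-linkage agglomerative clustering. The paper's own dissimilarity measure and clustering algorithm may differ.

```python
from itertools import combinations

# Step 1: code each person's trajectory as an ordered string of states,
# one character per year. Toy data with hypothetical state codes.
sequences = {
    "a": "WWWWRRRR",
    "b": "WWWRRRRR",
    "c": "WWPPPRRR",
    "d": "WWWWWWWW",
    "e": "WWWWWWWR",
}


def om_distance(s, t, sub_cost=2.0, indel_cost=1.0):
    """Optimal matching (edit) distance with constant, assumed costs."""
    prev = [j * indel_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * indel_cost]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + indel_cost,                        # delete a from s
                curr[j - 1] + indel_cost,                    # insert b into s
                prev[j - 1] + (0.0 if a == b else sub_cost)  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Step 2: pairwise dissimilarity between all trajectories.
dist = {
    frozenset((x, y)): om_distance(sequences[x], sequences[y])
    for x, y in combinations(sequences, 2)
}


def average_linkage(c1, c2):
    """Mean pairwise dissimilarity between two clusters of sequence ids."""
    total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def cluster(ids, k):
    """Step 3: merge the two closest clusters until only k remain."""
    clusters = [(i,) for i in ids]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda p: average_linkage(*p))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return clusters


groups = cluster(sorted(sequences), k=2)
print(groups)
```

In this toy example the trajectories that end in several years of retirement are grouped apart from those that stay in work almost throughout. With real data, the cost settings and the number of clusters are among the decision points the abstract refers to, and an established implementation would normally be preferred over hand-written code.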
Epidemiology