328 Data Driven Approach Identifies Hidradenitis Suppurativa Subtypes in Electronic Health Records

A. Bell, K. Babbush, A. Khan, M. Hayes, J. Connolly, F. Mentch, P. Sleiman, H. Hakonarson, E. Mukherjee, G. Hripcsak, K. Kiryluk, C. Weng, S. Cohen, L. Wheless, L. Petukhova
DOI: https://doi.org/10.1016/j.jid.2021.02.350
2021-01-01
Abstract: Hidradenitis suppurativa (HS) is a prevalent inflammatory skin disease that is associated with a high burden of comorbidities. Heterogeneity in clinical presentation, coupled with variation in treatment response, suggests that people who share an HS diagnosis may have different biological causes of disease. Such hidden etiological heterogeneity creates inefficiencies in healthcare and attenuates power in clinical trials and research studies. While previous studies have attempted to identify HS subtypes on the basis of characteristics primarily related to skin lesions, our group hypothesizes that distributions of comorbidities can be used to identify medically relevant HS subtypes. We implemented machine learning algorithms to investigate comorbidity patterns using longitudinal data from the eMERGE consortium (project NT227), comprising 368,331 diagnosis codes from 668 HS participants and capturing on average 16 years of observations per person. Using a tensor factorization (TF) method, we identified five disease subtypes, each characterized by the onset of a different set of diseases prior to an initial HS diagnosis: (1) neuropsychiatric, (2) joint, (3) metabolic, (4) cardiopulmonary, and (5) obesity and acne. We next leveraged the feature weighting scheme identified by TF to develop subtype phenotype scores (SPS) for research participants. Unsupervised clustering of participant SPS, both in the eMERGE cohort and in an independent cohort, indicates that most HS participants can be assigned to a single subtype. This work suggests that patterns of HS comorbidities can be used to identify disease subtypes. Future studies will aim to determine the biological and clinical relevance of these subtypes.
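The abstract does not specify how the subtype phenotype scores are computed from the TF feature weights. As an illustration only, the sketch below assumes a simple scheme: each subtype has a non-negative weight per diagnosis code, a participant's score for a subtype is the sum of the weights of the distinct codes recorded before the first HS diagnosis, and the participant is assigned to the highest-scoring subtype. The weight values, the placeholder codes (`dx_01` and so on), and the function names are all hypothetical; only the subtype labels come from the abstract.

```python
from typing import Dict, List, Optional

# Hypothetical feature weights: WEIGHTS[subtype][diagnosis_code] -> loading.
# In the study these would come from the tensor factorization; the numbers
# and the placeholder codes below are invented for illustration.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "neuropsychiatric": {"dx_01": 0.8, "dx_02": 0.6},
    "joint": {"dx_03": 0.9},
    "metabolic": {"dx_04": 0.9, "dx_05": 0.5},
    "cardiopulmonary": {"dx_06": 0.7, "dx_07": 0.4},
    "obesity_and_acne": {"dx_08": 0.7, "dx_09": 0.7},
}


def subtype_scores(pre_hs_codes: List[str]) -> Dict[str, float]:
    """Sum the weights of a participant's distinct pre-HS codes, per subtype."""
    codes = set(pre_hs_codes)  # repeated codes are counted once (an assumption)
    return {
        subtype: sum(w for code, w in loadings.items() if code in codes)
        for subtype, loadings in WEIGHTS.items()
    }


def dominant_subtype(scores: Dict[str, float]) -> Optional[str]:
    """Return the highest-scoring subtype, or None if every score is zero."""
    best = max(scores, key=lambda subtype: scores[subtype])
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    participant = ["dx_01", "dx_02", "dx_08", "dx_01"]
    scores = subtype_scores(participant)
    print(scores)
    print(dominant_subtype(scores))
```

The study itself applies unsupervised clustering to the scores rather than a direct maximum; the maximum is used here only to keep the example short.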