Abstract 4968: Discovery of cancer-associated factors in 293.5 million diagnosis records using statistical machine learning analysis

Md Ashad Alam, Nick Duesbery, Daniel Daniel Fort
DOI: https://doi.org/10.1158/1538-7445.am2024-4968
IF: 11.2
2024-03-31
Cancer Research
Abstract: Cancer, a leading global public health issue, is the second leading cause of death in the United States, with projections of 1,958,310 new cases and 609,820 deaths in 2023. Accurately quantifying the specific pre-disease diagnosis factors associated with cancer, and predicting cancer from disease trajectories and imaging modalities, remains a complex and formidable challenge. A comprehensive temporal analysis of cancer that draws on disease trajectories within large cohorts in the USA has not yet been developed. Our analysis used a comprehensive EHR dataset stretching back to 1999, encompassing 293,501,891 records from 3,019,978 patients within Ochsner Health, a large integrated health system across the State of Louisiana. At the ICD chapter level, patients diagnosed with diseases of the genitourinary system (ICD-10 Chapter 14, relative risk (RR)=1.65) or with endocrine, nutritional and metabolic diseases (obesity, ICD-10 Chapter 4, RR=1.35) had an elevated risk of being diagnosed with cancer in the subsequent five years. The top four individual ICD-10 diagnosis codes associated with increased risk were N60 (benign mammary dysplasia, RR=7.80), M34 (systemic sclerosis, RR=7.72), R43 (disturbances of smell and taste, RR=6.93), and R92 (abnormal and inconclusive findings on diagnostic imaging of the breast, RR=6.59). Furthermore, in our investigation of risk across 17 cancer types (breast, skin, prostate, lung, pancreatic, etc.), the diseases most commonly associated with breast and skin cancer involved immune mechanisms (RR=8.35 and 8.68, respectively), genitourinary issues (RR=4.13 and 2.48), obesity (RR=3.42 and 3.35), and cardiovascular conditions (ICD-10 Chapter 9, RR=1.73 and 2.19). For prostate cancer, associations were found with obesity (RR=3.50) and human immunodeficiency virus (HIV, ICD-10 Chapter 1, RR=2.39). For lung cancer, associations were found with chronic viral hepatitis (ICD-10 Chapter 1, RR=4.00), cardiovascular issues (RR=3.52), genitourinary problems (RR=4.13), and obesity (RR=3.42). Our comprehensive analysis has the potential to contribute to early detection, to knowledge of diagnosis trajectories, and to an improved understanding of the pathological processes underlying cancer in patients worldwide.
Citation Format: Md Ashad Alam, Nick Duesbery, Daniel Daniel Fort. Discovery of cancer-associated factors in 293.5 million diagnosis records using statistical machine learning analysis [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4968.
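
The abstract reports its findings as relative risks (RR) but does not say how they were estimated. As a rough illustration only, the sketch below uses the textbook cohort definition: the cancer risk among patients with a prior diagnosis divided by the risk among patients without it, with a log-method (Katz) confidence interval. The function names and all counts are hypothetical and are not taken from the Ochsner Health data; the authors' actual statistical machine learning pipeline may differ substantially.

```python
import math


def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Textbook cohort relative risk: risk in the exposed / risk in the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def relative_risk_ci(exposed_cases, exposed_total, unexposed_cases, unexposed_total, z=1.96):
    """Relative risk with an approximate 95% interval computed on the log scale."""
    rr = relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total)
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, chosen only so that RR comes out near 1.65:
    # 330 cancers within five years among 10,000 patients with a prior
    # diagnosis, versus 200 among 10,000 patients without it.
    rr, lower, upper = relative_risk_ci(330, 10_000, 200, 10_000)
    print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 would indicate that the elevated risk is unlikely to be due to sampling variation alone, under the assumptions of this simple unadjusted model. It does not account for confounding, follow-up time, or multiple comparisons across ICD codes.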
oncology