Leveraging Social Determinants of Health in Alzheimer's Research Using LLM-Augmented Literature Mining and Knowledge Graphs

Tianqi Shang, Shu Yang, Weiqing He, Tianhua Zhai, Dawei Li, Bojian Hou, Tianlong Chen, Jason H. Moore, Marylyn D. Ritchie, Li Shen
2024-10-05
Abstract: Growing evidence suggests that social determinants of health (SDoH), a set of nonmedical factors, affect individuals' risks of developing Alzheimer's disease (AD) and related dementias. Nevertheless, the etiological mechanisms underlying such relationships remain largely unclear, mainly due to difficulties in collecting relevant information. This study presents a novel, automated framework that leverages recent advances in large language models (LLMs) and natural language processing techniques to mine SDoH knowledge from extensive literature and integrate it with AD-related biological entities extracted from the general-purpose knowledge graph PrimeKG. Utilizing graph neural networks, we performed link prediction tasks to evaluate the resultant SDoH-augmented knowledge graph. Our framework shows promise for enhancing knowledge discovery in AD and can be generalized to other SDoH-related research areas, offering a new tool for exploring the impact of social determinants on health outcomes. Our code is available at: https://github.com/hwq0726/SDoHenPKG
Artificial Intelligence, Computation and Language, Computers and Society, Machine Learning
What problem does this paper attempt to address?
This paper asks how social determinants of health (SDoH) influence the risk of Alzheimer's disease (AD) and related dementias, and what mechanisms link these factors to the biological processes of AD. Existing research indicates that SDoH affect AD risk, but the specific mechanisms remain unclear, primarily because the relevant data are difficult to collect. To address this, the study proposes an automated framework based on large language models (LLMs) and natural language processing techniques. The framework mines SDoH knowledge from a large body of literature and integrates it with AD-related biological entities in a knowledge graph to support AD research and knowledge discovery.

Specifically, the study aims to:

1. **Develop an automated pipeline**: Use pre-trained large language models (such as GPT-4) and natural language processing techniques to extract SDoH information from the literature.
2. **Construct an SDoH knowledge graph**: Integrate the extracted SDoH knowledge with an existing biomedical knowledge graph (such as PrimeKG) to form an SDoH-augmented knowledge graph.
3. **Evaluate the utility of the knowledge graph**: Use graph neural networks (GNNs) for link prediction tasks to assess how useful the SDoH-augmented knowledge graph is for AD research, particularly for discovering new connections and relationships.

Through these methods, the study aims to provide new tools and perspectives for etiological research on AD, especially for exploring the impact of social determinants on health outcomes.
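
To make the graph-construction and evaluation steps more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The triples are hand-written placeholders, not output of the paper's LLM pipeline and not drawn from PrimeKG. The helper names (`merge_triples`, `split_edges`, `common_neighbor_score`) are hypothetical. A simple common-neighbor heuristic stands in for the graph neural network, only to show the shape of a link prediction setup: merge the two triple sources, hold out some edges, and score candidate links using the remaining graph.

```python
import random
from collections import defaultdict

# Illustrative placeholder triples (assumption: not taken from PrimeKG).
BIO_TRIPLES = [
    ("APOE", "associated_with", "Alzheimer disease"),
    ("APP", "associated_with", "Alzheimer disease"),
    ("APOE", "interacts_with", "APP"),
    ("MAPT", "associated_with", "Alzheimer disease"),
]

# Illustrative placeholder triples (assumption: not real LLM-extracted output).
SDOH_TRIPLES = [
    ("education level", "affects", "Alzheimer disease"),
    ("social isolation", "affects", "Alzheimer disease"),
    ("social isolation", "affects", "depression"),
    ("depression", "associated_with", "Alzheimer disease"),
    ("Education Level", "affects", "alzheimer disease"),  # duplicate up to case
]


def normalize(name):
    """Lowercase and collapse whitespace so surface variants map to one node."""
    return " ".join(name.lower().split())


def merge_triples(*sources):
    """Union several triple lists, dropping duplicates after normalization."""
    seen = set()
    merged = []
    for source in sources:
        for head, relation, tail in source:
            key = (normalize(head), relation, normalize(tail))
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


def split_edges(triples, test_fraction, seed):
    """Shuffle deterministically and hold out a fraction of edges for testing."""
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def build_adjacency(triples):
    """Build an undirected neighbor map, ignoring relation types."""
    adjacency = defaultdict(set)
    for head, _, tail in triples:
        adjacency[head].add(tail)
        adjacency[tail].add(head)
    return adjacency


def common_neighbor_score(adjacency, u, v):
    """Score a candidate link by the number of neighbors shared by u and v."""
    return len(adjacency.get(u, set()) & adjacency.get(v, set()))


# Merge both sources, hold out edges, and score them using the training graph.
kg = merge_triples(BIO_TRIPLES, SDOH_TRIPLES)
train_edges, test_edges = split_edges(kg, test_fraction=0.25, seed=0)
train_adjacency = build_adjacency(train_edges)
for head, relation, tail in test_edges:
    score = common_neighbor_score(train_adjacency, head, tail)
    print(f"{head} -[{relation}]-> {tail}: score={score}")
```

In a fuller setup, the heuristic scorer would be replaced by a trained GNN, and held-out edges would be ranked against sampled negative edges. The sketch leaves out both.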