Communication Deficits due to Alzheimer’s Disease: A Multimodal Vision-and-Language Analysis (Preprint)

Ziming Liu, Parker Collier, Eun Jin Paek, Si On Yoon, Devin Casenhiser, Wenjun Zhou, Skylar Simpson, Xiaopeng Zhao
DOI: https://doi.org/10.2196/preprints.40904
2022-01-01
Abstract:

BACKGROUND Referential communication refers to one’s capacity to successfully describe a target object or an idea to a conversational partner. Previous research has demonstrated that referential communication tasks (RCTs) combined with natural language processing techniques can achieve superior performance in detecting communication deficits in people with Alzheimer’s Disease (AD). A deeper understanding of the attributes that drive this enhanced performance on AD classification is therefore crucial.

OBJECTIVE In this study, we aim to apply machine learning to quantify the relevance of each descriptive expression produced by participants to the corresponding image, and to test whether this relevance score differs between people with AD and cognitively healthy older adults.

METHODS We used the Contrastive Language-Image Pre-training (CLIP) model to calculate the semantic association between the participants’ transcripts and the corresponding images being described. Statistical analyses were then conducted to examine the differences between people with AD and cognitively healthy older adults, as well as their communication performance in different experimental conditions.

RESULTS The relevance scores differed significantly between the two groups. Moreover, the scores varied significantly across experimental conditions in the cognitively healthy group, but not in the AD group.

CONCLUSIONS This is the first study on multimodal vision-language analysis of RCTs using CLIP. The study demonstrates the feasibility of applying multimodal vision-and-language analysis to differentiate semantic representation performance between cognitively healthy older adults and people with AD. The difference may be related to communication deficits in vision-language association. Further research is needed to evaluate the potential of using CLIP for automatic dementia screening using interactive image-based description tasks.
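The abstract does not say how the relevance score is computed or which statistical tests were used. As a rough illustration only, the sketch below assumes the score is the cosine similarity between a text embedding and an image embedding, which is the usual way CLIP-style models compare the two modalities. The function names (`cosine_similarity`, `relevance_scores`, `mean_group_difference`) and the toy vectors are hypothetical; real embeddings would come from CLIP's text and image encoders, and a proper significance test would replace the simple difference of means.

```python
import math
from statistics import mean


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


def relevance_scores(pairs):
    """Score each (text_embedding, image_embedding) pair.

    Assumption: the embeddings were produced elsewhere by a
    vision-language model; this function only compares them.
    """
    return [cosine_similarity(text, image) for text, image in pairs]


def mean_group_difference(scores_a, scores_b):
    """Difference of group means (descriptive only, not a significance test)."""
    return mean(scores_a) - mean(scores_b)


# Toy 3-dimensional placeholders, NOT real CLIP output or study data.
group_a_pairs = [([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
                 ([0.0, 1.0, 0.0], [0.0, 2.0, 0.0])]
group_b_pairs = [([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
                 ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0])]

scores_a = relevance_scores(group_a_pairs)
scores_b = relevance_scores(group_b_pairs)
difference = mean_group_difference(scores_a, scores_b)
```

A higher score means the description points in nearly the same direction as the image in the shared embedding space. Comparing per-participant scores between groups and across conditions would then be a separate statistical step.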