The SPEAK study rationale and design: A linguistic corpus-based approach to understanding thought disorder
J M M Bayer, J Spark, M Krcmar, M Formica, K Gwyther, A Srivastava, A Selloni, M Cotter, J Hartmann, A Polari, Z R Bilgrami, C Sarac, A Lu, Alison R Yung, A McGowan, P McGorry, J L Shah, G A Cecchi, R Mizrahi, B Nelson, C M Corcoran
DOI: https://doi.org/10.1016/j.schres.2022.12.048
Abstract: Aim: Psychotic symptoms are typically measured using clinical ratings, but more objective and sensitive metrics are needed. Hence, we will assess thought disorder using the Research Domain Criteria (RDoC) heuristic for language production, and its recommended paradigm of "linguistic corpus-based analyses of language output". Positive thought disorder (e.g., tangentiality and derailment) can be assessed using word-embedding approaches that measure semantic coherence, whereas negative thought disorder (e.g., concreteness, poverty of speech) can be assessed using part-of-speech (POS) tagging to measure syntactic complexity. We aim to establish convergent validity of automated linguistic metrics with clinical ratings, assess normative demographic variance, determine cognitive and functional correlates, and replicate their predictive power for psychosis transition among at-risk youths. Methods: This study will assess language production in 450 English-speaking individuals in Australia and Canada who have recent-onset psychosis, are at clinical high risk (CHR) for psychosis, or are healthy volunteers, all well characterized for cognition, function and symptoms. Speech will be elicited using open-ended interviews. Audio files will be transcribed and preprocessed for automated natural language processing (NLP) analyses of coherence and complexity. Data analyses include canonical correlation, multivariate linear regression with regularization, and machine-learning classification of group status and psychosis outcome. Conclusions: This prospective study aims to characterize language disturbance across stages of psychosis using computational approaches, including psychometric properties, normative variance and clinical correlates, which are important for biomarker development. SPEAK will create a large archive of language data available to other investigators, a rich resource for the field.
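As an illustration of the word-embedding approach to semantic coherence mentioned in the abstract, the following minimal sketch scores a transcript by the mean cosine similarity between consecutive sentence vectors. It is not the SPEAK pipeline: the three-dimensional `TOY_EMBEDDINGS` table, the function names, and the choice of sentence-averaged vectors are all assumptions made for this example, and a real analysis would use pretrained embeddings and proper tokenization.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for this illustration only.
TOY_EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.0],
    "barked": [0.7, 0.3, 0.1],
    "slept": [0.6, 0.4, 0.1],
    "tax": [0.0, 0.1, 0.9],
    "forms": [0.1, 0.0, 0.8],
}


def sentence_vector(sentence, embeddings):
    """Average the vectors of the known words in a sentence (None if none are known)."""
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def first_order_coherence(sentences, embeddings):
    """Mean cosine similarity between consecutive sentence vectors.

    Sentences with no known words are skipped, so their neighbours become adjacent.
    Returns None when fewer than two sentences can be embedded.
    """
    vectors = [sentence_vector(s, embeddings) for s in sentences]
    vectors = [v for v in vectors if v is not None]
    sims = [cosine(a, b) for a, b in zip(vectors, vectors[1:])]
    return sum(sims) / len(sims) if sims else None


on_topic = ["the dog barked", "the cat slept"]
derailed = ["the dog barked", "tax forms"]

print(first_order_coherence(on_topic, TOY_EMBEDDINGS))
print(first_order_coherence(derailed, TOY_EMBEDDINGS))
```

With these toy vectors, the on-topic pair scores higher than the pair that shifts topic abruptly, which is the kind of contrast a coherence metric is meant to capture for tangentiality and derailment.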