Classification of Schizophrenic Traits in Transcriptions of Audio Spectra from Patient Literature: Artificial Intelligence Models Enhanced by Geometric Properties

Paulo César F. Marques, Lucas Rafael F. Soares, André Victor de A. Araujo, Arthur Ribeiro Monteiro, Arthur Almeida Leitão Batista, Túlio Farias Pimentel, Lis de Lima Calheiros, Maria Helena N. S. Padilla, André Pacheco, Fabio Queda, João Ricardo M. Oliveira, José Luiz de Lima Filho, Silvana Bocanegra, Jones Albuquerque
DOI: https://doi.org/10.1101/2024.04.05.24305390
2024-04-07
Abstract: Schizophrenia is a severe mental illness that affects approximately 1% of the global population and presents significant challenges for patients, families, and healthcare professionals. Characterized by symptoms such as delusions, hallucinations, disorganized speech or behavior, and cognitive impairment, this condition has an early onset and chronic trajectory, making it a debilitating challenge. Schizophrenia also imposes a substantial burden on society, exacerbated by the stigma associated with mental disorders. Technological advancements, such as computerized semantic, linguistic, and acoustic analyses, are revolutionizing the understanding and assessment of communication alterations, a significant aspect in various severe mental illnesses. Early and accurate diagnosis is crucial for improving prognosis and implementing appropriate treatments. In this context, the advancement of Artificial Intelligence (AI) has provided new perspectives for the treatment of schizophrenia, with machine learning techniques and natural language processing allowing a more detailed analysis of clinical, neurological, and behavioral data sets. The present article proposes computational models for the identification of schizophrenic traits in texts. The database used in this article was created with 139 excerpts of patients’ speeches reported in the book “Memoirs of My Nervous Illness” by German judge Daniel Paul Schreber, classified into three categories: 1 - schizophrenic, 2 - with schizophrenic traits, and 3 - without any relation to the disorder. Of these speeches, 104 were used for training the models and the other 35 for validation. Three classification models were implemented using features based on geometric properties of graphs (number of vertices, number of cycles, girth, vertex of maximum degree, maximum clique size) and text entropy.
Promising results were observed in the classification, with the Decision Tree-based model [1] achieving 100% accuracy, the k-Nearest Neighbors (KNN) model achieving 62.8% accuracy, and the ‘centrality-based’ model achieving 59% precision. The high precision rates observed when geometric properties are incorporated into Artificial Intelligence models suggest that the models can be improved to the point of capturing the language deviation traits that are indicative of schizophrenic disorders. In summary, this study paves the way for significant advances in the use of geometric properties in the field of psychiatry, offering a new data-based approach to the understanding and therapy of schizophrenia.
Psychiatry and Clinical Psychology
What problem does this paper attempt to address?
The problem this paper attempts to solve is identifying and classifying language features related to schizophrenia using artificial intelligence (AI) models and geometric properties. Specifically, the research aims to:

1. **Improve the accuracy of early diagnosis of schizophrenia**: Schizophrenia is a serious mental illness whose symptoms include delusions, hallucinations, disorganized speech or behavior, and cognitive impairment. Early and accurate diagnosis is crucial for improving prognosis and implementing appropriate treatment.
2. **Utilize geometric properties and natural language processing techniques**: The research introduces geometric properties of graphs (such as the number of vertices, the number of cycles, the girth, the vertex of maximum degree, and the maximum clique size) and text entropy as features, and combines them with machine-learning models to classify patients' voice-transcribed texts.
3. **Verify the effectiveness of the models**: The models are trained and validated on 139 voice-transcribed patient texts extracted from the autobiography "Memoirs of My Nervous Illness" by the German judge Daniel Paul Schreber, divided into three categories (definite schizophrenia, schizophrenia-like features, and a control group without relevant features).

### Main methods

- **Data source**: 139 voice-transcribed texts were extracted from Daniel Paul Schreber's autobiography and divided into three categories.
- **Feature extraction**: Geometric properties (such as the number of vertices in a graph and the number of cycles) and text entropy were used as features.
- **Model implementation**: Three classification models were implemented:
  - A decision-tree-based model
  - A k-Nearest Neighbors (KNN) model
  - A centrality-based model

### Results

- **Decision-tree model**: It reached 100% on all reported performance metrics (precision, recall, accuracy).
- **KNN model**: The precision is 66%, the recall is 62.8%, and the accuracy is 62.8%.
- **Centrality-based model**: The precision for schizophrenia texts is 59%, and the precision for the control group is 63%.

### Discussion

Although the decision-tree model performs well, its perfect scores call for caution about overfitting. The KNN model and the centrality-based model perform below expectations, which may be due to the "curse of dimensionality" that comes with high-dimensional data. Future research can explore different algorithm configurations and feature combinations to improve model performance.

### Conclusion

This research demonstrates the feasibility of using geometric properties and natural language processing techniques to classify schizophrenia-related language features. It offers mental health professionals new tools that may help detect schizophrenia symptoms earlier and monitor disease progression and treatment response. Future work will extend to the clinical environment to further verify the practical effects of these models.

### Examples of formulas

The formulas mentioned in the discussion are as follows:

- **Shannon entropy**:
  \[ H(X) = -\sum_{x \in X} p(x) \log(p(x)) \]
- **Betweenness centrality of node \(v\)**:
  \[ C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \]
  where:
  - \(C_B(v)\) is the betweenness centrality of node \(v\).
  - \(\sigma_{st}\) is the total number of shortest paths from node \(s\) to node \(t\).
  - \(\sigma_{st}(v)\) is the number of shortest paths from node \(s\) to node \(t\) passing through node \(v\).

Through these methods and techniques, the research provides a new data-driven approach for the diagnosis and management of schizophrenia.
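
The Shannon entropy and betweenness centrality formulas above, together with two of the graph features named under feature extraction, can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not say how a graph is built from a text, so linking each word to the word that follows it is an assumption. The function names, the base-2 logarithm, the toy sentence, and the choice to count each unordered pair of endpoints once are assumptions as well.

```python
import math
from collections import Counter, deque


def shannon_entropy(tokens):
    """Shannon entropy of the empirical token distribution (base 2 is an assumption)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def word_graph(tokens):
    """Undirected graph linking consecutive words (an assumed construction)."""
    adj = {t: set() for t in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops from immediately repeated words
            adj[a].add(b)
            adj[b].add(a)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    cb = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # w reached for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Every unordered pair {s, t} was visited from both ends, so halve the totals.
    return {v: c / 2 for v, c in cb.items()}


# Toy sentence, made up for illustration (not taken from the paper's data set).
tokens = "the sun rose and the sun set".split()
graph = word_graph(tokens)
features = {
    "entropy": shannon_entropy(tokens),
    "num_vertices": len(graph),
    "max_degree_vertex": max(graph, key=lambda v: len(graph[v])),
    "betweenness": betweenness(graph),
}
print(features)
```

A feature dictionary like this, computed per excerpt, is the kind of input a decision tree or KNN classifier could consume. The remaining features named in the paper (number of cycles, girth, maximum clique size) are omitted here to keep the sketch short.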