Abstract: Mental health disorders affect children's lives and well-being and have received increased attention since the COVID-19 pandemic. Analyzing psychiatric clinical notes with topic models is critical to evaluating children's mental status over time. However, few topic models are built for longitudinal settings, and most existing approaches fail to capture temporal trajectories for each document. To address these challenges, we develop a dynamic topic model with consistent topics and individualized temporal dependencies on the evolving document metadata. Our model preserves the semantic meaning of discovered topics over time and incorporates heterogeneity among documents. In particular, when documents can be categorized, we propose a classifier-free approach to maximize topic heterogeneity across different document groups. We also present an efficient variational optimization procedure adapted for the multistage longitudinal setting. In a case study, we apply our method to psychiatric clinical notes from a large tertiary pediatric hospital in Southern California and achieve a 38% increase in the overall coherence of extracted topics. Our real data analysis reveals that children tend to express more negative emotions during state shutdowns and more positive emotions when schools reopen. Furthermore, it suggests that sexual and gender minority (SGM) children display more pronounced reactions to major COVID-19 events and a greater sensitivity to vaccine-related news than non-SGM children. This study examines children's mental health progression during the pandemic and offers clinicians valuable insights to recognize disparities in children's mental health related to their sexual and gender identities.

What problem does this paper attempt to address?

This paper addresses longitudinal topic modeling of children's mental health records. When applied to longitudinal data, existing topic models have three shortcomings:

1. **Temporal consistency**: In most existing dynamic topic models (such as Dynamic LDA), topics can change from one time point to the next, so their semantic meaning is inconsistent and it is difficult to track the trend of each topic over time.
2. **Individual heterogeneity**: Existing models usually cannot capture the longitudinal heterogeneity between documents, that is, how the topic proportions of different individuals change across time points.
3. **Group differences**: Existing supervised LDA methods rely on a classifier to distinguish groups, so their results depend on the classifier's performance and do not directly reflect how topic distributions differ between groups.

To overcome these challenges, the paper proposes a new multistage dynamic variational autoencoder neural topic model, the Heterogeneous Classifier-Free Dynamic Topic Model (HCF-DTM), with the following features:

- **Temporal consistency**: HCF-DTM assumes a single time-invariant word-topic matrix \(\beta\), so the semantic meaning of each topic stays the same throughout the study period.
- **Individual heterogeneity**: The model incorporates document metadata and group information and uses a function \(f_t\) to capture how each document's topic proportions change across time points, which better reflects each individual's longitudinal heterogeneity.
- **Group differences**: HCF-DTM takes a classifier-free approach, enlarging the differences between groups by maximizing the distance between the topic-proportion distributions of different groups.

Specifically, the generative process of HCF-DTM is as follows:

1. **Time-consistent word-topic distribution**:
\[ \beta \sim \mathcal{N}(\beta_0, \delta^2 I) \]
where \(\beta_0\) is the time-invariant prior mean and \(\delta^2 I\) is a diagonal covariance matrix; the prior is meant to keep the topics orthogonal and distinct from one another.
2. **Topic proportions of document \(d\) at time point \(t\)**:
\[ \eta_{t,d,1:K} \mid \eta_{t-1,d,1:K}, X_{d,t}, Y_d, \phi_t \sim \mathcal{N}\big(f_t(\eta_{t-1,d,1:K}, X_{d,t}, Y_d),\, a^2 I\big) \]
\[ \theta_{t,d,1:K} = \sigma(\eta_{t,d,1:K}) \]
where \(f_t\) is a parameterized function that captures the mean trend of the topic proportions, and \(\sigma\) is the softmax function, which maps the topic proportions onto the probability simplex.
3. **Generation of each word** \(j\) in document \(d\) at time \(t\):
\[ W_{t,d,j} \sim \text{Mult}\big(\theta_{t,d,1:K} \cdot \sigma(\beta)^T\big) \]

Through this generative process, HCF-DTM identifies time-consistent topics, captures each individual's longitudinal heterogeneity, and maximizes topic differences among groups without a classifier. This makes the model more effective for analyzing children's mental health records and lets it reveal how the mental health of different groups changed during the pandemic.
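To make the three generative steps concrete, the NumPy snippet below forward-samples a toy corpus from a simplified version of the process. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `simulate_hcf_dtm`, all sizes, the random prior mean \(\beta_0\), the initial state \(\eta_0 = 0\), the binary group label, and especially the linear form chosen for \(f_t\) are hypothetical. It also assumes \(\beta\) is stored as a \(V \times K\) matrix with the softmax taken over the vocabulary axis, so that \(\theta_{t,d,1:K} \cdot \sigma(\beta)^T\) is a distribution over the \(V\) words. The paper learns the model with a variational procedure; this sketch only runs it forward.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def simulate_hcf_dtm(n_docs=4, n_times=3, K=5, V=50, p=2, n_words=30,
                     delta=0.5, a=0.3, seed=0):
    """Forward-sample a toy corpus from a simplified HCF-DTM-style process.

    Hypothetical choices: random beta_0, eta_0 = 0, binary Y_d, linear f_t.
    Returns topic_word (V, K), theta (T, D, K) and word counts (T, D, V).
    """
    rng = np.random.default_rng(seed)

    # Step 1: one time-invariant word-topic matrix, beta ~ N(beta_0, delta^2 I).
    beta_0 = rng.normal(size=(V, K))
    beta = beta_0 + delta * rng.normal(size=(V, K))
    topic_word = softmax(beta, axis=0)  # column k: word distribution of topic k

    Y = rng.integers(0, 2, size=n_docs)  # hypothetical binary group label Y_d
    eta_prev = np.zeros((n_docs, K))     # hypothetical initial state eta_0
    thetas, counts = [], []
    for _ in range(n_times):
        X = rng.normal(size=(n_docs, p))  # time-varying metadata X_{d,t}
        # Hypothetical linear f_t with time-specific weights (A_t, B_t, c_t).
        A_t = 0.8 * np.eye(K)
        B_t = rng.normal(scale=0.1, size=(p, K))
        c_t = rng.normal(scale=0.5, size=K)
        mean = eta_prev @ A_t + X @ B_t + np.outer(Y, c_t)

        # Step 2: eta_{t,d} ~ N(f_t(...), a^2 I) and theta_{t,d} = softmax(eta_{t,d}).
        eta = mean + a * rng.normal(size=(n_docs, K))
        theta = softmax(eta, axis=1)

        # Step 3: word distribution theta_{t,d} . sigma(beta)^T; n_words i.i.d.
        # word draws per document are summarized as multinomial counts.
        word_probs = theta @ topic_word.T  # shape (n_docs, V); rows sum to 1
        counts.append(np.stack([rng.multinomial(n_words, pv) for pv in word_probs]))
        thetas.append(theta)
        eta_prev = eta

    return topic_word, np.stack(thetas), np.stack(counts)


if __name__ == "__main__":
    topic_word, theta, W = simulate_hcf_dtm()
    print(theta.shape, W.shape)  # (3, 4, 5) (3, 4, 50)
```

Because `topic_word` is drawn once, outside the time loop, every time point shares the same topics, which is the temporal-consistency assumption; only the document-level \(\eta\) evolves through \(f_t\).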
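The group-difference term is described only as maximizing the distance between the topic-proportion distributions of different groups. The sketch below shows one possible instance of such a classifier-free term; the choice of distance is an assumption, not taken from the paper. It averages the topic proportions within each group and returns the mean pairwise Jensen-Shannon divergence between those group averages. The function name `group_topic_separation` and the `eps` smoothing constant are hypothetical.

```python
import numpy as np


def group_topic_separation(theta, groups, eps=1e-12):
    """Hypothetical classifier-free group separation term.

    theta:  (n_docs, K) topic proportions, each row on the simplex
    groups: (n_docs,) integer group labels
    Returns the mean pairwise Jensen-Shannon divergence (in nats) between
    group-averaged topic proportions; larger means more separated groups.
    """
    labels = np.unique(groups)
    # Average topic proportions within each group (assumed summary statistic).
    means = np.stack([theta[groups == g].mean(axis=0) for g in labels])

    def kl(p, q):
        # KL(p || q) with a small eps to avoid log(0).
        return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

    total, pairs = 0.0, 0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            m = 0.5 * (means[i] + means[j])  # mixture distribution
            total += 0.5 * kl(means[i], m) + 0.5 * kl(means[j], m)
            pairs += 1
    # A single group has nothing to separate from.
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    theta = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    groups = np.array([0, 0, 1, 1])
    print(round(group_topic_separation(theta, groups), 4))  # 0.6931 (= ln 2, the maximum)
```

In training, a term like this would be subtracted from the loss with some weight, so the optimizer pushes group-level topic proportions apart without fitting a separate classifier. Using it as an objective would require an autodiff framework; NumPy is used here only to show the computation.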

Dynamic Topic Language Model on Heterogeneous Children's Mental Health Clinical Notes

Discovering treatment pattern in traditional Chinese medicine clinical cases using topic model and domain knowledge

Comparison of Methods for Estimating Temporal Topic Models From Primary Care Clinical Text Data: Retrospective Closed Cohort Study

Topic modeling on clinical social work notes for exploring social determinants of health factors

Enhanced Sentiment Analysis and Topic Modeling During the Pandemic Using Automated Latent Dirichlet Allocation

Applying Bayesian hyperparameter optimization towards accurate and efficient topic modeling in clinical notes

An integrated latent Dirichlet allocation and Word2vec method for generating the topic evolution of mental models from global to local

Modeling trajectories of mental health: challenges and opportunities

Discovering Mental Health Research Topics with Topic Modeling

Longitudinal Sentiment Topic Modelling of Reddit Posts

Prediction-Constrained Topic Models for Antidepressant Recommendation

The Problem of Semantic Shift in Longitudinal Monitoring of Social Media: A Case Study on Mental Health During the COVID-19 Pandemic

Discovering topic structures of a temporally evolving document corpus

Dynamic topic modelling for exploring the scientific literature on coronavirus: an unsupervised labelling technique

A Structural Topic and Sentiment-Discourse Model for Text Analysis

Language and Mental Health: Measures of Emotion Dynamics from Text as Linguistic Biosocial Markers

Dimensional Measures of Psychopathology in Children and Adolescents Using Large Language Models

Reflecting the trends in the academic landscape of special education using probabilistic dynamic topic modeling

Public discourse and sentiment during the COVID 19 pandemic: Using Latent Dirichlet Allocation for topic modeling on Twitter

Large Language Models for Social Determinants of Health Information Extraction from Clinical Notes - A Generalizable Approach across Institutions

The Point of View of a Sentiment: Towards Clinician Bias Detection in Psychiatric Notes