Predicting Suicidal Ideation, Planning, and Attempts among the Adolescent Population of the United States

Hamed Khosravi, Imtiaz Ahmed, Avishek Choudhury
DOI: https://doi.org/10.3390/healthcare12131262
2024-06-26
Health Care
Abstract: Suicide is the second leading cause of death among individuals aged 5 to 24 in the United States (US). However, the precursors to suicide often do not surface, making suicide prevention challenging. This study aims to develop a machine learning model for predicting suicide ideation (SI), suicide planning (SP), and suicide attempts (SA) among adolescents in the US during the coronavirus pandemic. We used the 2021 Adolescent Behaviors and Experiences Survey Data. Class imbalance was addressed using the proposed data augmentation method tailored for binary variables, Modified Synthetic Minority Over-Sampling Technique. Five different ML models were trained and compared. SHapley Additive exPlanations analysis was conducted for explainability. The Logistic Regression model, identified as the most effective, showed superior performance across all targets, achieving high scores in recall: 0.82, accuracy: 0.80, and area under the Receiver Operating Characteristic curve: 0.88. Variables such as sad feelings, hopelessness, sexual behavior, and being overweight were noted as the most important predictors. Our model holds promise in helping health policymakers design effective public health interventions. By identifying vulnerable sub-groups within regions, our model can guide the implementation of tailored interventions that facilitate early identification and referral to medical treatment.
health care sciences & services, health policy & services
What problem does this paper attempt to address?
This paper addresses the problem of predicting suicidal ideation (SI), suicide planning (SP), and suicide attempts (SA) among adolescents in the United States, particularly during the COVID-19 pandemic. Suicide is the second leading cause of death among people aged 5 to 24 in the United States, but its precursors often do not surface, which makes prevention difficult. The paper therefore uses machine learning models to predict these three outcomes so that high-risk individuals can be identified and helped earlier, with the aim of reducing the suicide rate.

### Main research objectives:

1. **Develop prediction models**: Use the 2021 Adolescent Behaviors and Experiences Survey data to develop machine learning models that predict suicidal ideation, suicide planning, and suicide attempts.
2. **Handle data imbalance**: Propose a Modified Synthetic Minority Over-sampling Technique (Modified SMOTE) designed to correct class imbalance in data with binary features.
3. **Model evaluation and optimization**: Compare five machine learning models (decision tree, random forest, support vector machine, logistic regression, and extreme gradient boosting), then further optimize the best one through hyperparameter tuning and feature selection.
4. **Interpret model results**: Use SHapley Additive exPlanations (SHAP) analysis to identify which features matter most to the model's predictions of suicidal behavior.

### Research background:

- **The severity of the suicide problem**: Suicide is a major public health problem worldwide and is particularly prominent among adolescents.
- **The impact of the pandemic**: The COVID-19 pandemic has exacerbated mental health problems among adolescents and increased the risk of suicide.
- **Limitations of existing research**: Existing studies have applied machine learning to the prediction of suicidal behavior, but most rely on clinical data, which are hard to obtain at scale and therefore limit how widely the models can be applied.

### Research methods:

- **Data sources**: The 2021 Adolescent Behaviors and Experiences Survey data, covering questionnaire responses from 7,705 participants.
- **Data pre-processing**: Handle missing values, convert categorical variables, and remove highly correlated variables, among other steps.
- **Data balancing**: Apply the Modified SMOTE technique to correct class imbalance.
- **Model training and evaluation**: Train and evaluate five machine learning models, then select the best-performing model for further optimization.
- **Model interpretation**: Use SHAP analysis to identify the features that drive the model's predictions.

### Research results:

- **Model performance**: The logistic regression model performs best on all target variables, with a recall of 0.82, an accuracy of 0.80, and an area under the ROC curve (AUC) of 0.88.
- **Important features**: Sad feelings, hopelessness, sexual behavior, and being overweight are identified as the most important predictors of suicidal behavior.

### Conclusions:

- **Effectiveness of the model**: The proposed model predicts suicidal ideation, suicide planning, and suicide attempts with high recall and accuracy.
- **Practical applications**: The model can help health policymakers design effective public health interventions and reduce adolescent suicide risk through early identification and referral.
- **Future directions**: The model's ability to generalize to other populations still needs to be verified, and it should be used in combination with clinical evaluation.

With this work, the authors aim to provide practical tools and technical support for adolescent suicide prevention.
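
The data-balancing step listed under the research methods is designed for binary features, where standard SMOTE interpolation would produce fractional values that no survey respondent could have given. The summary does not describe how Modified SMOTE works internally, so the following is only a minimal Python sketch of one way to oversample a minority class while keeping every feature strictly 0 or 1. The function name `oversample_binary_minority`, the parameter `k`, the Hamming-distance neighbor search, the per-feature copy rule, and the random demo data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def oversample_binary_minority(X, y, k=5, seed=0):
    """Balance a two-class dataset whose features are all 0/1.

    Hypothetical illustration only: this is NOT the paper's Modified SMOTE.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=int)
    y = np.asarray(y, dtype=int)

    counts = np.bincount(y, minlength=2)
    minority = int(np.argmin(counts))
    n_new = int(counts.max() - counts.min())
    X_min = X[y == minority]
    if n_new == 0 or len(X_min) < 2:
        return X, y  # already balanced, or too few rows to pair up

    # Hamming distance between every pair of minority rows.
    dist = (X_min[:, None, :] != X_min[None, :, :]).sum(axis=2)
    np.fill_diagonal(dist, X.shape[1] + 1)  # push self-matches to the end
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # For each synthetic row, pick a parent and one of its k nearest neighbours.
    parents = rng.integers(0, len(X_min), size=n_new)
    picks = neighbours[parents, rng.integers(0, k, size=n_new)]

    # Copy each feature from the parent or the neighbour at random, so the
    # synthetic rows never contain fractional values.
    take_parent = rng.random((n_new, X.shape[1])) < 0.5
    X_new = np.where(take_parent, X_min[parents], X_min[picks])

    X_out = np.vstack([X, X_new])
    y_out = np.concatenate([y, np.full(n_new, minority)])
    return X_out, y_out


# Made-up demo data: 100 respondents, 8 yes/no answers, 15% positive class.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.integers(0, 2, size=(100, 8))
y_demo = np.array([1] * 15 + [0] * 85)

X_bal, y_bal = oversample_binary_minority(X_demo, y_demo)
print(np.bincount(y_bal))  # both classes now have 85 rows
```

In practice, oversampling of this kind is applied only to the training split, so that the recall, accuracy, and AUC measured on held-out data are not inflated by synthetic rows.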