Abstract: Machine learning is increasingly used throughout the medical domain. Evaluation metrics like accuracy, precision, and recall may indicate the performance of the models but not necessarily the reliability of their outcomes. This paper assesses the effectiveness of a number of machine learning algorithms applied to an important dataset in the medical domain, specifically mental health, by employing explainability methodologies. Using multiple machine learning algorithms and model explainability techniques, this work provides insights into the models' workings to help determine the reliability of their predictions. The results are not intuitive. The models were found to rely heavily on less relevant features and, at times, on an unsound ranking of the features when making predictions. This paper therefore argues that research in applied machine learning should report insights into the explainability of models in addition to performance metrics like accuracy. This is particularly important for applications in critical domains such as healthcare.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper primarily explores the reliability and interpretability of machine learning models in the field of mental health. Specifically:

1. **Effectiveness of Evaluation Metrics**: The study aims to assess whether traditional evaluation metrics (such as accuracy, precision, and recall) are sufficient to measure the performance of machine learning models in mental health prediction.
2. **Application of Explainable AI Techniques**: The paper employs two popular explainable AI techniques, Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP), to complement traditional evaluation metrics and better understand the decision-making process of the models.
3. **Comparison of Different Algorithms**: Experiments are conducted on datasets using various machine learning algorithms (such as logistic regression, K-nearest neighbors, decision trees, etc.), and LIME and SHAP are used to analyze the interpretability of these models.

The core research questions are:

- **RQ1**: How reliable are evaluation metrics like accuracy in assessing the performance of machine learning models?
- **RQ2**: How do explainable AI techniques like LIME and SHAP complement traditional evaluation metrics?
- **RQ3**: What are the differences in result interpretability among various machine learning algorithms?

The paper demonstrates through experiments that relying solely on traditional evaluation metrics can lead to misleading conclusions, especially in mental health prediction. For instance, some models may overly focus on irrelevant or unreasonable features, thereby affecting the reliability of predictions. Therefore, the paper emphasizes that when applying machine learning in critical fields such as healthcare, it is essential to consider not only performance metrics but also the interpretability of the models.
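The gap between a good accuracy score and a sound feature ranking can be illustrated with a small, self-contained sketch. Everything below is an assumption made for illustration and is not taken from the paper: the feature names, the four toy rows, the hand-set linear model, and the column-mean baseline are all hypothetical. The sketch also computes exact Shapley values by enumerating feature coalitions in plain Python rather than calling the SHAP or LIME libraries the paper uses.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; the third one stands in for a clinically
# irrelevant field that happens to correlate with the label.
FEATURES = ["symptom_score", "sleep_hours", "survey_batch_flag"]

# Toy data, invented for illustration only.
rows = [
    [3.0, 6.0, 1.0],
    [1.0, 8.0, 0.0],
    [4.0, 5.0, 1.0],
    [2.0, 7.0, 0.0],
]
labels = [1, 0, 1, 0]


def model(x):
    """Hypothetical 'fitted' model: a fixed linear score."""
    weights = [0.2, -0.1, 1.5]
    bias = -0.5
    return sum(w * v for w, v in zip(weights, x)) + bias


def predict(x):
    """Threshold the linear score at zero to get a class label."""
    return 1 if model(x) > 0 else 0


def accuracy(data, targets):
    """Fraction of rows whose predicted label matches the target."""
    hits = sum(predict(x) == y for x, y in zip(data, targets))
    return hits / len(targets)


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Baseline: column means of the toy data (one common, but not the only, choice).
baseline = [sum(col) / len(rows) for col in zip(*rows)]

acc = accuracy(rows, labels)
phi = shapley_values(model, rows[0], baseline)
top_feature = max(range(len(phi)), key=lambda i: abs(phi[i]))

print("accuracy:", acc)
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.3f}")
print("most influential feature:", FEATURES[top_feature])
```

On this toy data the model classifies every row correctly, yet the attribution for the first row is dominated by the hypothetical `survey_batch_flag` feature rather than by the symptom or sleep features. This is the kind of discrepancy the summary describes: the accuracy figure alone would not reveal that the prediction leans on an unreasonable feature.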

Assessing the Reliability of Machine Learning Models Applied to the Mental Health Domain Using Explainable AI
