Abstract: Deep learning models have revolutionized numerous fields, yet their decision-making processes often remain opaque, earning them the characterization of "black-box" models. This opacity makes it difficult to understand the rationale behind their decisions, impeding their interpretability, explainability, and reliability. This review examines 718 studies published between 2015 and 2024 in high-impact journals indexed in SCI, SCI-E, SSCI, and ESCI, providing a reference for researchers investigating methodologies and techniques in related domains. We evaluate a wide array of interpretability and explainable AI (XAI) strategies, including visual and feature-based explanations, local approach-based techniques, and Bayesian methods. These strategies are assessed for their effectiveness and applicability using a comprehensive set of evaluation metrics. Moving beyond traditional analyses, we propose a novel taxonomy of XAI methods that addresses gaps in the literature and offers a structured classification of the roles and interactions of these methods. We also explore the relationship between interpretability and explainability, examining potential conflicts between them and highlighting the necessity for interpretability in practical applications. Through detailed comparative analysis, we identify the strengths and limitations of various XAI methods across different data types, with attention to their practical performance and real-world utility. The review also examines model robustness against adversarial attacks, emphasizing the importance of transparency, reliability, and ethical considerations in model development. We place particular emphasis on identifying and mitigating biases in deep learning systems, and we outline future research directions that aim to enhance fairness and reduce bias.
By reviewing current challenges and emerging research directions, this article equips researchers with the knowledge and tools to develop more transparent, fair, and reliable deep learning systems. The work aims to bridge existing gaps in the literature by offering a forward-looking perspective on the field. Beyond describing the current state of XAI methodologies, it contributes to the broader understanding and improvement of deep learning systems and supports their ethical and equitable application across domains.
