Abstract: In recent years, the rapid development of deep learning has brought great progress in computer vision, image recognition, pattern recognition, and speech signal processing. However, because deep neural networks (DNNs) behave as black boxes, one cannot explain what the parameters of a deep network represent or why the network performs its assigned tasks so well. The interpretability of neural networks has therefore become a research hotspot in deep learning. It covers a wide range of topics in speech and text signal processing, image processing, differential equation solving, and other fields, and the definition of interpretability differs subtly from field to field. This paper divides interpretable neural network (INN) methods into two directions: model decomposition neural networks and semantic INNs. The former construct an INN by converting the analytical model of a conventional method into the layers of a neural network, which combines the interpretability of the conventional model-based method with the powerful learning capability of neural networks. These INNs are further classified into subtypes according to the kind of model they are derived from, i.e., mathematical models, physical models, and other models. The latter are interpretable networks that carry visual semantic information for user understanding. Their basic idea is to assign semantic information to the whole network structure or to parts of it through visualization, using techniques such as convolutional layer output visualization, decision tree extraction, and semantic graphs. These methods mainly rely on human visual logic to explain the structure of a black-box neural network. They are therefore post-network-design methods that assign interpretability to a black-box network after the fact. Model-based INNs, by contrast, are pre-network-design methods whose interpretable structure is designed beforehand. This paper reviews recent progress in these areas and various application scenarios of INNs, and it discusses existing problems and future development directions.
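To make the model-decomposition idea concrete, here is a minimal sketch. It assumes that the conventional model is sparse coding, min_x 0.5·||y − Dx||² + λ·||x||₁, solved with the iterative shrinkage-thresholding algorithm (ISTA). This standard example is chosen for illustration and is not taken from the survey. Each ISTA iteration becomes one network layer, as in LISTA-style unrolling. Every layer's weights `W_e`, `S` and threshold `theta` are initialised from the analytical model, so each parameter keeps a model meaning. The function names `ista`, `init_unrolled_layers` and `unrolled_forward` are hypothetical.

```python
import numpy as np


def soft_threshold(v, theta):
    """Proximal operator of theta * ||x||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def ista(D, y, lam, n_iter):
    """Classical ISTA for min_x 0.5*||y - D x||^2 + lam*||x||_1."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the data-term gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + D.T @ (y - D @ x) / L, lam / L)
    return x


def init_unrolled_layers(D, lam, n_layers):
    """Map each ISTA iteration to one layer with parameters (W_e, S, theta).

    Hypothetical helper: in a LISTA-style network these parameters would be
    trained per layer; here they are initialised from the model so that each
    weight keeps its analytic meaning.
    """
    L = np.linalg.norm(D, 2) ** 2
    W_e = D.T / L                         # input weights, from the data term
    S = np.eye(D.shape[1]) - D.T @ D / L  # weights applied to the previous layer's output
    theta = lam / L                       # threshold, from the L1 penalty
    return [(W_e.copy(), S.copy(), theta) for _ in range(n_layers)]


def unrolled_forward(layers, y):
    """Forward pass of the unrolled network: x_{k+1} = soft(W_e y + S x_k, theta)."""
    x = np.zeros(layers[0][1].shape[0])
    for W_e, S, theta in layers:
        x = soft_threshold(W_e @ y + S @ x, theta)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((20, 50))
    x_true = np.zeros(50)
    x_true[[3, 17, 41]] = [1.0, -2.0, 0.5]
    y = D @ x_true
    layers = init_unrolled_layers(D, lam=0.1, n_layers=30)
    # Before any training, the 30-layer network reproduces 30 ISTA iterations.
    print(np.allclose(unrolled_forward(layers, y), ista(D, y, 0.1, 30)))  # True
```

The design choice is that the network starts as an exact re-expression of the algorithm, because x + Dᵀ(y − Dx)/L = (I − DᵀD/L)x + (Dᵀ/L)y. Training would then adjust `W_e`, `S` and `theta` per layer. Keeping the layer structure tied to the model is what makes such networks interpretable by design.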

Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution

Exact and Consistent Interpretation of Piecewise Linear Models Hidden behind APIs: A Closed Form Solution

An Interpretable Neural Network Model Through Piecewise Linear Approximation

Towards Interpreting Recurrent Neural Networks Through Probabilistic Abstraction

Opening the Black Box of Neural Networks: Methods for Interpreting Neural Network Models in Clinical Applications

High-precision Linearized Interpretation for Fully Connected Neural Network

Explaining the black-box model: A survey of local interpretation methods for deep neural networks

SPINE: Soft Piecewise Interpretable Neural Equations

Nearly-tight bounds on linear regions of piecewise linear neural networks

Deep PLS: A Lightweight Deep Learning Model for Interpretable and Efficient Data Analytics

Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond

Learning outside the Black-Box: The pursuit of interpretable models

Unwrapping The Black Box of Deep ReLU Networks: Interpretability, Diagnostics, and Simplification

Relevance Inference Based on Direct Contribution: Counterfactual Explanation to Deep Networks for Intelligent Decision-making

How to Explain Neural Networks: an Approximation Perspective

Interpretability as Approximation: Understanding Black-Box Models by Decision Boundary

An Interpretable Probabilistic Approach for Demystifying Black-box Predictive Models

Training neural networks for solving 1-D optimal piecewise linear approximation

A Survey on Neural Network Interpretability

Interpretable neural networks: principles and applications

Analysis on the Number of Linear Regions of Piecewise Linear Neural Networks