The inverse problem for neural networks

Marcelo Forets, Christian Schilling
DOI: https://doi.org/10.1007/978-3-031-46002-9_14
2023-08-27
Abstract: We study the problem of computing the preimage of a set under a neural network with piecewise-affine activation functions. We recall an old result that the preimage of a polyhedral set is again a union of polyhedral sets and can be effectively computed. We show several applications of computing the preimage for analysis and interpretability of neural networks.
Machine Learning, Artificial Intelligence, Logic in Computer Science
What problem does this paper attempt to address?
This paper studies the inverse problem for neural networks. Specifically, given a neural network \(N\) and an output set \(Y\), the authors aim to compute the preimage of \(Y\) under \(N\), that is, the set of all inputs \(x\) such that \(N(x)\in Y\).

#### Main Motivations:

1. **Specification Mining**: Computing the preimage can be used to extract candidate specifications and to explain the function encoded by a neural network. In the earlier literature this was known as rule extraction.
2. **Specification Verification**: If a specification is known, the preimage can be computed to check whether the specification holds, for example, whether any inputs lead to a specific output set.

#### Specific Contributions:

- **Complete Picture**: For neural networks with piecewise-affine activation functions, the authors provide a complete method for computing the preimage. The preimage of a polyhedral set under such a network is a union of polyhedral sets and can be effectively computed using linear programming.
- **Historical Review and Generalization**: Similar results have been proposed in the past, but they may not have received enough attention. This paper re-examines and generalizes these results, making them applicable to widely used modern piecewise-affine activation functions such as ReLU.
- **Applications and Extensions**: The authors show applications of preimage computation to interpretability and to approximate computation, and propose a method that combines it with forward-image computation.

#### Related Background:

- **Traditional Methods**: The traditional machine-learning community mainly studies the inverse problem for image classifiers by highlighting the neurons or input pixels that influence a decision. However, these methods usually only map heuristically to specific inputs rather than actually computing the inverse mapping.
- **Other Fields**: The inverse problem also has applications in areas such as abstract interpretation and adversarial attacks, but most of these methods rely on approximate solutions.

In conclusion, this paper aims to provide a systematic method for computing the preimage of a set under a neural network, in order to improve the understanding of network behavior and to verify specifications.
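
To make the claim about polyhedral preimages concrete, here is a minimal sketch for a single ReLU layer \(x \mapsto \mathrm{ReLU}(Wx+b)\) and an output set \(Y=\{y : Cy\le d\}\). It is not the authors' implementation. It assumes a naive enumeration of all activation patterns, and the names `relu_layer`, `preimage_polyhedra`, and `in_union`, as well as the example weights, are hypothetical. For each pattern \(s\), the layer is affine on the region where neuron \(i\) is active exactly when \(s_i=1\), so the preimage restricted to that region is the polyhedron given by the region constraints together with \(C\,\mathrm{diag}(s)(Wx+b)\le d\).

```python
from itertools import product

def relu_layer(W, b, x):
    """Forward pass of x -> ReLU(W x + b) using plain Python lists."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def preimage_polyhedra(W, b, C, d):
    """Preimage of {y : C y <= d} under x -> ReLU(W x + b).

    Returns one polyhedron (A, c), meaning {x : A x <= c}, per activation
    pattern. Some pieces may be empty; an LP feasibility check (not done
    here) would be needed to prune them.
    """
    m, n = len(W), len(W[0])
    pieces = []
    for pattern in product([0, 1], repeat=m):
        A, c = [], []
        # Region constraints that fix the sign of each pre-activation.
        for i, s in enumerate(pattern):
            if s:   # active:   W_i x + b_i >= 0  <=>  -W_i x <= b_i
                A.append([-w for w in W[i]])
                c.append(b[i])
            else:   # inactive: W_i x + b_i <= 0  <=>   W_i x <= -b_i
                A.append(list(W[i]))
                c.append(-b[i])
        # Output constraints: C diag(pattern) (W x + b) <= d.
        for row, dk in zip(C, d):
            A.append([sum(row[i] * pattern[i] * W[i][j] for i in range(m))
                      for j in range(n)])
            c.append(dk - sum(row[i] * pattern[i] * b[i] for i in range(m)))
        pieces.append((A, c))
    return pieces

def in_union(pieces, x, tol=1e-9):
    """True if x satisfies A x <= c for at least one piece."""
    return any(
        all(sum(a * xi for a, xi in zip(row, x)) <= ck + tol
            for row, ck in zip(A, c))
        for A, c in pieces
    )

# Hypothetical example: two inputs, two ReLU neurons, Y = {y : y1 + y2 <= 1}.
W = [[1.0, -1.0], [1.0, 1.0]]
b = [0.0, -1.0]
C = [[1.0, 1.0]]
d = [1.0]
pieces = preimage_polyhedra(W, b, C, d)
```

A point `x` lies in the preimage exactly when `in_union(pieces, x)` holds, which can be cross-checked against the forward pass `relu_layer(W, b, x)`. The enumeration is exponential in the number of neurons, so this sketch is only meant to illustrate the structure of the result, not an efficient procedure.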