Revealing the working mechanism of quantum neural networks by mutual information

Xin Zhang, Yuexian Hou
2024-04-30
Abstract: Quantum neural networks (QNNs) are parameterized quantum circuit models that can be trained with gradient-based optimizers and used for supervised learning, regression tasks, combinatorial optimization, etc. Many works have demonstrated that QNNs have better learnability, generalizability, etc. than classical neural networks. However, as with classical neural networks, we still cannot explain their working mechanism well. In this paper, we reveal the training mechanism of QNNs through mutual information. Unlike mutual information in classical neural networks, the mutual information between the input and output of the U operator is trivial, because quantum computation conserves information. To observe how mutual information changes during training, we therefore divide the quantum circuit (U operator) into two subsystems, the discard subsystem (D) and the measurement subsystem (M). We calculate two mutual information quantities, I(Di : Mo) and I(Mi : Mo) (where i and o denote the input and output of the corresponding subsystem), and observe their behavior during training. As the number of epochs increases, I(Di : Mo) gradually increases, which may mean that some information from the discard subsystem is continuously pushed into the measurement subsystem; this information should be label-related. Moreover, I(Mi : Mo) exhibits two-phase behavior during training, which is consistent with the prediction of the information bottleneck theory. In the first phase, I(Mi : Mo) increases, which means the measurement subsystem is performing feature fitting. In the second phase, I(Mi : Mo) decreases, which may mean the system is generalizing: the measurement subsystem discards as much label-irrelevant information as possible into the discard subsystem. Our work discusses the working mechanism of QNNs through mutual information and can further be used to analyze the accuracy and generalization of QNNs.
Quantum Physics
What problem does this paper attempt to address?
The paper explores the working mechanism of Quantum Neural Networks (QNNs) and attempts to reveal it through Mutual Information. Specifically, the paper aims to address the following key issues:

1. **Explaining the working mechanism of QNNs**: Although QNNs have shown advantages over classical neural networks in terms of performance, generalization ability, and trainability, they suffer from poor interpretability. It is difficult to understand how QNNs make decisions, which limits their application in critical fields such as medical diagnosis and intelligent finance.
2. **Revealing the working mechanism using Mutual Information**: The paper draws on the information bottleneck theory based on Mutual Information, which has achieved some success in explaining the working mechanism of classical neural networks. However, since information is conserved during quantum computation, the mutual information between the input and output of the whole circuit is trivial and therefore uninformative. The authors instead adopt a method called the "Scrambling model," which divides the quantum circuit (U operator) into two subsystems, the Discard subsystem (D) and the Measurement subsystem (M), and then calculates the mutual information within and between these two subsystems.
3. **Observing changes in mutual information during training**: Observing the mutual information during training reveals a two-stage phenomenon. In the first stage, the mutual information I(Mi : Mo) between the input and output of the Measurement subsystem (M) increases, indicating that this subsystem is performing feature fitting. In the second stage, this mutual information begins to decrease, which may mean that the system is generalizing, i.e., the Measurement subsystem is discarding as much label-irrelevant information as possible into the Discard subsystem.
4. **Validating behavior on different datasets**: Experiments examine the behavior of the Discard subsystem and the Measurement subsystem of QNNs on datasets such as Iris, diabetes, and breast cancer. The results show that during training, label-related information in the Discard subsystem is continuously pushed into the Measurement subsystem, and the Measurement subsystem undergoes two stages, feature fitting followed by information compression.

In summary, the main goal of the paper is to reveal the working mechanism of QNNs by analyzing changes in mutual information, which matters for improving the interpretability of QNNs and promoting their application in practical scenarios.
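The subsystem mutual information described above can be sketched numerically. The summary does not give the paper's exact definitions, so this is a minimal sketch under stated assumptions: it uses the standard quantum mutual information I(A : B) = S(A) + S(B) - S(AB), with S the von Neumann entropy in bits, evaluated on the Choi (operator-state) representation of U that is common in the scrambling literature. The function names (`choi_state`, `entropy`, `mutual_information`, `partial_swap`) and the two-qubit toy circuit are hypothetical illustrations, not the authors' code or their QNN ansatz.

```python
import numpy as np

def choi_state(U):
    """Map a unitary on n qubits to the pure state (I ⊗ U)|Φ+> on 2n qubits.

    Assumed convention: the first n qubits hold the input (reference) register,
    the last n qubits hold the output register.
    """
    d = U.shape[0]
    # Amplitude of |i>_in |j>_out is <j|U|i> / sqrt(d) = U.T[i, j] / sqrt(d).
    return (U.T / np.sqrt(d)).reshape(-1)

def entropy(psi, keep, n_total):
    """Von Neumann entropy (bits) of the qubits in `keep` for a pure state psi."""
    keep = sorted(keep)
    rest = [q for q in range(n_total) if q not in keep]
    # Group the kept qubits into rows and the remaining qubits into columns.
    t = psi.reshape([2] * n_total)
    t = np.transpose(t, keep + rest).reshape(2 ** len(keep), -1)
    # Squared singular values are the eigenvalues of the reduced density matrix.
    p = np.linalg.svd(t, compute_uv=False) ** 2
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(psi, A, B, n_total):
    """I(A : B) = S(A) + S(B) - S(AB) for disjoint qubit lists A and B."""
    return (entropy(psi, A, n_total) + entropy(psi, B, n_total)
            - entropy(psi, A + B, n_total))

def partial_swap(theta):
    """Hypothetical toy 'circuit': cos(theta) * I - i * sin(theta) * SWAP."""
    swap = np.array([[1, 0, 0, 0],
                     [0, 0, 1, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1]], dtype=complex)
    return np.cos(theta) * np.eye(4) - 1j * np.sin(theta) * swap

# Qubit layout for a 2-qubit circuit (4 qubits in the Choi state):
# inputs: D_i = 0, M_i = 1; outputs: D_o = 2, M_o = 3.
D_I, M_I, D_O, M_O = [0], [1], [2], [3]

if __name__ == "__main__":
    for theta in np.linspace(0.0, np.pi / 2, 5):
        psi = choi_state(partial_swap(theta))
        print(f"theta={theta:.3f}  "
              f"I(Di:Mo)={mutual_information(psi, D_I, M_O, 4):.3f}  "
              f"I(Mi:Mo)={mutual_information(psi, M_I, M_O, 4):.3f}  "
              f"I(in:out)={mutual_information(psi, D_I + M_I, D_O + M_O, 4):.3f}")
```

Under these assumptions, the full input-output mutual information is the same for every unitary on n qubits (2n bits), which is one way to see why the whole-circuit quantity is trivial. The subsystem quantities I(Di : Mo) and I(Mi : Mo) do depend on U, so they can change as the circuit parameters are trained.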