Abstract: We propose an interactive editing method that lets humans help deep neural networks (DNNs) learn a latent space more consistent with human knowledge, thereby improving classification accuracy on ambiguous data that is hard to distinguish. First, we visualize high-dimensional data features through dimensionality reduction and design an interactive system, *SpaceEditing*, to display the visualized data. *SpaceEditing* provides a 2D workspace based on the idea of spatial layout, in which the user can move the projected data points following the system's guidance. *SpaceEditing* then finds the high-dimensional features corresponding to the moved projections and feeds them back to the network for retraining, which lets the user modify the high-dimensional latent space interactively. Second, to incorporate human knowledge into the training process more rationally, we design a new loss function that enables the network to learn the user's modifications. Finally, we demonstrate how *SpaceEditing* meets user needs through three case studies while evaluating the proposed method, and the results confirm its effectiveness.
### What problem does this paper attempt to solve?
This paper addresses the poor classification performance of deep neural networks (DNNs) on similar, ambiguous data. Specifically, it points out the following deficiencies of current deep-learning networks:
1. **Difficulty in distinguishing similar, ambiguous data**: Although DNNs perform well in many classification tasks, their performance is unsatisfactory on ambiguous data such as abstract concepts or shapes.
2. **Lack of domain knowledge**: On domain-specific datasets (such as archaeology-related data), the network performs poorly because good results require corresponding domain knowledge.
3. **Uncontrollable training process**: Training a deep-learning network is currently a "black box"; users cannot directly intervene in or control how high-dimensional features are learned.
To solve these problems, the author proposes a new interactive editing method, which allows humans to help DNNs learn feature representations that are more in line with human knowledge by modifying the latent space, thereby improving classification accuracy. Specifically, this method includes the following key points:
- **Visualizing high-dimensional features**: High-dimensional features are projected onto a 2D workspace with a dimensionality reduction method, so users can observe the data distribution directly.
- **Interactive editing**: Users can manually adjust the positions of the projected points in the 2D workspace, and the system retrains the network based on these edits.
- **Designing a new loss function**: To integrate human knowledge into network training in a reasonable way, the author designs a new loss function that enables the network to learn the information modified by the user.
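The visualize, edit, and map-back loop described above can be sketched in plain Python. The paper does not say here which dimensionality reduction method SpaceEditing uses, or how a moved point is paired with reference features for the loss; this sketch assumes PCA (computed by power iteration) and a hypothetical pairing rule, and the function names `project_2d` and `handle_move` are illustrative only.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _leading_direction(rows, iters=200, seed=0):
    """Power iteration on rows^T rows: a unit vector along the top principal axis."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(rows[0]))]
    for _ in range(iters):
        scores = [_dot(r, v) for r in rows]                  # X v
        w = [sum(s * r[j] for s, r in zip(scores, rows))     # X^T (X v)
             for j in range(len(v))]
        norm = math.sqrt(_dot(w, w)) or 1.0                  # avoid dividing by 0
        v = [x / norm for x in w]
    return v


def project_2d(feats):
    """Project features to 2D with PCA (ASSUMED stand-in for the paper's reduction method)."""
    n, d = len(feats), len(feats[0])
    mu = [sum(f[j] for f in feats) / n for j in range(d)]
    centered = [[f[j] - mu[j] for j in range(d)] for f in feats]
    v1 = _leading_direction(centered)
    # Deflate: strip the v1 component so the second axis is orthogonal to v1.
    deflated = []
    for r in centered:
        s = _dot(r, v1)
        deflated.append([x - s * a for x, a in zip(r, v1)])
    v2 = _leading_direction(deflated)
    return [(_dot(r, v1), _dot(r, v2)) for r in centered]


def handle_move(index, new_xy, proj, feats):
    """Map one 2D edit back to high-dimensional features (m, P, N).

    HYPOTHETICAL pairing rule (not from the paper): P is the nearest other
    point at the drop location, N the nearest remaining point at the origin.
    """
    def nearest(xy, exclude):
        others = [i for i in range(len(proj)) if i not in exclude]
        return min(others, key=lambda i: math.dist(proj[i], xy))

    p = nearest(new_xy, {index})
    q = nearest(proj[index], {index, p})
    return feats[index], feats[p], feats[q]
```

Calling `proj = project_2d(feats)` and then `handle_move(0, proj[3], proj, feats)` simulates dragging point 0 onto point 3: the returned `m` is point 0's own feature vector, which is the "find the corresponding high-dimensional feature" step, while `P` and `N` come only from the assumed rule above.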
With this method, users can not only better understand the network training process but also effectively improve network performance, especially on ambiguous data. In addition, the method comes with an interactive system named SpaceEditing, which supports multiple interactive functions, such as zooming, visual volume adjustment, interactive movement, movement guidance, and history recording, to improve the user experience and ease of operation.
### Main contributions of the paper
1. Propose a novel and effective method that enables users to interactively edit the latent space based on their own knowledge, thereby guiding the learning process of the network. This not only improves the performance of the network but also makes the latent space more interpretable.
2. Design a new interactive system, SpaceEditing, which synchronizes manual edits made in the 2D space to the high-dimensional space and provides multiple interactive functions.
3. Evaluate SpaceEditing on different types of machine-learning tasks through three case studies, demonstrating the effectiveness and flexibility of the method.
### Formula presentation
To ensure that human knowledge can be effectively integrated into the network training process, the author designs a new loss function. This loss function consists of two parts: classification loss and distance difference loss.
#### Classification loss
The classification loss \( \text{loss}_{\text{cls}} \) is the cross-entropy between the predicted and true labels:
\[ \text{loss}_{\text{cls}}=-\sum_{i} y_{i} \log(\hat{y}_{i}) \]
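where \( y_{i} \) is the true label for class \( i \) (1 for the correct class and 0 otherwise under one-hot encoding) and \( \hat{y}_{i} \) is the network's predicted probability for class \( i \).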
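As a minimal illustration of \( \text{loss}_{\text{cls}} \) for a single sample, the function below transcribes the formula directly; the name `classification_loss` and the small `eps` guard against `log(0)` are assumptions of this sketch, not details from the paper.

```python
import math


def classification_loss(y_true, y_prob, eps=1e-12):
    """loss_cls = -sum_i y_i * log(y_hat_i) for one sample.

    y_true: one-hot (or soft) target vector; y_prob: predicted class probabilities.
    eps (an implementation choice, not from the paper) keeps log() away from 0.
    """
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_prob))
```

With a one-hot target `[0, 1, 0]` and predicted probabilities `[0.2, 0.7, 0.1]`, only the true-class term survives, giving -log 0.7 ≈ 0.357.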
#### Distance difference loss
The distance difference loss \( \text{loss}_{\text{dis}} \) is calculated according to the points moved by the user:
\[ \text{loss}_{\text{dis}}=\sum_{i=1}^{D} \max(||m_{i}-P_{i}||_{2}^{2}-||m_{i}-N_{i}||_{2}^{2}+\delta, 0) \]
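where \( D \) is the number of points moved by the user, \( m_{i} \) is the high-dimensional feature corresponding to the \( i \)-th moved point, and \( P_{i} \) and \( N_{i} \) are the high-dimensional features that minimizing the loss pulls \( m_{i} \) toward and pushes it away from, respectively. \( \delta \) is a margin: the term for point \( i \) is zero only when \( ||m_{i}-P_{i}||_{2}^{2}+\delta \le ||m_{i}-N_{i}||_{2}^{2} \), that is, when \( m_{i} \) is closer to \( P_{i} \) than to \( N_{i} \) by at least \( \delta \) in squared Euclidean distance.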
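A direct transcription of \( \text{loss}_{\text{dis}} \) into plain Python, as a sketch: each moved point is supplied as a hypothetical `(m, P, N)` triple of feature vectors, and `delta` defaults to an arbitrary 1.0 because no margin value is given here.

```python
def distance_difference_loss(triplets, delta=1.0):
    """loss_dis = sum_i max(||m_i - P_i||^2 - ||m_i - N_i||^2 + delta, 0).

    triplets: iterable of (m, P, N) feature vectors, one per point moved by the user.
    delta=1.0 is an illustrative default, not the paper's setting.
    """
    def sq_dist(a, b):
        # Squared Euclidean distance ||a - b||_2^2.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return sum(max(sq_dist(m, p) - sq_dist(m, n) + delta, 0.0)
               for m, p, n in triplets)
```

For instance, with m = (0, 0), P = (2, 0), N = (1, 0) and δ = 1, the term is 4 - 1 + 1 = 4; swapping P and N gives max(1 - 4 + 1, 0) = 0, so an edit that already satisfies the margin contributes nothing.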
#### Total loss function
The total loss function \( \text{Loss} \) is the sum of the classification loss and the distance difference loss.
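Written out in full with the symbols defined above, the unweighted sum reads as follows (no weighting factor between the two terms is stated, so none is assumed):

```latex
\text{Loss} = \text{loss}_{\text{cls}} + \text{loss}_{\text{dis}}
            = -\sum_{i} y_{i}\log(\hat{y}_{i})
              + \sum_{i=1}^{D}\max\left(\lVert m_{i}-P_{i}\rVert_{2}^{2}-\lVert m_{i}-N_{i}\rVert_{2}^{2}+\delta,\ 0\right)
```

When the user has not moved any points (\( D = 0 \)), the second sum is empty and \( \text{Loss} \) reduces to the ordinary cross-entropy classification loss.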