Towards Croppable Implicit Neural Representations

Maor Ashkenazi, Eran Treister
2024-10-23
Abstract: Implicit Neural Representations (INRs) have piqued interest in recent years due to their ability to encode natural signals using neural networks. While INRs allow for useful applications such as interpolating new coordinates and signal compression, their black-box nature makes it difficult to modify them post-training. In this paper we explore the idea of editable INRs, and specifically focus on the widely used cropping operation. To this end, we present Local-Global SIRENs -- a novel INR architecture that supports cropping by design. Local-Global SIRENs are based on combining local and global feature extraction for signal encoding. What makes their design unique is the ability to effortlessly remove specific portions of an encoded signal, with a proportional weight decrease. This is achieved by eliminating the corresponding weights from the network, without the need for retraining. We further show how this architecture can be used to support the straightforward extension of previously encoded signals. Beyond signal editing, we examine how the Local-Global approach can accelerate training, enhance encoding of various signals, improve downstream performance, and be applied to modern INRs such as INCODE, highlighting its potential and flexibility. Code is available at https://github.com/maorash/Local-Global-INRs.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is: **how to crop and extend implicit neural representations (INRs) without retraining them**. Specifically, the authors want to remove or add specific parts of an encoded signal without retraining the model and without degrading the quality of the parts that remain.

### Problem Background

Implicit neural representations (INRs) have received extensive attention in recent years because they encode natural signals using neural networks. Although INRs work well for tasks such as interpolating new coordinates and signal compression, their black-box nature makes them difficult to modify after training. The usual options are retraining the entire model or fine-tuning the existing one; both are time-consuming and may fail to preserve the originally encoded information.

### Core Problems of the Paper

1. **Cropping Operation**: How to remove specific parts of an encoded signal from an INR without retraining, with a proportional reduction in the number of weights.
2. **Expansion Operation**: How to extend a previously encoded signal without retraining from scratch.
3. **Accelerating Training and Improving Performance**: Whether combining local and global feature extraction can improve encoding quality and downstream-task performance while also accelerating training.

### Solutions

To address these problems, the authors propose **Local-Global SIRENs**, a new INR architecture that supports cropping and expansion by combining local and global feature extraction. Specifically:

- **Local Sub-Networks**: Each local sub-network handles one partition of the signal and learns that partition's local features independently.
- **Global Sub-Network**: The global sub-network captures features of the entire signal, so the overall structure is retained even after some local sub-networks are removed.
- **Merge Operator**: Combines the local and global features to produce the final signal representation.

With this design, Local-Global SIRENs can be cropped by deleting the weights of the corresponding local sub-networks, and can be extended, without full retraining, while maintaining high reconstruction accuracy and low computational cost.

### Experimental Verification

The authors evaluate Local-Global SIRENs on image, audio, video, and 3D shape encoding. The reported results show that the method supports cropping and expansion and also outperforms traditional INR methods in encoding quality and training speed.

In summary, the paper's main contribution is a new INR architecture that removes the limitations of existing methods with respect to cropping and expansion, and that offers a new direction for research on editable implicit neural representations.
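
The local, global, and merge components described under Solutions can be illustrated with a small NumPy sketch. This is not the authors' implementation (their code is in the repository linked in the abstract); it is a toy 1D model with untrained random weights, and every name in it (`LocalGlobalSketch`, `partition_of`, `crop`, `num_params`) is hypothetical. It assumes equal-width partitions over `[0, 1)`, a single sine layer per sub-network, and concatenation as the merge operator. Its only purpose is to show why deleting a partition's local weights leaves the outputs of the remaining partitions unchanged.

```python
import numpy as np


class LocalGlobalSketch:
    """Toy 1D local-global INR with random weights (illustrative assumption)."""

    def __init__(self, num_partitions, hidden=16, omega=30.0, seed=0):
        rng = np.random.default_rng(seed)
        self.num_partitions = num_partitions
        self.omega = omega
        # Global sub-network: one sine layer shared by every coordinate.
        self.global_w = rng.uniform(-1, 1, size=(1, hidden))
        self.global_b = rng.uniform(-1, 1, size=hidden)
        # Local sub-networks: an independent weight set per partition.
        self.local = {
            p: {
                "w": rng.uniform(-1, 1, size=(1, hidden)),
                "b": rng.uniform(-1, 1, size=hidden),
                # Output head that reads the merged [local, global] features.
                "head": rng.uniform(-1, 1, size=(2 * hidden, 1)) / hidden,
            }
            for p in range(num_partitions)
        }

    def partition_of(self, x):
        # Map coordinates in [0, 1) to equal-width partition indices.
        idx = (np.asarray(x, dtype=float) * self.num_partitions).astype(int)
        return np.minimum(idx, self.num_partitions - 1)

    def forward(self, x):
        x = np.asarray(x, dtype=float).reshape(-1, 1)
        parts = self.partition_of(x[:, 0])
        global_feat = np.sin(self.omega * (x @ self.global_w + self.global_b))
        # Coordinates whose partition was cropped away stay NaN.
        out = np.full((x.shape[0], 1), np.nan)
        for p, params in self.local.items():
            mask = parts == p
            if not mask.any():
                continue
            local_feat = np.sin(self.omega * (x[mask] @ params["w"] + params["b"]))
            # Merge operator (assumed here): feature concatenation.
            merged = np.concatenate([local_feat, global_feat[mask]], axis=1)
            out[mask] = merged @ params["head"]
        return out[:, 0]

    def crop(self, keep):
        # Cropping = dropping the local weights of removed partitions.
        self.local = {p: self.local[p] for p in keep}

    def num_params(self):
        local = sum(a.size for params in self.local.values() for a in params.values())
        return self.global_w.size + self.global_b.size + local


model = LocalGlobalSketch(num_partitions=4)
coords = np.linspace(0, 1, 40, endpoint=False)
before = model.forward(coords)
params_before = model.num_params()

model.crop(keep=[1, 2])  # keep only the middle half of the signal
after = model.forward(coords)
kept = np.isin(model.partition_of(coords), [1, 2])

print(np.allclose(before[kept], after[kept]))  # kept region is unchanged
print(params_before, model.num_params())       # parameter count shrinks
```

Because each partition's output depends only on its own local weights and the shared global weights, removing other partitions cannot change it, and the parameter count falls in proportion to the number of local sub-networks removed (the shared global weights remain).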