From MLP to NeoMLP: Leveraging Self-Attention for Neural Fields

Miltiadis Kofinas, Samuele Papa, Efstratios Gavves
2024-12-12
Abstract: Neural fields (NeFs) have recently emerged as a state-of-the-art method for encoding spatio-temporal signals of various modalities. Despite the success of NeFs in reconstructing individual signals, their use as representations in downstream tasks, such as classification or segmentation, is hindered by the complexity of the parameter space and its underlying symmetries, in addition to the lack of powerful and scalable conditioning mechanisms. In this work, we draw inspiration from the principles of connectionism to design a new architecture based on MLPs, which we term NeoMLP. We start from an MLP, viewed as a graph, and transform it from a multi-partite graph to a complete graph of input, hidden, and output nodes, equipped with high-dimensional features. We perform message passing on this graph and employ weight-sharing via self-attention among all the nodes. NeoMLP has a built-in mechanism for conditioning through the hidden and output nodes, which function as a set of latent codes, and as such, NeoMLP can be used straightforwardly as a conditional neural field. We demonstrate the effectiveness of our method by fitting high-resolution signals, including multi-modal audio-visual data. Furthermore, we fit datasets of neural representations, by learning instance-specific sets of latent codes using a single backbone architecture, and then use them for downstream tasks, outperforming recent state-of-the-art methods. The source code is open-sourced at https://github.com/mkofinas/neomlp.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The main problem this paper addresses is that existing neural fields (NeFs) are hard to use as representations in downstream tasks such as classification or segmentation. The difficulty stems from two sources:

1. **Complexity and Symmetries of the Parameter Space**: The parameters of a neural field live in a complex space with underlying symmetries, which hinders downstream performance unless those symmetries are accounted for.
2. **Lack of a Powerful, Scalable Conditioning Mechanism**: Existing neural field models lack an effective conditioning mechanism, which limits their representational ability and their adaptability to signals of different modalities.

To address these problems, the authors propose a new architecture, NeoMLP.

### Main Contributions

1. **NeoMLP, a New MLP-Based Architecture**:
   - The MLP is viewed as a graph and transformed from a multi-partite graph into a complete (fully connected) graph, on which message passing is performed with self-attention.
   - Hidden and output nodes are initialized with high-dimensional features, and the values of these nodes are optimized through backpropagation.
2. **Improvement of the Conditioning Mechanism**:
   - NeoMLP has a built-in conditioning mechanism: the hidden and output nodes act as a set of learnable latent codes.
   - The paper proposes ν-reps (nu-reps) and ν-sets (nu-sets), which denote the neural representation of a single signal and the neural representations of a dataset, respectively.
3. **Performance Improvement in Downstream Tasks**:
   - A single backbone architecture fits high-resolution signals of different modalities (such as audio, video, and multi-modal data), and the learned latent codes are then used for downstream tasks such as classification and segmentation, with improved performance.
   - The method is evaluated on multiple benchmark datasets (such as MNIST, CIFAR10, and ShapeNet10), where it surpasses existing state-of-the-art methods.

### Method Overview

The core idea of NeoMLP is to view the MLP as a graph and modify it in the following steps:

- **Graph Transformation**: Transform the MLP from a multi-partite graph into a complete graph and introduce self-loop edges.
- **Message Passing**: Perform message passing on the graph, with weight sharing among all nodes achieved through self-attention.
- **High-Dimensional Features**: Equip the nodes with high-dimensional features to improve the expressiveness and scalability of the model.

With these changes, NeoMLP can fit high-resolution signals and can also be applied to downstream tasks, addressing the shortcomings of existing neural field models in those tasks.

### Experimental Results

The paper evaluates NeoMLP through a series of experiments, including:

- Fitting high-resolution signals (such as audio, video, and multi-modal data), where it achieves significantly higher PSNR than existing methods.
- Performing downstream tasks (such as classification and segmentation) on multiple datasets, where it shows better reconstruction quality and higher downstream task performance.

In summary, by introducing the NeoMLP architecture, this paper addresses the poor performance of existing neural field models in downstream tasks and offers a new approach to applying neural fields.
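
The graph view described in the method overview can be illustrated with a small NumPy sketch. It is not the authors' implementation (see the linked repository for that); it is a simplified, single-head version written under stated assumptions. The class name `NeoMLPSketch`, the linear lifting of each input coordinate, the residual update, and the linear readout are all hypothetical choices made for illustration. The sketch only shows the core idea: input, hidden, and output nodes form one complete graph (self-loops included), every node carries a feature vector, and self-attention with shared weights passes messages among all nodes.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class NeoMLPSketch:
    """Illustrative only: an MLP viewed as a complete graph with self-attention.

    Assumed, not taken from the paper: single-head attention, a linear lift
    of each scalar input, no normalization or feed-forward sublayers.
    """

    def __init__(self, num_inputs, num_hidden, num_outputs,
                 dim=16, num_layers=2, seed=0):
        rng = np.random.default_rng(seed)
        self.num_outputs = num_outputs
        self.dim = dim
        # Each scalar input coordinate is lifted to a dim-sized node feature.
        self.in_weight = rng.normal(0.0, 1.0, (num_inputs, dim))
        self.in_bias = rng.normal(0.0, 1.0, (num_inputs, dim))
        # Hidden and output nodes are learnable latent codes
        # (hidden nodes first, output nodes last).
        self.latents = rng.normal(0.0, 1.0, (num_hidden + num_outputs, dim))
        # Attention weights are shared by all nodes within a layer.
        scale = dim ** -0.5
        self.layers = [
            {name: rng.normal(0.0, scale, (dim, dim)) for name in "qkvo"}
            for _ in range(num_layers)
        ]
        # Linear readout mapping each output node's feature to a scalar.
        self.readout = rng.normal(0.0, scale, (dim,))

    def __call__(self, x):
        """x: (batch, num_inputs) coordinates -> (batch, num_outputs) values."""
        batch = x.shape[0]
        inputs = x[:, :, None] * self.in_weight + self.in_bias
        latents = np.broadcast_to(self.latents, (batch,) + self.latents.shape)
        h = np.concatenate([inputs, latents], axis=1)  # (batch, nodes, dim)
        for p in self.layers:
            q, k, v = h @ p["q"], h @ p["k"], h @ p["v"]
            # Complete graph: every node attends to every node, itself included.
            attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(self.dim))
            h = h + (attn @ v) @ p["o"]  # residual message-passing update
        out_nodes = h[:, -self.num_outputs:, :]
        return out_nodes @ self.readout


model = NeoMLPSketch(num_inputs=2, num_hidden=4, num_outputs=3)
coords = np.random.default_rng(1).uniform(-1.0, 1.0, (5, 2))
values = model(coords)  # one 3-channel value per 2D coordinate
```

In the conditional setting described in the abstract, each signal would get its own `latents` array while the attention weights stay shared across signals; those per-signal latent codes are what a downstream classifier would consume. Training (for example, gradient descent on a reconstruction loss) is omitted here.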