MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition

Nicolas Menet, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi
2023-12-05
Abstract: With the advent of deep learning, progressively larger neural networks have been designed to solve complex tasks. We take advantage of these capacity-rich models to lower the cost of inference by exploiting computation in superposition. To reduce the computational burden per input, we propose Multiple-Input-Multiple-Output Neural Networks (MIMONets) capable of handling many inputs at once. MIMONets augment various deep neural network architectures with variable binding mechanisms to represent an arbitrary number of inputs in a compositional data structure via fixed-width distributed representations. Accordingly, MIMONets adapt nonlinear neural transformations to process the data structure holistically, leading to a speedup nearly proportional to the number of superposed input items in the data structure. After processing in superposition, an unbinding mechanism recovers each transformed input of interest. MIMONets also provide a dynamic trade-off between accuracy and throughput by an instantaneous on-demand switching between a set of accuracy-throughput operating points, yet within a single set of fixed parameters. We apply the concept of MIMONets to both CNN and Transformer architectures resulting in MIMOConv and MIMOFormer, respectively. Empirical evaluations show that MIMOConv achieves about 2-4× speedup at an accuracy delta within [+0.68, -3.18]% compared to WideResNet CNNs on CIFAR10 and CIFAR100. Similarly, MIMOFormer can handle 2-4 inputs at once while maintaining a high average accuracy within a [-1.07, -3.43]% delta on the long range arena benchmark. Finally, we provide mathematical bounds on the interference between superposition channels in MIMOFormer. Our code is available at https://github.com/IBM/multiple-input-multiple-output-nets.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the inference cost of large deep learning models. It proposes Multiple-Input-Multiple-Output Neural Networks (MIMONets), which process several inputs at once through computation in superposition: a binding mechanism combines the inputs into a single fixed-width distributed representation, the network transforms that representation as a whole, and an unbinding mechanism recovers the output for each input. Because one forward pass serves several inputs, the computational burden per input drops, with a speedup nearly proportional to the number of superposed inputs. The paper gives two implementations: MIMOConv for Convolutional Neural Networks (CNNs) and MIMOFormer for Transformers. In the reported experiments, MIMOConv achieves about a 2-4× speedup at an accuracy delta within [+0.68, -3.18]% relative to WideResNet CNNs on CIFAR10 and CIFAR100, and MIMOFormer handles 2-4 inputs at once at an average accuracy delta within [-1.07, -3.43]% on the long range arena benchmark. MIMONets can also switch on demand between accuracy-throughput operating points using a single set of fixed parameters, so adapting to a different computational budget does not require loading different model weights.
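The binding, superposition, and unbinding steps described above can be illustrated with a minimal sketch. It assumes random bipolar (+1/-1) keys and element-wise multiplication as the binding operation, a common choice in vector-symbolic architectures; this is an assumption for illustration, not necessarily the binding used in MIMOConv or MIMOFormer. The network that would process the superposed vector is omitted, and the helper names (make_keys, bind_and_superpose, unbind) are hypothetical.

```python
import numpy as np

def make_keys(num_inputs, dim, rng):
    # One random bipolar key per superposition channel (assumed binding scheme).
    return rng.choice([-1.0, 1.0], size=(num_inputs, dim))

def bind_and_superpose(inputs, keys):
    # Bind each input to its key element-wise, then sum into one fixed-width vector.
    return (inputs * keys).sum(axis=0)

def unbind(superposed, keys):
    # Multiplying by a bipolar key undoes its own binding (key * key == 1);
    # the other channels remain as noise-like interference.
    return keys * superposed

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
num_inputs, dim = 4, 4096
inputs = rng.standard_normal((num_inputs, dim))
keys = make_keys(num_inputs, dim, rng)

superposed = bind_and_superpose(inputs, keys)   # shape (dim,), independent of num_inputs
recovered = unbind(superposed, keys)            # shape (num_inputs, dim)

# similarity[i, j]: how closely the vector recovered from channel i matches input j.
similarity = np.array([[cosine(recovered[i], inputs[j]) for j in range(num_inputs)]
                       for i in range(num_inputs)])
print(np.round(similarity, 2))
```

In this toy setting each recovered vector is clearly most similar to its own input, but the match is imperfect because the other channels interfere, and the interference grows as more inputs are superposed. This is one way to picture the accuracy-throughput trade-off mentioned in the abstract.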