MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition

Nicolas Menet, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi

2023-12-05

Abstract: With the advent of deep learning, progressively larger neural networks have been designed to solve complex tasks. We take advantage of these capacity-rich models to lower the cost of inference by exploiting computation in superposition. To reduce the computational burden per input, we propose Multiple-Input-Multiple-Output Neural Networks (MIMONets) capable of handling many inputs at once. MIMONets augment various deep neural network architectures with variable binding mechanisms to represent an arbitrary number of inputs in a compositional data structure via fixed-width distributed representations. Accordingly, MIMONets adapt nonlinear neural transformations to process the data structure holistically, leading to a speedup nearly proportional to the number of superposed input items in the data structure. After processing in superposition, an unbinding mechanism recovers each transformed input of interest. MIMONets also provide a dynamic trade-off between accuracy and throughput by an instantaneous on-demand switching between a set of accuracy-throughput operating points, yet within a single set of fixed parameters. We apply the concept of MIMONets to both CNN and Transformer architectures resulting in MIMOConv and MIMOFormer, respectively. Empirical evaluations show that MIMOConv achieves about 2-4× speedup at an accuracy delta within [+0.68, -3.18]% compared to WideResNet CNNs on CIFAR10 and CIFAR100. Similarly, MIMOFormer can handle 2-4 inputs at once while maintaining a high average accuracy within a [-1.07, -3.43]% delta on the Long Range Arena benchmark. Finally, we provide mathematical bounds on the interference between superposition channels in MIMOFormer. Our code is available at https://github.com/IBM/multiple-input-multiple-output-nets.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

The paper aims to reduce the inference cost of large deep learning models. It introduces Multiple-Input-Multiple-Output Neural Networks (MIMONets), which process several inputs in a single forward pass by exploiting computation in superposition. A binding mechanism packs the inputs into one fixed-width distributed representation, the network transforms that representation as a whole, and an unbinding mechanism then recovers the result for each input. Because one pass serves several inputs, the cost per input falls and the speedup is nearly proportional to the number of superposed inputs. The paper presents two instances: MIMOConv for convolutional neural networks (CNNs) and MIMOFormer for Transformers. MIMOConv achieves about 2-4× speedup at an accuracy delta within [+0.68, -3.18]% compared to WideResNet CNNs on CIFAR10 and CIFAR100, and MIMOFormer handles 2-4 inputs at once with an average accuracy delta within [-1.07, -3.43]% on the Long Range Arena benchmark. A single set of fixed parameters supports several accuracy-throughput operating points, so a deployment can switch between them on demand without loading different model weights.
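The bind, superpose, and unbind steps described above can be illustrated with a small NumPy sketch. It is a toy example under stated assumptions, not the paper's implementation: the random bipolar keys, the elementwise-product binding, and the names `inputs`, `keys`, `superposed`, and `recovered` are all chosen here for illustration, and the neural network that would process the superposition is left out entirely.

```python
import numpy as np

# Toy illustration only: elementwise binding with random bipolar keys is an
# assumption made for this sketch, not the binding used by MIMOConv/MIMOFormer.
rng = np.random.default_rng(0)
dim, num_inputs = 4096, 3

# Hypothetical inputs: one random vector per superposition channel.
inputs = rng.standard_normal((num_inputs, dim))

# One random bipolar (+1/-1) key per channel.
keys = rng.choice([-1.0, 1.0], size=(num_inputs, dim))

# Bind each input to its key, then add everything into ONE fixed-width vector.
superposed = (keys * inputs).sum(axis=0)

# Unbind: multiplying by a key again cancels it for the matching channel
# (key * key == 1) and leaves the other channels as noise-like interference.
recovered = keys * superposed


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# similarity[i, j]: how much the vector unbound with key i resembles input j.
similarity = np.array(
    [[cosine(recovered[i], inputs[j]) for j in range(num_inputs)]
     for i in range(num_inputs)]
)
print(np.round(similarity, 2))
```

In this toy setting each recovered vector stays clearly correlated with its own input (a cosine similarity of roughly 1/sqrt(num_inputs)) and nearly uncorrelated with the others. The residual is the kind of interference between superposition channels that grows as more inputs share one representation.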
