Abstract: In recent years, optimizing network architecture has played an increasingly important role in improving neural network performance. In this work, we introduce an interactive dual-branch attention mechanism and three lightweight-oriented strategies to build an accurate and compact residual network. Channel attention and spatial attention are fused into a novel bottleneck that enhances feature representation for higher accuracy. Asymmetric convolutions with spatial factorization, channel splitting, and depthwise separable convolution with width-multiplier adjustment are then combined to compress the parameter size of the attention-driven model, yielding a lightweight and compact residual network named ALResNet. Experimental results of 92.1% top-1 testing accuracy at an inference speed of 14.90 fps on Animals-10 and 89.4% top-1 testing accuracy at an inference speed of 16.21 fps on CIFAR-10, with 4.77M parameters and 736.82 MFLOPs, demonstrate that the proposed ALResNet achieves a decent tradeoff between accuracy and computing efficiency for fast inference on resource-limited mobile devices in vision-based tasks.

∗Corresponding author: Aiwen Luo (faith.awluo@gmail.com).

MLMI’21, September 17–19, 2021, Hangzhou, China. © 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-8424-7/21/09. https://doi.org/10.1145/3490725.3490729

CCS CONCEPTS: • Computing methodologies; • Object recognition; • Neural networks; • Model verification and validation
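
To give a rough sense of why the compression strategies named in the abstract reduce model size, the sketch below counts bias-free weights for a standard convolution, an asymmetric (k×1 followed by 1×k) factorization, and a depthwise separable convolution with a width multiplier. These are the generic textbook formulas, not the actual ALResNet layer configuration; the function names, the 64-channel example, and the 0.5 width multiplier are assumptions chosen for illustration. Channel splitting is omitted because the abstract does not specify how the split is applied.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def asymmetric_conv_params(c_in, c_out, k=3):
    """Weights when a k x k convolution is factorized into a k x 1
    convolution (c_in -> c_out) followed by a 1 x k convolution
    (c_out -> c_out). This layout is an assumption for illustration."""
    return k * c_in * c_out + k * c_out * c_out


def depthwise_separable_params(c_in, c_out, k=3, width_mult=1.0):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise
    convolution, with both channel counts scaled by width_mult."""
    c_in_s = max(1, int(round(c_in * width_mult)))
    c_out_s = max(1, int(round(c_out * width_mult)))
    depthwise = k * k * c_in_s      # one k x k filter per input channel
    pointwise = c_in_s * c_out_s    # 1 x 1 convolution mixing channels
    return depthwise + pointwise


if __name__ == "__main__":
    c = 64  # hypothetical channel count, not taken from the paper
    print("standard:           ", standard_conv_params(c, c))
    print("asymmetric:         ", asymmetric_conv_params(c, c))
    print("depthwise separable:", depthwise_separable_params(c, c))
    print("  with width 0.5:   ", depthwise_separable_params(c, c, width_mult=0.5))
```

Under these assumptions, the asymmetric factorization keeps two thirds of the weights of a 3×3 layer, while the depthwise separable form keeps roughly an eighth, and shrinking the width multiplier reduces the pointwise term quadratically.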

ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training

RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition

ConvMLP: Hierarchical Convolutional MLPs for Vision

RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality

An MLP Network Based on Residual Learning for Rice Hyperspectral Data Classification

A Novel Classification Framework for Hyperspectral Image Data by Improved Multilayer Perceptron Combined with Residual Network

ALResNet: Attention-Driven Lightweight Residual Network for Fast and Accurate Image Recognition

MC-MLP: Multiple Coordinate Frames in all-MLP Architecture for Vision

ParaLkResNet: an efficient multi-scale image classification network

A Novel Biologically Inspired ELM-based Network for Image Recognition

Multimodal Moore-Penrose Inverse-Based Recomputation Framework for Big Data Analysis

Hire-MLP: Vision MLP Via Hierarchical Rearrangement

R2-MLP: Round-Roll MLP for Multi-View 3D Object Recognition

HyperMLP: Superpixel Prior and Feature Aggregated Perceptron Networks for Hyperspectral and LiDAR Hybrid Classification

Efficient Deep Spiking Multi-Layer Perceptrons with Multiplication-Free Inference

LambdaNetworks: Modeling Long-Range Interactions Without Attention

Efficient Deep Spiking Multilayer Perceptrons With Multiplication-Free Inference

SS-MLP: A Novel Spectral-Spatial MLP Architecture for Hyperspectral Image Classification

MLP Architectures for Vision-and-Language Modeling: An Empirical Study

Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs

Brain-inspired Multilayer Perceptron with Spiking Neurons