Abstract: Semantic segmentation is a pixel-level prediction task that assigns a class label to each pixel of the input image. Deep learning models, such as convolutional neural networks (CNNs), have been extremely successful in this domain. However, mobile applications, such as autonomous driving, demand real-time processing of an incoming stream of images. Hence, achieving efficient architectures along with enhanced accuracy is of paramount importance. Since accuracy and model size of CNNs are intrinsically at odds, the challenge is to achieve a good trade-off between the two. To address this, we propose a novel Factorized Pyramidal Learning (FPL) module to aggregate rich contextual information efficiently. On one hand, it uses a bank of convolutional filters with multiple dilation rates, which leads to multi-scale context aggregation and is crucial for better accuracy. On the other hand, parameters are reduced by a careful factorization of the employed filters, which is crucial for lightweight models. Moreover, we decompose the spatial pyramid into two stages, which enables a simple and efficient feature fusion within the module to solve the notorious checkerboard effect. We also design a dedicated Feature-Image Reinforcement (FIR) unit that fuses shallow and deep features with downsampled versions of the input image. This improves accuracy without increasing the number of model parameters. Based on the FPL module and FIR unit, we propose an ultra-lightweight real-time network, called FPLNet, which achieves a state-of-the-art accuracy-efficiency trade-off. More specifically, with fewer than 0.5 million parameters, the proposed network achieves 66.93% and 66.28% mIoU on the Cityscapes validation and test sets, respectively. Moreover, FPLNet has a processing speed of 95.5 frames per second (FPS).
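To make the two ideas in the abstract concrete, the following is a minimal sketch of the arithmetic behind them. It is not the FPL module itself: the abstract does not specify how the filters are factorized, so the sketch assumes a common asymmetric factorization of a 3x3 kernel into a 3x1 followed by a 1x3 kernel. The channel width of 64 and the dilation rates (1, 2, 4, 8) are hypothetical values chosen for illustration only.

```python
def conv_params(c_in, c_out, kh, kw, bias=False):
    """Number of learnable weights in one 2D convolution layer."""
    return c_in * c_out * kh * kw + (c_out if bias else 0)


def effective_kernel(k, dilation):
    """Spatial extent covered by a k-tap kernel at a given dilation rate."""
    return k + (k - 1) * (dilation - 1)


channels = 64  # hypothetical channel width, not taken from the paper

# Standard 3x3 convolution versus an assumed 3x1 + 1x3 factorization.
standard = conv_params(channels, channels, 3, 3)
factorized = (conv_params(channels, channels, 3, 1)
              + conv_params(channels, channels, 1, 3))
print("standard 3x3 weights:  ", standard)
print("factorized 3x1+1x3:    ", factorized)

# Larger dilation rates widen the context seen by the same 3-tap kernel
# without adding weights; the rates below are illustrative assumptions.
for d in (1, 2, 4, 8):
    print("dilation", d, "-> effective extent", effective_kernel(3, d))
```

Under these assumptions the factorized pair uses two thirds of the weights of the standard 3x3 layer, while dilation enlarges the covered extent at no extra parameter cost, which is the general mechanism by which a dilated, factorized filter bank can aggregate multi-scale context cheaply.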

Efficient pyramid context encoding and feature embedding for semantic segmentation

Enhanced Feature Pyramid Network for Semantic Segmentation

FPANet: Feature Pyramid Aggregation Network for Real-Time Semantic Segmentation

SPFNet: Subspace Pyramid Fusion Network for Semantic Segmentation

Real-time Semantic Segmentation in Traffic Scene Using Cross Stage Partial-based Encoder–decoder Network

Ppednet: Pyramid Pooling Encoder-Decoder Network For Real-Time Semantic Segmentation

Efficient Context Integration through Factorized Pyramidal Learning for Ultra-Lightweight Semantic Segmentation

FCPFNet: Feature Complementation Network with Pyramid Fusion for Semantic Segmentation

S$^2$-FPN: Scale-ware Strip Attention Guided Feature Pyramid Network for Real-time Semantic Segmentation

DFPNet: Dislocation Double Feature Pyramid Real-time Semantic Segmentation Network

EPRNet: Efficient Pyramid Representation Network for Real-Time Street Scene Segmentation

Efficient Parallel Multi-Scale Detail and Semantic Encoding Network for Lightweight Semantic Segmentation

A Unified Efficient Pyramid Transformer for Semantic Segmentation

Encoder-decoder with double spatial pyramid for semantic segmentation

Cross Guided and Pyramid Aggregation Networks for Real-time Semantic Segmentation

Improve SegNet with Feature Pyramid for Road Scene Parsing

Lightweight Spatial Pyramid Pooling Network for Real-Time Semantic Segmentation

Semantic Segmentation Based on Spatial Pyramid Pooling and Multilayer Feature Fusion

EMFANet: a lightweight network with efficient multi-scale feature aggregation for real-time semantic segmentation

A Lightweight Network for Fast Semantic Segmentation

BFMNet: Bilateral Feature Fusion Network with Multi-Scale Context Aggregation for Real-Time Semantic Segmentation