Abstract: Deep learning has achieved great success in many research areas, and remarkable progress has been made in computer vision in particular. This special focus, which will also appear in the next few issues, aims to communicate new ideas on applying deep learning to critical vision tasks. Six research papers and four letters were accepted for this special focus after high-quality review. They cover a variety of important vision tasks, including semantic segmentation, object detection, image synthesis, image retrieval, OCR, and age estimation. More specifically, there are two articles on semantic segmentation (Zhang and Pang; Ma et al.), two articles on scene text recognition (Gao et al.; Wang et al.), one article on text image synthesis (Liao et al.), one article on gait-based age estimation (Zhu et al.), one letter on deep feature learning (Gao et al.), one letter on product image retrieval (Wang et al.), one letter on object detection (Cui et al.), and one letter on facial expression recognition (Wang et al.).

All six research papers make significant progress on their respective vision tasks.

(1) In “Progressive rectification network for irregular text recognition”, Gao et al. propose a progressive rectification network (PRN) that iteratively transforms irregular scene text into a front-horizontal view, significantly improving scene text recognition performance.

(2) In “Ordinal distribution regression for gait-based age estimation”, Zhu et al. treat the ordinal relationship among ages as an important cue and design a neural network for gait-based age estimation that is trained with a new loss function, termed the ordinal distribution loss. This general method is not limited to gait-based age estimation; it can also be applied to face-based age estimation.

(3) In “FACLSTM: ConvLSTM with focused attention for scene text recognition”, Wang et al. tackle the scene text recognition problem from a spatiotemporal prediction perspective. They propose a ConvLSTM-based model that reads scene text in 2D space and further adopts an attention mechanism and character center masks to enhance recognition performance.

(4) In “CGNet: cross-guidance network for semantic segmentation”, Zhang and Pang introduce a unified framework named cross-guidance network (CGNet) that simultaneously extracts segmentation, edge, and saliency features. Guided by the edge and saliency detection networks, CGNet learns more discriminative features, which clearly improves semantic segmentation performance.

(5) In “SynthText3D: synthesizing scene text images from 3D virtual worlds”, Liao et al. propose an unconventional approach for generating scene text images from 3D virtual worlds. The synthetic images exhibit realistic visual effects, including complex perspective transforms, varied illumination, and occlusions, and can be used to train stronger scene text detectors.

(6) In “Preserving details in semantics-aware context for scene parsing”, Ma et al. aim to improve the spatial decoding process by embedding low-level information that would otherwise be lost, in a simple yet effective manner. The method captures fine image details that are difficult for FCN-based semantic segmentation pipelines to handle.

The four letters also report promising progress on different vision tasks. Gao et al. present a discriminative stacked autoencoder (DSA) for learning more robust feature representations.