Abstract: Recent advances in generalized image understanding have seen a surge in the use of deep convolutional neural networks (CNN) across a broad range of image-based detection, classification and prediction tasks. Whilst the reported performance of these approaches is impressive, this study investigates the hitherto unaddressed question of the impact of commonplace image and video compression techniques on the performance of such deep learning architectures. Focusing on JPEG and H.264 (MPEG-4 AVC) as representative proxies for contemporary lossy image/video compression techniques in common use within network-connected image/video devices and infrastructure, we examine the impact on performance across five discrete tasks: human pose estimation, semantic segmentation, object detection, action recognition, and monocular depth estimation. Accordingly, this study includes a variety of network architectures and domains spanning end-to-end convolution, encoder-decoder, region-based CNN (R-CNN), dual-stream, and generative adversarial networks (GAN). Our results show a non-linear and non-uniform relationship between network performance and the level of lossy compression applied. Notably, performance decreases significantly below a JPEG quality (quantization) level of 15% and an H.264 Constant Rate Factor (CRF) of 40. However, retraining these architectures on pre-compressed imagery recovers network performance by up to 78.4%. Furthermore, there is a correlation between architectures employing an encoder-decoder pipeline and those that demonstrate resilience to lossy image compression. The characteristics of the relationship between input compression and output task performance can be used to inform design decisions within future image/video devices and infrastructure.
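As a rough illustration of how a sweep over H.264 compression levels might be set up, the sketch below builds `ffmpeg` command lines that re-encode a clip with `libx264` at a chosen CRF. This is not the authors' pipeline: the function names, the file names, and every CRF value other than 40 (the threshold quoted in the abstract) are assumptions made for illustration. The commands are only constructed, not executed.

```python
from typing import List


def h264_crf_command(src: str, dst: str, crf: int) -> List[str]:
    """Build an ffmpeg command that re-encodes `src` with libx264 at a given CRF.

    Hypothetical helper for illustration. A higher CRF means stronger
    compression and lower visual quality.
    """
    # libx264 accepts CRF values from 0 to 51 for 8-bit video.
    if not 0 <= crf <= 51:
        raise ValueError("CRF must be between 0 and 51 for 8-bit libx264")
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-crf", str(crf), dst]


def crf_sweep(src: str, crfs: List[int]) -> List[List[str]]:
    """Return one command per CRF level; output file names are assumed."""
    return [h264_crf_command(src, f"out_crf{crf}.mp4", crf) for crf in crfs]


if __name__ == "__main__":
    # Assumed sweep values; only CRF 40 is taken from the abstract.
    for cmd in crf_sweep("input.mp4", [23, 30, 40, 50]):
        print(" ".join(cmd))
```

Each command could then be passed to `subprocess.run` and the re-encoded clips fed to the network under test, so that task performance can be compared across compression levels.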

Position, Padding and Predictions: A Deeper Look at Position Information in CNNs

Attention cutting and padding learning for fine-grained image recognition

How Can CNNs Use Image Position for Segmentation?

Padding Module: Learning the Padding in Deep Neural Networks

Improving Translation Invariance in Convolutional Neural Networks with Peripheral Prediction Padding

Context-aware Padding for Semantic Segmentation

Capsule Network Performance on Complex Data

Interpolation-Aware Padding for 3D Sparse Convolutional Neural Networks

A Frustratingly Easy Improvement for Position Embeddings Via Random Padding

Padding-Aware Learned Image Compression

CNNs Avoid Curse of Dimensionality by Learning on Patches

Pushing the Limits of Capsule Networks

Reversible Data Hiding based on optimized CNN predictor and Prediction Error Expansion with Lower Surround Background Complexity

CNN Fixations: An unraveling approach to visualize the discriminative image regions

Learning to predict crisp boundaries

Design of compensation algorithms for zero padding and its application to a patch based deep neural network

On the Impact of Lossy Image and Video Compression on the Performance of Deep Convolutional Neural Network Architectures

Should You Go Deeper? Optimizing Convolutional Neural Network Architectures without Training by Receptive Field Analysis

Understanding the Role of Pathways in a Deep Neural Network

Patch Reordering: a Novel Way to Achieve Rotation and Translation Invariance in Convolutional Neural Networks

An Investigation on The Position Encoding in Vision-Based Dynamics Prediction