Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss

Jaeha Kim, Junghun Oh, Kyoung Mu Lee
2024-04-04
Abstract: In real-world scenarios, image recognition tasks, such as semantic segmentation and object detection, often pose greater challenges due to the lack of information available within low-resolution (LR) content. Image super-resolution (SR) is one of the promising solutions for addressing the challenges. However, due to the ill-posed property of SR, it is challenging for typical SR methods to restore task-relevant high-frequency contents, which may dilute the advantage of utilizing the SR method. Therefore, in this paper, we propose Super-Resolution for Image Recognition (SR4IR) that effectively guides the generation of SR images beneficial to achieving satisfactory image recognition performance when processing LR images. The critical component of our SR4IR is the task-driven perceptual (TDP) loss that enables the SR network to acquire task-specific knowledge from a network tailored for a specific task. Moreover, we propose a cross-quality patch mix and an alternate training framework that significantly enhances the efficacy of the TDP loss by addressing potential problems when employing the TDP loss. Through extensive experiments, we demonstrate that our SR4IR achieves outstanding task performance by generating SR images useful for a specific image recognition task, including semantic segmentation, object detection, and image classification. The implementation code is available at
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the difficulty of image recognition on low-resolution (LR) inputs and proposes a method called Super-Resolution for Image Recognition (SR4IR). Conventional image super-resolution (SR) methods are typically trained with a pixel loss or a perceptual loss to restore high-resolution (HR) images, but because SR is ill-posed they may fail to recover the high-frequency content that matters for a specific task, which limits the benefit of SR for image recognition. The core contribution of SR4IR is the task-driven perceptual (TDP) loss, which lets the SR network acquire task-specific knowledge from a network tailored to that task and thereby recover high-frequency details that improve task performance. The paper also proposes a Cross-Quality Patch Mix (CQMix) data augmentation strategy, which prevents the task network from learning biased features and further increases the effectiveness of the TDP loss. SR4IR is trained in an alternating framework: the SR network is first updated with the TDP loss, and the task network is then updated with CQMix. Experimental results show that SR4IR significantly improves task performance on semantic segmentation, object detection, and image classification, while the generated SR images are also visually closer to the HR images. SR4IR outperforms the baseline methods on multiple metrics, demonstrating its generality and effectiveness.
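The two components named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made for illustration: the TDP loss is taken to be a mean absolute distance between task-network features of the SR and HR images, CQMix is taken to pick each patch from the HR or SR image with probability 0.5, and `toy_features` is a hypothetical stand-in for a real task network's feature extractor.

```python
import numpy as np


def cq_mix(hr, sr, patch, rng):
    """Cross-quality patch mix (sketch): each patch-sized cell of the output
    is copied from either the HR or the SR image.
    Assumption: the choice is made independently with probability 0.5."""
    assert hr.shape == sr.shape
    height, width = hr.shape[:2]
    mixed = sr.copy()
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            if rng.random() < 0.5:
                mixed[top:top + patch, left:left + patch] = \
                    hr[top:top + patch, left:left + patch]
    return mixed


def tdp_loss(features, sr, hr):
    """Task-driven perceptual loss (sketch).
    Assumption: mean absolute distance between the features that the task
    network extracts from the SR image and from the HR image."""
    return float(np.mean(np.abs(features(sr) - features(hr))))


def toy_features(img):
    # Hypothetical feature extractor: horizontal differences, which respond
    # to edges, i.e. to high-frequency content.
    return np.diff(img, axis=1)


rng = np.random.default_rng(0)
hr = rng.random((8, 8))      # stand-in for a high-resolution image
sr = np.full((8, 8), 0.5)    # stand-in for an over-smoothed SR output
mixed = cq_mix(hr, sr, patch=4, rng=rng)
loss = tdp_loss(toy_features, sr, hr)
```

In the alternating scheme described above, a loss like `tdp_loss` would be used when updating the SR network, and images like `mixed` would be fed to the task network when it is updated. In this toy setup the loss is zero when the SR image equals the HR image and positive for the flat `sr` image, since the flat image has no edges for `toy_features` to respond to.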