PIG: Prompt Images Guidance for Night-Time Scene Parsing

Zhifeng Xie, Rui Qiu, Sen Wang, Xin Tan, Yuan Xie, Lizhuang Ma
DOI: https://doi.org/10.1109/TIP.2024.3415963
2024-06-15
Abstract: Night-time scene parsing aims to extract pixel-level semantic information in night images, aiding downstream tasks in understanding scene object distribution. Due to limited labeled night image datasets, unsupervised domain adaptation (UDA) has become the predominant method for studying night scenes. UDA typically relies on day-night image pairs to guide adaptation, but this approach hampers dataset construction and restricts generalization across night scenes in different datasets. Moreover, UDA, focusing on network architecture and training strategies, faces difficulties in handling classes with few domain similarities. In this paper, we leverage Prompt Images Guidance (PIG) to enhance UDA with supplementary night knowledge. We propose a Night-Focused Network (NFNet) to learn night-specific features from both target domain images and prompt images. To generate high-quality pseudo-labels, we propose Pseudo-label Fusion via Domain Similarity Guidance (FDSG). Classes with fewer domain similarities are predicted by NFNet, which excels in parsing night features, while classes with more domain similarities are predicted by UDA, which has rich labeled semantics. Additionally, we propose two data augmentation strategies: the Prompt Mixture Strategy (PMS) and the Alternate Mask Strategy (AMS), aimed at mitigating the overfitting of the NFNet to a few prompt images. We conduct extensive experiments on four night-time datasets: NightCity, NightCity+, Dark Zurich, and ACDC. The results indicate that utilizing PIG can enhance the parsing accuracy of UDA.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the challenges of Night-time Scene Parsing (NTSP). It targets three key issues:

1. **Dataset Limitation**: Existing night-time image datasets are few and costly to annotate, which limits the training and performance of night-time scene parsing models.
2. **Limitations of Unsupervised Domain Adaptation (UDA) Methods**: Current mainstream night-time scene parsing methods rely on day-time and night-time image pairs to guide model training. This raises the cost of dataset construction and generalizes poorly across datasets. Existing UDA methods also struggle with categories that have low domain similarity.
3. **Night-time Feature Extraction**: Because night-time scenes have distinctive lighting conditions, knowledge learned directly from day-time images may not transfer well to night-time image parsing.

To address these issues, the paper proposes the "Prompt Images Guidance" (PIG) method, with the following core contributions:

1. **Night-Focused Network (NFNet)**: A network designed specifically for night-time image parsing. It learns night-specific features by incorporating a small number of annotated night-time "prompt images." NFNet is co-trained with a conventional UDA method, but it receives only night-time images as input, which avoids interference from day-time images.
2. **Pseudo-label Fusion via Domain Similarity Guidance (FDSG)**: To generate high-quality pseudo-labels, the authors propose a fusion mechanism that uses Learned Perceptual Image Patch Similarity (LPIPS) to evaluate the similarity between the source and target domains and to guide the fusion of UDA and NFNet predictions. Categories with low domain similarity are predicted by NFNet; categories with high domain similarity are predicted by UDA.
3. **Data Augmentation Strategies**: To make full use of the small number of prompt images, the paper introduces two data augmentation strategies, the "Prompt Mixture Strategy" (PMS) and the "Alternate Mask Strategy" (AMS). These strategies help the network learn night-time features, reduce overfitting to the prompt images, and improve robustness to the differing content distributions at the image center and edges.

Experimental results on four night-time datasets (NightCity, NightCity+, Dark Zurich, and ACDC) show that PIG improves the parsing accuracy of UDA models, supporting the effectiveness of the method.
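The class-wise routing behind the pseudo-label fusion described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the hard `threshold`, and the rule "take NFNet's label wherever NFNet predicts a low-similarity class" are assumptions made for illustration, and the per-class distances are assumed to be precomputed (for example from LPIPS, where a larger distance means lower domain similarity). Plain Python lists are used to keep the example dependency-free; a real pipeline would more likely operate on tensors.

```python
from typing import List, Sequence

LabelMap = List[List[int]]
ScoreMap = Sequence[Sequence[Sequence[float]]]  # indexed as scores[class][row][col]


def argmax_labels(scores: ScoreMap) -> LabelMap:
    """Pick, for every pixel, the class with the highest score."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def fuse_pseudo_labels(
    uda_scores: ScoreMap,
    nfnet_scores: ScoreMap,
    class_distance: Sequence[float],
    threshold: float,
) -> LabelMap:
    """Fuse two predictions into one pseudo-label map (hypothetical rule).

    class_distance[c] is an assumed, precomputed day/night distance for
    class c; classes with distance above `threshold` count as low-similarity.
    """
    if len(uda_scores) != len(nfnet_scores) or len(uda_scores) != len(class_distance):
        raise ValueError("class dimensions of all inputs must match")

    uda_labels = argmax_labels(uda_scores)
    nfnet_labels = argmax_labels(nfnet_scores)
    low_similarity = [distance > threshold for distance in class_distance]

    # Trust NFNet where it predicts a low-similarity class; otherwise keep UDA.
    return [
        [n if low_similarity[n] else u for u, n in zip(uda_row, nfnet_row)]
        for uda_row, nfnet_row in zip(uda_labels, nfnet_labels)
    ]
```

With three classes and a distance vector such as `[0.1, 0.8, 0.2]` at `threshold=0.5`, only class 1 is routed to NFNet, so every pixel NFNet labels as class 1 overrides the UDA label, and all other pixels keep the UDA prediction.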