Abstract: Night-time scene parsing aims to extract pixel-level semantic information from night images, aiding downstream tasks in understanding scene object distribution. Due to limited labeled night image datasets, unsupervised domain adaptation (UDA) has become the predominant method for studying night scenes. UDA typically relies on paired day-night images to guide adaptation, but this requirement hampers dataset construction and restricts generalization across night scenes in different datasets. Moreover, UDA, which focuses on network architecture and training strategies, has difficulty handling classes with few domain similarities. In this paper, we leverage Prompt Images Guidance (PIG) to enhance UDA with supplementary night knowledge. We propose a Night-Focused Network (NFNet) to learn night-specific features from both target domain images and prompt images. To generate high-quality pseudo-labels, we propose Pseudo-label Fusion via Domain Similarity Guidance (FDSG). Classes with fewer domain similarities are predicted by NFNet, which excels in parsing night features, while classes with more domain similarities are predicted by UDA, which has rich labeled semantics. Additionally, we propose two data augmentation strategies, the Prompt Mixture Strategy (PMS) and the Alternate Mask Strategy (AMS), aimed at mitigating the overfitting of NFNet to a few prompt images. We conduct extensive experiments on four night-time datasets: NightCity, NightCity+, Dark Zurich, and ACDC. The results indicate that utilizing PIG can enhance the parsing accuracy of UDA.

What problem does this paper attempt to address?

The paper addresses challenges in Night-time Scene Parsing (NTSP). Specifically, it aims to tackle the following key issues:

1. **Dataset Limitation**: Existing night-time image datasets are few in number and costly to annotate, which restricts the training and performance of night-time scene parsing models.
2. **Limitations of Unsupervised Domain Adaptation (UDA) Methods**: Current mainstream night-time scene parsing methods rely on day-time and night-time image pairs to guide model training. This approach not only increases the cost of dataset construction but also generalizes weakly across different datasets. Additionally, existing UDA methods struggle to handle categories with low domain similarity.
3. **Night-time Feature Extraction**: Due to the unique lighting conditions of night-time scenes, knowledge learned directly from day-time images may not transfer well to night-time image parsing.

To address these issues, the paper proposes the "Prompt Images Guidance" (PIG) method, with the following core contributions:

1. **Night-Focused Network (NFNet)**: A network specifically designed for night-time image parsing. It focuses on learning night-specific features by incorporating a small number of annotated night-time "prompt images." NFNet is co-trained with traditional UDA methods, but it receives only night-time images as input to avoid interference from day-time images, thereby better learning night-time features.
2. **Pseudo-label Fusion via Domain Similarity Guidance (FDSG)**: To generate high-quality pseudo-labels, the authors propose a fusion mechanism that uses Learned Perceptual Image Patch Similarity (LPIPS) to evaluate domain similarity between the source and target domains, guiding the fusion of UDA and NFNet predictions. For categories with low domain similarity, predictions are made by NFNet; for categories with high similarity, predictions are made by UDA. This improves overall parsing accuracy.
3. **Data Augmentation Strategies**: To fully utilize the information in the small number of prompt images, the paper introduces two data augmentation strategies: the "Prompt Mixture Strategy" (PMS) and the "Alternate Mask Strategy" (AMS). These strategies help the network better learn night-time features, reduce overfitting, and improve robustness to the distribution of image center and edges.

Experimental results show that using the PIG method significantly improves the parsing accuracy of UDA models on multiple night-time datasets, demonstrating the effectiveness and practicality of the method.
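The class-wise fusion idea behind FDSG can be illustrated with a small sketch. This is not the authors' implementation: the function name `fuse_pseudo_labels`, the per-class similarity vector, the fixed `threshold`, and the channel-selection rule are all assumptions made for illustration. In particular, how per-class similarity is derived from LPIPS is not specified in the summary, so the sketch simply takes a hypothetical score per class in which a higher value means the class looks more alike across the day and night domains.

```python
import numpy as np

def fuse_pseudo_labels(uda_probs, nf_probs, class_similarity, threshold=0.5):
    """Fuse two segmentation predictions class by class (illustrative only).

    uda_probs, nf_probs: arrays of shape (C, H, W) with per-class scores
        from the UDA model and from the night-focused model.
    class_similarity: array of shape (C,); hypothetical per-class domain
        similarity, higher = more similar between source and target.
    threshold: hypothetical cut-off separating "low" from "high" similarity.
    Returns an (H, W) array of fused pseudo-label class indices.
    """
    uda_probs = np.asarray(uda_probs, dtype=np.float64)
    nf_probs = np.asarray(nf_probs, dtype=np.float64)
    similarity = np.asarray(class_similarity, dtype=np.float64)

    # Classes with low domain similarity take their score map from the
    # night-focused model; the remaining classes keep the UDA score map.
    use_night_model = similarity < threshold            # shape (C,)
    fused = np.where(use_night_model[:, None, None], nf_probs, uda_probs)

    # The pseudo-label at each pixel is the class with the highest fused score.
    return fused.argmax(axis=0)

# Toy example: 2 classes, a 1x2 image. Class 1 has low domain similarity.
uda = [[[0.9, 0.6]], [[0.1, 0.4]]]
night = [[[0.2, 0.7]], [[0.95, 0.3]]]
labels = fuse_pseudo_labels(uda, night, class_similarity=[0.9, 0.1])
print(labels)
```

In the toy example, the UDA scores alone would assign class 0 to both pixels; after fusion, the first pixel switches to class 1 because that class's score map is taken from the night-focused model. A real system would likely need to calibrate the two score maps before comparing them, which this sketch does not attempt.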

PIG: Prompt Images Guidance for Night-Time Scene Parsing
