Abstract: Black-box unsupervised domain adaptation (UDA) learns with source predictions of target data without accessing either source data or source models during training, and it has clear superiority in data privacy and flexibility in target network selection. However, the source predictions of target data are often noisy and training with them is prone to learning collapses. We propose BiMem, a bi-directional memorization mechanism that learns to remember useful and representative information to correct noisy pseudo labels on the fly, leading to robust black-box UDA that can generalize across different visual recognition tasks. BiMem constructs three types of memory, including sensory memory, short-term memory, and long-term memory, which interact in a bi-directional manner for comprehensive and robust memorization of learnt features. It includes a forward memorization flow that identifies and stores useful features and a backward calibration flow that rectifies features' pseudo labels progressively. Extensive experiments show that BiMem achieves superior domain adaptation performance consistently across various visual recognition tasks such as image classification, semantic segmentation and object detection.

What problem does this paper attempt to address?

The paper addresses training collapse in black-box unsupervised domain adaptation (UDA), which is caused by noise in the pseudo-labels of the target data. Existing black-box UDA methods train on the source model's initial predictions for the target data, and these predictions often contain errors. The result is a "forgetting" phenomenon during self-training: the model learns useful target-domain information early on, but as training progresses the accumulated pseudo-label noise gradually degrades its performance, which can even fall below that of a model trained only on source-domain data.

To overcome this, the paper proposes BiMem, a bi-directional memory mechanism that corrects pseudo-label noise by constructing and calibrating three types of memory (sensory memory, short-term memory, and long-term memory), aiming at more stable and effective black-box UDA.

### Main Contributions

1. **General Framework**: The authors design BiMem, a general black-box UDA framework applicable to different visual recognition tasks. To the best of the authors' knowledge, this is the first work to explore and benchmark black-box UDA across different visual recognition tasks.
2. **Memory Mechanism**: Three types of memory interact in a bi-directional manner. This reduces the "forgetting" of useful and representative features and improves the accuracy of the target data's pseudo-labels, which in turn yields better adaptation in black-box UDA.
3. **Experimental Verification**: Extensive experiments on multiple benchmark datasets show that BiMem achieves superior performance on image classification, semantic segmentation, and object detection.

### Method Overview

The core idea of BiMem is to counter the "forgetting" problem in black-box UDA by constructing and calibrating three types of memory:

- **Sensory Memory**: Buffers the features of the current batch to capture fresh knowledge.
- **Short-Term Memory**: Actively selects and stores difficult samples from the sensory memory, which usually have high classification uncertainty.
- **Long-Term Memory**: Stores global and representative information by class-wise compression and accumulation of all features removed from the sensory memory and short-term memory.

### Memory Update and Calibration

- **Forward Memorization Flow**: Updates the sensory memory, short-term memory, and long-term memory so that both fresh and representative information is captured.
- **Backward Calibration Flow**: Calibrates the short-term memory using the long-term memory, then jointly calibrates the sensory memory using the calibrated short-term memory and the long-term memory, progressively correcting the pseudo-labels of the features.

### Experimental Results

The paper reports experiments on multiple visual tasks: semantic segmentation (GTA5 → Cityscapes and SYNTHIA → Cityscapes), object detection (Cityscapes → Foggy Cityscapes and SYNTHIA → Cityscapes), and image classification (Office-Home and Office-31). According to the reported results, BiMem significantly outperforms existing black-box UDA methods on all tasks, with the largest gains on semantic segmentation and object detection.

### Conclusion

BiMem mitigates the "forgetting" problem in black-box UDA by constructing and calibrating three types of memory, improving the adaptability and robustness of the model across different visual tasks. It offers a new approach to black-box UDA that may be useful when only the predictions of a source model are available.
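
As a rough illustration of the three memories and the two flows described in the method overview, the following Python sketch may help. It is not the authors' implementation. The class name `ThreeMemories`, the entropy threshold used to pick "difficult" samples, the running-mean prototype used as class-wise compression, the first-in-first-out eviction from short-term memory, and the nearest-reference relabeling rule are all assumptions made for illustration; the summary above does not specify any of them.

```python
import math
from collections import deque


def entropy(probs):
    """Shannon entropy of a probability vector; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ThreeMemories:
    """Hypothetical toy version of sensory / short-term / long-term memory."""

    def __init__(self, short_capacity=4, entropy_threshold=0.5):
        self.sensory = []           # entries for the current batch
        self.short_term = deque()   # hard (high-entropy) entries
        self.long_term = {}         # class -> (running-mean feature, count)
        self.short_capacity = short_capacity        # assumed hyperparameter
        self.entropy_threshold = entropy_threshold  # assumed hyperparameter

    def _accumulate(self, entry):
        """Compress an entry into the running mean of its current class."""
        feat, label = entry["feature"], entry["label"]
        mean, n = self.long_term.get(label, ([0.0] * len(feat), 0))
        new_mean = [(m * n + f) / (n + 1) for m, f in zip(mean, feat)]
        self.long_term[label] = (new_mean, n + 1)

    def forward(self, batch):
        """Forward flow. `batch` is a list of (feature, source_probs) pairs."""
        # Entries leaving sensory memory: uncertain ones go to short-term
        # memory, the rest are compressed into long-term memory.
        for entry in self.sensory:
            if entropy(entry["probs"]) > self.entropy_threshold:
                self.short_term.append(entry)
            else:
                self._accumulate(entry)
        # Entries evicted from short-term memory also reach long-term memory.
        while len(self.short_term) > self.short_capacity:
            self._accumulate(self.short_term.popleft())
        # Sensory memory buffers the new batch with its initial pseudo labels
        # (argmax of the noisy source predictions).
        self.sensory = [
            {"feature": f, "probs": p,
             "label": max(range(len(p)), key=p.__getitem__)}
            for f, p in batch
        ]

    def backward(self):
        """Backward flow: relabel short-term, then sensory, entries."""
        if not self.long_term:
            return  # nothing to calibrate against yet
        refs = [(mean, c) for c, (mean, _) in self.long_term.items()]
        # Short-term memory is calibrated by the long-term prototypes.
        for entry in self.short_term:
            entry["label"] = min(
                refs, key=lambda r: sq_dist(entry["feature"], r[0]))[1]
        # Sensory memory is calibrated jointly by the long-term prototypes
        # and the already-calibrated short-term entries.
        refs = refs + [(e["feature"], e["label"]) for e in self.short_term]
        for entry in self.sensory:
            entry["label"] = min(
                refs, key=lambda r: sq_dist(entry["feature"], r[0]))[1]
```

In this toy setup, calling `forward` on each batch and then `backward` lets a sample whose source prediction points at the wrong class be relabeled once enough confident features of the right class have accumulated in long-term memory. A real system would operate on network features and would feed the calibrated labels back into the training loss, which this sketch leaves out.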

Black-box Unsupervised Domain Adaptation with Bi-directional Atkinson-Shiffrin Memory

A New Bidirectional Unsupervised Domain Adaptation Segmentation Framework

Unsupervised Domain Adaptation With Class-Aware Memory Alignment

Memory Bank for Unsupervised Domain Adaptation Person Retrieval

Unsupervised domain adaptation semantic segmentation of high-resolution remote sensing imagery with invariant domain-level prototype memory

Memory-Assisted Sub-Prototype Mining for Universal Domain Adaptation

CMFT: Contrastive Memory Feature Transfer for Non-shared-and-Imbalanced Unsupervised Domain Adaption

Unsupervised Domain Adaptation for Neuron Membrane Segmentation based on Structural Features

Unsupervised Domain Adaptation for Brain Structure Segmentation Via Mutual Information Maximization Alignment

Unsupervised Domain Adaptation via Style-Aware Self-intermediate Domain

Memorizing Comprehensively to Learn Adaptively: Unsupervised Cross-Domain Person Re-ID with Multi-level Memory

Adversarial Unsupervised Domain Adaptation for 3D Semantic Segmentation with 2D Image Fusion of Dense Depth

Adaptive Memorization with Group Labels for Unsupervised Person Re-identification

Cross-modal Unsupervised Domain Adaptation for 3D Semantic Segmentation via Bidirectional Fusion-then-Distillation

FixBi: Bridging Domain Spaces for Unsupervised Domain Adaptation

Approximate & memorize: Settling opposing views in replay-based continuous unsupervised domain adaptation

Reducing Bi-Level Feature Redundancy for Unsupervised Domain Adaptation

Unsupervised Domain Adaptation with Joint Domain-Adversarial Reconstruction Networks

Co-MDA: Federated Multisource Domain Adaptation on Black-Box Models

Brain-Inspired Domain-Incremental Adaptive Detection for Autonomous Driving

Unsupervised Scene Adaptation with Memory Regularization in Vivo.