SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

Junsu Kim, Hoseong Cho, Jihyeon Kim, Yihalem Yimolal Tiruneh, Seungryul Baek
2024-05-07
Abstract: In the field of class incremental learning (CIL), generative replay has become increasingly prominent as a method to mitigate catastrophic forgetting, alongside the continuous improvements in generative models. However, its application in class incremental object detection (CIOD) has been significantly limited, primarily due to the complexities of scenes involving multiple labels. In this paper, we propose a novel approach called stable diffusion deep generative replay (SDDGR) for CIOD. Our method utilizes a diffusion-based generative model with pre-trained text-to-diffusion networks to generate realistic and diverse synthetic images. SDDGR incorporates an iterative refinement strategy to produce high-quality images encompassing old classes. Additionally, we adopt an L2 knowledge distillation technique to improve the retention of prior knowledge in synthetic images. Furthermore, our approach includes pseudo-labeling for old objects within new task images, preventing misclassification as background elements. Extensive experiments on the COCO 2017 dataset demonstrate that SDDGR significantly outperforms existing algorithms, achieving a new state-of-the-art in various CIOD scenarios. The source code will be made available to the public.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses class-incremental object detection (CIOD), a specific setting within class-incremental learning (CIL). In CIL, a model must keep learning new categories without forgetting old ones; object detection makes this harder because a single scene can contain objects with multiple labels. To counter catastrophic forgetting in CIOD, the paper proposes Stable Diffusion Deep Generative Replay (SDDGR). SDDGR uses a pre-trained text-to-image diffusion model to generate realistic and diverse synthetic images containing objects from old categories, and an iterative refinement strategy to keep those images high quality. It applies an L2 knowledge distillation technique to improve the retention of old knowledge on the synthetic images. It also pseudo-labels old-category objects that appear in new-task images so that they are not misclassified as background. In experiments on the COCO 2017 dataset, SDDGR significantly outperforms existing algorithms and achieves new state-of-the-art performance in various CIOD scenarios. Whereas traditional generative replay relies on models such as GANs and VAEs, SDDGR relies on a diffusion model, which performs better in multi-label settings such as CIOD. Together, these components alleviate catastrophic forgetting without access to real old data, improving the model's continual learning ability on scenes with multiple labeled objects.
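Two of the components described above, pseudo-labeling and L2 distillation, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` type, both function names, and the `score_threshold` value of 0.5 are hypothetical, and the summary does not say which model outputs the L2 term compares, so the sketch simply assumes two flat lists of floats of equal length.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one box: (x1, y1, x2, y2), class id, confidence."""
    box: Tuple[float, float, float, float]
    label: int
    score: float


def pseudo_label_old_objects(
    new_task_annotations: Iterable[Detection],
    old_model_detections: Iterable[Detection],
    old_class_ids: Set[int],
    score_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[Detection]:
    """Append confident old-class detections to the new-task ground truth.

    Without this step, old-class objects visible in new-task images would
    carry no label and would be trained as background.
    """
    kept = [
        d for d in old_model_detections
        if d.label in old_class_ids and d.score >= score_threshold
    ]
    return list(new_task_annotations) + kept


def l2_distillation_loss(
    student_outputs: Sequence[float],
    teacher_outputs: Sequence[float],
) -> float:
    """Mean squared difference between student (new) and teacher (old) outputs."""
    if len(student_outputs) != len(teacher_outputs):
        raise ValueError("student and teacher outputs must have the same length")
    if not student_outputs:
        return 0.0
    squared = [(s - t) ** 2 for s, t in zip(student_outputs, teacher_outputs)]
    return sum(squared) / len(squared)
```

In use, `pseudo_label_old_objects` would take the annotations of a new-task image plus the previous model's detections on that image, and `l2_distillation_loss` would be evaluated on synthetic images by running them through both the previous and the current model. How the loss is weighted against the detection loss is left open here because the text above does not specify it.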