Continual Learning Meets Multimodal Foundation Models: Fundamentals and Advances

Wenbin Li, Fan Qi, Rui Yan, Hongguang Zhang, Wang Lei, Jinhui Tang, Jiebo Luo
DOI: https://doi.org/10.1145/3688859.3690083
2024-01-01
Abstract: Multimedia applications change dynamically, so incorporating new knowledge into existing models to adapt to new problems is a fundamental challenge in computer vision. With the advancement of multimodal foundation models, there is growing interest in using continual learning to enhance their generalization abilities, enabling them to process diverse data types, from text to visuals, and to continuously update their capabilities based on real-time inputs. This technology improves the robustness and functionality of models when they handle new multimedia content and modalities. Consequently, continual learning, which continuously refines multimodal foundation models through fine-tuning, has emerged as a pivotal paradigm in machine learning. The widespread and growing interest in this direction reflects both its relevance and its complexity. Our workshop aims to provide a venue where academic researchers and industry practitioners can come together to discuss the principles, limitations, and applications of multimodal foundation models in continual learning for multimedia applications, and to promote understanding of multimodal foundation models in continual learning, innovative algorithms, and research on new multimodal technologies and applications.