CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection

Lin Zhu, Yifeng Yang, Qinying Gu, Xinbing Wang, Chenghu Zhou, Nanyang Ye
2024-01-01
Abstract: Recent vision-language pre-trained models (VL-PTMs) have shown remarkable success in open-vocabulary tasks. However, downstream use cases often involve further fine-tuning of VL-PTMs, which may distort their general knowledge and impair their ability to handle distribution shifts. In real-world scenarios, machine learning systems inevitably encounter both covariate shifts (e.g., changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of enhancing out-of-distribution (OOD) generalization on covariate shifts while simultaneously detecting semantic-shifted unseen classes. A critical but underexplored question therefore arises: how can we improve VL-PTMs' generalization to closed-set OOD data while effectively detecting open-set unseen classes during fine-tuning? In this paper, we propose a novel OOD detection objective that also improves OOD generalization. We show that minimizing the gradient magnitude of energy scores on training data leads to domain-consistent Hessians of the classification loss, which our theoretical analysis identifies as a strong indicator of OOD generalization. Based on this finding, we develop a unified fine-tuning framework that allows concurrent optimization of both tasks. Extensive experiments demonstrate the superiority of our method. The code is available at https://github.com/LinLLLL/CRoFT.
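The central idea of the abstract, adding a penalty on the gradient magnitude of the energy score on training data to the usual classification loss, can be illustrated with a minimal NumPy sketch. It assumes a linear classification head over fixed features `z` (standing in for image features scored against class text embeddings `W`), the standard energy score E(z) = -T log Σ_c exp(w_c · z / T), and a gradient taken with respect to `z`. These choices, the weight `lam`, and all function names are illustrative assumptions, not the CRoFT implementation; the linked repository holds the authors' code.

```python
import numpy as np


def logsumexp(x, axis=-1):
    # Numerically stable log-sum-exp along the class axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def softmax(x, axis=-1):
    # Numerically stable softmax along the class axis.
    m = np.max(x, axis=axis, keepdims=True)
    e = np.exp(x - m)
    return e / np.sum(e, axis=axis, keepdims=True)


def energy(z, W, T=1.0):
    """Energy score E(z) = -T * logsumexp(W z / T), one value per row of z.

    z: (batch, dim) features; W: (classes, dim) assumed linear head.
    """
    logits = z @ W.T  # (batch, classes)
    return -T * logsumexp(logits / T, axis=-1)


def energy_grad(z, W, T=1.0):
    """Closed-form gradient dE/dz = -W^T softmax(W z / T), per sample."""
    p = softmax(z @ W.T / T, axis=-1)  # (batch, classes)
    return -p @ W  # (batch, dim)


def cross_entropy(z, W, y):
    # Mean classification loss: logsumexp(logits) - logit of the true class.
    logits = z @ W.T
    return np.mean(logsumexp(logits, axis=-1) - logits[np.arange(len(y)), y])


def objective(z, W, y, lam=0.1, T=1.0):
    # Hypothetical combined loss: cross-entropy plus lam times the mean
    # squared L2 norm of the energy-score gradient on the training batch.
    g = energy_grad(z, W, T)
    penalty = np.mean(np.sum(g ** 2, axis=-1))
    return cross_entropy(z, W, y) + lam * penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 4))  # 3 classes, 4-dim features (toy sizes)
    z = rng.normal(size=(2, 4))  # a batch of 2 training features
    y = np.array([0, 2])
    print("energy:", energy(z, W))
    print("loss:", objective(z, W, y, lam=0.1))
```

The gradient is written in closed form so the sketch needs only NumPy; in an actual fine-tuning setup one would compute the same penalty with an autodiff framework and backpropagate it together with the classification loss.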