Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models

Kaican Li, Weiyan Xie, Yongxiang Huang, Didan Deng, Lanqing Hong, Zhenguo Li, Ricardo Silva, Nevin L. Zhang
2024-11-29
Abstract: Fine-tuning foundation models often compromises their robustness to distribution shifts. To remedy this, most robust fine-tuning methods aim to preserve the pre-trained features. However, not all pre-trained features are robust and those methods are largely indifferent to which ones to preserve. We propose dual risk minimization (DRM), which combines empirical risk minimization with worst-case risk minimization, to better preserve the core features of downstream tasks. In particular, we utilize core-feature descriptions generated by LLMs to induce core-based zero-shot predictions which then serve as proxies to estimate the worst-case risk. DRM balances two crucial aspects of model robustness: expected performance and worst-case performance, establishing a new state of the art on various real-world benchmarks. DRM significantly improves the out-of-distribution performance of CLIP ViT-L/14@336 on ImageNet (75.9 to 77.1), WILDS-iWildCam (47.1 to 51.8), and WILDS-FMoW (50.7 to 53.1); opening up new avenues for robust fine-tuning. Our code is available at https://github.com/vaynexie/DRM.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to preserve a model's robustness to distribution shifts when fine-tuning zero-shot models. Existing fine-tuning methods often improve performance on the target task at the cost of generalization to unseen data, so the model's performance on out-of-distribution (OOD) data drops significantly. To address this, the authors propose Dual Risk Minimization (DRM), which combines Empirical Risk Minimization (ERM) with Worst-Case Risk Minimization (WRM) to better preserve the core features of downstream tasks and thus improve robustness.

### Main Contributions

1. **Propose Dual Risk Minimization (DRM)**:
   - Combine ERM and WRM, and work around the infeasibility of WRM by using concept descriptions, thereby improving robustness on downstream tasks when fine-tuning zero-shot models.
2. **Emphasize Two Aspects of Robustness**:
   - Point out that robustness involves both expected (average) performance and worst-case performance. Most existing work focuses on only one of the two, while DRM provides a simple and effective way to balance them.
3. **Establish New Best Performances on Multiple Benchmarks**:
   - On multiple real-world benchmarks, DRM significantly outperforms the previous best methods. For example, with the CLIP ViT-L/14@336 model, DRM improves OOD performance on ImageNet from 75.9% to 77.1%, on WILDS-iWildCam from 47.1% to 51.8%, and on WILDS-FMoW from 50.7% to 53.1%.

### Method Overview

- **Data Generation Model**:
  - The input variable \(X\) and target variable \(Y\) are generated by the core feature \(X_c\), the non-core feature \(X_n\), and exogenous noise \(\epsilon\).
- **Idealized Dual Risk Minimization (IDRM)**:
  - Optimize the expected performance of the model over all possible domains \(D\), while ensuring that the worst-case risk does not exceed a threshold \(\alpha\).
- **Dual Risk Minimization (DRM)**:
  - Approximate the worst-case risk by introducing a regularization term \(R_c^s(\theta)\), thereby turning IDRM into a solvable optimization problem.
- **Fine-tune Using Zero-shot Models**:
  - Use default prompts for ERM and concept descriptions for WRM. The concept descriptions are generated by large language models (such as GPT-4) and capture the core visual features of each category.

### Experimental Results

- **Performance on Multiple Benchmarks**:
  - DRM performs well on multiple benchmarks and, in particular, significantly outperforms the baseline methods in OOD performance.
  - Compared with FLYP, DRM achieves relative improvements of 5.0%, 12.4%, and 11.1% on ImageNet, iWildCam, and FMoW respectively with the CLIP ViT-B/16 model.

### Summary

The paper tackles the loss of robustness that occurs when fine-tuning zero-shot models. Its proposed method, DRM, balances expected and worst-case performance and improves OOD results on the reported benchmarks.
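The way DRM pairs an ERM term with a worst-case regularizer can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that the worst-case term is a cross-entropy between the model's predictions under concept-description prompts and soft proxy labels, and that the two terms are combined with a weight `lam`. The function name `dual_risk_loss`, the argument names, and the random stand-in logits are all hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def log_softmax(logits, axis=-1):
    # Numerically stable log-softmax.
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def dual_risk_loss(default_logits, concept_logits, labels, proxy_probs, lam=1.0):
    """Hypothetical DRM-style objective: ERM term + lam * worst-case proxy term.

    default_logits: (n, k) logits obtained with default prompts (ERM branch).
    concept_logits: (n, k) logits obtained with concept-description prompts.
    labels:         (n,) integer ground-truth labels.
    proxy_probs:    (n, k) soft labels standing in for core-based zero-shot
                    predictions (assumed here, not taken from the paper's code).
    """
    n = labels.shape[0]
    # ERM: standard cross-entropy against the hard labels.
    erm = -log_softmax(default_logits)[np.arange(n), labels].mean()
    # Worst-case proxy: cross-entropy against the soft proxy labels.
    wrm = -(proxy_probs * log_softmax(concept_logits)).sum(axis=-1).mean()
    return erm + lam * wrm, erm, wrm

# Toy usage with random stand-ins for real model outputs.
rng = np.random.default_rng(0)
n, k = 4, 3
labels = np.array([0, 2, 1, 0])
default_logits = rng.normal(size=(n, k))
concept_logits = rng.normal(size=(n, k))
proxy_probs = softmax(rng.normal(size=(n, k)))

total, erm, wrm = dual_risk_loss(default_logits, concept_logits, labels,
                                 proxy_probs, lam=0.5)
print(f"ERM term: {erm:.4f}, worst-case proxy term: {wrm:.4f}, total: {total:.4f}")
```

Setting `lam=0` recovers plain ERM fine-tuning, and larger values put more weight on agreeing with the core-based proxy predictions. How the weight is chosen in practice is not covered by this summary.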