Abstract: Fine-tuning foundation models often compromises their robustness to distribution shifts. To remedy this, most robust fine-tuning methods aim to preserve the pre-trained features. However, not all pre-trained features are robust, and those methods are largely indifferent to which ones to preserve. We propose dual risk minimization (DRM), which combines empirical risk minimization with worst-case risk minimization, to better preserve the core features of downstream tasks. In particular, we utilize core-feature descriptions generated by LLMs to induce core-based zero-shot predictions, which then serve as proxies to estimate the worst-case risk. DRM balances two crucial aspects of model robustness, expected performance and worst-case performance, establishing a new state of the art on various real-world benchmarks. DRM significantly improves the out-of-distribution performance of CLIP ViT-L/14@336 on ImageNet (75.9 to 77.1), WILDS-iWildCam (47.1 to 51.8), and WILDS-FMoW (50.7 to 53.1), opening up new avenues for robust fine-tuning. Our code is available at https://github.com/vaynexie/DRM.

What problem does this paper attempt to address?

This paper addresses how to preserve a model's robustness to distribution shift when fine-tuning zero-shot models. Existing fine-tuning methods often improve performance on the target task at the cost of generalization to unseen data, so performance on out-of-distribution (OOD) data declines significantly. The authors propose Dual Risk Minimization (DRM), which combines Empirical Risk Minimization (ERM) with Worst-Case Risk Minimization (WRM) to better preserve the core features of downstream tasks and thereby improve robustness.

### Main Contributions

1. **Propose Dual Risk Minimization (DRM)**:
   - Combine ERM and WRM, and work around the infeasibility of WRM by using concept descriptions, improving downstream robustness when fine-tuning zero-shot models.
2. **Emphasize Two Aspects of Robustness**:
   - Point out that robustness involves not only expected (average) performance but also worst-case performance. Most existing works focus on only one of these aspects, while DRM provides a simple and effective way to balance both.
3. **Establish New Best Performances on Multiple Benchmarks**:
   - On multiple real-world benchmarks, DRM significantly outperforms the previous best methods. For example, with CLIP ViT-L/14@336, DRM improves OOD performance on ImageNet from 75.9% to 77.1%, on WILDS-iWildCam from 47.1% to 51.8%, and on WILDS-FMoW from 50.7% to 53.1%.

### Method Overview

- **Data Generation Model**:
  - The input variable \(X\) and target variable \(Y\) are generated by a core feature \(X_c\), a non-core feature \(X_n\), and exogenous noise \(\epsilon\).
- **Idealized Dual Risk Minimization (IDRM)**:
  - Optimize the model's expected performance over all possible domains \(D\), while ensuring that the worst-case risk does not exceed a threshold \(\alpha\).
- **Dual Risk Minimization (DRM)**:
  - Approximate the worst-case risk with a regularization term \(R_c^s(\theta)\), turning IDRM into a solvable optimization problem.
- **Fine-tune Using Zero-shot Models**:
  - Use default prompts for ERM and concept descriptions for WRM. The concept descriptions are generated by large language models (such as GPT-4) and capture the core visual features of each category.

### Experimental Results

- **Performance on Multiple Benchmarks**:
  - DRM performs well across benchmarks and, in particular, significantly outperforms the baseline methods in OOD performance.
  - Compared with FLYP, DRM achieves relative improvements of 5.0%, 12.4%, and 11.1% on ImageNet, iWildCam, and FMoW respectively with CLIP ViT-B/16.

### Summary

This paper addresses the decline in robustness that occurs when fine-tuning zero-shot models by proposing DRM, offering a new approach to improving model performance on OOD data.
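
To make the method overview more concrete, the following is a minimal sketch of how an ERM term and a worst-case-risk proxy might be combined into one objective. It is not the authors' implementation: the function names, the weight `lam`, the use of cross-entropy against soft targets as the regularizer, and the idea that `soft_targets` come from core-based zero-shot predictions of the pre-trained model are all assumptions made for illustration. The summary above only states that default prompts are used for ERM and concept descriptions for WRM.

```python
import numpy as np


def cross_entropy(logits, target_probs):
    """Mean cross-entropy between softmax(logits) and target distributions."""
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-(target_probs * log_probs).sum(axis=-1).mean())


def dual_risk(logits_default, logits_concept, labels, soft_targets, lam=1.0):
    """Hypothetical combined objective: ERM term + lam * worst-case proxy.

    logits_default: (n, k) logits computed with default class prompts.
    logits_concept: (n, k) logits computed with concept-description prompts.
    labels:         (n,) integer class labels for the ERM term.
    soft_targets:   (n, k) assumed proxy targets, e.g. zero-shot predictions
                    induced by core-feature descriptions (an assumption).
    lam:            assumed trade-off weight between the two risks.
    """
    num_classes = logits_default.shape[1]
    one_hot = np.eye(num_classes)[labels]
    empirical_risk = cross_entropy(logits_default, one_hot)
    worst_case_proxy = cross_entropy(logits_concept, soft_targets)
    return empirical_risk + lam * worst_case_proxy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 4, 3
    logits_a = rng.normal(size=(n, k))
    logits_b = rng.normal(size=(n, k))
    labels = rng.integers(0, k, size=n)
    targets = np.full((n, k), 1.0 / k)  # placeholder soft targets
    print(dual_risk(logits_a, logits_b, labels, targets, lam=0.5))
```

Setting `lam=0` recovers plain ERM in this sketch, while larger values put more weight on agreement with the concept-based targets. How the two terms are actually balanced in the paper is not specified in the summary above.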

Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models

Robust Fine-tuning of Zero-shot Models via Variance Reduction

Domain-Specific Risk Minimization for Out-of-Distribution Generalization

The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

Lipsum-FT: Robust Fine-Tuning of Zero-Shot Models Using Random Text Guidance

Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness

Efficient and Versatile Robust Fine-Tuning of Zero-shot Models

Domain-Specific Risk Minimization for Domain Generalization

Masked Images Are Counterfactual Samples for Robust Fine-tuning

Minimizing Embedding Distortion for Robust Out-of-Distribution Performance

Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study

Tuning Stable Rank Shrinkage: Aiming at the Overlooked Structural Risk in Fine-tuning

Confidence-aware Denoised Fine-tuning of Off-the-shelf Models for Certified Robustness

Robust Fine-Tuning of Vision-Language Models for Domain Generalization

Towards A Unified Min-Max Framework for Adversarial Exploration and Robustness

Robust Sparse Regularization: Simultaneously Optimizing Neural Network Robustness and Compactness

AugLoss: A Robust Augmentation-based Fine Tuning Methodology

TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization

Toward Adversarial Robustness via Semi-supervised Robust Training

Towards robust neural networks via a global and monotonically decreasing robustness training strategy