Abstract: This dissertation presents four key contributions toward fairness and robustness in vision learning. First, to address the need for large-scale training data, the dissertation presents a novel Fairness Domain Adaptation approach derived from two major research findings: Bijective Maximum Likelihood and Fairness Adaptation Learning. Second, to enable open-world modeling in vision learning, the dissertation presents a novel Open-world Fairness Continual Learning Framework, which builds on two research lines: Fairness Continual Learning and Open-world Continual Learning. Third, since visual data are often captured from multiple camera views, robust vision learning methods should be able to model features that are invariant across views. To achieve this goal, the dissertation presents a novel Geometry-based Cross-view Adaptation framework that learns robust feature representations across views. Finally, with the recent growth of large-scale video and multimodal data, it is critical to understand the feature representations of large-scale visual foundation models and to improve their robustness. To this end, the dissertation presents novel Transformer-based approaches that improve the robustness of feature representations for multimodal and temporal data, followed by a novel Domain Generalization Approach that improves the robustness of visual foundation models. Theoretical analysis and experimental results show the effectiveness of the proposed approaches and their superior performance compared to prior studies. The contributions in this dissertation have advanced the fairness and robustness of machine vision learning.

What problem does this dissertation attempt to address?

This dissertation addresses several key problems in machine vision learning, with the goal of achieving fairer and more robust visual perception. Specifically, it focuses on four problems:

1. **Dependence on large-scale data**: Current visual learning methods usually rely on large-scale labeled data, and labeling is both expensive and time-consuming. To reduce this dependence, the dissertation proposes a new **Fairness Domain Adaptation** method that introduces **Bijective Maximum Likelihood** and a **Fairness Adaptation Learning Framework**.
2. **Unfair predictions**: Because training data are often imbalanced, current vision models can produce unfair predictions in practice, especially in applications involving humans. To address this, the dissertation proposes an **Open-world Fairness Continual Learning Framework**, which combines the research directions of **Fairness Continual Learning** and **Open-world Continual Learning** to improve model fairness.
3. **Cross-view feature modeling**: Visual data are often captured from multiple camera views, so robust methods must be able to model features that are invariant across views. The dissertation proposes a Geometry-based Cross-view Adaptation framework to learn robust feature representations across views.
4. **Understanding large-scale multimodal data**: With the growth of large-scale video and multimodal data, it is crucial to understand large-scale visual foundation models and to improve their robustness. The dissertation proposes new Transformer-based methods that use new self-attention mechanisms and learning objectives to improve the robustness of feature representations for multimodal and temporal data. It also proposes a new Domain Generalization Approach to enhance the robustness of visual foundation models.

Through these contributions, the dissertation aims to advance the fairness and robustness of machine vision learning in open-world environments, bringing it closer to human capabilities in visual perception tasks.

Towards Robust and Fair Vision Learning in Open-World Environments
