Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification

Zhaorui Tan, Xi Yang, Qiufeng Wang, Anh Nguyen, Kaizhu Huang
2024-10-16
Abstract: Vision models excel in image classification but struggle to generalize to unseen data, such as classifying images from unseen domains or discovering novel categories. In this paper, we explore the relationship between logical reasoning and deep learning generalization in visual classification. A logical regularization termed L-Reg is derived which bridges a logical analysis framework to image classification. Our work reveals that L-Reg reduces the complexity of the model in terms of the feature distribution and classifier weights. Specifically, we unveil the interpretability brought by L-Reg, as it enables the model to extract the salient features, such as faces to persons, for classification. Theoretical analysis and experiments demonstrate that L-Reg enhances generalization across various scenarios, including multi-domain generalization and generalized category discovery. In complex real-world scenarios where images span unknown classes and unseen domains, L-Reg consistently improves generalization, highlighting its practical efficacy.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores how logical reasoning relates to the generalization ability of deep learning models in image classification, and proposes a regularization method derived from logical reasoning, referred to as L-Reg. Specifically, it addresses two key questions:

1. **How is logical reasoning related to visual tasks (such as image classification)?**
   - The authors place the image classification process inside a logical analysis framework and ask how a classifier can learn, during training, a logical relationship between images and labels that generalizes well. In this framing, the semantics produced by the encoder are combined with the classifier to form atomic formulas.
2. **How can a regularization term be derived from logical reasoning to improve generalization?**
   - The authors introduce a sample-based logical regularization term, L-Reg, and show that it reduces model complexity. L-Reg simplifies the model by balancing the feature distribution and reducing extreme values in the classifier weights. This lets the model extract salient features (such as faces for the class "person") for classification, which in turn improves generalization.

### Main Contributions

- **Proposing L-Reg**: A new regularization method that improves the generalization of image classification models through logical reasoning.
- **Theoretical Analysis**: The effectiveness of L-Reg is supported by theoretical analysis, particularly for multi-domain generalization (mDG) and generalized category discovery (GCD) tasks.
- **Experimental Validation**: Experiments on multiple benchmark datasets validate L-Reg under different generalization settings: multi-domain generalization, generalized category discovery, and complex scenarios that combine both.
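As a rough illustration of how a regularization term enters a classification objective, the sketch below adds a weighted penalty to a cross-entropy loss. It is not the paper's L-Reg: this summary does not give the L-Reg formula, so a plain L2 penalty on the classifier weights is used as a stand-in, chosen only because it also discourages extreme weight values. The function names and the `reg_strength` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])


def l2_penalty(weights):
    """Stand-in regularizer (NOT L-Reg): sum of squared classifier weights."""
    return sum(w * w for row in weights for w in row)


def regularized_loss(features, label, weights, reg_strength=0.01):
    """Classification loss plus a weighted regularization term.

    `weights` holds one row of linear-classifier weights per class;
    `features` stands for the encoder output of a single sample.
    """
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return cross_entropy(logits, label) + reg_strength * l2_penalty(weights)
```

In this pattern the regularizer is simply summed with the task loss, so a term such as L-Reg could in principle take the place of `l2_penalty`, with `reg_strength` controlling the trade-off between fitting the labels and keeping the model simple.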
### Experimental Results

- **Multi-Domain Generalization (mDG)**: On PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet, L-Reg improved the performance of the GMDG baseline, with the largest gains on datasets where the baseline performed poorly.
- **Generalized Category Discovery (GCD)**: Applied to the PIM baseline, L-Reg improved the model's ability to recognize unknown categories, and in particular raised average performance across known and unknown categories.
- **Multi-Domain Generalization and Generalized Category Discovery (mDG + GCD)**: In complex scenarios involving both unknown categories and unseen domains, L-Reg continued to improve generalization, further supporting its practical value.

### Conclusion

By drawing on logical reasoning, L-Reg improves the performance of image classification models under different generalization settings, especially when they face unseen domains and unknown categories. The method improves not only the model's generalization ability but also its interpretability.