Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond

Dimitrios Kollias, Viktoriia Sharmanska, Stefanos Zafeiriou
2024-01-03
Abstract: Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space, or parameter transfer. To provide sufficient learning support, modern MTL uses annotated data with full, or sufficiently large overlap across tasks, i.e., each input sample is annotated for all, or most of the tasks. However, collecting such annotations is prohibitive in many real applications, and cannot benefit from datasets available for individual tasks. In this work, we challenge this setup and show that MTL can be successful with classification tasks with little, or non-overlapping annotations, or when there is big discrepancy in the size of labeled data per task. We explore task-relatedness for co-annotation and co-training, and propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching. To demonstrate the general applicability of our method, we conducted diverse case studies in the domains of affective computing, face recognition, species recognition, and shopping item classification using nine datasets. Our large-scale study of affective tasks for basic expression recognition and facial action unit detection illustrates that our approach is network agnostic and brings large performance improvements compared to the state-of-the-art in both tasks and across all studied databases. In all case studies, we show that co-training via task-relatedness is advantageous and prevents negative transfer (which occurs when MT model's performance is worse than that of at least one single-task model).
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses a key problem in multi-task learning (MTL): how to jointly learn multiple classification tasks when their labeled data overlap little or not at all. Specifically, it explores how to use task-relatedness for effective co-training when the amount of labeled data differs greatly between tasks, thereby avoiding negative transfer and improving the overall performance of the model.

### Main contributions of the paper

1. **A flexible framework**: The framework accommodates different classification tasks and models task-relatedness by encoding prior knowledge between tasks. The authors evaluate two task-relatedness strategies: one obtained from domain knowledge (for example, cognitive research) and one inferred from dataset annotations.
2. **A weakly-supervised learning method**: The method couples tasks with little or no overlapping labeled data through distribution matching and label co-annotation. The paper considers multiple application scenarios, divided into two case studies: affective computing and other application areas (face recognition, fine-grained species classification, footwear type recognition, and clothing category recognition).
3. **Extensive experimental research**: Experiments on 9 databases demonstrate that the proposed method is network agnostic, improves performance, and prevents negative transfer. According to the reported results, the method outperforms existing methods on all tasks and databases.

### Key technologies

- **Distribution Matching Loss (\(L_{DM}\))**: By matching the predicted distributions of the expression task and the action unit (AU) task, the model generates more consistent predictions during training.
  The specific formula is as follows:

  \[ L_{DM}=\mathbb{E}_x\left[\sum_{i = 1}^M\left[-p(y_i^{au}|x)\log q(y_i^{au}|x)\right]\right] \]

  where \(q(y_i^{au}|x)\) is a mixture distribution over the basic expression categories:

  \[ q(y_i^{au}|x)=\sum_{y^{exp}\in\{1,\ldots,7\}}p(y^{exp}|x)\,p(y_i^{au}|y^{exp}) \]

- **Soft Co-Annotation Loss (\(L_{SCA}\))**: AU labels are used to generate soft expression labels, which are then matched with the model's predictions. This provides additional training support on images whose labels overlap only partially or not at all.

  The specific formula is as follows:

  \[ L_{SCA}=\mathbb{E}_x\left[\sum_{y^{exp}\in\{1,\ldots,7\}}\left[-p(y^{exp}|x)\log q(y^{exp}|x)\right]\right] \]

  where the soft expression label \(q(y^{exp}|x)\) is calculated as follows:

  \[ q(y^{exp}|x)=\frac{e^{I(y^{exp}|x)}}{\sum_{y'^{exp}\in\{1,\ldots,7\}}e^{I(y'^{exp}|x)}} \]

  and the indication score \(I(y^{exp}|x)\) is calculated as follows: