Feature Partitioning for Efficient Multi-Task Architectures

Alejandro Newell, Lu Jiang, Chong Wang, Li-Jia Li, Jia Deng
DOI: https://doi.org/10.48550/arXiv.1908.04339
2019-08-13
Abstract: Multi-task learning holds the promise of less data, parameters, and time than training of separate models. We propose a method to automatically search over multi-task architectures while taking resource constraints into consideration. We propose a search space that compactly represents different parameter sharing strategies. This provides more effective coverage and sampling of the space of multi-task architectures. We also present a method for quick evaluation of different architectures by using feature distillation. Together these contributions allow us to quickly optimize for efficient multi-task models. We benchmark on Visual Decathlon, demonstrating that we can automatically search for and identify multi-task architectures that effectively make trade-offs between task resource requirements while achieving a high level of final performance.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to automatically search for efficient multi-task architectures while taking resource constraints (such as compute and parameter count) into account. Specifically, the authors seek a method that shares parameters and operations effectively across tasks, reducing the average node usage per task while improving final performance. They also aim to evaluate candidate architectures quickly through feature distillation, which accelerates the search.

### Problem Background
Multi-task learning lets multiple tasks share parts of the same model. This can improve generalization and also reduce the amount of data, the number of training iterations, and the total number of parameters required. However, finding a multi-task architecture that maximizes performance under resource constraints is difficult. Different tasks place different demands on the model, so resources must be allocated and shared effectively.

### Core Contributions of the Paper
1. **Parametric Representation**: The authors propose a parameterization that compactly represents different parameter-sharing strategies in multi-task architectures. It reduces redundancy in the search space, so the space of multi-task architectures can be explored and sampled more effectively.
2. **Feature Distillation Evaluation**: To accelerate the architecture search, the authors use feature distillation, which makes it possible to evaluate candidate architectures quickly without fully training each model. Distillation provides a direct training signal by comparing the activations of shared layers with those of single-task layers.
3. **Experimental Verification**: Experiments on the Visual Decathlon benchmark show that the proposed search strategy can identify efficient multi-task architectures that make resource trade-offs between tasks.

### Specific Methods
- **Feature Partitioning**: The authors achieve fine-grained resource sharing by partitioning the feature channels within each layer. Each task can adjust the number of channels it uses according to its requirements. Binary masks control which channels are used in the forward pass and which are updated in the backward pass.
- **Partitioning Parameterization**: To search over partitioning strategies efficiently, the authors define a matrix \(P\). Its diagonal elements give the proportion of feature channels used by each task, and its off-diagonal elements give the degree of sharing between pairs of tasks. This turns a complex discrete search problem into an optimization problem over a continuous space.
- **Optimization Strategy**: The authors optimize \(P\) with two methods: random sampling and an evolutionary strategy. The evolutionary strategy guides the search with an approximate gradient and favors configurations in which tasks use fewer channels.
- **Sample Evaluation**: Feature distillation lets the authors evaluate each candidate architecture in a short time without fully training it. Distillation minimizes the mean-squared-error (MSE) loss between the output features of the shared layers and those of the single-task layers.

### Summary
The main goal of this paper is to find, through automatic search and rapid evaluation, a multi-task architecture that maximizes performance under resource constraints.
The proposed method not only improves search efficiency but also offers a new perspective on multi-task learning: managing resource sharing between tasks more effectively through fine-grained feature partitioning.
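The channel masking described under Feature Partitioning can be illustrated with a minimal sketch. It assumes that a layer is represented as a plain list of per-channel activations and per-channel weights. The function names `masked_forward` and `masked_update` are hypothetical and do not come from the paper's code. A real implementation would apply the masks to convolutional feature maps and their gradients inside a deep-learning framework.

```python
from typing import List


def masked_forward(features: List[float], mask: List[int]) -> List[float]:
    """Forward pass: zero the channels that this task does not use.

    Hypothetical helper; mask[c] is 1 if the task uses channel c, else 0.
    """
    return [f * m for f, m in zip(features, mask)]


def masked_update(weights: List[float], grads: List[float],
                  mask: List[int], lr: float) -> List[float]:
    """Backward pass: apply a gradient step only to channels the task uses.

    Channels outside the mask keep their current weights unchanged.
    """
    return [w - lr * g * m for w, g, m in zip(weights, grads, mask)]


if __name__ == "__main__":
    # Two tasks share a 4-channel layer; channels 1 and 2 are used by both.
    mask_a = [1, 1, 1, 0]
    mask_b = [0, 1, 1, 1]
    features = [1.0, 2.0, 3.0, 4.0]
    print(masked_forward(features, mask_a))  # [1.0, 2.0, 3.0, 0.0]
    print(masked_forward(features, mask_b))  # [0.0, 2.0, 3.0, 4.0]
    weights = [1.0, 1.0, 1.0, 1.0]
    grads = [2.0, 2.0, 2.0, 2.0]
    print(masked_update(weights, grads, mask_a, lr=0.25))  # [0.5, 0.5, 0.5, 1.0]
```

In this sketch, a task that trains through `mask_a` never modifies channel 3, which belongs only to the other task. The shared channels 1 and 2 receive updates from both tasks.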
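The matrix \(P\) from Partitioning Parameterization can be made concrete by computing it from a set of binary channel masks. This sketch assumes that \(P_{ii}\) is the fraction of a layer's \(C\) channels used by task \(i\). It also assumes that \(P_{ij}\) (for \(i \neq j\)) is the fraction of the \(C\) channels used by both tasks \(i\) and \(j\); the normalization in the paper may differ. The inverse helper `masks_for_two_tasks` is hypothetical and handles only the two-task case, placing shared channels first and then each task's private channels. It is not the paper's sampling procedure.

```python
from typing import List

Matrix = List[List[float]]


def partition_matrix(masks: List[List[int]]) -> Matrix:
    """Compute P from N binary masks over the same C channels.

    P[i][i]: fraction of channels that task i uses.
    P[i][j]: fraction of channels used by both task i and task j.
    """
    n, c = len(masks), len(masks[0])
    return [[sum(a & b for a, b in zip(masks[i], masks[j])) / c
             for j in range(n)] for i in range(n)]


def masks_for_two_tasks(p: Matrix, c: int) -> List[List[int]]:
    """Hypothetical inverse for two tasks: shared channels first,
    then channels private to task 0, then channels private to task 1."""
    shared = round(p[0][1] * c)
    only0 = round(p[0][0] * c) - shared
    only1 = round(p[1][1] * c) - shared
    unused = c - shared - only0 - only1
    if min(only0, only1, unused) < 0:
        raise ValueError("P cannot be realized with c channels")
    m0 = [1] * shared + [1] * only0 + [0] * only1 + [0] * unused
    m1 = [1] * shared + [0] * only0 + [1] * only1 + [0] * unused
    return [m0, m1]


if __name__ == "__main__":
    p = partition_matrix([[1, 1, 1, 0], [0, 1, 1, 1]])
    print(p)  # [[0.75, 0.5], [0.5, 0.75]]
    print(partition_matrix(masks_for_two_tasks(p, 4)) == p)  # True
```

Under this definition, an off-diagonal entry can never exceed either of the corresponding diagonal entries (\(P_{ij} \le \min(P_{ii}, P_{jj})\)). As a result, not every matrix describes a valid partition, which is why the helper rejects unrealizable inputs.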
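The evolutionary strategy described under Optimization Strategy can be sketched with a generic ES gradient estimate using antithetic (mirrored) Gaussian sampling. Everything here is an illustrative assumption. The search variables are taken to be the per-task channel fractions (the diagonal of \(P\)). `toy_fitness` is a hypothetical stand-in for the distillation-based score: it combines a quadratic accuracy proxy with a linear resource penalty `lam` that rewards using fewer channels. The paper's update rule and fitness function may differ.

```python
import random
from typing import Callable, List

Vector = List[float]


def toy_fitness(theta: Vector, target: float = 0.6, lam: float = 0.2) -> float:
    """Hypothetical fitness: accuracy proxy minus a resource penalty."""
    accuracy_proxy = -sum((t - target) ** 2 for t in theta)
    resource_cost = lam * sum(theta)  # larger channel fractions cost more
    return accuracy_proxy - resource_cost


def es_step(theta: Vector, fitness: Callable[[Vector], float],
            rng: random.Random, n_pairs: int = 50,
            sigma: float = 0.05, lr: float = 0.1) -> Vector:
    """One ES update using antithetic (mirrored) Gaussian perturbations."""
    d = len(theta)
    grad = [0.0] * d
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        plus = [t + sigma * e for t, e in zip(theta, eps)]
        minus = [t - sigma * e for t, e in zip(theta, eps)]
        diff = fitness(plus) - fitness(minus)
        for k in range(d):
            grad[k] += diff * eps[k]
    grad = [g / (2.0 * n_pairs * sigma) for g in grad]
    # Gradient ascent on fitness; clip so each channel fraction stays in [0, 1].
    return [min(1.0, max(0.0, t + lr * g)) for t, g in zip(theta, grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [1.0, 1.0, 1.0]  # three tasks, each starting with all channels
    for _ in range(200):
        theta = es_step(theta, toy_fitness, rng)
    print([round(t, 3) for t in theta])  # approximately [0.5, 0.5, 0.5]
```

For this toy fitness, the optimum for each task is `target - lam / 2 = 0.5`. With `lam=0`, the optimum would be 0.6, so the penalty term is what biases the search toward using fewer channels. Mirrored sampling is used because, for a quadratic fitness, it cancels the even-order terms of each perturbation pair and yields a lower-variance gradient estimate.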
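The distillation objective described under Sample Evaluation can be illustrated with a minimal sketch. A single linear layer is trained by gradient descent to match, under an MSE loss, the output features of a fixed single-task "teacher" layer. The trained layer is a hypothetical "student" standing in for a task's portion of a shared layer. The linear layers, random inputs, and learning rate are illustrative assumptions; the paper applies distillation to the features of deep convolutional networks.

```python
import random
from typing import List

Matrix = List[List[float]]


def linear(w: Matrix, x: List[float]) -> List[float]:
    """Apply a C x D weight matrix to a D-dimensional input."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mse(student: List[List[float]], teacher: List[List[float]]) -> float:
    """Mean squared error over all batch items and channels."""
    n = sum(len(s) for s in student)
    return sum((a - b) ** 2 for s, t in zip(student, teacher)
               for a, b in zip(s, t)) / n


def distill_step(w: Matrix, xs: List[List[float]], targets: List[List[float]],
                 lr: float) -> Matrix:
    """One gradient-descent step on the MSE between student and teacher features."""
    c, d, b = len(w), len(w[0]), len(xs)
    grad = [[0.0] * d for _ in range(c)]
    for x, t in zip(xs, targets):
        y = linear(w, x)
        for i in range(c):
            err = 2.0 * (y[i] - t[i]) / (b * c)  # dL/dy_i for this sample
            for j in range(d):
                grad[i][j] += err * x[j]
    return [[w[i][j] - lr * grad[i][j] for j in range(d)] for i in range(c)]


if __name__ == "__main__":
    rng = random.Random(0)
    teacher_w = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]  # fixed single-task layer
    xs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(16)]
    targets = [linear(teacher_w, x) for x in xs]  # teacher features
    student_w = [[0.0] * 3 for _ in range(2)]
    before = mse([linear(student_w, x) for x in xs], targets)
    for _ in range(200):
        student_w = distill_step(student_w, xs, targets, lr=0.2)
    after = mse([linear(student_w, x) for x in xs], targets)
    print(before, after)  # the loss drops close to zero
```

Matching intermediate features gives a dense training signal at every layer, not just a single task loss at the output. This is the property the summary credits for making short evaluation runs informative enough to compare candidate partitions.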