Deep multi-task learning with flexible and compact architecture search

Jiejie Zhao,Weifeng Lv,Bowen Du,Junchen Ye,Leilei Sun,Guixi Xiong
DOI: https://doi.org/10.1007/s41060-021-00274-0
2021-07-24
International Journal of Data Science and Analytics
Abstract: Multi-task learning has been applied successfully in various applications. Recent research shows that the performance of multi-task learning methods could be improved by appropriately sharing model architectures. However, the existing work either identifies multi-task architecture manually based on prior knowledge, or simply uses an identical model structure for all tasks with a parameter sharing mechanism. In this paper, we propose a novel architecture search method to discover flexible and compact architectures for deep multi-task learning automatically, which not only extends the expressiveness of existing reinforcement learning-based neural architecture search methods, but also enhances the flexibility of existing hand-crafted multi-task learning methods. The discovered architecture shares structure and parameters adaptively to handle different levels of task relatedness, resulting in effectiveness improvement. In particular, for deep multi-task learning, we propose an architecture search space which includes a combination of partially shared modules at the low-level layer, and a set of task-specific modules with various depths at high-level layers. Secondly, a parameter generation mechanism is proposed to not only explore all possible cross-layer connections, but also reduce the search cost. Thirdly, we propose a task-specific shadow batch normalization mechanism to stabilize the training process and improve the search effectiveness. Finally, an auxiliary module is designed to guide the model training process. Experimental results demonstrate that the learned architectures outperform state-of-the-art methods with fewer learning parameters.
What problem does this paper attempt to address?
This paper addresses how to automatically discover flexible and compact model architectures for deep multi-task learning, so that structure is shared effectively among tasks. Existing multi-task learning methods either identify the multi-task architecture manually from prior knowledge or simply use an identical model structure for all tasks with a parameter-sharing mechanism. These approaches struggle to find the optimal architecture when there are many tasks and deep networks, and their performance may degrade when the tasks are only weakly related. The paper therefore proposes a new architecture search method that extends the expressiveness of existing reinforcement learning-based neural architecture search methods and enhances the flexibility of existing hand-crafted multi-task learning methods. Its key elements are an architecture search space, a parameter generation mechanism, and a task-specific shadow batch normalization mechanism that stabilizes training and improves search effectiveness. In addition, an auxiliary module is designed to guide the model training process. Experiments show that the learned architectures outperform state-of-the-art methods with fewer learning parameters.
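To make the idea of task-specific normalization over shared parameters more concrete, here is a minimal sketch in plain Python. It is not the paper's shadow batch normalization, whose details are not given here. It only illustrates the general pattern of keeping separate normalization statistics per task while every task passes through the same shared weight. The class name `TaskSpecificBatchNorm`, the helper `shared_layer`, the momentum value, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class TaskSpecificBatchNorm:
    """Per-task running statistics for a single scalar feature.

    Hypothetical illustration: names, momentum, and update rule are
    assumptions, not the mechanism described in the paper.
    """

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        # One running mean and variance per task id.
        self.running_mean = defaultdict(float)
        self.running_var = defaultdict(lambda: 1.0)

    def forward(self, batch, task, training=True):
        if training:
            # Normalize with the statistics of the current batch and
            # update only the running statistics of this task.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            m = self.momentum
            self.running_mean[task] = (1 - m) * self.running_mean[task] + m * mean
            self.running_var[task] = (1 - m) * self.running_var[task] + m * var
        else:
            # At inference time, use the stored statistics of this task.
            mean = self.running_mean[task]
            var = self.running_var[task]
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


def shared_layer(batch, weight=2.0):
    # Shared parameter: every task is scaled by the same weight.
    return [weight * x for x in batch]


bn = TaskSpecificBatchNorm()
task_a = shared_layer([1.0, 2.0, 3.0])        # task with small-scale inputs
task_b = shared_layer([100.0, 200.0, 300.0])  # task with large-scale inputs
out_a = bn.forward(task_a, task="a")
out_b = bn.forward(task_b, task="b")
```

In this toy setup the two tasks have very different input scales, yet each is normalized against its own statistics, so neither task's running mean is distorted by the other. A single shared set of statistics would mix the two distributions, which is one plausible reason per-task normalization can help stabilize training when tasks are weakly related.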