A new multi-task learning method with universum data

Yanshan Xiao, Jing Wen, Bo Liu
DOI: https://doi.org/10.1007/s10489-020-01954-3
IF: 5.3
2020-11-13
Applied Intelligence
Abstract: Multi-task learning (MTL) obtains a better classifier than single-task learning (STL) by sharing information between tasks within the multi-task model. Most existing multi-task learning models focus only on the data of the target tasks during training and ignore the non-target task data that may be contained in the target tasks. Such data can be added to classifier training as prior knowledge in the form of Universum data, i.e., data that do not belong to any of the indicated categories. In this paper, we address the problem of multi-task learning with Universum data, which improves the utilization of non-target task data. We introduce Universum learning so that non-target task data act as prior knowledge, and we propose a novel multi-task support vector machine with Universum data (U-MTLSVM). Based on the characteristics of MTL, each task has its own Universum data to provide prior knowledge. We then use the Lagrange method to solve the optimization problem and obtain the multi-task classifiers. Finally, we conduct experiments comparing the proposed method with several baselines on different data sets. The experimental results demonstrate the effectiveness of the proposed method for multi-task classification.
computer science, artificial intelligence
What problem does this paper attempt to address?
The problem this paper addresses is how to make effective use of non-target task data (i.e., Universum data) in multi-task learning (MTL) to improve classification performance. Specifically, the authors propose a new multi-task support vector machine (U-MTLSVM). By introducing Universum learning, non-target task data participate in model training as prior knowledge, which improves multi-task classification.

### Background and Problem Description

In traditional multi-task learning, most models focus only on the data of the target tasks and ignore the non-target task data that may exist within them. Although these non-target data do not belong to any of the specified categories, they come from the same problem domain and can therefore provide useful prior knowledge that helps the model generalize. How to use such data effectively, however, remains a challenge.

### Solution

To address this problem, the authors propose the U-MTLSVM method. Its main contributions are as follows:

1. **Introducing Universum Learning**:
   - Universum data are introduced into multi-task learning to construct prior knowledge and improve data utilization.
   - Each task has its own Universum data, which encode prior knowledge about that task's training set.
2. **Optimizing the Model**:
   - The Lagrange multiplier method transforms the primal problem into its dual problem, which is then optimized to obtain the classifiers.
   - U-MTLSVM integrates both the information in the original samples and the implicit information in the Universum samples, improving data utilization and the generalization ability of the model.
3. **Experimental Verification**:
   - The performance of the U-MTLSVM framework is evaluated through extensive experiments.
   - The experimental results show that the method outperforms the compared multi-task learning baselines in both classification performance and noise resistance.

### Mathematical Model

The optimization problem of U-MTLSVM can be expressed as:

\[
\begin{aligned}
& \min_{\omega_0, v_t, \psi_{ut}, \psi^*_{ut}, \xi_{it}} \frac{1}{2} \| \omega_0 \|^2 + \frac{1}{2} \mu \sum_{t=1}^T \| v_t \|^2 + C \sum_{i=1}^m \sum_{t=1}^T \xi_{it} + D \sum_{u=1}^U \sum_{t=1}^T (\psi_{ut} + \psi^*_{ut}) \\
& \text{s.t.} \quad y_{it} (\omega_0 + v_t) \cdot \phi(x_{it}) \geq 1 - \xi_{it}, \\
& \quad (\omega_0 + v_t) \cdot \phi(x^*_{ut}) \geq -\epsilon - \psi_{ut}, \\
& \quad (\omega_0 + v_t) \cdot \phi(x^*_{ut}) \leq \epsilon + \psi^*_{ut}, \\
& \quad \xi_{it} \geq 0, \ \psi_{ut} \geq 0, \ \psi^*_{ut} \geq 0, \quad i = 1, \dots, m, \ u = 1, \dots, U, \ t = 1, \dots, T,
\end{aligned}
\]

where:

- \( T \) is the number of tasks.
- \( m \) is the number of training samples in the \( t \)-th task.
- \( U \) is the number of Universum samples in the \( t \)-th task.
- \( x_{it} \) is the \( i \)-th training sample of task \( t \) with label \( y_{it} \in \{-1, +1\} \), \( x^*_{ut} \) is the \( u \)-th Universum sample of task \( t \), and \( \phi(\cdot) \) is the feature map.
- \( \omega_0 \) is the weight vector shared by all tasks and \( v_t \) is the task-specific offset, so task \( t \) is classified with \( \omega_0 + v_t \).
- \( \mu \) is a trade-off parameter that controls how far each task may deviate from the shared model: larger \( \mu \) penalizes the offsets \( v_t \) more strongly and makes the tasks' classifiers more similar.
- \( C \) and \( D \) are the penalty parameters for the task data and the Universum data, respectively.
- \( \xi_{it} \) is the slack variable of the task data.
- \( \epsilon \geq 0 \) sets the width of the insensitive band: Universum samples are encouraged to satisfy \( |(\omega_0 + v_t) \cdot \phi(x^*_{ut})| \leq \epsilon \), i.e., to lie close to the decision boundary.
- \( \psi_{ut} \) and \( \psi^*_{ut} \) are the slack variables of the Universum data.
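Applying the Lagrange multiplier method to the problem as written (no bias term) yields a dual of the following form. This is a derivation sketched here, not copied from the paper: \( \alpha_{it} \), \( \beta_{ut} \) and \( \beta^*_{ut} \) are assumed names for the multipliers of the task-data, lower Universum, and upper Universum constraints, and the paper's own dual may differ if its model includes a bias or other terms. It also assumes \( \mu > 0 \).

\[
\begin{aligned}
& \max_{\alpha, \beta, \beta^*} \ \sum_{t=1}^T \sum_{i=1}^m \alpha_{it} - \epsilon \sum_{t=1}^T \sum_{u=1}^U (\beta_{ut} + \beta^*_{ut}) - \frac{1}{2} \sum_{t=1}^T \sum_{r=1}^T \left( 1 + \frac{\delta_{tr}}{\mu} \right) s_t \cdot s_r \\
& \text{s.t.} \quad 0 \leq \alpha_{it} \leq C, \quad 0 \leq \beta_{ut} \leq D, \quad 0 \leq \beta^*_{ut} \leq D, \\
& \text{where} \quad s_t = \sum_{i=1}^m \alpha_{it} y_{it} \phi(x_{it}) + \sum_{u=1}^U (\beta_{ut} - \beta^*_{ut}) \phi(x^*_{ut}), \quad \delta_{tr} = \begin{cases} 1, & t = r \\ 0, & t \neq r \end{cases}
\end{aligned}
\]

Setting the derivatives of the Lagrangian with respect to \( \omega_0 \) and \( v_t \) to zero gives \( \omega_0 = \sum_{t=1}^T s_t \) and \( v_t = s_t / \mu \), so the classifier of task \( t \) is \( f_t(x) = \sum_{r=1}^T \left( 1 + \delta_{tr} / \mu \right) s_r \cdot \phi(x) \), which needs only kernel evaluations \( \phi(x) \cdot \phi(x') \). The upper bounds \( C \) and \( D \) come from the stationarity conditions for the slack variables together with the non-negativity of their multipliers.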
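As a concrete illustration of solving such a dual, the sketch below assumes a linear kernel \( \phi(x) = x \), keeps the formulation above without a bias term, and uses plain projected-gradient ascent in place of a dedicated QP solver. The function names (`fit_u_mtlsvm`, `decision`, `primal_objective`, `mtl_kernel`) and the multiplier names \( \alpha \), \( \beta \), \( \beta^* \) are hypothetical and not taken from the paper. Without a bias term the dual has only box constraints (\( 0 \leq \alpha \leq C \), \( 0 \leq \beta, \beta^* \leq D \)), so clipping after each gradient step is an exact projection. Each training point contributes a coefficient \( c_a \) (\( \alpha y \) for labeled data, \( \beta - \beta^* \) for Universum data), and task \( t \) predicts with \( f_t(x) = \sum_a c_a (1 + [t_a = t]/\mu) \langle x_a, x \rangle \).

```python
import numpy as np


def mtl_kernel(Xa, ta, Xb, tb, mu):
    """Linear base kernel scaled by (1 + 1/mu) for pairs from the same task."""
    same_task = (ta[:, None] == tb[None, :]).astype(float)
    return (1.0 + same_task / mu) * (Xa @ Xb.T)


def fit_u_mtlsvm(X, y, t, Xu, tu, C=1.0, D=0.5, eps=0.1, mu=1.0, iters=3000):
    """Hypothetical solver: projected-gradient ascent on the box-constrained dual.

    X, y, t : labeled samples, labels in {-1, +1}, task index per sample
    Xu, tu  : Universum samples and their task indices
    """
    n, u = len(y), len(tu)
    P = np.vstack([X, Xu])               # expansion points (labeled, then Universum)
    tp = np.concatenate([t, tu])         # task index of each expansion point
    Q = mtl_kernel(P, tp, P, tp, mu)
    # Dual variables z = [alpha (n), beta (u), beta_star (u)]; coefficients c = A @ z.
    A = np.zeros((n + u, n + 2 * u))
    A[:n, :n] = np.diag(y.astype(float))
    A[n:, n:n + u] = np.eye(u)
    A[n:, n + u:] = -np.eye(u)
    H = A.T @ Q @ A                      # Hessian of the negated dual
    lin = np.concatenate([np.ones(n), np.full(2 * u, -eps)])
    upper = np.concatenate([np.full(n, C), np.full(2 * u, D)])
    step = 1.0 / max(np.linalg.eigvalsh(H).max(), 1e-12)  # 1 / Lipschitz constant
    z = np.zeros(n + 2 * u)
    for _ in range(iters):
        # Gradient step on the negated dual, then projection onto 0 <= z <= upper.
        z = np.clip(z - step * (H @ z - lin), 0.0, upper)
    return {"c": A @ z, "P": P, "tp": tp, "mu": mu, "C": C, "D": D, "eps": eps,
            "dual": lin @ z - 0.5 * z @ H @ z}


def decision(model, Xq, tq):
    """f_t(x) = (omega_0 + v_t) . x, expanded over the training points."""
    K = mtl_kernel(Xq, tq, model["P"], model["tp"], model["mu"])
    return K @ model["c"]


def primal_objective(model, X, y, t, Xu, tu):
    """Primal objective evaluated at the weights recovered from the dual."""
    c, P, tp, mu = model["c"], model["P"], model["tp"], model["mu"]
    w0 = c @ P                                      # shared vector omega_0
    reg = 0.5 * w0 @ w0
    for task in np.unique(tp):
        v_t = (c[tp == task] @ P[tp == task]) / mu  # task-specific offset v_t
        reg += 0.5 * mu * v_t @ v_t
    hinge = np.maximum(0.0, 1.0 - y * decision(model, X, t)).sum()
    # For optimal slacks, psi + psi* equals max(0, |f| - eps) per Universum sample.
    u_loss = np.maximum(0.0, np.abs(decision(model, Xu, tu)) - model["eps"]).sum()
    return reg + model["C"] * hinge + model["D"] * u_loss
```

By weak duality, `primal_objective(model, X, y, t, Xu, tu)` can never fall below `model["dual"]`, so their difference is a convenient convergence check that should shrink toward zero as `iters` grows. A kernelized version would only need a different base kernel inside `mtl_kernel`; `primal_objective` as written relies on the linear kernel to form \( \omega_0 \) and \( v_t \) explicitly.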