Abstract: Inferring predictive maps between multiple input and multiple output variables or tasks has innumerable applications in data science. Multi-task learning attempts to learn the maps to several output tasks simultaneously with information sharing between them. We propose a novel multi-task learning framework for sparse linear regression, where a full task hierarchy is automatically inferred from the data, with the assumption that the task parameters follow a hierarchical tree structure. The leaves of the tree are the parameters for individual tasks, and the root is the global model that approximates all the tasks. We apply the proposed approach to develop and evaluate: (a) predictive models of plant traits using large-scale and automated remote sensing data, and (b) GWAS methodologies mapping such derived phenotypes in lieu of hand-measured traits. We demonstrate the superior performance of our approach compared to other methods, as well as the usefulness of discovering hierarchical groupings between tasks. Our results suggest that richer genetic mapping can indeed be obtained from the remote sensing data. In addition, our discovered groupings reveal interesting insights from a plant science perspective.

What problem does this paper attempt to address?

This paper addresses how to share information effectively between tasks in multi-task learning while discovering the hierarchical structure that relates them. Specifically, the authors propose a new multi-task learning framework that automatically infers the complete hierarchy of task parameters from the data, assuming that these parameters follow a hierarchical tree structure. The paper focuses on applications in plant trait prediction and genome-wide association studies (GWAS). Using large-scale automated remote-sensing data, it improves the performance of the prediction model and reveals hierarchical groupings between tasks, which in turn yields a richer genetic mapping.

### Main problems

1. **Information sharing in multi-task learning**: Traditional multi-task learning methods usually assume that the same information is shared among all tasks, which can overlook the specific relationships between individual tasks. The proposed method aims to determine both how information is shared and which tasks share it.
2. **Hierarchical structure of task parameters**: The authors assume that task parameters follow a hierarchical tree structure. The leaf nodes of the tree are the parameters of the individual tasks, and the root node is a global model that approximates all tasks. The hierarchical relationships between tasks are learned automatically from the data.
3. **Challenges in application fields**: In plant trait prediction and GWAS, the challenge is to use large-scale automated remote-sensing data to improve the performance of the prediction model and to reveal hierarchical groupings between tasks, so as to obtain a richer genetic mapping.

### Solutions

1. **Multi-task linear regression model**: The authors formulate a regularized regression problem that simultaneously estimates the task parameters and the hierarchical relationships between tasks by minimizing a loss function.
2. **Convex clustering penalty term**: A penalty term similar to that used in convex clustering encourages task parameters to be shared, which produces a hierarchical tree structure over the task parameters.
3. **Optimization algorithm**: The proximal decomposition method is used to solve the resulting convex optimization problem efficiently, and the numerical convergence of the algorithm is proved.
4. **Theoretical guarantee**: The asymptotic properties of the estimators are provided. Under certain conditions, the joint limit distribution of the estimators is shown to concentrate on specific lines, reflecting the sharing relationships between task parameters.

### Application effects

1. **Simulation data experiment**: In experiments on synthetic data sets, the proposed method achieves better prediction accuracy than the baseline methods.
2. **Practical application**: In experiments on real data for plant trait prediction and GWAS, the proposed method not only improves prediction accuracy but also reveals interesting and interpretable relationships between tasks.

In conclusion, the paper proposes a new multi-task learning framework that addresses both information sharing and the discovery of task hierarchies, and demonstrates its effectiveness and practicality in plant trait prediction and GWAS.
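
The idea under Solutions, a per-task least-squares loss plus a convex-clustering-style penalty on pairwise differences between task parameters, can be illustrated with a small sketch. Everything below is an assumption made for illustration rather than the paper's formulation or algorithm: the objective is taken to be `sum_k 0.5*||y_k - X_k theta_k||^2 + lam * sum_{i<j} ||theta_i - theta_j||_2`, the sparsity term is omitted, the norm is smoothed with a small `eps`, and plain gradient descent stands in for the proximal decomposition method. The names `fused_multitask_fit` and `pairwise_dist` and the synthetic data are hypothetical.

```python
import numpy as np


def fused_multitask_fit(Xs, ys, lam, eps=1e-3, n_iter=5000):
    """Gradient descent on a smoothed fusion-penalized multitask objective.

    Assumed objective (illustrative, not taken from the paper):
        sum_k 0.5 * ||y_k - X_k theta_k||^2
        + lam * sum_{i<j} sqrt(||theta_i - theta_j||^2 + eps)
    """
    K, d = len(Xs), Xs[0].shape[1]
    theta = np.zeros((K, d))
    # Upper bound on the Lipschitz constant of the gradient, giving a safe fixed step.
    lip_loss = max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    lip_pen = lam * 2 * (K - 1) / np.sqrt(eps)
    step = 1.0 / (lip_loss + lip_pen)
    for _ in range(n_iter):
        # Gradient of each task's least-squares loss.
        grad = np.stack([X.T @ (X @ t - y) for X, y, t in zip(Xs, ys, theta)])
        # Gradient of the smoothed pairwise penalty: pulls every task toward the others.
        diff = theta[:, None, :] - theta[None, :, :]            # shape (K, K, d)
        norm = np.sqrt((diff ** 2).sum(axis=-1, keepdims=True) + eps)
        grad += lam * (diff / norm).sum(axis=1)
        theta -= step * grad
    return theta


def pairwise_dist(theta):
    """Euclidean distance between every pair of task parameter vectors."""
    return np.linalg.norm(theta[:, None, :] - theta[None, :, :], axis=-1)


# Synthetic data (hypothetical): four tasks forming two groups of similar tasks.
rng = np.random.default_rng(0)
true_theta = np.array([
    [1.0, 0.0, 2.0],
    [1.1, 0.0, 2.0],
    [-1.0, 0.0, 0.5],
    [-1.0, 0.1, 0.5],
])
Xs = [rng.normal(size=(40, 3)) for _ in true_theta]
ys = [X @ t + 0.1 * rng.normal(size=40) for X, t in zip(Xs, true_theta)]

# Sweep the penalty weight: 0 fits each task alone, larger values fuse tasks.
fits = {lam: fused_multitask_fit(Xs, ys, lam) for lam in (0.0, 5.0, 100.0)}
for lam, theta in fits.items():
    D = pairwise_dist(theta)
    print(f"lam={lam:6.1f}  within-group={D[0, 1]:.3f}  across-group={D[0, 2]:.3f}")
```

Sweeping `lam` from small to large is what suggests a tree in this sketch: similar tasks merge first, then groups merge, and for a sufficiently large weight all tasks collapse toward a single shared model, which plays the role of the root described in the abstract.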

Multitask Learning using Task Clustering with Applications to Predictive Modeling and GWAS of Plant Varieties

Multitask Learning of Vegetation Biochemistry from Hyperspectral Data

Learning Task Grouping and Overlap in Multi-task Learning

Distributed Learning of Predictive Structures from Multiple Tasks over Networks

Learning the Shared Subspace for Multi-task Clustering and Transductive Transfer Classification

Multi-task Sparse Structure Learning

Learning task structure via sparsity grouped multitask learning

Regularized multi-trait multi-locus linear mixed models for genome-wide association studies and genomic selection in crops

Multitask Spectral Clustering by Exploring Intertask Correlation

Co-Clustering for Multitask Learning

Phenotype Analysis of Arabidopsis thaliana Based on Optimized Multi-Task Learning

A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset

Multi-task Subspace Clustering

Doing More With Less: A Multitask Deep Learning Approach in Plant Phenotyping

SGW-based Multi-Task Learning in Vision Tasks

Encoding Tree Sparsity in Multi-Task Learning: A Probabilistic Framework

Highly Scalable Task Grouping for Deep Multi-Task Learning in Prediction of Epigenetic Events

Multitask Bregman Clustering

Multi-Task Learning Architectures and Applications

Efficiently Identifying Task Groupings for Multi-Task Learning

Multi-Task Learning with Group-Specific Feature Space Sharing