Sparse partial least squares with group and subgroup structure

Matthew Sutton, Rodolphe Thiébaut, Benoît Liquet
DOI: https://doi.org/10.1002/sim.7821
2018-06-11
Statistics in Medicine
Abstract: Integrative analysis of high-dimensional omics datasets has been studied by many authors in recent years. By incorporating prior known relationships among the variables, these analyses have been successful in elucidating the relationships between different sets of omics data. In this article, our goal is to identify important relationships between genomic expression and cytokine data from a human immunodeficiency virus vaccine trial. We proposed a flexible partial least squares technique, which incorporates group and subgroup structure in the modelling process. Our new method accounts for both grouping of genetic markers (eg, gene sets) and temporal effects. The method generalises existing sparse modelling techniques in the partial least squares methodology and establishes theoretical connections to variable selection methods for supervised and unsupervised problems. Simulation studies are performed to investigate the performance of our methods over alternative sparse approaches. Our R package sgspls is available at https://github.com/matt-sutton/sgspls.
public, environmental & occupational health; medicine, research & experimental; medical informatics; mathematical & computational biology; statistics & probability
What problem does this paper attempt to address?