A two-sample tree-based test for hierarchically organized genomic signals

Pierre Neuvial, Nathanaël Randriamihamison, Marie Chavent, Sylvain Foissac, Nathalie Vialaneix
DOI: https://doi.org/10.1093/jrsssc/qlae011
2024-03-14
Abstract: This article addresses a common type of data encountered in genomic studies, where a signal along a linear chromosome exhibits a hierarchical organization. We propose a novel framework to assess the significance of dissimilarities between two sets of genomic matrices obtained from distinct biological conditions. Our approach relies on a data representation based on trees. It utilizes tree distances and an aggregation procedure for tests performed at the level of leaf pairs. Numerical experiments demonstrate its statistical validity and its superior accuracy and power compared to alternatives. The method’s effectiveness is illustrated using real-world GWAS and Hi-C data.
statistics & probability