A simple extension of Azadkia & Chatterjee's rank correlation to multi-response vectors

Jonathan Ansari, Sebastian Fuchs
2024-07-18
Abstract: Recently, Chatterjee (2023) noted the lack of a direct generalization of his rank correlation $\xi$ in Azadkia and Chatterjee (2021) to a multi-dimensional response vector. As a natural solution to this problem, we propose an extension of $\xi$ that is applicable to a set of $q \geq 1$ response variables. Our approach converts the original vector-valued problem into a univariate problem and then applies the rank correlation $\xi$ to it. Our novel measure $T$ quantifies the scale-invariant extent of functional dependence of a response vector $\mathbf{Y} = (Y_1,\dots,Y_q)$ on predictor variables $\mathbf{X} = (X_1,\dots,X_p)$. It characterizes both independence of $\mathbf{X}$ and $\mathbf{Y}$ and perfect dependence of $\mathbf{Y}$ on $\mathbf{X}$, and hence fulfills all the characteristics of a measure of predictability. Aiming at maximum interpretability, we provide various invariance results for $T$ as well as a closed-form expression in multivariate normal models. Building upon the graph-based estimator for $\xi$ in Azadkia and Chatterjee (2021), we obtain a non-parametric, strongly consistent estimator for $T$ and show its asymptotic normality. Based on this estimator, we develop a model-free, dependence-based feature ranking and forward feature selection for multiple-outcome data. Simulation results and real case studies illustrate the broad applicability of $T$.
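The estimator for $T$ is said to build on the graph-based estimator of Azadkia and Chatterjee (2021). As a minimal sketch, assuming the standard unconditional form of that estimator for a single response $Y$ and predictors $\mathbf{X}$ with $p \geq 1$, it reads $T_n = \sum_i (n \min\{R_i, R_{N(i)}\} - L_i^2) / \sum_i L_i (n - L_i)$. Here $R_i = \#\{j : Y_j \leq Y_i\}$, $L_i = \#\{j : Y_j \geq Y_i\}$, and $N(i)$ is the index of the Euclidean nearest neighbour of $X_i$, with ties broken at random. The function name `xi_n` and its `seed` parameter are hypothetical. The sketch does not implement the multi-response measure $T$ itself, whose definition is not given in the abstract.

```python
import math
import random


def xi_n(y, x, seed=0):
    """Hypothetical helper: nearest-neighbour estimate of the dependence of a
    single response y on predictors x (Azadkia & Chatterjee, 2021, unconditional
    case). Sketch only; O(n^2) time.

    y: list of n real responses; x: list of n predictor vectors (tuples of floats).
    """
    n = len(y)
    if n < 2 or len(x) != n:
        raise ValueError("need at least two paired observations")
    rng = random.Random(seed)  # fixed seed makes random tie-breaking reproducible

    # R_i = #{j : y_j <= y_i} and L_i = #{j : y_j >= y_i}
    R = [sum(1 for yj in y if yj <= yi) for yi in y]
    L = [sum(1 for yj in y if yj >= yi) for yi in y]

    num = 0.0
    for i in range(n):
        # Euclidean nearest neighbour N(i) of x_i among the other points,
        # ties broken uniformly at random
        dists = [(math.dist(x[i], x[j]), j) for j in range(n) if j != i]
        dmin = min(d for d, _ in dists)
        nn = rng.choice([j for d, j in dists if d == dmin])
        num += n * min(R[i], R[nn]) - L[i] ** 2

    den = sum(l * (n - l) for l in L)
    if den == 0:
        raise ValueError("y is constant; the estimator is undefined")
    return num / den


# Usage: a noiseless non-monotone relation scores near 1 for moderate n.
xs = [(i / 100.0,) for i in range(200)]
print(xi_n([math.sin(6 * v[0]) for v in xs], xs))
```

Because the estimate depends on $y$ only through the ranks $R_i$ and $L_i$, it is unchanged by any strictly increasing transformation of the response. This reflects the rank-based, scale-invariant nature of $\xi$.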
Statistics Theory, Methodology