Addressing class imbalance in functional data clustering

Catherine Higgins, Michelle Carey
DOI: https://doi.org/10.1007/s11634-024-00611-8
2024-11-19
Advances in Data Analysis and Classification
Abstract: The goal of functional clustering is twofold: first, to categorize curves with similar temporal behaviors into separate clusters, and second, to obtain a representative curve that summarizes the typical temporal behavior within each cluster. An important challenge in current functional clustering techniques is class imbalance, where some clusters contain a significantly greater number of curves than others. While class imbalance is extensively addressed in supervised classification, it remains relatively unexplored in unsupervised contexts. To address this gap, we propose adapting the iterative hierarchical clustering approach, originally designed for multivariate data, to the context of functional data, thus introducing a novel method called functional iterative hierarchical clustering (funIHC) to effectively handle the clustering of imbalanced functional data. Through comprehensive simulation studies and benchmarking datasets, we demonstrate the effectiveness of the funIHC approach. Utilizing funIHC on gene expression data related to human influenza infection induced by the H3N2 virus, we identify five distinct and biologically meaningful patterns of gene expression. The R and MATLAB code for implementing funIHC is freely accessible at www.fdaatucd.com.
statistics & probability