A feature ranking algorithm for clustering medical data

Eran Shpigelman, Ron Shamir
DOI: https://doi.org/10.1101/2023.09.30.23296349
2024-11-03
Abstract: Objective: Clustering methods are often applied to electronic medical records (EMR) for various objectives, including the discovery of previously unrecognized disease subtypes. The abundance and redundancy of information in EMR data raise the need to rank features by their relevance to clustering. Methods: Here we propose FRIGATE, an ensemble feature ranking algorithm for clustering. FRIGATE ranks the features by solving multiple clustering problems on subgroups of features, using game-theoretic principles to rank and weigh features. In each such problem, a Shapley-like framework is used to rank a selected set of features. In another version of the algorithm, multiplicative weights are employed to reduce the randomness in feature set selection. The code for the algorithms is available at https://github.com/Shamir-Lab/FRIGATE. Results: On simulated data and on eleven real genomics and EMR datasets, FRIGATE outperforms extant ensemble ranking algorithms in both solution quality and speed. Conclusion: FRIGATE can improve disease understanding by enabling better subtype discovery from EMR data.
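The idea of ranking a selected set of features with a Shapley-like framework can be illustrated with a small sketch. This is not the FRIGATE implementation: the abstract does not specify the value function or the clustering method, so the sketch assumes a tiny 2-means clustering and scores a feature subset by how well its clustering agrees with the clustering on the full selected set. All function names (`two_means`, `value`, `shapley_like_scores`) and the toy data are hypothetical, and the ensemble and multiplicative-weights steps are not covered.

```python
import random
from itertools import permutations


def two_means(rows, feats, iters=10):
    """Tiny 2-means restricted to the feature indices in `feats`; returns 0/1 labels."""
    if not feats:
        return [0] * len(rows)  # no features: every sample falls in one cluster
    pts = [[r[f] for f in feats] for r in rows]

    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Deterministic initialization: the first point and the point farthest from it.
    centers = [pts[0], max(pts, key=lambda p: dist(p, pts[0]))]
    labels = [0] * len(pts)
    for _ in range(iters):
        labels = [0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1 for p in pts]
        for c in (0, 1):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def value(rows, feats, ref):
    """Assumed value function: agreement (up to label swap) with the reference labels."""
    labels = two_means(rows, feats)
    agree = sum(a == b for a, b in zip(labels, ref)) / len(ref)
    return max(agree, 1 - agree)


def shapley_like_scores(rows, feats):
    """Average marginal gain of each feature over all orderings of `feats`."""
    ref = two_means(rows, feats)  # reference clustering on the full selected set
    scores = {f: 0.0 for f in feats}
    orders = list(permutations(feats))
    for order in orders:
        used, prev = [], value(rows, [], ref)
        for f in order:
            used.append(f)
            cur = value(rows, used, ref)
            scores[f] += (cur - prev) / len(orders)
            prev = cur
    return scores


# Toy data (an assumption for illustration): features 0 and 1 separate two
# planted groups, features 2 and 3 are pure noise.
rng = random.Random(0)
rows = []
for i in range(40):
    shift = 0.0 if i < 20 else 8.0
    rows.append([rng.gauss(shift, 1), rng.gauss(shift, 1),
                 rng.gauss(0, 1), rng.gauss(0, 1)])

scores = shapley_like_scores(rows, [0, 1, 2, 3])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Enumerating all orderings is feasible only for a handful of features, which is one reason to solve many small problems on feature subgroups rather than one large one. For larger subsets, sampling a limited number of orderings would be the usual approximation.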
Health Informatics