Feature selection with vector-symbolic architectures: a case study on microbial profiles of shotgun metagenomic samples of colorectal cancer

Fabio Cumbo, Simone Truglia, Emanuel Weitschek, Daniel Blankenberg
DOI: https://doi.org/10.1101/2024.11.18.624180
2024-11-20
Abstract: The steadily decreasing cost of next-generation sequencing has recently led to a significant increase in the number of microbiome-related studies, providing invaluable information for understanding host-microbiome interactions and their relation to diseases. A common approach in metagenomics is to determine the composition of samples in terms of the types and abundances of the microbial species that populate them, with the goal of using advanced feature selection techniques to identify microbes whose profiles differentiate samples under different conditions. Here we propose a novel backward variable selection method based on the hyperdimensional computing paradigm, which takes inspiration from how the human brain classifies concepts and encodes features as vectors in a high-dimensional space. We validated our method on public metagenomic samples collected from patients affected by colorectal cancer in a case/control scenario, comparing it with other state-of-the-art feature selection methods and obtaining promising results.
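The general idea of backward variable selection on top of a hyperdimensional classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dimensionality, the number of quantization levels, the use of independent random level vectors, and the use of training accuracy as the selection criterion are all assumptions made for illustration. Feature values are assumed to be scaled to [0, 1], as relative abundances would be.

```python
import math
import random


def random_hv(rng, dim):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def encode(sample, features, id_hvs, level_hvs):
    """Bind each feature ID to its quantized level, then bundle (sum) them."""
    levels = len(level_hvs)
    acc = [0] * len(level_hvs[0])
    for f in features:
        level = min(int(sample[f] * levels), levels - 1)  # value assumed in [0, 1]
        for i, (a, b) in enumerate(zip(id_hvs[f], level_hvs[level])):
            acc[i] += a * b
    return acc


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def accuracy(X, y, features, id_hvs, level_hvs):
    """Training accuracy of a prototype classifier (assumption: no held-out split)."""
    encoded = [encode(x, features, id_hvs, level_hvs) for x in X]
    prototypes = {}
    for hv, label in zip(encoded, y):
        proto = prototypes.setdefault(label, [0] * len(hv))
        for i, value in enumerate(hv):
            proto[i] += value
    correct = 0
    for hv, label in zip(encoded, y):
        predicted = max(prototypes, key=lambda c: cosine(hv, prototypes[c]))
        correct += predicted == label
    return correct / len(y)


def backward_select(X, y, dim=2000, levels=10, seed=0):
    """Drop one feature at a time for as long as accuracy does not decrease."""
    rng = random.Random(seed)
    n_features = len(X[0])
    id_hvs = [random_hv(rng, dim) for _ in range(n_features)]
    level_hvs = [random_hv(rng, dim) for _ in range(levels)]
    selected = list(range(n_features))
    best = accuracy(X, y, selected, id_hvs, level_hvs)
    while len(selected) > 1:
        candidates = []
        for f in selected:
            subset = [g for g in selected if g != f]
            candidates.append((accuracy(X, y, subset, id_hvs, level_hvs), subset))
        cand_acc, cand_subset = max(candidates, key=lambda c: c[0])
        if cand_acc < best:
            break  # every possible removal hurts accuracy, so stop here
        best, selected = cand_acc, cand_subset
    return selected, best


# Toy data (hypothetical): only feature 0 differs between the two classes.
X = [[0.0, 0.5, 0.5], [0.0, 0.5, 0.5], [1.0, 0.5, 0.5], [1.0, 0.5, 0.5]]
y = [0, 0, 1, 1]
selected, score = backward_select(X, y)
print(selected, score)
```

In this toy setting the two constant features carry no class information, so the procedure should discard them and retain feature 0. A realistic evaluation would score each candidate subset with cross-validation rather than training accuracy.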
Biology