Towards robust and generalizable representations of extracellular data using contrastive learning

Ankit Vishnubhotla, Charlotte Loh, Liam Paninski, Akash Srivastava, Cole Hurwitz
DOI: https://doi.org/10.1101/2023.10.30.564831
2024-02-21
Abstract: Contrastive learning is quickly becoming an essential tool in neuroscience for extracting robust and meaningful representations of neural activity. Despite numerous applications to neuronal population data, there has been little exploration of how these methods can be adapted to key primary data analysis tasks such as spike sorting or cell-type classification. In this work, we propose a novel contrastive learning framework, CEED (Contrastive Embeddings for Extracellular Data), for high-density extracellular recordings. We demonstrate that through careful design of the network architecture and data augmentations, it is possible to generically extract representations that far outperform current specialized approaches. We validate our method across multiple high-density extracellular recordings. All code used to run CEED can be found at .
Neuroscience
What problem does this paper attempt to address?
The paper addresses the limitations of existing methods for processing high-density extracellular recordings, particularly in spike sorting and cell-type classification. Specifically:

1. **Spike sorting**: Current methods such as Principal Component Analysis (PCA) are effective and scalable, but have several key drawbacks:
   - They lack robustness to extracellular nuisance variables (such as temporally and spatially overlapping spikes, correlated background noise, or variation in spike detection time).
   - They cannot model nonlinear structure in the data.
   - Their objective finds features that explain variance rather than features that discriminate between waveforms.
2. **Cell-type classification**: Existing feature extraction methods usually rely on manually extracted features (such as action potential width and peak-to-peak amplitude). These features are scalable and effective, but are too simple and arbitrary to fully capture differences in morpho-electrical properties. Some recent nonlinear methods (such as WaveMap) use UMAP and Louvain community detection to discover cell-type clusters automatically, but still have limitations.

To solve these problems, the paper proposes a new contrastive learning framework, **Contrastive Embeddings for Extracellular Data (CEED)**, which uses contrastive learning to extract low-dimensional representations that are invariant to common and task-specific nuisance variables. Specific goals include:

- **Robustness and generalization**: Design the network architecture and data augmentation strategy so that the extracted representations far outperform current specialized methods.
- **Multi-task applicability**: Validate the method on multiple high-density extracellular recordings, covering cell-type classification as well as spike sorting.
- **Zero-shot learning**: Demonstrate cell-type classification on unseen animals and probe geometries.

Through these improvements, CEED aims to provide a more robust, general-purpose feature learning method for the analysis of extracellular recordings.
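To make the idea of learning representations that are invariant to nuisance variables more concrete, the following is a minimal NumPy sketch of a generic contrastive setup. It is an illustration under stated assumptions, not the CEED implementation: the summary does not specify the loss or the exact augmentations, so the NT-Xent (SimCLR-style) loss, the function names `augment` and `nt_xent_loss`, and all parameter values here are hypothetical choices. The three perturbations (amplitude jitter, a small temporal shift, additive noise) loosely mirror the nuisance variables listed above.

```python
import numpy as np


def augment(waveforms, rng, noise_std=0.1, max_shift=2, amp_jitter=0.1):
    """Return a randomly perturbed copy of `waveforms`, shape (n, t).

    Hypothetical augmentations: per-waveform amplitude scaling, a circular
    temporal shift of up to `max_shift` samples, and additive Gaussian noise.
    """
    n, _ = waveforms.shape
    scale = 1.0 + rng.uniform(-amp_jitter, amp_jitter, size=(n, 1))
    shifts = rng.integers(-max_shift, max_shift + 1, size=n)
    shifted = np.stack([np.roll(w, s) for w, s in zip(waveforms * scale, shifts)])
    return shifted + rng.normal(0.0, noise_std, size=shifted.shape)


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss; row i of `z_a` and row i of `z_b` form a positive pair."""
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    # Index of the positive partner for each of the 2n rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()


# Usage: two augmented views of the same synthetic spikes, embedded by a
# fixed random projection that stands in for a trainable encoder.
rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 40)
widths = rng.uniform(0.05, 0.3, size=(16, 1))
spikes = -np.exp(-(t[None, :] ** 2) / (2 * widths**2))  # toy negative deflections
projection = rng.normal(size=(40, 8))  # placeholder encoder weights
loss = nt_xent_loss(augment(spikes, rng) @ projection, augment(spikes, rng) @ projection)
```

In a real training loop the random projection would be replaced by a neural network, and the loss would be minimized so that two augmented views of the same spike map to nearby embeddings while views of different spikes are pushed apart.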