More or fewer latent variables in the high-dimensional data space? That is the question

Francesco Edoardo Vaccari, Stefano Diomedi, Edoardo Bettazzi, Matteo Filippini, Marina De Vitis, Kostas Hadjidimitrakis, Patrizia Fattori
DOI: https://doi.org/10.1101/2024.11.28.625854
2024-12-09
Abstract: Dimensionality reduction is widely used in modern Neuroscience to process massive neural recording data. Despite the development of complex non-linear techniques, linear algorithms, in particular Principal Component Analysis (PCA), are still the gold standard. However, there is no consensus on how to estimate the optimal number of latent variables to retain. In this study, we addressed this issue by testing different criteria on simulated data. Parallel analysis and cross-validation proved to be the best methods, being largely unaffected by the number of units and the amount of noise. Parallel analysis was quite conservative and tended to underestimate the number of dimensions, especially in low-noise regimes, whereas in these conditions cross-validation provided slightly better estimates. Both criteria consistently estimated the ground truth when 100+ units were available. As an exemplary application to real data, we estimated the dimensionality of the spiking activity in two macaque parietal areas during different phases of a delayed reaching task. We show that different criteria can lead to different trends in the estimated dimensionality. These apparently contrasting results are reconciled when the implicit definition of dimensionality underlying the different criteria is considered. Our findings suggest that the term "dimensionality" needs to be defined carefully and, more importantly, that the most robust criteria for choosing the number of dimensions should be adopted in future works. To help other researchers with the implementation of such an approach on their data, we provide a simple software package, and we present the results of our simulations through a simple web-based app to guide the choice of latent variables in a variety of new studies.
Neuroscience
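To make the parallel-analysis criterion named in the abstract concrete, here is a minimal sketch of one common permutation-based variant. It is not the authors' software package, and it may differ from their implementation in the null model, the threshold, or the use of covariance rather than correlation. The function name `parallel_analysis`, the parameters `n_shuffles` and `percentile`, and the simulated data (3 latent variables, 50 units, 500 samples) are all assumptions made for illustration.

```python
import numpy as np


def parallel_analysis(X, n_shuffles=200, percentile=95, seed=0):
    """Estimate how many principal components to retain.

    X has shape (n_samples, n_units). A component is retained while its
    eigenvalue exceeds the chosen percentile of the eigenvalues obtained
    from data whose columns were shuffled independently.
    (Hypothetical helper, not the API of any published package.)
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)

    # Eigenvalues of the sample covariance matrix, in descending order.
    eig = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

    null = np.empty((n_shuffles, X.shape[1]))
    for i in range(n_shuffles):
        # Shuffle every column on its own: this keeps each unit's marginal
        # distribution but destroys the correlations between units.
        Xs = rng.permuted(Xc, axis=0)
        null[i] = np.linalg.eigvalsh(np.cov(Xs, rowvar=False))[::-1]

    threshold = np.percentile(null, percentile, axis=0)
    above = eig > threshold

    # Count leading components above threshold; stop at the first failure.
    if above.all():
        return len(eig)
    return int(np.argmin(above))


# Simulated example (assumed parameters): 3 latents mixed into 50 units.
rng_demo = np.random.default_rng(1)
latents = rng_demo.standard_normal((500, 3))
loadings = rng_demo.standard_normal((3, 50))
X_demo = latents @ loadings + 0.5 * rng_demo.standard_normal((500, 50))

k_hat = parallel_analysis(X_demo)
print("estimated number of latent variables:", k_hat)
```

Counting only the leading run of eigenvalues above threshold is a conservative design choice, in keeping with the abstract's remark that parallel analysis tends to underestimate rather than overestimate the dimensionality.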