Abstract: Gaussian processes are now commonly used in dimensionality reduction approaches tailored to neuroscience, especially to describe changes in high-dimensional neural activity over time. As recording capabilities expand to include neuronal populations across multiple brain areas, cortical layers, and cell types, interest in extending Gaussian process factor models to characterize multi-population interactions has grown. However, the cubic runtime scaling of current methods with the length of experimental trials and the number of recorded populations (groups) precludes their application to large-scale multi-population recordings. Here, we improve this scaling from cubic to linear in both trial length and group number. We present two approximate approaches to fitting multi-group Gaussian process factor models based on (1) inducing variables and (2) the frequency domain. Empirically, both methods achieved orders of magnitude speed-up with minimal impact on statistical performance, in simulation and on neural recordings of hundreds of neurons across three brain areas. The frequency domain approach, in particular, consistently provided the greatest runtime benefits with the fewest trade-offs in statistical performance. We further characterize the estimation biases introduced by the frequency domain approach and demonstrate effective strategies to mitigate them. This work enables a powerful class of analysis techniques to keep pace with the growing scale of multi-population recordings, opening new avenues for exploring brain function.

### What problem does this paper attempt to solve?

This paper addresses the computational scalability of multi-group Gaussian process factor models applied to large-scale multi-population neural recordings. Advances in recording techniques now let researchers record large numbers of neurons simultaneously across multiple brain areas, cortical layers, and cell types. Analyzing such high-dimensional data requires statistical tools that can describe the interactions between neural populations. However, the runtime of existing multi-group Gaussian process factor models grows cubically with both the length of experimental trials and the number of recorded groups, which prevents their use on large-scale datasets.

To solve this problem, the authors propose two approximate methods for fitting multi-group Gaussian process factor models:

1. **Method based on inducing variables**: Summarizes each latent Gaussian process with a small number of inducing variables, thereby reducing the computational burden.
2. **Method based on the frequency domain**: Transforms the model into the frequency domain for inference and fitting, and exploits the diagonal covariance matrix in the frequency domain to achieve linear computational complexity.

Both methods achieve orders-of-magnitude speed-ups with minimal impact on statistical performance, so these models can be applied to larger multi-population neural recordings. The frequency-domain method, in particular, consistently provided the greatest runtime benefit with the fewest trade-offs in statistical performance.
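The reason a frequency-domain formulation can avoid cubic cost can be illustrated with a small NumPy sketch. It is not the authors' algorithm: it assumes a single latent on a regular time grid and replaces the stationary covariance with a circulant (periodic) approximation, which the discrete Fourier transform diagonalizes exactly. The function names `circulant_se_column` and `solve_circulant_fft` and the parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def circulant_se_column(T, tau, sigma2):
    """First column of a circulant squared-exponential covariance on T grid points.

    Assumption: time is treated as periodic, so the lag between two points is
    the circular distance. This is an approximation to the usual Toeplitz matrix.
    """
    lags = np.arange(T)
    wrapped = np.minimum(lags, T - lags)            # circular distance
    col = (1.0 - sigma2) * np.exp(-wrapped**2 / (2.0 * tau**2))
    col[0] += sigma2                                # white-noise term at zero lag
    return col

def solve_circulant_fft(col, b):
    """Solve K x = b for circulant K using only FFTs and elementwise division."""
    spectrum = np.fft.fft(col).real                 # eigenvalues of the symmetric circulant K
    return np.fft.ifft(np.fft.fft(b) / spectrum).real

T, tau, sigma2 = 64, 3.0, 0.1                       # illustrative values
col = circulant_se_column(T, tau, sigma2)

# Dense matrix built only to check the FFT-based solve against a direct solve.
K = np.array([np.roll(col, s) for s in range(T)]).T

rng = np.random.default_rng(0)
b = rng.standard_normal(T)
x_fft = solve_circulant_fft(col, b)
x_dense = np.linalg.solve(K, b)
max_abs_diff = np.max(np.abs(x_fft - x_dense))
print(f"max |x_fft - x_dense| = {max_abs_diff:.2e}")
```

Once the data are in the frequency domain, each frequency is handled independently, so the per-frequency work is linear in the trial length; the dense solve used for comparison is cubic. The circulant assumption is also one plausible source of the kind of estimation bias that a frequency-domain approximation can introduce.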
### Formula summary

- **Observation model**:
  \[ y_{m,n,t} = C_m x_{m,n,t} + d_m + \varepsilon_m \]
  where \(\varepsilon_m \sim \mathcal{N}(0, \Phi_m^{-1})\) and \(\Phi_m = \text{diag}(\phi_{m,1}, \dots, \phi_{m,q_m})\).
- **Gaussian process state model**:
  \[ \begin{bmatrix} x_{1,n,j,:} \\ \vdots \\ x_{M,n,j,:} \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K_{1,1,j} & \cdots & K_{1,M,j} \\ \vdots & \ddots & \vdots \\ K_{M,1,j} & \cdots & K_{M,M,j} \end{bmatrix}\right) \]
  where
  \[ K_{m_1,m_2,j}(t_1,t_2) = \left(1-\sigma_j^2\right)\exp\left(-\frac{(\Delta t)^2}{2\tau_j^2}\right) + \sigma_j^2 \cdot \delta_{\Delta t}, \qquad \Delta t = (t_2 - D_{m_2,j}) - (t_1 - D_{m_1,j}). \]
- **Posterior inference and fitting**: The model is fit by variational inference, maximizing the evidence lower bound (ELBO):
  \[ \log P(Y) \geq L(Q,\Omega) = \mathbb{E}_Q[\log P(Y,\theta|\Omega)] - \mathbb{E}_Q[\log Q(\theta)] \]

These methods allow multi-group Gaussian process factor models to keep pace with the rapidly growing scale of neuroscience data and open new avenues for exploring brain function.
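The state-model kernel in the formula summary can be made concrete with a short NumPy sketch that assembles the full \(MT \times MT\) covariance for one latent \(j\). It assumes integer time steps \(t = 0, \dots, T-1\), treats \(\delta_{\Delta t}\) as an indicator that \(\Delta t = 0\), and stacks the matrix group by group; the function name `multigroup_kernel` and the parameter values are hypothetical.

```python
import numpy as np

def multigroup_kernel(T, delays, tau, sigma2):
    """Covariance of one latent across M groups and T time points, shape (M*T, M*T).

    delays[m] plays the role of D_{m,j}, tau of tau_j, and sigma2 of sigma_j^2.
    Rows and columns are ordered group by group (all times of group 1, then group 2, ...).
    """
    delays = np.asarray(delays, dtype=float)
    M = delays.size
    t = np.arange(T, dtype=float)
    # shifted[m * T + t] = t - D_m, the delay-corrected time of each entry
    shifted = (t[None, :] - delays[:, None]).reshape(M * T)
    # dt[a, b] = (t_2 - D_{m_2}) - (t_1 - D_{m_1}) for row a = (m_1, t_1), column b = (m_2, t_2)
    dt = shifted[None, :] - shifted[:, None]
    K = (1.0 - sigma2) * np.exp(-dt**2 / (2.0 * tau**2))
    K += sigma2 * np.isclose(dt, 0.0)   # delta term: active only where dt == 0
    return K

# Two groups, with the second delayed by 2.5 time steps (illustrative values).
K = multigroup_kernel(T=20, delays=[0.0, 2.5], tau=4.0, sigma2=1e-3)
print(K.shape)
```

Because the matrix has \(MT\) rows, any exact method that factorizes it directly costs on the order of \((MT)^3\) operations, which is the cubic scaling in trial length and group number described above.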

Fast Multi-Group Gaussian Process Factor Models
