Evren Gokcen, Anna I. Jasper, Adam Kohn, Christian K. Machens, Byron M. Yu
Abstract: Gaussian processes are now commonly used in dimensionality reduction approaches tailored to neuroscience, especially to describe changes in high-dimensional neural activity over time. As recording capabilities expand to include neuronal populations across multiple brain areas, cortical layers, and cell types, interest in extending Gaussian process factor models to characterize multi-population interactions has grown. However, the cubic runtime scaling of current methods with the length of experimental trials and the number of recorded populations (groups) precludes their application to large-scale multi-population recordings. Here, we improve this scaling from cubic to linear in both trial length and group number. We present two approximate approaches to fitting multi-group Gaussian process factor models based on (1) inducing variables and (2) the frequency domain. Empirically, both methods achieved orders of magnitude speed-up with minimal impact on statistical performance, in simulation and on neural recordings of hundreds of neurons across three brain areas. The frequency domain approach, in particular, consistently provided the greatest runtime benefits with the fewest trade-offs in statistical performance. We further characterize the estimation biases introduced by the frequency domain approach and demonstrate effective strategies to mitigate them. This work enables a powerful class of analysis techniques to keep pace with the growing scale of multi-population recordings, opening new avenues for exploring brain function.
### What problem does this paper attempt to solve?
This paper addresses the computational scalability of multi-group Gaussian process factor models applied to large-scale multi-population neural recordings. As recording techniques advance, researchers can simultaneously record large numbers of neurons across multiple brain areas, cortical layers, and cell types. Analyzing such high-dimensional data requires statistical tools that can describe the interactions between neural populations.
However, existing multi-group Gaussian process factor models are computationally expensive on long trials or many groups: their runtime scales cubically with both the length of experimental trials and the number of recorded groups, which limits their use on large-scale datasets.
To address this, the authors propose two approximate methods for fitting these models, reducing the scaling from cubic to linear in both trial length and group number:
1. **Inducing variables**: A small set of inducing variables summarizes each Gaussian process, reducing the computational burden.
2. **Frequency domain**: Inference and fitting are carried out in the frequency domain, where the covariance matrix is (approximately) diagonal, which yields linear computational complexity.
Both methods achieved orders-of-magnitude speed-ups with minimal impact on statistical performance, so the models can be applied to larger multi-population recordings. The frequency-domain method, in particular, consistently provided the greatest runtime benefit with the fewest trade-offs in statistical performance, although it introduces estimation biases that the authors characterize and show how to mitigate.
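The diagonal-covariance argument behind the frequency-domain method can be checked numerically. The following is a minimal sketch, not the authors' implementation: it builds a squared exponential covariance matrix on an evenly sampled trial, applies a unitary DFT, and measures how much of the matrix's energy lies off the diagonal. The function names, the unit time step, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np


def se_kernel_matrix(T, tau, sigma2=1e-3, circular=False):
    """Squared exponential covariance over T evenly spaced time points.

    tau is the timescale (in samples) and sigma2 a small noise variance.
    With circular=True, lags wrap around the trial, giving a circulant matrix.
    """
    t = np.arange(T)
    lag = np.abs(t[None, :] - t[:, None])
    if circular:
        lag = np.minimum(lag, T - lag)
    return (1 - sigma2) * np.exp(-lag**2 / (2 * tau**2)) + sigma2 * (lag == 0)


def offdiag_energy_fraction(K):
    """Fraction of squared magnitude lying off the diagonal of F K F^H."""
    T = K.shape[0]
    F = np.fft.fft(np.eye(T), norm="ortho")  # unitary DFT matrix
    K_freq = F @ K @ F.conj().T
    total = np.sum(np.abs(K_freq) ** 2)
    diag = np.sum(np.abs(np.diag(K_freq)) ** 2)
    return 1.0 - diag / total


if __name__ == "__main__":
    for T in (50, 100, 200, 400):
        frac = offdiag_energy_fraction(se_kernel_matrix(T, tau=5.0))
        print(f"T={T:4d}  off-diagonal energy fraction = {frac:.4f}")
```

A circulant covariance (`circular=True`) is diagonalized exactly by the DFT. The ordinary, non-circular covariance is only approximately diagonalized, and the off-diagonal fraction shrinks as the trial grows relative to the timescale. Once the off-diagonal terms are neglected, each frequency can be handled independently, which is where the linear scaling in trial length comes from.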
### Formula summary
- **Observation model**:
\[
y_{m,n,t}=C_mx_{m,n,t}+d_m+\varepsilon_m
\]
where \(\varepsilon_m\sim\mathcal{N}(0,(\Phi_m)^{-1})\), \(\Phi_m = \text{diag}(\phi_{m,1},\dots,\phi_{m,q_m})\).
- **Gaussian process state model**:
\[
\begin{bmatrix}
x_{1,n,j,:}\\
\vdots\\
x_{M,n,j,:}
\end{bmatrix}\sim\mathcal{N}\left(0,\begin{bmatrix}
K_{1,1,j}&\cdots&K_{1,M,j}\\
\vdots&\ddots&\vdots\\
K_{M,1,j}&\cdots&K_{M,M,j}
\end{bmatrix}\right)
\]
where \(K_{m_1,m_2,j}(t_1,t_2)=\left(1-\sigma_j^2\right)\exp\left(-\frac{(\Delta t)^2}{2\tau_j^2}\right)+\sigma_j^2\cdot\delta_{\Delta t}\), \(\Delta t=(t_2 - D_{m_2,j})-(t_1 - D_{m_1,j})\).
- **Posterior inference and fitting**:
The model is fit by maximizing the evidence lower bound (ELBO) through variational inference:
\[
\log P(Y)\geq L(Q,\Omega)=\mathbb{E}_Q[\log P(Y,\theta|\Omega)]-\mathbb{E}_Q[\log Q(\theta)]
\]
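To make the generative model in the formula summary concrete, here is a minimal NumPy sketch that builds the delayed covariance for one latent across groups, samples latents from it, and generates observations. It is an illustration under stated assumptions, not the authors' code: the function names, parameter values, unit time step, and randomly drawn `C_m`, `d_m`, and `phi_m` are all hypothetical, and the noise is assumed to be drawn independently at each time point.

```python
import numpy as np


def delayed_kernel(T, delays, tau, sigma2):
    """Covariance of one latent across M groups, shape (M*T, M*T).

    Rows and columns are ordered group-major: index m*T + t.
    delays[m] plays the role of D_{m,j} for this latent.
    """
    t = np.arange(T, dtype=float)
    delays = np.asarray(delays, dtype=float)
    # Delay-corrected time of every (group, time) pair
    shifted = (t[None, :] - delays[:, None]).reshape(-1)
    # dt[i, j] = (t2 - D_{m2}) - (t1 - D_{m1})
    dt = shifted[None, :] - shifted[:, None]
    return (1 - sigma2) * np.exp(-dt**2 / (2 * tau**2)) + sigma2 * (dt == 0)


def simulate_trial(T, q, p, delays, taus, sigma2s, rng):
    """Sample latents x (M, p, T) and observations ys[m] of shape (q[m], T).

    delays has shape (M, p); taus and sigma2s have length p.
    """
    M = len(q)
    delays = np.asarray(delays, dtype=float)
    x = np.zeros((M, p, T))
    for j in range(p):
        K = delayed_kernel(T, delays[:, j], taus[j], sigma2s[j])
        sample = rng.multivariate_normal(np.zeros(M * T), K)
        x[:, j, :] = sample.reshape(M, T)

    ys = []
    for m in range(M):
        # Hypothetical parameters, drawn at random for illustration only
        C = rng.standard_normal((q[m], p))        # loading matrix C_m
        d = rng.standard_normal(q[m])             # mean offset d_m
        phi = rng.uniform(1.0, 5.0, size=q[m])    # noise precisions (diag of Phi_m)
        # Assumption: noise is independent across time points
        eps = rng.standard_normal((q[m], T)) / np.sqrt(phi)[:, None]
        ys.append(C @ x[m] + d[:, None] + eps)
    return x, ys


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, ys = simulate_trial(
        T=30, q=[5, 8], p=2,
        delays=[[0.0, 0.0], [2.5, -1.5]],
        taus=[4.0, 8.0], sigma2s=[0.01, 0.01], rng=rng,
    )
    print(x.shape, [y.shape for y in ys])
```

The covariance built here has size `M*T` on a side, so any direct factorization of it costs on the order of `(M*T)**3` operations. That is the cubic scaling in trial length and group number that the two approximate methods are designed to avoid.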
Together, these methods allow multi-group Gaussian process factor models to keep pace with the growing scale of neuroscience data and open new avenues for exploring brain function.