Bayesian model-based clustering for populations of network data

Anastasia Mantziou, Simon Lunagomez, Robin Mitra
DOI: https://doi.org/10.48550/arXiv.2107.03431
2023-06-20
Abstract: There is increasing appetite for analysing populations of network data due to the fast-growing body of applications demanding such methods. While methods exist to provide readily interpretable summaries of heterogeneous network populations, these are often descriptive or ad hoc, lacking any formal justification. In contrast, principled analysis methods often provide results difficult to relate back to the applied problem of interest. Motivated by two complementary applied examples, we develop a Bayesian framework to appropriately model complex heterogeneous network populations, whilst also allowing analysts to gain insights from the data, and make inferences most relevant to their needs. The first application involves a study in Computer Science measuring human movements across a University. The second analyses data from Neuroscience investigating relationships between different regions of the brain. While both applications entail analysis of a heterogeneous population of networks, network sizes vary considerably. We focus on the problem of clustering the elements of a network population, where each cluster is characterised by a network representative. We take advantage of the Bayesian machinery to simultaneously infer the cluster membership, the representatives, and the community structure of the representatives, thus allowing intuitive inferences to be made. The implementation of our method on the human movement study reveals interesting movement patterns of individuals in clusters, readily characterised by their network representative. For the brain networks application, our model reveals a cluster of individuals with different network properties of particular interest in Neuroscience. The performance of our method is additionally validated in extensive simulation studies.
Applications
What problem does this paper attempt to address?
The paper develops a Bayesian framework that models complex, heterogeneous populations of network data while still letting analysts draw the inferences most relevant to their needs. It focuses on clustering the networks in a population, where each cluster is characterized by a network representative. Using Bayesian inference, it simultaneously infers the cluster membership of each network, the cluster representatives, and the community structure of those representatives, so that the results are easy to interpret.

### Background and Motivation of the Paper

Applications that produce network data are multiplying, and with them the demand for methods that analyze whole populations of networks. Existing methods that give easily interpretable summaries of heterogeneous network populations are usually descriptive or ad hoc and lack formal justification. Principled methods, in contrast, are more rigorous but often produce results that are hard to relate back to the applied problem.

The paper is motivated by two complementary applied examples:

1. **Application in the field of computer science**: Studying the movement patterns of people across a university campus. Each display device is a node in the network, and an edge joins two nodes when a person moves from one display device to the other. Modeling these data gives analysts insight into behavior patterns.
2. **Application in the field of neuroscience**: Studying the connection patterns between different brain regions. Each network observation represents one individual's brain connectivity at rest. Clustering reveals groups of individuals with different network characteristics, which is of particular interest in neuroscience.

### Method Overview

The paper proposes a mixture of measurement error models to identify clusters in a network population. Each observed network is assumed to be a noisy realization of a true underlying network, the representative of its cluster. This decouples the statistical model from the cluster-specific network properties, giving a flexible model-based method that both detects clusters in the population and allows those clusters to be interpreted through the model parameters.

### Main Contributions

1. **Model Flexibility**: The Bayesian framework can combine different modeling assumptions to suit a specific application.
2. **Joint Inference**: Markov chain Monte Carlo (MCMC) is used to infer the cluster membership of the networks, the model parameters, and the structure of the underlying network representatives simultaneously.
3. **Outlier Detection**: The framework can be extended to detect outlying network observations, which helps identify individuals with unusual network characteristics.

### Conclusion

The paper demonstrates its method on the two applications above and validates it in simulation studies. The method detects clusters in a network population and, by comparing the underlying network representatives, exposes the key differences between clusters. The same framework can also identify observations that do not follow the distribution of the majority of the networks.
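The measurement error idea in the Method Overview can be illustrated with a small Python sketch. It is not the authors' implementation: the paper infers representatives, parameters, and memberships jointly by MCMC, whereas this sketch takes the representatives, the mixture weights, and the noise rates as known and computes only the cluster membership probabilities of one observed network. The false positive rate `p`, the false negative rate `q`, and every function name (`from_edges`, `noisy_copy`, `log_lik`, `membership_probs`) are assumptions introduced here for illustration.

```python
import math
import random


def from_edges(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj


def noisy_copy(rep, p, q, rng):
    """Draw one observed network as a noisy realization of a representative.

    p: assumed probability that an absent edge is observed (false positive).
    q: assumed probability that a present edge is missed (false negative).
    """
    n = len(rep)
    obs = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob_edge = 1 - q if rep[i][j] else p
            obs[i][j] = obs[j][i] = int(rng.random() < prob_edge)
    return obs


def log_lik(obs, rep, p, q):
    """Log-likelihood of an observed network given one representative."""
    n = len(rep)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            prob_edge = 1 - q if rep[i][j] else p
            total += math.log(prob_edge if obs[i][j] else 1 - prob_edge)
    return total


def membership_probs(obs, reps, weights, p, q):
    """Posterior cluster probabilities with representatives held fixed."""
    logs = [math.log(w) + log_lik(obs, r, p, q) for r, w in zip(reps, weights)]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(value - top) for value in logs]
    norm = sum(unnorm)
    return [u / norm for u in unnorm]


# Two hypothetical representatives on five nodes: a ring and a star.
ring = from_edges(5, [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)])
star = from_edges(5, [(0, 1), (0, 2), (0, 3), (0, 4)])

rng = random.Random(0)
observed = noisy_copy(ring, p=0.1, q=0.1, rng=rng)
print(membership_probs(observed, [ring, star], [0.5, 0.5], p=0.1, q=0.1))
```

In a full sampler this membership step would alternate with updates of the representatives and the noise rates. Holding them fixed keeps the sketch short and shows how the noise model alone separates the clusters.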