CLUSTER ANALYSIS AND NETWORK COMMUNITY DETECTION WITH APPLICATION TO NEUROSCIENCE

Yun Zhang
2017-01-01
Abstract: Sustained efforts have been devoted to understanding schizophrenia and related disorders. This dissertation is inspired by two conceptually important problems in schizophrenia research, and it addresses the statistical challenges inherent in solving them. Basic neurobiological studies have unveiled distinct subtypes of schizophrenia. Moreover, genetic evidence shows that certain core features are shared between schizophrenia and other disorders. It is of scientific interest to examine similarities in the profiles of subtypes in different disorders, which may help to develop novel therapeutic approaches. To address this challenge, we develop a statistical framework to assess whether or not clusters identified from independent populations exhibit commonalities. As an initial step, we formulate our hypotheses by borrowing the concept of bioequivalence under a finite normal mixture framework. We then propose testing procedures for univariate data based on the two one-sided tests (TOST) procedure, which has been used in the analysis of pharmaceutical bioequivalence trials. To boost power, we also propose a method based on bootstrap confidence intervals. Neurocognitive research studies functional brain networks with the aim of improving the understanding of cognitive deficits in subjects with schizophrenia. One important problem in the inference for brain connectivity networks concerns the partitioning of functionally distinct brain regions. The brain segmentation problem can be viewed conceptually as a community detection problem in network analysis. The stochastic block model (SBM) and its variants are popular models used in community detection for network data. In this research, we propose a feature-adjusted stochastic block model (FASBM) to capture the impact of node features on the network links as well as to detect the residual community structure beyond that explained by the node features. 
The proposed model can accommodate multiple node features and estimate the form of feature impacts from the data. Moreover, unlike many existing algorithms that are limited to binary-valued interactions, the proposed FASBM and its inference approaches are easily applied to relational data generated from any exponential family distribution. We illustrate the methods on simulated networks and on three real-world networks: a brain network, a US air-transportation network and a friendship network.
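For readers unfamiliar with the two one-sided tests (TOST) idea mentioned in the abstract, the following is a minimal sketch of the generic procedure for two sample means. It is not the dissertation's method, which is formulated under a finite normal mixture framework. The function name `tost_equivalence`, the equivalence margin `margin`, and the large-sample normal approximation are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_equivalence(x, y, margin, alpha=0.05):
    """Generic TOST sketch (assumed large-sample z approximation).

    Tests H0: |mu_x - mu_y| >= margin against H1: |mu_x - mu_y| < margin
    by running two one-sided tests and taking the larger p-value.
    """
    diff = mean(x) - mean(y)
    # Standard error of the difference in means (unequal variances allowed).
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = NormalDist()
    # First one-sided test: H0 diff <= -margin, rejected for large statistics.
    p_lower = 1.0 - z.cdf((diff + margin) / se)
    # Second one-sided test: H0 diff >= +margin, rejected for small statistics.
    p_upper = z.cdf((diff - margin) / se)
    # Equivalence is declared only if both one-sided tests reject.
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


# Hypothetical illustrative data: two samples whose means differ by 0.05.
x = [0.1 * i for i in range(50)]
y = [0.1 * i + 0.05 for i in range(50)]

diff_wide, p_wide, equivalent_wide = tost_equivalence(x, y, margin=1.0)
diff_tight, p_tight, equivalent_tight = tost_equivalence(x, y, margin=0.1)
```

With the wide margin both one-sided tests reject, so the samples are declared equivalent. With the tight margin the data are too noisy to establish equivalence, which illustrates why a more powerful procedure can be attractive.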