Abstract: Graph (network) data has recently emerged as a research area in artificial intelligence, machine learning and statistics. In this work, we are interested in whether nodes' labels (people's responses) are affected by their neighbors' features (friends' characteristics). We propose a novel latent logistic regression model to describe the network dependence with binary responses. The key advantage of our proposed model is that a latent binary indicator is introduced to indicate whether a node is susceptible to the influence of its neighbors. A score-type test is proposed to diagnose the existence of network dependence. In addition, an EM-type algorithm is used to estimate the model parameters under network dependence. Extensive simulations are conducted to evaluate the performance of our method. Two public datasets are used to illustrate the effectiveness of the proposed latent logistic regression model.

What problem does this paper attempt to address?

The problem this paper addresses is whether, in graph (network) data, the labels of nodes (such as people's responses) are affected by the characteristics of their neighbors (such as the characteristics of friends). Specifically, the authors focus on the existence and quantification of network dependence in binary response data. To explore this problem, they propose a logistic regression model with a latent binary indicator, which describes whether a node is susceptible to the characteristics of its neighbors.

In simpler terms, the paper asks whether a person's behavior or preferences in a social network or similar structure change because of the behavior and preferences of his or her friends or contacts, and it proposes a new statistical model to detect and quantify this influence.

### Model Features

1. **Introduction of a Latent Binary Indicator**: A latent variable \(\zeta_i\) is introduced into the model to represent whether the \(i\)-th node is sensitive to the characteristics of its neighbors.
2. **Network Dependence Detection**: A score-type test is proposed to diagnose whether there is network dependence in the logistic regression model.
3. **Parameter Estimation**: An EM-type algorithm is used to estimate the model parameters, ensuring the consistency and good performance of the estimators.

### Mathematical Expression

The specific form of the model is:

\[
P(Y_i = 1 | X_i, \zeta_i) = \frac{\exp\left(\beta_0 + X_i'\beta + \delta \zeta_i \sum_{j = 1}^n a_{ij} X_j' \beta\right)}{1 + \exp\left(\beta_0 + X_i'\beta + \delta \zeta_i \sum_{j = 1}^n a_{ij} X_j' \beta\right)},
\]

where:

- \(Y_i\) is the binary label of the \(i\)-th node,
- \(X_i\) is the feature vector of the \(i\)-th node,
- \(\zeta_i\) is the latent binary indicator,
- \(a_{ij}\) is an element of the adjacency matrix \(A\), indicating whether there is an edge between node \(i\) and node \(j\),
- \(\beta_0\) is the intercept term,
- \(\beta\) is the regression coefficient vector,
- \(\delta\) represents the strength of a node's dependence on its neighbors.

In addition, the probability distribution of the latent variable \(\zeta_i\) is:

\[
P(\zeta_i = 1 | X_i) = \frac{\exp(\gamma_0 + X_i' \gamma)}{1 + \exp(\gamma_0 + X_i' \gamma)}.
\]

Through the above model, the authors can better understand the influence of network structure on node labels and provide effective tools to detect and quantify this influence.
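
To make the two formulas concrete, here is a minimal Python sketch that simulates data from the model and computes the posterior probability \(P(\zeta_i = 1 | Y_i, X)\) by Bayes' rule, which is the kind of quantity an E-step would need. It is an illustration under stated assumptions, not the authors' implementation: the function names, the parameter values, the random graph, and the use of a binary, symmetric adjacency matrix are all hypothetical choices, and the paper's EM-type algorithm and score-type test may differ in detail.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def response_prob(i, X, A, beta0, beta, delta, zeta_i):
    """P(Y_i = 1 | X, zeta_i) as in the first formula above."""
    own = beta0 + dot(X[i], beta)                                   # beta_0 + X_i' beta
    neigh = sum(A[i][j] * dot(X[j], beta) for j in range(len(X)))  # sum_j a_ij X_j' beta
    return sigmoid(own + delta * zeta_i * neigh)

def susceptibility_prob(i, X, gamma0, gamma):
    """P(zeta_i = 1 | X_i) as in the second formula above."""
    return sigmoid(gamma0 + dot(X[i], gamma))

def posterior_zeta(i, y_i, X, A, beta0, beta, delta, gamma0, gamma):
    """P(zeta_i = 1 | Y_i = y_i, X) by Bayes' rule (an E-step style quantity)."""
    pi = susceptibility_prob(i, X, gamma0, gamma)
    p1 = response_prob(i, X, A, beta0, beta, delta, 1)
    p0 = response_prob(i, X, A, beta0, beta, delta, 0)
    lik1 = p1 if y_i == 1 else 1.0 - p1
    lik0 = p0 if y_i == 1 else 1.0 - p0
    return pi * lik1 / (pi * lik1 + (1.0 - pi) * lik0)

# --- Simulated example; every value below is an illustrative assumption ---
random.seed(0)
n, p = 50, 2
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

# Binary, symmetric adjacency matrix with no self-loops.
A = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        if random.random() < 0.1:
            A[i][j] = A[j][i] = 1

beta0, beta, delta = -0.5, [1.0, -1.0], 0.3
gamma0, gamma = 0.0, [0.5, 0.5]

# Draw the latent indicators first, then the labels given the indicators.
zeta = [int(random.random() < susceptibility_prob(i, X, gamma0, gamma))
        for i in range(n)]
y = [int(random.random() < response_prob(i, X, A, beta0, beta, delta, zeta[i]))
     for i in range(n)]

post = [posterior_zeta(i, y[i], X, A, beta0, beta, delta, gamma0, gamma)
        for i in range(n)]
```

One sanity check on the sketch: when \(\delta = 0\) the two response probabilities coincide, so the label carries no information about \(\zeta_i\) and the posterior reduces to the prior \(P(\zeta_i = 1 | X_i)\). This corresponds to the no-network-dependence case that the score-type test is meant to diagnose.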

A Latent Logistic Regression Model with Graph Data
