Abstract: The inference performance of the pseudolikelihood method is discussed in the framework of the inverse Ising problem when ℓ_2-regularized (ridge) linear regression is adopted. This setup is introduced to theoretically investigate the situation where the data-generation model differs from the inference model, namely the model-mismatch situation. In the teacher-student scenario, under the assumption that the teacher couplings are sparse, the analysis is conducted using the replica and cavity methods, with a special focus on whether the presence or absence of each teacher coupling is correctly inferred. The result indicates that, despite the model mismatch, one can perfectly identify the network structure using naive linear regression without regularization when the number of spins N is smaller than the dataset size M, in the thermodynamic limit N → ∞. Further, to access the underdetermined region M < N, we examine the effect of ℓ_2 regularization and find that biases appear in all the coupling estimates, preventing perfect identification of the network structure. We find, however, that these biases decay exponentially fast as the distance from the center spin chosen in the pseudolikelihood method grows. Based on this finding, we propose a two-stage estimator: in the first stage, ridge regression is used and the estimates are pruned by a relatively small threshold; in the second stage, naive linear regression is conducted only on the remaining couplings, and the resulting estimates are again pruned by another, relatively large threshold. This estimator, with an appropriate regularization coefficient and thresholds, is shown to achieve perfect identification of the network structure even for 0 < M/N < 1. Results of extensive numerical experiments support these findings.
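The two-stage estimator described above can be sketched for a single center spin as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-spin pseudolikelihood is replaced by the squared loss (s_i - Σ_j J_ij s_j)^2, as "linear regression" suggests, and the names `ridge`, `two_stage_estimator`, `lam`, `tau1` and `tau2` are hypothetical. A small Gauss-Jordan solver is swapped in so the sketch needs no third-party libraries.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular system: too many couplings kept for M samples?")
        aug[col], aug[piv] = aug[piv], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def ridge(x: Matrix, y: Sequence[float], lam: float) -> List[float]:
    """Minimize 0.5*sum_mu (y_mu - sum_j w_j x_mu_j)^2 + 0.5*lam*sum_j w_j^2.

    Normal equations: (X^T X + lam I) w = X^T y; lam = 0 gives plain least squares.
    """
    m, p = len(x), len(x[0])
    gram = [[sum(x[mu][j] * x[mu][k] for mu in range(m)) + (lam if j == k else 0.0)
             for k in range(p)] for j in range(p)]
    rhs = [sum(x[mu][j] * y[mu] for mu in range(m)) for j in range(p)]
    return solve(gram, rhs)


def two_stage_estimator(spins: Matrix, center: int, lam: float,
                        tau1: float, tau2: float) -> List[float]:
    """Estimate couplings J_{center, j} for one center spin (hypothetical API).

    spins: M x N matrix of +/-1 samples; returns a length-N list with J[center] = 0.
    """
    n = len(spins[0])
    others = [j for j in range(n) if j != center]
    y = [row[center] for row in spins]
    x = [[row[j] for j in others] for row in spins]

    # Stage 1: ridge regression on all other spins, prune with the small threshold tau1.
    w1 = ridge(x, y, lam)
    kept = [k for k, w in enumerate(w1) if abs(w) > tau1]

    couplings = [0.0] * n
    if not kept:
        return couplings
    # Stage 2: unregularized least squares on the surviving couplings only
    # (needs X^T X restricted to them to be invertible), then prune with tau2 > tau1.
    x_kept = [[row[k] for k in kept] for row in x]
    w2 = ridge(x_kept, y, 0.0)
    for k, w in zip(kept, w2):
        if abs(w) > tau2:
            couplings[others[k]] = w
    return couplings
```

Running this once per spin as the center gives a full coupling estimate. In a realistic setting `spins` would hold M samples drawn from a sparse teacher Ising model, and a numerical library's least-squares routine would replace the hand-written solver; the thresholds and `lam` would need tuning as the abstract notes.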

Towards Quantifying Sampling Bias in Network Inference

Structure Learning in Inverse Ising Problems Using ℓ_2-Regularized Linear Estimator

Inference in Linear Dyadic Data Models with Network Spillovers

Network sampling coverage II: The effect of non-random missing data on network measurement

Improving Network Inference: The Impact of False Positive and False Negative Conclusions about the Presence or Absence of Links

Network Structure Inference, A Survey: Motivations, Methods, and Applications

Statistical inference in social networks: how sampling bias and uncertainty shape decisions

Network Structure and Biased Variance Estimation in Respondent Driven Sampling

Quantifying the Multi-Scale Performance of Network Inference Algorithms

A Full Bayesian Approach to Sparse Network Inference Using Heterogeneous Datasets

Preserving the topological properties of complex networks in network sampling

Homophily and minority size explain perception biases in social networks

Mitigating Subpopulation Bias for Fair Network Topology Inference

Reliability of relational event model estimates under sampling: how to fit a relational event model to 360 million dyadic events

A Tale of Three Graphs: Sampling Design on Hybrid Social-Affiliation Networks

Network homophily via tail inequalities

Inference and Influence of Large-Scale Social Networks Using Snapshot Population Behaviour without Network Data

Causal Inference for Social Network Data

Network Sampling: From Static to Streaming Graphs

Evaluating Network Inference Methods in Terms of Their Ability to Preserve the Topology and Complexity of Genetic Networks

Unveiling homophily beyond the pool of opportunities