Quantifying the potential of cascade outbreaks via early infected nodes using network percolation

Xin Li, Huichun Li, Xue Zhang, Chengli Zhao, Xiaojun Duan
DOI: https://doi.org/10.1063/5.0190294
2024-04-01
Abstract: In many fields, accurate prediction of cascade outbreaks during their early stages of propagation is of paramount importance. Based on percolation theory, we propose a global propagation probability algorithm that effectively estimates the probability of information spreading from source nodes to the giant component. Building on this, we further introduce an early prediction method for cascade outbreaks, which provides quantitative predictions of both the probability and scope of cascade outbreaks by fully considering the network structure data and propagation dynamics. Through our research, we observe that cascade outbreaks resemble a phase transition. When approaching the critical point of an outbreak, a few specific activating nodes typically facilitate the transmission of information throughout the entire network, thus enabling early inference of a cascading outbreak. To validate our findings, we conducted experiments on diverse network structures using a classical propagation model and applied our proposed method to analyze a real microblog cascade dataset. The experimental results robustly demonstrate the superiority of our approach over baseline methods in terms of effectively predicting cascade outbreaks with high precision and early detection capability.
mathematics, applied; physics, mathematical
What problem does this paper attempt to address?
The paper aims to predict, early in the spread of information, both the probability and the scope of a cascade outbreak. Building on network percolation theory, the authors propose a global propagation probability (GPP) algorithm that estimates the probability of information spreading from source nodes to the giant component of the network. On this basis, they introduce an early-prediction method for cascade outbreaks (GPP-COP), which gives a quantitative prediction of the probability and scope of an outbreak by combining network structure data with propagation dynamics. The study finds that cascade outbreaks resemble a phase transition: near the critical point of an outbreak, a few specific activated nodes typically carry the information throughout the network, which makes early inference of an outbreak possible. To verify these findings, the authors ran experiments on multiple network structures with a classical propagation model and applied the method to a real Weibo cascade dataset. The results show that, compared with baseline methods, the method predicts cascade outbreaks with higher precision and detects them earlier.

### Main Contributions

1. **Global Propagation Probability Algorithm (GPP)**:
   - Based on the message-passing method and percolation theory, a global propagation probability algorithm is proposed; it quantifies how likely an activated node is to trigger activation across the entire network.
   - This index links local and global network structure.
2. **Cascade Outbreak Prediction Method (GPP-COP)**:
   - Based on the GPP algorithm, a new cascade outbreak prediction method is proposed, which can monitor the probability of cascade outbreaks online and estimate the final outbreak scope.
   - Using prior information on network structure and propagation dynamics, it provides a new tool for cascade outbreak research.
3. **Experimental Verification**:
   - Experiments are carried out on multiple simulated and real network structures. The results show that the GPP-COP method can provide quantitative and deterministic inferences in the early stage of propagation, indicating strong potential for practical application.

### Theoretical Background

- **Network Percolation Theory**: Percolation theory is an important tool for studying dynamical problems on networks, such as cascading failures, disease spread, and traffic flow. In many complex networks, the overall interconnection framework depends on a specific set of structural nodes, and activating these nodes can spread information to the entire network.
- **Equivalence between Classical Propagation Models and Edge Percolation Models**: In the SIR model, the probability $\beta_{uv}$ that node $u$ infects its neighbor $v$ during the infection period is $\beta_{uv}=1-(1 - \phi_{uv})^{m_0}$, where $\phi_{uv}$ is the infection probability per unit time and $m_0$ is the number of unit times in the infection period. Each edge $(u, v)$ can therefore be treated as retained with probability $\beta_{uv}$, and the nodes eventually reached from a source are those connected to it through retained edges. Consequently, if the source node belongs to the giant connected component of the percolated network, the information spreads globally; otherwise it spreads only locally. The SIR model is thus equivalent to an edge percolation model on the network, and global outbreaks under some classical propagation patterns can be studied through edge percolation.

### Models and Algorithms

- **Approximate Algorithm for Global Propagation Probability**:
  - A new iterative algorithm is proposed that calculates the global propagation probability through the edge probability $p_{v_0\rightarrow u}$ instead of the node probability $p(u, s_\infty)$.
  - A discount factor $\lambda$ is introduced to reduce the error caused by deviations from the locally tree-like structure assumption.
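
As a rough illustration of the SIR-to-edge-percolation mapping described under Theoretical Background, the following Python sketch computes $\beta_{uv}$ from $\phi_{uv}$ and $m_0$ and then estimates, by brute-force Monte Carlo, how often a given source ends up in a large cluster of retained edges. It is not the authors' message-passing GPP algorithm, only a simple baseline that such an algorithm would approximate. All function names are hypothetical, the same $\beta$ is assumed on every edge, and the cluster-size `threshold` used to call something an "outbreak" is an illustrative assumption.

```python
import random
from collections import deque


def transmissibility(phi, m0):
    """beta = 1 - (1 - phi)^m0: chance of at least one successful
    transmission along an edge during an infection period of m0 steps."""
    return 1.0 - (1.0 - phi) ** m0


def er_graph(n, p, rng):
    """Adjacency lists of an Erdos-Renyi G(n, p) graph (helper for demos)."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def percolate(adj, beta, rng):
    """Retain each undirected edge independently with probability beta."""
    kept = [[] for _ in adj]
    for u, neighbours in enumerate(adj):
        for v in neighbours:
            if u < v and rng.random() < beta:  # visit each edge once
                kept[u].append(v)
                kept[v].append(u)
    return kept


def cluster_size(adj, source):
    """Number of nodes reachable from source (breadth-first search)."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)


def outbreak_probability(adj, source, beta, threshold, trials, rng):
    """Fraction of percolation samples in which the cluster containing
    source has at least `threshold` nodes (assumed outbreak criterion)."""
    hits = 0
    for _ in range(trials):
        if cluster_size(percolate(adj, beta, rng), source) >= threshold:
            hits += 1
    return hits / trials
```

One possible use is to build a graph with `er_graph(500, 0.01, random.Random(1))`, set `beta = transmissibility(0.1, 3)`, and call `outbreak_probability` with a threshold of, say, a tenth of the nodes. Sampling like this costs one full graph traversal per trial, which is the kind of expense an iterative edge-probability scheme is meant to avoid.
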
- **Early-Prediction Method for Cascade Outbreaks**:
  - The infected and recovered nodes are merged into a single new node $s(t)$, and the probability of a cascade outbreak is calculated with the GPP algorithm.
  - The outbreak scope is estimated as $s_\infty\approx\sum_{n\in V,\, n\notin (I(t)\cup R(t))}p_n+\text{card}(I(t)\cup R(t))$, where $\text{card}(I(t)\cup R(t))$ is the cardinality of the set $I(t)\cup R(t)$ and $p_n$ is the global propagation probability of node $n$.

### Experimental Results

- **Data Sets**:
  - On two synthetic networks (ER random network and BA scale-free network) and three real-world networks