Abstract: Influence maximization (IM) is an important problem in social network diffusion analysis. Real social networks are large-scale, dynamic, and heterogeneous, and their heterogeneity, together with their continuous expansion and evolution, makes finding influential users challenging. To simplify the IM problem, existing IM algorithms usually assume that social networks are either static, or dynamic but homogeneous. We propose a community-based influence maximization algorithm that uses network embedding in dynamic heterogeneous social networks. We use the DyHATR algorithm to obtain propagation feature vectors for network nodes, and run the k-means clustering algorithm to transform the original network into a coarse granularity network (CGN). On the CGN, we propose a community-based three-hop independent cascade model and construct the objective function of the IM problem. We design a greedy heuristic algorithm that solves the IM problem with a \((1-\frac{1}{e})\)-approximation guarantee and uses the community structure to quickly identify seed users and estimate their influence values. Experimental results on real social networks demonstrate that, compared with existing IM algorithms, our algorithm achieves better overall performance: higher influence values, lower execution time and memory consumption, and better scalability.
What problem does this paper attempt to address?
The problem this paper addresses is finding the most influential set of users (the seed set) in large-scale dynamic heterogeneous social networks and estimating the influence values of these seed users. Specifically, the paper proposes an Influence Maximization (IM) algorithm based on community diffusion and Dynamic Heterogeneous Network Embedding (DHNE) to address the difficulties existing IM algorithms face on large-scale, dynamic, and heterogeneous social networks. Traditional IM algorithms usually assume that social networks are static, or dynamic but homogeneous; this simplifies the IM problem but cannot capture the complexity and continual change of real-world social networks.
### Main contributions of the paper:
1. **Proposed a community diffusion model based on DHNE**: The model uses network embedding to represent dynamic heterogeneous propagation characteristics as low-dimensional dense feature vectors, which improves the accuracy of community diffusion, while the community structure reduces space complexity.
2. **Designed the DHNE-CIM algorithm with a (1−1/e) approximation guarantee**: The algorithm efficiently searches for seed users and estimates their influence.
3. **Experimental validation**: Experiments on three large-scale dynamic heterogeneous networks show that the proposed algorithm outperforms existing IM algorithms in influence value, execution time, memory consumption, and scalability.
### Specific problems solved:
- **Large-scale networks**: Real social networks are extremely large, containing tens of millions of nodes and edges or more.
- **Dynamic networks**: The topology of social networks changes constantly: the numbers of nodes and edges keep growing, and the relationships represented by edges also change over time.
- **Heterogeneous networks**: Social networks contain multiple node and edge types, such as user nodes, message nodes, and comment relationships.
### Technical methods:
- **Dynamic Heterogeneous Network Embedding (DHNE)**: Use the DyHATR algorithm to learn the propagation feature vectors of nodes.
- **Community discovery**: Convert the original network into a coarse-grained network (CGN) with the k-means clustering algorithm.
- **Community diffusion model**: Propose a community-based three-hop independent cascade model on the CGN and construct the objective function of the IM problem.
- **Greedy heuristic algorithm**: Design a greedy heuristic algorithm with a (1−1/e) approximation guarantee to solve the IM problem.
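The community discovery step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes node embeddings (such as those DyHATR produces) are already available as plain lists of floats, uses Lloyd's k-means with a deterministic farthest-point initialization, and assumes, hypothetically, that each community's vector is the mean of its members' embeddings. The names `kmeans` and `coarse_grain` are illustrative.

```python
Vector = list[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[Vector], k: int, iters: int = 100) -> list[int]:
    """Lloyd's k-means; returns one cluster label per input vector."""
    # Farthest-point initialization: start from the first vector, then keep
    # adding the vector that is farthest from its nearest chosen centroid.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(_sq_dist(v, c) for c in centroids))
        centroids.append(list(far))

    labels: list[int] = []
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def coarse_grain(embeddings: dict[str, Vector], k: int):
    """Cluster nodes into k communities and compute one vector per community."""
    nodes = list(embeddings)
    labels = kmeans([embeddings[n] for n in nodes], k)
    communities: dict[int, list[str]] = {c: [] for c in range(k)}
    for node, lab in zip(nodes, labels):
        communities[lab].append(node)
    # Hypothetical choice: a community vector is the mean of its members' embeddings.
    community_vecs = {
        c: [sum(col) / len(ms) for col in zip(*(embeddings[m] for m in ms))]
        for c, ms in communities.items()
        if ms
    }
    return communities, community_vecs


emb = {"u1": [0.0, 0.1], "u2": [0.1, 0.0], "m1": [5.0, 5.1], "m2": [5.1, 5.0]}
communities, z = coarse_grain(emb, k=2)
print(communities)  # {0: ['u1', 'u2'], 1: ['m1', 'm2']}
```

Farthest-point initialization is used here only to keep the toy example deterministic; k-means++ seeding or several random restarts are common alternatives for real data.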
### Formula analysis:
- **Propagation proximity**:
\[
p(y_t^v, y_t^u)=\frac{1}{2}\left(1 - \frac{y_t^v\cdot y_t^u}{\|y_t^v\|\cdot\|y_t^u\|}\right)
\]
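This formula defines the propagation proximity between two nodes as a normalized cosine distance, where \(y_t^v\) is the propagation feature vector of node \(v\) at time \(t\); halving \(1-\cos\) maps the value into \([0, 1]\).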
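A direct transcription of the proximity formula, assuming embeddings are plain Python lists of floats; the function name `propagation_proximity` is hypothetical.

```python
import math


def propagation_proximity(y_v: list[float], y_u: list[float]) -> float:
    """p(y_v, y_u) = (1 - cos(y_v, y_u)) / 2, which lies in [0, 1]."""
    dot = sum(a * b for a, b in zip(y_v, y_u))
    norms = math.sqrt(sum(a * a for a in y_v)) * math.sqrt(sum(b * b for b in y_u))
    if norms == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return 0.5 * (1.0 - dot / norms)


print(propagation_proximity([1.0, 0.0], [1.0, 0.0]))   # 0.0  (same direction)
print(propagation_proximity([1.0, 0.0], [0.0, 1.0]))   # 0.5  (orthogonal)
print(propagation_proximity([1.0, 0.0], [-1.0, 0.0]))  # 1.0  (opposite)
```

As written, vectors pointing the same way give 0 and opposite vectors give 1, so the value depends only on direction, not on embedding length.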
- **Influence probability**:
- Probability of first-hop diffusion:
\[
q'^t_{c_i}=p(z^t_{c_{\text{int}}}, z^t_{c_i})
\]
- Probability of second-hop diffusion:
\[
q''^t_{c_i, s}=p(z^t_{c_i}, y^t_s)
\]
- Probability of third-hop diffusion, for each \(v\in c_i\setminus S_t^{c_i}\):
\[
q'''^t_{s, v}=p(y^t_s, y^t_v)
\]
- **Influence value**:
\[
\sigma(S_t)=\sum_{c_i\in C_t}\sum_{s\in S_t^{c_i}}\sum_{v\in c_i\setminus S_t^{c_i}}q'^t_{c_i}q''^t_{c_i, s}q'''^t_{s, v}
\]
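The three hops and the influence value combine into a small scoring function, and a greedy seed search can be layered on top of it. This is a toy sketch under stated assumptions, not the paper's DHNE-CIM implementation: communities are given as a dict from a community id to `{node: embedding}`, `z` maps community ids to community vectors, `z_int` stands for \(z^t_{c_{\text{int}}}\), read here as the vector of the community the diffusion starts from, the third-hop term is taken per target node \(v\), and `p`, `sigma`, and `greedy_seeds` are hypothetical names. The greedy loop adds, k times, the candidate with the largest marginal gain in \(\sigma\).

```python
import math

Vector = list[float]


def p(a: Vector, b: Vector) -> float:
    """Propagation proximity: (1 - cosine similarity) / 2."""
    dot = sum(x * y for x, y in zip(a, b))
    return 0.5 * (1.0 - dot / (math.hypot(*a) * math.hypot(*b)))


def sigma(seeds: set[str], communities: dict[str, dict[str, Vector]],
          z: dict[str, Vector], z_int: Vector) -> float:
    """Influence value: sum over c_i, s in S^{c_i}, v in c_i minus S^{c_i} of q' q'' q'''."""
    total = 0.0
    for c, members in communities.items():
        q1 = p(z_int, z[c])                  # first hop: starting community -> c_i
        for s in (n for n in members if n in seeds):
            q2 = p(z[c], members[s])         # second hop: c_i -> seed s
            for v, y_v in members.items():
                if v not in seeds:           # v ranges over c_i \ S^{c_i}
                    total += q1 * q2 * p(members[s], y_v)  # third hop: s -> v
    return total


def greedy_seeds(k: int, communities, z, z_int) -> list[str]:
    """Add, k times, the node whose inclusion increases sigma the most."""
    seeds: list[str] = []
    candidates = sorted(n for members in communities.values() for n in members)
    for _ in range(k):
        base = sigma(set(seeds), communities, z, z_int)
        best = max(
            (u for u in candidates if u not in seeds),
            key=lambda u: sigma(set(seeds) | {u}, communities, z, z_int) - base,
        )
        seeds.append(best)
    return seeds


# Hypothetical toy data: community id -> {node: embedding}.
communities = {
    "A": {"a1": [1.0, 0.0], "a2": [-1.0, 0.0], "a3": [0.0, 1.0]},
    "B": {"b1": [0.0, 1.0]},
}
z = {"A": [-1.0, 0.0], "B": [1.0, 0.0]}
z_int = [1.0, 0.0]  # assumed vector of the starting community
print(sigma({"a1"}, communities, z, z_int))    # 1.5
print(greedy_seeds(2, communities, z, z_int))  # ['a1', 'b1']
```

The classic \((1-1/e)\) bound for such a greedy loop holds when the objective is monotone and submodular; this sketch does not check those properties. It also recomputes \(\sigma\) from scratch for every candidate, whereas the paper uses the community structure to identify seeds and estimate their influence quickly.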
Together, these methods give the paper an efficient way to find the most influential set of users in large-scale dynamic heterogeneous social networks.