Abstract: Graphs are a ubiquitous representation of data across research fields, and graph embedding is a prevalent machine learning technique for capturing key features and generating fixed-size attributes. However, most state-of-the-art graph embedding methods are expensive in both computation and memory. Recently, Graph Encoder Embedding (GEE) has been shown to be the fastest graph embedding technique and is suitable for a variety of network data applications. Real-world data often involves large, sparse graphs, and this sparsity usually results in redundant computation and storage. To address this issue, we propose an improved version of GEE, sparse GEE, which optimizes the handling of zero entries in sparse matrices, in both computation and storage, to further reduce running time. Our experiments demonstrate that, for large sparse graphs, the sparse version achieves a significant speedup over the original Python implementation of GEE, and that sparse GEE can process millions of edges within minutes on a standard laptop.

What problem does this paper attempt to address?

This paper addresses the inefficiency, in both computation and storage, of existing graph embedding methods (such as Graph Encoder Embedding, GEE) on large-scale sparse graphs. Specifically:

1. **Computation efficiency**: On large-scale sparse graphs, most state-of-the-art graph embedding methods perform redundant computations on the many zero elements of the sparse matrix, which slows the algorithm down.
2. **Storage efficiency**: Traditional graph embedding methods do not exploit the structure of sparse matrices when storing them, which wastes storage space. The cost is especially high for large-scale graphs.

To solve these problems, the authors propose an improved version of GEE, **sparse GEE**, which reduces the running time and storage footprint of the algorithm by optimizing how sparse matrices are computed and stored. The specific improvements are:

- Using the **Compressed Sparse Row (CSR)** data structure to represent and compute the embedding matrix, which reduces the storage of and computation on zero elements.
- Using the **Dictionary of Keys (DOK)** data structure while constructing intermediate results, then converting them to CSR format for subsequent calculations.

With these improvements, sparse GEE performs significantly better on large-scale sparse graphs, especially when additional options such as Laplacian normalization are enabled. Experimental results show that sparse GEE can embed large-scale graphs containing millions of edges within a few minutes on an ordinary laptop, and that it achieves an 86-fold speedup over the original GEE on the largest simulated data set.
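The two data-structure choices described above can be illustrated with a minimal SciPy sketch. It is not the authors' implementation. It assumes the usual encoder-embedding formulation in which the embedding is the adjacency matrix multiplied by a one-hot label matrix scaled by class size, and it assumes an undirected graph without self-loops whose vertices all carry a known label in `0..n_classes-1`. The function name `sparse_encoder_embedding` and its parameters are hypothetical.

```python
import numpy as np
from scipy import sparse


def sparse_encoder_embedding(edges, labels, n_classes, laplacian=False):
    """Hypothetical sketch: embed a graph using only sparse matrices.

    edges  : (m, 3) array with rows (source, target, weight)
    labels : length-n integer array with values in 0..n_classes-1
    """
    labels = np.asarray(labels, dtype=int)
    n = len(labels)
    src = edges[:, 0].astype(int)
    dst = edges[:, 1].astype(int)
    wts = edges[:, 2].astype(float)

    # Symmetric adjacency matrix in CSR form; only nonzero entries are stored.
    # Assumption: the edge list describes an undirected graph without self-loops.
    rows = np.concatenate([src, dst])
    cols = np.concatenate([dst, src])
    vals = np.concatenate([wts, wts])
    A = sparse.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

    if laplacian:
        # Optional Laplacian normalization: D^{-1/2} A D^{-1/2}.
        deg = np.asarray(A.sum(axis=1)).ravel()
        inv_sqrt = np.zeros(n)
        nonzero = deg > 0
        inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
        D = sparse.diags(inv_sqrt)
        A = (D @ A @ D).tocsr()

    # Build the label-weight matrix incrementally in DOK format,
    # which is convenient for entry-by-entry construction.
    counts = np.bincount(labels, minlength=n_classes)
    W = sparse.dok_matrix((n, n_classes), dtype=float)
    for i, k in enumerate(labels):
        W[i, k] = 1.0 / counts[k]

    # Convert to CSR before the product, since CSR supports fast arithmetic.
    W = W.tocsr()
    return A @ W  # sparse (n, n_classes) embedding


# Small usage example: a path graph 0-1-2-3 with two classes.
edges = np.array([[0, 1, 1.0], [1, 2, 1.0], [2, 3, 1.0]])
labels = np.array([0, 0, 1, 1])
Z = sparse_encoder_embedding(edges, labels, n_classes=2)
print(Z.toarray())
```

DOK is used only during construction because it allows cheap single-entry assignment, while CSR is used for the matrix product because it stores rows contiguously and skips zeros entirely. That division of labor is the point of the sketch; a vectorized construction would avoid the Python loop but would hide the DOK-to-CSR step.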

Efficient Graph Encoder Embedding for Large Sparse Graphs in Python

Task-Oriented Genetic Activation for Large-Scale Complex Heterogeneous Graph Embedding

Edge-Parallel Graph Encoder Embedding

Efficiently Visualizing Large Graphs

PyTorch-BigGraph: A Large-scale Graph Embedding System

Adaptive Graph Encoder for Attributed Graph Embedding

Accurate, Efficient and Scalable Graph Embedding

Graph Encoder Ensemble for Simultaneous Vertex Embedding and Community Detection

Heterogeneous Graph Sparsification for Efficient Representation Learning

Encoder Embedding for General Graph and Node Classification

Boosting Graph Embedding on a Single GPU

Distributed-Memory Vertex-Centric Network Embedding for Large-Scale Graphs

Sparse Decomposition of Graph Neural Networks

SketchNE: Embedding Billion-Scale Networks Accurately in One Hour

NeuGraph: Parallel Deep Neural Network Computation on Large Graphs

GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding

Graph Accelerators—A Case for Sparse Data Processing

Graph Positional and Structural Encoder

LIGHTNE: A Lightweight Graph Processing System for Network Embedding

ProNE: Fast and Scalable Network Representation Learning