HET-KG: Communication-Efficient Knowledge Graph Embedding Training Via Hotness-Aware Cache

Sicong Dong, Xupeng Miao, Pengkai Liu, Xin Wang, Bin Cui, Jianxin Li
DOI: https://doi.org/10.1109/icde53745.2022.00177
2022-01-01
Abstract: With the growing adoption of Artificial Intelligence technology, knowledge graph embedding methods are widely used in a variety of machine learning tasks. However, most current knowledge graph embedding models have a large number of parameters and high computational time complexity during training. This is a major obstacle to applying these models to large-scale knowledge graphs. To address this challenge, we propose HET-KG, a distributed system for training knowledge graph embeddings efficiently. HET-KG reduces communication overhead by introducing a cache embedding table that maintains hot-embeddings at each worker. To improve the effectiveness of the cache mechanism, we design a prefetching algorithm and a filtering algorithm for adaptively selecting hot-embeddings, and provide two hot-embedding table construction strategies. To address the inconsistency between the locally cached hot-embeddings and the global embeddings, we also develop a hot-embedding synchronization algorithm that dynamically updates the cache embedding table and guarantees that the inconsistency stays bounded within a given threshold. Finally, extensive experiments are conducted on three knowledge graph datasets: FB15k, WN18, and Freebase-86m. The experimental results show that HET-KG achieves 3.7x and 1.1x speedups over the state-of-the-art systems PyTorch-BigGraph and DGL-KE, respectively.
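The cache idea described in the abstract can be illustrated with a small sketch. The following is not the authors' implementation: it is a minimal single-process toy, assuming that "hotness" is plain access frequency over a window of prefetched keys and that inconsistency is bounded by counting local updates since the last synchronization. The names `ParameterServer`, `HotCache`, `capacity`, and `max_stale` are hypothetical and do not come from the paper.

```python
from collections import Counter


class ParameterServer:
    """Toy stand-in for the global embedding table (hypothetical)."""

    def __init__(self, dim):
        self.dim = dim
        self.table = {}
        self.pulls = 0  # number of remote reads, a rough proxy for communication

    def pull(self, key):
        self.pulls += 1
        # Return a copy so the caller cannot mutate the global row directly.
        return list(self.table.setdefault(key, [0.0] * self.dim))

    def push(self, key, delta):
        row = self.table.setdefault(key, [0.0] * self.dim)
        for i, d in enumerate(delta):
            row[i] += d


class HotCache:
    """Worker-side cache of hot embeddings with bounded staleness (assumed design)."""

    def __init__(self, server, capacity, max_stale):
        self.server = server
        self.capacity = capacity    # how many hot rows to keep locally
        self.max_stale = max_stale  # local updates allowed before a forced sync
        self.rows = {}              # key -> local copy of the embedding
        self.pending = {}           # key -> accumulated delta not yet pushed
        self.stale = {}             # key -> local updates since last sync

    def rebuild(self, upcoming_keys):
        """Cache the most frequent keys found in the prefetched batches."""
        self.flush()
        hot = [k for k, _ in Counter(upcoming_keys).most_common(self.capacity)]
        self.rows = {k: self.server.pull(k) for k in hot}
        self.pending = {k: [0.0] * self.server.dim for k in hot}
        self.stale = {k: 0 for k in hot}

    def read(self, key):
        # Hot keys are served locally; cold keys fall through to the server.
        if key in self.rows:
            return list(self.rows[key])
        return self.server.pull(key)

    def update(self, key, delta):
        if key not in self.rows:
            self.server.push(key, delta)
            return
        for i, d in enumerate(delta):
            self.rows[key][i] += d
            self.pending[key][i] += d
        self.stale[key] += 1
        if self.stale[key] >= self.max_stale:
            self._sync(key)

    def _sync(self, key):
        # Push accumulated local changes, then refresh from the global table.
        self.server.push(key, self.pending[key])
        self.rows[key] = self.server.pull(key)
        self.pending[key] = [0.0] * self.server.dim
        self.stale[key] = 0

    def flush(self):
        """Synchronize every cached row that has unsent updates."""
        for key in list(self.rows):
            if self.stale[key] > 0:
                self._sync(key)
```

In this sketch a worker would call `rebuild` with the entity and relation IDs of its upcoming mini-batches, then use `read` and `update` during training. Updates to a cached row stay local until `max_stale` of them accumulate, which trades a bounded amount of staleness for fewer round trips to the server.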