A Lightweight Knowledge Graph Embedding Framework for Efficient Inference and Storage

Haoyu Wang,Yaqing Wang,Defu Lian,Jing Gao
DOI: https://doi.org/10.1145/3459637.3482224
2021-01-01
Abstract:Knowledge graphs, which consist of entities and their relations, have become a popular way to store structured knowledge. Knowledge graph embedding (KGE), which derives a representation for each entity and relation, has been widely used to capture the semantics of the information in the knowledge graphs, and has demonstrated great success in many downstream applications, such as the extraction of similar entities in response to a query entity. However, existing KGE methods cannot work well on emerging knowledge graphs that are large-scale due to the constraints in storage and inference efficiency. In this paper, we propose a lightweight KGE model, LightKG, which significantly reduces storage as well as running time needed for inference. Instead of storing a continuous vector for every entity, LightKG only needs to store a few codebooks, each of which contains some codewords that correspond to the representatives among the embeddings, and the indices that correspond to the codeword selections for entities. Hence LightKG can achieve highly efficient storage. The efficiency of the downstream querying process can be significantly boosted too with the proposed LightKG model as the relevance score between the query and an entity can be efficiently calculated via a quick look-up in a table that contains the scores between the query and codewords. The storage and inference efficiency of LightKG is achieved by its novel design. LightKG is an end-to-end framework that automatically infers codebooks and codewords and generates an approximated embedding for each entity. A residual module is included in LightKG to induce the diversity among codebooks, and a continuous function is adopted to approximate codeword selection, which is non-differential. In addition, to further improve the performance of KGE, we propose a novel dynamic negative sampling method based on quantization, which can be applied to the proposed LightKG or other KGE methods. We conduct extensive experiments on five public datasets. The experiments show that LightKG is search and memory efficient with high approximate search accuracy. Also, the dynamic negative sampling can dramatically improve model performance with over 19% improvement on average.
What problem does this paper attempt to address?