Abstract: Recently, the introduction of knowledge graphs (KGs) has significantly advanced recommender systems by facilitating the discovery of potential associations between items. However, existing methods still face several limitations. First, most KGs suffer from missing facts or limited scopes. This can lead to biased knowledge representations, thereby constraining the model's performance. Second, existing methods typically convert textual information into IDs, resulting in the loss of natural semantic connections between different items. Third, existing methods struggle to capture high-order relationships in global KGs due to their inefficient layer-by-layer information propagation mechanisms, which are prone to introducing significant noise. To address these limitations, we propose a novel method called CoLaKG, which leverages large language models (LLMs) for knowledge-aware recommendation. The extensive world knowledge and remarkable reasoning capabilities of LLMs enable them to supplement KGs. Additionally, the strong text comprehension abilities of LLMs allow for a better understanding of semantic information. Based on this, we first extract subgraphs centered on each item from the KG and convert them into textual inputs for the LLM. The LLM then outputs its comprehension of these item-centered subgraphs, which are subsequently transformed into semantic embeddings. Furthermore, to utilize the global information of the KG, we construct an item-item graph using these semantic embeddings, which can directly capture higher-order associations between items. Both the semantic embeddings and the structural information from the item-item graph are effectively integrated into the recommendation model through our designed representation alignment and neighbor augmentation modules. Extensive experiments on four real-world datasets demonstrate the superiority of our method.
What problem does this paper attempt to address?
This paper attempts to solve the following three main problems:
1. **The problem of missing facts and limited scope in the knowledge graph (KG)**:
- Many existing knowledge graphs have missing facts or limited coverage, because constructing them requires a large amount of human effort and domain expertise. As a result, the recommender system learns from biased knowledge representations, which limits model performance.
- Formula representation (informal): let \( \mathrm{KG} \) denote the knowledge graph and \( F(\mathrm{KG}) \) its fact completeness, with \( F(\mathrm{KG}) = 1 \) meaning that no facts are missing. For the KGs used by existing methods, \( F(\mathrm{KG}) < 1 \), that is, some facts are missing.
2. **The problem of effectively using text entities and relations**:
- Existing recommendation methods usually convert text entities and relations into IDs and cannot fully utilize the semantic information in the text, resulting in the loss of natural semantic connections between different items.
- For example, in Figure 1, "horror" and "thriller" are two semantically related attribute nodes, but because they are converted into different IDs, this semantic relevance is not reflected in the recommendation system.
- Formula representation (informal): let \( T \) be a text entity and \( \mathrm{ID}(T) \) the ID assigned to it. In existing methods, \( \mathrm{Semantics}(\mathrm{ID}(T)) = 0 \), that is, the ID carries none of the semantic information contained in \( T \).
3. **The problem of capturing high-order relations**:
- Existing methods struggle to capture high-order relations in the global knowledge graph. They typically propagate and aggregate information by stacking multiple graph neural network (GNN) layers, which is inefficient and also pulls in a large amount of irrelevant node information, leading to over-smoothing.
- For example, in Figure 1, although point A and point H have a strong semantic connection, existing methods struggle to capture it because the two points are far apart in the KG.
- Formula representation (informal): let \( G(\mathrm{KG}) \) be the global knowledge graph and \( H(G(\mathrm{KG})) \) the high-order relations captured from it. In existing methods, \( H(G(\mathrm{KG})) \approx 0 \), that is, high-order relations are difficult to capture.
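To make the distance problem in point 3 concrete: in a message-passing GNN, each layer only aggregates one-hop neighbors, so a node needs at least as many layers as its hop distance to another node before it can receive any information from it. The sketch below measures that distance with a breadth-first search. The graph `toy_kg` and its node names are hypothetical and are not taken from Figure 1 of the paper.

```python
from collections import deque


def hop_distance(adjacency, source, target):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical undirected toy KG: items A and H are connected only
# through a chain of attribute and item nodes.
toy_kg = {
    "A": ["horror"],
    "horror": ["A", "B"],
    "B": ["horror", "director_x"],
    "director_x": ["B", "H"],
    "H": ["director_x"],
}

# Minimum number of message-passing layers before A can see H.
layers_needed = hop_distance(toy_kg, "A", "H")
print(layers_needed)  # 4
```

Every extra layer also widens the receptive field to all other nodes within that radius, which is where the irrelevant information mentioned above comes from.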
To solve these problems, the authors propose a new method, CoLaKG, which uses large language models (LLMs) to understand and enhance the semantic and structural information in the knowledge graph, thereby improving the performance of the recommender system. Specifically, CoLaKG is implemented through the following steps:
- **Extract item-centered subgraphs and convert them into text inputs**, then use LLMs to generate an understanding of these subgraphs and convert that output into semantic embeddings.
- **Construct an item-item graph based on these semantic embeddings** to directly capture high-order associations between items.
- **Fuse the semantic embeddings with the ID embeddings in the recommendation model**, integrating the two types of information through the designed representation alignment and neighbor augmentation modules.
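The first two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the prompt template in `subgraph_to_text`, the cosine-similarity top-k rule in `build_item_item_graph`, and the hand-picked three-dimensional vectors (standing in for embeddings of LLM output) are all hypothetical, since the summary does not specify how the text is formatted or how neighbors are selected.

```python
import math


def subgraph_to_text(item, triples):
    """Linearize (head, relation, tail) triples that touch `item` into plain text."""
    facts = [
        f"{h} {r.replace('_', ' ')} {t}."
        for h, r, t in triples
        if item in (h, t)
    ]
    return f"Item: {item}. Known facts: " + " ".join(facts)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_item_item_graph(embeddings, k):
    """Link each item to its k most similar items by cosine similarity."""
    graph = {}
    for item, vec in embeddings.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in embeddings.items()
            if other != item
        ]
        # Highest similarity first; ties broken by item name for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[item] = [other for _, other in scored[:k]]
    return graph


# Hypothetical KG triples and toy "semantic embeddings".
triples = [
    ("A", "has_genre", "horror"),
    ("A", "directed_by", "director_x"),
    ("B", "has_genre", "horror"),
]
prompt = subgraph_to_text("A", triples)

embeddings = {
    "A": [0.9, 0.1, 0.0],
    "H": [0.8, 0.2, 0.1],
    "C": [0.0, 0.1, 0.9],
}
item_graph = build_item_item_graph(embeddings, k=1)
print(prompt)
print(item_graph)  # {'A': ['H'], 'H': ['A'], 'C': ['H']}
```

In this toy setup A and H become direct neighbors because their vectors are close, regardless of how many hops separate them in the original KG.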
Through these methods, CoLaKG can more comprehensively understand the information in the knowledge graph and improve the performance of the recommendation system.
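As a rough intuition for how structural information from an item-item graph might reach the recommendation model, the sketch below blends each item's embedding with the mean embedding of its graph neighbors. This is an assumed, simplified stand-in and not the paper's neighbor augmentation or representation alignment module; the function name `augment_with_neighbors` and the mixing weight `alpha` are hypothetical.

```python
def augment_with_neighbors(embeddings, graph, alpha=0.5):
    """Blend each item vector with the mean of its neighbors' vectors.

    alpha = 0 keeps the original vector; alpha = 1 uses only the neighbor mean.
    Items without neighbors are returned unchanged.
    """
    augmented = {}
    for item, vec in embeddings.items():
        neighbors = [embeddings[n] for n in graph.get(item, []) if n in embeddings]
        if not neighbors:
            augmented[item] = list(vec)
            continue
        # Column-wise mean over the neighbor vectors.
        mean = [sum(col) / len(neighbors) for col in zip(*neighbors)]
        augmented[item] = [(1 - alpha) * x + alpha * m for x, m in zip(vec, mean)]
    return augmented


# Hypothetical toy item embeddings and item-item graph.
item_vectors = {"A": [1.0, 0.0], "H": [0.0, 1.0], "C": [2.0, 2.0]}
item_neighbors = {"A": ["H"], "H": ["A", "C"]}

augmented = augment_with_neighbors(item_vectors, item_neighbors, alpha=0.5)
print(augmented)  # {'A': [0.5, 0.5], 'H': [0.75, 1.0], 'C': [2.0, 2.0]}
```

A single aggregation step over the item-item graph is enough here because semantically related items are already direct neighbors, in contrast to stacking many GNN layers over the raw KG.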