Associative Knowledge Graphs for Efficient Sequence Storage and Retrieval

Przemysław Stokłosa, Janusz A. Starzyk, Paweł Raif, Adrian Horzyk, Marcin Kowalik
2024-11-19
Abstract: This paper presents a novel approach for constructing associative knowledge graphs that are highly effective for storing and recognizing sequences. The graph is created by representing overlapping sequences of objects as tightly connected clusters within the larger graph. Individual objects (represented as nodes) can be a part of multiple sequences or appear repeatedly within a single sequence. To retrieve sequences, we leverage context, providing a subset of objects that triggers an association with the complete sequence. The system's memory capacity is determined by the size of the graph and the density of its connections. We have theoretically derived the relationships between the critical density of the graph and the memory capacity for storing sequences. The critical density is the point beyond which error-free sequence reconstruction becomes impossible. Furthermore, we have developed an efficient algorithm for ordering elements within a sequence. Through extensive experiments with various types of sequences, we have confirmed the validity of these relationships. This approach has potential applications in diverse fields, such as anomaly detection in financial transactions or predicting user behavior based on past actions.
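The storage and retrieval scheme described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each sequence is stored as a transitive tournament (every object gets a directed edge to every object that follows it), an edge weight simply counts how many stored sequences contain that ordered pair, and a hypothetical recall rule returns the context plus every node linked to all context nodes. The names `AssociativeGraph`, `add_sequence`, `linked`, and `recall` are illustrative.

```python
from collections import defaultdict
from itertools import combinations


class AssociativeGraph:
    """Weighted directed graph of objects; a sketch, not the paper's code."""

    def __init__(self) -> None:
        # weights[(a, b)] = number of stored sequences in which a precedes b
        self.weights: dict[tuple[str, str], int] = defaultdict(int)

    def add_sequence(self, seq: list[str]) -> None:
        # Transitive tournament: link every element to every later element.
        for i, j in combinations(range(len(seq)), 2):
            a, b = seq[i], seq[j]
            if a != b:  # a repeated object would otherwise create a self-loop
                self.weights[(a, b)] += 1

    def nodes(self) -> set[str]:
        return {n for edge in self.weights for n in edge}

    def linked(self, a: str, b: str) -> bool:
        # Membership tests on a defaultdict do not create new entries.
        return (a, b) in self.weights or (b, a) in self.weights

    def recall(self, context: set[str]) -> set[str]:
        # Hypothetical rule: recall the context plus every node that is
        # linked to all context nodes. Denser graphs add false positives.
        extra = {
            n for n in self.nodes() - context
            if all(self.linked(n, c) for c in context)
        }
        return context | extra


g = AssociativeGraph()
g.add_sequence(["A", "B", "C", "D"])
g.add_sequence(["E", "F", "G", "H"])
print(sorted(g.recall({"A", "C"})))  # ['A', 'B', 'C', 'D']
```

With two disjoint sequences stored, the context `{"A", "C"}` pulls in `B` and `D` because both are linked to both context objects, while none of `E` to `H` is. Note that this recall step returns an unordered set; putting the recalled objects back in sequence order is a separate step.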
Artificial Intelligence, Databases
What problem does this paper attempt to address?
The paper addresses how to store and retrieve sequence data efficiently. Specifically, the authors propose a method based on associative knowledge graphs (AKGs) for storing and recognizing sequences. Overlapping sequences are represented as tightly connected clusters in the graph, and a context (a subset of a sequence's objects) triggers recall of the complete sequence.

### Main problems

1. **Efficient storage and retrieval of sequences**
   - Traditional neural networks such as Hopfield networks have limited storage capacity. Modern methods such as Dense Associative Memories and Transformers expand that capacity but still face challenges with complex sequences.
   - The proposed method aims to make sequence storage and retrieval more efficient by using structured associative knowledge graphs.
2. **Maintaining graph sparsity and memory capacity**
   - When the graph becomes too dense, stored memories can no longer be recalled without error, so determining the critical density is essential for defining the memory capacity.
   - The authors derive the relationship between the critical density of the graph and its memory capacity theoretically, and show how to maximize storage capacity while keeping the graph sparse.
3. **Ordering the retrieved sequence elements**
   - Retrieved elements must be arranged in the correct order, which is especially important for large and complex sequences.
   - To this end, the authors develop several ordering algorithms and evaluate their performance experimentally.
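The link between density and error-free recall can be explored numerically. The sketch below stores random sequences, measures density as the fraction of ordered node pairs that are connected, and counts how many sequences are recalled exactly from a random context. It assumes a directed graph without self-loops, random sequences without repeated objects, and a hypothetical recall rule that returns the context plus every node linked to all context nodes. It does not reproduce the paper's critical-density formula, and the parameter values (`n_nodes`, `seq_len`, `n_seqs`, `ctx_size`, `seed`) are arbitrary.

```python
import random
from itertools import combinations


def store(sequences):
    """Directed edges from transitive tournaments over all stored sequences."""
    edges = set()
    for seq in sequences:
        for i, j in combinations(range(len(seq)), 2):
            if seq[i] != seq[j]:
                edges.add((seq[i], seq[j]))
    return edges


def density(edges, n_nodes):
    # Fraction of possible ordered pairs (excluding self-loops) that are edges.
    return len(edges) / (n_nodes * (n_nodes - 1))


def recall(edges, nodes, context):
    # Hypothetical rule: context plus every node linked to all context nodes.
    def linked(a, b):
        return (a, b) in edges or (b, a) in edges

    extra = {v for v in nodes
             if v not in context and all(linked(v, c) for c in context)}
    return set(context) | extra


def experiment(n_nodes=200, seq_len=10, n_seqs=100, ctx_size=3, seed=0):
    """Return (density, fraction of sequences recalled exactly)."""
    rng = random.Random(seed)
    nodes = range(n_nodes)
    seqs = [rng.sample(nodes, seq_len) for _ in range(n_seqs)]
    edges = store(seqs)
    exact = sum(
        recall(edges, nodes, rng.sample(s, ctx_size)) == set(s) for s in seqs
    )
    return density(edges, n_nodes), exact / n_seqs


for n_seqs in (50, 200, 800):
    d, acc = experiment(n_seqs=n_seqs)
    print(f"{n_seqs:4d} sequences: density={d:.3f}, exact recall={acc:.2f}")
```

Because every member of a stored sequence is linked to every context object, the only errors under this rule are false positives: as more sequences are added and density rises, unrelated nodes become linked to all context objects and exact recall degrades.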
### Solutions

- **Constructing an associative knowledge graph**: Each sequence is represented as a transitive tournament (every element is linked to every element that follows it), and connection weights are updated as new sequences are added to the graph.
- **Determining the critical density**: A theoretical analysis yields a formula for the critical density of the graph, which bounds the storage capacity before the graph becomes too dense for error-free recall.
- **Developing ordering algorithms**: Four ordering algorithms (Simple Sort, Node Ordering, Enhanced Node Ordering, and Weighted Edges Node Ordering) are proposed and compared experimentally; Weighted Edges Node Ordering performed best in these experiments.

### Application prospects

- The method has potential applications in several fields, such as anomaly detection in financial transactions and predicting user behavior from past actions.

Overall, the paper offers a new approach to storing and retrieving sequence data efficiently, addressing limitations of traditional methods in storage capacity and in ordering retrieved elements.
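The ordering step can be illustrated with a simple scoring heuristic in the spirit of node ordering. In a transitive tournament the first element points to every later element and the last points to none, so sorting the recalled nodes by net edge weight (outgoing minus incoming, restricted to the recalled set) recovers the order. This is a hypothetical sketch, not any of the four algorithms named above; `build_weights` and `order_nodes` are illustrative names, and the sketch assumes no object repeats within the recalled sequence.

```python
from collections import Counter
from itertools import combinations


def build_weights(sequences):
    # w[(a, b)] counts stored sequences in which a precedes b.
    w = Counter()
    for seq in sequences:
        for i, j in combinations(range(len(seq)), 2):
            if seq[i] != seq[j]:
                w[(seq[i], seq[j])] += 1
    return w


def order_nodes(recalled, w):
    """Sort recalled nodes by outgoing minus incoming weight within the set."""
    def score(v):
        # Counter returns 0 for missing pairs without inserting them.
        return sum(w[(v, u)] - w[(u, v)] for u in recalled if u != v)

    return sorted(recalled, key=score, reverse=True)


w = build_weights([["A", "B", "C", "D"], ["B", "E", "C"]])
print(order_nodes({"A", "B", "C", "D"}, w))  # ['A', 'B', 'C', 'D']
```

Using counts rather than bare edges lets ordered pairs that recur across many stored sequences weigh more heavily in the score; the exact scoring used by the paper's Weighted Edges Node Ordering may differ.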