Learning Semantic Association Rules from Internet of Things Data

Erkan Karabulut, Paul Groth, Victoria Degeler
2024-12-04
Abstract: Association Rule Mining (ARM) is the task of discovering commonalities in data in the form of logical implications. ARM is used in the Internet of Things (IoT) for different tasks including monitoring and decision-making. However, existing methods give limited consideration to IoT-specific requirements such as heterogeneity and volume. Furthermore, they do not utilize important static domain-specific description data about IoT systems, which is increasingly represented as knowledge graphs. In this paper, we propose a novel ARM pipeline for IoT data that utilizes both dynamic sensor data and static IoT system metadata. Furthermore, we propose an Autoencoder-based Neurosymbolic ARM method (Aerial) as part of the pipeline to address the high volume of IoT data and reduce the total number of rules that are resource-intensive to process. Aerial learns a neural representation of a given data and extracts association rules from this representation by exploiting the reconstruction (decoding) mechanism of an autoencoder. Extensive evaluations on 3 IoT datasets from 2 domains show that ARM on both static and dynamic IoT data results in more generically applicable rules while Aerial can learn a more concise set of high-quality association rules than the state-of-the-art with full coverage over the datasets.
Machine Learning, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper addresses two main problems in association rule mining (ARM) on Internet of Things (IoT) data:

1. **Existing methods fail to fully consider the special nature of IoT data**:
   - Most existing ARM methods are simple adaptations of traditional data mining algorithms and are not optimized for IoT characteristics such as heterogeneity and high data volume.
   - These methods usually ignore static domain-specific description data (such as knowledge graphs), which is important for understanding the context of IoT systems.
2. **The number of generated association rules is excessive and difficult to handle**:
   - As the input dimension increases, existing ARM algorithms generate a large number of association rules, which are resource-intensive to process and difficult to maintain and interpret.
   - In a large-scale IoT environment, each sensor is treated as a separate data dimension, so the number of generated rules grows accordingly.

To solve these problems, the authors make two main contributions:

1. **A new ARM pipeline**:
   - It combines dynamic sensor data with static IoT system metadata (such as knowledge graphs) to learn semantic association rules.
   - Introducing semantic information makes the generated rules more general and easier to interpret. For example, traditional sensor-based rules apply only to specific sensor combinations, while semantic association rules can describe a wider range of situations.
2. **Autoencoder-based Neurosymbolic ARM method (Aerial)**:
   - Aerial uses an autoencoder to learn a neural representation of the input data and extracts association rules from this representation.
   - With this method, Aerial can learn a concise set of high-quality association rules from high-dimensional IoT data while ensuring full coverage of the data set.

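
The difference between sensor-based and semantic rules can be illustrated with a minimal sketch. It is not the paper's pipeline and does not implement Aerial; it only computes plain support and confidence over two views of the same toy data. The sensor ids, the `METADATA` fields (standing in for properties read from a knowledge graph), and the readings are all hypothetical and chosen for illustration.

```python
# Hypothetical static metadata, as it might be read from a knowledge graph:
# each sensor id maps to a measured property type and an installation location.
METADATA = {
    "s1": {"type": "temperature", "location": "office"},
    "s2": {"type": "hvac_power", "location": "office"},
    "s3": {"type": "temperature", "location": "lab"},
    "s4": {"type": "hvac_power", "location": "lab"},
}

# Hypothetical discretized sensor readings, one dict per time window.
READINGS = [
    {"s1": "high", "s2": "high", "s3": "low", "s4": "low"},
    {"s1": "high", "s2": "high", "s3": "high", "s4": "high"},
    {"s1": "low", "s2": "low", "s3": "high", "s4": "high"},
    {"s1": "low", "s2": "low", "s3": "low", "s4": "high"},
]


def sensor_transactions(readings):
    """One transaction per time window; items are (sensor_id, value)."""
    return [set(window.items()) for window in readings]


def semantic_transactions(readings, metadata):
    """One transaction per (time window, location); items are (type, value).

    Sensor ids are replaced by their semantic type, so a rule learned here
    is evaluated across every location that has sensors of those types.
    """
    transactions = []
    for window in readings:
        by_location = {}
        for sensor_id, value in window.items():
            meta = metadata[sensor_id]
            by_location.setdefault(meta["location"], set()).add((meta["type"], value))
        transactions.extend(by_location.values())
    return transactions


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / antecedent_support


sensor_tx = sensor_transactions(READINGS)
semantic_tx = semantic_transactions(READINGS, METADATA)

# Sensor-based rule: tied to the specific pair s1 and s2.
sensor_rule = ({("s1", "high")}, {("s2", "high")})
# Semantic rule: stated over property types, independent of sensor ids.
semantic_rule = ({("temperature", "high")}, {("hvac_power", "high")})

print("sensor rule confidence:", confidence(sensor_tx, *sensor_rule))
print("semantic rule confidence:", confidence(semantic_tx, *semantic_rule))
```

In this toy data the sensor-based rule says something only about `s1` and `s2`, whereas the semantic rule is checked against both the office and the lab, which is the sense in which semantic rules cover a wider range of situations. Including the location as an additional item in each semantic transaction would be one way to condition rules on context.
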
### Summary

The paper proposes a new ARM pipeline and an autoencoder-based Neurosymbolic ARM method (Aerial) to address the limitations of existing ARM methods on IoT data. Specifically, it aims to:

- Make full use of both the static and dynamic parts of IoT data to generate more general semantic association rules.
- Reduce the number of generated association rules while improving their quality and interpretability.

Through these two contributions, the paper offers an association rule mining approach that is more efficient and better suited to IoT environments.