An OPC UA-based industrial Big Data architecture

Eduard Hirsch, Simon Hoher, Stefan Huber
2023-06-02
Abstract: Industry 4.0 factories are complex and data-driven. Data originates from many sources, including sensors, PLCs, and other devices, but also from IT systems such as ERP or CRM. We ask how to collect and process this data so that it retains its metadata and can be used for industrial analytics or to derive intelligent support systems. This paper describes a new, query-model-based approach that uses a big data architecture to capture data from various sources, with OPC UA as a foundation. It buffers and preprocesses the information in order to harmonize it and to provide a holistic state space of a factory, as well as mappings to the current state of a production site. That information can be made available to multiple processing sinks that are decoupled from the data sources. The sinks can therefore work with the information without interfering with production devices, disturbing the network they operate in, or negatively influencing the production process. Metadata and connected semantic information are kept throughout the process, so that algorithms can be fed with meaningful data. The data can be accessed in its entirety to perform time series analysis, machine learning, or similar evaluations, and it can be replayed from the buffer for repeatable simulations.
Information Retrieval; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses data collection, processing, and analysis in the context of Industry 4.0. Specifically, it focuses on the following aspects:

1. **How to provide a comprehensive and well-defined view of the production environment**: Collecting data and its metadata from various devices and systems without compromising data integrity, in order to provide a comprehensive view of the production status.
2. **Data collection methods**: How to collect data frequently from multiple sources without interfering with the production process, while maintaining data integrity and reducing the load on the network and the data sources.
3. **Data harmonization methods**: How to harmonize data appropriately so that downstream processing systems can understand its semantic information.

The paper proposes a big data architecture based on OPC UA (Open Platform Communications Unified Architecture) for collecting data from different devices in the factory. It uses a Query Model to define which devices need to be monitored and what data needs to be collected. The architecture also decouples data sources from data processing systems to facilitate historical data analysis, and it uses the concept of data streams to transform and enhance data from different sources. The ultimate goal is to build an Information Engine that can provide pre-processed data for subsequent analysis and machine learning tasks.
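
The interplay of query model, buffer, and decoupled sinks described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `QueryRule`, `Record`, `Buffer`, and `collect` are hypothetical, the node identifiers and values are made up, and a plain dictionary stands in for real OPC UA servers so that the example runs without any network or OPC UA library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass(frozen=True)
class QueryRule:
    """Hypothetical query-model entry: which node of which device to monitor."""
    device: str
    node_id: str
    unit: str  # metadata that should travel with every collected value


@dataclass
class Record:
    """One buffered observation, stored together with its metadata."""
    timestamp: float
    device: str
    node_id: str
    value: float
    metadata: Dict[str, str]


class Buffer:
    """Append-only log that decouples data sources from processing sinks."""

    def __init__(self) -> None:
        self._log: List[Record] = []

    def append(self, record: Record) -> None:
        self._log.append(record)

    def replay(self, start: int = 0) -> Iterator[Record]:
        # Sinks read from the log only; they never contact a device.
        return iter(self._log[start:])


def collect(rules: List[QueryRule],
            read_value: Callable[[str, str], float],
            buffer: Buffer,
            timestamp: float) -> None:
    """Read every node named in the query model once and buffer the result."""
    for rule in rules:
        value = read_value(rule.device, rule.node_id)
        buffer.append(Record(timestamp, rule.device, rule.node_id, value,
                             {"unit": rule.unit}))


# Illustrative stand-in for the plant (assumption: no real OPC UA server).
TEMP = "ns=2;s=Temperature"
PRESSURE = "ns=2;s=Pressure"
fake_plant = {("press-1", TEMP): 71.5, ("press-1", PRESSURE): 3.2}

rules = [QueryRule("press-1", TEMP, "degC"),
         QueryRule("press-1", PRESSURE, "bar")]
buffer = Buffer()

collect(rules, lambda dev, node: fake_plant[(dev, node)], buffer, timestamp=0.0)
fake_plant[("press-1", TEMP)] = 73.5  # the plant state changes over time
collect(rules, lambda dev, node: fake_plant[(dev, node)], buffer, timestamp=1.0)

# A sink replays the buffered history instead of querying the device again.
temperatures = [r.value for r in buffer.replay() if r.node_id == TEMP]
```

Because sinks only ever call `replay`, any number of them can process the same history repeatedly without adding load to the devices, which is the property the summary attributes to the decoupled architecture. A production system would presumably replace the in-memory list with a durable, distributed log.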