Heterogeneous data format integration and conversion (HDFIC) using machine learning and IBM-DFDL for IoT
Sandeep M, B. R. Chandavarkar, Sagar Khatri
DOI: https://doi.org/10.1007/s12530-024-09568-7
IF: 2.347
2024-02-28
Evolving Systems
Abstract: The future of the Internet of Things (IoT) demands the integration of synergetic applications to meet societal needs. Examples of such confederated IoT applications include Ambient Assisted Living with Active Healthy Ageing, CasAware with Smart Energy, and Smart Gas Distribution Networks with GIS systems, among others. However, data heterogeneity hinders integration: these systems follow different standards, data formats, semantic models, and representations, which in turn causes data interoperability issues in IoT. A major concern of academia and industry in integrating heterogeneous applications smoothly is interpreting different data formats and representing them in a common schema for further analysis. Existing solutions, such as message payload translation, middleware/cloud formats, and Inter-IoT, are complex, time-consuming, and ineffective. Hence, this paper proposes heterogeneous data format integration and conversion (HDFIC), a machine learning-based system that identifies data formats using a Random Forest classifier and integrates them using the Data Format Description Language (DFDL). The content-based data format identification in HDFIC is trained with the standard features defined in RFC 7111, 8259, and 8996. The data is then integrated into a single XML Schema Definition (XSD) and converted into the required data format using the IBM App Connect Enterprise tool and DFDL. Finally, HDFIC is evaluated on a synergetic dataset of patient body vitals and room ambiance in terms of accuracy, data integration time, and conversion efficiency.
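The content-based format identification step can be illustrated with a minimal, standard-library sketch that turns a raw payload into a numeric feature vector of the kind a Random Forest classifier could be trained on. The function names `extract_format_features` and `to_vector` and the feature set itself are hypothetical; they are loosely based on structural cues of CSV (consistent delimiter counts per record), JSON (the RFC 8259 structural characters) and XML, and are not the exact features used in HDFIC.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical feature set, for illustration only (not the HDFIC feature list).
FEATURE_NAMES = [
    "starts_with_brace", "starts_with_angle", "parses_as_json",
    "parses_as_xml", "uniform_commas", "structural_char_ratio",
    "angle_bracket_ratio", "line_count",
]


def extract_format_features(payload: str) -> Dict[str, float]:
    """Map a raw text payload to simple content-based format features."""
    text = payload.strip()
    n = max(len(text), 1)
    lines = [ln for ln in text.splitlines() if ln.strip()]

    # CSV cue: at least two records, each with the same non-zero comma count.
    comma_counts = [ln.count(",") for ln in lines]
    uniform_commas = (
        len(lines) >= 2 and comma_counts[0] > 0 and len(set(comma_counts)) == 1
    )

    # JSON cue: the whole payload is valid JSON text.
    try:
        json.loads(text)
        parses_as_json = 1.0
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        parses_as_json = 0.0

    # XML cue: the payload is a well-formed XML document.
    # (ElementTree is not hardened against malicious XML; fine for a sketch.)
    try:
        ET.fromstring(text)
        parses_as_xml = 1.0
    except ET.ParseError:
        parses_as_xml = 0.0

    return {
        "starts_with_brace": float(text[:1] in ("{", "[")),
        "starts_with_angle": float(text[:1] == "<"),
        "parses_as_json": parses_as_json,
        "parses_as_xml": parses_as_xml,
        "uniform_commas": float(uniform_commas),
        # Share of JSON structural characters { } [ ] : in the payload.
        "structural_char_ratio": sum(text.count(c) for c in "{}[]:") / n,
        "angle_bracket_ratio": (text.count("<") + text.count(">")) / n,
        "line_count": float(len(lines)),
    }


def to_vector(payload: str) -> List[float]:
    """Return the features in a fixed order, ready for a classifier."""
    feats = extract_format_features(payload)
    return [feats[name] for name in FEATURE_NAMES]


if __name__ == "__main__":
    samples = {
        "json": '{"hr": 72, "spo2": 98}',
        "csv": "hr,spo2,temp\n72,98,36.6\n75,97,36.8",
        "xml": "<vitals><hr>72</hr><spo2>98</spo2></vitals>",
    }
    for label, payload in samples.items():
        print(label, to_vector(payload))
```

In a full pipeline, vectors computed from labelled sample payloads would be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier` (`fit` for training, `predict` for new payloads); that step is omitted here to keep the sketch dependency-free. Parse-based features are cheap and highly discriminative, while the ratio features give the forest a signal for truncated or slightly malformed payloads that fail strict parsing.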
computer science, artificial intelligence