Dynamic multi-variant relational scheme-based intelligent ETL framework for healthcare management

Vijayalakshmi Manickam,Minu Rajasekaran Indra
DOI: https://doi.org/10.1007/s00500-022-06938-8
Abstract:The growth of information technology has opened the gate for the organizations to maintain their data in various forms and at various volumes. This increases the volume and dimension of data being maintained. However, they store their data in their data servers or in cloud environment. Such data have been used to generate various intelligence to support various problems. To support such analysis process, different data have been used and the big data comes to play in this part. Optimizing techniques to help improve the process of ETL could greatly help in real-time analysis of data. ETL optimization could be achieved through several factors simplest being increasing the frequency of the process. Other ways to achieve optimization are through the use of various architectures, programming models, intelligence in transformation and security. To improve the performance of ETL, an efficient dynamic multi-variant relational intelligent ETL framework has been presented in this article. The distributed approach maintains various ontology's and data dictionaries which have been dynamically updated by different threads of ETL process. Initially, the process is start by applying the extraction process which extracts the data from different sources and finds set of dimensions and their characteristics. Such data extracted have been verified over the data dictionary. Further, the relational score has been measured for each data source with the existing one. Similarly, the method computes the value of multi-variant relational similarity (MVRS) for the data obtained from a single source. This will be performed by different threads of ETL process. According to the value of MVRS, the method performs map reduce and merging of data. According to the value of MVRS the method selects the data node and merges the data to store in the data warehouse. The threads of ETL are capable of reading the changes in data dictionaries and ontology's to iterate the process of transformation and loading. The method improves the performance of ETL with least time complexity and higher performance.
What problem does this paper attempt to address?