Data Provenance Analysis and Description for ETL Based on PROV

Zhang Ran, Dai Chao-Fan, Zeng Sai-Hong
2016-01-01
Abstract: Data provenance, also called data lineage or pedigree, is information describing how data reached its present state, starting from the point where it was generated. The W3C has proposed the PROV standards, which define vocabularies, ontologies, and rules for recording how data was generated. PROV serves as a uniform standard for data provenance and improves interoperability between different sources of provenance information. ETL, short for Extract-Transform-Load, describes the process that moves data from its source to its destination and consists of extraction, transformation, and loading. In this paper, we analyze and design a system that can trace data and processes correctly and effectively, focusing on reverse rules and the tracing method. Our research on data provenance is based on ETL and uses the PROV standards, which can improve the tracing process. In addition, we introduce the provenance tree, a graphical representation of the data tracing process.
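The abstract does not spell out the paper's reverse rules, so the following is only a minimal sketch of the general idea under stated assumptions: each ETL step is recorded as a PROV-style activity that `used` some input entities and by which its outputs `wasGeneratedBy`, and tracing walks these relations backwards from an output to its sources, producing a nested provenance tree that can be printed with indentation. The class `ProvStore`, its methods `record`/`trace`, the helpers `render` and `build_example`, and the dataset names are all hypothetical illustrations; the code does not use any PROV library or reproduce the authors' system.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory store of PROV-style relations (illustrative only).
@dataclass
class ProvStore:
    # entity id -> id of the activity that generated it (PROV "wasGeneratedBy")
    generated_by: dict[str, str] = field(default_factory=dict)
    # activity id -> ids of the entities it consumed (PROV "used")
    used: dict[str, list[str]] = field(default_factory=dict)

    def record(self, activity, inputs, outputs):
        """Record one ETL step: the activity used `inputs` and generated `outputs`."""
        self.used[activity] = list(inputs)
        for out in outputs:
            self.generated_by[out] = activity

    def trace(self, entity):
        """Walk wasGeneratedBy/used backwards and return a nested provenance tree.

        Assumes the recorded graph is acyclic, as a generation history should be.
        """
        activity = self.generated_by.get(entity)
        if activity is None:
            return {"entity": entity}  # no generating activity: a source (leaf)
        return {
            "entity": entity,
            "activity": activity,
            "inputs": [self.trace(src) for src in self.used[activity]],
        }


def render(node, depth=0):
    """Flatten a provenance tree into indented text lines, one per entity."""
    label = "  " * depth + node["entity"]
    if "activity" in node:
        label += f"  <- {node['activity']}"
    lines = [label]
    for child in node.get("inputs", []):
        lines.extend(render(child, depth + 1))
    return lines


def build_example():
    """A hypothetical three-step ETL pipeline."""
    store = ProvStore()
    store.record("extract", ["orders.csv"], ["raw_orders"])
    store.record("transform", ["raw_orders", "currency_rates"], ["clean_orders"])
    store.record("load", ["clean_orders"], ["warehouse.orders"])
    return store


if __name__ == "__main__":
    tree = build_example().trace("warehouse.orders")
    print("\n".join(render(tree)))
```

Running the script traces the loaded table back through the load, transform, and extract steps to the two source entities, `orders.csv` and `currency_rates`. A real system following PROV would also record agents and timestamps and would serialize the relations in a PROV format rather than in Python dictionaries.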