A Survey on Management of Data Provenance

高明,金澈清,王晓玲,田秀霞,周傲英
DOI: https://doi.org/10.3724/sp.j.1016.2010.00373
2010-01-01
Chinese Journal of Computers
Abstract: Data provenance describes how data is generated and how it evolves over time. It has many applications, including data quality evaluation, audit trails, replication recipes, and data citation. Generally, data provenance can be recorded across multiple sources or within a single data source; in other words, the derivation history of data can be captured at either the schema level or the instance level. This paper surveys research on the representation and querying of data provenance at both levels. At the schema level, the focus is on query rewriting and schema mappings; at the instance level, the focus is on relational, XML, and streaming data provenance. Research on uncertain data provenance, which tracks the derivation of both data and its uncertainty, is also summarized. Finally, the paper lists applications of data provenance, discusses the main challenges, and points out directions for future research.