Data Provenance: A Categorization of Existing Approaches
by Boris Glavic, Klaus R. Dittrich
Abstract:
In many application areas like e-science and data-warehousing, detailed information about the origin of data is required. This kind of information is often referred to as data provenance or data lineage. The provenance of a data item includes information about the processes and source data items that lead to its creation and current representation. The diversity of data representation models and application domains has led to a number of more or less formal definitions of provenance. Most of them are limited to a special application domain, data representation model or data processing facility. Not surprisingly, the associated implementations are also restricted to some application domain and depend on a special data model. In this paper, we give a survey of data provenance models and prototypes, present a general categorization scheme for provenance models and use this categorization scheme to study the properties of the existing approaches. This categorization enables us to distinguish between different kinds of provenance information and could lead to a better understanding of provenance in general. Besides the categorization of provenance types, it is important to include the storage, transformation and query requirements for the different kinds of provenance information and application domains in our considerations. The analysis of existing approaches will assist us in revealing open research problems in the area of data provenance.
Reference:
Data Provenance: A Categorization of Existing Approaches (Boris Glavic, Klaus R. Dittrich), In Proceedings of the 12th GI Conference on Datenbanksysteme in Business, Technologie und Web (BTW), 2007.
Bibtex Entry:
@inproceedings{GD07,
	Abstract = {In many application areas like e-science and data-warehousing, detailed
information about the origin of data is required. This kind of information is 
often referred to as data provenance or data lineage. The provenance of a data 
item includes information about the processes and source data items that lead 
to its creation and current representation. The diversity of data
representation models and application domains has led to a number of more or
less formal definitions of provenance. Most of them are limited to a special 
application domain, data representation model or data processing facility. Not 
surprisingly, the associated implementations are also restricted to some 
application domain and depend on a special data model. In this paper we give a 
survey of data provenance models and prototypes, present a general 
categorization scheme for provenance models and use this categorization scheme 
to study the properties of the existing approaches. This categorization enables 
us to distinguish between different kinds of provenance information and could 
lead to a better understanding of provenance in general. Besides the 
categorization of provenance types, it is important to include the storage, 
transformation and query requirements for the different kinds of provenance 
information and application domains in our considerations. The analysis of 
existing approaches will assist us in revealing open research problems in the 
area of data provenance.},
	Author = {Boris Glavic and Klaus R. Dittrich},
	Bibsource = {DBLP, http://dblp.uni-trier.de},
	Booktitle = {Proceedings of the 12th GI Conference on Datenbanksysteme in Business, Technologie und Web (BTW)},
	Date-Added = {2012-12-14 18:55:49 +0000},
	Date-Modified = {2012-12-14 18:55:49 +0000},
	Keywords = {Provenance},
	Local-Url = {file://localhost/Users/admin/Documents/Uni/IFI/Papers/GD07_Data%20Provenance%20A%20Categorization%20of%20Existing%20Approaches_0.pdf},
	Pages = {227--241},
	Title = {{Data Provenance: A Categorization of Existing Approaches}},
	Url = {http://cs.iit.edu/%7edbgroup/pdfpubls/GD07.pdf},
	Venueshort = {BTW},
	Year = {2007},
	Bdsk-Url-1 = {http://cs.iit.edu/%7edbgroup/pdfpubls/GD07.pdf}}