CS 595: Hot Topics in Database Systems: Data Provenance - Fall 2012

Organization

Each student has to pick two papers from the list below (Report Papers) to read and write a report on, and has to give an oral presentation on one of them. The deadline is Sep 7th. I will bring a list of papers to class, and you will also be able to choose online. The oral presentations will take place in the second half of the course (time to be announced later).

Background Reading

The course is to a large extent based on the research presented in these papers. Reading them will make it easier to follow the topics presented in this course. This list will be updated regularly with papers for the next few classes.

Data Lineage: A Survey
R. Ikeda and J. Widom

Provenance in Databases: Why, How, and Where
J. Cheney and L. Chiticariu and W.-C. Tan

Data Provenance: A Categorization of Existing Approaches
B. Glavic and K. R. Dittrich

Provenance in Databases: Past, Current, and Future
W.-C. Tan

Provenance in Scientific Workflow Systems
S. B. Davidson and S. Cohen-Boulakia and A. Eyal and B. Ludäscher and T. McPhillips and S. Bowers and J. Freire

A Survey of Data Provenance in e-science
Y. L. Simmhan and B. Plale and D. Gannon

Research Problems in Data Provenance
W.-C. Tan

Why and Where: A Characterization of Data Provenance
P. Buneman and S. Khanna and W.-C. Tan

Data Provenance: Some Basic Issues
P. Buneman and S. Khanna and W.-C. Tan

Tracing the Lineage of View Data in a Warehousing Environment
Y. Cui and J. Widom and J. L. Wiener

Report Papers

These are the papers to choose from:

Tiresias: The database oracle for how-to queries
A. Meliou and D. Suciu

RAMP: A System for Capturing and Tracing Provenance in MapReduce Workflows
H. Park and R. Ikeda and J. Widom

Provenance for generalized map and reduce workflows
R. Ikeda and H. Park and J. Widom

Provenance-based refresh in data-oriented workflows
R. Ikeda and S. Salihoglu and J. Widom

Putting Lipstick on Pig: Enabling Database-style Workflow Provenance
Y. Amsterdamer and S. B. Davidson and D. Deutch and T. Milo and J. Stoyanovich and V. Tannen

Tracing data errors with view-conditioned causality
A. Meliou and W. Gatterbauer and S. Nath and D. Suciu

A graph model of data and workflow provenance
U. Acar and P. Buneman and J. Cheney and J. van den Bussche and N. Kwasnikowska and S. Vansummeren

Efficient querying and maintenance of network provenance at internet-scale
W. Zhou and M. Sherr and T. Tao and X. Li and B. T. Loo and Y. Mao

Explaining Missing Answers to SPJUA Queries
M. Herschel and M. Hernandez

Generating Databases for Query Workloads
E. Lo and N. Cheng and W.-K. Hon

How to ConQueR why-not questions
Q. T. Tran and C.-Y. Chan

Lost source provenance
J. Zhang and H. Jagadish

Propagating updates through XML views using lineage tracing
L. Fegaras

Provenance in ORCHESTRA
T. J. Green and G. Karvounarakis and Z. G. Ives and V. Tannen

Querying data provenance
G. Karvounarakis and Z. G. Ives and V. Tannen

The Complexity of Causality and Responsibility for Query Answers and non-Answers
A. Meliou and W. Gatterbauer and K. F. Moore and D. Suciu

Tracking and Sketching Distributed Data Provenance
T. Malik and L. Nistor and A. Gehani

Efficient Provenance Storage over Nested Data Collections
M. K. Anand and S. Bowers and T. McPhillips and B. Ludäscher

Why Not?
A. Chapman and H. V. Jagadish

Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases
J. Widom and M. Theobald and A. D. Sarma

Annotated XML: Queries and Provenance
J. N. Foster and T. J. Green and V. Tannen

Efficient Provenance Storage
A. Chapman and H. V. Jagadish and P. Ramanan

On the Provenance of Non-answers to Queries over Extracted Data
J. Huang and T. Chen and A. Doan and J. F. Naughton

Program Slicing and Data Provenance
J. Cheney

Provenance Semirings
T. J. Green and G. Karvounarakis and V. Tannen

Tracing Lineage beyond Relational Operators
M. Zhang and X. Zhang and X. Zhang and S. Prabhakar

ULDBs: Databases with Uncertainty and Lineage
O. Benjelloun and A. D. Sarma and A. Y. Halevy and J. Widom

MONDRIAN: Annotating and Querying Databases through Colors and Blocks
F. Geerts and A. Kementsietsidis and D. Milano

Provenance-Aware Storage Systems
M. Seltzer and K.-K. Muniswamy-Reddy and D. A. Holland and U. Braun and J. Ledlie

An Annotation Management System for Relational Databases
D. Bhagwat and L. Chiticariu and W.-C. Tan and G. Vijayvargiya

On Propagation of Deletions and Annotations through Views
P. Buneman and S. Khanna and W.-C. Tan

Storing Auxiliary Data for Efficient Maintenance and Lineage Tracing of Complex Views
Y. Cui and J. Widom