Organization
Each student has to pick two papers from the Report Papers list below, read them, write a report on them, and give an oral presentation on one of them. The deadline for choosing is Sep 7th. I will bring a list of papers to class, and you will also be able to choose online. The oral presentations will be in the second half of the course (time to be announced later).
- Choose two papers to review and pick one for oral presentation by Sep 7th
- Short written reviews (4-6 pages). The final version is due by Nov 1st
- First version of the review has to be handed in no later than Oct 15th
- Oral presentations will likely be on Oct 22nd and 24th
- Use Google Scholar to access PDF versions of the papers
Background Reading
The course is to a large extent based on the research presented in these papers. Reading them will make it easier to follow the topics presented in this course. This list will be updated regularly with papers for the next few classes.
- Data Lineage: A Survey
R. Ikeda and J. Widom
- Provenance in Databases: Why, How, and Where
J. Cheney and L. Chiticariu and W.-C. Tan
- Data Provenance: A Categorization of Existing Approaches
B. Glavic and K. R. Dittrich
- Provenance in Databases: Past, Current, and Future
W.-C. Tan
- Provenance in Scientific Workflow Systems
S. B. Davidson and S. Cohen-Boulakia and A. Eyal and B. Ludäscher and T. McPhillips and S. Bowers and J. Freire
- A Survey of Data Provenance in e-science
Y. L. Simmhan and B. Plale and D. Gannon
- Research Problems in Data Provenance
W.-C. Tan
- Why and Where: A Characterization of Data Provenance
P. Buneman and S. Khanna and W.-C. Tan
- Data Provenance: Some Basic Issues
P. Buneman and S. Khanna and W.-C. Tan
- Tracing the Lineage of View Data in a Warehousing Environment
Y. Cui and J. Widom and J. L. Wiener
Report Papers
These are the papers to choose from:
- Tiresias: The database oracle for how-to queries
A. Meliou and D. Suciu
- RAMP: A System for Capturing and Tracing Provenance in MapReduce Workflows
H. Park and R. Ikeda and J. Widom
- Provenance for generalized map and reduce workflows
R. Ikeda and H. Park and J. Widom
- Provenance-based refresh in data-oriented workflows
R. Ikeda and S. Salihoglu and J. Widom
- Putting Lipstick on Pig: Enabling Database-style Workflow Provenance
Y. Amsterdamer and S. B. Davidson and D. Deutch and T. Milo and J. Stoyanovich and V. Tannen
- Tracing data errors with view-conditioned causality
A. Meliou and W. Gatterbauer and S. Nath and D. Suciu
- A graph model of data and workflow provenance
U. Acar and P. Buneman and J. Cheney and J. van den Bussche and N. Kwasnikowska and S. Vansummeren
- Efficient querying and maintenance of network provenance at internet-scale
W. Zhou and M. Sherr and T. Tao and X. Li and B. T. Loo and Y. Mao
- Explaining Missing Answers to SPJUA Queries
M. Herschel and M. Hernandez
- Generating Databases for Query Workloads
E. Lo and N. Cheng and W.-K. Hon
- How to ConQueR why-not questions
Q. T. Tran and C.-Y. Chan
- Lost source provenance
J. Zhang and H. Jagadish
- Propagating updates through XML views using lineage tracing
L. Fegaras
- Provenance in ORCHESTRA
T. J. Green and G. Karvounarakis and Z. G. Ives and V. Tannen
- Querying data provenance
G. Karvounarakis and Z. G. Ives and V. Tannen
- The Complexity of Causality and Responsibility for Query Answers and non-Answers
A. Meliou and W. Gatterbauer and K. F. Moore and D. Suciu
- Tracking and Sketching Distributed Data Provenance
T. Malik and L. Nistor and A. Gehani
- Efficient Provenance Storage over Nested Data Collections
M. K. Anand and S. Bowers and T. McPhillips and B. Ludäscher
- Why Not?
A. Chapman and H. V. Jagadish
- Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases
J. Widom and M. Theobald and A. D. Sarma
- Annotated XML: Queries and Provenance
J. N. Foster and T. J. Green and V. Tannen
- Efficient Provenance Storage
A. Chapman and H. V. Jagadish and P. Ramanan
- On the Provenance of Non-answers to Queries over Extracted Data
J. Huang and T. Chen and A. Doan and J. F. Naughton
- Program Slicing and Data Provenance
J. Cheney
- Provenance Semirings
T. J. Green and G. Karvounarakis and V. Tannen
- Tracing Lineage beyond Relational Operators
M. Zhang and X. Zhang and X. Zhang and S. Prabhakar
- ULDBs: Databases with Uncertainty and Lineage
O. Benjelloun and A. D. Sarma and A. Y. Halevy and J. Widom
- MONDRIAN: Annotating and Querying Databases through Colors and Blocks
F. Geerts and A. Kementsietsidis and D. Milano
- Provenance-Aware Storage Systems
M. Seltzer and K.-K. Muniswamy-Reddy and D. A. Holland and U. Braun and J. Ledlie
- An Annotation Management System for Relational Databases
D. Bhagwat and L. Chiticariu and W.-C. Tan and G. Vijayvargiya
- On Propagation of Deletions and Annotations through Views
P. Buneman and S. Khanna and W.-C. Tan
- Storing Auxiliary Data for Efficient Maintenance and Lineage Tracing of Complex Views
Y. Cui and J. Widom