CS 595: Hot Topics in Database Systems: Data Provenance - Fall 2012

Syllabus

syllabus.pdf

Course Description

With the ever-increasing amount of digital information comes an increasing need to understand "where" a piece of data (a data item) comes from, "why" it is in the result of a data transformation, and "how" it was produced by the transformation. For example, biologists use complex digital workflows and simulations to gain new insights from measured and derived data. The result of a complex workflow is meaningless without information about how it was produced and from which input data. This type of information, i.e., information about the creation process and origin of data, is called data provenance.

Systems that automatically track provenance information for data produced by, e.g., workflows or SQL queries are becoming more and more important. Data provenance is an emerging technology that is used, e.g., to trace errors in transformed data back to their origin or to gain additional insights about the data. This course introduces several models of provenance developed for domains such as databases and workflow systems. We will cover approaches for automatically tracking provenance, and study query languages and storage mechanisms for provenance information. Furthermore, we will discuss real systems that generate provenance data. This course gives students the opportunity to learn about a hot topic in database research and to work with novel research prototype provenance systems.

Textbooks

No textbook is required. Required reading will consist of research publications that are available online and will be linked on the course schedule page.

Detailed Course Topics

Grading Policies

Course Objectives

After attending the course, students should be able to: