IIT Database Group

Fork me on GitHub

Provenance Games and Generalization

Explaining why a certain answer is in the result of a query or why it is missing from the result is important for many applications including auditing, debugging data, and answering hypothetical questions about data. Both types of questions, i.e., why provenance and why-not (missing answer) provenance have been studied extensively independently. Based on a game theoretic notion of provenance developed by our collaborators at UIUC, we investigate how to developed a unified approach towards provenance and missing answers. Our approach is based on the observation that for a provenance model for queries with full negation, why-not questions can be translated into why questions and vice versa. In addition to investigating how provenance games can be computed efficiently using a goal oriented approach that outsources most of the computation to a datalog engine or relational database, we also are interested in the theoretical properties of provenance games and their relationship to existing provenance and missing answer models. Furthermore, provenance for missing answers (and queries with negation in general) can be very large or even infinite. Thus, a pratical approach requires the size of provenance to be reduced through finite encoding of infinite structures, compression, generalization, and summarization.

Implementation

We envision an system which supports computes explanations of a user's why and why-not questions as (summarized) provenance game fragments. The computation of the relevant fragment of a provenance game will be offloaded to a datalog engine or relational database backend. Here we will benefit from the optimized relational algebra to SQL compilation of our GProM system. The figure below gives an overview of the architecture of the proposed approach.

Collaborators

Publications

2017
[5] Integrating Approximate Summarization with Provenance Capture (Seokki Lee, Xing Niu, Bertram Ludäscher, Boris Glavic), In Proceedings of the 8th USENIX Workshop on the Theory and Practice of Provenance (TaPP), 2017. [bibtex] [pdf]
[4] A SQL-Middleware Unifying Why and Why-Not Provenance for First-Order Queries (Seokki Lee, Sven Köhler, Bertram Ludäscher, Boris Glavic), In Proceedings of the 33rd IEEE International Conference on Data Engineering (ICDE), 2017. [bibtex] [pdf]
2016
[3] Implementing Unified Why- and Why-Not Provenance Through Games (Seokki Lee, Sven Köhler, Bertram Ludäscher, Boris Glavic), In Proceedings of the 8th USENIX Workshop on the Theory and Practice of Provenance (TaPP) (Poster), 2016. [bibtex] [pdf]
2015
[2] An Efficient Implementation of Game Provenance in DBMS (Seokki Lee, Yuchen Tang, Sven Köhler, Bertram Ludäscher, Boris Glavic), Technical report, Illinois Institute of Technology, IIT/CS-DB-2015-02, 2015. [bibtex] [pdf]
[1] Towards Constraint-based Explanations for Answers and Non-Answers (Boris Glavic, Sven Köhler, Sean Riddle, Bertram Ludäscher), In Proceedings of the 7th USENIX Workshop on the Theory and Practice of Provenance (TaPP), 2015. [bibtex] [pdf] [slides]