GProM
GProM is a generic provenance database middleware that computes provenance for SQL queries, updates, and transactions on demand.
Provenance tracking for database operations, i.e., automatically collecting and managing information about the origin of data, has received considerable interest from the database community in the last decade. Efficiently generating and querying provenance is essential for debugging data and queries, evaluating trust measures for data, defining new types of access control models, auditing, and as a supporting technology for data integration and probabilistic databases.
The de facto standard for database provenance is to model provenance as annotations on data and to compute the provenance for the outputs of an operation by propagating annotations. Many provenance systems use a relational encoding of provenance annotations. These systems apply query rewrite techniques to transform a query q into a query that propagates input annotations to produce the result of q annotated with provenance. This approach has several advantages. It benefits from existing database technology, e.g., provenance computations are optimized by the database optimizer. Queries over provenance can be expressed as SQL queries over the relational encoding. Alternatively, a special-purpose provenance query language can be compiled into SQL queries over such an encoding.
In this project we advance the state of the art in several aspects:
- We have developed the first provenance model and capture mechanism for transactional updates.
- We achieve interoperability with other provenance systems by supporting import and export of provenance represented as PROV-JSON.
- We have developed a cost-based optimizer for provenance instrumentation pipelines.
- We have built GProM, a generic provenance middleware that supports multiple frontend languages and backend databases.
GProM is a database middleware that adds provenance support to multiple database backends. Provenance is information about how data was produced by database operations. That is, for a row in the database or a row returned by a query, we capture from which rows it was derived and by which operations. The system compiles declarative queries with provenance requests into SQL code and executes this SQL code on a backend database system. GProM supports provenance capture for SQL queries and transactions, and produces provenance graphs explaining existing and missing answers for Datalog queries.
Provenance is captured on demand using a compilation technique called instrumentation. Instrumentation rewrites an SQL query (or past transaction) into a regular SQL query that can be executed on any standard relational database and that returns a standard relation mapping rows to their provenance. Provenance for transactions is captured retroactively using a declarative replay technique called reenactment that we have developed at IIT. The reenactment approach was developed in collaboration with Oracle as part of the provenance for temporal databases project.
GProM extends multiple frontend languages (e.g., SQL and Datalog) with provenance requests and can produce code for multiple backends (currently Oracle). Other noteworthy features of GProM include support for multiple database backends and an optimizer for rewritten queries.
System Highlights
- Database independent
- Provenance for queries, updates, and transactions
- User decides when to compute and store provenance
- Supports multiple provenance models
- Retroactive on-demand provenance computation using instrumentation and reenactment
- Only requires audit log and time travel
- No additional runtime and storage overhead
- Non-invasive provenance computation using query instrumentation and annotation propagation
- Provenance import/export
- Heuristic and cost-based optimizations for instrumented queries
Architecture and Approach
An overview of GProM's architecture is shown above. The user interacts
with the system using an extension of one of the supported
Provenance Computation
Similar to Perm (and other systems), we represent provenance information using a relational encoding of provenance annotations. This representation is flexible enough to encode typical database provenance models including PI-CS (and, thus, provenance polynomials), Where- and Why-provenance, and many others. The provenance rewriter module (2) uses provenance-type specific rules to rewrite an input query into a query that propagates annotations to produce an encoding of data annotated with provenance (we call this process instrumentation).
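To make the idea of a relational encoding concrete, the following is a minimal sketch of what an instrumented query could look like for a simple join. It is not GProM's actual output: the tables, the hand-written rewrite, and the `prov_<table>_<column>` naming scheme are assumptions made for illustration, and SQLite stands in for the backend.

```python
import sqlite3

# Toy database; table and column names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp(name TEXT, dept TEXT);
    CREATE TABLE dept(dept TEXT, city TEXT);
    INSERT INTO emp VALUES ('Ann', 'HR'), ('Bob', 'IT'), ('Cy', 'IT');
    INSERT INTO dept VALUES ('HR', 'Chicago'), ('IT', 'Buffalo');
""")

# Original query: which employees work in Buffalo?
original = """
    SELECT e.name
    FROM emp e JOIN dept d ON e.dept = d.dept
    WHERE d.city = 'Buffalo'
    ORDER BY e.name
"""

# Hand-written instrumented version (assumed naming scheme): every result
# row is extended with copies of the input rows it was derived from.
instrumented = """
    SELECT e.name,
           e.name AS prov_emp_name,  e.dept AS prov_emp_dept,
           d.dept AS prov_dept_dept, d.city AS prov_dept_city
    FROM emp e JOIN dept d ON e.dept = d.dept
    WHERE d.city = 'Buffalo'
    ORDER BY e.name
"""

original_rows = conn.execute(original).fetchall()
instrumented_rows = conn.execute(instrumented).fetchall()

for row in instrumented_rows:
    print(row)
```

The instrumented query is an ordinary SQL query, so the backend optimizes and executes it like any other query, and its result can itself be queried with SQL.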
Supporting Past Queries, Updates, and Transactions
One unique feature of GProM is that the system can retroactively compute the provenance of queries, updates, and transactions. This feature requires that a log of database operations is available (we call this an audit log) and that the underlying database system supports time travel, i.e., querying past versions of a relation. These features are available in most database systems or can be added using extensibility mechanisms. An audit log paired with time travel functionality is sufficient for computing the provenance of past queries using simple modifications of standard provenance rewrites. Our main contribution is to demonstrate that this is also sufficient for tracking the provenance of updates and transactions. If the user requests provenance for a transaction T, the transaction reenactor (3) extracts the list of SQL statements executed by T from the audit log and constructs a reenactment query that simulates the effects of these statements. This query runs over the database version valid at the time when the transaction was executed.
We use the provenance rewriter to rewrite the reenactment query into a query that computes the provenance of the reenacted transaction. Note that the construction of the reenactment query is independent of the provenance rewrite and that the reenactment query is a standard relational query. This is possible because the reenactment query and the transaction are annotation-equivalent, i.e., they have the same result and provenance. Using this approach, we can compute any type of provenance for updates, transactions, and across transactions as long as rewrite rules for computing the provenance of queries have been implemented for this provenance type.
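The following sketch illustrates the reenactment idea on a toy transaction. It is an illustration under stated assumptions, not GProM's implementation: the table, the statements, and the hand-written reenactment query are made up, a freshly created SQLite database stands in for a time-travel snapshot, and the statement list stands in for an audit log. The sketch also assumes non-NULL columns, so that `NOT (condition)` correctly describes the rows that survive a `DELETE`.

```python
import sqlite3

def fresh_db():
    """Return a database in the state the transaction saw when it started."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account(id INTEGER NOT NULL, balance INTEGER NOT NULL);
        INSERT INTO account VALUES (1, 50), (2, 200), (3, 0);
    """)
    return conn

# Statements of the past transaction, as they might be read from an audit log.
transaction = [
    "UPDATE account SET balance = balance - 20 WHERE balance >= 100",
    "DELETE FROM account WHERE balance = 0",
    "INSERT INTO account VALUES (4, 10)",
]

# Reference run: actually execute the transaction.
executed = fresh_db()
with executed:  # commits on success
    for stmt in transaction:
        executed.execute(stmt)
executed_rows = executed.execute(
    "SELECT id, balance FROM account ORDER BY id").fetchall()

# Reenactment: one query over the pre-transaction state that simulates
# the three statements without modifying any data.
#   UPDATE -> projection with CASE
#   DELETE -> selection on the negated condition
#   INSERT -> UNION ALL with the new row
reenactment = """
    SELECT id, balance
    FROM (SELECT id,
                 CASE WHEN balance >= 100 THEN balance - 20
                      ELSE balance END AS balance
          FROM account) AS after_update
    WHERE NOT (balance = 0)
    UNION ALL
    SELECT 4, 10
    ORDER BY id
"""
snapshot = fresh_db()  # stand-in for time travel to the transaction's start
reenacted_rows = snapshot.execute(reenactment).fetchall()

print(reenacted_rows == executed_rows)
```

Because the reenactment query is a plain query, the same rewrite rules used for queries can be applied to it to obtain provenance for the transaction.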
Optimizing Rewritten Queries
GProM includes an optimizer (7) which applies heuristic and cost-based rules to transform instrumented queries into SQL code that can be successfully optimized by the backend DBMS. This is necessary because provenance rewrites generate queries with unusual access patterns and operator sequences. Even sophisticated database optimizers are not capable of producing reasonable plans for such queries.
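As a rough illustration of the kind of heuristic rule such an optimizer might apply, the sketch below merges stacked projections in a tiny relational algebra tree before generating SQL. The classes `Table` and `Projection`, the rule `merge_projections`, and the generator `to_sql` are hypothetical names invented for this example; projections are restricted to column renamings to keep it short.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Table:
    name: str

@dataclass
class Projection:
    # Maps each output column name to the input column it copies.
    columns: Dict[str, str]
    child: "Node"

Node = Union[Table, Projection]

def merge_projections(node: Node) -> Node:
    """Collapse chains of projections into a single projection."""
    if isinstance(node, Table):
        return node
    child = merge_projections(node.child)
    if isinstance(child, Projection):
        # Compose the two renamings: outer output -> inner input column.
        composed = {out: child.columns[src]
                    for out, src in node.columns.items()}
        return Projection(composed, child.child)
    return Projection(node.columns, child)

def to_sql(node: Node) -> str:
    """Generate SQL; every unmerged projection becomes a nested subquery."""
    if isinstance(node, Table):
        return node.name
    cols = ", ".join(f"{src} AS {out}" for out, src in node.columns.items())
    if isinstance(node.child, Table):
        return f"SELECT {cols} FROM {to_sql(node.child)}"
    return f"SELECT {cols} FROM ({to_sql(node.child)}) AS sub"

# An instrumented query often stacks projections: the inner one adds
# provenance columns, the outer one keeps only some of them.
inner = Projection(
    {"name": "name", "prov_emp_name": "name", "prov_emp_dept": "dept"},
    Table("emp"))
stacked = Projection({"name": "name", "p_name": "prov_emp_name"}, inner)

merged = merge_projections(stacked)
print(to_sql(stacked))
print(to_sql(merged))
```

The merged tree produces a flat query with no subquery. A cost-based component would go further and compare alternative rewrites using cost estimates, which this sketch does not attempt.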
Database Backends
Support for additional database backends can be added to GProM by implementing new parser, catalog lookup, and SQL code generator plugins. Here we benefit from our backend-independent relational algebra graph representation of queries, because all the remaining functionality, e.g., provenance computation, works on the database-independent algebraic representation of queries.
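The benefit of a backend-independent representation can be sketched with a toy code generator registry. Everything here is hypothetical (the `LimitedScan` operator, the generator functions, and the registry are not GProM's plugin API); the only facts relied on are that SQLite accepts `LIMIT n` and that Oracle 12c and later accept `FETCH FIRST n ROWS ONLY`.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class LimitedScan:
    """Backend-independent operator: the first `limit` rows of a table."""
    table: str
    limit: int

def generate_sqlite(op: LimitedScan) -> str:
    return f"SELECT * FROM {op.table} LIMIT {op.limit}"

def generate_oracle(op: LimitedScan) -> str:
    return f"SELECT * FROM {op.table} FETCH FIRST {op.limit} ROWS ONLY"

# Adding a backend means registering one more generator; code that works
# on the operator itself (e.g., rewrites) does not change.
CODE_GENERATORS: Dict[str, Callable[[LimitedScan], str]] = {
    "sqlite": generate_sqlite,
    "oracle": generate_oracle,
}

op = LimitedScan(table="emp", limit=5)
for backend, generate in CODE_GENERATORS.items():
    print(backend, "->", generate(op))
```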
Provenance Language Extensions
The wiki of the GitHub repository for GProM documents the SQL and Datalog frontend language extensions.
Collaborators
- Dieter Gawlick - Oracle
- Oliver Kennedy - SUNY Buffalo
- Vasudha Krishnaswamy - Oracle
- Venkatesh Radhakrishnan
- Zhen Hua Liu - Oracle
Publications
-
Heuristic and Cost-based Optimization for Diverse Provenance Tasks
Xing Niu, Raghav Kapoor, Boris Glavic, Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy and Venkatesh Radhakrishnan
IEEE Transactions on Knowledge and Data Engineering. 31, 7 (2019), 1267–1280.
-
Provenance For Transactional Updates
Bahareh Arab
PhD Thesis, Illinois Institute of Technology (2019).
Database provenance explains how results are derived by queries. However, many use cases such as auditing and debugging of transactions require understanding of how the current state of a database was derived by a transactional history. We introduce an approach for capturing the provenance of transactions. Our approach works not only for serializable transactions but also for non-serializable transactions, such as those run under read committed snapshot isolation (RC-SI). The main drivers of our approach are a provenance model for queries, updates, and transactions and reenactment, a novel technique for retroactively capturing the provenance of tuple versions. We introduce the MV-semirings provenance model for updates and transactions as an extension of the existing semiring provenance model for queries. Our reenactment technique exploits the time travel and audit logging capabilities of modern DBMS to replay parts of a transactional history using queries. Importantly, our technique requires no changes to the transactional workload or underlying DBMS and results in only moderate runtime overhead for transactions. Furthermore, we discuss how our MV-semirings model and reenactment approach can be used to serve a wide variety of applications and use cases including answering of historical what-if queries, which determine the effect of hypothetical changes to past operations of a business, post-mortem debugging of transactions, and Provenance-aware Versioned Dataworkspaces (PVDs). We have implemented our approach on top of a commercial DBMS and our experiments confirm that by applying novel optimizations we can efficiently capture provenance for complex transactions over large data sets.
-
GProM - A Swiss Army Knife for Your Provenance Needs
Bahareh Arab, Su Feng, Boris Glavic, Seokki Lee, Xing Niu and Qitian Zeng
IEEE Data Engineering Bulletin. 41, 1 (2018), 51–62.
-
Using Reenactment to Retroactively Capture Provenance for Transactions
Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan and Boris Glavic
IEEE Transactions on Knowledge and Data Engineering. 30, 3 (2018), 599–612.
-
Debugging Transactions and Tracking their Provenance with Reenactment
Xing Niu, Boris Glavic, Seokki Lee, Bahareh Arab, Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy, Su Feng and Xun Zou
Proceedings of the VLDB Endowment (Demonstration Track). 10, 12 (2017), 1857–1860.
-
A SQL-Middleware Unifying Why and Why-Not Provenance for First-Order Queries
Seokki Lee, Sven Köhler, Bertram Ludäscher and Boris Glavic
Proceedings of the 33rd IEEE International Conference on Data Engineering (2017), pp. 485–496.
-
Integrating Approximate Summarization with Provenance Capture
Seokki Lee, Xing Niu, Bertram Ludäscher and Boris Glavic
Proceedings of the 8th USENIX Workshop on the Theory and Practice of Provenance (2017).
-
Provenance-aware Query Optimization
Xing Niu, Raghav Kapoor, Boris Glavic, Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy and Venkatesh Radhakrishnan
Proceedings of the 33rd IEEE International Conference on Data Engineering (2017), pp. 473–484.
-
Optimizing Provenance Capture and Queries - Algebraic Transformations and Cost-based Optimization
Xing Niu and Boris Glavic
Technical Report #IIT/CS-DB-2016-02
Illinois Institute of Technology.
-
Formal Foundations of Reenactment and Transaction Provenance
Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan and Boris Glavic
Technical Report #IIT/CS-DB-2016-01
Illinois Institute of Technology.
-
Provenance-aware Versioned Dataworkspaces
Xing Niu, Bahareh Arab, Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy, Oliver Kennedy and Boris Glavic
Proceedings of the 8th USENIX Workshop on the Theory and Practice of Provenance (2016).
-
Reenactment for Read-Committed Snapshot Isolation
Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan and Boris Glavic
Proceedings of the 25th ACM International Conference on Information and Knowledge Management (2016), pp. 841–850.
-
Reenactment for Read-Committed Snapshot Isolation (long version)
Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan and Boris Glavic
Illinois Institute of Technology (2016).
-
Efficiently Computing Provenance Graphs for Queries with Negation
Seokki Lee, Sven Köhler, Bertram Ludäscher and Boris Glavic
Technical Report #IIT/CS-DB-2016-03
Illinois Institute of Technology.
-
Implementing Unified Why- and Why-Not Provenance Through Games
Seokki Lee, Sven Köhler, Bertram Ludäscher and Boris Glavic
Proceedings of the 8th USENIX Workshop on the Theory and Practice of Provenance (Poster) (2016).
-
An Efficient Implementation of Game Provenance in DBMS
Seokki Lee, Yuchen Tang, Sven Köhler, Bertram Ludäscher and Boris Glavic
Technical Report #IIT/CS-DB-2015-02
Illinois Institute of Technology.
-
Heuristic and Cost-based Optimization for Provenance Computation
Xing Niu, Raghav Kapoor and Boris Glavic
Proceedings of the 7th USENIX Workshop on the Theory and Practice of Provenance (Poster) (2015).
-
Interoperability for Provenance-aware Databases using PROV and JSON
Xing Niu, Raghav Kapoor, Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy, Venkatesh Radhakrishnan and Boris Glavic
Proceedings of the 7th USENIX Workshop on the Theory and Practice of Provenance (2015).
-
A Generic Provenance Middleware for Database Queries, Updates, and Transactions
Bahareh Arab, Dieter Gawlick, Venkatesh Radhakrishnan, Hao Guo and Boris Glavic
Proceedings of the 6th USENIX Workshop on the Theory and Practice of Provenance (2014).
We present an architecture and prototype implementation for a generic provenance database middleware (GProM) that is based on the concept of query rewrites, which are applied to an algebraic graph representation of database operations. The system supports a wide range of provenance types and representations for queries, updates, transactions, and operations spanning multiple transactions. GProM supports several strategies for provenance generation, e.g., on-demand, rule-based, and “always on”. To the best of our knowledge, we are the first to present a solution for computing the provenance of concurrent database transactions. Our solution can retroactively trace transaction provenance as long as an audit log and time travel functionality are available (both are supported by most DBMS). Other noteworthy features of GProM include: extensibility through a declarative rewrite rule specification language, support for multiple database backends, and an optimizer for rewritten queries.
-
Reenacting Transactions to Compute their Provenance
Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan and Boris Glavic
Technical Report #IIT/CS-DB-2014-02
Illinois Institute of Technology.