Provenace for Updates and Transactions
In this project we explore how provenance computation can benefit from temporal database techniques. This project is funded by and executed in collaboration with the Oracle corporation. Starting from porting rewrite-based techniques such as the ones used in Perm (Perm) to the Oracle SQL dialect, we will study how to 1) compute the provenance of past query and 2) compute the provenance for updates and transactions. This requires non-trivial extensions to current provenance techniques, because of, e.g., interaction of transactions under lower serialization level. Our solution can retroactively trace transaction provenance as long as an audit log and time travel functionality are available (both are supported by most DBMS). One of the major outcomes so far is the development of the concept of reenactment queries, queries that reenact the effects of a transaction. Reenactment queries are the main enabler of retroactive provenance computation for transactions.
Within this project we have made the following major contributions to provenance management
- Development of MV-relations, a provenance model for queries, updates, and transactional histories that extends the seminal semiring annotation model (defined for queries) with support for updates and transactions.
- Development of reenactment, a declarative replay technique with provenance capture that enables tracking the provenance of a past update or transaction retroactively by executing a query.
- Implementation of provenance tracking for transactions over a standard relational database as part of the GProM system.
MV-relations - A Provenance Model for Transactional Updates
As part of this project we have developed a provenance model that allows tracking the provenance of tuples through queries and transactional updates. In our model, the complete derivation history of a tuple - which update operations derived the tuple and one which inputs of these operations does it depend on - can be encoded in the annotation of the tuple.
Reenactment - Declarative Replay with Provenance Capture
Reenactment is a declarative replay technique that enables a transactional history (or part thereof) to be repeated by executing a so-called reenactment query. We have proven that reenactment queries produce the same result and have the same provenance as the operation(s) they are replaying. Thus, a reenactment query can be used to retroactively compute the provenance of an operation executed some time in the past as long as the database state seen by this operation can be accessed.
Implementation in GProM
To retrieve the provenance of a past update (transaction, or history) we construct its reenactment query based on a log of SQL operations executed in the past (e.g., Oracle's audit log facility). Such an reenactment query needs to be executed over the database state seen by the operation(s) to be replayed. We use time travel to access such past database states. The techniques developed in this project have been integrated in the GProM system, a database independent middleware application for computing provenance.
Collaborators
- Dieter Gawlick - Oracle
- Oliver Kennedy - SUNY Buffalo
- Vasudha Krishnaswamy - Oracle
- Venkatesh Radhakrishnan
- Zhen Hua Liu - Oracle
Funding
- Oracle - Provenance using temporal databases (Extension) (2016 - 2017), $95,829, PIs: Boris Glavic
- Oracle - Provenance using temporal databases (2015 - 2016), $85,000, PIs: Boris Glavic
Publications
-
Efficient Answering of Historical What-if Queries
Felix Campbell, Bahareh Arab and Boris Glavic
Proceedings of the 48th International Conference on Management of Data (SIGMOD) (2022), pp. 1556–1569.@inproceedings{CA22, author = {Campbell, Felix and Arab, Bahareh and Glavic, Boris}, booktitle = {Proceedings of the 48th International Conference on Management of Data ({SIGMOD})}, keywords = {Reenactment; What-if; Uncertainty}, projects = {GProM; Reenactment}, title = {Efficient Answering of Historical What-if Queries}, pages = {1556--1569}, pdfurl = {https://dl.acm.org/doi/pdf/10.1145/3514221.3526138}, doi = {10.1145/3514221.3526138}, longversionurl = {https://arxiv.org/pdf/2203.12860}, video = {https://www.youtube.com/watch?v=6O0InOM-ZbI&t=2s}, venueshort = {SIGMOD}, year = {2022} }
-
Provenance For Transactional Updates
Bahareh Arab
Illinois Institue of Technology.@phdthesis{A19, author = {Arab, Bahareh}, keywords = {Provenance; GProM; Reenactment}, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/A19.pdf}, projects = {GProM}, school = {Illinois Institue of Technology}, title = {{Provenance For Transactional Updates}}, venueshort = {PhD Thesis}, year = {2019} }
Database provenance explains how results are derived by queries. However, many use cases such as auditing and debugging of transactions require understanding of how the current state of a database was derived by a transactional history. We introduce an approach for capturing the provenance of transactions. Our approach does not just work for serializable transactions but also non-serializable transaction such as read committed snapshot isolation (RC-SI). The main drivers of our approach are a provenance model for queries, updates, and transactions and reenactment, a novel technique for retroactively capturing the provenance of tuple versions. We introduce the MV-semirings provenance model for updates and transactions as an extension of the existing semiring provenance model for queries. Our reenactment technique exploits the time travel and audit logging capabilities of modern DBMS to replay parts of a transactional history using queries. Importantly, our technique requires no changes to the transactional workload or underlying DBMS and results in only moderate runtime overhead for transactions. Furthermore, we discuss how our MV-semirings model and reenactment approach can be used to serve a wide variety of applications and use cases including answering of historical what-if queries which determine the effect of hypothetical changes to past operations of a business, post- mortem debugging of transactions, and Provenance-aware Versioned Dataworkspaces (PVDs). We have implemented our approach on top of a commercial DBMS and our experiments confirm that by applying novel optimizations we can efficiently capture provenance for complex transactions over large data sets.
-
Using Reenactment to Retroactively Capture Provenance for Transactions
Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan and Boris Glavic
IEEE Transactions on Knowledge and Data Engineering. 30, 3 (2018) , 599–612.@article{AG17c, author = {Arab, Bahareh and Gawlick, Dieter and Krishnaswamy, Vasudha and Radhakrishnan, Venkatesh and Glavic, Boris}, doi = {10.1109/TKDE.2017.2769056}, journal = {IEEE Transactions on Knowledge and Data Engineering}, keywords = {Provenance; GProM; Reenactment; Concurrency Control}, number = {3}, pages = {599--612}, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/AG17c.pdf}, projects = {GProM; Reenactment}, title = {Using Reenactment to Retroactively Capture Provenance for Transactions}, venueshort = {TKDE}, volume = {30}, year = {2018} }
-
Answering Historical What-if Queries with Provenance, Reenactment, and Symbolic Execution
Bahareh Arab and Boris Glavic
Proceedings of the 8th USENIX Workshop on the Theory and Practice of Provenance (2017).@inproceedings{AG17b, author = {Arab, Bahareh and Glavic, Boris}, booktitle = {Proceedings of the 8th USENIX Workshop on the Theory and Practice of Provenance}, isworkshop = {true}, keywords = {Provenance;Reenactment;What-if}, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/AG17b.pdf}, projects = {GProM;Reenactment}, title = {Answering Historical What-if Queries with Provenance, Reenactment, and Symbolic Execution}, venueshort = {TaPP}, year = {2017}, bdsk-url-1 = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/AG17b.pdf} }
-
Debugging Transactions and Tracking their Provenance with Reenactment
Xing Niu, Boris Glavic, Seokki Lee, Bahareh Arab, Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy, Su Feng and Xun Zou
Proceedings of the VLDB Endowment (Demonstration Track). 10, 12 (2017) , 1857–1860.@article{NG17, author = {Niu, Xing and Glavic, Boris and Lee, Seokki and Arab, Bahareh and Gawlick, Dieter and Liu, Zhen Hua and Krishnaswamy, Vasudha and Feng, Su and Zou, Xun}, journal = {Proceedings of the VLDB Endowment (Demonstration Track)}, keywords = {Provenance; GProM; Reenactment; Debugging; Concurrency Control; Reenactment}, number = {12}, pages = {1857--1860}, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/XG17.pdf}, projects = {GProM; Reenactment}, title = {Debugging Transactions and Tracking their Provenance with Reenactment}, venueshort = {PVLDB}, volume = {10}, year = {2017}, bdsk-url-1 = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/XG17.pdf} }
-
Reenactment for Read-Committed Snapshot Isolation
Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan and Boris Glavic
Proceedings of the 25th ACM International Conference on Information and Knowledge Management (2016), pp. 841–850.@inproceedings{AG17, author = {Arab, Bahareh and Gawlick, Dieter and Krishnaswamy, Vasudha and Radhakrishnan, Venkatesh and Glavic, Boris}, booktitle = {Proceedings of the 25th ACM International Conference on Information and Knowledge Management}, keywords = {Provenance; Concurrency Control; Reenactment; GProM}, pages = {841--850}, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/AG17.pdf}, longversionurl = {https://arxiv.org/pdf/1608.08258}, projects = {GProM; Reenactment}, title = {Reenactment for Read-Committed Snapshot Isolation}, venueshort = {CIKM}, year = {2016} }
-
Formal Foundations of Reenactment and Transaction Provenance
Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan and Boris Glavic
Technical Report #IIT/CS-DB-2016-01
Illinois Institute of Technology.@techreport{AG16a, author = {Arab, Bahareh and Gawlick, Dieter and Krishnaswamy, Vasudha and Radhakrishnan, Venkatesh and Glavic, Boris}, date-added = {2014-09-17 20:07:29 +0000}, date-modified = {2014-09-17 20:09:08 +0000}, institution = {Illinois Institute of Technology}, keywords = {Provenance; Concurrency Control; Reenactment; GProM}, number = {IIT/CS-DB-2016-01}, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/AG16.pdf}, projects = {GProM; Reenactment}, title = {Formal Foundations of Reenactment and Transaction Provenance}, venueshort = {Techreport}, year = {2016}, bdsk-url-1 = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/AG16.pdf} }
-
Reenactment for Read-Committed Snapshot Isolation (long version)
Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan and Boris Glavic
Illinois Institute of Technology.@techreport{AG17a, author = {Arab, Bahareh and Gawlick, Dieter and Krishnaswamy, Vasudha and Radhakrishnan, Venkatesh and Glavic, Boris}, institution = {Illinois Institute of Technology}, keywords = {Provenance; Concurrency Control; Reenactment; GProM}, pdfurl = {http://cs.iit.edu/%7Edbgroup/assets/pdfpubls/AG16a.pdf}, projects = {GProM; Reenactment}, title = {Reenactment for Read-Committed Snapshot Isolation (long version)}, venueshort = {Techreport}, year = {2016} }
-
A Generic Provenance Middleware for Database Queries, Updates, and Transactions
Bahareh Arab, Dieter Gawlick, Venkatesh Radhakrishnan, Hao Guo and Boris Glavic
Proceedings of the 6th USENIX Workshop on the Theory and Practice of Provenance (2014).@inproceedings{AG14, author = {Arab, Bahareh and Gawlick, Dieter and Radhakrishnan, Venkatesh and Guo, Hao and Glavic, Boris}, booktitle = {Proceedings of the 6th USENIX Workshop on the Theory and Practice of Provenance}, isworkshop = {true}, keywords = {Reenactment; Provenance; Concurrency Control; GProM}, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/AG14.pdf}, projects = {GProM}, slideurl = {http://www.slideshare.net/lordPretzel/tapp-2014-talk-boris}, title = {A Generic Provenance Middleware for Database Queries, Updates, and Transactions}, venueshort = {TaPP}, year = {2014}, bdsk-url-1 = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/AG14.pdf} }
We present an architecture and prototype implementation for a generic provenance database middleware (GProM) that is based on the concept of query rewrites, which are applied to an algebraic graph representation of database operations. The system supports a wide range of provenance types and representations for queries, updates, transactions, and operations spanning multiple transactions. GProM supports several strategies for provenance generation, e.g., on-demand, rule-based, and “always on”. To the best of our knowledge, we are the first to present a solution for computing the provenance of concurrent database transactions. Our solution can retroactively trace transaction provenance as long as an audit log and time travel functionality are available (both are supported by most DBMS). Other noteworthy features of GProM include: extensibility through a declarative rewrite rule specification language, support for multiple database backends, and an optimizer for rewritten queries.
-
Reenacting Transactions to Compute their Provenance
Bahareh Arab, Dieter Gawlick, Vasudha Krishnaswamy, Venkatesh Radhakrishnan and Boris Glavic
Technical Report #IIT/CS-DB-2014-02
Illinois Institute of Technology.@techreport{AG14a, author = {Arab, Bahareh and Gawlick, Dieter and Krishnaswamy, Vasudha and Radhakrishnan, Venkatesh and Glavic, Boris}, date-added = {2014-09-17 20:07:29 +0000}, date-modified = {2014-09-17 20:09:08 +0000}, institution = {Illinois Institute of Technology}, keywords = {Provenance; Concurrency Control; Reenactment; GProM}, number = {IIT/CS-DB-2014-02}, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/AD14.pdf}, projects = {GProM; Reenactment}, title = {Reenacting Transactions to Compute their Provenance}, venueshort = {Techreport}, year = {2014}, bdsk-url-1 = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/AD14.pdf} }