Pengyuan Li, Ph.D. Student
I received my B.Sc. in 2013 from Chongqing University and my M.S. degree in Computer Science in 2016 from Illinois Institute of Technology. I started my Ph.D. at the IIT DBGroup in 2018.
Teaching
I have been a TA for the following courses:
- 2018 Fall: -
- 2018 Fall: -
Research Projects
I am involved in the following research projects:
- GProM - A database-independent middleware for computing the provenance of queries, updates, and transactions
Collaborators
I am collaborating with:
- Dieter Gawlick - Oracle
- Oliver Kennedy - SUNY Buffalo
- Vasudha Krishnaswamy - Oracle
- Venkatesh Radhakrishnan
- Zhen Hua Liu - Oracle
Publications
-
Self-tuning Database Operations by Assessing the Importance of Data
Boris Glavic, Pengyuan Li and Ziyu Liu
Technical Report #IIT/CS-DB-2023-01
Illinois Institute of Technology.
@techreport{GL23,
  author = {Glavic, Boris and Li, Pengyuan and Liu, Ziyu},
  title = {Self-tuning Database Operations by Assessing the Importance of Data},
  institution = {Illinois Institute of Technology},
  year = {2023},
  number = {IIT/CS-DB-2023-01},
  pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/GL23.pdf},
  projects = {Relevance-based Data Management},
  keywords = {Provenance, Relevance-based Data Management},
  venueshort = {Techreport}
}
-
Oracle PBDS Experiments
Boris Glavic, Xing Niu, Pengyuan Li and Ziyu Liu
Technical Report #IIT/Cs-db-2022-01
Illinois Institute of Technology.
@techreport{GN22,
  author = {Glavic, Boris and Niu, Xing and Li, Pengyuan and Liu, Ziyu},
  title = {Oracle PBDS Experiments},
  institution = {Illinois Institute of Technology},
  year = {2022},
  number = {IIT/Cs-db-2022-01},
  pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/GN22.pdf},
  projects = {Relevance-based Data Management},
  keywords = {Provenance, Relevance-based Data Management},
  venueshort = {Techreport}
}
-
Provenance-based Data Skipping
Xing Niu, Ziyu Liu, Pengyuan Li, Boris Glavic, Dieter Gawlick, Vasudha Krishnaswamy, Zhen Hua Liu and Danica Porobic
Proceedings of the VLDB Endowment. 15, 3 (2021), 451–464.
@article{NL21,
  author = {Niu, Xing and Liu, Ziyu and Li, Pengyuan and Glavic, Boris and Gawlick, Dieter and Krishnaswamy, Vasudha and Liu, Zhen Hua and Porobic, Danica},
  keywords = {Provenance, Data Skipping, Relevance-based Data Management},
  title = {Provenance-based Data Skipping},
  journal = {Proceedings of the VLDB Endowment},
  projects = {Relevance-based Data Management},
  pages = {451 - 464},
  volume = {15},
  issue = {3},
  year = {2021},
  doi = {10.14778/3494124.3494130},
  venueshort = {{PVLDB}},
  pdfurl = {https://vldb.org/pvldb/vol15/p451-niu.pdf},
  longversionurl = {https://arxiv.org/pdf/2104.12815}
}
Database systems use static analysis to determine upfront which data is needed for answering a query and use indexes and other physical design techniques to speed up access to that data. However, for important classes of queries, e.g., HAVING and top-k queries, it is impossible to determine upfront what data is relevant. To overcome this limitation, we develop provenance-based data skipping (PBDS), a novel approach that generates provenance sketches to concisely encode what data is relevant for a query. Once a provenance sketch has been captured, it is used to speed up subsequent queries. PBDS can exploit physical design artifacts such as indexes and zone maps.