Su Feng, Ph.D. Student
I earned my bachelor's degree in Computer Science from Purdue University in 2014 and my master's degree in Computer Science from IIT in May 2016. I joined the IIT DBGroup in Fall 2016.
Research Projects
I am involved in the following research projects:
- GProM - A database-independent middleware for computing the provenance of queries, updates, and transactions.
- Vizier - A framework for user-friendly and effective data curation.
- Uncertainty-Annotated Databases - In this project, we develop a practical, yet principled, approach for managing uncertain data.
Collaborators
I am collaborating with:
- Atri Rudra - SUNY Buffalo
- Dieter Gawlick - Oracle
- Oliver Kennedy - SUNY Buffalo
- Vasudha Krishnaswamy - Oracle
- Venkatesh Radhakrishnan
- Zhen Hua Liu - Oracle
Publications
-
Efficient Approximation of Certain and Possible Answers for Ranking and Window Queries over Uncertain Data
Su Feng, Boris Glavic and Oliver Kennedy
Proceedings of the VLDB Endowment. 16, 6 (2023), 1346–1358.
@article{LL22b, author = {Feng, Su and Glavic, Boris and Kennedy, Oliver}, keywords = {Uncertainty}, projects = {Uncertainty; UA-DB}, title = {Efficient Approximation of Certain and Possible Answers for Ranking and Window Queries over Uncertain Data}, journal = {Proceedings of the VLDB Endowment}, volume = {16}, number = {6}, pages = {1346--1358}, doi = {10.14778/3583140.3583151}, pdfurl = {https://www.vldb.org/pvldb/vol16/p1346-feng.pdf}, longversionurl = {https://arxiv.org/pdf/2302.08676}, year = {2023}, venueshort = {{PVLDB}} }
-
Efficient Management of Uncertain Data
Su Feng
Illinois Institute of Technology.
@phdthesis{F23, venueshort = {PhD Thesis}, author = {Feng, Su}, keywords = {Uncertainty; UA-DB}, month = may, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/F23.pdf}, projects = {GProM; Vizier; UA-DB}, school = {Illinois Institute of Technology}, title = {Efficient Management of Uncertain Data}, year = {2023} }
-
DataSense: Display Agnostic Data Documentation
Poonam Kumari, Michael Brachmann, Oliver Kennedy, Su Feng and Boris Glavic
Proceedings of the 11th Conference on Innovative Data Systems (2021).
@inproceedings{KB21, author = {Kumari, Poonam and Brachmann, Michael and Kennedy, Oliver and Feng, Su and Glavic, Boris}, title = {DataSense: Display Agnostic Data Documentation}, booktitle = {Proceedings of the 11th Conference on Innovative Data Systems}, year = {2021}, projects = {Vizier; UA-DB}, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/KB21.pdf}, venueshort = {CIDR} }
-
Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds
Su Feng, Aaron Huber, Boris Glavic and Oliver Kennedy
Proceedings of the 46th International Conference on Management of Data (2021), pp. 528–540.
@inproceedings{FH21, author = {Feng, Su and Huber, Aaron and Glavic, Boris and Kennedy, Oliver}, booktitle = {Proceedings of the 46th International Conference on Management of Data}, keywords = {UA-DB; Vizier}, pages = {528--540}, doi = {10.1145/3448016.3452791}, pdfurl = {https://dl.acm.org/doi/pdf/10.1145/3448016.3452791}, projects = {Vizier; UA-DB}, video = {https://www.youtube.com/watch?v=si2HUS7idEs&list=PL3xUNnH4TdbsfndCMn02BqAAgGB0z7cwq}, title = {Efficient Uncertainty Tracking for Complex Queries with Attribute-level Bounds}, venueshort = {SIGMOD}, reproducibility = {https://github.com/fengsu91/AUDB_Reproducibility}, longversionurl = {https://arxiv.org/pdf/2102.11796}, year = {2021} }
Incomplete and probabilistic database techniques are principled methods for coping with uncertainty in data. Unfortunately, the class of queries that can be answered efficiently over such databases is severely limited, even when advanced approximation techniques are employed. We introduce attribute-annotated uncertain databases (AU-DBs), an uncertain data model that annotates tuples and attribute values with bounds to compactly approximate an incomplete database. AU-DBs are closed under relational algebra with aggregation using an efficient evaluation semantics. Using optimizations that trade accuracy for performance, our approach scales to complex queries and large datasets, and produces accurate results.
-
Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers
Su Feng, Aaron Huber, Boris Glavic and Oliver Kennedy
Proceedings of the 44th International Conference on Management of Data (2019), pp. 1313–1330.
@inproceedings{FH19, author = {Feng, Su and Huber, Aaron and Glavic, Boris and Kennedy, Oliver}, booktitle = {Proceedings of the 44th International Conference on Management of Data}, keywords = {UA-DB; Vizier}, longversionurl = {https://arxiv.org/pdf/1904.00234}, pages = {1313--1330}, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/FH19.pdf}, reproducibility = {https://github.com/IITDBGroup/UADB_Reproducibility}, projects = {Vizier; UA-DB}, video = {https://av.tib.eu/media/43062}, doi = {10.1145/3299869.3319887}, slideurl = {https://www.slideshare.net/lordPretzel/2019-sigmod-uncertainty-annotated-databases-a-lightweight-approach-for-approximating-certain-answers}, title = {Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers}, venueshort = {SIGMOD}, year = {2019} }
Certain answers are a principled method for coping with uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Thus, users frequently resort to less principled approaches to resolve uncertainty. In this paper, we propose Uncertainty Annotated Databases (UA-DBs), which combine an under- and over-approximation of certain answers to achieve the reliability of certain answers, with the performance of a classical database system. Furthermore, in contrast to prior work on certain answers, UA-DBs achieve a higher utility by including some (explicitly marked) answers that are not certain. UA-DBs are based on incomplete K-relations, which we introduce to generalize the classical set-based notion of incomplete databases and certain answers to a much larger class of data models. Using an implementation of our approach, we demonstrate experimentally that it efficiently produces tight approximations of certain answers that are of high utility.
-
Data Debugging and Exploration with Vizier
Mike Brachmann, Carlos Bautista, Sonia Castelo, Su Feng, Juliana Freire, Boris Glavic, Oliver Kennedy, Heiko Müller, Rémi Rampin, William Spoth and Ying Yang
Proceedings of the 44th International Conference on Management of Data (Demonstration Track) (2019), pp. 1877–1880.
@inproceedings{BB19, author = {Brachmann, Mike and Bautista, Carlos and Castelo, Sonia and Feng, Su and Freire, Juliana and Glavic, Boris and Kennedy, Oliver and M{\"u}ller, Heiko and Rampin, R{\'e}mi and Spoth, William and Yang, Ying}, booktitle = {Proceedings of the 44th International Conference on Management of Data (Demonstration Track)}, date-modified = {2019-04-04 12:25:42 -0500}, keywords = {Vizier}, pages = {1877--1880}, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/BB19.pdf}, projects = {Vizier}, video = {https://www.youtube.com/watch?v=c3ICB-17kRY&t=4s}, doi = {10.1145/3299869.3320246}, title = {Data Debugging and Exploration with Vizier}, venueshort = {SIGMOD}, year = {2019} }
We present Vizier, a multi-modal data exploration and debugging tool. The system supports a wide range of operations by seamlessly integrating Python, SQL, and automated data curation and debugging methods. Using Spark as an execution backend, Vizier handles large datasets in multiple formats. Ease of use is attained through integration of a notebook with a spreadsheet-style interface and with visualizations that guide and support the user in the loop. In addition, native support for provenance and versioning enables collaboration and uncertainty management. In this demonstration, we will illustrate the diverse features of the system using several realistic data science tasks based on real data.
-
GProM - A Swiss Army Knife for Your Provenance Needs
Bahareh Arab, Su Feng, Boris Glavic, Seokki Lee, Xing Niu and Qitian Zeng
IEEE Data Engineering Bulletin. 41, 1 (2018), 51–62.
@article{AF18, author = {Arab, Bahareh and Feng, Su and Glavic, Boris and Lee, Seokki and Niu, Xing and Zeng, Qitian}, bibsource = {dblp computer science bibliography, https://dblp.org}, biburl = {https://dblp.org/rec/bib/journals/debu/ArabFGLNZ17}, journal = {{IEEE} Data Engineering Bulletin}, keywords = {GProM; Provenance; Annotations}, number = {1}, pages = {51--62}, pdfurl = {http://sites.computer.org/debull/A18mar/p51.pdf}, projects = {GProM; Reenactment}, timestamp = {Fri, 02 Mar 2018 18:50:49 +0100}, title = {{GProM} - {A} Swiss Army Knife for Your Provenance Needs}, venueshort = {Data Eng. Bull.}, volume = {41}, year = {2018}, bdsk-url-1 = {http://sites.computer.org/debull/A18mar/p51.pdf} }
-
Debugging Transactions and Tracking their Provenance with Reenactment
Xing Niu, Boris Glavic, Seokki Lee, Bahareh Arab, Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy, Su Feng and Xun Zou
Proceedings of the VLDB Endowment (Demonstration Track). 10, 12 (2017), 1857–1860.
@article{NG17, author = {Niu, Xing and Glavic, Boris and Lee, Seokki and Arab, Bahareh and Gawlick, Dieter and Liu, Zhen Hua and Krishnaswamy, Vasudha and Feng, Su and Zou, Xun}, journal = {Proceedings of the VLDB Endowment (Demonstration Track)}, keywords = {Provenance; GProM; Reenactment; Debugging; Concurrency Control; Reenactment}, number = {12}, pages = {1857--1860}, pdfurl = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/XG17.pdf}, projects = {GProM; Reenactment}, title = {Debugging Transactions and Tracking their Provenance with Reenactment}, venueshort = {PVLDB}, volume = {10}, year = {2017}, bdsk-url-1 = {http://cs.iit.edu/%7edbgroup/assets/pdfpubls/XG17.pdf} }