IIT Database Group

HRDBMS

HRDBMS is a novel distributed relational database that combines the strengths of traditional (distributed) relational databases with ideas from modern distributed dataflow engines such as Hadoop and Spark. This lets HRDBMS draw on years of research on query optimization while also benefiting from the scalability of Big Data platforms. The system was built from the ground up to avoid many of the bottlenecks of SQL-on-Hadoop and SQL-on-Spark systems, as well as the scalability issues of most traditional relational DBMSs. The ultimate goal is a system that combines the per-node performance of relational databases with the scalability of Big Data platforms. Some of the unique and not-so-unique features of HRDBMS are:

  • A cost-based query optimizer
  • A fully parallel and distributed execution engine
  • Support for index structures
  • Automatic caching through a traditional buffer manager
  • Support for efficient disk-based query execution using proven traditional query execution algorithms
  • Support for transactions
  • A non-blocking shuffle implementation
  • Support for horizontal partitioning and locality-aware query processing
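
To make the last feature concrete, here is a minimal sketch of horizontal (hash) partitioning and a join that exploits data locality. It is an illustration of the general technique only, not HRDBMS code: the function names (`hash_partition`, `local_hash_join`, `colocated_join`), the in-memory lists standing in for nodes, and the sample tables are all assumptions made for this example.

```python
from collections import defaultdict

def hash_partition(rows, key, num_nodes):
    """Assign each row to one of num_nodes partitions by hashing its key.

    Hypothetical helper: each inner list stands in for one node's local data.
    """
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key]) % num_nodes].append(row)
    return partitions

def local_hash_join(left, right, key):
    """Classic in-memory hash join: build an index on left, probe with right."""
    index = defaultdict(list)
    for l in left:
        index[l[key]].append(l)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]

def colocated_join(left_parts, right_parts, key):
    """Join two tables that are partitioned the same way on the join key.

    Matching rows are guaranteed to live in the same partition, so every
    node can join its own data without moving rows across the network.
    """
    result = []
    for left_local, right_local in zip(left_parts, right_parts):
        result.extend(local_hash_join(left_local, right_local, key))
    return result

# Hypothetical sample data: both tables are partitioned on cust_id.
customers = [{"cust_id": i, "name": f"c{i}"} for i in range(6)]
orders = [{"cust_id": i % 4, "order_id": 100 + i} for i in range(10)]

customer_parts = hash_partition(customers, "cust_id", num_nodes=3)
order_parts = hash_partition(orders, "cust_id", num_nodes=3)
joined = colocated_join(customer_parts, order_parts, "cust_id")
```

Because both tables use the same partitioning function on the join key, the per-partition joins together produce the same rows as a join over the full tables. When the tables are partitioned on different keys, a locality-aware engine would first have to repartition (shuffle) at least one of them.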

Publications

  1. A High-Performance Distributed Relational Database System for Scalable OLAP Processing
    Jason Arnold, Boris Glavic and Ioan Raicu
    Proceedings of the 33rd IEEE International Parallel and Distributed Processing Symposium (2019).
  2. Improving Data-Shuffle Performance In Data-Parallel Distributed Systems
    Shweelan Samson
    Illinois Institute of Technology.
  3. HRDBMS: A NewSQL Database for Analytics
    Jason Arnold, Boris Glavic and Ioan Raicu
    Proceedings of the IEEE International Conference on Cluster Computing (Poster) (2015).