An Automatic Physical Design Tool for Clustered Column-stores
Alexander Rasin
DePaul University
Date and Location: Thursday, April 11th, 2013, 12:45pm - 1:45pm,
Stuart Building, Room 111.
Abstract
There has been a significant amount of prior work on automating
physical database design. The goal of an automated designer is to
produce auxiliary structures that speed up user queries, while not
using more than the allotted resource budget (typically disk space).
Most existing research has been done in the context of commercial
row-store databases such as Microsoft SQL Server, IBM DB2, or Oracle.
In fact, every commercial database ships with some sort of tool
that provides design recommendations for the database administrator
to consider.
We have worked extensively on automating the database design process
in a column-store database. In our experiments, we primarily used
Vertica, a commercial column-store database that is based on the
C-Store research prototype. On the surface, it may seem that we are
simply changing the underlying storage system while the problem of
designing the physical structures remains essentially the same.
However, we have found several fundamental differences that turn
this into a new and unsolved problem. Many of the basic axioms used
in row-store design do not hold in a column-store setting (and vice
versa). In this talk, we demonstrate the
construction of an effective design tool and an analytic cost model
for a column-store like C-Store. We show that some techniques from
machine learning, such as clustering, can reduce and simplify this
design problem. To our knowledge, there had been little work on the
problem of physical design in the context of column-stores and none
in the context of column-stores like C-Store or Vertica.
Biography
Alexander Rasin is an Assistant Professor in the College of
Computing and Digital Media (CDM) at DePaul University. He received
his Ph.D. and M.Sc. in Computer Science from Brown University,
Providence. His current research centers on high-performance data
warehouses and large-scale data analytics. Dr. Rasin's other
research interests include resource provisioning and high
availability guarantees in distributed systems.