Thesis Proposal

 

"Incremental Physical Design in Column Store Databases"

Alex Rasin

Monday, November 30, 2009 at 9:30 A.M.

Lubrano Conference Room (CIT 4th floor)

There has been a significant amount of prior work on automating physical database design. The goal of an automated designer is to produce auxiliary structures that speed up user queries without exceeding the allotted resource budget (typically disk space). Most existing research has been done in the context of commercial row store databases such as Microsoft SQL Server or IBM DB2. In fact, every commercial database ships with some sort of tool that can provide design recommendations for the DBA (database administrator) to consider. An automated tool is necessary not simply because a human DBA is not always available, but because the complexity of the design problem is constantly increasing: new auxiliary structures and query processing methods are introduced in existing commercial databases, and more users and queries are being serviced.

We have worked on automating the database design process in a column store database. In our experiments, we primarily used Vertica, a commercial column store database based on C-Store, a research prototype developed at MIT, Brown, and Brandeis. On the surface, it may seem that we are simply changing the underlying storage system while the problem of designing the physical structures remains essentially the same. However, we have found several fundamental differences that turn this into a new and unsolved problem. Many of the basic axioms that are used in a row store environment do not hold in a column store (and vice versa).

In this thesis we demonstrate the construction of an effective design tool for a column store like C-Store and a solution to the problem of incremental database design, i.e., a design in which the automated tool treats the cost of migrating to the target design Dt as yet another budgeted resource. Thus we might opt for an otherwise sub-optimal design Dt' in order to reduce the time spent transitioning to that target design. We show that some techniques from machine learning, such as clustering, can reduce and greatly simplify this design problem. To our knowledge, there has been very little work on the problem of incremental design in the context of row stores and none in the column store context. Although we hope some of our solutions will be applicable in a row store setting, we think that column store databases are much better suited for implementing incremental design due to some of their inherent properties. Finally, we would like to show the benefits (and downsides) of opportunistic design migration, in which the design tool considers using the results of a current user-query computation to further shorten the design migration time.
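
As a rough illustration of treating migration cost as a second budgeted resource, the following minimal sketch picks a set of auxiliary structures by exhaustive search under both a disk budget and a migration-time budget. It is not the method proposed in the thesis: the `Structure` fields, the `best_design` function, the additive benefit model, and all numbers are hypothetical assumptions chosen only to show how a tighter migration budget can lead to an otherwise sub-optimal design.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Structure:
    """A hypothetical auxiliary structure with made-up cost estimates."""
    name: str
    disk_gb: float      # disk space the structure occupies
    build_hours: float  # time to build it if it does not exist yet
    benefit: float      # estimated workload benefit (arbitrary units)


def best_design(candidates, current, disk_budget, migration_budget):
    """Return (names, benefit) of the best subset within both budgets.

    Assumes benefits are additive and independent, which is a
    simplification made only for this illustration.
    """
    best, best_benefit = frozenset(), 0.0
    for r in range(len(candidates) + 1):
        for subset in combinations(candidates, r):
            disk = sum(s.disk_gb for s in subset)
            # Structures already present in the current design cost nothing to reach.
            migrate = sum(s.build_hours for s in subset if s.name not in current)
            benefit = sum(s.benefit for s in subset)
            within_budget = disk <= disk_budget and migrate <= migration_budget
            if within_budget and benefit > best_benefit:
                best, best_benefit = frozenset(s.name for s in subset), benefit
    return best, best_benefit


candidates = [
    Structure("A", disk_gb=10.0, build_hours=5.0, benefit=8.0),
    Structure("B", disk_gb=10.0, build_hours=1.0, benefit=5.0),
    Structure("C", disk_gb=5.0, build_hours=1.0, benefit=3.0),
]

# Disk budget only: migration time is effectively unconstrained.
unconstrained = best_design(candidates, set(), disk_budget=20.0, migration_budget=100.0)
# Same disk budget, but only 2 hours allowed for the transition.
incremental = best_design(candidates, set(), disk_budget=20.0, migration_budget=2.0)
```

With these made-up numbers, the disk-only search selects A and B, while the 2-hour migration budget forces the lower-benefit but quickly reachable combination of B and C. Exhaustive search is used here only for clarity; it does not scale beyond a handful of candidates.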

Host: Stan Zdonik

