2009-09-11
Balancing Vectorized Query Execution with Bandwidth-Optimized Storage
Publication
With the performance of modern computers improving at a rapid pace, database
technology struggles to fully exploit the benefits that each new hardware
generation brings. This has caused a significant performance gap between
general-purpose databases and specialized, application-optimized solutions for
large-volume computation-intensive processing problems, as found in areas including
information retrieval, scientific data management and decision support.
This thesis attempts to advance the state of the art in architecture-conscious
database research, both in the query execution layer and in the data storage
layer, and in the way these work together. Thus, rather than focusing on
an isolated problem or algorithm, the thesis presents a new database system
architecture, realized in the MonetDB/X100 prototype, that combines a coherent
set of new architecture-conscious techniques that are designed to work well
together.
The motivation for the new query execution layer comes from the analysis
of the problems of two popular approaches to query processing: tuple-at-a-time
operator pipelining, used in most existing systems, and column-at-a-time materializing
operators, found in MonetDB. MonetDB/X100 proposes a new vectorized
in-cache execution model that exploits ideas from both approaches and combines
the scalability of the former with the high-performance bulk processing of
the latter. This is achieved by modifying the traditional operator pipeline model
to operate on cache-resident vectors of data using highly optimized primitive
functions. Additionally, within this architecture, a set of hardware-conscious
design and programming techniques is presented, enabling efficient execution
of typical data processing tasks. The resulting query execution layer efficiently
exploits modern super-scalar CPUs and cache-memory systems and achieves
in-memory performance often one or two orders of magnitude higher than the
existing approaches.
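
As a rough illustration of the vectorized model described above, the following minimal Python sketch keeps the operator pipeline but passes small vectors of values between operators instead of single tuples. It is not MonetDB/X100 code: the operator names, the tiny `VECTOR_SIZE`, and the example query are all assumptions chosen for readability, and plain Python lists stand in for cache-resident arrays.

```python
from typing import Iterable, Iterator, List

# Hypothetical vector size; kept tiny here so the behaviour is easy to trace.
VECTOR_SIZE = 4


def scan(column: List[int], vector_size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Produce the column as a stream of fixed-size vectors, not single tuples."""
    for start in range(0, len(column), vector_size):
        yield column[start:start + vector_size]


def map_add_const(vectors: Iterable[List[int]], constant: int) -> Iterator[List[int]]:
    """Primitive-style operator: add a constant to every value of each vector."""
    for vec in vectors:
        yield [v + constant for v in vec]


def select_greater_than(vectors: Iterable[List[int]], threshold: int) -> Iterator[List[int]]:
    """Keep only the values above the threshold; skip vectors that become empty."""
    for vec in vectors:
        selected = [v for v in vec if v > threshold]
        if selected:
            yield selected


def aggregate_sum(vectors: Iterable[List[int]]) -> int:
    """Consume the whole pipeline and sum all remaining values."""
    total = 0
    for vec in vectors:
        total += sum(vec)
    return total


def run_pipeline(column: List[int], constant: int, threshold: int,
                 vector_size: int = VECTOR_SIZE) -> int:
    """Roughly: SELECT SUM(x + constant) WHERE x + constant > threshold."""
    vectors = scan(column, vector_size)
    vectors = map_add_const(vectors, constant)
    vectors = select_greater_than(vectors, threshold)
    return aggregate_sum(vectors)
```

Each operator still pulls from its child, as in a tuple-at-a-time pipeline, but the per-call overhead is paid once per vector, and the inner loops over a vector are the simple bulk loops typical of column-at-a-time processing. The result does not depend on the chosen vector size.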
In the storage area there are two hardware trends that significantly influence
database performance. First, the imbalance between sequential disk bandwidth
and random disk latency continuously increases. As a result, access methods that
rely on random I/O become less attractive, making various forms of sequential
access the preferred option. MonetDB/X100 follows this idea with ColumnBM
– a bandwidth-optimized column store. Secondly, both disk bandwidth and latency
improve significantly more slowly than the computing power of modern CPUs,
especially with the advent of multi-core CPUs. ColumnBM introduces two techniques
that address this issue. Lightweight in-cache compression allows trading
some processor time for an increased perceived disk bandwidth. High decompression
performance is achieved by applying the decompression on the RAM-cache
boundary, providing cache-resident data directly to the execution layer.
Additionally, the introduced family of compression methods provides performance
an order of magnitude higher than previous solutions. Cooperative scans observe
current system activity and dynamically schedule I/O operations to exploit overlapping
demands of different queries. This makes it possible to amortize the cost of disk
access among multiple consumers, and to better utilize the available buffer
space, providing much better performance with many concurrently executing
queries.
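
To make the idea of trading some processor time for perceived disk bandwidth concrete, here is a minimal Python sketch of one generic lightweight scheme, frame-of-reference encoding with fixed-width bit packing. It is a stand-in chosen for illustration and is not claimed to be the compression family introduced in the thesis; the function names and block layout are assumptions, and the input block must be non-empty.

```python
from typing import List, Tuple

# Hypothetical block layout: (base value, bits per offset, value count, packed bytes).
Block = Tuple[int, int, int, bytes]


def for_compress(values: List[int]) -> Block:
    """Frame-of-reference: store the minimum once, then small fixed-width offsets."""
    base = min(values)
    offsets = [v - base for v in values]
    # Width needed for the largest offset; at least 1 bit so the layout stays regular.
    width = max(1, max(offsets).bit_length())
    packed = 0
    for i, off in enumerate(offsets):
        packed |= off << (i * width)
    nbytes = (len(values) * width + 7) // 8
    return base, width, len(values), packed.to_bytes(nbytes, "little")


def for_decompress(block: Block) -> List[int]:
    """Unpack one block back into plain values, e.g. one vector at a time."""
    base, width, count, payload = block
    packed = int.from_bytes(payload, "little")
    mask = (1 << width) - 1
    return [base + ((packed >> (i * width)) & mask) for i in range(count)]
```

For the block `[1000, 1003, 1001, 1007]` the offsets are 0, 3, 1, 7, so each value needs only 3 bits and the payload shrinks to 2 bytes plus the small header. Reading fewer bytes from disk for the same number of values is what raises the perceived bandwidth, at the cost of the shift-and-mask work in `for_decompress`.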
By combining CPU-efficient processing with a bandwidth-optimized storage
facility, MonetDB/X100 has been able to achieve its high in-memory raw
query execution power also on huge disk-resident datasets. We evaluated its
performance both on TPC-H decision support data sets and in the area
of large-volume information retrieval (the Terabyte TREC task), where it successfully
competed with specialized solutions, both for in-memory and disk-based
tasks.
Additional Metadata | |
---|---|
M.L. Kersten (Martin) | |
Universiteit van Amsterdam | |
hdl.handle.net/11245/1.307784 | |
SIKS Dissertation Series ; 2009-30 | |
Organisation | Database Architectures |
Zukowski, M. (2009, September 11). Balancing Vectorized Query Execution with Bandwidth-Optimized Storage. SIKS Dissertation Series. Retrieved from http://hdl.handle.net/11245/1.307784 |