Academic Commons

Reports

Partitioned Blockmap Indexes for Multidimensional Data Access

Ross, Kenneth A.; Sitaridi, Evangelia

Given recent increases in the size of main memory in modern machines, it is now common to to store large data sets in RAM for faster processing. Multidimensional access methods aim to provide efficient access to large data sets when queries apply predicates to some of the data dimensions. We examine multidimensional access methods in the context of an in-memory column store tuned for on-line analytical processing or scientific data analysis. We propose a multidimensional data structure that contains a novel combination of a grid array and several bitmaps. The base data is clustered in an order matching that of the index structure. The bitmaps contain one bit per block of data, motivating the term "blockmap." The proposed data structures are compact, typically taking less than one bit of space per row of data. Partition boundaries can be chosen in a way that reflects both the query workload and the data distribution, and boundaries are not required to evenly divide the data if there is a bias in the query distribution. We examine the theoretical performance of the data structure and experimentally measure its performance on three modern CPUs and one GPU processor. We demonstrate that efficient multidimensional access can be achieved with minimal space overhead.

Subjects

Files

More About This Work

Academic Units
Computer Science
Publisher
Department of Computer Science, Columbia University
Series
Columbia University Computer Science Technical Reports, CUCS-006-12
Published Here
August 1, 2012
Academic Commons provides global access to research and scholarship produced at Columbia University, Barnard College, Teachers College, Union Theological Seminary and Jewish Theological Seminary. Academic Commons is managed by the Columbia University Libraries.