Partitioned Blockmap Indexes for Multidimensional Data Access

Ross, Kenneth A.; Sitaridi, Evangelia

Given recent increases in the size of main memory in modern machines, it is now common to to store large data sets in RAM for faster processing. Multidimensional access methods aim to provide efficient access to large data sets when queries apply predicates to some of the data dimensions. We examine multidimensional access methods in the context of an in-memory column store tuned for on-line analytical processing or scientific data analysis. We propose a multidimensional data structure that contains a novel combination of a grid array and several bitmaps. The base data is clustered in an order matching that of the index structure. The bitmaps contain one bit per block of data, motivating the term "blockmap." The proposed data structures are compact, typically taking less than one bit of space per row of data. Partition boundaries can be chosen in a way that reflects both the query workload and the data distribution, and boundaries are not required to evenly divide the data if there is a bias in the query distribution. We examine the theoretical performance of the data structure and experimentally measure its performance on three modern CPUs and one GPU processor. We demonstrate that efficient multidimensional access can be achieved with minimal space overhead.



More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-006-12
Published Here
August 1, 2012