Cache Conscious Indexing for Decision-Support in Main Memory

Rao, Jun; Ross, Kenneth A.

As random access memory gets cheaper, it becomes increasingly affordable to build computers with large main memories. We consider decision-support workloads within the context of a main memory database system, and consider ways to enhance the performance of query evaluation. Indexes can potentially speed up a variety of operations in a database system. In our context, we are less concerned about incremental updating of the index, since we can rebuild indexes in response to batch updates relatively quickly. Our primary concerns are the time taken for a look-up and the space occupied by the index structure. We study indexing techniques for main memory, including hash indexes, binary search trees, T-trees, B+-trees, interpolation search, and binary search on arrays. At one extreme, hash indexes provide fast look-up but require a lot of space. At the other extreme, binary search on an array requires no additional space beyond the array, but performs relatively poorly: an analysis shows that it has poor reference locality, leading to many cache misses and slow look-up times. Our goal is to provide faster look-up times than binary search by paying better attention to reference locality and cache behavior, without using substantial extra space. We propose a new indexing technique called 'Cache-Sensitive Search Trees' (CSS-trees). Our technique stores a directory structure on top of a sorted array. The directory represents a balanced search tree, itself stored as an array. Nodes in this search tree are designed to have a size matching the cache-line size of the machine. Because we store the tree in an array structure, we do not have to store internal-node pointers; child nodes can be found by performing arithmetic on array offsets.
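The core trick described above, navigating to child nodes by arithmetic on array offsets rather than by following stored pointers, can be illustrated with a minimal sketch. This simplified version stores a single key per node (an implicit balanced binary search tree laid out in an array); the CSS-trees in the paper instead pack enough keys into each node to fill one cache line. All function names here are illustrative, not from the paper.

```python
def build_directory(sorted_keys):
    """Lay out a balanced binary search tree over sorted_keys as a flat
    array with no child pointers: the children of the node at index i
    live at indices 2*i + 1 and 2*i + 2.  An in-order traversal of this
    implicit complete tree visits slots in sorted-key order, so filling
    it from the sorted input yields a valid search tree."""
    n = len(sorted_keys)
    tree = [None] * n          # a complete tree of n nodes fits in n slots
    it = iter(sorted_keys)
    def fill(i):
        if i < n:
            fill(2 * i + 1)    # left subtree gets the smaller keys
            tree[i] = next(it)
            fill(2 * i + 2)    # right subtree gets the larger keys
    fill(0)
    return tree

def search(tree, key):
    """Descend from the root purely by offset arithmetic; no pointers
    are dereferenced, only array indices computed."""
    i = 0
    while i < len(tree):
        if tree[i] == key:
            return True
        i = 2 * i + 1 if key < tree[i] else 2 * i + 2
    return False
```

For example, `build_directory([10, 20, 30, 40, 50, 60])` produces the array `[40, 20, 60, 10, 30, 50]`, and `search` on it finds any of the six keys in at most three probes. Because each level of the implicit tree is contiguous in memory, widening the nodes to hold a cache line's worth of keys (as CSS-trees do) makes every probe touch exactly one cache line.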
We provide an analytic comparison of the algorithms based on their time and space requirements. We have implemented all of the techniques, and present a performance study on two popular modern machines. We demonstrate that with a small space overhead, we can reduce the cost of binary search on the array by more than a factor of two. We also show that our technique dominates B+-trees, T-trees, and binary search trees in terms of both space and time. Our performance graphs show significantly different rankings of the algorithms from those in a 1986 study by Lehman and Carey. The difference is explained by the fact that over the intervening twelve years, the cost of a cache miss relative to that of a CPU instruction cycle has increased by two orders of magnitude. As a result, it is now much more important to design techniques with good cache behavior.



More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-019-98
Published Here
April 25, 2011