Reports

Serving datacube Tuples from Main Memory

Ross, Kenneth A.; Zaman, Kazi A.

Datacubes compute aggregates of a set of database records at a variety of different granularities. For large datasets with many dimensions, the complete datacube may be very large. In order to support on-line access to datacube results, one would like to perform some precomputation to enhance query performance. Existing schemes materialize selected datacube tuples on disk, choosing the most beneficial cuboids (i.e., combinations of dimensions) to materialize given a space limit. However, in the context of a data-warehouse receiving frequent "append" updates to the database, the cost of keeping these disk-resident cuboids up-to-date can be high. In this paper, we propose a main memory based framework which provides rapid response to queries and requires considerably less maintenance cost than a disk based scheme in an append-only environment. We materialize in main memory (a) selected coarse-granularity tuples of the datacube, and (b) all tuples at the finest level of granularity of the cube. Our approach is limited to applications in which the finest granularity tuples of the datacube fit in main memory. We argue that there are important applications that meet this requirement. Further, as main memory sizes grow over the coming years, more and more applications will meet this requirement. For a given datacube query, we first look among our coarse-granularity tuples to see if we have a direct answer for the query. If so, that answer is returned directly. If not, we use a hash based scheme reminiscent of partial match retrieval to rapidly compute the answer to the query from the finest-level data without having to scan all of the base tuples. Our in-memory data structures allow for rapid updates in response to the appearance of new base data. We show how to choose which coarse-level tuples to precompute, and how to select partial-match keys to minimize the effects of skew. We present analytical and experimental results demonstrating the benefits of our approach.

Subjects

Files

More About This Work

Academic Units
Computer Science
Publisher
Department of Computer Science, Columbia University
Series
Columbia University Computer Science Technical Reports, CUCS-014-00
Published Here
April 22, 2011