Predictive Dynamic Load Balancing of Parallel Hash-Joins over Heterogeneous Processors in the Presence of Data Skew

Hasanat M. Dewan; Salvatore Stolfo; Mauricio Hernandez; Kui W. Mok

Title:
Predictive Dynamic Load Balancing of Parallel Hash-Joins over Heterogeneous Processors in the Presence of Data Skew
Author(s):
Dewan, Hasanat M.
Stolfo, Salvatore
Hernandez, Mauricio
Mok, Kui W.
Date:
Type:
Technical reports
Department:
Computer Science
Permanent URL:
Series:
Columbia University Computer Science Technical Reports
Part Number:
CUCS-026-94
Publisher:
Department of Computer Science, Columbia University
Publisher Location:
New York
Abstract:
In this paper, we present new algorithms to balance the computation of parallel hash joins over heterogeneous processors in the presence of data skew and external loads. Heterogeneity in our model consists of disparate computing elements, as well as general purpose computing ensembles that are subject to external loading. Data skew appears as significant nonuniformities in the distribution of attribute values of underlying relations that are involved in a join. We develop cost models and predictive dynamic load balancing protocols to detect imbalance during the computation of a single large join. Our algorithms can account for imbalance due to data skew as well as heterogeneity in the computing environment. Significant performance gains are reported for a wide range of test cases on a prototype implementation of the system.
Subject(s):
Computer science
