Parallel Probing of Web Databases for Top-k Query Processing

Marian, Amelie; Gravano, Luis

A top-k query specifies a set of preferred values for the attributes of a relation and expects as a result the k objects that are closest to the given preferences according to some distance function. In many web applications, the relation attributes are only available via probes to autonomous web-accessible sources. Probing these sources sequentially to process a top-k query is inefficient, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. These characteristics of web sources motivate the introduction of parallel top-k query processing strategies, which are the focus of this paper. We present efficient techniques that maximize source-access parallelism to minimize query response time, while satisfying source access constraints. A thorough experimental evaluation over both synthetic and real web sources shows that our techniques can be significantly more efficient than previously proposed sequential strategies. In addition, we adapt our parallel algorithms for the alternate optimization goal of minimizing source load while still exploiting source-access parallelism.



More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-008-03
Published Here
April 26, 2011