Approximate String Joins in a Database (Almost) for Free -- Erratum

Luis Gravano; H. V. Jagadish; Panagiotis G. Ipeirotis; Divesh Srivastava; Nick Koudas; S. Muthukrishnan

Approximate String Joins in a Database (Almost) for Free -- Erratum
Gravano, Luis
Jagadish, H. V.
Ipeirotis, Panagiotis G.
Srivastava, Divesh
Koudas, Nick
Muthukrishnan, S.
Technical reports
Computer Science
Permanent URL:
Columbia University Computer Science Technical Reports
Part Number:
Department of Computer Science, Columbia University
Publisher Location:
New York
In [GIJ+01a, GIJ+01b] we described how to use q-grams in an RDBMS to perform approximate string joins. We also showed how to implement the approximate join using plain SQL queries. Specifically, we described three filters, count filter, position filter, and length filter, which can be used to execute efficiently the approximate join. The intuition behind the count filter was that strings that are similar have many q-grams in common. In particular, two strings s1 and s2 can have up to maxfjs1j; js2jg + q ¡ 1 common q-grams. When s1 = s2, they have exactly that many q-grams in common. When s1 and s2 are within edit distance k, they share at least (maxfjs1j; js2jg + q ¡ 1) ¡ kq q-grams, since kq is the maximum numbers of q-grams that can be affected by k edit distance operations.
Computer science
Item views:
text | xml
Suggested Citation:
Luis Gravano, H. V. Jagadish, Panagiotis G. Ipeirotis, Divesh Srivastava, Nick Koudas, S. Muthukrishnan, 2003, Approximate String Joins in a Database (Almost) for Free -- Erratum, Columbia University Academic Commons, http://hdl.handle.net/10022/AC:P:29184.

In Partnership with the Center for Digital Research and Scholarship at Columbia University Libraries/Information Services | Terms of Use | Copyright