2009 Reports
Rank-Aware Subspace Clutering for Structured Datasets
In online applications such as Yahoo! Personals and Trulia.com users define structured profiles in order to find potentially interesting matches. Typically, profiles are evaluated against large datasets and produce thousands of matches. In addition to filtering, users also specify ranking in their profile, and matches are returned in the form of a ranked list. Top results in ranked lists are typically homogeneous, which hinders data exploration. For example, a user looking for 1- or 2-bedroom apartments sorted by price will see a large number of cheap 1-bedrooms in undesirable neighborhoods before seeing any apartment with different characteristics. An alternative to ranking is to group matches on common attribute values (e.g., cheap 1-bedrooms in good neighborhoods, 2-bedrooms with 2 baths). However, not all groups will be of interest to the user given the ranking criteria. We argue here that neither single-list ranking nor attribute-based grouping is adequate for effective exploration of ranked datasets. We formalize rank-aware clustering and develop a novel rank-aware bottom-up subspace clustering algorithm. We evaluate the performance of our algorithm over large datasets from a leading online dating site, and present an experimental evaluation of its effectiveness.
Subjects
Files
-
cucs-043-09.pdf application/pdf 3.29 MB Download File
More About This Work
- Academic Units
- Computer Science
- Publisher
- Department of Computer Science, Columbia University
- Series
- Columbia University Computer Science Technical Reports, CUCS-043-09
- Published Here
- July 16, 2010