Academic Commons

Theses Doctoral

A Nearest-Neighbor Approach to Indicative Web Summarization

Petinot, Yves

Through their role of content proxy, in particular on search engine result pages, Web summaries play an essential part in the discovery of information and services on the Web. In their simplest form, Web summaries are snippets based on a user-query and are obtained by extracting from the content of Web pages. The focus of this work, however, is on indicative Web summarization, that is, on the generation of summaries describing the purpose, topics and functionalities of Web pages. In many scenarios — e.g. navigational queries or content-deprived pages — such summaries represent a valuable commodity to concisely describe Web pages while circumventing the need to produce snippets from inherently noisy, dynamic, and structurally complex content. Previous approaches have identified linking pages as a privileged source of indicative content from which Web summaries may be derived using traditional extractive methods. To be reliable, these approaches require sufficient anchortext redundancy, ultimately showing the limits of extractive algorithms for what is, fundamentally, an abstractive task. In contrast, we explore the viability of abstractive approaches and propose a nearest-neighbors summarization framework leveraging summaries of conceptually related (neighboring) Web pages. We examine the steps that can lead to the reuse and adaptation of existing summaries to previously unseen pages. Specifically, we evaluate two Text-to-Text transformations that cover the main types of operations applicable to neighbor summaries: (1) ranking, to identify neighbor summaries that best fit the target; (2) target adaptation, to adjust individual neighbor summaries to the target page based on neighborhood-specific template-slot models. For this last transformation, we report on an initial exploration of the use of slot-driven compression to adjust adapted summaries based on the confidence associated with token-level adaptation operations. Overall, this dissertation explores a new research avenue for indicative Web summarization and shows the potential value, given the diversity and complexity of the content of Web pages, of transferring, and, when necessary, of adapting, existing summary information between conceptually similar Web pages.

Files

  • thumnail for Petinot_columbia_0054D_13267.pdf Petinot_columbia_0054D_13267.pdf binary/octet-stream 2.43 MB Download File

More About This Work

Academic Units
Computer Science
Thesis Advisors
McKeown, Kathleen
Degree
Ph.D., Columbia University
Published Here
May 6, 2016
Academic Commons provides global access to research and scholarship produced at Columbia University, Barnard College, Teachers College, Union Theological Seminary and Jewish Theological Seminary. Academic Commons is managed by the Columbia University Libraries.