Presentations (Communicative Events)

Tackling the Internet Glossary Glut: Automatic Extraction and Evaluation of Genus Phrases

Klavans, Judith L.; Popper, Samuel; Passonneau, Rebecca

This paper addresses the problem of identifying and extracting meaningful semantic components from large online glossaries. We present two sets of results. First, we report on ParseGloss, an algorithm that analyzes definitions and extracts the main concept, or genus phrase; we ran the system on over 12,000 online glossary entries. Second, we present a method for evaluating our results using human judgments on a collection of definitions from six different sources. We discuss our approach to the evaluation process in detail, since creating a standard for evaluation is itself a contribution to the field. Developing the method required addressing the significant challenge of abstracting a single gold standard from multiple naive human judgments on a highly subjective task. Once the method for creating the standard was in place, we established the gold standard data and ran ParseGloss over this controlled collection of definitions. Our first set of results reports system performance in terms of precision and recall. Our second set of results concerns techniques for determining agreement between human subjects. Success with the ParseGloss algorithm will contribute to the automatic creation of ontologies.
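
As a rough illustration of the evaluation described in the abstract, the following sketch derives a gold standard from several human judgments and scores extracted genus phrases against it. It is not the authors' method: the majority-vote rule, the `min_agree` threshold, the exact-match comparison after lowercasing, and the names `majority_gold` and `precision_recall` are all assumptions made for the example.

```python
from collections import Counter


def majority_gold(judgments, min_agree=2):
    """Build a gold standard by majority vote (an assumed rule, not the paper's).

    judgments maps a definition id to the list of genus phrases chosen by
    the human judges. A definition enters the gold standard only when at
    least min_agree judges chose the same (case-insensitive) phrase.
    """
    gold = {}
    for def_id, phrases in judgments.items():
        normalized = [p.strip().lower() for p in phrases]
        phrase, count = Counter(normalized).most_common(1)[0]
        if count >= min_agree:
            gold[def_id] = phrase
    return gold


def precision_recall(system, gold):
    """Score system output against the gold standard by exact match.

    system maps a definition id to the genus phrase the extractor produced.
    Only definitions present in the gold standard are scored.
    """
    attempted = [d for d in system if d in gold]
    correct = sum(1 for d in attempted if system[d].strip().lower() == gold[d])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: three definitions, three judges each.
judgments = {
    "d1": ["device", "Device", "tool"],
    "d2": ["method", "process", "technique"],  # no agreement, so excluded
    "d3": ["process", "process", "process"],
}
gold = majority_gold(judgments)
system = {"d1": "device", "d2": "method"}
precision, recall = precision_recall(system, gold)
```

Exact string matching is the strictest possible comparison; a task this subjective would likely need a looser notion of a match, such as head-word overlap, which is left out here for brevity.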

More About This Work

Academic Units: Computer Science
Publisher: SIGIR'03 Workshop on Semantic Web
Published Here: May 17, 2013