Full-Text Indexing Based on Lexical Relations

Frank A. Smadja; Yoelle S. Maarek

Technical reports
Computer Science
Columbia University Computer Science Technical Reports
Department of Computer Science, Columbia University
Publisher Location: New York
In contrast to other kinds of libraries, software libraries need to be conceptually organized. When looking for a component, users care primarily about the functionality of the desired component; implementation details are secondary. Conceptually organized large libraries of software components would therefore enhance software reuse. In this paper, we present GURU, a tool that automatically builds such large software libraries from documented software components. We focus here on GURU's indexing component, which extracts conceptual attributes from natural-language documentation. The indexing method is based on word co-occurrences. It first uses EXTRACT, a co-occurrence knowledge compiler, to extract potential attributes from textual documents. Conceptually relevant collocations are then selected according to their resolving power, which scales down the noise due to context words. This fully automated indexing tool thus goes further than keyword-based tools in understanding a document, without the brittleness of knowledge-based tools. The indexing component of GURU is fully implemented, and some results are given in the paper.
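The two-step idea in the abstract (collect word co-occurrences, then rank them by a weight that discounts common context words) can be illustrated with a minimal Python sketch. It is not GURU or EXTRACT: the five-word window, the small stop-word list, and the score used here (pair frequency in the document times the summed information content of the two words, estimated from a reference corpus with add-one smoothing) are all assumptions chosen for illustration, and the function names are hypothetical. The abstract does not define resolving power, so the score is only a stand-in for it.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real indexer would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is",
              "are", "for", "on", "with", "that", "this", "it", "as", "by"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS]


def cooccurrences(tokens, window=5):
    """Count unordered pairs of distinct words at most `window` tokens apart."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1:i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


def score_pairs(document, corpus, window=5):
    """Rank the document's word pairs by an assumed stand-in score.

    score = pair count in the document
            * (information of word 1 + information of word 2),
    where information is -log2 of the word's smoothed corpus probability.
    Frequent context words carry little information, so pairs built
    from them are scaled down.
    """
    corpus_counts = Counter(t for doc in corpus for t in tokenize(doc))
    total = sum(corpus_counts.values())

    def info(word):
        # Add-one smoothing keeps the value finite for unseen words.
        return -math.log2((corpus_counts[word] + 1) / (total + 1))

    pairs = cooccurrences(tokenize(document), window)
    scored = {pair: count * (info(pair[0]) + info(pair[1]))
              for pair, count in pairs.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
```

Called as `score_pairs(doc_text, list_of_corpus_texts)`, the sketch returns `(pair, score)` tuples in descending score order; the top few pairs would serve as candidate index attributes for the document.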
Computer science
Suggested Citation:
Frank A. Smadja, Yoelle S. Maarek, 1989, Full-Text Indexing Based on Lexical Relations, Columbia University Academic Commons, http://hdl.handle.net/10022/AC:P:12213.
