Academic Commons

Reports

Dictionaries for Language Generation Accounting for Co-occurrence Knowledge

Smadja, Frank A.

Many wording choices in English sentences cannot be accounted for on semantic or syntactic grounds; they can only be expressed in terms of co-occurrence relations. Co-occurrence knowledge has traditionally been overlooked, but it should be included in lexicons because it is an inherent part of the language. In this paper, we demonstrate the importance of co-occurrence knowledge for language generation and show how to include it in computational dictionaries. Including co-occurrence knowledge in the dictionary gives the generator the information it needs to handle many lexical decisions that were previously ignored. We focus here on the process of building the dictionary, and we show how co-occurrence knowledge can be systematically entered in lexicons. Lexical relations are first identified by a co-occurrence compiler, EXTRACT. Domain-specific semantic information is then used as a criterion for classifying them. We exemplify our approach in the banking domain and explain how it can be used by a natural language generator.

More About This Work

Academic Units
Computer Science
Publisher
Department of Computer Science, Columbia University
Series
Columbia University Computer Science Technical Reports, CUCS-418-89
Published Here
December 22, 2011