Combining corpus and machine-readable dictionary data for building bilingual lexicons

Klavans, Judith L.; Tzoukermann, Evelyne

This paper describes and discusses some theoretical and practical problems arising from developing a system to combine the structured but incomplete information from machine readable dictionaries (MRDs) with the unstructured but more complete information available in corpora for the creation of a bilingual lexical data base, presenting a methodology to integrate information from both sources into a single lexical data structure. The BICORD system (BIlingual CORpus-enhanced Dictionaries) involves linking entries in Collins English-French and French-English bilingual dictionary with a large English-French and French-English bilingual corpus. We have concentrated on the class of action verbs of movement, building on earlier work on lexical correspondences specific to this verb class between languages (Klavans and Tzoukermann, 1989), (Klavans and Tzoukermann, 1990a), (Klavans and Tzoukermann, 1990b). We first examine the way prototypical verbs of movement are translated in the Collins-Robert (Atkins, Duval, and Milne, 1978) bilingual dictionary, and then analyze the behavior of some of these verbs in a large bilingual corpus. We incorporate the results of linguistic research on the theory of verb types to motivate corpus analysis coupled with data from MRDs for the purpose of establishing lexical correspondences with the full range of associated translations, and with statistical data attached to the relevant nodes.


  • thumnail for klavans_tzoukermann_96.pdf klavans_tzoukermann_96.pdf application/pdf 167 KB Download File

Also Published In

Machine Translation

More About This Work

Academic Units
Computer Science
Published Here
April 26, 2013