Translating Collocations for Bilingual Lexicons: A Statistical Approach

McKeown, Kathleen; Smadja, Frank; Hatzivassiloglou, Vasileios

Collocations are notoriously difficult for non-native speakers to translate, primarily because they are opaque and cannot be translated on a word-by-word basis. We describe a program named Champollion which, given a pair of parallel corpora in two different languages and a list of collocations in one of them, automatically produces their translations. Our goal is to provide a tool for compiling bilingual lexical information above the word level in multiple languages, for different domains. The algorithm we use is based on statistical methods and produces p-word translations of n-word collocations in which n and p need not be the same. For example, Champollion translates make...decision, employment equity, and stock market into prendre...décision, équité en matière d'emploi, and bourse respectively. Testing Champollion on three years' worth of the Hansards corpus yielded the French translations of 300 collocations for each year, evaluated at 73% accuracy on average. In this paper, we describe the statistical measures used, the algorithm, and the implementation of Champollion, presenting our results and evaluation.



  • thumnail for p1-Translating_Collocations_for_Bilingual.pdf p1-Translating_Collocations_for_Bilingual.pdf application/pdf 2.7 MB Download File

Also Published In

Journal Computational Linguistics

More About This Work

Academic Units
Computer Science
Published Here
April 8, 2013