A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups

Fung, Pascale; McKeown, Kathleen

Technical-term translation represents one of the most difficult tasks for human translators since (1) most translators are not familiar with terms and domain-specific terminology and (2) such terms are not adequately covered by printed dictionaries. This paper describes an algorithm for translating technical words and terms from noisy parallel corpora across language groups. Given any word which is part of a technical term in the source language, the algorithm produces a ranked candidate match for it in the target language. Potential translations for the term are compiled from the matched words and are also ranked. We show how this ranked list helps translators in technical-term translation. Most algorithms for lexical and term translation focus on Indo-European language pairs, and most use a sentence-aligned clean parallel corpus without insertion, deletion or OCR noise. Our algorithm is language- and character-set-independent, and is robust to noise in the corpus. We show how our algorithm requires minimum preprocessing and is able to obtain technical-word translations without sentence-boundary identification or sentence alignment, from the English–Japanese awk manual corpus with noise arising from text insertions or deletions and on the English–Chinese HKUST bilingual corpus. We obtain a precision of 55.35% from the awk corpus for word translation including rare words, counting only the best candidate and direct translations. Translation precision of the best-candidate translation is 89.93% from the HKUST corpus. Potential term translations produced by the program help bilingual speakers to get a 47% improvement in translating technical terms.


Also Published In

Machine Translation

More About This Work

Academic Units
Computer Science
Published Here
April 26, 2013