Academic Commons

Articles

Improved Arabic-to-English statistical machine translation by reordering post-verbal subjects for word alignment

Carpuat, Marine; Marton, Yuval; Habash, Nizar Y.

We study challenges raised by the order of Arabic verbs and their subjects in statistical machine translation (SMT). We show that the boundaries of post-verbal subjects (VS) are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. In addition, VS constructions have highly ambiguous reordering patterns when translated to English, and these patterns are very different for matrix (main clause) VS and non-matrix (subordinate clause) VS. Based on this analysis, we propose a novel method for leveraging VS information in SMT: we reorder VS constructions into pre-verbal (SV) order for word alignment. Unlike previous approaches to source-side reordering, phrase extraction and decoding are performed using the original Arabic word order. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline. Limiting reordering to matrix VS yields further improvements.
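The core idea in the abstract (reorder VS to SV only for word alignment, then keep the original Arabic order for phrase extraction and decoding) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not taken from the paper: the function names `reorder_vs_to_sv` and `project_alignment`, the placeholder tokens, and the simplification that the subject span is known and follows the verb. The actual system relies on a dependency parser to find VS constructions and on a statistical word aligner, neither of which is modeled here.

```python
def reorder_vs_to_sv(tokens, verb_idx, subj_start, subj_end):
    """Move the subject span [subj_start, subj_end) in front of the verb.

    Returns (reordered_tokens, perm), where perm[new_pos] is the position
    the token had in the original sentence. Hypothetical helper.
    """
    if not (0 <= verb_idx < subj_start < subj_end <= len(tokens)):
        raise ValueError("expected verb before a non-empty subject span")
    order = (
        list(range(verb_idx))                  # material before the verb
        + list(range(subj_start, subj_end))    # subject moved up front
        + list(range(verb_idx, subj_start))    # verb (and anything up to the subject)
        + list(range(subj_end, len(tokens)))   # rest of the sentence
    )
    return [tokens[i] for i in order], order


def project_alignment(links, perm):
    """Map (reordered_source_pos, target_pos) links back to original source positions."""
    return sorted((perm[src], tgt) for src, tgt in links)


# Placeholder source sentence in verb-subject order (illustrative only).
source = ["verb", "subj_a", "subj_b", "obj"]
reordered, perm = reorder_vs_to_sv(source, verb_idx=0, subj_start=1, subj_end=3)

# Pretend an aligner produced monotone links on the SV-ordered sentence.
links_on_reordered = [(0, 0), (1, 1), (2, 2), (3, 3)]
links_on_original = project_alignment(links_on_reordered, perm)

print(reordered)
print(links_on_original)
```

In this sketch, the projected links index into the untouched source sentence, so any downstream phrase extraction would see the original word order, which is the property the abstract highlights as the difference from earlier source-side reordering approaches.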

Files

  • 10.1007_s10590-011-9112-y.pdf (application/pdf, 310 KB)

Also Published In

Title: Machine Translation
DOI: https://doi.org/10.1007/s10590-011-9112-y

More About This Work

Academic Units: Computer Science
Published Here: April 24, 2013