Academic Commons


Improved Arabic-to-English statistical machine translation by reordering post-verbal subjects for word alignment

Carpuat, Marine; Marton, Yuval; Habash, Nizar Y.

We study challenges raised by the order of Arabic verbs and their subjects in statistical machine translation (SMT). We show that the boundaries of post-verbal subjects (VS) are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. In addition, VS constructions have highly ambiguous reordering patterns when translated to English, and these patterns are very different for matrix (main clause) VS and non-matrix (subordinate clause) VS. Based on this analysis, we propose a novel method for leveraging VS information in SMT: we reorder VS constructions into pre-verbal (SV) order for word alignment. Unlike previous approaches to source-side reordering, phrase extraction and decoding are performed using the original Arabic word order. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline. Limiting reordering to matrix VS yields further improvements.
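The core idea of the abstract, reordering VS into SV only for word alignment while keeping the original Arabic order for phrase extraction and decoding, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the placeholder tokens, and the hand-written alignment links (standing in for the output of a word aligner) are all assumptions made for illustration. The sketch assumes a parser has already supplied the verb position and the subject span.

```python
from typing import List, Tuple


def reorder_vs_to_sv(
    tokens: List[str], verb_idx: int, subj_start: int, subj_end: int
) -> Tuple[List[str], List[int]]:
    """Move the post-verbal subject span [subj_start, subj_end) before the verb.

    Returns the reordered tokens and a permutation `perm` where
    perm[new_position] = original_position.
    """
    if not (0 <= verb_idx < subj_start < subj_end <= len(tokens)):
        raise ValueError("expected a subject span located after the verb")
    perm = (
        list(range(0, verb_idx))              # material before the verb
        + list(range(subj_start, subj_end))   # subject, now pre-verbal
        + list(range(verb_idx, subj_start))   # verb (and anything up to the subject)
        + list(range(subj_end, len(tokens)))  # remainder of the sentence
    )
    return [tokens[i] for i in perm], perm


def project_alignment(
    links: List[Tuple[int, int]], perm: List[int]
) -> List[Tuple[int, int]]:
    """Map (reordered_source_idx, target_idx) links back to original source positions."""
    return sorted((perm[s], t) for s, t in links)


# Hypothetical VS sentence with placeholder tokens (not real data).
source = ["VERB", "SUBJ_1", "SUBJ_2", "OBJ"]
reordered, perm = reorder_vs_to_sv(source, verb_idx=0, subj_start=1, subj_end=3)

# Assumed aligner output on the reordered (SV) sentence: a monotone alignment.
links_on_reordered = [(0, 0), (1, 1), (2, 2), (3, 3)]

# Links expressed against the original VS order, ready for phrase extraction.
links_on_original = project_alignment(links_on_reordered, perm)
print(reordered)
print(links_on_original)
```

The design point is that the aligner sees an order closer to English, while every downstream step still receives the original source sentence paired with the projected links.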



  • 10.1007_s10590-011-9112-y.pdf (application/pdf, 310 KB)

Also Published In

Machine Translation

More About This Work

Academic Units
Computer Science
Published Here
April 24, 2013