Dialectal to Standard Arabic Paraphrasing to Improve Arabic-English Statistical Machine Translation

Salloum, Wael Sameer; Habash, Nizar Y.

This paper aims to improve the quality of Arabic-English statistical machine translation (SMT) on highly dialectal Arabic text using morphological knowledge. We present a lightweight rule-based approach that produces Modern Standard Arabic (MSA) paraphrases of dialectal Arabic out-of-vocabulary words and low-frequency words. Our approach extends an existing MSA analyzer with a small number of morphological clitics and transfer rules. The generated paraphrase lattices are input to a state-of-the-art phrase-based SMT system, improving the BLEU score on a blind test set by 0.56 absolute (1.5% relative).

More About This Work

Academic Units: Center for Computational Learning Systems
Publisher: Center for Computational Learning Systems, Columbia University
Series: CCLS Technical Report, CCLS-11-01
Published Here: May 6, 2011