Dialectal to Standard Arabic Paraphrasing to Improve Arabic-English Statistical Machine Translation

Wael Sameer Salloum; Nizar Y. Habash

Title:
Dialectal to Standard Arabic Paraphrasing to Improve Arabic-English Statistical Machine Translation
Author(s):
Salloum, Wael Sameer
Habash, Nizar Y.
Date:
Type:
Technical reports
Department:
Center for Computational Learning Systems
Permanent URL:
Series:
CCLS Technical Report
Part Number:
CCLS-11-01
Publisher:
Center for Computational Learning Systems, Columbia University
Publisher Location:
New York
Abstract:
This paper aims to improve the quality of Arabic-English statistical machine translation (SMT) on highly dialectal Arabic text using morphological knowledge. We present a lightweight rule-based approach to producing Modern Standard Arabic (MSA) paraphrases of dialectal Arabic out-of-vocabulary words and low-frequency words. Our approach extends an existing MSA analyzer with a small number of morphological clitics and transfer rules. The generated paraphrase lattices are input to a state-of-the-art phrase-based SMT system, improving BLEU on a blind test set by 0.56 points absolute (1.5% relative).
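The pipeline the abstract describes (select out-of-vocabulary and low-frequency words, generate MSA paraphrases by rule, and pass the alternatives to the decoder as a lattice) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the report: the function names, the `min_count` threshold, the Buckwalter-style transliterated tokens, and the single toy prefix rule (a dialectal future marker `H` rewritten as MSA `s`). The report's actual rules operate through a morphological analyzer, and a real lattice would carry weights; here the lattice is simplified to one list of alternatives per token.

```python
from collections import Counter

# Hypothetical toy rule table (Buckwalter-style transliteration).
# Illustrative only; this is NOT the rule set used in the report.
PREFIX_RULES = {"H": "s"}  # dialectal future marker -> MSA future marker


def needs_paraphrase(word, train_counts, min_count=2):
    """True for out-of-vocabulary words (count 0) and low-frequency words."""
    return train_counts[word] < min_count


def paraphrase(word):
    """Return candidate MSA rewrites produced by the toy prefix rules."""
    candidates = []
    for dialectal, msa in PREFIX_RULES.items():
        if word.startswith(dialectal) and len(word) > len(dialectal):
            candidates.append(msa + word[len(dialectal):])
    return candidates


def build_lattice(tokens, train_counts, min_count=2):
    """Return one list of alternatives per token; the original word is always kept."""
    lattice = []
    for tok in tokens:
        alternatives = [tok]
        if needs_paraphrase(tok, train_counts, min_count):
            alternatives.extend(c for c in paraphrase(tok) if c not in alternatives)
        lattice.append(alternatives)
    return lattice


# Toy "training vocabulary" counts (assumed data, for demonstration only).
train_counts = Counter(["syktb", "syktb", "Alwld", "Alwld", "Aldrs", "Aldrs"])
lattice = build_lattice(["Alwld", "Hyktb", "Aldrs"], train_counts)
print(lattice)  # [['Alwld'], ['Hyktb', 'syktb'], ['Aldrs']]
```

Keeping the original word as the first alternative means the decoder can still fall back to it when no paraphrase helps, which is one plausible reason to use a lattice rather than replacing the word outright.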
Subject(s):
Computer science
