Technical reports:
Dialectal to Standard Arabic Paraphrasing to Improve Arabic-English Statistical Machine Translation
Wael Sameer Salloum; Nizar Y. Habash
- Title:
- Dialectal to Standard Arabic Paraphrasing to Improve Arabic-English Statistical Machine Translation
- Author(s):
- Salloum, Wael Sameer
- Habash, Nizar Y.
- Date:
- 2011
- Type:
- Technical reports
- Department:
- Center for Computational Learning Systems
- Permanent URL:
- http://hdl.handle.net/10022/AC:P:10283
- Series:
- CCLS Technical Report
- Part Number:
- CCLS-11-01
- Publisher:
- Center for Computational Learning Systems, Columbia University
- Publisher Location:
- New York
- Abstract:
- This paper aims to improve the quality of Arabic-English statistical machine translation (SMT) on highly dialectal Arabic text using morphological knowledge. We present a lightweight rule-based approach to producing Modern Standard Arabic (MSA) paraphrases of dialectal Arabic out-of-vocabulary words and low-frequency words. Our approach extends an existing MSA analyzer with a small number of morphological clitics and transfer rules. The generated paraphrase lattices are input to a state-of-the-art phrase-based SMT system, improving BLEU on a blind test set by 0.56 points absolute (1.5% relative).
- Subject(s):
- Computer science
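The abstract reports the BLEU gain both as an absolute difference (0.56) and as a relative change (1.5%). The two figures are linked by the baseline score, which this record does not state. The sketch below is only a back-of-the-envelope check of that relationship, assuming the relative gain is the absolute gain divided by the baseline; the function name `implied_baseline` is hypothetical, and the resulting scores are inferences, not numbers taken from the report.

```python
def implied_baseline(absolute_gain: float, relative_gain: float) -> float:
    """Return the baseline score implied by an absolute gain and its relative gain.

    Assumes relative_gain = absolute_gain / baseline, so
    baseline = absolute_gain / relative_gain.
    """
    if relative_gain <= 0:
        raise ValueError("relative_gain must be positive")
    return absolute_gain / relative_gain


# Figures quoted in the abstract.
ABSOLUTE_GAIN = 0.56   # BLEU points
RELATIVE_GAIN = 0.015  # 1.5%, as rounded in the abstract

# Inferred values (assumption: not stated anywhere in this record).
baseline = implied_baseline(ABSOLUTE_GAIN, RELATIVE_GAIN)
improved = baseline + ABSOLUTE_GAIN

print(f"implied baseline BLEU: {baseline:.2f}")
print(f"implied improved BLEU: {improved:.2f}")
```

Because the 1.5% figure is rounded, the implied baseline is approximate: any baseline between roughly 36 and 39 BLEU is consistent with the two numbers in the abstract.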