2014 Theses Doctoral
Hybrid System Combination for Machine Translation: An Integration of Phrase-level and Sentences-level Combination Approaches
Given the wide range of successful statistical MT approaches that have emerged recently, it would be beneficial to take advantage of their individual strengths and avoid their individual weaknesses. Multi-Engine Machine Translation (MEMT) attempts to do so by either fusing the output of multiple translation engines or selecting the best translation among them, aiming to improve the overall translation quality. In this thesis, we propose to use the phrase or the sentence as our combination unit instead of the word; three new phrase-level models and one sentence-level model with novel features are proposed. This contrasts with the most popular system combination technique to date which relies on word-level confusion network decoding.
Among the three new phrase-level models, the first one utilizes source sentences and target translation hypotheses to learn hierarchical phrases -- phrases that contain subphrases (Chiang 2007). It then re-decodes the source sentences using the hierarchical phrases to combine the results of multiple MT systems. The other two models we propose view combination as a paraphrasing process and use paraphrasing rules. The paraphrasing rules are composed of either string-to-string paraphrases or hierarchical paraphrases, learned from monolingual word alignments between a selected best translation hypothesis and other hypotheses. Our experimental results show that all of the three phrase-level models give superior performance in BLEU compared with the best single translation engine. The two paraphrasing models outperform the re-decoding model and the confusion network baseline model.
The sentence-level model exploits more complex syntactic and semantic information than the phrase-level models. It uses consensus, argument alignment, a supertag-based structural language model and a syntactic error detector. We use our sentence-level model in two ways: the first selects a translated sentence from multiple MT systems as the best translation to serve as a backbone for paraphrasing process; the second makes the final decision among all fused translations generated by the phrase-level models and all translated sentences of multiple MT systems. We proposed two novel hybrid combination structures for the integration of phrase-level and sentence-level combination frameworks in order to utilize the advantages of both frameworks and provide a more diverse set of plausible fused translations to consider.
- Ma_columbia_0054D_12377.pdf binary/octet-stream 3.01 MB Download File
More About This Work
- Academic Units
- Computer Science
- Thesis Advisors
- McKeown, Kathleen
- Ph.D., Columbia University
- Published Here
- October 15, 2014