Automatic Identification of Errors in Arabic Handwriting Recognition
- Habash, Nizar Y.
- Roth, Ryan M.
- Center for Computational Learning Systems
- CCLS Technical Report
- Center for Computational Learning Systems, Columbia University
- Publisher Location: New York
- Arabic handwriting recognition (HR) is a challenging problem due to Arabic's connected letter forms, consonantal diacritics and rich morphology. In this paper we separate the task of identifying erroneous words in HR output from the task of producing corrections for those words. We consider a variety of linguistic (morphological and syntactic) and non-linguistic features to identify these errors automatically. We also consider a learning curve that varies in two dimensions: the number of segments and the number of n-best hypotheses to train on. We additionally evaluate performance on test sets with different degrees of error. Our best approach achieves a roughly 20% absolute increase in F-score over a simple but reasonable baseline. A detailed error analysis shows that linguistic features, such as lemma models, help improve HR-error detection precisely where we expect them to: on semantically inconsistent error words.
- Computer science
- Suggested Citation:
- Nizar Y. Habash, Ryan M. Roth, 2010, Automatic Identification of Errors in Arabic Handwriting Recognition, Columbia University Academic Commons, https://doi.org/10.7916/D8SQ95Q9.