2012 Reports
A Modern Standard Arabic Closed-Class Word List
This document describes a list of Modern Standard Arabic closed-class words, which can be used as a stop list for a variety of natural language processing applications. The list contains 740 inflected words and clitics in the Arabic Treebank (ATB) tokenization scheme (Maamouri et al., 2004; Habash, 2010). The inflected words are based on 309 lemmas from the Standard Arabic Morphological Analyzer, SAMA (Graff et al., 2009). To get a copy of the full list, please contact the authors.
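As a rough illustration of the stop-list use mentioned above, the following Python sketch filters closed-class tokens out of an already tokenized sentence. It assumes the list is available as plain text with one entry per line; that format, the function names `load_stop_list` and `remove_stop_words`, and the four sample entries are hypothetical and are not taken from the report or its list.

```python
from typing import Iterable, List, Set


def load_stop_list(lines: Iterable[str]) -> Set[str]:
    """Build a stop set from lines of text, one entry per line (assumed format)."""
    return {line.strip() for line in lines if line.strip()}


def remove_stop_words(tokens: Iterable[str], stop_set: Set[str]) -> List[str]:
    """Return the tokens that are not in the stop set, preserving their order."""
    return [tok for tok in tokens if tok not in stop_set]


# Illustrative entries only: common Arabic prepositions, NOT copied from the report's list.
sample_stop_lines = ["في", "من", "إلى", "على"]
stop_set = load_stop_list(sample_stop_lines)

# A sentence that is assumed to be tokenized already (e.g. in an ATB-style scheme).
tokens = ["ذهب", "الولد", "إلى", "المدرسة"]
content_tokens = remove_stop_words(tokens, stop_set)
print(content_tokens)
```

A set is used so that each membership check is constant time on average. The input text would need to be tokenized in the same scheme as the list for the lookups to match.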
Files
- CCLS-12-03.pdf (application/pdf, 486 KB)
More About This Work
- Academic Units: Center for Computational Learning Systems
- Publisher: Center for Computational Learning Systems, Columbia University
- Series: CCLS Technical Report, CCLS-12-03
- Published Here: July 13, 2012