A Modern Standard Arabic Closed-Class Word List

Salloum, Wael Sameer; Habash, Nizar Y.

This document describes a list of Modern Standard Arabic closed-class words, which can be used as a stop list for a variety of natural language processing applications. The list contains 740 inflected words and clitics in the Arabic Treebank (ATB) tokenization scheme (Maamouri et al., 2004; Habash, 2010). The inflected words are based on 309 lemmas from the Standard Arabic Morphological Analyzer, SAMA (Graff et al., 2009). To get a copy of the full list, please contact the authors.
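
The abstract notes that the list can serve as a stop list. The Python snippet that follows is a minimal sketch of that use: it filters a token sequence against a set of closed-class entries. It assumes whitespace-separated input in which split-off clitics carry a "+" marker (for example "و+"). The entries in `SAMPLE_STOP_LIST` and the function name `remove_stop_words` are hypothetical illustrations and are not taken from the authors' 740-entry list.

```python
from typing import AbstractSet, Iterable, List

# HYPOTHETICAL sample entries (illustration only): a few MSA prepositions
# plus proclitics written with an assumed trailing "+" marker.
# The actual closed-class list must be obtained from the authors.
SAMPLE_STOP_LIST = frozenset({"في", "من", "على", "إلى", "و+", "ب+", "ل+"})


def remove_stop_words(tokens: Iterable[str],
                      stop_list: AbstractSet[str] = SAMPLE_STOP_LIST) -> List[str]:
    """Drop tokens that appear in the stop list, keeping the original order."""
    return [tok for tok in tokens if tok not in stop_list]


# Usage on a toy pre-tokenized sentence ("and the boy went to the school"),
# with the conjunction proclitic already split off as "و+".
tokens = "و+ ذهب الولد إلى المدرسة".split()
print(remove_stop_words(tokens))  # ['ذهب', 'الولد', 'المدرسة']
```

Storing the entries in a `frozenset` gives constant-time membership checks, so filtering stays linear in the number of tokens. Matching is exact string comparison, so the input must use the same tokenization and clitic marking as the list itself.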


Academic Units
Center for Computational Learning Systems, Columbia University
CCLS Technical Report, CCLS-12-03
Published here: July 13, 2012