Academic Commons

A Modern Standard Arabic Closed-Class Word List

Salloum, Wael Sameer; Habash, Nizar Y.

This document describes a list of Modern Standard Arabic closed-class words, which can be used as a stop list for a variety of natural language processing applications. The list contains 740 inflected words and clitics in the Arabic Treebank (ATB) tokenization scheme (Maamouri et al., 2004; Habash, 2010). The inflected words are based on 309 lemmas from the Standard Arabic Morphological Analyzer, SAMA (Graff et al., 2009). To get a copy of the full list, please contact the authors.
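As a rough illustration of how a closed-class list can serve as a stop list, the sketch below filters an already tokenized sentence against a small set of entries. The full list is available only from the authors. The entries here are a handful of common Arabic function words chosen for illustration and are assumptions, not excerpts from the list. The `+` clitic marker and the function name `remove_stop_words` are also hypothetical. The only requirement the sketch relies on is that the input text and the stop list use the same tokenization scheme (here, ATB-style), so that separated clitics can match list entries.

```python
from typing import Iterable, List, Set

# Hypothetical sample of closed-class entries. The real list (740 entries) is
# available only from the authors; these tokens are illustrative assumptions.
SAMPLE_STOP_LIST: Set[str] = {
    "في",   # "in" (preposition)
    "من",   # "from" (preposition)
    "على",  # "on" (preposition)
    "و+",   # "and" proclitic, assuming a '+' marker on split clitics
    "+هم",  # "them/their" enclitic, assuming a '+' marker on split clitics
}


def remove_stop_words(tokens: Iterable[str], stop_list: Set[str]) -> List[str]:
    """Drop tokens found in the stop list, keeping the remaining order.

    Assumes the input is already tokenized in the same scheme as the stop
    list, so clitics appear as separate tokens that can be matched exactly.
    """
    return [tok for tok in tokens if tok not in stop_list]


if __name__ == "__main__":
    # Hypothetical ATB-style tokenization of "and their books in the house";
    # the definite article stays attached to "البيت".
    tokens = ["و+", "كتب", "+هم", "في", "البيت"]
    print(remove_stop_words(tokens, SAMPLE_STOP_LIST))  # ['كتب', 'البيت']
```

A set gives constant-time membership checks, which matters little for a list of a few hundred entries but keeps the filter cheap over large corpora.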

More About This Work

Academic Units
Center for Computational Learning Systems
Publisher
Center for Computational Learning Systems, Columbia University
Series
CCLS Technical Report, CCLS-12-03
Published Here
July 13, 2012