Academic Commons


crep: a regular expression-matching textual corpus tool

Duford, Darrin

Crep is a UNIX tool that searches a tagged or untagged (free-text) corpus file and outputs each sentence that matches a regular expression supplied by the user as a parameter. The expression is built from user-defined regular expressions over words and/or part-of-speech tags. The purpose of crep is to make such searches faster and easier than either a) searching through corpora by hand or b) constructing a lexical scanner for each specific search. crep achieves this by offering the user a simple expression syntax from which it automatically constructs an appropriate scanner. The user can therefore execute a whole search in a single command that implicitly or explicitly invokes several tools, including a sentence delimiter, a part-of-speech tagger (developed by Ken Church at AT&T Bell Laboratories), and various output filters.
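The abstract does not give crep's actual expression syntax or scanner design, so the following is only a minimal sketch of the idea under stated assumptions: tagged input arrives one sentence per line as whitespace-separated `word/TAG` tokens (an assumed format), a pattern is a list of hypothetical `(word_regex, tag_regex, quantifier)` elements, and Python's `re` module stands in for the generated lexical scanner. The names `Element`, `compile_pattern`, and `crep` are illustrative, and tags such as `DT`, `JJ`, and `NN` are Penn Treebank-style examples, not necessarily the tag set of Church's tagger.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical pattern element: (word regex, tag regex, quantifier).
# None means "any word" / "any tag"; quantifier is "", "?", "*" or "+".
# Assumption: user regexes never match whitespace or "/".
Element = Tuple[Optional[str], Optional[str], str]

ANY = r"[^\s/]+"  # one word or one tag, without spaces or slashes


def compile_pattern(elements: List[Element]) -> "re.Pattern[str]":
    """Compile the element list into one regex over ' w1/T1 w2/T2 ... '."""
    parts = []
    for word_re, tag_re, quant in elements:
        word = word_re if word_re is not None else ANY
        tag = tag_re if tag_re is not None else ANY
        # Every serialized token is 'word/TAG ' (trailing space), so the
        # trailing space forces the tag regex to match a whole tag.
        parts.append(f"(?:(?:{word})/(?:{tag}) ){quant}")
    # The leading space anchors the first element at a token boundary.
    return re.compile(" " + "".join(parts))


def crep(lines: Iterable[str], elements: List[Element]) -> Iterator[str]:
    """Yield every tagged sentence (one per line) containing a match."""
    pattern = compile_pattern(elements)
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        serialized = " " + "".join(tok + " " for tok in tokens)
        if pattern.search(serialized):
            yield line.rstrip("\n")


if __name__ == "__main__":
    corpus = [
        "the/DT old/JJ dog/NN barks/VBZ ./.",
        "a/DT cat/NN sleeps/VBZ ./.",
        "the/DT very/RB big/JJ red/JJ barn/NN burned/VBD ./.",
    ]
    # Determiner, zero or more adjectives, then a noun: DT JJ* NN
    noun_phrase = [(None, "DT", ""), (None, "JJ", "*"), (None, "NN", "")]
    for sentence in crep(corpus, noun_phrase):
        print(sentence)  # prints the first two sentences only
```

The third sentence is not printed because its determiner is followed by an adverb (`very/RB`), which the `DT JJ* NN` pattern does not allow. Compiling the whole expression into a single regular expression mirrors, in a much simplified form, the abstract's point that the user writes one expression and the tool builds the scanner.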



More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-005-93
Published Here
January 20, 2012