crep: a regular expression-matching textual corpus tool

Duford, Darrin

Crep is a UNIX tool that searches a tagged or free-text corpus file and outputs each sentence matching a regular expression that the user supplies as a parameter. The expression is built from user-defined regular expressions over words and/or part-of-speech tags. The purpose of crep is to make searches faster and easier than either a) searching through corpora by hand or b) constructing a lexical scanner for each specific search. Crep achieves this by offering the user a simple expression syntax, from which it automatically constructs an appropriate scanner. The user can therefore execute a whole search in one command, which invokes, implicitly and explicitly, several tools, including a sentence delimiter, a part-of-speech tagger (developed by Ken Church at AT&T Bell Laboratories), and various output filters.
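To make the idea of matching sentences against word and part-of-speech patterns concrete, here is a minimal Python sketch. It is not crep and does not use crep's expression syntax, which this abstract does not describe. The `word/TAG` token format, the one-sentence-per-line corpus, and the names `compile_token_pattern` and `search_sentences` are all assumptions made for illustration.

```python
import re

# Hypothetical illustration only: crep's real syntax and corpus format
# are not given in the abstract. We assume each sentence is a string of
# whitespace-separated "word/TAG" tokens.

ANY = r"[^\s/]+"  # matches any word or any tag


def compile_token_pattern(spec):
    """Build one regex from a list of (word_regex, tag_regex) pairs.

    A value of None in either slot means "any word" or "any tag".
    Consecutive pairs must match consecutive tokens.
    """
    parts = []
    for word_re, tag_re in spec:
        word = word_re if word_re is not None else ANY
        tag = tag_re if tag_re is not None else ANY
        parts.append(r"(?:%s)/(?:%s)" % (word, tag))
    # The lookarounds force the match to start and end on token boundaries.
    return re.compile(r"(?<!\S)" + r"\s+".join(parts) + r"(?!\S)")


def search_sentences(sentences, spec):
    """Return every sentence containing a token sequence that matches spec."""
    pattern = compile_token_pattern(spec)
    return [s for s in sentences if pattern.search(s)]


corpus = [
    "the/DT quick/JJ fox/NN jumps/VBZ",
    "a/DT dog/NN sleeps/VBZ",
    "foxes/NNS run/VBP",
]

# An adjective followed directly by a singular or plural noun.
adj_noun = search_sentences(corpus, [(None, "JJ"), (None, "NNS?")])

# The determiner "the" or "a" followed directly by a singular noun.
det_noun = search_sentences(corpus, [("the|a", "DT"), (None, "NN")])

print(adj_noun)
print(det_noun)
```

The sketch compiles the whole query into a single pattern once and then scans every sentence with it, which loosely mirrors the idea of generating a scanner from a short expression instead of writing one by hand for each search.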



More About This Work

Academic Units
Computer Science
Department of Computer Science, Columbia University
Columbia University Computer Science Technical Reports, CUCS-005-93
Published Here
January 20, 2012