Academic Commons

Presentations (Communicative Events)

Tagging French Without Lexical Probabilities - Combining Linguistic Knowledge And Statistical Learning

Radev, Dragomir R.; Tzoukermann, Evelyne; Gale, William A.

This paper explores morpho-syntactic ambiguities in French to develop a strategy for part-of-speech disambiguation that (a) reflects the complexity of French as an inflected language, (b) optimizes the estimation of probabilities, and (c) allows the user flexibility in choosing a tagset. The problem with extracting lexical probabilities from a limited training corpus is that the statistical model may not represent the use of a particular word in a particular context. In a highly inflected language, this problem is particularly serious, since a word can be tagged with a large number of parts of speech.

More About This Work

Academic Units: Computer Science
Publisher: Natural Language Processing Using Very Large Corpora
Published Here: May 3, 2013