1996 Presentations (Communicative Events)
Using word class for part-of-speech disambiguation
This paper presents a methodology for improving part-of-speech disambiguation using word classes. We build on earlier work on tagging French, where we showed that statistical estimates can be computed without lexical probabilities. We investigate new ways of deriving probabilities from the paradigms of tags available to given words. We base estimates not on the words themselves but on the set of tags associated with each word. We compute frequencies of unigrams, bigrams, and trigrams of word classes to further refine the disambiguation. This approach gives a more efficient representation of the data for disambiguating a word's part of speech. We show empirical results to support our claim. We demonstrate that, besides providing good estimates for disambiguation, word classes solve some of the problems caused by sparse training data. We describe a part-of-speech tagger built on these principles and suggest a methodology for developing an adequate training corpus.
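As a rough illustration of the idea in the abstract, the sketch below counts n-grams over word classes, where a word's class is taken to be the set of tags it can carry. The toy lexicon, the tag names, the `UNK` fallback, and the function names `word_class` and `class_ngram_counts` are all assumptions made for this example; they are not taken from the paper, and the sketch does not reproduce the authors' tagger.

```python
from collections import Counter

# Hypothetical toy lexicon: each word maps to the set of tags it may take.
LEXICON = {
    "the": {"DET"},
    "old": {"ADJ", "NOUN"},
    "can": {"AUX", "NOUN", "VERB"},
    "rusts": {"NOUN", "VERB"},
}


def word_class(word, lexicon=LEXICON):
    """Return the word's class: its tag set as a sorted, hashable tuple."""
    # Unknown words fall back to a single placeholder tag (an assumption).
    return tuple(sorted(lexicon.get(word.lower(), {"UNK"})))


def class_ngram_counts(sentences, n):
    """Count n-grams over word classes rather than over the words themselves."""
    counts = Counter()
    for sentence in sentences:
        classes = [word_class(w) for w in sentence]
        for i in range(len(classes) - n + 1):
            counts[tuple(classes[i:i + n])] += 1
    return counts


corpus = [["the", "old", "can", "rusts"], ["the", "can", "rusts"]]
unigrams = class_ngram_counts(corpus, 1)
bigrams = class_ngram_counts(corpus, 2)
trigrams = class_ngram_counts(corpus, 3)
```

Because counts are keyed by tag set, any two words with the same possible tags share the same statistics, which is one way to read the abstract's point about sparse training data.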
Files
- tzoukermann_radev_96.pdf (application/pdf, 239 KB)
More About This Work
- Academic Units: Computer Science
- Publisher: Fourth Workshop on Very Large Corpora (WVLC-4)
- Published Here: April 26, 2013