Academic Commons

Data (Information)

MASC Word Sense Sentence Corpus, tab-separated format

Passonneau, Rebecca; Ide, Nancy; Baker, Collin; Fellbaum, Christiane; Xie, Boyi

Synopsis: The MASC Word Sense Sentence corpus is distributed as a set of three .tsv (tab-separated) files that together contain the sentences, annotation labels, and senses that make up the corpus: (1) the annotation labels (masc_annotations.tsv), (2) the WordNet word senses (masc_senses.tsv), and (3) the word token-sentence pairs, or instances (masc_sentences.tsv). A total of 116 distinct lemmas were selected. For each lemma, approximately 1000 example sentences were taken from the MASC corpus, and for each occurrence of the word in its sentence context, a trained annotator assigned a WordNet sense (WordNet version 3.1) as the annotation label. The accompanying README describes the data in detail.
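
As a rough illustration of how the three files might be read once the archive has been extracted, here is a minimal Python sketch. The file names come from the synopsis. The directory layout inside the archive, the UTF-8 encoding, and the helper names read_tsv and load_corpus are assumptions made for illustration; the column layout is documented in the README and is deliberately not assumed here.

```python
import csv
from pathlib import Path

# File names as listed in the synopsis. Where they sit inside the extracted
# archive is an assumption, so the caller passes the directory explicitly.
MASC_FILES = ("masc_annotations.tsv", "masc_senses.tsv", "masc_sentences.tsv")


def read_tsv(path):
    """Return every row of a tab-separated file as a list of string fields."""
    # UTF-8 is an assumption; check the README if decoding fails.
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def load_corpus(directory):
    """Map each expected file name to its rows, skipping files not found."""
    base = Path(directory)
    return {
        name: read_tsv(base / name)
        for name in MASC_FILES
        if (base / name).exists()
    }
```

Calling load_corpus on the extraction directory returns a dictionary keyed by file name. No header row or column meaning is assumed, so the README should be consulted before interpreting individual fields.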

Files

  • masc_word_sense_sentence_corpus.V1.0.tar.gz (application/x-gzip, 7.67 MB)

More About This Work

Academic Units: Computer Science
Published Here: June 24, 2014

Notes

This is a dataset on word sense annotation with a comprehensive README file that describes the contents.