2014 Data (Information)
MASC Word Sense Sentence Corpus, tab-separated format
Synopsis: The MASC Word Sense Sentence corpus is distributed as a set of three .tsv (tab-separated) files that together make up the sentence corpus: (1) the annotation labels (masc_annotations.tsv), (2) the WordNet word senses (masc_senses.tsv), and (3) the word token-sentence pairs, or instances (masc_sentences.tsv). A total of 116 distinct lemmas were selected. For each lemma, approximately 1000 example sentences were taken from the MASC corpus, and for each occurrence of the word in its sentence context, a trained annotator assigned a WordNet sense (WordNet version 3.1) as the annotation label. The accompanying README describes the data in detail.
Files
- masc_word_sense_sentence_corpus.V1.0.tar.gz (application/gzip, 7.67 MB)
More About This Work
- Academic Units: Computer Science
- Published Here: June 24, 2014
Notes
This is a dataset on word sense annotation with a comprehensive README file that describes the contents.
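As a rough illustration of how the three tab-separated files might be read straight from the downloaded archive, here is a minimal Python sketch. It makes several assumptions that this page does not confirm: that the .tsv files may sit inside a directory within the archive, that they are UTF-8 text, and that quotation marks inside sentences should be kept literally. The function name `read_tsv_from_archive` is hypothetical, and no column names or header row are assumed; the README remains the authority on the column layout.

```python
import csv
import io
import posixpath
import tarfile


def read_tsv_from_archive(archive_path, file_name):
    """Return the rows of one tab-separated file stored in a .tar.gz archive.

    Assumption: the file may sit inside a directory in the archive, so
    members are matched by base name rather than by full path.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and posixpath.basename(member.name) == file_name:
                raw = archive.extractfile(member)
                # Assumption: the files are UTF-8 encoded text.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                # QUOTE_NONE keeps quotation marks inside sentences as-is
                # instead of treating them as field delimiters.
                reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
                # Read every row before the archive is closed.
                return [row for row in reader]
    raise FileNotFoundError(f"{file_name} not found in {archive_path}")
```

For example, `read_tsv_from_archive("masc_word_sense_sentence_corpus.V1.0.tar.gz", "masc_senses.tsv")` would return the sense file as a list of rows, each row being a list of string fields. Whether the first row is a header, and what each column means, should be checked against the README.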