Academic Commons

Data (Information)

MASC Word Sense Sentence Corpus, Crowdsourced subset

Passonneau, Rebecca; Carpenter, Bob

The MASC Word Sense Sentence corpus, Crowdsourced subset, is distributed as a set of three .tsv files (tab-separated values) that contain the sentences, annotation labels, and WordNet senses of the corpus. The subset covers 45 of the 116 words in the original MASC Word Sense Sentence corpus (http://dx.doi.org/10.7916/D80V89XH). For each of these words there are up to 1000 sentences drawn from the heterogeneous MASC corpus, with sense labels from WordNet. Each sentence exemplifies at least one MASC word, annotated for its WordNet sense. Each word/sentence pair has up to 25 crowdsourced sense labels collected on Amazon Mechanical Turk.
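
As a rough illustration of working with a release packaged this way, the following minimal Python sketch reads every .tsv member of a gzip-compressed tar archive into memory. The function name `read_tsv_members` is hypothetical, the UTF-8 encoding is an assumption, and the file names and column layout of the actual release are not stated here; the README inside the archive is the authority on both.

```python
import csv
import io
import tarfile


def read_tsv_members(archive_path):
    """Return {member_name: list_of_rows} for each .tsv file in a .tar.gz archive.

    Assumptions (not confirmed by the record above): the files are UTF-8
    encoded and use plain tab separation without quoting.
    """
    tables = {}
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories and non-TSV files such as the README.
            if not (member.isfile() and member.name.endswith(".tsv")):
                continue
            raw = archive.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE keeps quotation marks inside sentences untouched.
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            tables[member.name] = [row for row in reader]
    return tables
```

Calling `read_tsv_members("MASC_AMT_RELEASE_1.0.tar.gz")` on the downloaded archive would return one list of rows per .tsv file, keyed by the path of the file inside the archive. What each column means should be taken from the README rather than inferred.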

Files

  • MASC_AMT_RELEASE_1.0.tar.gz (application/x-gzip, 6.89 MB)

More About This Work

Academic Units
Computer Science
Published Here
June 27, 2014

Notes

This is a dataset on word sense annotation with a comprehensive README file that describes the contents.
