Academic Commons

Presentations (Communicative Events)

Corpus Creation for New Genres: A Crowdsourced Approach to PP Attachment

Jha, Mukund; Andreas, Jacob; Thadani, Kapil; Rosenthal, Sara; McKeown, Kathleen

This paper explores the task of building an accurate prepositional phrase attachment corpus for new genres while avoiding a large investment in terms of time and money by crowdsourcing judgments. We develop and present a system to extract prepositional phrases and their potential attachments from ungrammatical and informal sentences and pose the subsequent disambiguation tasks as multiple choice questions to workers from Amazon’s Mechanical Turk service. Our analysis shows that this two-step approach is capable of producing reliable annotations on informal and potentially noisy blog text, and this semi-automated strategy holds promise for similar annotation projects in new genres.

More About This Work

Academic Units
Computer Science
Published Here
April 29, 2013