Academic Commons

Presentations (Communicative Events)

Corpus Creation for New Genres: A Crowdsourced Approach to PP Attachment

Jha, Mukund; Andreas, Jacob; Thadani, Kapil; Rosenthal, Sara; McKeown, Kathleen

This paper explores the task of building an accurate prepositional phrase attachment corpus for new genres while avoiding a large investment in terms of time and money by crowdsourcing judgments. We develop and present a system to extract prepositional phrases and their potential attachments from ungrammatical and informal sentences and pose the subsequent disambiguation tasks as multiple choice questions to workers from Amazon’s Mechanical Turk service. Our analysis shows that this two-step approach is capable of producing reliable annotations on informal and potentially noisy blog text, and this semi-automated strategy holds promise for similar annotation projects in new genres.
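
As a rough illustration of the two-step idea in the abstract, the sketch below takes a prepositional phrase and its candidate attachment points and renders them as a multiple choice question. Everything in it is an assumption made for illustration: the class name AttachmentQuestion, the question wording, the example sentence, and the plurality-vote rule for combining worker answers are hypothetical and are not taken from the paper or from the Mechanical Turk API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AttachmentQuestion:
    """One disambiguation item (hypothetical structure, not the authors')."""
    sentence: str
    pp: str                 # the prepositional phrase to attach
    candidates: list[str]   # candidate attachment points from the extraction step

    def render(self) -> str:
        """Format the item as a plain-text multiple choice question."""
        lines = [
            f"Sentence: {self.sentence}",
            f'Which word or phrase does "{self.pp}" modify?',
        ]
        for i, cand in enumerate(self.candidates, start=1):
            lines.append(f"  {i}. {cand}")
        return "\n".join(lines)


def plurality_answer(votes: list[int]) -> int:
    """Return the most frequently chosen option (assumed aggregation rule)."""
    return Counter(votes).most_common(1)[0][0]


# Made-up informal sentence with an ambiguous attachment.
question = AttachmentQuestion(
    sentence="i saw the guy with the telescope lol",
    pp="with the telescope",
    candidates=["saw", "the guy"],
)
print(question.render())

# Three hypothetical worker responses, given as option numbers.
print(plurality_answer([1, 2, 1]))
```

Keeping the candidate list explicit means the automated step only has to propose plausible attachment points, and the final choice is left to the workers.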

More About This Work

Academic Units: Computer Science
Published Here: April 29, 2013