Academic Commons

Presentations (Communicative Events)

Varying Input Segmentation for Story Boundary Detection in English, Arabic and Mandarin Broadcast News: Presentation Powerpoint Slides

Hirschberg, Julia Bell; Rosenberg, Andrew; Sharifi, Mehrbod

Story segmentation of news broadcasts has been shown to improve the accuracy of subsequent processes such as question answering and information retrieval. In previous work, a decision tree trained on automatically extracted lexical and acoustic features was used to predict story boundaries, with hypothesized sentence boundaries defining the potential story boundaries. In this paper, we empirically evaluate several alternative choices of input segmentation on three languages: English, Mandarin and Arabic. Our results suggest that the best performance is achieved by using 250 ms pause-based segmentation or sentence boundaries determined with a very low confidence score threshold.
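To make the idea of pause-based input segmentation concrete, the following is a minimal sketch and not the authors' implementation. It assumes word-level timings are available as (token, start, end) tuples in seconds; that format, the function name `pause_segments`, and the sample timings are all hypothetical. Only the 250 ms threshold comes from the abstract. Each segment end would then serve as a candidate story boundary for a downstream classifier.

```python
from typing import List, Tuple

# Hypothetical input format: (token, start_sec, end_sec) per recognized word.
Word = Tuple[str, float, float]


def pause_segments(words: List[Word], min_pause: float = 0.250) -> List[List[Word]]:
    """Split a word sequence wherever the silence between words is >= min_pause."""
    segments: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        # Gap between the end of the previous word and the start of this one.
        if current and word[1] - current[-1][2] >= min_pause:
            segments.append(current)
            current = []
        current.append(word)
    if current:
        segments.append(current)
    return segments


# Illustrative timings only; not data from the paper.
words = [
    ("good", 0.00, 0.20),
    ("evening", 0.22, 0.60),
    ("in", 0.95, 1.05),
    ("other", 1.07, 1.30),
    ("news", 1.32, 1.60),
]
segments = pause_segments(words)
tokens = [[w[0] for w in seg] for seg in segments]
```

In this example the only silence of at least 250 ms falls between "evening" and "in", so the words are grouped into two candidate segments.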

Files

  • IS07-Rosenberg-a.ppt (application/vnd.ms-powerpoint, 2.26 MB)

More About This Work

Academic Units: Computer Science
Publisher: Proceedings of Interspeech 2007
Published Here: August 7, 2013

Notes

The paper accompanying this presentation is available at http://hdl.handle.net/10022/AC:P:21140
