2003 Articles
Pitch-based emphasis detection for characterization of meeting recordings
The automatic extraction of key utterances in spoken data has emerged as an interesting and difficult topic in automatic speech recognition. "Emphasis" or "excitement" may be a useful identifier for these utterances of interest. We undertake the task of reliably and automatically identifying emphasized or excited utterances in natural speech in a meeting setting. We start by endeavoring to establish reliable ground truth emphasis labels by using several hand-labelers. The results show that human listeners can reliably identify emphasized utterances in meeting recordings. We then build an automatic emphasis detection system, which uses normalized pitch as its only acoustic predictor. The results show that this pitch-based emphasis detection scheme can distinguish between non-emphasized and emphasized utterances with an accuracy of 92% when ambiguous cases are excluded, a rate comparable to human interlabeler agreement.
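The abstract names normalized pitch as the only acoustic predictor but does not say how it is normalized or classified. The following is a minimal sketch of one way such a detector could work, not the authors' method. It assumes per-frame F0 values in Hz with 0 marking unvoiced frames, z-score normalization against the speaker's own pitch statistics, and a fixed decision threshold. The function names `normalize_pitch` and `detect_emphasis` and the default threshold of 1.0 are hypothetical.

```python
from statistics import mean, stdev


def normalize_pitch(f0_by_utterance):
    """Return one z-score per utterance: the utterance's mean voiced F0
    measured against the speaker's overall voiced-F0 mean and spread.

    Assumption: each utterance is a list of per-frame F0 values in Hz,
    with 0 marking unvoiced frames. All utterances come from one speaker.
    """
    voiced = [f for utt in f0_by_utterance for f in utt if f > 0]
    if len(voiced) < 2:
        # Not enough voiced frames to estimate speaker statistics.
        return [0.0 for _ in f0_by_utterance]

    mu = mean(voiced)
    sigma = stdev(voiced)
    if sigma == 0:
        # Perfectly flat pitch: nothing stands out.
        return [0.0 for _ in f0_by_utterance]

    scores = []
    for utt in f0_by_utterance:
        frames = [f for f in utt if f > 0]
        # Utterances with no voiced frames get a neutral score.
        scores.append((mean(frames) - mu) / sigma if frames else 0.0)
    return scores


def detect_emphasis(f0_by_utterance, threshold=1.0):
    """Flag utterances whose normalized pitch exceeds the threshold.

    Assumption: the threshold is illustrative, not a value from the paper.
    """
    return [z > threshold for z in normalize_pitch(f0_by_utterance)]


if __name__ == "__main__":
    # Hypothetical F0 tracks (Hz) for three utterances by one speaker.
    utterances = [
        [110, 112, 0, 108],
        [109, 111, 110],
        [180, 190, 185, 0],
    ]
    print(detect_emphasis(utterances))
```

Normalizing against each speaker's own statistics keeps a naturally high-pitched speaker from being flagged as emphatic on every utterance. In practice the threshold would be tuned against hand-labeled data such as the labels described in the abstract.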
Files
- asru03-emph.pdf (application/pdf, 60.3 KB)
Also Published In
- Title: ASRU'03: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding : November 30-December [3], 2003
- Publisher: IEEE
- DOI: https://doi.org/10.1109/ASRU.2003.1318448
More About This Work
- Academic Units: Electrical Engineering
- Published Here: June 29, 2012