2009 Presentations (Communicative Events)
Restoring Punctuation and Capitalization in Transcribed Speech
Adding punctuation and capitalization greatly improves the readability of automatic speech transcripts. We discuss an approach for performing both tasks in a single pass using a purely text-based n-gram language model. We study the effect on performance of varying the n-gram order (from n = 3 to n = 6) and the amount of training data (from 58 million to 55 billion tokens). Our results show that using larger training data sets consistently improves performance, while increasing the n-gram order does not help nearly as much.
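To make the idea of single-pass restoration with a text-only n-gram model concrete, here is a minimal sketch. It is not the authors' system: it assumes a bigram model (n = 2, below the orders studied in the paper), add-k smoothing, a greedy left-to-right decoder, and a toy training text. The class name `NgramRestorer`, the constant `TRAIN`, and the punctuation set are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

PUNCT = [",", ".", "?"]  # marks this toy model may insert (an assumption)

def tokenize(text):
    # Split into words and punctuation marks, keeping the original case.
    return re.findall(r"[\w']+|[.,?]", text)

class NgramRestorer:
    """Toy bigram model over cased words and punctuation tokens."""

    def __init__(self, training_text, k=0.1):
        tokens = ["<s>"] + tokenize(training_text) + ["</s>"]
        self.k = k  # add-k smoothing constant (illustrative value)
        self.vocab_size = len(set(tokens))
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.context = Counter(tokens[:-1])
        # Map each lowercased word to the cased forms seen in training.
        self.forms = {}
        for tok in tokens:
            self.forms.setdefault(tok.lower(), set()).add(tok)

    def prob(self, prev, tok):
        # Add-k smoothed estimate of P(tok | prev).
        num = self.bigrams[(prev, tok)] + self.k
        return num / (self.context[prev] + self.k * self.vocab_size)

    def restore(self, text):
        out, prev = [], "<s>"
        for word in text.lower().split():
            best = None
            # Jointly pick an optional punctuation mark before the word
            # and the cased form of the word itself.
            for punct in [None] + PUNCT:
                for form in sorted(self.forms.get(word, {word})):
                    if punct is None:
                        score = self.prob(prev, form)
                    else:
                        score = self.prob(prev, punct) * self.prob(punct, form)
                    if best is None or score > best[0]:
                        best = (score, punct, form)
            _, punct, form = best
            if punct is not None and out:
                out[-1] += punct  # attach the mark to the previous word
            out.append(form)
            prev = form
        if out:
            # Decide whether the transcript should end with a mark.
            best_p, best_s = None, self.prob(prev, "</s>")
            for p in PUNCT:
                s = self.prob(prev, p) * self.prob(p, "</s>")
                if s > best_s:
                    best_p, best_s = p, s
            if best_p is not None:
                out[-1] += best_p
        return " ".join(out)

TRAIN = "The cat sat. The dog ran. The cat ran, and the dog sat."
model = NgramRestorer(TRAIN)
print(model.restore("the dog sat the cat ran"))
# The dog sat. The cat ran.
```

Because punctuation marks and cased word forms are ordinary tokens in the same model, one decoding pass settles both decisions at once. A realistic system would use higher-order n-grams, far more training text, and a proper search instead of the greedy choice made here.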
Files
- gravano_al_09.pdf (application/pdf, 544 KB)
More About This Work
- Academic Units: Computer Science
- Published Here: April 29, 2013