Academic Commons

Presentations (Communicative Events)

On-the-fly Topic Adaptation for YouTube Video Transcription

Thadani, Kapil; Biadsy, Fadi; Bikel, Daniel

Automatic closed-captioning of video is a useful application of speech recognition technology, but it poses numerous challenges when applied to open-domain, user-uploaded videos such as those on YouTube. In this work, we explore a strategy to improve decoding accuracy for video transcription by decoding each video with a language model (LM) adapted specifically to the topics that the video covers. Taxonomic topic classifiers are used both to determine the topic content of videos and to build a large set of topic-specific LMs from web documents. We consider strategies for selecting and interpolating LMs in both supervised and unsupervised scenarios within a two-pass lattice rescoring framework. Experiments on a YouTube video corpus show a 3.6% absolute reduction in word error rate (WER) over generic single-pass transcriptions, as well as a statistically significant 0.8% absolute improvement over rescoring with a very large non-adapted LM built from all the documents.
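
The general idea of interpolating topic-specific LMs and using the result in a second rescoring pass can be illustrated with a minimal sketch. This is not the authors' implementation: the toy unigram tables, the topic weights, the `interpolate` and `rescore` helpers, and the use of an n-best list in place of a lattice are all assumptions made for illustration. The abstract does not specify how weights are chosen, so they are fixed by hand here.

```python
import math

def interpolate(lms, weights):
    """Return a word-probability function that linearly mixes several LMs.

    lms: dict of LM name -> {word: probability} (toy unigram tables, assumed).
    weights: dict of LM name -> non-negative weight, e.g. topic classifier scores.
    """
    total = sum(weights.values())
    norm = {name: w / total for name, w in weights.items()}

    def prob(word):
        # Unseen words get a small floor probability (an arbitrary assumption).
        return sum(norm[name] * lms[name].get(word, 1e-6) for name in norm)

    return prob

def rescore(nbest, lm_prob, lm_weight=1.0):
    """Second pass: pick the hypothesis with the best acoustic + LM log score."""
    def score(hyp):
        words, acoustic_logp = hyp
        lm_logp = sum(math.log(lm_prob(w)) for w in words)
        return acoustic_logp + lm_weight * lm_logp

    return max(nbest, key=score)

# Hypothetical LMs: one generic, one for a "music" topic.
lms = {
    "generic": {"the": 0.5, "bass": 0.01, "base": 0.01, "guitar": 0.01},
    "music": {"the": 0.3, "bass": 0.2, "base": 0.001, "guitar": 0.2},
}

# Hypothetical first-pass output: (word sequence, acoustic log score).
nbest = [
    (["the", "base", "guitar"], -10.0),
    (["the", "bass", "guitar"], -10.2),
]

# Weights as they might come from a topic classifier for a music video.
adapted = interpolate(lms, {"generic": 0.5, "music": 0.5})
best = rescore(nbest, adapted)
print(" ".join(best[0]))
```

In this toy setup the generic LM alone cannot separate "base" from "bass", so the acoustic score decides; mixing in the topic LM shifts the choice toward the topically plausible word. A real system would use n-gram or larger LMs and rescore full lattices rather than a two-entry n-best list.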



More About This Work

Academic Units: Computer Science
Published Here: April 26, 2013