Computational auditory scene analysis exploiting speech-recognition knowledge

Ellis, Daniel P. W.

The field of computational auditory scene analysis (CASA) strives to build computer models of the human ability to interpret sound mixtures as the combination of distinct sources. A major obstacle to this enterprise is defining and incorporating the kind of high-level knowledge of real-world signal structure exploited by listeners. Speech recognition, while typically ignoring the problem of nonspeech inclusions, has been very successful at deriving powerful statistical models of speech structure from training data. In this paper, we describe a scene analysis system that includes both speech and nonspeech components, addressing the problem of working backwards from speech recognizer output to estimate the speech component of a mixture. Ultimately, such hybrid approaches will require more radical adaptation of current speech recognition approaches.


Also Published In

1997 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics: October 19-22, Mohonk Mountain House, New Paltz, New York

More About This Work

Academic Units
Electrical Engineering
Published Here
July 3, 2012