Using knowledge to organize sound: The prediction-driven approach to computational auditory scene analysis and its application to speech/nonspeech mixtures

Ellis, Daniel P. W.

Computational auditory scene analysis, the modeling of the human ability to organize sound mixtures according to their sources, has evolved rapidly from simple implementations of psychoacoustically inspired rules to complex systems able to process demanding real-world sounds. Phenomena such as the continuity illusion and phonemic restoration show that the brain is able to use a wide range of knowledge-based contextual constraints when interpreting obscured or overlapping mixtures. To model such processing, we need architectures that operate by confirming hypotheses about the observations rather than relying on directly extracted descriptions. One such architecture, the 'prediction-driven' approach, is presented along with results from its initial implementation. This architecture can be extended to take advantage of the high-level knowledge implicit in today's speech recognizers by modifying a recognizer to act as one of the 'component models' providing the explanations of the signal mixture. A preliminary investigation indicates the viability of this approach while also raising a number of issues, which are discussed. These results point to the conclusion that successful scene analysis must, at every level, exploit abstract knowledge about sound sources.



Also Published In

Speech Communication

More About This Work

Academic Units
Electrical Engineering
Published Here
February 14, 2012