Academic Commons


Using knowledge to organize sound: The prediction-driven approach to computational auditory scene analysis and its application to speech/nonspeech mixtures

Ellis, Daniel P. W.

Computational auditory scene analysis, the modeling of the human ability to organize sound mixtures according to their sources, has evolved rapidly from simple implementations of psychoacoustically inspired rules to complex systems able to process demanding real-world sounds. Phenomena such as the continuity illusion and phonemic restoration show that the brain is able to use a wide range of knowledge-based contextual constraints when interpreting obscured or overlapping mixtures. To model such processing, we need architectures that operate by confirming hypotheses about the observations rather than relying on directly extracted descriptions. One such architecture, the 'prediction-driven' approach, is presented along with results from its initial implementation. This architecture can be extended to take advantage of the high-level knowledge implicit in today's speech recognizers by modifying a recognizer to act as one of the 'component models' providing the explanations of the signal mixture. A preliminary investigation indicates the viability of this approach while at the same time raising a number of issues, which are discussed. These results point to the conclusion that successful scene analysis must, at every level, exploit abstract knowledge about sound sources.


Also Published In

Speech Communication

More About This Work

Academic Units
Electrical Engineering
Published Here
February 14, 2012