Underconstrained stochastic representations for top-down computational auditory scene analysis

Ellis, Daniel P. W.

I propose a structure for the first stage of a computer system capable of performing complex auditory scene analysis similar to that accomplished by human listeners. This structure contains the following innovations over previous approaches: (1) Sound is represented as discrete elements drawn from an overcomplete vocabulary encompassing both tonal and less structured sounds, designed to highlight the interdependencies in the acoustic energy. (2) Through the redundancy of the basis, this analysis permits and indeed requires the imposition of additional constraints, which provides for the incorporation of top-down or context-sensitive factors. (3) A modular architecture operates on an analysis-by-synthesis principle, where processes are invoked until the representation adequately accounts for the observed sound. A common goodness-of-fit criterion allows for future expansion of the system with new explanation rules, new representational elements, and more abstract levels of analysis. Some initial results of applying these ideas to scenes consisting of noise bursts and dense environmental sound are presented.
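The analysis-by-synthesis principle described in point (3) can be illustrated with a minimal sketch: explanation processes are invoked, each adding an element from an overcomplete vocabulary, until a resynthesis of the chosen elements adequately accounts for the observed signal under a single goodness-of-fit criterion. All function names, the greedy selection strategy (a matching-pursuit-style loop), and the residual-energy criterion below are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of an analysis-by-synthesis loop over an
# overcomplete element vocabulary. The greedy matching-pursuit-style
# selection and the residual-energy fit criterion are assumptions made
# for illustration; they are not taken from the paper.
import numpy as np

def goodness_of_fit(observed, resynthesized):
    """Common criterion: fraction of observed energy explained."""
    residual = observed - resynthesized
    return 1.0 - np.sum(residual ** 2) / np.sum(observed ** 2)

def explain(observed, element_library, threshold=0.9, max_elements=10):
    """Invoke explanation steps until the representation adequately
    accounts for the observed sound (fit >= threshold)."""
    explanation = []                       # (element index, gain) pairs
    synth = np.zeros_like(observed)        # running resynthesis
    for _ in range(max_elements):
        if goodness_of_fit(observed, synth) >= threshold:
            break                          # representation is adequate
        residual = observed - synth
        # Pick the vocabulary element that best explains the residual.
        scores = [np.dot(residual, e) ** 2 / np.dot(e, e)
                  for e in element_library]
        best = int(np.argmax(scores))
        e = element_library[best]
        gain = np.dot(residual, e) / np.dot(e, e)
        explanation.append((best, gain))
        synth = synth + gain * e
    return explanation, synth
```

Because the vocabulary is overcomplete, many element combinations can explain the same signal; in this sketch the greedy score plays the role of the additional constraint that selects among them, which is where top-down or context-sensitive factors could be injected.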


Also Published In

1995 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, October 15-18, Mohonk Mountain House, New Paltz, New York

Academic Units: Electrical Engineering
Published Here: July 3, 2012