Underconstrained stochastic representations for top-down computational auditory scene analysis

Ellis, Daniel P. W.

I propose a structure for the first stage of a computer system capable of performing complex auditory scene analysis similar to that accomplished by human listeners. This structure contains the following innovations over previous approaches: (1) Sound is represented as discrete elements drawn from an overcomplete vocabulary encompassing both tonal and less structured sounds, designed to highlight the interdependence in the acoustic energy. (2) Because the basis is redundant, this analysis permits, and indeed requires, the imposition of additional constraints, which provides a way to incorporate top-down or context-sensitive factors. (3) A modular architecture operates on an analysis-by-synthesis principle, in which processes are invoked until the representation adequately accounts for the observed sound. A common goodness-of-fit criterion allows for future expansion of the system with new explanation rules, new representational elements, and more abstract levels of analysis. Some initial results of applying these ideas to scenes consisting of noise bursts and dense environmental sound are presented.
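
The analysis-by-synthesis idea in the abstract (keep adding elements from an overcomplete vocabulary until the representation adequately accounts for the observed sound) can be illustrated with a minimal sketch. This is not the system described in the paper: it substitutes plain greedy matching pursuit for the paper's explanation rules, and the vocabulary (unit impulses plus cosines), the function names `make_vocabulary` and `explain`, and the relative-energy threshold `tol` are all assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def energy(v):
    return dot(v, v)


def make_vocabulary(n):
    """Hypothetical overcomplete vocabulary: more unit-norm elements than samples."""
    atoms = []
    # "Less structured" elements: single-sample impulses.
    for i in range(n):
        atoms.append([1.0 if j == i else 0.0 for j in range(n)])
    # Tonal elements: unit-norm cosines at several frequencies.
    for k in range(1, n // 2):
        raw = [math.cos(2 * math.pi * k * j / n) for j in range(n)]
        norm = math.sqrt(energy(raw))
        atoms.append([x / norm for x in raw])
    return atoms


def explain(signal, atoms, tol=1e-3, max_steps=50):
    """Greedy matching pursuit (a stand-in, not the paper's method).

    Adds one element at a time until the residual energy falls below
    tol times the signal energy (an assumed goodness-of-fit criterion)
    or max_steps is reached.
    """
    residual = list(signal)
    total = energy(signal)
    elements = []  # list of (atom_index, weight)
    for _ in range(max_steps):
        if energy(residual) <= tol * total:
            break
        # Choose the element that best matches what is still unexplained.
        idx, weight = max(
            ((i, dot(residual, a)) for i, a in enumerate(atoms)),
            key=lambda pair: abs(pair[1]),
        )
        elements.append((idx, weight))
        # Subtract the resynthesized element from the residual.
        residual = [r - weight * a for r, a in zip(residual, atoms[idx])]
    return elements, residual


n = 16
atoms = make_vocabulary(n)
# Toy "scene": one tonal element plus a click at sample 5.
signal = [3.0 * x for x in atoms[n + 1]]
signal[5] += 2.0
elements, residual = explain(signal, atoms)
print(len(elements), energy(residual) / energy(signal))
```

Because the vocabulary is redundant, many different element sets can account for the same signal; the greedy rule here is just one arbitrary way of choosing among them, which is where the abstract's additional top-down constraints would enter.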

Also Published In

Title: 1995 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, October 15-18, Mohonk Mountain House, New Paltz, New York
DOI: https://doi.org/10.1109/ASPAA.1995.482909

More About This Work

Academic Units: Electrical Engineering
Publisher: IEEE
Published Here: July 3, 2012