Articles

Editorial: Special Section on Statistical and Perceptual Audio Processing

Ellis, Daniel P. W.; Raj, Bhiksha; Brown, Judith C.; Slaney, Malcolm; Smaragdis, Paris

Human perception has always been an inspiration for automatic processing systems, not least because tasks such as speech recognition only exist because people do them; indeed, without that example we might wonder if they were possible at all. As computational power grows, we have increasing opportunities to model and duplicate perceptual abilities with greater fidelity, and, most importantly, based on larger and larger amounts of raw data describing both what signals exist in the real world and how people respond to them. The power to deal with large data sets has meant that approaches that were once mere theoretical possibilities, such as exhaustive search of exponentially sized codebooks, or real-time direct convolution of long sequences, have become increasingly practical and even unremarkable. A major consequence of this is the growth of statistical or corpus-based approaches, where complex relations, discriminations, or structures are inferred directly from example data (for instance by optimizing the parameters of a very general algorithm). An increasing number of complex tasks can be given empirically optimal solutions based on large, representative datasets.
The traditional idea of perceptually inspired processing is to develop a machine algorithm for a complex task such as melody recognition or source separation through inspiration and introspection about how individuals perform the task, and on the basis of direct psychological or neurophysiological data. The results can appear to be at odds with the statistical perspective, since perceptually motivated work is often ad hoc, comprising many stages whose individual contributions are difficult to separate. We believe that it is important to unify these two approaches: to employ rigorous, exhaustive techniques taking advantage of the statistics of large data sets to develop and solve perceptually based and subjectively defined problems.
With this in mind, we organized a one-day workshop on Statistical and Perceptual Audio Processing as a satellite to the International Conference on Spoken Language Processing (ICSLP-INTERSPEECH), held in Jeju, Korea, in September 2004.

Also Published In

Title
IEEE Transactions on Audio, Speech, and Language Processing
DOI
https://doi.org/10.1109/TSA.2005.862700

More About This Work

Academic Units
Electrical Engineering
Published Here
February 15, 2012