A Perceptual Representation of Audio

Daniel P. W. Ellis

Thesis Advisor(s):
Vercoe, Barry L.
Quatieri, Thomas F.
Master's theses
Electrical Engineering
M.S., Massachusetts Institute of Technology.
The human auditory system performs many remarkable feats; we only fully appreciate how sophisticated these are when we try to simulate them on a computer. By building such computer models, we gain insight into perceptual processing in general and develop useful new ways to analyze signals. This thesis describes a transformation of sound into a representation whose properties are specifically oriented towards simulations of source separation. Source separation denotes the ability of listeners to perceive sound from a particular source as separate from simultaneous interfering sounds; an example is following the notes of a single instrument while listening to an orchestra. Using a cochlea-inspired filterbank and strategies of peak-picking and track-formation, the representation organizes time-frequency energy into distinct elements; these are argued to correspond to indivisible components of the perception. The elements contain information, such as fine time structure, that is important to perceptual quality and source separability. A high-quality resynthesis method is described that gives good results even for modified representations. The performance and results of the analysis and synthesis methods are discussed, and the intended applications of the new domain are described in detail. This description also explains how the principles of source separation, as established by previous research in psychoacoustics, will be applied as the next step towards a fully functional source separator.
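
As a rough illustration of the peak-picking and track-formation idea mentioned in the abstract, the following minimal Python sketch finds local energy maxima across filterbank channels in each time frame and links maxima in successive frames into tracks. The function names (pick_peaks, form_tracks), the max_jump parameter, the nearest-channel linking rule, and the toy data are all assumptions made for illustration; the abstract does not specify the method actually used in the thesis.

```python
# Illustrative sketch only: the names, the toy data, and the nearest-channel
# linking rule are assumptions, not the algorithm described in the thesis.

def pick_peaks(frame):
    """Return the indices of local maxima in one frame of channel energies."""
    return [
        k for k in range(1, len(frame) - 1)
        if frame[k] > frame[k - 1] and frame[k] >= frame[k + 1]
    ]


def form_tracks(frames, max_jump=1):
    """Link peaks in successive frames into tracks.

    Each track is a list of (frame_index, channel_index) pairs. A peak
    continues the unclaimed track from the previous frame whose last peak
    is nearest in channel, provided it lies within max_jump channels;
    otherwise the peak starts a new track.
    """
    finished, active = [], []
    for t, frame in enumerate(frames):
        unclaimed = list(active)   # tracks still waiting for a continuation
        still_active = []
        for k in pick_peaks(frame):
            candidates = [tr for tr in unclaimed
                          if abs(tr[-1][1] - k) <= max_jump]
            if candidates:
                best = min(candidates, key=lambda tr: abs(tr[-1][1] - k))
                unclaimed.remove(best)
                best.append((t, k))
                still_active.append(best)
            else:
                still_active.append([(t, k)])
        finished.extend(unclaimed)  # tracks with no continuation end here
        active = still_active
    return finished + active


# Toy "time-frequency energy": 3 frames by 6 channels (hypothetical values).
frames = [
    [0, 1, 0, 0, 2, 0],
    [0, 0, 1, 0, 2, 0],
    [0, 0, 1, 0, 0, 0],
]
tracks = form_tracks(frames)
```

On this toy input, one track drifts from channel 1 to channel 2 and persists through all three frames, while a second track stays in channel 4 and ends after the second frame. A real system would operate on the outputs of a cochlea-like filterbank and would likely use a more careful linking criterion than this greedy one.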