2006 Articles
Estimating Single-Channel Source Separation Masks: Relevance Vector Machine Classifiers vs. Pitch-Based Masking
Audio sources frequently concentrate much of their energy into a relatively small proportion of the available time-frequency cells in a short-time Fourier transform (STFT). This sparsity makes it possible to separate sources, to some degree, simply by selecting STFT cells dominated by the desired source, setting all others to zero (or to an estimate of the obscured target value), and inverting the STFT to a waveform. The problem of source separation then becomes one of identifying the cells containing good target information. We treat this as a classification problem, and train a Relevance Vector Machine (a probabilistic relative of the Support Vector Machine) to perform this task. We compare the performance of this classifier both against SVMs (which have similar accuracy but are not as efficient as RVMs), and against a traditional Computational Auditory Scene Analysis (CASA) technique based on a noise-robust pitch tracker, which the RVM outperforms significantly. Differences between the RVM- and pitch-tracker-based mask estimates suggest that combining the two approaches could yield further benefits.
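The masking step described in the abstract (select target-dominated STFT cells, zero the rest, invert) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the mask here is an oracle computed from the clean sources, standing in for the output of a trained classifier such as the RVM, and the test signals, the window length `N_FFT`, the hop `HOP`, and the helper names `stft`, `istft`, and `snr_db` are all assumptions chosen for the example.

```python
import numpy as np

N_FFT, HOP = 256, 64                  # illustrative values, not taken from the paper
WIN = np.hanning(N_FFT + 1)[:-1]      # periodic Hann analysis/synthesis window

def stft(x):
    """Hann-windowed STFT, shape (n_frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X):
    """Weighted overlap-add inverse of stft() above."""
    frames = np.fft.irfft(X, n=N_FFT, axis=1)
    out = np.zeros(N_FFT + HOP * (len(frames) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame * WIN
        norm[i * HOP:i * HOP + N_FFT] += WIN ** 2
    return out / np.maximum(norm, 1e-8)

def snr_db(ref, est):
    """Signal-to-noise ratio of est against ref, in dB."""
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))

# Hypothetical test signals: two tones at different frequencies.
fs = 8000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)
interference = 0.8 * np.sin(2 * np.pi * 1800 * t)
mixture = target + interference

# Oracle binary mask: True where the target dominates a time-frequency cell.
# In the paper's setting a classifier would predict this from the mixture alone.
mask = np.abs(stft(target)) > np.abs(stft(interference))

# Zero the interference-dominated cells and invert back to a waveform.
estimate = istft(mask * stft(mixture))

# Compare away from the signal edges, where the window weighting is unreliable.
core = slice(N_FFT, -N_FFT)
snr_mixture = snr_db(target[core], mixture[core])
snr_masked = snr_db(target[core], estimate[core])
print(f"SNR of mixture: {snr_mixture:.1f} dB, after masking: {snr_masked:.1f} dB")
```

Because the two tones in this toy example occupy disjoint frequency regions, the oracle mask separates them almost perfectly; real speech mixtures overlap far more, which is why estimating the mask from the mixture is the hard part.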
Files
- WeissE06-rvm.pdf (application/pdf, 339 KB)
Also Published In
- Title: ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition: SAPA2006: 16 September 2006, Pittsburgh, PA
- Publisher: International Speech Communication Association
More About This Work
- Academic Units: Electrical Engineering
- Published Here: June 27, 2012