Academic Commons

Theses Doctoral

Noise Robust Pitch Tracking by Subband Autocorrelation Classification

Lee, Byung Suk

Speech pitch tracking is one of the elementary tasks of Computational Auditory Scene Analysis (CASA). While a human can easily listen to the voiced pitch in highly noisy recordings, the performance of automatic speech pitch tracking degrades in unknown noisy audio conditions. Traditional pitch trackers use either autocorrelation or the Fourier transform to calculate periodicity, which works well for clean recordings. For noisy recordings, however, the accuracy of these pitch trackers generally degrades. For example, parts of the frequency spectrum may be lost in analog radio band transmission and/or corrupted by additive noise of various kinds. Instead of explicitly using the most obvious features of autocorrelation, we propose a trained classifier-based approach, which we call Subband Autocorrelation Classification (SAcC). A multi-layer perceptron (MLP) classifier is trained on the principal components of the autocorrelations of subbands from an auditory filterbank. The output of the MLP classifier is temporally smoothed to produce the pitch track by finding the Viterbi path of a Hidden Markov Model (HMM). Training on various types of noisy speech recordings leads to a great increase in performance over state-of-the-art algorithms, according to both the traditional Gross Pitch Error (GPE) measure and a proposed novel Pitch Tracking Error (PTE) measure, which more fully reflects the accuracy of both pitch estimation/extraction and voicing detection in a single number. To verify the generalization and specificity of SAcC, we test it on a real-world problem with a large-scale noisy speech corpus. The data is from the DARPA Robust Automatic Transcription of Speech (RATS) program. The performance evaluation experiments confirm the generalization power of SAcC pitch tracking across various unknown noise conditions and distinct speech corpora.
We also report that using the SAcC output significantly improves a Speaker Identification (SID) system for RATS, suggesting the potential contribution of SAcC pitch tracking to higher-level tasks.
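
The temporal smoothing step described in the abstract (finding the Viterbi path through per-frame classifier outputs) can be illustrated with a minimal sketch. This is not the thesis implementation: the state layout, the `make_transitions` helper, the `stay` probability, the uniform initial distribution, and the direct use of classifier probabilities as observation scores are all assumptions made for illustration.

```python
import math


def make_transitions(n_states, stay=0.9):
    """Hypothetical transition matrix: probability `stay` of remaining in
    the same state, with the remainder spread evenly over the others."""
    move = (1.0 - stay) / (n_states - 1)
    return [[stay if i == j else move for j in range(n_states)]
            for i in range(n_states)]


def viterbi(posteriors, transitions):
    """Return the most likely state sequence.

    posteriors:  one list of per-state probabilities per frame, used here
                 as observation scores (an assumption of this sketch).
    transitions: transitions[i][j] is the probability of moving i -> j.
    A uniform initial state distribution is assumed.
    """
    eps = 1e-12  # avoids log(0)
    n_states = len(transitions)
    log_t = [[math.log(p + eps) for p in row] for row in transitions]
    score = [math.log(p + eps) for p in posteriors[0]]
    back = []  # back[t][j]: best predecessor of state j at frame t + 1
    for frame in posteriors[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_t[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_t[best_i][j]
                             + math.log(frame[j] + eps))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Three states (for example one unvoiced state and two pitch bins).
# The middle frame weakly favors state 2, as a noisy frame might.
frames = [[0.1, 0.8, 0.1], [0.1, 0.4, 0.5], [0.1, 0.8, 0.1]]
framewise = [max(range(3), key=lambda s: f[s]) for f in frames]
smoothed = viterbi(frames, make_transitions(3))
print(framewise, smoothed)  # [1, 2, 1] [1, 1, 1]
```

In this toy example the frame-by-frame maximum jumps to state 2 on the middle frame, while the Viterbi path stays in state 1 because the assumed transition probabilities penalize the brief jump more than the weak evidence rewards it.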

Files

  • Lee_columbia_0054D_11028.pdf (application/x-pdf, 1.8 MB)

More About This Work

Academic Units
Electrical Engineering
Thesis Advisors
Ellis, Daniel P. W.
Degree
Ph.D., Columbia University
Published Here
October 15, 2012
Academic Commons provides global access to research and scholarship produced at Columbia University, Barnard College, Teachers College, Union Theological Seminary and Jewish Theological Seminary. Academic Commons is managed by the Columbia University Libraries.