Academic Commons

Pushing the Envelope—Aside

Morgan, Nelson; Zhu, Qifeng; Stolcke, Andreas; Sönmez, Kemal; Sivadas, Sunil; Shinozaki, Takahiro; Ostendorf, Mari; Jain, Pratibha; Hermansky, Hynek; Ellis, Daniel P. W.; Doddington, George; Chen, Barry; Çetin, Özgür; Bourlard, Hervé; Athineos, Marios

Despite its successes, speech recognition performance still has significant limitations, particularly for conversational speech and/or for speech with significant acoustic degradation from noise or reverberation. For this reason, the authors have proposed methods that incorporate different (and larger) analysis windows, which are described in this article. Note in passing that the authors and many others have already taken advantage of processing techniques that incorporate information over long time ranges, for instance for normalization, by cepstral mean subtraction (B. Atal, 1974) or relative spectral analysis (RASTA; H. Hermansky and N. Morgan, 1994). The authors have also proposed features based on speech sound class posterior probabilities, which have good properties for both classification and stream combination.
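
As a rough illustration of the long-time-range normalization mentioned in the abstract, cepstral mean subtraction can be sketched as subtracting each cepstral coefficient's average over the whole utterance from every frame. This is a minimal sketch and not the authors' implementation: the function name `cepstral_mean_subtraction`, the list-of-lists frame layout, and the toy values are assumptions made for illustration only.

```python
from typing import List


def cepstral_mean_subtraction(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean, computed over all frames, from each frame.

    `frames` is assumed to be a list of equal-length cepstral vectors,
    one per analysis frame of a single utterance (hypothetical layout).
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Long-term average of each coefficient across the whole utterance.
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]


# Toy utterance: 3 frames of 2 cepstral coefficients (made-up values).
utterance = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
normalized = cepstral_mean_subtraction(utterance)

# The same utterance with a constant offset added to every frame, standing in
# for a fixed channel effect, which is additive in the cepstral domain.
shifted = [[c0 + 5.0, c1 + 5.0] for c0, c1 in utterance]
normalized_shifted = cepstral_mean_subtraction(shifted)
```

In this toy case both inputs normalize to the same frames, which is the point of the technique: a constant offset shared by all frames is removed by the utterance-level mean.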

Also Published In

Title
IEEE Signal Processing Magazine
DOI
https://doi.org/10.1109/MSP.2005.1511826

More About This Work

Academic Units
Electrical Engineering
Published Here
February 15, 2012