2001 Articles
Tandem acoustic modeling in large-vocabulary recognition
In the tandem approach to modeling the acoustic signal, a neural-net preprocessor is first discriminatively trained to estimate posterior probabilities across a phone set. These are then used as feature inputs for a conventional hidden Markov model (HMM) based speech recognizer, which relearns the associations to subword units. We apply the tandem approach to the data provided for the first Speech in Noisy Environments (SPINE1) evaluation conducted by the Naval Research Laboratory (NRL) in August 2000. In our previous experience with the ETSI Aurora noisy digits (a small-vocabulary, high-noise task), the tandem approach achieved error-rate reductions of over 50% relative to the HMM baseline. For SPINE1, a larger task involving more spontaneous speech, we find that, when context-independent models are used, the tandem features continue to result in large reductions in word-error rates relative to those achieved by systems using standard MFC or PLP features. However, these improvements do not carry over to context-dependent models. This may be attributable to several factors, which are discussed in the paper.
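The feature path described in the abstract (network posteriors reused as HMM input features) can be illustrated with a minimal sketch. Several things here are assumptions, not details taken from the abstract: the log compression and PCA decorrelation steps are commonly used in tandem systems but are not stated above, the trained network is replaced by random logits, and the 24-class phone set, the frame count, and the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax: turns per-frame network outputs into posteriors."""
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def tandem_features(posteriors, eps=1e-10):
    """Map phone posteriors (frames x phones) to decorrelated features.

    Assumed processing (not stated in the abstract): take logs to make the
    skewed posterior distribution more Gaussian-like, then rotate with PCA
    so the dimensions are uncorrelated over this data.
    """
    log_post = np.log(posteriors + eps)
    centered = log_post - log_post.mean(axis=0, keepdims=True)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return centered @ eigvecs[:, order]

# Stand-in for a trained network: random logits for 200 frames, 24 phones.
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 24))
posteriors = softmax(logits)
features = tandem_features(posteriors)  # would be passed to the HMM trainer
```

In a real system the PCA basis would be estimated on training data and then applied unchanged to test data; the sketch estimates and applies it on the same array only to stay short.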
Files
- Tandem_acoustic_modeling_in_large-vocabulary_recognition.pdf (application/pdf, 249 KB)
Also Published In
- Title: 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing: proceedings: 7-11 May, 2001, Salt Palace Convention Center, Salt Lake City, Utah, USA
- Publisher: IEEE
- DOI: https://doi.org/10.1109/ICASSP.2001.940881
More About This Work
- Academic Units: Electrical Engineering
- Published Here: July 2, 2012