Tandem acoustic modeling in large-vocabulary recognition

Ellis, Daniel P. W.; Singh, Rita; Sivadas, Sunil

In the tandem approach to modeling the acoustic signal, a neural-net preprocessor is first discriminatively trained to estimate posterior probabilities across a phone set. These posteriors are then used as feature inputs for a conventional hidden Markov model (HMM) based speech recognizer, which relearns the associations to subword units. We apply the tandem approach to the data provided for the first Speech in Noisy Environments (SPINE1) evaluation conducted by the Naval Research Laboratory (NRL) in August 2000. In our previous experience with the ETSI Aurora noisy-digits task (small vocabulary, high noise), the tandem approach achieved error-rate reductions of over 50% relative to the HMM baseline. For SPINE1, a larger task involving more spontaneous speech, we find that, when context-independent models are used, the tandem features continue to yield large reductions in word-error rate relative to systems using standard MFC or PLP features. However, these improvements do not carry over to context-dependent models. This may be attributable to several factors, which are discussed in the paper.
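The abstract describes the tandem pipeline only at a high level. The following minimal Python sketch shows the data flow under several stated assumptions: a one-hidden-layer perceptron with sigmoid hidden units and a softmax output over the phone set, and log posteriors as the features handed to the HMM. The weights are random placeholders standing in for a discriminatively trained net. The names `mlp_posteriors`, `tandem_features`, and `random_net`, along with all layer sizes, are hypothetical. A decorrelating transform such as PCA, commonly applied before diagonal-covariance Gaussian modeling, is omitted here.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Affine layer: returns W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(v: float) -> float:
    """Logistic function written via tanh so large |v| cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * v))


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax: the outputs are positive and sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_posteriors(frame: Vector, w1: Matrix, b1: Vector,
                   w2: Matrix, b2: Vector) -> Vector:
    """Hypothetical one-hidden-layer net estimating P(phone | frame)."""
    hidden = [sigmoid(h) for h in matvec(w1, frame, b1)]
    return softmax(matvec(w2, hidden, b2))


def tandem_features(frames: List[Vector], w1: Matrix, b1: Vector,
                    w2: Matrix, b2: Vector, floor: float = 1e-10) -> List[Vector]:
    """Map each acoustic frame to log phone posteriors used as HMM input features."""
    feats = []
    for frame in frames:
        post = mlp_posteriors(frame, w1, b1, w2, b2)
        # Assumed choice: the log compresses the highly skewed posteriors,
        # making them better suited to Gaussian-mixture output densities.
        feats.append([math.log(max(p, floor)) for p in post])
    return feats


def random_net(n_in: int, n_hidden: int, n_out: int,
               seed: int = 0) -> Tuple[Matrix, Vector, Matrix, Vector]:
    """Untrained random weights: a placeholder for a discriminatively trained net."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return mat(n_hidden, n_in), [0.0] * n_hidden, mat(n_out, n_hidden), [0.0] * n_out


if __name__ == "__main__":
    # Hypothetical sizes: 13 cepstral inputs, 20 hidden units, 5 phone classes.
    net = random_net(n_in=13, n_hidden=20, n_out=5)
    rng = random.Random(1)
    frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(3)]
    for f in tandem_features(frames, *net):
        print([round(v, 3) for v in f])
```

In a real system, the net would be trained on phone-labeled frames, usually with a window of neighboring frames as input. The resulting feature vectors would then replace or augment MFC/PLP features when training the HMM. The sketch only illustrates that the HMM sees a transformed posterior vector per frame rather than the raw spectral features.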



Also Published In

2001 IEEE International Conference on Acoustics, Speech, and Signal Processing: proceedings: 7-11 May, 2001, Salt Palace Convention Center, Salt Lake City, Utah, USA

More About This Work

Academic Units
Electrical Engineering
Published Here
July 2, 2012