Error visualization for tandem acoustic modeling on the Aurora task

Reyes-Gomez, Manuel; Ellis, Daniel P. W.

Tandem acoustic modeling consists of taking the outputs of a neural network discriminatively trained to estimate the phone-class posterior probabilities of speech and using them as the input features of a conventional distribution-modeling Gaussian mixture model (GMM) speech recognizer, thereby employing two acoustic models in tandem. On the Aurora 2 noisy English digits task, this structure reduces the error rate by more than 50% relative to the HTK baseline. Although there are some reasonable hypotheses to explain this improvement, its origins remain unclear. This paper introduces the use of visualization tools for error analysis of several variations of the tandem system. The error behavior is first analyzed using word-level confusion matrices. Posteriorgrams (displays of the variation over time of per-phone posterior probabilities) then enable further analysis. The results corroborate our previous hypothesis that the gains from tandem modeling arise from the very different training and modeling schemes of the two acoustic models.
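A word-level confusion matrix tallies, for each reference word, which word the recognizer actually output. The abstract does not describe how the authors built theirs, so the following is only a minimal, generic sketch. It assumes each utterance is given as a reference transcript and a recognizer hypothesis, aligns them with a standard Levenshtein (edit-distance) alignment, and counts (reference, hypothesis) word pairs. The gap token `*` marks deletions (in the hypothesis column) and insertions (in the reference column). The names `align`, `confusion_counts`, `format_matrix`, and `GAP` are hypothetical and used only for illustration.

```python
from collections import Counter

GAP = "*"  # hypothetical placeholder for a missing word (deletion or insertion)


def align(ref, hyp):
    """Levenshtein-align two word lists and return (ref_word, hyp_word) pairs."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], GAP))  # deletion: reference word was missed
            i -= 1
        else:
            pairs.append((GAP, hyp[j - 1]))  # insertion: extra hypothesized word
            j -= 1
    pairs.reverse()
    return pairs


def confusion_counts(utterances):
    """Accumulate (reference, hypothesis) word-pair counts over (ref, hyp) strings."""
    counts = Counter()
    for ref, hyp in utterances:
        counts.update(align(ref.split(), hyp.split()))
    return counts


def format_matrix(counts):
    """Render counts as a text table: rows are reference words, columns hypotheses."""
    refs = sorted({r for r, _ in counts})
    hyps = sorted({h for _, h in counts})
    width = max(len(w) for w in refs + hyps + ["ref/hyp"]) + 1
    lines = ["ref/hyp".ljust(width) + "".join(h.rjust(width) for h in hyps)]
    for r in refs:
        row = "".join(str(counts[(r, h)]).rjust(width) for h in hyps)
        lines.append(r.ljust(width) + row)
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy digit-string examples, invented for illustration (not Aurora 2 data).
    data = [
        ("one two three", "one eight three"),  # substitution two -> eight
        ("five oh", "five"),                   # deletion of "oh"
        ("nine", "nine nine"),                 # insertion of an extra "nine"
    ]
    print(format_matrix(confusion_counts(data)))
```

Off-diagonal cells show substitutions, and the `*` row and column collect insertions and deletions. Comparing such matrices between a baseline and a tandem system is one way to see which digit confusions a given variation fixes or introduces.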


Also Published In

International Conference on Acoustics, Speech, and Signal Processing


Academic Units
Electrical Engineering
Published Here
July 2, 2012