Academic Commons


Error visualization for tandem acoustic modeling on the Aurora task

Reyes-Gomez, Manuel; Ellis, Daniel P. W.

Tandem acoustic modeling consists of taking the outputs of a neural network discriminatively trained to estimate the phone-class posterior probabilities of speech, and using them as the input features of a conventional distribution-modeling Gaussian mixture model (GMM) speech recognizer, thereby employing two acoustic models in tandem. This structure reduces the error rate on the Aurora 2 noisy English digits task by more than 50% compared to the HTK baseline. Although there are some reasonable hypotheses to explain this improvement, its origins remain unclear. This paper introduces the use of visualization tools for error analysis of several variations of the tandem system. The error behavior is first analyzed using word-level confusion matrices. Posteriorgrams (displays of how the per-phone posterior probabilities vary over time) support further analysis. The results corroborate our previous hypothesis that the gains from tandem modeling arise from the very different training and modeling schemes of the two acoustic models.
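A word-level confusion matrix of the kind used for this error analysis can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes each recognizer hypothesis is aligned to its reference transcript by minimum edit distance with unit costs. The abstract does not specify the alignment procedure; scoring tools such as HTK's HResults also align by dynamic programming, but with their own penalty weights. The function names and the `<del>`/`<ins>` labels are hypothetical.

```python
from collections import Counter

DEL = "<del>"  # hypothetical label: reference word with no matching hypothesis word
INS = "<ins>"  # hypothetical label: hypothesis word with no matching reference word


def align(ref, hyp):
    """Align two word lists by minimum edit distance (unit costs).

    Returns a list of (reference_word, hypothesis_word) pairs,
    using DEL and INS to mark deletions and insertions.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], DEL))  # deletion
            i -= 1
        else:
            pairs.append((INS, hyp[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


def confusion_matrix(utterances):
    """Count (reference, hypothesis) word pairs over (ref, hyp) string pairs."""
    counts = Counter()
    for ref, hyp in utterances:
        counts.update(align(ref.split(), hyp.split()))
    return counts


if __name__ == "__main__":
    # Toy digit-string data, invented for illustration only.
    data = [
        ("one two three", "one five three"),
        ("oh seven", "seven"),
        ("nine", "nine nine"),
    ]
    # Print only the error cells (off-diagonal entries).
    for (r, h), c in sorted(confusion_matrix(data).items()):
        if r != h:
            print(f"{r:>6} -> {h:<6} {c}")
```

Diagonal cells count correctly recognized words; off-diagonal cells count substitutions, while the `<del>` column and `<ins>` row collect deletions and insertions. Laying these counts out as a grid over the digit vocabulary gives the kind of confusion-matrix display the abstract describes.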


Also Published In

Title
ICASSP 2002

More About This Work

Academic Units
Electrical Engineering
Publisher
International Conference on Acoustics, Speech, and Signal Processing
Published Here
July 2, 2012