Combining bottom-up and top-down constraints to achieve robust ASR: The multisource decoder

Barker, Jon; Cooke, Martin; Ellis, Daniel P. W.

Recognising speech in the presence of non-stationary noise presents a great challenge. Missing data techniques allow recognition based on a subset of features that reflect the speech and not the interference, but identifying these valid features is difficult. Rather than relying only on low-level signal features to locate the target (such as energy relative to an estimated noise floor), we can also employ the top-down constraints of the speech models to eliminate candidate target fragments that have a low likelihood of resembling the training set. The multisource decoder performs a simultaneous search in fragment-labelling space (target or interference) and word-string space to find the most likely overall solution. When tested on the Aurora 2 task, this algorithm achieves up to 20% relative word error rate reduction in non-stationary noise conditions at low SNR.
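The joint search described in the abstract can be illustrated with a deliberately simplified sketch. The fragment masks, the single-Gaussian "speech" and "interference" models, and the exhaustive enumeration of labellings below are all illustrative assumptions, not the paper's method: the actual multisource decoder uses HMM word models with missing-data (marginalised) likelihoods and folds the labelling search into Viterbi decoding rather than enumerating labellings explicitly.

```python
import itertools
import numpy as np

# Toy spectrogram: 4 frequency channels x 6 frames, tiled by three
# hypothetical fragments (in the paper, fragments would come from
# low-level primitive grouping of the spectro-temporal representation).
spec = np.zeros((4, 6))
frag_a = np.zeros_like(spec, dtype=bool); frag_a[:2, :3] = True
frag_b = np.zeros_like(spec, dtype=bool); frag_b[2:, :3] = True
frag_c = np.zeros_like(spec, dtype=bool); frag_c[:, 3:] = True
fragments = [frag_a, frag_b, frag_c]

# Pretend frag_b is dominated by high-energy interference.
spec[frag_b] = 3.0

# Stand-ins for the top-down constraints: a unit-variance Gaussian
# "speech" model and a Gaussian "interference" model. A real decoder
# would instead marginalise the interference-labelled features under
# HMM state distributions.
def gauss_ll(x, mean):
    """Log-likelihood of x under a unit-variance Gaussian."""
    return -0.5 * np.log(2 * np.pi) - 0.5 * (x - mean) ** 2

SPEECH_MEAN, NOISE_MEAN = 0.0, 3.0

best_score, best_labels = -np.inf, None
# Exhaustive search over fragment labellings (1 = target, 0 = interference);
# with F fragments there are 2**F candidate labellings.
for labels in itertools.product([0, 1], repeat=len(fragments)):
    score = 0.0
    for frag, lab in zip(fragments, labels):
        mean = SPEECH_MEAN if lab else NOISE_MEAN
        score += gauss_ll(spec[frag], mean).sum()
    if score > best_score:
        best_score, best_labels = score, labels

print("best labelling (1 = target):", best_labels)
```

Because frag_b's energy matches the interference model, the most likely overall solution labels it as interference and keeps frag_a and frag_c as target, mirroring how top-down model constraints eliminate fragments that do not resemble the training data.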


Also Published In

Consistent & Reliable Acoustic Cues for Sound Analysis: One-day Workshop: Aalborg, Denmark, Sunday, September 2nd, 2001
Department of Electrical Engineering, Columbia University

More About This Work

Academic Units
Electrical Engineering
Published Here
July 2, 2012