Academic Commons


Combining bottom-up and top-down constraints to achieve robust ASR: The multisource decoder

Barker, Jon; Cooke, Martin; Ellis, Daniel P. W.

Recognising speech in the presence of non-stationary noise presents a great challenge. Missing data techniques allow recognition based on a subset of features which reflect the speech rather than the interference, but identifying these valid features is difficult. Rather than relying only on low-level signal features to locate the target (such as energy relative to an estimated noise floor), we can also employ the top-down constraints of the speech models to eliminate candidate target fragments that have a low likelihood of resembling the training set. The multisource decoder performs a simultaneous search over fragment-labelling space (target or interference) and word-string space to find the most likely overall solution. When tested on the Aurora 2 task, this algorithm achieves up to 20% relative word error rate reduction in non-stationary noise conditions at low SNR.
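The joint search described above can be illustrated with a toy sketch. Everything here is invented for illustration: the real decoder searches HMM word-string hypotheses with missing-data likelihoods, whereas this sketch uses scalar features, single-Gaussian "word models", a flat background score for interference-labelled features, and exhaustive enumeration of fragment labellings.

```python
import itertools
import math

def gauss_loglik(x, mean, var=1.0):
    """Log-likelihood of x under a 1-D Gaussian (toy acoustic model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def decode(features, fragments, word_models, background_loglik=-3.0):
    """Jointly search fragment labellings x word hypotheses (toy version).

    features: list of scalar feature values (stand-ins for spectral observations)
    fragments: list of index lists partitioning the features into fragments
    word_models: dict mapping word -> list of per-feature Gaussian means
    background_loglik: flat log-density assigned to interference-labelled features

    Returns (best_word, best_labelling, best_log_likelihood), where the
    labelling is a tuple with 1 = target, 0 = interference per fragment.
    """
    best = (None, None, -math.inf)
    # Enumerate every target/interference labelling of the fragments ...
    for labels in itertools.product([0, 1], repeat=len(fragments)):
        target_idx = {i for frag, lab in zip(fragments, labels) if lab
                      for i in frag}
        # ... jointly with every word hypothesis, keeping the best pair.
        for word, means in word_models.items():
            ll = 0.0
            for i, x in enumerate(features):
                if i in target_idx:
                    ll += gauss_loglik(x, means[i])  # speech model scores target features
                else:
                    ll += background_loglik          # interference features get a flat score
            if ll > best[2]:
                best = (word, labels, ll)
    return best
```

For example, with two fragments where the first matches a word model and the second is far from every model, the search labels the first fragment as target and the second as interference, realising the top-down pruning of implausible target fragments that the abstract describes.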


Also Published In

Consistent & Reliable Acoustic Cues for Sound Analysis: One-day Workshop: Aalborg, Denmark, Sunday, September 2nd, 2001

More About This Work

Academic Units
Electrical Engineering
Department of Electrical Engineering, Columbia University
Published Here
July 2, 2012