2001 Articles
Combining bottom-up and top-down constraints to achieve robust ASR: The multisource decoder
Recognising speech in the presence of non-stationary noise presents a great challenge. Missing-data techniques allow recognition based on a subset of features that reflect the speech rather than the interference, but identifying these valid features is difficult. Rather than relying only on low-level signal features to locate the target (such as energy relative to an estimated noise floor), we can also employ the top-down constraints of the speech models to eliminate candidate target fragments that have a low likelihood of resembling the training set. The multisource decoder searches simultaneously in fragment-labelling space (target or interference) and word-string space to find the most likely overall solution. When tested on the Aurora 2 task, this algorithm achieves a relative reduction in word error rate of up to 20% in non-stationary noise conditions at low SNR.
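The abstract describes the joint search only at a high level. The following Python sketch illustrates the idea under several assumptions that are not taken from the paper: features are non-negative spectral energies, each word model is a hypothetical single diagonal Gaussian applied to every frame, cells labelled as interference are scored by bounded marginalisation (the clean-speech value is integrated over the range [0, x]), and the "simultaneous search" is replaced by brute-force enumeration of every fragment labelling paired with every word. The names `decode`, `fragments` and `models` are illustrative only.

```python
import itertools
import math


def log_gauss(x, mu, var):
    # Log density of a 1-D Gaussian N(mu, var) evaluated at x.
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def gauss_cdf(x, mu, var):
    # Cumulative distribution function of N(mu, var) at x.
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))


def log_bounded_marginal(x, mu, var):
    # Assumed scoring for a cell labelled interference: the clean-speech
    # value is unknown but lies in [0, x], so integrate over that range.
    p = gauss_cdf(x, mu, var) - gauss_cdf(0.0, mu, var)
    return math.log(max(p, 1e-300))  # floor avoids log(0)


def decode(X, fragments, models):
    """Brute-force stand-in for the joint search (hypothetical helper).

    X         -- list of frames, each a list of channel energies
    fragments -- list of sets of (frame, channel) cells
    models    -- dict word -> (means, variances), one Gaussian per word
    Cells outside every fragment are always treated as interference.
    Returns (best_word, best_labelling, best_log_score).
    """
    best = (None, None, -math.inf)
    for labels in itertools.product((True, False), repeat=len(fragments)):
        # Union of the fragments labelled as target under this hypothesis.
        target = set()
        for frag, is_target in zip(fragments, labels):
            if is_target:
                target |= frag
        for word, (means, variances) in models.items():
            score = 0.0
            for t, frame in enumerate(X):
                for d, x in enumerate(frame):
                    if (t, d) in target:
                        score += log_gauss(x, means[d], variances[d])
                    else:
                        score += log_bounded_marginal(x, means[d], variances[d])
            if score > best[2]:
                best = (word, labels, score)
    return best


# Toy example: channel 0 carries speech, channel 1 a loud interferer.
X = [[5.0, 9.0], [5.0, 9.0]]
fragments = [{(0, 0), (1, 0)}, {(0, 1), (1, 1)}]
models = {"one": ([5.0, 5.0], [0.25, 0.25]), "two": ([8.0, 8.0], [0.25, 0.25])}
word, labels, score = decode(X, fragments, models)
print(word, labels, round(score, 3))  # one (True, False) -0.452
```

In the toy example, the top-down constraint does the segregation: keeping the loud channel-1 fragment as target would force the model to explain energy far above its mean, so the best hypothesis labels it as interference. Because this sketch has no prior over labellings, a model with low means can sometimes win by labelling every fragment as interference. Enumerating all 2^K labellings is also practical only for a handful of fragments.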
Files
- crac01-msdec.pdf (application/pdf, 92.1 KB)
Also Published In
- Title: Consistent & Reliable Acoustic Cues for Sound Analysis: One-day Workshop: Aalborg, Denmark, Sunday, September 2nd, 2001
- Publisher: Department of Electrical Engineering, Columbia University
More About This Work
- Academic Units: Electrical Engineering
- Published Here: July 2, 2012