Decoding speech in the presence of other sound sources

Barker, Jon; Cooke, Martin; Ellis, Daniel P. W.

Conventional speech recognition is notoriously vulnerable to additive noise, and even the best compensation methods are defeated if the noise is nonstationary. To address this problem, we propose a new approach that integrates bottom-up techniques, which use local features to identify 'coherent fragments' of spectro-temporal energy, with the top-down hypothesis search of conventional speech recognition. The search is extended to cover the possible assignments of each fragment as either speech or interference. Initial tests demonstrate the feasibility of this approach and achieve a relative reduction in word error rate of more than 25% at 5 dB SNR compared with stationary-noise missing-data recognition.
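
The search over fragment assignments described in the abstract can be illustrated with a toy sketch. This is not the authors' decoder. It assumes a single diagonal Gaussian in place of the recognizer's speech models, exhaustive enumeration in place of an integrated hypothesis search, and a simple bounded-marginalization score for cells assigned to interference. All names (log_gauss, log_cdf, score_mask, best_assignment) and all numbers are hypothetical.

```python
import itertools
import math

import numpy as np

# Element-wise error function; math.erf only accepts scalars.
_erf = np.vectorize(math.erf)


def log_gauss(x, mean, std):
    """Log density of a Gaussian speech model at observed energy x."""
    return -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((x - mean) / std) ** 2


def log_cdf(x, mean, std):
    """Log probability that the speech energy lies at or below x.

    Used for cells assigned to interference: the speech is masked there,
    so the observation only acts as an upper bound on the speech energy.
    """
    cdf = 0.5 * (1.0 + _erf((x - mean) / (std * math.sqrt(2.0))))
    return np.log(np.clip(cdf, 1e-12, 1.0))


def score_mask(obs, speech_mask, mean, std):
    """Toy missing-data score of one speech/interference mask."""
    present = log_gauss(obs, mean, std)  # cells treated as speech
    masked = log_cdf(obs, mean, std)     # cells treated as interference
    return float(np.where(speech_mask, present, masked).sum())


def best_assignment(obs, fragments, mean, std):
    """Try every speech/interference labelling of the fragments.

    Exhaustive enumeration (2**N masks) is an assumption made to keep
    the sketch short; it only scales to a handful of fragments.
    """
    best_score, best_labels = -np.inf, None
    for labels in itertools.product([False, True], repeat=len(fragments)):
        mask = np.zeros(obs.shape, dtype=bool)
        for frag, is_speech in zip(fragments, labels):
            if is_speech:
                mask |= frag
        score = score_mask(obs, mask, mean, std)
        if score > best_score:
            best_score, best_labels = score, labels
    return best_score, best_labels


# Hypothetical 2x2 time-frequency grid: row 0 matches the speech model,
# row 1 is much louder than the model expects.
obs = np.array([[10.0, 10.0], [20.0, 20.0]])
frag_a = np.array([[True, True], [False, False]])
frag_b = np.array([[False, False], [True, True]])
score, labels = best_assignment(obs, [frag_a, frag_b], mean=10.0, std=0.5)
print(labels)
```

In this toy setup the first fragment is labelled speech and the second interference, because energy far above the speech model is cheap to explain as masking and very costly to explain as speech.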


Also Published In

6th International Conference on Spoken Language Processing: ICSLP 2000, the proceedings of the conference, Oct. 16-Oct. 20, 2000, Beijing International Convention Center, Beijing, China
China Military Friendship Publish

More About This Work

Academic Units
Electrical Engineering
Published Here
July 3, 2012