Academic Commons


Feature extraction using non-linear transformation for robust speech recognition on the Aurora database

Sharma, Sangita; Ellis, Daniel P. W.; Kajarekar, Sachin; Jain, Pratibha; Hermansky, Hynek

We evaluate the performance of several feature sets on the Aurora task as defined by ETSI. We show that after a non-linear transformation, a number of features can be effectively used in an HMM-based recognition system. The non-linear transformation is computed using a neural network that is discriminatively trained on the phonetically labeled (forced-aligned) training data. A combination of the non-linearly transformed PLP (perceptual linear predictive coefficients), MSG (modulation-filtered spectrogram) and TRAP (temporal pattern) features yields a 63% improvement in error rate compared to baseline mel-frequency cepstral coefficient features. The use of the non-linearly transformed RASTA-like features, with system parameters scaled down to take into account the ETSI-imposed memory and latency constraints, still yields a 40% improvement in error rate.
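As a rough illustration of the idea in the abstract, the sketch below trains a small neural network to classify frames into phonetic classes and then uses its outputs as transformed features for a downstream recognizer. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy data, the network size, the function names (`train_mlp`, `transform`), and the choice of log posteriors as the output features are all hypothetical, since the abstract does not specify them.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def train_mlp(X, y, n_classes, n_hidden=16, lr=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer MLP on frame-level class labels
    (cross-entropy loss, full-batch gradient descent)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                    # one-hot targets
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                # hidden activations
        P = np.exp(log_softmax(H @ W2 + b2))    # class posteriors
        dZ = (P - Y) / n                        # gradient at the output layer
        dH = (dZ @ W2.T) * (1.0 - H ** 2)       # backprop through tanh
        W2 -= lr * (H.T @ dZ)
        b2 -= lr * dZ.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def transform(X, params):
    """Non-linear transformation of input frames.
    Assumption: log class posteriors serve as the new features."""
    W1, b1, W2, b2 = params
    return log_softmax(np.tanh(X @ W1 + b1) @ W2 + b2)

# Toy stand-in for labeled acoustic frames: 3 hypothetical classes, 4-D inputs.
rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 100)
X = 3.0 * np.eye(3, 4)[y] + 0.5 * rng.normal(size=(300, 4))

params = train_mlp(X, y, n_classes=3)
feats = transform(X, params)                    # one feature vector per frame
acc = float((feats.argmax(axis=1) == y).mean())
```

In a setup like this, `feats` would replace or supplement the original frame features as input to the HMM-based recognizer; any further post-processing of the network outputs is left out here because the abstract does not describe it.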


Also Published In

2000 IEEE International Conference on Acoustics, Speech, and Signal Processing: Proceedings, 5-9 June, 2000, Hilton Hotel and Convention Center, Istanbul, Turkey

More About This Work

Academic Units
Electrical Engineering
Published Here
July 3, 2012