Academic Commons Search Results
http://academiccommons.columbia.edu/catalog.rss?f%5Bauthor_facet%5D%5B%5D=Sharma%2C+Sangita&f%5Bdepartment_facet%5D%5B%5D=Electrical+Engineering&q=&rows=500&sort=record_creation_date+desc
Tandem connectionist feature stream extraction for conventional HMM systems
http://academiccommons.columbia.edu/catalog/ac:148941
Hermansky, Hynek; Ellis, Daniel P. W.; Sharma, Sangita
http://hdl.handle.net/10022/AC:P:13821
Tue, 03 Jul 2012 00:00:00 +0000
Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural networks to estimate the probability distribution among subword units given the acoustic observations. In this work we show a large improvement in word recognition performance by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling. By training the network to generate the subword probability posteriors, then using transformations of these estimates as the base features for a conventionally-trained Gaussian-mixture based system, we achieve relative error rate reductions of 35% or more on the multicondition Aurora noisy continuous digits task.
Electrical engineering, Applied mathematics
de171
Electrical Engineering
Articles

Feature extraction using non-linear transformation for robust speech recognition on the Aurora database
http://academiccommons.columbia.edu/catalog/ac:148944
Sharma, Sangita; Ellis, Daniel P. W.; Kajarekar, Sachin; Jain, Pratibha; Hermansky, Hynek
http://hdl.handle.net/10022/AC:P:13822
Tue, 03 Jul 2012 00:00:00 +0000
We evaluate the performance of several feature sets on the Aurora task as defined by ETSI. We show that after a non-linear transformation, a number of features can be effectively used in an HMM-based recognition system. The non-linear transformation is computed using a neural network which is discriminatively trained on the phonetically labeled (forcibly aligned) training data. A combination of the non-linearly transformed PLP (perceptual linear predictive coefficients), MSG (modulation-filtered spectrogram) and TRAP (temporal pattern) features yields a 63% improvement in error rate compared to baseline mel-frequency cepstral coefficient features. The use of the non-linearly transformed RASTA-like features, with system parameters scaled down to take into account the ETSI-imposed memory and latency constraints, still yields a 40% improvement in error rate.
Electrical engineering, Artificial intelligence
de171
Electrical Engineering
Articles
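
The abstracts above report their gains as percentage improvements in error rate (35%, 63%, 40%). As a reading aid, the sketch below shows how a relative error rate reduction is usually computed. It assumes the 63% and 40% figures are also relative reductions, which the second abstract does not state explicitly. The function name and the word error rates are hypothetical and chosen only to illustrate the arithmetic; they are not results from either paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the error-rate reduction as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical word error rates in percent (illustrative only,
# NOT figures taken from the papers listed above).
baseline_wer = 10.0
improved_wer = 6.5

reduction = relative_error_reduction(baseline_wer, improved_wer)
print(f"relative reduction: {reduction:.0%}")  # prints "relative reduction: 35%"
```

Under this reading, a 35% relative reduction means the improved system makes 35% fewer errors than the baseline, not that its absolute error rate dropped by 35 percentage points.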