Speech/music discrimination based on posterior probability features

Gethin Williams; Daniel P. W. Ellis

Electrical Engineering
Book/Journal Title: Eurospeech 99: 6th European Conference on Speech Communication and Technology: Budapest, Hungary, September 5-9, 1999
A hybrid connectionist-HMM speech recognizer uses a neural network acoustic classifier. At each time step, this network estimates the posterior probability that the current acoustic feature vectors should be labeled as each of around 50 phone classes. We sought to exploit informal observations of how, in this posterior domain, nonspeech audio differs from speech segments that are well modeled by the network. We describe four statistics that successfully capture these differences and that can be combined to make a reliable speech/nonspeech categorization closely related to the likely performance of the speech recognizer. We test these features on a database of speech/music examples, and our results match the previously reported classification error of 1.4% for 2.5-second segments, which was obtained with a variety of special-purpose features. We also show that recognizing segments ordered according to their resemblance to clean speech can result in an error rate close to the ideal minimum over all such subsetting strategies.
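The idea of summarizing a segment by statistics of its per-frame phone posteriors can be illustrated with a minimal sketch. The abstract does not name the four statistics, so the two computed here (mean per-frame entropy and mean frame-to-frame change, called "dynamism" below) are assumptions chosen for illustration, and the function names and the three-class toy data are hypothetical rather than taken from the paper.

```python
import math

def mean_entropy(posteriors):
    """Average per-frame entropy (in nats) over a segment.

    `posteriors` is a list of frames; each frame is a list of class
    probabilities summing to 1. Illustrative statistic, not confirmed
    by the abstract as one of the paper's four.
    """
    total = 0.0
    for frame in posteriors:
        # Zero-probability classes contribute nothing to the entropy.
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(posteriors)

def mean_dynamism(posteriors):
    """Average squared change in the posteriors between consecutive frames.

    Requires at least two frames. Also an illustrative assumption.
    """
    diffs = [
        sum((b - a) ** 2 for a, b in zip(prev, cur))
        for prev, cur in zip(posteriors, posteriors[1:])
    ]
    return sum(diffs) / len(diffs)

# Toy segments with 3 classes instead of the roughly 50 phone classes.
# Confident, rapidly changing posteriors (hypothetical "speech-like" case):
speech_like = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
# Flat, static posteriors (hypothetical "nonspeech-like" case):
music_like = [[1 / 3, 1 / 3, 1 / 3]] * 3

print(mean_entropy(speech_like), mean_dynamism(speech_like))
print(mean_entropy(music_like), mean_dynamism(music_like))
```

On this toy data the confident segment has lower entropy and higher dynamism than the flat one. A real system would combine several such statistics in a classifier trained on labeled segments.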
Electrical engineering
Artificial intelligence
Suggested Citation:
Gethin Williams, Daniel P. W. Ellis, 1999, Speech/music discrimination based on posterior probability features, Columbia University Academic Commons, http://hdl.handle.net/10022/AC:P:13833.
