Speech/music discrimination based on posterior probability features
Gethin Williams; Daniel P. W. Ellis
- Speech/music discrimination based on posterior probability features
Ellis, Daniel P. W.
- Electrical Engineering
- Permanent URL:
- Book/Journal Title:
- Eurospeech 99: 6th European Conference on Speech Communication and Technology: Budapest, Hungary, September 5-9, 1999
- Publisher Location:
- A hybrid connectionist-HMM speech recognizer uses a neural network acoustic classifier. This network estimates the posterior probability that the acoustic feature vectors at the current time step should be labelled as each of around 50 phone classes. We sought to exploit informal observations of the distinctions in this posterior domain between nonspeech audio and speech segments well-modeled by the network. We describe four statistics that successfully capture these differences, and which can be combined to make a reliable speech/nonspeech categorization that is closely related to the likely performance of the speech recognizer. We test these features on a database of speech/music examples, and our results match the previously-reported classification error, based on a variety of special-purpose features, of 1.4% for 2.5 second segments. We also show that recognizing segments ordered according to their resemblance to clean speech can result in an error rate close to the ideal minimum over all such subsetting strategies.
- Item views: