Spectral vs. spectro-temporal features for acoustic event detection

Cotton, Courtenay Valentine; Ellis, Daniel P. W.

Automatic detection of different types of acoustic events is an interesting problem in soundtrack processing. Typical approaches to the problem use short-term spectral features to describe the audio signal, with additional modeling on top to take temporal context into account. We propose an approach to detecting and modeling acoustic events that directly describes temporal context, using convolutive non-negative matrix factorization (NMF). NMF is useful for finding parts-based decompositions of data; here it is used to discover a set of spectro-temporal patch bases that best describe the data, with the patches corresponding to event-like structures. We derive features from the activations of these patch bases, and perform event detection on a database consisting of 16 classes of meeting-room acoustic events. We compare our approach with a baseline using standard short-term mel-frequency cepstral coefficient (MFCC) features. We demonstrate that the event-based system is more robust in the presence of added noise than the MFCC-based system, and that a combination of the two systems performs even better than either individually.
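
As a rough illustration of the convolutive NMF idea mentioned in the abstract, the following minimal sketch factors a non-negative spectrogram-like matrix V into a small set of spectro-temporal patches W and their activations H. It is not the authors' implementation: the Euclidean-cost multiplicative updates, the function names (shift, reconstruct, conv_nmf), and all parameter values are assumptions chosen for illustration, and the random matrix stands in for a real spectrogram.

```python
import numpy as np

def shift(X, t):
    """Shift the columns of X right by t frames (left if t < 0), zero-filling."""
    out = np.zeros_like(X)
    if t == 0:
        out[:] = X
    elif t > 0:
        out[:, t:] = X[:, :-t]
    else:
        out[:, :t] = X[:, -t:]
    return out

def reconstruct(W, H):
    """Sum each time slice of the patches applied to correspondingly delayed activations.

    W has shape (patch_len, n_freq, n_bases); H has shape (n_bases, n_frames).
    """
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))

def conv_nmf(V, n_bases=4, patch_len=5, n_iter=100, seed=0, eps=1e-9):
    """Illustrative convolutive NMF with Euclidean multiplicative updates.

    Assumes V is non-negative and patch_len is smaller than the number of frames.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((patch_len, n_freq, n_bases)) + eps
    H = rng.random((n_bases, n_frames)) + eps
    for _ in range(n_iter):
        # Update activations using contributions from every patch time slice.
        L = reconstruct(W, H)
        num = sum(W[t].T @ shift(V, -t) for t in range(patch_len))
        den = sum(W[t].T @ shift(L, -t) for t in range(patch_len))
        H *= num / (den + eps)
        # Update each time slice of the patch bases.
        L = reconstruct(W, H)
        for t in range(patch_len):
            Ht = shift(H, t)
            W[t] *= (V @ Ht.T) / (L @ Ht.T + eps)
    return W, H

# Toy usage: a random non-negative matrix stands in for a magnitude spectrogram.
V = np.random.default_rng(1).random((12, 40))
W, H = conv_nmf(V, n_bases=3, patch_len=4, n_iter=50)
error = np.linalg.norm(V - reconstruct(W, H))
```

Each row of H indicates when the corresponding patch is active over time, which is the kind of activation signal from which event features could be derived.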



Also Published In

2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics: Proceedings: October 16-19, 2011, Mohonk Mountain House, New Paltz, NY, USA

More About This Work

Academic Units
Electrical Engineering
Published Here
June 25, 2012