Detecting local semantic concepts in environmental sounds using Markov model based clustering

Lee, Keansub; Ellis, Daniel P. W.; Loui, Alexander C.

Detecting the time of occurrence of an acoustic event (for instance, a cheer) embedded in a longer soundtrack is useful for applications such as search and retrieval in consumer video archives. We present a Markov-model-based clustering algorithm that identifies and segments consistent sets of temporal frames into regions associated with different ground-truth labels, while simultaneously excluding a set of uninformative frames common to all clips. The labels are provided at the clip level, so this refinement along the time axis represents a variant of Multiple-Instance Learning (MIL). Evaluation shows that this clustering technique effectively detects local concepts from coarse-scale labels, and that its detection performance is significantly better than that of existing algorithms for classifying real-world consumer recordings.


Also Published In

2010 IEEE International Conference on Acoustics, Speech, and Signal Processing: Proceedings: March 14-19, 2010, Sheraton Dallas Hotel, Dallas, Texas, U.S.A.

More About This Work

Academic Units: Electrical Engineering
Published Here: June 26, 2012