Academic Commons


Soundtrack classification by transient events

Cotton, Courtenay Valentine; Ellis, Daniel P. W.; Loui, Alexander C.

We present a method for video classification based on information in the soundtrack. Unlike previous approaches that describe the audio via statistics of mel-frequency cepstral coefficient (MFCC) features calculated on uniformly spaced frames, we investigate an approach that focuses the representation on audio transients corresponding to soundtrack events. These event-related features can reflect the "foreground" of the soundtrack and capture its short-term temporal structure better than conventional frame-based statistics. We evaluate our method on a test set of 1873 YouTube videos labeled with 25 semantic concepts. Retrieval results based on transient features alone are comparable to those of an MFCC-based system, and fusing the two representations achieves a relative improvement of 7.5% in mean average precision (MAP).
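The evaluation above reports retrieval quality as MAP over the semantic concepts, and a gain from fusing two systems. The sketch below illustrates how those quantities are typically computed; it is not the authors' code. It assumes non-interpolated average precision per concept, and it assumes a simple weighted average of the two systems' per-video scores as the fusion rule (the abstract does not say how the fusion was done, so `late_fusion` and its `weight` parameter are hypothetical and only meaningful if both systems' scores are on comparable scales). A relative improvement of 7.5% simply means MAP_fused = 1.075 × MAP_baseline.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of precision@k taken at the rank k of each relevant item."""
    # Rank items by descending score; sorted() is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    # Assumption: a concept with no positive items contributes AP = 0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    score_matrix: Sequence[Sequence[float]], label_matrix: Sequence[Sequence[int]]
) -> float:
    """MAP: average of per-concept AP. Rows are concepts, columns are videos."""
    aps = [average_precision(s, y) for s, y in zip(score_matrix, label_matrix)]
    return sum(aps) / len(aps)


def late_fusion(
    scores_a: Sequence[Sequence[float]],
    scores_b: Sequence[Sequence[float]],
    weight: float = 0.5,
) -> List[List[float]]:
    """Hypothetical fusion rule: weighted average of two systems' per-video scores."""
    return [
        [weight * x + (1.0 - weight) * y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(scores_a, scores_b)
    ]


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain, e.g. 0.075 for a 7.5% improvement over the baseline."""
    return (new - baseline) / baseline
```

In a setup like the one described, each of the 25 rows of `score_matrix` would hold one concept's scores over all test videos, with the matching row of `label_matrix` marking which videos carry that concept; MAP for the MFCC system, the transient system, and their fusion could then be compared with `relative_improvement`.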


Also Published In

2011 IEEE International Conference on Acoustics, Speech, and Signal Processing: Proceedings: May 22-27, 2011 Prague Congress Center, Prague, Czech Republic

More About This Work

Academic Units
Electrical Engineering
Published Here
June 25, 2012