Presentations (Communicative Events)

Consumer Video Understanding: A Benchmark Database and An Evaluation of Human and Machine Performance

Ellis, Daniel P. W.; Jiang, Yu Gang; Ye, Guangnan; Chang, Shih-Fu; Loui, Alexander C.

Recognizing visual content in unconstrained videos has become a very important problem for many applications. Existing corpora for video analysis lack scale and/or content diversity, which has limited the needed progress in this critical area. In this paper, we describe and release a new database called CCV, containing 9,317 web videos over 20 semantic categories, including events like “baseball” and “parade”, scenes like “beach”, and objects like “cat”. The database was collected with extra care to ensure relevance to consumer interest and originality of video content without post-editing. Such videos typically have very little textual annotation and thus can benefit from the development of automatic content analysis techniques. We used the Amazon MTurk platform to perform manual annotation, and studied the behaviors and performance of human annotators on MTurk. We also compared the abilities of humans and machines in understanding consumer video content. For the latter, we implemented automatic classifiers using a state-of-the-art multi-modal approach that achieved top performance in the recent TRECVID multimedia event detection task. Results confirmed that classifiers fusing audio and video features significantly outperform single-modality solutions. We also found that humans are much better at understanding categories of nonrigid objects such as “cat”, while current automatic techniques are relatively close to humans in recognizing categories that have distinctive background scenes or audio patterns.


Also Published In

Proceedings of ACM International Conference on Multimedia Retrieval 2011

More About This Work

Academic Units
Electrical Engineering
Published Here
April 19, 2013