ABSTRACT
Music videos are a popular form of entertainment. Indexing and retrieval approaches based on the affective cues contained in music videos are becoming increasingly attractive to users, and music video affective analysis and understanding is one of the most active topics in the multimedia community. In this paper, we propose a novel feature importance analysis approach that selects the most representative features for arousal and valence modeling. Compared with the state-of-the-art work by Zhang on music video affective analysis, our main contributions are the following: (1) three additional affect-related features are extracted to enrich the feature set and to exploit their correlation with arousal and valence; (2) all extracted features are ordered via feature importance analysis, and an optimal feature subset is then selected from this ordering; (3) different regression methods are compared for arousal and valence modeling in order to find the best-fitting estimation function. Our method achieves 33.39% and 42.17% reductions in mean absolute error compared with Zhang's method. Experimental results demonstrate that the proposed method considerably improves music video affective understanding.
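The pipeline outlined above (order features by importance, keep a top-ranked subset, fit a regressor, and score it by mean absolute error) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the correlation-based ranking criterion, the synthetic two-feature data, and the single-feature least-squares model are all assumptions made for the sketch.

```python
# Sketch of the abstract's pipeline:
# (1) rank features by importance (here: absolute Pearson correlation
#     with the arousal/valence label -- an assumed criterion),
# (2) keep the top-ranked feature subset,
# (3) fit a least-squares regressor and report mean absolute error (MAE).
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def rank_features(X, y):
    """Order feature indices by |correlation| with the target, best first."""
    scores = [abs(pearson([row[j] for row in X], y))
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def fit_1d(x, y):
    """Ordinary least squares for one feature: y ~ a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) /
         sum((xi - mx) ** 2 for xi in x))
    return a, my - a * mx

random.seed(0)
# Synthetic data: feature 0 drives the label, feature 1 is pure noise.
X = [[random.random(), random.random()] for _ in range(200)]
y = [2.0 * row[0] + 0.05 * random.gauss(0, 1) for row in X]

order = rank_features(X, y)          # importance ordering of feature indices
best = order[0]                      # "optimal subset" of size one here
a, b = fit_1d([row[best] for row in X], y)
mae = sum(abs(a * row[best] + b - yi)
          for row, yi in zip(X, y)) / len(X)
print("ranked features:", order, "MAE:", round(mae, 3))
```

In the same spirit, step (3) of the paper would swap `fit_1d` for competing regressors (e.g. linear regression versus support vector regression, per the Gunn and Vapnik references) and keep whichever yields the lowest MAE on held-out data.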
- S. Arifin and P. Y. Cheung. User attention based arousal content modeling. In Proceedings of IEEE International Conference on Image Processing (ICIP), pages 433--436, March 2006.
- S. Arifin and P. Y. Cheung. Affective level video segmentation by utilizing the pleasure-arousal-dominance information. IEEE Transactions on Multimedia, 10(7):1325--1341, November 2008.
- R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience, New York, 2001.
- S. R. Gunn. Support vector machines for classification and regression. Technical report, Image Speech and Intelligent Systems Research Group, University of Southampton, U.K., 1998.
- A. Hanjalic and L.-Q. Xu. Affective video content representation and modeling. IEEE Transactions on Multimedia, 7(1):143--154, February 2005.
- I. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.
- D. Li, I. K. Sethi, N. Dimitrova, and T. McGee. Classification of general audio data for content-based retrieval. Pattern Recognition Letters, 22:533--544, 2001.
- L. Lu, D. Liu, and H.-J. Zhang. Automatic mood detection and tracking of music audio signals. IEEE Transactions on Audio, Speech, and Language Processing, 14(1):5--18, January 2006.
- M. M. Ruxanda, B. Y. Chua, A. Nanopoulos, and C. S. Jensen. Emotion-based music retrieval on a well-reduced audio feature space. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 181--184, April 2009.
- M. Soleymani, G. Chanel, J. J. Kierkels, and T. Pun. Affective characterization of movie scenes based on multimedia content analysis and user's physiological emotional responses. In Proceedings of the Tenth IEEE International Symposium on Multimedia, pages 228--235, 2008.
- K. Sun, J. Yu, Y. Huang, and X. Hu. An improved valence-arousal emotion space for video affective content representation and recognition. In Proceedings of IEEE International Conference on Multimedia and Expo (ICME), pages 566--569, 2009.
- P. Valdez and A. Mehrabian. Effects of color on emotions. Journal of Experimental Psychology, 123:394--409, 1994.
- V. N. Vapnik. Statistical Learning Theory. John Wiley and Sons, Inc., New York, 1998.
- H. L. Wang and L.-F. Cheong. Affective understanding in film. IEEE Transactions on Circuits and Systems for Video Technology, 16(6):689--704, June 2006.
- S. Weisberg. Applied Linear Regression. Wiley-Interscience, New York, 2005.
- M. Xu, J. S. Jin, S. Luo, and L. Duan. Hierarchical movie affective content analysis based on arousal and valence features. In Proceedings of ACM Multimedia, pages 677--680, 2008.
- S. Zhang, Q. Huang, Q. Tian, S. Jiang, and W. Gao. i.MTV: an integrated system for MTV affective analysis. In Proceedings of ACM Multimedia (demonstration), pages 985--986, 2008.
- S. Zhang, Q. Huang, Q. Tian, S. Jiang, and W. Gao. Personalized MTV affective analysis using user profile. In Proceedings of the 9th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing, pages 327--337, 2008.
- S. Zhang, Q. Tian, Q. Huang, W. Gao, and S. Li. Utilizing affective analysis for effective movie browsing. In Proceedings of IEEE International Conference on Image Processing (ICIP), pages 677--680, 2009.
- S. Zhang, Q. Tian, S. Jiang, Q. Huang, and W. Gao. Affective MTV analysis based on arousal and valence features. In Proceedings of IEEE International Conference on Multimedia and Expo (ICME), pages 1369--1372, 2008.
- T. Zhang and C.-C. J. Kuo. Audio content analysis for online audiovisual data segmentation and classification. IEEE Transactions on Speech and Audio Processing, 9(4):441--457, May 2001.
Index Terms
- Music video affective understanding using feature importance analysis