ABSTRACT
Computer-assisted violin tutoring requires accurate violin transcription. For pitched non-percussive (PNP) sounds such as those produced by the violin, note segmentation is a much more difficult task than pitch detection. The difficulty is accentuated when the audio is recorded during a practice session at home, an environment that is acoustically inferior to a professional recording studio. This paper presents a new approach to the problem that exploits the correlation between different media streams for e-learning applications. We design a capture mechanism that records one audio stream and two video streams simultaneously, and we exploit the relationships among them for enhanced transcription. State-of-the-art audio methods for note segmentation and pitch estimation are implemented as the audio-only baseline. Two web cameras are employed: one tracks the right hand (bowing), and the other tracks the four fingers of the left hand on the fingerboard (fingering). The audio and visual information is then fused in the feature space. Our approach is evaluated on an audio-visual violin music database containing 16 complete music pieces of different styles, with 2157 notes in total. Experimental results show that, in overall transcription performance, our multimodal approach achieves a 10% increase in true positives and an 8% reduction in false positives compared with the audio-only baseline.
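To illustrate what fusing audio and visual information "in the feature space" can look like, the following is a minimal sketch of feature-level (early) fusion and not the authors' implementation. It assumes that each stream has already been reduced to timestamped feature vectors, that video frames arrive less often than audio frames, and that nearest-timestamp alignment is acceptable. The function names, frame rates, and feature meanings are all hypothetical.

```python
from bisect import bisect_left


def nearest_index(times, t):
    """Return the index of the entry in the sorted list `times` closest to t."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    # Pick whichever neighbour is closer in time (ties go to the earlier one).
    return i if times[i] - t < t - times[i - 1] else i - 1


def fuse_features(audio_times, audio_feats, video_times, video_feats):
    """Early fusion: build one concatenated vector per audio frame."""
    fused = []
    for t, a in zip(audio_times, audio_feats):
        v = video_feats[nearest_index(video_times, t)]
        fused.append(list(a) + list(v))
    return fused


# Hypothetical toy data: audio frames every 10 ms, video frames every 40 ms.
audio_times = [0.00, 0.01, 0.02, 0.03, 0.04]
audio_feats = [[0.1], [0.2], [0.9], [0.3], [0.1]]  # assumed: onset strength
video_times = [0.00, 0.04]
video_feats = [[1.0, 0.0], [-1.0, 1.0]]  # assumed: bow direction, finger-down flag

fused = fuse_features(audio_times, audio_feats, video_times, video_feats)
for t, row in zip(audio_times, fused):
    print(t, row)
```

A downstream note-segmentation stage would then operate on the fused vectors rather than on the audio features alone. The alternative, decision-level fusion, would instead run separate detectors on each stream and combine their outputs.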
Index Terms
- Educational violin transcription by fusing multimedia streams