DOI: 10.1145/1290144.1290154
Article

Educational violin transcription by fusing multimedia streams

Published: 28 September 2007

ABSTRACT

Computer-assisted violin tutoring requires accurate violin transcription. For pitched non-percussive (PNP) sounds such as those of the violin, note segmentation is a much more difficult task than pitch detection. The difficulty is accentuated when the audio is recorded during a practice session at home, an environment acoustically inferior to a professional recording studio. This paper presents a new approach to the problem that exploits the correlation between different media streams for e-learning applications. We design a capture mechanism that records one audio stream and two video streams simultaneously, and we exploit the relationships among them for enhanced transcription. State-of-the-art audio methods for note segmentation and pitch estimation are implemented as the audio-only baseline. Two webcams are employed: one tracks the right hand (bowing) and the other tracks the four fingers of the left hand (fingering) on the fingerboard. The audio and visual information is then fused in the feature space. Our approach is evaluated on an audio-visual violin music database containing 16 complete music pieces of different styles, with 2157 notes in total. Experimental results show that, compared with the audio-only baseline, our multimodal approach achieves a 10% increase in true positives and an 8% reduction in false positives in overall transcription performance.
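The abstract does not say which features are extracted or how the fused vectors are classified, so the following is only a minimal sketch of feature-level fusion for note onset detection, not the authors' method. It assumes three hypothetical per-frame tracks (an audio onset-strength curve, a bowing-change cue, and a fingering-change cue), aligns the lower-rate video tracks to the audio frame rate, concatenates them into one vector per frame, and scores each vector with an assumed weighted sum followed by simple peak picking. All function names, weights, and the threshold are illustrative assumptions.

```python
from typing import List, Sequence


def resample_nearest(values: Sequence[float], n_out: int) -> List[float]:
    """Stretch or shrink a feature track to n_out frames (nearest neighbour)."""
    n_in = len(values)
    return [values[min(n_in - 1, i * n_in // n_out)] for i in range(n_out)]


def fuse_features(audio: Sequence[float],
                  bow: Sequence[float],
                  finger: Sequence[float]) -> List[List[float]]:
    """Feature-level fusion: build one joint vector per audio frame.

    The video tracks usually have a lower frame rate than the audio
    analysis, so they are resampled to the audio frame count first.
    """
    n = len(audio)
    bow_r = resample_nearest(bow, n)
    finger_r = resample_nearest(finger, n)
    return [[audio[i], bow_r[i], finger_r[i]] for i in range(n)]


def detect_onsets(fused: Sequence[Sequence[float]],
                  weights: Sequence[float] = (0.5, 0.25, 0.25),
                  threshold: float = 0.5) -> List[int]:
    """Score fused vectors with a weighted sum and pick thresholded local maxima.

    The weights and threshold are assumed values for illustration only.
    """
    scores = [sum(w * x for w, x in zip(weights, vec)) for vec in fused]
    onsets = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if s >= threshold and s > left and s >= right:
            onsets.append(i)
    return onsets


# Toy data (hypothetical): 8 audio frames, 4 video frames per camera.
audio = [0.2, 0.9, 0.1, 0.1, 0.4, 0.3, 0.1, 0.1]  # weak audio cue at frame 4
bow = [1.0, 0.0, 1.0, 0.0]     # bow-direction change indicator
finger = [0.0, 0.0, 1.0, 0.0]  # finger press/release indicator

fused = fuse_features(audio, bow, finger)
print(detect_onsets(fused))  # fused audio-visual decision
print(detect_onsets([[a, 0.0, 0.0] for a in audio], weights=(1.0, 0.0, 0.0)))  # audio only
```

In this toy data the soft note at frame 4 falls below the audio-only threshold, while the coinciding bowing and fingering cues lift the fused score above it. That is the kind of gain the abstract attributes to combining the streams; a real system would replace the weighted sum with whatever classifier the paper actually uses.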

Published in

Emme '07: Proceedings of the international workshop on Educational multimedia and multimedia education
September 2007
138 pages
ISBN: 9781595937834
DOI: 10.1145/1290144

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
