Article

Educational violin transcription by fusing multimedia streams

Authors:
Ye Wang, National University of Singapore
Bingjun Zhang, National University of Singapore
Olaf Schleusing, National University of Singapore

Emme '07: Proceedings of the international workshop on Educational multimedia and multimedia education, September 2007, Pages 57–66, https://doi.org/10.1145/1290144.1290154

Published: 28 September 2007

ABSTRACT

Computer-assisted violin tutoring requires accurate violin transcription. For pitched non-percussive (PNP) sounds such as those of the violin, note segmentation is a much more difficult task than pitch detection. The difficulty is accentuated when the audio is recorded during a practice session at home, an environment that is acoustically inferior to a professional recording studio. This paper presents a new approach to the problem that uses the correlation between different media streams for e-learning applications. We design a capture mechanism that records one audio and two video streams simultaneously, and exploit the relationships among them for enhanced transcription. State-of-the-art audio methods for note segmentation and pitch estimation are implemented as the audio-only baseline. Two web cameras are employed to track the right hand (bowing) and the four fingers of the left hand (fingering) on the fingerboard, respectively. The audio and visual information is then fused in the feature space. Our new approach is evaluated on an audio-visual violin music database containing 16 complete music pieces of different styles with 2157 notes in total. Experimental results show that our multimodal approach achieves a 10% increase in true positives and an 8% reduction in false positives in overall transcription performance compared with the audio-only baseline.
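
The abstract does not specify the features or the fusion rule, so the following is only an illustrative sketch of what fusing audio and visual information in the feature space could look like for note segmentation. It assumes one per-frame audio onset detection function and one per-frame bowing-change strength derived from video at a lower frame rate; the weighted sum, the peak picker, the function names (`min_max`, `upsample_hold`, `fuse`, `pick_peaks`, `score`) and the toy values are all hypothetical and are not taken from the paper.

```python
def min_max(xs):
    """Scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def upsample_hold(xs, factor):
    """Repeat each video-rate value `factor` times to match the audio frame rate."""
    return [x for x in xs for _ in range(factor)]


def fuse(audio_df, video_df, w_audio=0.5):
    """Weighted sum of two normalized detection functions (assumed fusion rule)."""
    n = min(len(audio_df), len(video_df))
    a, v = min_max(audio_df[:n]), min_max(video_df[:n])
    return [w_audio * ai + (1.0 - w_audio) * vi for ai, vi in zip(a, v)]


def pick_peaks(df, threshold):
    """Interior frames at or above `threshold` that rise from the previous
    frame and are not exceeded by the next frame."""
    return [i for i in range(1, len(df) - 1)
            if df[i] >= threshold and df[i] > df[i - 1] and df[i] >= df[i + 1]]


def score(detected, reference, tol):
    """Greedy one-to-one matching within +/- tol frames; returns (tp, fp, fn)."""
    unmatched = list(reference)
    tp = 0
    for d in detected:
        hit = next((r for r in unmatched if abs(r - d) <= tol), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    return tp, len(detected) - tp, len(unmatched)


# Toy data (made up): the audio function has a spurious peak at frame 3.
audio = [0.0, 1.0, 0.0, 0.8, 0.0, 0.9, 0.0, 0.0]
# Hypothetical bowing-change strength at half the audio frame rate.
video = upsample_hold([1.0, 0.0, 1.0, 0.0], factor=2)
reference_onsets = [1, 5]

audio_only = pick_peaks(min_max(audio), threshold=0.6)
fused = pick_peaks(fuse(audio, video), threshold=0.6)

print("audio-only (tp, fp, fn):", score(audio_only, reference_onsets, tol=0))
print("fused      (tp, fp, fn):", score(fused, reference_onsets, tol=0))
```

In this toy case the visual stream offers no support at frame 3, so the fused function falls below the threshold there and the false positive disappears while both true onsets are kept. Real streams would also need time alignment and a tolerance window of more than zero frames.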

Index Terms

Educational violin transcription by fusing multimedia streams
1. Applied computing
  1. Arts and humanities
    1. Sound and music computing
2. Hardware
  1. Communication hardware, interfaces and storage
    1. Signal processing systems
  2. Robustness
    1. Hardware reliability
      1. Signal integrity and noise analysis

Published in
Emme '07: Proceedings of the international workshop on Educational multimedia and multimedia education
September 2007
138 pages
ISBN:9781595937834
DOI:10.1145/1290144
General Chairs:
Gerald Friedland, ICSI Berkeley, USA
Wolfgang Hürst, University of Freiburg, Germany
Lars Knipping, Berlin University of Technology, Germany
Program Chair:
Max Mühlhäuser, Darmstadt University of Technology, Germany
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
audio-visual fusion
computer-assisted tutoring
detection function
music transcription
note segmentation
onset detection
Qualifiers
- Article