An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics
August 2012
Publisher: Wiley-IEEE Press
ISBN: 978-1-118-26682-3
Published: 14 August 2012
Pages: 272
Abstract

With the proliferation of digital audio distribution over digital media, audio content analysis is fast becoming a requirement for designers of intelligent signal-adaptive audio processing systems. Written by a well-known expert in the field, this book provides quick access to different analysis algorithms and allows comparison between different approaches to the same task, making it useful for newcomers to audio signal processing and industry experts alike. A review of relevant fundamentals in audio signal processing, psychoacoustics, and music theory is included, along with downloadable MATLAB files. Please visit the companion website: www.AudioContentAnalysis.org

Cited By

  1. Sun J, Deng L, Afouras T, Owens A and Davis A (2023). Eventfulness for Interactive Video Alignment, ACM Transactions on Graphics, 42:4, (1-10), Online publication date: 1-Aug-2023.
  2. Yang P, Kuang S, Wu C and Hsu J Predicting Music Emotion by Using Convolutional Neural Network HCI in Business, Government and Organizations, (266-275)
  3. Trowitzsch I, Schymura C, Kolossa D and Obermayer K (2019). Joining Sound Event Detection and Localization Through Spatial Segregation, IEEE/ACM Transactions on Audio, Speech and Language Processing, 28, (487-502), Online publication date: 1-Jan-2020.
  4. Bayle Y, Robine M and Hanna P (2019). SATIN, Multimedia Tools and Applications, 78:3, (2703-2718), Online publication date: 1-Feb-2019.
  5. Davis A and Agrawala M (2018). Visual rhythm and beat, ACM Transactions on Graphics, 37:4, (1-11), Online publication date: 31-Aug-2018.
  6. Alam F, Danieli M and Riccardi G (2018). Annotating and modeling empathy in spoken conversations, Computer Speech and Language, 50:C, (40-61), Online publication date: 1-Jul-2018.
  7. Ordiales H and Bruno M Sound recycling from public databases Proceedings of the 12th International Audio Mostly Conference on Augmented and Participatory Sound and Music Experiences, (1-8)
  8. Sanchez-Hevia H, Ayllon D, Gil-Pita R and Rosa-Zurera M (2017). Maximum Likelihood Decision Fusion for Weapon Classification in Wireless Acoustic Sensor Networks, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:6, (1172-1182), Online publication date: 1-Jun-2017.
  9. Trowitzsch I, Mohr J, Kashef Y and Obermayer K (2017). Robust Detection of Environmental Sounds in Binaural Auditory Scenes, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:6, (1344-1356), Online publication date: 1-Jun-2017.
  10. Lu Y, Wu C, Lu C and Lerch A An Unsupervised Approach to Anomaly Detection in Music Datasets Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, (749-752)
  11. Hupperich T, Hosseini H and Holz T Leveraging Sensor Fingerprinting for Mobile Device Authentication Proceedings of the 13th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment - Volume 9721, (377-396)
  12. Zhao H, Chen Y, Wang R and Malik H (2016). Anti-Forensics of Environmental-Signature-Based Audio Splicing Detection and Its Countermeasure via Rich-Features Classification, IEEE Transactions on Information Forensics and Security, 11:7, (1603-1617), Online publication date: 1-Jul-2016.
  13. Bano S and Cavallaro A (2016). ViComp, Multimedia Tools and Applications, 75:12, (7187-7210), Online publication date: 1-Jun-2016.
  14. Bretan M and Weinberg G (2016). A survey of robotic musicianship, Communications of the ACM, 59:5, (100-109), Online publication date: 26-Apr-2016.
  15. Mahesha P and Vinod D Automatic Segmentation and Classification of Dysfluencies in Stuttering Speech Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, (1-6)
  16. Dimoulas C and Symeonidis A (2015). Syncing Shared Multimedia through Audiovisual Bimodal Segmentation, IEEE MultiMedia, 22:3, (26-42), Online publication date: 1-Jul-2015.
  17. Abadi M, Abad A, Subramanian R, Rostamzadeh N, Ricci E, Varadarajan J and Sebe N A Multi-task Learning Framework for Time-continuous Emotion Estimation from Crowd Annotations Proceedings of the 2014 International ACM Workshop on Crowdsourcing for Multimedia, (17-23)
  18. Liu Y, Liu Y, Zhao Y and Hua K What Strikes the Strings of Your Heart? Proceedings of the 22nd ACM international conference on Multimedia, (1069-1072)
  19. Sturm B A Survey of Evaluation in Music Genre Recognition Adaptive Multimedia Retrieval: Semantics, Context, and Adaptation, (29-66)
Contributors
  • Georgia Institute of Technology

Reviews

Soubhik Chakraborty

Audio content analysis (ACA) is a subtopic of the broader music information retrieval (MIR) research area. It deals with extracting musical and perceptual properties directly from audio signals in order to improve human-computer interaction (HCI) with digital audio. A good understanding of ACA assists in the design of intelligent MIR applications and content-adaptive audio processing systems. In the author's own words, "ACA is a multidisciplinary research field" requiring knowledge from "different research fields such as musicology and music theory, (music) psychology, psychoacoustics, audio engineering, library science, and last but not least computer science for pattern recognition and machine learning."
Chapter 1 introduces ACA and chapter 2 covers the fundamentals of audio signals and signal processing. The major topics covered in the remaining chapters of the book include instantaneous features (such as statistical properties, spectral shape, and signal properties), intensity, tonal analysis, temporal analysis, alignment, musical genre, similarity and mood, audio fingerprinting, and music performance analysis. The author provides a very handy appendix on convolution properties, Fourier transforms, principal component analysis, and software for audio analysis.
The book includes many salient features:
  • It is a very good guide to ACA and its application in signal processing and music informatics.
  • It treats various characteristics of musical information separately, including pitch, harmony, tempo, key, tonality, and timbre.
  • It includes a helpful review of the basics of audio signal processing, music theory, and psychoacoustics (making it useful as an introductory text).
  • It analyzes and compares different algorithms for the same task.
  • Its companion website (http://www.audiocontentanalysis.org/) includes invaluable MATLAB programs that are freely downloadable.
  • It concludes with a comprehensive bibliography.
The author is an acknowledged expert in the music industry. This book will not only greatly help undergraduate and graduate ACA students, but will also be a boon to music researchers and music industry experts alike. The book is simply a treasure for music analysts, and I would strongly recommend it for any scientific library. It does not, however, focus on speech signals; as such, automatic speech recognition, although within the scope of ACA, has been omitted. To use the book profitably, an elementary knowledge of digital signal processing (DSP) is necessary.

Vladimir Botchev

The major positive traits of this tiny book are that it gathers in one place information that until now had been scattered across papers, open-source code descriptions, and specialized Internet forums (mostly academic ones), and that it provides example MATLAB code on the book's Web site (http://www.audiocontentanalysis.org/) that is easy to understand and use. After an introductory chapter, chapter 2 is devoted to elementary concepts in digital audio signals and their basic transforms (Fourier, constant Q, and auditory filter banks). Chapter 3 introduces the so-called instantaneous feature, that is, a numeric qualifier for a short segment of the signal being analyzed. The emphasis is on statistical qualifiers, such as moments, and on spectral shape qualifiers. The descriptions are concise and ready to use in applications that need these features. Chapter 4 gives a short description of some intensity features, such as signal envelopes. Tonal analysis, which includes pitch processing, is given more thorough treatment in chapter 5. Chapter 6 presents details in the area of temporal analysis, such as tempo, beats, and onset detection. Chapter 7 concludes the treatment of features and their extraction with a discussion of algorithms for time alignment, including dynamic time warping. The last three chapters are devoted to basic applications, some of which are well known to smartphone users, such as musical genre recognition and music similarity, described in chapter 8. Chapter 9 gives a glimpse into audio fingerprinting. The last chapter introduces music performance analysis. There are four appendices: a short description of convolution properties, a lengthy description of Fourier transforms, a two-page courtesy mention of principal component analysis, and a quite useful summary of some of the major (and best so far) open-source software platforms that are either devoted to or usable for audio content analysis.
Overall, this is a very practical book. It's a good source of concise information on many topics in audio analysis, and I recommend it for practitioners of digital audio.
