Introduction to Audio Analysis: A MATLAB Approach
April 2014
Publisher:
  • Academic Press, Inc.
  • 6277 Sea Harbor Drive Orlando, FL
  • United States
ISBN: 978-0-08-099388-1
Published: 21 April 2014
Pages: 288
Abstract

Introduction to Audio Analysis serves as a standalone introduction to audio analysis, providing theoretical background to many state-of-the-art techniques. It covers the essential theory necessary to develop audio engineering applications, but also uses programming techniques, notably MATLAB, to take a more applied approach to the topic. Basic theory and reproducible experiments are combined to demonstrate theoretical concepts from a practical point of view and provide a solid foundation in the field of audio analysis. Audio feature extraction, audio classification, audio segmentation, and music information retrieval are all addressed in detail, along with material on basic audio processing, frequency-domain representations, and filtering. Throughout the text, reproducible MATLAB examples are accompanied by theoretical descriptions, illustrating how concepts and equations can be applied to the development of audio analysis systems and components. A blend of reproducible MATLAB code and essential theory enables the reader to delve into the world of audio signals and develop real-world audio applications in various domains.
  • Practical approach to signal processing: the first book to focus on audio analysis from a signal processing perspective, demonstrating practical implementation alongside theoretical concepts.
  • Bridge the gap between theory and practice: the authors demonstrate how to apply equations to real-life code examples and resources, giving you the technical skills to develop real-world applications.
  • Library of MATLAB code: the book is accompanied by a well-documented library of MATLAB functions and reproducible experiments.

Contributors
  • National Centre for Scientific Research "DEMOKRITOS"
  • University of Piraeus

Recommendations

Ghita Kouadri

Audio analysis is the science of extracting information from audio signals for the purposes of analysis, classification, and synthesis. Its applications range from surveillance and forensics to audio emotion detection. Several audio analysis software packages are available on the market, some of them free, and they are quite useful for occasional users. However, MATLAB remains the de facto tool, as it contains built-in functions for manipulating signals in general, including audio data. This book synthesizes the main techniques for audio capture, visualization, reading, analysis, and storage using MATLAB. It is divided into eight chapters. The introduction presents the MATLAB audio library and guidelines on how best to use the book. The remaining chapters of Part 1 (chapters 2 through 4) provide a tutorial on audio signal, transform, and filtering essentials. Part 2 (chapters 5 through 7) delves into more advanced topics such as audio classification and segmentation. Part 3 (chapter 8) focuses on the important topic of music information retrieval, given the relevance of its applications, which include automatic music transcription, track separation, and instrument recognition. The authors aimed to make the book self-contained, and every chapter contains a set of exercises to help the reader test the concepts learned. The accompanying MATLAB audio library constitutes the core of the book: it simplifies many tasks when analyzing audio data and can therefore be reused in further projects. I must admit that, when reading the theoretical part, I had to make an effort to grasp some concepts. I believe the authors aimed to provide a relatively compact book with the necessary information on audio analysis, so some material has been condensed to fit the format.
However, combined with conventional face-to-face lectures, this book is the best choice as a textbook. I must also admit that the book's typography and paper quality add value to its content. Clearly, a great deal of effort went into producing a high-quality textbook to be used on a daily basis by students and professionals.
Online Computing Reviews Service

George Michael White

Giannakopoulos and Pikrakis discuss the scope of this book at its beginning: “Before we proceed, it is important to note that, although in this book the term 'audio' does not exclude the speech signal, we are not focusing on traditional speech-related problems that have been studied by the research community for decades, e.g. speech recognition and coding. It is our intention to provide analysis methods that can be used to study various audio modalities and their relationship in mixed audio streams. ... In other words, we are not interested in providing solutions that are well tailored to specific audio types (e.g. the speech signal) but are not applicable to other modalities.”
The book is divided into three parts. The first part is devoted to a selection of mathematical tools that are used to extract various features from audio streams. Chapter 2 introduces some elementary techniques and properties that will prove helpful in what follows: sampling, mono and stereo playback, block reading and writing, and short-term processing. Chapter 3 brings in the heavy guns: the discrete Fourier transform (using the complex exponential formulation), the discrete cosine transform, the discrete-time wavelet transform, and digital filtering. Included are several MATLAB programs that implement these transforms and filters. Chapter 4 explains how some of the elementary properties of audio files are extracted. Such a file may consist of a single stationary waveform. In real life, however, an audio file probably consists of one or more stationary or nonstationary waveforms mixed with “noise,” and various techniques can eliminate or reduce this noise. Time-domain features and frequency-domain features based on the spectral distribution are defined here, and more MATLAB programs are presented. Once these tools and techniques are mastered, the extracted features can be used for tasks such as audio classification, segmentation, alignment, and temporal modeling.
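The short-term processing and the time- and frequency-domain features mentioned above can be illustrated with a minimal sketch. It is written in Python rather than MATLAB and is not the book's library code; the function names, the 50 ms window with a 25 ms hop, and the particular feature definitions (mean-square energy, a zero-crossing rate normalized by the number of sample pairs, and a magnitude-weighted spectral centroid computed with a naive DFT) are illustrative assumptions.

```python
import cmath
import math


def frames(signal, win, step):
    """Split a 1-D sequence into overlapping short-term frames (win samples, hop of step)."""
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]


def energy(frame):
    """Short-term energy, taken here as the mean of the squared samples."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ (one common variant)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, fs):
    """Magnitude-weighted mean frequency (Hz) of the frame's one-sided DFT.

    A naive O(N^2) DFT keeps the sketch dependency-free; real code would use an FFT.
    """
    n = len(frame)
    mags = [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    total = sum(mags)
    if total == 0:
        return 0.0
    # Bin k corresponds to frequency k * fs / n.
    return sum(k * fs / n * m for k, m in enumerate(mags)) / total


def short_term_features(signal, fs, win_sec=0.050, step_sec=0.025):
    """Return one (energy, zcr, centroid) tuple per short-term frame (hypothetical helper)."""
    win = int(round(win_sec * fs))
    step = int(round(step_sec * fs))
    return [
        (energy(f), zero_crossing_rate(f), spectral_centroid(f, fs))
        for f in frames(signal, win, step)
    ]
```

On a pure 1 kHz sine sampled at 8 kHz, every frame should give an energy close to 0.5, a zero-crossing rate close to 0.25, and a spectral centroid close to 1000 Hz.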
The second part of the book devotes a chapter to each of these topics. Chapter 5 begins the study of classification techniques. The features extracted from the audio form a pyramid: the lower layers use short-term techniques to generate feature vectors, which are passed up to higher layers that compute various statistics over them, and these statistics in turn form new feature vectors. The end goal is to estimate the class label of the audio represented by the final feature vector. Thus, the class label of a certain audio stream might indicate that it is part of a speech made by a certain individual, or perhaps the chirp of a black-capped chickadee, or a segment of electronic music. Some approaches use a priori class probabilities to estimate the class to which a sound belongs. In other cases, nothing at all is known about the sound's origins. How, then, should such a sound be classified? This is explored in Part 2. The Bayesian classifier, the k-nearest-neighbor classifier, and others are introduced at this point, along with the problems of training, testing, and evaluating the results. Chapter 5 concludes with several case studies. Chapter 6 tackles the need for segmentation. Real-life audio streams usually consist of sequences of different audio types, such as speech followed by music followed by more speech. The goal here is to split the audio signal into homogeneous segments that can be analyzed separately. Various types of windowing may be used, and a classification step may or may not be desirable. In chapter 7, “Audio Alignment and Temporal Modeling,” the reader will discover dynamic time warping, hidden Markov models, the Viterbi algorithm, the Baum-Welch algorithm, and various training methods. Each chapter ends with a set of exercises. Some of them require mathematical analysis; others are answered by writing a MATLAB program. This illustrates both the strengths and the weaknesses of the book.
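The feature pyramid and k-nearest-neighbor classification described in this review can be sketched in a few lines of Python, assuming the short-term feature vectors have already been computed. Each mid-term window is summarized by the per-feature mean and standard deviation (a common choice, assumed here), the mid-term vectors are averaged into one segment-level vector, and that vector is labeled by a plain k-nearest-neighbor majority vote. The function names, window sizes, and choice of statistics are hypothetical; the book's MATLAB library may define these steps differently.

```python
import math
import statistics
from collections import Counter


def mid_term_features(short_term, mt_win, mt_step):
    """Summarize windows of short-term vectors by per-feature mean and population std.

    short_term: list of equal-length feature vectors, one per short-term frame.
    mt_win, mt_step: mid-term window length and hop, counted in short-term frames.
    """
    out = []
    for i in range(0, len(short_term) - mt_win + 1, mt_step):
        block = short_term[i:i + mt_win]
        columns = list(zip(*block))  # one tuple of values per short-term feature
        out.append([statistics.mean(c) for c in columns]
                   + [statistics.pstdev(c) for c in columns])
    return out


def long_term_feature(mid_term):
    """Average the mid-term vectors into a single vector for the whole segment."""
    return [statistics.mean(c) for c in zip(*mid_term)]


def knn_classify(x, train_vectors, train_labels, k=3):
    """Label x by majority vote among its k nearest training vectors (Euclidean)."""
    order = sorted(range(len(train_vectors)),
                   key=lambda i: math.dist(x, train_vectors[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

With two well-separated training clusters labeled "speech" and "music", a query near the first cluster is labeled "speech". In practice, features are usually normalized before computing distances, since features with large numeric ranges would otherwise dominate the Euclidean distance.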
MATLAB is a very powerful programming system that is well suited to solving problems arising in this field. However, it is not as universally available as other systems such as Microsoft Visual Studio. If MATLAB is available to the reader, then go to it: it provides a suite of primitives that are eminently suitable for programs that solve problems in audio analysis, and it is well worth the price for someone with a strong interest in the field. The reader should also note that a certain level of applied mathematics is required to do any serious work here; in particular, a working knowledge of complex variables and probability theory is needed to really grasp the underlying concepts. At less than 300 pages, the volume is relatively slender, and it is written in a sparse but graceful style, skillfully edited, and well bound. It is best suited to readers seriously interested in audio analysis who like a mathematical, programming-based approach to the subject.
Online Computing Reviews Service

Vladimir Botchev

This is a very well-written and well-presented book. Unlike some other books on signal processing that use MATLAB as the main vehicle for conveying practical solutions, it doesn't clutter all its pages with MATLAB code listings. The MATLAB code, though an essential part of the book and available for download, is explained only briefly, as in every good manual. The book deals less with signal processing techniques, which are covered only in the first couple of chapters after the introduction, than with pattern recognition and machine learning; indeed, the emphasis is on classification and recognition algorithms. In far fewer pages than other works dedicated to these topics, the authors clearly present the practical details of major classification and pattern search algorithms, such as k-means, dynamic programming, and hidden Markov models. The only unfortunate omission among these selected algorithms is an introduction to the most popular type of neural network (NN), the backpropagation NN. Hopefully, the authors will include it in a future edition, since in terms of applications it is almost as widespread as the algorithms selected for the book. The book also presents some issues in music information retrieval (MIR), which, while interesting, are of lesser value, since a MATLAB toolbox for MIR, with very extensive documentation, has been available for many years now. This new book on audio content analysis and its associated toolbox are highly recommended to audio signal processing practitioners. The book can even serve as a first introduction to the more general area of pattern classification.
Online Computing Reviews Service
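Dynamic programming, listed above among the pattern search algorithms the book covers, underlies dynamic time warping (DTW), which aligns two feature sequences of different lengths. The following is a minimal Python sketch of the classic DTW recurrence, not the book's implementation; the Euclidean local distance, the three unweighted steps (vertical, horizontal, diagonal), and the name `dtw_distance` are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of equal-dimension feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]
```

For example, the sequence [0], [1], [2] aligns with the slower rendition [0], [0], [1], [1], [2] at zero cost, because the warping path may match one frame of the shorter sequence to several frames of the longer one.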
