Fundamentals of speech recognition

Publisher: Prentice-Hall, Inc., Division of Simon and Schuster, One Lake Street, Upper Saddle River, NJ, United States

ISBN: 978-0-13-015157-5

Published: 01 August 1993

Pages: 507

Abstract

No abstract available.

Contributors

Lawrence R Rabiner
University of California, Santa Barbara
- Publication Years: 1972 - 2010
- Publication counts: 35
- Citation count: 2,796
- Available for Download: 5
- Downloads (cumulative): 1,969
- Downloads (12 months): 37
- Downloads (6 weeks): 4
- Average Downloads per Article: 394
- Average Citation per Article: 80
Biing Hwang (Fred) Juang
Georgia Institute of Technology
- Publication Years: 1986 - 2018
- Publication counts: 49
- Citation count: 1,168
- Available for Download: 3
- Downloads (cumulative): 737
- Downloads (12 months): 44
- Downloads (6 weeks): 5
- Average Downloads per Article: 246
- Average Citation per Article: 24

Index Terms

Fundamentals of speech recognition
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition
2. Hardware
  1. Communication hardware, interfaces and storage
    1. Signal processing systems

Recommendations

MFCC-GMM based accent recognition system for Telugu speech signals

Speech processing is a very important research area that includes speaker recognition, speech synthesis, speech codecs, and speech noise reduction. Many languages have different speaking styles, called accents or dialects. ...
Automatic lipreading to enhance speech recognition (speech reading)
Acoustical pre-processing for robust speech recognition
HLT '89: Proceedings of the workshop on Speech and Natural Language

In this paper we describe our initial efforts to make SPHINX, the CMU continuous speech recognition system, environmentally robust. Our work has two major goals: to enable SPHINX to adapt to changes in microphone and acoustical environment, and to ...

Reviewer: James H. Bradford

The authors' goal in writing this book is set out in the preface: “…the fundamental goal of the book would be to provide a theoretically sound, technically accurate, and reasonably complete description of the basic knowledge and ideas that constitute a modern system for speech recognition by machine” (p. xxxi). The authors did not achieve their goal, because of careless writing and poor editing.

The book is divided into nine chapters. Each chapter addresses a different aspect of what might be termed the engineering issues of speech recognition. Chapter 1 provides a short description of the structure and content of the remainder of the book. The substance begins in chapter 2.

Chapter 2 deals with the production, perception, and acoustics of speech. Problems arise within the first few pages. In Figure 2.5, the “Glottal Volume Velocity” of a typical speaker is illustrated. Neither the figure nor the associated text offers any definition of glottal volume velocity, however, or mentions its significance. Here, as in many other parts of the book, the authors seem to have included material simply because they knew it and not because it would be of any use to their readers. I was left with the impression that many figures throughout the book had been cut from other work and pasted into the text. The result is an unhappy collage of poorly related material.

The authors have failed to adequately understand their readership, and this constitutes a serious problem. At many points throughout the book, I was left wondering for whom this book was written. For example, in Section 2.5, Hopfield artificial neural networks are described. Hopfield nets are generally considered to be a moderately advanced topic, yet the material is presented in a single paragraph. It is not clear what a reader unfamiliar with the field would learn from this coverage.

Another precept of good writing is to avoid forward references whenever possible.
This book goes beyond normal violation of this guideline: forward references exist but are never made explicit to the reader. For example, the term “melscale” is used in Figure 2.50 on page 64, but the definition of “melscale” does not appear until page 78.

Chapter 3 provides a thorough description of the basic techniques of preprocessing speech to provide suitable input to the recognition algorithms. The description includes material on filter banks, linear predictive coding, and vector quantization. Chapter 4 provides extensive coverage of the various kinds of similarity (or distortion) measures that can be used to classify patterns in speech signals. Section 4.7 gives a useful description of the dynamic time warping algorithm that is fundamental to classical speech recognition. Chapter 5 provides a wealth of practical guidelines (supported by empirical studies) on how to assemble the various distortion measures and clustering techniques to produce a practical speech recognition system. Section 5.7, on speech recognition under adverse conditions, is interesting, useful, and readable.

Arguably the most important technique of modern speech recognition, hidden Markov models (HMMs), is covered in chapter 6. Much of this chapter consists of a highly informative tutorial on HMMs that is based on an earlier paper by Rabiner [1]. This chapter contains the single most irritating mistake in the book. On page 339, the authors claim the following: “Using $\gamma_t(i)$, we can solve for the individually most likely state at time $t$, as $q_t^* = \arg\min_{1 \le i \le N} \gamma_t(i), \quad 1 \le t \le T$.” I puzzled over this equation for some time before going back to Rabiner's original paper [1]. In fact, the equation should have been “$q_t^* = \arg\max_{1 \le i \le N} \gamma_t(i), \quad 1 \le t \le T$.” This kind of mistake creates endless difficulty for a reader who is being exposed to the subject for the first time.
An error in a key formula not only misleads the reader, it is apt to undermine the reader's confidence in all of the hundreds of formulas found throughout the text. For someone who does not have the knowledge and confidence that derive from familiarity with the material, the question arises: “Which of these many formulas are right, and which are wrong?” This leads to my most important point: scientific publication is worse than useless if the authors do not take the care to get it right.

Chapter 7 is a straightforward extension of previous material to address connected word recognition (classical speech recognition deals with disconnected words, that is, words surrounded by short periods of silence). Chapter 8 gives an overview of some of the problems encountered when large-vocabulary speech recognition is attempted. Chapter 9 concludes the book by describing areas in which speech recognition has been successfully applied. This brisk and readable chapter brings this unfortunate book to a close.

There is no doubt that the authors know their material. Indeed, it is hardly an exaggeration to say that they discovered much of it. But it is not enough to be a good researcher. The authors of a book must also be good communicators with a clear conception of their readership. This book fails in two senses. The structure and organization have many problems. Undefined terms, unmentioned forward references, and inappropriate graphics can be found throughout. The second and perhaps greater failure is that the authors have lost track of their prospective readers. The result is a disappointing book of very limited value.
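
To show why the reviewer's correction on page 339 matters, here is a minimal sketch of choosing the individually most likely state at each time step of a discrete HMM. It takes the garbled symbol in the quoted formula to be the state posterior (written gamma below). The two-state model, its probabilities, and the function names are all hypothetical and chosen only for illustration; none of them comes from the book or from Rabiner's paper.

```python
def state_posteriors(pi, A, B, obs):
    """Return gamma[t][i] = P(state i at time t | all observations),
    computed with the (unscaled) forward-backward recursions."""
    N, T = len(pi), len(obs)
    # Forward pass: alpha[t][i] = P(o_1..o_t, state_t = i)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([
            sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[i][obs[t]]
            for i in range(N)
        ])
    # Backward pass: beta[t][i] = P(o_{t+1}..o_T | state_t = i)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N)
            )
    # Normalize alpha * beta at each time step to get the posterior
    gamma = []
    for t in range(T):
        w = [alpha[t][i] * beta[t][i] for i in range(N)]
        total = sum(w)
        gamma.append([x / total for x in w])
    return gamma


def most_likely_states(gamma):
    """Pick the arg max of gamma over states at each time step."""
    return [max(range(len(g)), key=lambda i: g[i]) for g in gamma]


# Hypothetical toy model: 2 states, 2 observation symbols
pi = [0.6, 0.4]                      # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]         # state transition probabilities
B = [[0.9, 0.1], [0.2, 0.8]]         # emission probabilities per state
obs = [0, 0, 1]                      # observed symbol indices

gamma = state_posteriors(pi, A, B, obs)
states = most_likely_states(gamma)
print(states)
```

Replacing `max` with `min` in `most_likely_states` would select the least probable state at every step, which is the effect of the misprinted formula. The unscaled recursions used here are only suitable for short sequences; longer ones would need scaling or log probabilities.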
