Sphinx-4: a flexible open source framework for speech recognition
March 2004 Technical Report
Publisher:
  • Sun Microsystems, Inc.
  • An Imprint of Prentice Hall PTR, 2500 Garcia Avenue, Mountain View, CA
  • United States
Published: 01 March 2004
Pages: 18
Abstract

Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a "research-ready" system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source.
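The pluggability the abstract describes is exercised through an XML configuration whose named components are wired together at runtime by a configuration manager. The Java sketch below, in the spirit of the framework's HelloWorld demo, shows roughly how such a recognizer is assembled and run; the class name, configuration file name, and component names ("recognizer", "microphone") are illustrative assumptions that depend on the configuration file, not fixed names from the report.

import java.net.URL;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class HelloWorldSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical XML configuration wiring together the pluggable
        // components (front end, acoustic model, dictionary, language
        // model, search manager); the file name is an assumption.
        URL configUrl = HelloWorldSketch.class.getResource("helloworld.config.xml");
        ConfigurationManager cm = new ConfigurationManager(configUrl);

        // Component names must match the names declared in the
        // configuration file.
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        Microphone microphone = (Microphone) cm.lookup("microphone");

        // Allocate resources: loads the acoustic model, dictionary,
        // and language model declared in the configuration.
        recognizer.allocate();

        if (microphone.startRecording()) {
            while (true) {
                // Decode one utterance and print the best hypothesis,
                // with filler words (e.g. silence) stripped out.
                Result result = recognizer.recognize();
                if (result != null) {
                    System.out.println(result.getBestFinalResultNoFiller());
                }
            }
        }
    }
}

Because the components are looked up by name, swapping in a different front end, acoustic model, or search manager is a matter of editing the configuration file rather than recompiling the application.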

Cited By

  1. Jia W, Zhang J, Shan J and Ding X Making Dynamic Page Coalescing Effective on Virtualized Clouds Proceedings of the Eighteenth European Conference on Computer Systems, (298-313)
  2. Novoa J, Mahu R, Wuth J, Escudero J, Fredes J and Yoma N (2021). Automatic Speech Recognition for Indoor HRI Scenarios, ACM Transactions on Human-Robot Interaction, 10:2, (1-30), Online publication date: 1-May-2021.
  3. Preum S, Shu S, Hotaki M, Williams R, Stankovic J and Alemzadeh H (2019). CognitiveEMS, ACM SIGBED Review, 16:2, (51-60), Online publication date: 16-Aug-2019.
  4. Lee C, Lee H, Wu S, Liu C, Fang W, Hsu J and Tseng B (2019). Machine Comprehension of Spoken Content, IEEE/ACM Transactions on Audio, Speech and Language Processing, 27:9, (1469-1480), Online publication date: 1-Sep-2019.
  5. Baumann T, Köhn A and Hennig F (2019). The Spoken Wikipedia Corpus collection, Language Resources and Evaluation, 53:2, (303-329), Online publication date: 1-Jun-2019.
  6. Lee H, Chung P, Wu Y, Lin T and Wen T (2018). Interactive Spoken Content Retrieval by Deep Reinforcement Learning, IEEE/ACM Transactions on Audio, Speech and Language Processing, 26:12, (2447-2459), Online publication date: 1-Dec-2018.
  7. Ramunyisi N, Badenhorst J, Moors C and Gumede T Rapid development of a command and control interface for smart office environments Proceedings of the Annual Conference of the South African Institute of Computer Scientists and Information Technologists, (188-194)
  8. Kuppusamy K and Aghila G (2018). HuMan, Universal Access in the Information Society, 17:4, (841-864), Online publication date: 1-Nov-2018.
  9. Shrivastav S, Kumar S and Kumar K (2017). Towards an ontology based framework for searching multimedia contents on the web, Multimedia Tools and Applications, 76:18, (18657-18686), Online publication date: 1-Sep-2017.
  10. Yazdani R, Arnau J and González A UNFOLD Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, (69-81)
  11. Noronha B, Dziemian S, Zito G, Konnaris C and Faisal A “Wink to grasp” — comparing eye, voice & EMG gesture control of grasp with soft-robotic gloves 2017 International Conference on Rehabilitation Robotics (ICORR), (1043-1048)
  12. Necibi K, Frihia H and Bahi H On The Use of Decision Trees for Arabic Pronunciation Assessment Proceedings of the International Conference on Intelligent Information Processing, Security and Advanced Communication, (1-6)
  13. Limerick H, Moore J and Coyle D Empirical Evidence for a Diminished Sense of Agency in Speech Interfaces Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, (3967-3970)
  14. Odriozola I, Serrano L, Hernaez I and Navas E The AhoSR Automatic Speech Recognition System Proceedings of the Second International Conference on Advances in Speech and Language Technologies for Iberian Languages - Volume 8854, (279-288)
  15. Michalevsky Y, Boneh D and Nakibly G Gyrophone Proceedings of the 23rd USENIX conference on Security Symposium, (1053-1067)
  16. Reindl K, Zheng Y, Schwarz A, Meier S, Maas R, Sehr A and Kellermann W (2013). A stereophonic acoustic signal extraction scheme for noisy and reverberant environments, Computer Speech and Language, 27:3, (726-745), Online publication date: 1-May-2013.
  17. Nirjon S, Dickerson R, Asare P, Li Q, Hong D, Stankovic J, Hu P, Shen G and Jiang X Auditeur Proceeding of the 11th annual international conference on Mobile systems, applications, and services, (403-416)
  18. Yu M, Vajda P, Chen D, Tsai S, Daneshi M, Araujo A, Chen H and Girod B EigenNews Proceedings of the 21st ACM international conference on Multimedia, (463-464)
  19. Wiebusch D, Fischbach M, Latoschik M and Tramberend H Evaluating scala, actors, & ontologies for intelligent realtime interactive systems Proceedings of the 18th ACM symposium on Virtual reality software and technology, (153-160)
  20. Hoste L, Dumas B and Signer B SpeeG Proceedings of the International Working Conference on Advanced Visual Interfaces, (156-163)
  21. Ward N, Vega A and Baumann T (2012). Prosodic and temporal features for language modeling for dialog, Speech Communication, 54:2, (161-174), Online publication date: 1-Feb-2012.
  22. Baumann T and Schlangen D The InproTK 2012 release NAACL-HLT Workshop on Future Directions and Needs in the Spoken Dialog Community: Tools and Data, (29-32)
  23. Prylipko D, Schnelle-Walka D, Lord S and Wendemuth A Zanzibar OpenIVR Proceedings of the 14th international conference on Text, speech and dialogue, (372-379)
  24. Gürkök H, Hakvoort G and Poel M Modality switching and performance in a thought and speech controlled computer game Proceedings of the 13th international conference on multimodal interfaces, (41-48)
  25. Novak J, Minematsu N and Hirose K Open source WFST tools for LVCSR cascade development Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing, (65-73)
  26. Baumann T and Schlangen D Predicting the micro-timing of user input for an incremental spoken dialogue system that completes a user's ongoing turn Proceedings of the SIGDIAL 2011 Conference, (120-129)
  27. Soupionis Y and Gritzalis D (2010). Audio CAPTCHA, Computers and Security, 29:5, (603-618), Online publication date: 1-Jul-2010.
  28. Hamidi F, Baljko M, Livingston N and Spalteholz L CanSpeak Proceedings of the 12th international conference on Computers helping people with special needs: Part I, (605-612)
  29. Wang C, Liu Z and Fels S Everyone can do magic Proceedings of the 9th international conference on Entertainment computing, (32-42)
  30. Buß O, Baumann T and Schlangen D Collaborating on utterances with a spoken dialogue system using an ISU-based approach to incremental dialogue management Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, (233-236)
  31. Bahrani M and Sameti H (2010). A new bigram-PLSA language model for speech recognition, EURASIP Journal on Advances in Signal Processing, 2010, (1-8), Online publication date: 1-Feb-2010.
  32. Bursztein E and Bethard S Decaptcha Proceedings of the 3rd USENIX conference on Offensive technologies, (8-8)
  33. Mendonça H, Lawson J, Vybornova O, Macq B and Vanderdonckt J A fusion framework for multimodal interactive applications Proceedings of the 2009 international conference on Multimodal interfaces, (161-168)
  34. Baumann T, Atterer M and Schlangen D Assessing and improving the performance of speech recognition for incremental systems Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, (380-388)
  35. Hoffman G and Breazeal C Anticipatory perceptual simulation for human-robot joint practice Proceedings of the 23rd national conference on Artificial intelligence - Volume 3, (1357-1362)
  36. Qu S and Chai J Beyond attention Proceedings of the 13th international conference on Intelligent user interfaces, (237-246)
  37. Hoffman G and Breazeal C Achieving fluency through perceptual-symbol practice in human-robot collaboration Proceedings of the 3rd ACM/IEEE international conference on Human robot interaction, (1-8)
  38. Dumas B, Lalanne D, Guinard D, Koenig R and Ingold R Strengths and weaknesses of software architectures for the rapid creation of tangible and multimodal interfaces Proceedings of the 2nd international conference on Tangible and embedded interaction, (47-54)
  39. Domont X, Heckmann M, Wersing H, Joublin F, Menzel S, Sendhoff B and Goerick C Word recognition with a hierarchical neural network Proceedings of the 2007 international conference on Advances in nonlinear speech processing, (142-151)
  40. Denkowski M, Hannon C and Sanchez A Spoken commands in a smart home Proceedings of the artificial intelligence 6th Mexican international conference on Advances in artificial intelligence, (1025-1034)
  41. Kubat R, DeCamp P and Roy B Totalrecall Proceedings of the 9th international conference on Multimodal interfaces, (208-215)
  42. Qu S and Chai J Salience modeling based on non-verbal modalities for spoken language understanding Proceedings of the 8th international conference on Multimodal interfaces, (193-200)
  43. Gold K and Scassellati B Using context and sensory data to learn first and second person pronouns Proceedings of the 1st ACM SIGCHI/SIGART conference on Human-robot interaction, (110-117)
  44. Verstraeten D, Schrauwen B, Stroobandt D and Van Campenhout J (2005). Isolated word recognition with the Liquid State Machine, Information Processing Letters, 95:6, (521-528), Online publication date: 30-Sep-2005.
Contributors
  • Sun Microsystems
  • Spotify USA Inc
  • Sun Microsystems
  • Carnegie Mellon University
  • Carnegie Mellon University
  • Technical University of Darmstadt
  • University of Montana
  • Mitsubishi Electric Research Laboratories
