Open-vocabulary speech indexing for voice and video mail retrieval

Authors:

M. G. Brown
Olivetti Research Limited, 24a Trumpington St., Cambridge, CB2 1QA, UK

J. T. Foote
Cambridge University Engineering Department, Cambridge, CB2 1PZ, UK

G. J. F. Jones
Cambridge University Computer Laboratory, Cambridge, CB2 3QG, UK

K. Spärck Jones

S. J. Young
Cambridge University Engineering Department, Cambridge, CB2 1PZ, UK

MULTIMEDIA '96: Proceedings of the fourth ACM international conference on Multimedia, February 1997, Pages 307–316
https://doi.org/10.1145/244130.244232

Published: 01 February 1997

References

1. R. Barber, C. Faloutsos, M. Flickner, J. Hafner, W. Niblack, and D. Petkovic. Efficient and effective querying by image content. J. Intelligent Information Sys., (3):1-31, 1994.
2. M. G. Brown, J. T. Foote, G. J. F. Jones, K. Spärck Jones, and S. J. Young. Automatic content-based retrieval of broadcast news. In Proc. ACM Multimedia 95, pages 35-43, San Francisco, November 1995. ACM.
3. M. G. Brown, J. T. Foote, G. J. F. Jones, K. Spärck Jones, and S. J. Young. Video Mail Retrieval using Voice: An overview of the Cambridge/Olivetti retrieval system. In Proc. ACM Multimedia 94 Workshop on Multimedia Database Management Systems, pages 47-55, San Francisco, CA, October 1994.
4. C. Coker, K. Church, and M. Liberman. Morphology and rhyming: two powerful alternatives to Letter-to-Sound rules for speech synthesis. In ESCA Workshop on Speech Synthesis, pages 83-86, Autrans, France, Sept 1990. ESCA.
5. J. T. Foote, G. J. F. Jones, K. Spärck Jones, and S. J. Young. Talker-independent keyword spotting for information retrieval. In Proc. Eurospeech 95, volume 3, pages 2145-2148, Madrid, 1995. ESCA.
6. M. A. Hearst. TileBars: Visualisation of term distribution information in full text information access. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), Denver, CO, May 1995. ACM.
7. A. Hopper. Digital video on computer workstations. In Proceedings of Eurographics, 1992.
8. C. E. Jacobs, A. Finkelstein, and D. H. Salesin. Fast multiresolution image querying. In Proceedings of the SIGGRAPH 95 Conference, pages 277-286, Los Angeles, CA, August 1995. ACM SIGGRAPH.
9. D. A. James. The Application of Classical Information Retrieval Techniques to Spoken Documents. PhD thesis, Cambridge University, February 1995.
10. D. A. James and S. J. Young. A fast lattice-based approach to vocabulary independent wordspotting. In Proc. ICASSP 94, volume I, pages 377-380, Adelaide, 1994. IEEE.
11. G. J. F. Jones, J. T. Foote, K. Spärck Jones, and S. J. Young. VMR report on keyword definition and data collection. Technical Report 335, Cambridge University Computer Laboratory, May 1994.
12. G. J. F. Jones, J. T. Foote, K. Spärck Jones, and S. J. Young. Video Mail Retrieval: the effect of word spotting accuracy on precision. In Proc. ICASSP 95, volume I, pages 309-312, Detroit, May 1995. IEEE.
13. G. J. F. Jones, J. T. Foote, K. Spärck Jones, and S. J. Young. Retrieving spoken documents by combining multiple index sources. In Proc. SIGIR 96, Zürich, August 1996. ACM.
14. G. J. F. Jones, J. T. Foote, K. Spärck Jones, and S. J. Young. Robust talker-independent audio document retrieval. In Proc. ICASSP 96, volume I, pages 311-314, Atlanta, GA, April 1996. IEEE.
15. I. Leslie, D. McAuley, and D. Tennenhouse. ATM Everywhere? IEEE Network, March 1993.
16. J. McDonough, K. Ng, P. Jeanrenaud, H. Gish, and J. R. Rohlicek. Approaches to topic identification on the Switchboard corpus. In Proc. ICASSP 94, volume I, pages 385-388, Adelaide, 1994. IEEE.
17. M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130-137, July 1980.
18. L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77(2):257-286, February 1989.
19. T. Robinson, J. Fransen, D. Pye, J. Foote, and S. Renals. WSJCAM0: A British English speech corpus for large vocabulary continuous speech recognition. In Proc. ICASSP 95, pages 81-84, Detroit, May 1995. IEEE.
20. T. Robinson, M. Hochberg, and S. Renals. IPA: Improved phone modelling with recurrent neural networks. In Proc. ICASSP 94, volume 1, pages 37-40, Adelaide, SA, April 1994.
21. R. C. Rose. Techniques for information retrieval from speech messages. Lincoln Laboratory Journal, 4(1):45-60, 1991.
22. F. Samaria and S. J. Young. A HMM-based architecture for face identification. Image and Vision Computing, 12(8):537-543, October 1994.
23. M. A. Smith and M. G. Christel. Automating the creation of a digital video library. In Proc. ACM Multimedia 95, pages 357-358, San Francisco, November 1995. ACM.
24. S. W. Smoliar and H. J. Zhang. Content-based video indexing and retrieval. IEEE Multimedia, 1(2):62-72, Summer 1994.
25. K. Spärck Jones, J. T. Foote, G. J. F. Jones, and S. J. Young. Spoken document retrieval -- a multimedia tool. In Fourth Annual Symposium on Document Analysis and Information Retrieval, pages 1-11, Las Vegas, April 1995.
26. K. Spärck Jones, G. J. F. Jones, J. T. Foote, and S. J. Young. Experiments in spoken document retrieval. Information Processing and Management, 32(4):399-417, 1996.
27. C. J. van Rijsbergen. Information Retrieval. Butterworths, London, 2nd edition, 1979.
28. M. Wechsler and P. Schäuble. Speech retrieval based on automatic indexing. In C. J. van Rijsbergen, editor, Proceedings of the MIRO Workshop, University of Glasgow, September 1995.
29. L. Wilcox, F. Chen, and V. Balasubramanian. Segmentation of speech using speaker identification. In Proc. ICASSP 94, volume S1, pages 161-164, Adelaide, SA, April 1994.
30. L. D. Wilcox and M. A. Bush. Training and search algorithms for an interactive wordspotting system. In Proc. ICASSP 92, volume II, pages 97-100, San Francisco, 1992. IEEE.
31. J. H. Wright, M. J. Carey, and E. S. Parris. Topic discrimination using higher-order statistical models of spotted keywords. Computer Speech and Language, 9(4):381-405, Oct 1995.
32. S. J. Young, J. J. Odell, and P. C. Woodland. Tree-based state tying for high accuracy acoustic modelling. In Proc. ARPA Spoken Language Technology Workshop, Plainsboro, NJ, 1994.
33. S. J. Young, N. H. Russell, and J. H. S. Thornton. Token passing: a simple conceptual model for connected speech recognition systems. Technical Report CUED/F-INFENG/TR.38, Cambridge University Engineering Department, July 1989. ftp://svrftp.eng.cam.ac.uk/pub/reports/young_tr38.ps.Z.
34. S. J. Young, P. C. Woodland, and W. J. Byrne. HTK: Hidden Markov Model Toolkit V1.5. Entropic Research Laboratories, Inc., 600 Pennsylvania Ave. SE, Suite 202, Washington, DC 20003 USA, 1993.

Published in
MULTIMEDIA '96: Proceedings of the fourth ACM international conference on Multimedia
February 1997
457 pages
ISBN: 0897918711
DOI: 10.1145/244130
Chairmen: Philippe Aigrain (MIT Media Lab), Wendy Hall (Univ. of Southampton), Thomas D. C. Little (Boston Univ.), V. Michael Bove
Copyright © 1997 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
audio indexing
browsing
content-based retrieval
information retrieval
speech recognition
word spotting

Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%