DOI: 10.1145/503376.503426
Article

SCANMail: a voicemail interface that makes speech browsable, readable and searchable

Published: 20 April 2002

ABSTRACT

Increasing amounts of public, corporate, and private speech data are now available on-line. These are limited in their usefulness, however, by the lack of tools to permit their browsing and search. The goal of our research is to provide tools to overcome the inherent difficulties of speech access, by supporting visual scanning, search, and information extraction. We describe a novel principle for the design of UIs to speech data: What You See Is Almost What You Hear (WYSIAWYH). In WYSIAWYH, automatic speech recognition (ASR) generates a transcript of the speech data. The transcript is then used as a visual analogue to that underlying data. A graphical user interface allows users to visually scan, read, annotate and search these transcripts. Users can also use the transcript to access and play specific regions of the underlying message. We first summarize previous studies of voicemail usage that motivated the WYSIAWYH principle, and describe a voicemail UI, SCANMail, that embodies WYSIAWYH. We report on a laboratory experiment and a two-month field trial evaluation. SCANMail outperformed a state-of-the-art voicemail system on core voicemail tasks. This was attributable to SCANMail's support for visual scanning, search and information extraction. While the ASR transcripts contain errors, they nevertheless improve the efficiency of voicemail processing. Transcripts provide enough information for users either to extract key points or to navigate to important regions of the underlying speech, which they can then play directly.


        Published in

          CHI '02: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
          April 2002
          478 pages
          ISBN: 1581134533
          DOI: 10.1145/503376
          Conference Chair: Dennis Wixon

          Copyright © 2002 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Acceptance Rates

          CHI '02 Paper Acceptance Rate: 61 of 414 submissions, 15%
          Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
