Article

SCANMail: a voicemail interface that makes speech browsable, readable and searchable

Authors:
Steve Whittaker, Julia Hirschberg, Brian Amento, Litza Stark, Michiel Bacchiani, Philip Isenhour, Larry Stead, Gary Zamchick, Aaron Rosenberg

AT&T Labs-Research, Florham Park, NJ

CHI '02: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 2002, Pages 275–282
https://doi.org/10.1145/503376.503426

Published: 20 April 2002

ABSTRACT

Increasing amounts of public, corporate, and private speech data are now available on-line. These are limited in their usefulness, however, by the lack of tools to permit their browsing and search. The goal of our research is to provide tools to overcome the inherent difficulties of speech access, by supporting visual scanning, search, and information extraction. We describe a novel principle for the design of UIs to speech data: What You See Is Almost What You Hear (WYSIAWYH). In WYSIAWYH, automatic speech recognition (ASR) generates a transcript of the speech data. The transcript is then used as a visual analogue to that underlying data. A graphical user interface allows users to visually scan, read, annotate and search these transcripts. Users can also use the transcript to access and play specific regions of the underlying message. We first summarize previous studies of voicemail usage that motivated the WYSIAWYH principle, and describe a voicemail UI, SCANMail, that embodies WYSIAWYH. We report on a laboratory experiment and a two-month field trial evaluation. SCANMail outperformed a state-of-the-art voicemail system on core voicemail tasks. This was attributable to SCANMail's support for visual scanning, search and information extraction. While the ASR transcripts contain errors, they nevertheless improve the efficiency of voicemail processing. Transcripts provide enough information for users either to extract key points or to navigate to important regions of the underlying speech, which they can then play directly.
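
The WYSIAWYH idea of using the transcript as an index into the audio can be illustrated with a minimal sketch. It is not the SCANMail implementation: it assumes the recognizer supplies a start and end time for each word, and the names `Word`, `search`, and `audio_region`, as well as the sample message, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    """One recognized word plus its (assumed) position in the audio."""
    text: str
    start: float  # seconds from the start of the message
    end: float    # seconds from the start of the message


def search(transcript: List[Word], query: str) -> List[int]:
    """Return indices of transcript words equal to the query, ignoring case."""
    q = query.lower()
    return [i for i, w in enumerate(transcript) if w.text.lower() == q]


def audio_region(transcript: List[Word], first: int, last: int) -> Tuple[float, float]:
    """Map a selected span of words (inclusive indices) to a playable time range."""
    if not 0 <= first <= last < len(transcript):
        raise IndexError("selection is outside the transcript")
    return transcript[first].start, transcript[last].end


# Hypothetical ASR output; "free" stands in for a misrecognized "three".
message = [
    Word("call", 0.0, 0.4), Word("me", 0.4, 0.6), Word("at", 0.6, 0.8),
    Word("five", 0.8, 1.2), Word("five", 1.2, 1.6), Word("five", 1.6, 2.0),
    Word("free", 2.0, 2.4), Word("two", 2.4, 2.7),
]

# Find the cue word, then select from it to the end of the message.
hits = search(message, "at")
region = audio_region(message, hits[0], len(message) - 1)
```

Here the transcript contains an error, yet it still shows where the phone number is spoken. The returned `region` is the time range a player would need in order to play that part of the message directly, which mirrors the navigation use described in the abstract.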

Published in
CHI '02: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
April 2002
478 pages
ISBN: 1581134533
DOI: 10.1145/503376
Conference Chair: Dennis Wixon, Microsoft Corporation, One Redmond WA
Copyright © 2002 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
"speech as data"
asynchronous communication
empirical evaluation
speech access
voicemail
what you see is almost what you hear
Acceptance Rates
CHI '02 Paper Acceptance Rate: 61 of 414 submissions, 15%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%