ABSTRACT
Increasing amounts of public, corporate, and private speech data are now available online. Their usefulness is limited, however, by the lack of tools for browsing and searching them. The goal of our research is to provide tools that overcome the inherent difficulties of speech access by supporting visual scanning, search, and information extraction. We describe a novel principle for the design of UIs to speech data: What You See Is Almost What You Hear (WYSIAWYH). In WYSIAWYH, automatic speech recognition (ASR) generates a transcript of the speech data. The transcript is then used as a visual analogue of the underlying data. A graphical user interface allows users to visually scan, read, annotate, and search these transcripts. Users can also use the transcript to access and play specific regions of the underlying message. We first summarize previous studies of voicemail usage that motivated the WYSIAWYH principle, and describe a voicemail UI, SCANMail, that embodies WYSIAWYH. We then report on a laboratory experiment and a two-month field trial evaluation. SCANMail outperformed a state-of-the-art voicemail system on core voicemail tasks. This advantage was attributable to SCANMail's support for visual scanning, search, and information extraction. Although the ASR transcripts contain errors, they nevertheless improve the efficiency of voicemail processing. Transcripts either provide enough information for users to extract key points, or allow users to navigate to important regions of the underlying speech, which they can then play directly.
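The idea of using a transcript to reach specific regions of the audio can be illustrated with a minimal sketch. It assumes the ASR output is available as a list of words with start and end times in seconds; the `Word` type, the `find_regions` function, and the `padding` parameter are hypothetical names chosen for illustration and are not taken from SCANMail itself.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its (assumed) time alignment in seconds."""
    text: str
    start: float
    end: float


def find_regions(transcript, query, padding=1.0):
    """Return (start, end) audio regions around each word matching `query`.

    Matching is a case-insensitive exact comparison. Each region is widened
    by `padding` seconds on both sides so that playback includes context,
    which matters because the recognized word itself may be an ASR error.
    """
    q = query.lower()
    regions = []
    for w in transcript:
        if w.text.lower() == q:
            regions.append((max(0.0, w.start - padding), w.end + padding))
    return regions


# Hypothetical aligned transcript of a short voicemail message.
transcript = [
    Word("please", 0.0, 0.5),
    Word("call", 0.5, 0.75),
    Word("me", 0.75, 1.0),
    Word("at", 1.0, 1.25),
    Word("extension", 1.25, 1.75),
    Word("4312", 1.75, 2.75),
]

# A player could seek to each returned region instead of replaying the message.
regions = find_regions(transcript, "extension", padding=0.5)
```

A real interface would pass each region to an audio player. The padding reflects the "almost" in WYSIAWYH: the transcript guides the user to the right place, and the speech itself remains the authoritative record.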
Index Terms
- SCANMail: a voicemail interface that makes speech browsable, readable and searchable