ABSTRACT
Retrospective think-aloud (RTA) is a usability method that collects users' verbalizations of their performance after the performance is over. Little work has investigated the validity and reliability of RTA. This paper reports on an experiment investigating these issues with a form of the method called stimulated RTA. By comparing subjects' verbalizations with their eye movements, we support the validity and reliability of stimulated RTA: the method provides a valid account of what people attended to in completing tasks, it has a low risk of introducing fabrications, and its validity is not affected by task complexity. More detailed analysis of RTA shows that it also provides additional information about users' inferences and strategies in completing tasks. These findings support usability practitioners in using RTA and in trusting the user performance information it collects in a usability study.
Index Terms
- The validity of the stimulated retrospective think-aloud method as measured by eye tracking