DOI: 10.1145/2858036.2858199
research-article

Setwise Comparison: Consistent, Scalable, Continuum Labels for Computer Vision

Published: 7 May 2016

ABSTRACT

A growing number of domains, including affect recognition and movement analysis, require a single real-valued ground-truth label capturing some property of a video clip. We term this the provision of continuum labels. Unfortunately, with current tools there is often an unacceptable trade-off between label consistency and the efficiency of the labelling process. We present a novel interaction technique, setwise comparison, which leverages the intrinsic human capability for consistent relative judgements and the TrueSkill algorithm to solve this problem. We describe SorTable, a system demonstrating this technique. We conducted a real-world study in which clinicians labelled videos of patients with multiple sclerosis for the ASSESS MS computer vision system. In assessing the efficiency-consistency trade-off of setwise versus pairwise comparison, we demonstrated that setwise comparison is not only more efficient but also elicits more consistent labels. We further consider how our findings relate to the interactive machine learning literature.
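To make the idea concrete, the following is a minimal sketch of how one setwise judgement (a labeller sorting a small set of clips) might be turned into continuum labels. The abstract does not say how SorTable feeds sorted sets into TrueSkill, so this is an assumption-laden illustration, not the authors' method: it applies the standard two-item TrueSkill update (no draws) to each adjacent pair in the sorted set, rather than the full multi-item factor-graph update. The names `Rating`, `update_pair`, and `apply_setwise`, the clip ids, and the prior values (mean 25, standard deviation 25/3, beta 25/6, the usual TrueSkill defaults) are all hypothetical choices for this example.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for pdf and cdf

BETA = 25.0 / 6.0  # assumed per-judgement noise (common TrueSkill default)


@dataclass
class Rating:
    """Gaussian belief about a clip's continuum label (assumed prior)."""
    mu: float = 25.0
    sigma: float = 25.0 / 3.0


def update_pair(higher, lower, beta=BETA):
    """Two-item TrueSkill update, no draws: 'higher' was ranked above 'lower'."""
    c = math.sqrt(2 * beta ** 2 + higher.sigma ** 2 + lower.sigma ** 2)
    t = (higher.mu - lower.mu) / c
    v = _STD.pdf(t) / _STD.cdf(t)   # mean correction factor
    w = v * (v + t)                 # variance correction factor, in (0, 1)
    new_higher = Rating(
        higher.mu + higher.sigma ** 2 / c * v,
        higher.sigma * math.sqrt(1 - higher.sigma ** 2 / c ** 2 * w),
    )
    new_lower = Rating(
        lower.mu - lower.sigma ** 2 / c * v,
        lower.sigma * math.sqrt(1 - lower.sigma ** 2 / c ** 2 * w),
    )
    return new_higher, new_lower


def apply_setwise(ratings, ordered_ids):
    """Update ratings in place from one sorted set.

    ordered_ids lists clip ids from highest to lowest as sorted by the
    labeller. Simplification: only adjacent pairs are updated.
    """
    for hi, lo in zip(ordered_ids, ordered_ids[1:]):
        ratings[hi], ratings[lo] = update_pair(ratings[hi], ratings[lo])
    return ratings


# Hypothetical usage: four clips sorted once by a single labeller.
ratings = {clip: Rating() for clip in ["clip_a", "clip_b", "clip_c", "clip_d"]}
apply_setwise(ratings, ["clip_a", "clip_b", "clip_c", "clip_d"])
for clip, r in ratings.items():
    print(f"{clip}: mu={r.mu:.2f}, sigma={r.sigma:.2f}")
```

Each clip's `mu` serves as its continuum label and `sigma` as the remaining uncertainty. One sorted set of n clips yields n - 1 adjacent comparisons from a single interaction, which is one plausible reading of why setwise comparison can be more efficient than presenting pairs one at a time.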

Supplemental Material

pn918.mp4 (MP4, 33.6 MB)

    Published in

      CHI '16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems
      May 2016
      6108 pages
      ISBN: 9781450333627
      DOI: 10.1145/2858036

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      CHI '16 paper acceptance rate: 565 of 2,435 submissions (23%). Overall acceptance rate: 6,199 of 26,314 submissions (24%).
