research-article

Setwise Comparison: Consistent, Scalable, Continuum Labels for Computer Vision

Authors:
Advait Sarkar (Microsoft Research Cambridge & University of Cambridge, Cambridge, United Kingdom)
Cecily Morrison (Microsoft Research, Cambridge, United Kingdom)
Jonas F. Dorn (Novartis Pharma AG, Basel, Switzerland)
Rishi Bedi (Novartis Pharma AG & Stanford University, Basel, Switzerland)
Saskia Steinheimer (Inselspital, Bern University Hospital, Bern, Switzerland)
Jacques Boisvert (Novartis Pharma AG, Basel, Switzerland)
Jessica Burggraaff (VU University Medical Center, Amsterdam, Netherlands)
Marcus D'Souza (University Hospital Basel, Basel, Switzerland)
Peter Kontschieder (Microsoft Research, Cambridge, United Kingdom)
Samuel Rota Bulò (Microsoft Research, Cambridge, United Kingdom)
Lorcan Walsh (Novartis Pharma AG, Basel, Switzerland)
Christian P. Kamm (University Hospital Bern, Bern, Switzerland)
Yordan Zaykov (Microsoft Research, Cambridge, United Kingdom)
Abigail Sellen (Microsoft Research, Cambridge, United Kingdom)
Siân Lindley (Microsoft Research, Cambridge, United Kingdom)

CHI '16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, May 2016, Pages 261–271. https://doi.org/10.1145/2858036.2858199

Published: 07 May 2016

ABSTRACT

A growing number of domains, including affect recognition and movement analysis, require a single real-valued ground-truth label capturing some property of a video clip. We term this the provision of continuum labels. Unfortunately, with current tools there is often an unacceptable trade-off between label consistency and the efficiency of the labelling process. We present a novel interaction technique, setwise comparison, which addresses this problem by combining the intrinsic human capability for consistent relative judgements with the TrueSkill algorithm. We describe SorTable, a system demonstrating this technique. We conducted a real-world study in which clinicians labelled videos of patients with multiple sclerosis for the ASSESS MS computer vision system. In assessing the efficiency-consistency trade-off of setwise versus pairwise comparison, we demonstrated that setwise comparison is not only more efficient but also elicits more consistent labels. We further consider how our findings relate to the interactive machine learning literature.
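To make the idea of setwise comparison concrete, the following is a minimal sketch and not the SorTable implementation. It assumes that one labelling action orders a small set of clips from lowest to highest on the property of interest, and that such an ordering can be read as a collection of implied pairwise outcomes. The abstract does not describe how orderings are passed to TrueSkill, so a simple Elo-style update is swapped in here purely as a stand-in rating rule. The names `implied_pairs` and `elo_style_scores`, and the parameters `k`, `start`, and `scale`, are hypothetical.

```python
from itertools import combinations


def implied_pairs(ordering):
    """Return the (lower, higher) pairs implied by one ordered set.

    `ordering` lists clip identifiers from lowest to highest. An ordering
    of n clips implies n * (n - 1) / 2 pairwise outcomes.
    """
    return list(combinations(ordering, 2))


def elo_style_scores(orderings, k=16.0, start=0.0, scale=400.0):
    """Turn ordered sets into one real-valued score per clip.

    Assumption: this Elo-style update is an illustrative stand-in for
    TrueSkill, not the algorithm used in the paper.
    """
    scores = {}
    for ordering in orderings:
        for low, high in implied_pairs(ordering):
            s_low = scores.setdefault(low, start)
            s_high = scores.setdefault(high, start)
            # Probability, under the current scores, that `high` ranks above `low`.
            expected_high = 1.0 / (1.0 + 10 ** ((s_low - s_high) / scale))
            # Move both scores in proportion to how surprising the outcome was.
            delta = k * (1.0 - expected_high)
            scores[high] = s_high + delta
            scores[low] = s_low - delta
    return scores


if __name__ == "__main__":
    # Two hypothetical labelling actions over overlapping sets of clips.
    labelled_sets = [["clip_a", "clip_b", "clip_c"], ["clip_b", "clip_c", "clip_d"]]
    for clip, score in sorted(elo_style_scores(labelled_sets).items()):
        print(clip, round(score, 2))
```

The sketch illustrates one plausible reading of the efficiency argument: a single ordering of four clips carries the information of six pairwise judgements, so fewer labelling actions may be needed to place every clip on a continuum.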

Supplemental Material

pn918.mp4 (MP4, 33.6 MB)

Index Terms

1. Human-centered computing

Published in
CHI '16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems
May 2016
6108 pages
ISBN: 9781450333627
DOI: 10.1145/2858036
General Chairs: Jofish Kaye (Yahoo), Allison Druin (University of Maryland / National Park Service)
Program Chairs: Cliff Lampe (University of Michigan), Dan Morris (Microsoft), Juan Pablo Hourcade (University of Iowa)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
computer vision
continuum labels
health
interactive machine learning
machine learning
setwise comparison
video media
Acceptance Rates
CHI '16 Paper Acceptance Rate: 565 of 2,435 submissions, 23%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%