research-article

Dissonance Between Human and Machine Understanding

Authors:
Zijian Zhang, Leibniz University of Hannover, Hannover, Germany
Jaspreet Singh, Leibniz University of Hannover, Hannover, Germany
Ujwal Gadiraju, Leibniz University of Hannover, Hannover, Germany
Avishek Anand, Leibniz Universität Hannover, Hannover, Germany

Proceedings of the ACM on Human-Computer Interaction, Volume 3, Issue CSCW, Article No. 56, pp. 1–23
https://doi.org/10.1145/3359158

Published: 07 November 2019

Abstract

Complex machine learning models are now deployed in several critical domains, including healthcare and autonomous vehicles, albeit as functional black boxes. Consequently, there has been a recent surge of interest in interpreting the decisions of such complex models in order to explain their actions to humans. Models that correspond to the human interpretation of a task are more desirable in certain contexts and can help attribute liability, build trust, expose biases and, in turn, build better models. It is therefore crucial to understand how and which models conform to the human understanding of tasks. In this paper, we present a large-scale crowdsourcing study that reveals and quantifies the dissonance between human and machine understanding, through the lens of an image classification task. In particular, we seek to answer the following questions: Which (well-performing) complex ML models are closer to humans in their use of features to make accurate predictions? How does task difficulty affect the feature selection capability of machines in comparison to humans? Are humans consistently better at selecting features that make image recognition more accurate? Our findings have important implications for human-machine collaboration, considering that a long-term goal in the field of artificial intelligence is to make machines capable of learning and reasoning like humans.

Published in
Proceedings of the ACM on Human-Computer Interaction, Volume 3, Issue CSCW
November 2019
5026 pages
EISSN: 2573-0142
DOI: 10.1145/3371885
Editors: Airi Lampinen (Stockholm University, Sweden), Darren Gergle (Northwestern University, USA), David A. Shamma (FXPAL, USA)
Copyright © 2019 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags
crowdsourcing
dissonance
human intelligence
humans
image understanding
interpretability
machine learning models
machines
neural networks
object recognition