research-article

Dissonance Between Human and Machine Understanding

Published: 07 November 2019

Abstract

Complex machine learning models are now deployed in several critical domains, including healthcare and autonomous vehicles, albeit as functional black boxes. Consequently, there has been a recent surge of work on interpreting the decisions of such complex models in order to explain their actions to humans. Models that correspond to human interpretation of a task are more desirable in certain contexts and can help attribute liability, build trust, expose biases and, in turn, build better models. It is therefore crucial to understand how and which models conform to human understanding of tasks. In this paper, we present a large-scale crowdsourcing study that reveals and quantifies the dissonance between human and machine understanding through the lens of an image classification task. In particular, we seek to answer the following questions: Which (well-performing) complex ML models are closer to humans in their use of features to make accurate predictions? How does task difficulty affect the feature selection capability of machines in comparison to humans? Are humans consistently better at selecting features that make image recognition more accurate? Our findings have important implications for human-machine collaboration, considering that a long-term goal in the field of artificial intelligence is to make machines capable of learning and reasoning like humans.

Published in

Proceedings of the ACM on Human-Computer Interaction, Volume 3, Issue CSCW
November 2019
5026 pages
EISSN: 2573-0142
DOI: 10.1145/3371885

Copyright © 2019 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
