DOI: 10.1145/3287560.3287574

Explaining Explanations in AI

Published: 29 January 2019

ABSTRACT

Recent work on interpretability in machine learning and AI has focused on building simplified models that approximate the true criteria used to make decisions. These models are a useful pedagogical device for teaching trained professionals how to predict what decisions will be made by the complex system, and most importantly how the system might break. However, when considering any such model, it's important to remember Box's maxim that "All models are wrong but some are useful." We focus on the distinction between these models and explanations as understood in philosophy and sociology. These models can be understood as a "do-it-yourself kit" for explanations, allowing a practitioner to directly answer "what if" questions or generate contrastive explanations without external assistance. Although this is a valuable ability, giving these models as explanations appears more difficult than necessary, and other forms of explanation may not have the same trade-offs. We contrast the different schools of thought on what makes an explanation, and suggest that machine learning might benefit from viewing the problem more broadly.
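To make the "do-it-yourself kit" framing concrete, below is a minimal sketch (not from the paper) of the usual surrogate-model idea: train a simple, readable model to mimic a black box, then use it to answer "what if" questions. The dataset, feature names, and model choices are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data standing in for the decision problem.
X, y = make_classification(n_samples=1000, n_features=4, random_state=0)

# The "complex system" whose decisions we want to anticipate.
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# A simplified model trained to mimic the black box's outputs
# ("all models are wrong but some are useful").
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# The surrogate is readable, so a practitioner can inspect it directly ...
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(4)]))

# ... and use it to answer a "what if" question without external assistance:
# does nudging feature x2 change the predicted outcome for this instance?
instance = X[0].copy()
what_if = instance.copy()
what_if[2] += 1.0  # hypothetical intervention
print(surrogate.predict([instance])[0], surrogate.predict([what_if])[0])
```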



        Reviews

Reviewer: Vladik Kreinovich

Everyone agrees that artificial intelligence (AI) should be explainable; there is even an abbreviation for this: xAI. But opinions differ on what it means. This paper is a survey of different approaches to xAI. Ideally, AI systems should provide fully transparent recommendations, that is, recommendations that come from a sequence of clear, agreed-upon rules. In practice, this is rarely possible. Even when we can formulate such rules, the derivation is usually too long for a human to grasp. A usual alternative is to use, as an explanation, a derivation in a simplified, easier-to-grasp model, just like simple approximate physical reasoning helps us understand the results of solving complex physical equations. However, in physics, we usually understand how accurate an approximate model is and what the limits of its applicability are, while most xAI systems do not provide this information and thus tend to apply the simplified models even when they are not applicable. Also, the systems explain why a certain conclusion A was made; however, the user is also interested in contrastive explanations: why A and not B? In general, users would like the systems to be interactive. They should be able to ask, for example: what can we do to change the recommendation to B? What is the evidence behind the rules? They should be able to argue when the recommendations and/or rules seem unfair. In view of these user needs, the paper surveys current attempts to design contrastive and interactive xAI.
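As a rough illustration of the contrastive question the review highlights ("why A and not B?", "what can we do to change the recommendation to B?"), the sketch below searches for the smallest single-feature change that flips a model's prediction. The model, instance, and search grid are hypothetical assumptions, not anything specified by the paper or the review.

```python
import numpy as np

def nearest_flip(model, x, target, deltas=np.linspace(-2.0, 2.0, 81)):
    """Find the smallest single-feature change to `x` that makes `model`
    predict `target`; returns (feature_index, delta) or None."""
    best = None
    for i in range(len(x)):
        for d in deltas:
            candidate = x.copy()
            candidate[i] += d
            if model.predict([candidate])[0] == target:
                if best is None or abs(d) < abs(best[1]):
                    best = (i, d)
    return best

# Usage (reusing the hypothetical `black_box` and `X` from the earlier sketch):
# current = black_box.predict([X[0]])[0]
# print(nearest_flip(black_box, X[0].copy(), target=1 - current))
```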

Published in

FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency
January 2019
388 pages
ISBN: 9781450361255
DOI: 10.1145/3287560

          Copyright © 2019 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Qualifiers

          • research-article
          • Research
          • Refereed limited

