ABSTRACT
Recent work on interpretability in machine learning and AI has focused on building simplified models that approximate the true criteria used to make decisions. These models are a useful pedagogical device for teaching trained professionals how to predict what decisions will be made by the complex system, and most importantly how the system might break. However, when considering any such model, it is important to remember Box's maxim that "All models are wrong but some are useful." We focus on the distinction between these models and explanations in philosophy and sociology. These models can be understood as a "do it yourself kit" for explanations, allowing a practitioner to directly answer "what if" questions or generate contrastive explanations without external assistance. Although this is a valuable ability, giving these models as explanations appears more difficult than necessary, and other forms of explanation may not have the same trade-offs. We contrast the different schools of thought on what makes an explanation, and suggest that machine learning might benefit from viewing the problem more broadly.
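The "do it yourself kit" framing can be made concrete with a toy sketch: probe an opaque classifier, fit a simple surrogate rule to its answers, then use the surrogate to answer a contrastive "what if" question. Everything here (the loan scenario, the feature names, and the single-threshold surrogate) is an illustrative assumption, not a method from the paper; notably, the surrogate's answer can disagree with the black box, in keeping with Box's maxim.

```python
def black_box(income, debt):
    # Stand-in for an opaque decision system with an unstated rule.
    return 0.7 * income - 1.2 * debt > 30

def fit_surrogate(samples):
    # Fit a single-threshold rule on (income - debt) that best mimics
    # the black box on the probed samples.
    labelled = [(i - d, black_box(i, d)) for i, d in samples]
    best_t, best_acc = 0, 0.0
    for t in range(101):
        acc = sum((x > t) == y for x, y in labelled) / len(labelled)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def what_if(income, debt, threshold):
    # Contrastive answer from the surrogate: the smallest income
    # increase that flips the surrogate's rule (not necessarily the
    # black box itself -- the surrogate is only an approximation).
    return max(0, threshold - (income - debt) + 1)

# Probe the black box on a grid and extract the surrogate rule.
samples = [(i, d) for i in range(0, 101, 10) for d in range(0, 51, 10)]
t = fit_surrogate(samples)
print(t, what_if(40, 20, t))  # threshold 50 -> "raise income by 31"
```

Under these assumptions the surrogate tells a denied applicant (income 40, debt 20) to raise income by 31, yet the black box's true rule would still deny income 71 with debt 20 (0.7·71 − 1.2·20 = 25.7 ≤ 30): the kit answers what-if questions cheaply, but only about the approximation.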
- Philip Adler, Casey Falk, Sorelle A. Friedler, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, and Suresh Venkatasubramanian. 2016. Auditing Black-box Models by Obscuring Features. arXiv:1602.07043 [cs, stat] (2016). http://arxiv.org/abs/1602.07043
- Charles Antaki and Ivan Leudar. 1992. Explaining in conversation: Towards an argument model. European Journal of Social Psychology 22, 2 (1992), 181--194.
- David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. 2010. How to explain individual classification decisions. Journal of Machine Learning Research 11 (2010), 1803--1831.
- Solon Barocas and Andrew D. Selbst. 2016. Big data's disparate impact. California Law Review 104, 3 (2016).
- Osbert Bastani, Carolyn Kim, and Hamsa Bastani. 2017. Interpretability via Model Extraction. arXiv:1706.09773 [cs, stat] (2017). http://arxiv.org/abs/1706.09773
- Bettina Berendt and Sören Preibusch. 2017. Toward Accountable Discrimination-Aware Data Mining: The Importance of Keeping the Human in the Loop and Under the Looking Glass. Big Data 5, 2 (2017), 135--152.
- Reuben Binns. 2017. Algorithmic Accountability and Public Reason. Philosophy & Technology (2017), 1--14.
- Or Biran and Kathleen McKeown. 2014. Justification narratives for individual classifications. In Proceedings of the AutoML Workshop at ICML, Vol. 2014.
- B. Bodo, N. Helberger, K. Irion, F. Zuiderveen Borgesius, J. Moller, B. van de Velde, N. Bol, B. van Es, and C. de Vreese. 2017. Tackling the algorithmic control crisis--the technical, legal, and ethical challenges of research into algorithmic agents. Yale JL & Tech. 19 (2017), 133.
- George E. P. Box. 1979. Robustness in the strategy of scientific model building. In Robustness in Statistics. Elsevier, 201--236.
- Jenna Burrell. 2016. How the Machine 'Thinks': Understanding Opacity in Machine Learning Algorithms. Big Data & Society (2016).
- R. Caruana, H. Kangarloo, J. D. Dionisio, U. Sinha, and D. Johnson. 1999. Case-based explanation of non-case-based learning methods. Proceedings of the AMIA Symposium (1999), 212--215. PMID: 10566351; PMCID: PMC2232607.
- Junxiang Chen, Yale Chang, Brian Hobbs, Peter Castaldi, Michael Cho, Edwin Silverman, and Jennifer Dy. 2016. Interpretable Clustering via Discriminative Rectangle Mixture Model. In 2016 IEEE 16th International Conference on Data Mining (ICDM). 823--828. http://ieeexplore.ieee.org/abstract/document/7837910/ [Online; accessed 2017-10-16].
- Danielle Keats Citron and Frank Pasquale. 2014. The scored society: due process for automated predictions. Wash. L. Rev. 89 (2014), 1.
- Mark Craven and Jude W. Shavlik. 1996. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems. 24--30. http://papers.nips.cc/paper/1152-extracting-tree-structured-representations-of-trained-networks.pdf [Online; accessed 2017-10-16].
- Anupam Datta, Shayak Sen, and Yair Zick. 2016. Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems. IEEE, 598--617. [Online; accessed 2016-09-12].
- Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O'Brien, Stuart Schieber, James Waldo, David Weinberger, and Alexandra Wood. 2017. Accountability of AI under the law: The role of explanation. arXiv preprint arXiv:1711.01134 (2017).
- Simant Dube. 2018. High Dimensional Spaces, Deep Learning and Adversarial Examples. arXiv preprint arXiv:1801.00634 (2018).
- Ruth C. Fong and Andrea Vedaldi. 2017. Interpretable explanations of black boxes by meaningful perturbation. arXiv preprint arXiv:1704.03296 (2017).
- John Fox, David Glasspool, Dan Grecu, Sanjay Modgil, Matthew South, and Vivek Patkar. 2007. Argumentation-based inference and decision making--A medical perspective. IEEE Intelligent Systems 22, 6 (2007).
- Roman Frigg. 2006. Scientific Representation and the Semantic View of Theories. Theoria 55 (2006), 37--53.
- Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
- Shirley Gregor and Izak Benbasat. 1999. Explanations from intelligent systems: Theoretical foundations and implications for practice. MIS Quarterly (1999), 497--530.
- J. Habermas. 1984. The Theory of Communicative Action: Volume 1: Reason and the Rationalization of Society. Beacon, Boston.
- William Herfel, Wladyslaw Krajewski, Ilkka Niiniluoto, and Wojcicki (Eds.). 1995. Theories and Models in Scientific Process. Rodopi, Amsterdam.
- Mary B. Hesse. 1965. Models and analogies in science.
- Germund Hesslow. 1988. The problem of causal selection. Contemporary Science and Natural Explanation: Commonsense Conceptions of Causality (1988), 11--32.
- Mireille Hildebrandt and Bert-Jaap Koops. 2010. The Challenges of Ambient Law and Legal Protection in the Profiling Era. The Modern Law Review 73, 3 (May 2010), 428--460.
- Denis J. Hilton. 1990. Conversational processes and causal explanation. Psychological Bulletin 107, 1 (1990), 65.
- Denis J. Hilton. 1996. Mental models and causal explanation: Judgements of probable cause and explanatory relevance. Thinking & Reasoning 2, 4 (1996), 273--308.
- Denis J. Hilton and Ben R. Slugoski. 1986. Knowledge-based causal attribution: The abnormal conditions focus model. Psychological Review 93, 1 (1986), 75.
- Ujwal Kayande, Arnaud De Bruyn, Gary L. Lilien, Arvind Rangaswamy, and Gerrit H. Van Bruggen. 2009. How incorporating feedback mechanisms in a DSS affects DSS evaluations. Information Systems Research 20, 4 (2009), 527--546.
- Been Kim, Cynthia Rudin, and Julie A. Shah. 2014. The Bayesian case model: A generative approach for case-based reasoning and prototype classification. In Advances in Neural Information Processing Systems. 1952--1960.
- Pauline T. Kim. 2016. Data-driven discrimination at work. Wm. & Mary L. Rev. 58 (2016), 857.
- Boris Kment. 2006. Counterfactuals and explanation. Mind 115, 458 (2006), 261--310.
- Joshua A. Kroll, Solon Barocas, Edward W. Felten, Joel R. Reidenberg, David G. Robinson, and Harlan Yu. 2016. Accountable algorithms. U. Pa. L. Rev. 165 (2016), 633.
- Todd Kulesza, Margaret Burnett, Weng-Keen Wong, and Simone Stumpf. 2015. Principles of Explanatory Debugging to Personalize Interactive Machine Learning. ACM Press, 126--137. [Online; accessed 2018-05-06].
- Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Jure Leskovec. 2017. Interpretable & Explorable Approximations of Black Box Models. arXiv:1707.01154 [cs] (2017). http://arxiv.org/abs/1707.01154
- Bruno Lepri, Nuria Oliver, Emmanuel Letouzé, Alex Pentland, and Patrick Vinck. 2017. Fair, Transparent, and Accountable Algorithmic Decision-making Processes: The Premise, the Proposed Solutions, and the Open Challenges. Philosophy & Technology (2017). [Online; accessed 2017-08-25].
- David Lewis. 1973. Counterfactuals. Blackwell, Oxford.
- Brian Y. Lim and Anind K. Dey. 2009. Assessing demand for intelligibility in context-aware applications. In Proceedings of the 11th International Conference on Ubiquitous Computing (UbiComp '09). ACM Press, Orlando, Florida, USA, 195.
- Peter Lipton. 1990. Contrastive explanation. Royal Institute of Philosophy Supplements 27 (1990), 247--266.
- Zachary C. Lipton. 2016. The Mythos of Model Interpretability. arXiv:1606.03490 [cs, stat] (2016). http://arxiv.org/abs/1606.03490
- Paulo J. G. Lisboa. 2013. Interpretability in Machine Learning--Principles and Practice. In Fuzzy Logic and Applications. Springer, 15--21. http://link.springer.com/chapter/10.1007/978-3-319-03200-9_2 [Online; accessed 2015-12-19].
- Tania Lombrozo. 2009. Explanation and categorization: How "why?" informs "what?". Cognition 110, 2 (2009), 248--253.
- Scott M. Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems. 4765--4774.
- Alessandro Mantelero. 2016. Personal data for decisional purposes in the age of analytics: From an individual to a collective dimension of data protection. Computer Law & Security Review 32, 2 (2016), 238--255.
- David Martens, Bart Baesens, Tony Van Gestel, and Jan Vanthienen. 2007. Comprehensible credit scoring models using rule extraction from support vector machines. European Journal of Operational Research 183, 3 (2007), 1466--1476.
- David Martens and Foster Provost. 2013. Explaining data-driven document classifications. (2013). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2282998 [Online; accessed 2017-09-22].
- Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. ACM Press, 165--172. [Online; accessed 2017-09-24].
- John L. McClure, Robbie M. Sutton, and Denis J. Hilton. 2003. The Role of Goal-Based Explanations. Social Judgments: Implicit and Explicit Processes 5 (2003).
- Tim Miller. 2017. Explanation in artificial intelligence: Insights from the social sciences. arXiv preprint arXiv:1706.07269 (2017).
- Tim Miller, Piers Howe, and Liz Sonenberg. 2017. Explainable AI: Beware of Inmates Running the Asylum. Or: How I Learnt to Stop Worrying and Love the Social and Behavioural Sciences. arXiv:1712.00547 [cs] (2017). http://arxiv.org/abs/1712.00547
- Brent Mittelstadt. 2016. Automation, Algorithms, and Politics | Auditing for Transparency in Content Personalization Systems. International Journal of Communication 10 (2016), 12.
- Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. 2017. Methods for interpreting and understanding deep neural networks. Digital Signal Processing (2017).
- Helen Nissenbaum. 1996. Accountability in a computerized society. Science and Engineering Ethics 2, 1 (1996), 25--42.
- S. C. Olhede and P. J. Wolfe. 2018. The growing ubiquity of algorithms in society: implications, impacts and innovations. Phil. Trans. R. Soc. A 376, 2128 (2018), 20170364.
- Frank Pasquale. 2015. The Black Box Society: The Secret Algorithms That Control Money and Information. Harvard University Press.
- Brett Poulin, Roman Eisner, Duane Szafron, Paul Lu, Russ Greiner, D. S. Wishart, Alona Fyshe, Brandon Pearcy, Cam MacDonell, and John Anvik. 2006. Visual Explanation of Evidence in Additive Classifiers. (2006), 8.
- Bob Rehder. 2003. A causal-model theory of conceptual representation and categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition 29, 6 (2003), 1141.
- Bob Rehder. 2006. When similarity and causality compete in category-based property generalization. Memory & Cognition 34, 1 (2006), 3--16.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 1135--1144. [Online; accessed 2017-09-24].
- David-Hillel Ruben. 2004. Explaining Explanation. Routledge.
- Wojciech Samek, Thomas Wiegand, and Klaus-Robert Müller. 2017. Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models. arXiv preprint arXiv:1708.08296 (2017). https://arxiv.org/abs/1708.08296 [Online; accessed 2017-09-22].
- Jana Samland and Michael R. Waldmann. 2014. Do Social Norms Influence Causal Inferences? In Proceedings of the Annual Meeting of the Cognitive Science Society, Vol. 36.
- Ivan Sanchez, Tim Rocktaschel, Sebastian Riedel, and Sameer Singh. 2015. Towards extracting faithful and descriptive representations of latent variable models. In AAAI Spring Symposium on Knowledge Representation and Reasoning (KRR): Integrating Symbolic and Neural Approaches. http://www.aaai.org/ocs/index.php/SSS/SSS15/paper/viewFile/10304/10033 [Online; accessed 2017-10-16].
- Christian Sandvig, Kevin Hamilton, Karrie Karahalios, and Cedric Langbort. 2014. Auditing algorithms: Research methods for detecting discrimination on internet platforms. Data and Discrimination: Converting Critical Concerns into Productive Inquiry (2014), 1--23.
- Andrew D. Selbst and Solon Barocas. 2018. The intuitive appeal of explainable machines. (2018).
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2016. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. arXiv:1610.02391 (2016). https://arxiv.org/abs/1610.02391 [Online; accessed 2017-10-16].
- Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning important features through propagating activation differences. CoRR (2017).
- Avanti Shrikumar, Peyton Greenside, and Anna Shcherbina. 2016. Not just a black box: Learning important features through propagating activation differences. CoRR (2016).
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2013. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013). https://arxiv.org/abs/1312.6034 [Online; accessed 2017-09-24].
- Ben R. Slugoski, Mansur Lalljee, Roger Lamb, and Gerald P. Ginsburg. 1993. Attribution in conversational context: Effect of mutual knowledge on explanation-giving. European Journal of Social Psychology 23, 3 (1993), 219--238.
- Paolo Tamagnini, Josua Krause, Aritra Dasgupta, and Enrico Bertini. 2017. Interpreting Black-Box Classifiers Using Instance-Level Visual Explanations. ACM Press, 1--6. [Online; accessed 2017-09-22].
- Michael Veale and Lilian Edwards. 2018. Clarity, surprises, and further questions in the Article 29 Working Party draft guidance on automated decision-making and profiling. Computer Law & Security Review 34, 2 (2018), 398--404.
- Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2018. Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR. Harvard Journal of Law & Technology (forthcoming) (2018).
- Douglas Walton. 2004. A new dialectical theory of explanation. Philosophical Explorations 7, 1 (2004), 71--89.
- Douglas Walton. 2007. Dialogical Models of Explanation. ExaCt 2007 (2007), 1--9.
- Fulton Wang and Cynthia Rudin. 2015. Falling rule lists. In Artificial Intelligence and Statistics. 1013--1022.
- Adrian Weller. 2017. Challenges for Transparency. arXiv:1708.01870 [cs] (2017). http://arxiv.org/abs/1708.01870
- Jim Woodward. 1997. Explanation, Invariance, and Intervention. Philosophy of Science 64 (1997), S26--S41. https://www.jstor.org/stable/188387
- James Woodward. 2003. Scientific explanation. In The Stanford Encyclopedia of Philosophy, Edward N. Zalta (Ed.).
- Petri Ylikoski. 2013. Causal and constitutive explanation compared. Erkenntnis 78, 2 (2013), 277--297.
- Tal Z. Zarsky. 2013. Transparent predictions. U. Ill. L. Rev. (2013), 1503.
- Jiaming Zeng, Berk Ustun, and Cynthia Rudin. 2017. Interpretable classification models for recidivism prediction. Journal of the Royal Statistical Society: Series A (Statistics in Society) 180, 3 (2017), 689--722.