DOI: 10.1145/3173574.3174014 (CHI Conference Proceedings)
research-article
Open Access

Fairness and Accountability Design Needs for Algorithmic Support in High-Stakes Public Sector Decision-Making

Published: 21 April 2018

ABSTRACT

Calls for heightened consideration of fairness and accountability in algorithmically-informed public decisions, like taxation, justice, and child protection, are now commonplace. How might designers support such human values? We interviewed 27 public sector machine learning practitioners across 5 OECD countries regarding challenges understanding and imbuing public values into their work. The results suggest a disconnect between organisational and institutional realities, constraints and needs, and those addressed by current research into usable, transparent and 'discrimination-aware' machine learning: absences likely to undermine practical initiatives unless addressed. We see design opportunities in this disconnect, such as in supporting the tracking of concept drift in secondary data sources, and in building usable transparency tools to identify risks and incorporate domain knowledge, aimed both at managers and at the 'street-level bureaucrats' on the frontlines of public service. We conclude by outlining ethical challenges and future directions for collaboration in these high-stakes applications.
