research-article

Open Access

Fairness and Accountability Design Needs for Algorithmic Support in High-Stakes Public Sector Decision-Making

Authors:
Michael Veale, University College London, London, United Kingdom
Max Van Kleek, University of Oxford, Oxford, Oxfordshire, United Kingdom
Reuben Binns, University of Oxford, Oxford, Oxfordshire, United Kingdom

CHI '18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, April 2018, Paper No. 440, pages 1–14. https://doi.org/10.1145/3173574.3174014

Published: 21 April 2018


ABSTRACT

Calls for heightened consideration of fairness and accountability in algorithmically informed public decisions, such as those in taxation, justice, and child protection, are now commonplace. How might designers support such human values? We interviewed 27 public sector machine learning practitioners across 5 OECD countries about the challenges of understanding and imbuing public values into their work. The results suggest a disconnect between organisational and institutional realities, constraints and needs, and those addressed by current research into usable, transparent and 'discrimination-aware' machine learning. These absences are likely to undermine practical initiatives unless addressed. We see design opportunities in this disconnect, such as supporting the tracking of concept drift in secondary data sources, and building usable transparency tools to identify risks and incorporate domain knowledge, aimed both at managers and at the 'street-level bureaucrats' on the frontlines of public service. We conclude by outlining ethical challenges and future directions for collaboration in these high-stakes applications.

Supplemental Material

pn3665-file5.mp4 (MP4, 10.6 MB)
pn3665.mp4 (MP4, 257.4 MB)

Published in
CHI '18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
April 2018
8489 pages
ISBN: 9781450356206
DOI: 10.1145/3173574
General Chairs: Regan Mandryk (University of Saskatchewan, Canada), Mark Hancock (University of Waterloo, Canada)
Program Chairs: Mark Perry (Brunel University London, UK), Anna Cox (University College London, UK)
Copyright © 2018 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
algorithmic accountability
algorithmic bias
decision-support
predictive policing
public administration
Acceptance Rates
CHI '18 paper acceptance rate: 666 of 2,590 submissions (26%)
Overall acceptance rate: 6,199 of 26,314 submissions (24%)

Article Metrics
- Total citations: 218
- Total downloads: 8,009
- Downloads (last 12 months): 1,332
- Downloads (last 6 weeks): 163