DOI: 10.1145/3287560.3287600

50 Years of Test (Un)fairness: Lessons for Machine Learning

Published: 29 January 2019

ABSTRACT

Quantitative definitions of what is unfair and what is fair have been introduced in multiple disciplines for well over 50 years, including in education, hiring, and machine learning. We trace how the notion of fairness has been defined within the testing communities of education and hiring over the past half century, exploring the cultural and social context in which different fairness definitions have emerged. In some cases, earlier definitions of fairness are similar or identical to definitions of fairness in current machine learning research, and foreshadow current formal work. In other cases, insights into what fairness means and how to measure it have largely gone overlooked. We compare past and current notions of fairness along several dimensions, including the fairness criteria, the focus of the criteria (e.g., a test, a model, or its use), the relationship of fairness to individuals, groups, and subgroups, and the mathematical method for measuring fairness (e.g., classification, regression). This work points the way towards future research and measurement of (un)fairness that builds from our modern understanding of fairness while incorporating insights from the past.


Published in

FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency
January 2019, 388 pages
ISBN: 9781450361255
DOI: 10.1145/3287560

Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
