ABSTRACT
Quantitative definitions of what is unfair and what is fair have been introduced in multiple disciplines for well over 50 years, including in education, hiring, and machine learning. We trace how the notion of fairness has been defined within the testing communities of education and hiring over the past half century, exploring the cultural and social context in which different fairness definitions have emerged. In some cases, earlier definitions of fairness are similar or identical to definitions of fairness in current machine learning research, and foreshadow current formal work. In other cases, insights into what fairness means and how to measure it have largely gone overlooked. We compare past and current notions of fairness along several dimensions, including the fairness criteria, the focus of the criteria (e.g., a test, a model, or its use), the relationship of fairness to individuals, groups, and subgroups, and the mathematical method for measuring fairness (e.g., classification, regression). This work points the way towards future research and measurement of (un)fairness that builds from our modern understanding of fairness while incorporating insights from the past.
Index Terms
- 50 Years of Test (Un)fairness: Lessons for Machine Learning