50 Years of Test (Un)fairness: Lessons for Machine Learning

FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency
January 2019
Pages 49–58
https://doi.org/10.1145/3287560.3287600

Published: 29 January 2019

ABSTRACT

Quantitative definitions of what is unfair and what is fair have been introduced in multiple disciplines for well over 50 years, including in education, hiring, and machine learning. We trace how the notion of fairness has been defined within the testing communities of education and hiring over the past half century, exploring the cultural and social context in which different fairness definitions have emerged. In some cases, earlier definitions of fairness are similar or identical to definitions of fairness in current machine learning research, and foreshadow current formal work. In other cases, insights into what fairness means and how to measure it have largely gone overlooked. We compare past and current notions of fairness along several dimensions, including the fairness criteria, the focus of the criteria (e.g., a test, a model, or its use), the relationship of fairness to individuals, groups, and subgroups, and the mathematical method for measuring fairness (e.g., classification, regression). This work points the way towards future research and measurement of (un)fairness that builds from our modern understanding of fairness while incorporating insights from the past.

Published in

FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency
January 2019
388 pages
ISBN: 9781450361255
DOI: 10.1145/3287560

Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ML fairness
fairness
history
psychometrics
test fairness
Qualifiers
- research-article
- Research
- Refereed limited