Abstract
In the context of civil rights law, discrimination refers to unfair or unequal treatment of people based on membership in a category or a minority, without regard to individual merit. Discrimination in credit, mortgage, insurance, the labor market, and education has been investigated by researchers in economics and the human sciences. With the advent of automatic decision support systems, such as credit scoring systems, the ease of data collection poses several challenges to data analysts in the fight against discrimination. In this article, we introduce the problem of discovering discrimination through data mining in a dataset of historical decision records, taken by humans or by automatic systems. We formalize the processes of direct and indirect discrimination discovery by modelling protected-by-law groups and the contexts where discrimination occurs in a syntax based on classification rules. Classification rules extracted from the dataset allow for unveiling contexts of unlawful discrimination, where the degree of burden over protected-by-law groups is formalized by an extension of the lift measure of a classification rule. In direct discrimination, the extracted rules can be directly mined in search of discriminatory contexts. In indirect discrimination, the mining process needs some background knowledge as a further input, for example census data, which, combined with the extracted rules, might allow for unveiling contexts of discriminatory decisions. A strategy for combining extracted classification rules with background knowledge is called an inference model. In this article, we propose two inference models and provide automatic procedures for their implementation. An empirical assessment of our results is provided on the German credit dataset and on the PKDD Discovery Challenge 1999 financial dataset.
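The abstract does not spell out the extension of the lift measure, so the following is only a minimal sketch under one stated assumption: that the extended lift of a rule "protected group, context → decision" is the ratio between its confidence and the confidence of the base rule "context → decision". The function names `confidence` and `extended_lift`, the attribute names, and the toy records are all hypothetical and are not taken from the article or its datasets.

```python
from typing import Dict, List

Record = Dict[str, str]


def confidence(records: List[Record], premise: Record, decision: Record) -> float:
    """Fraction of records matching `premise` that also match `decision`."""
    covered = [r for r in records
               if all(r.get(k) == v for k, v in premise.items())]
    if not covered:
        return 0.0
    hits = sum(1 for r in covered
               if all(r.get(k) == v for k, v in decision.items()))
    return hits / len(covered)


def extended_lift(records: List[Record], protected: Record,
                  context: Record, decision: Record) -> float:
    """Assumed definition: conf(protected, context -> decision)
    divided by conf(context -> decision)."""
    base = confidence(records, context, decision)
    if base == 0.0:
        return float("nan")  # ratio is undefined when the base rule never fires
    return confidence(records, {**context, **protected}, decision) / base


# Hypothetical toy decision records (NOT the German credit data).
records: List[Record] = (
    [{"city": "A", "group": "protected", "credit": "denied"}] * 3
    + [{"city": "A", "group": "protected", "credit": "granted"}] * 1
    + [{"city": "A", "group": "other", "credit": "denied"}] * 2
    + [{"city": "A", "group": "other", "credit": "granted"}] * 4
)

elift = extended_lift(
    records,
    protected={"group": "protected"},
    context={"city": "A"},
    decision={"credit": "denied"},
)
print(f"extended lift = {elift:.2f}")
```

In this toy data the protected group is denied credit in 3 of 4 cases, against 5 of 10 cases for the whole context, so the ratio exceeds 1. Under the assumed definition, a value well above 1 would flag the context as a candidate for closer inspection; the threshold to use is a legal and domain question that this sketch does not address.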
Supplemental Material
Online appendix to data mining for discrimination discovery on article 9.