research-article

Data mining for discrimination discovery

Published: 28 May 2010

Abstract

In the context of civil rights law, discrimination refers to unfair or unequal treatment of people based on membership in a category or a minority, without regard to individual merit. Discrimination in credit, mortgage, insurance, the labor market, and education has been investigated by researchers in economics and human sciences. With the advent of automatic decision support systems, such as credit scoring systems, the ease of data collection raises several challenges for data analysts in the fight against discrimination. In this article, we introduce the problem of discovering discrimination through data mining in a dataset of historical decision records, whether the decisions were taken by humans or by automatic systems. We formalize the processes of direct and indirect discrimination discovery by modelling protected-by-law groups, and the contexts where discrimination occurs, in a syntax based on classification rules. Classification rules extracted from the dataset allow for unveiling contexts of unlawful discrimination, where the degree of burden over protected-by-law groups is formalized by an extension of the lift measure of a classification rule. In direct discrimination, the extracted rules can be mined directly in search of discriminatory contexts. In indirect discrimination, the mining process needs some background knowledge as a further input, for example, census data, which, combined with the extracted rules, may allow for unveiling contexts of discriminatory decisions. A strategy for combining extracted classification rules with background knowledge is called an inference model. In this article, we propose two inference models and provide automatic procedures for their implementation. An empirical assessment of our results is provided on the German credit dataset and on the PKDD Discovery Challenge 1999 financial dataset.
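The abstract states that the burden on a protected-by-law group is measured by an extension of the lift of a classification rule, but does not spell out the formula. The Python sketch below assumes that, for a rule A, B → C with protected-group items A, context items B, and decision C, the extended lift is conf(A, B → C) / conf(B → C), that is, how much adding the protected group to the context raises the confidence of the decision. The "attribute=value" item encoding, the helper names `count`, `confidence` and `elift`, the threshold `alpha`, and the toy records are hypothetical illustrations, not code or data from the article.

```python
from typing import FrozenSet, Iterable, List

# Assumption: each decision record is a set of "attribute=value" items.
Record = FrozenSet[str]


def count(records: List[Record], items: Iterable[str]) -> int:
    """Number of records that contain every item in `items`."""
    needed = frozenset(items)
    return sum(1 for r in records if needed <= r)


def confidence(records: List[Record], premise: Iterable[str],
               conclusion: Iterable[str]) -> float:
    """conf(premise -> conclusion) = count(premise + conclusion) / count(premise)."""
    premise = frozenset(premise)
    n_premise = count(records, premise)
    if n_premise == 0:
        raise ValueError("premise does not occur in the records")
    return count(records, premise | frozenset(conclusion)) / n_premise


def elift(records: List[Record], protected: Iterable[str],
          context: Iterable[str], decision: Iterable[str]) -> float:
    """Assumed extended lift of the rule (protected, context -> decision):
    conf(protected, context -> decision) / conf(context -> decision)."""
    context, decision = frozenset(context), frozenset(decision)
    base = confidence(records, context, decision)
    if base == 0:
        raise ValueError("decision never occurs in this context")
    return confidence(records, frozenset(protected) | context, decision) / base


# Hypothetical toy records (made up; not from the German credit or PKDD datasets).
records = [
    frozenset({"sex=female", "purpose=car", "credit=deny"}),
    frozenset({"sex=female", "purpose=car", "credit=deny"}),
    frozenset({"sex=female", "purpose=car", "credit=grant"}),
    frozenset({"sex=male", "purpose=car", "credit=grant"}),
    frozenset({"sex=male", "purpose=car", "credit=grant"}),
    frozenset({"sex=male", "purpose=car", "credit=deny"}),
]

alpha = 1.2  # hypothetical threshold an analyst might choose
value = elift(records, ["sex=female"], ["purpose=car"], ["credit=deny"])
print(f"elift = {value:.3f}")  # elift = 1.333
print("flag as potentially discriminatory:", value >= alpha)
```

With these made-up records, 2 of the 3 women asking for a car loan are denied (confidence 2/3), against 3 of the 6 applicants in that context overall (confidence 1/2), so the sketch reports an extended lift of about 1.333. Whether that value signals discrimination depends on the threshold a legal or domain expert would set.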

    • Published in

      ACM Transactions on Knowledge Discovery from Data, Volume 4, Issue 2
      May 2010
      129 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/1754428

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 May 2010
      • Accepted: 1 August 2009
      • Revised: 1 June 2009
      • Received: 1 July 2008

      Qualifiers

      • Research
      • Refereed