DOI: 10.1145/2623330.2623333
research-article

Corporate residence fraud detection

Published: 24 August 2014

ABSTRACT

With the globalisation of the world's economies and ever-evolving financial structures, fraud has become one of the main dissipaters of government wealth and perhaps even a major contributor to the slowing of economies in general. Although corporate residence fraud is known to be a major factor, limited data availability and high sensitivity have left this domain largely untouched by academia. The current Belgian government has pledged to tackle this issue at large through a variety of in-house approaches and collaborations with institutions such as academia, the ultimate goal being a fair and efficient taxation system. This is the first data mining application specifically aimed at finding corporate residence fraud, and we show the predictive value of using both structured and fine-grained invoicing data. We further describe the problems involved in building such a fraud detection system, which are mainly data-related (e.g. data asymmetry, quality, volume, variety and velocity) and deployment-related (e.g. the need for explanations of the predictions made).

Supplemental Material

p1650-sidebyside.mp4 (mp4, 190.1 MB)

Published in

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2014
2028 pages
ISBN: 9781450329569
DOI: 10.1145/2623330

Copyright © 2014 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

KDD '14 paper acceptance rate: 151 of 1,036 submissions (15%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)
