ABSTRACT
With the globalisation of the world's economies and ever-evolving financial structures, fraud has become one of the main drains on government wealth and perhaps even a major contributor to the slowdown of economies in general. Although corporate residence fraud is known to be a major factor, limited data availability and the high sensitivity of the data have left this domain largely untouched by academia. The current Belgian government has pledged to tackle this issue at large through a variety of in-house approaches and collaborations with external partners such as academic institutions, the ultimate goal being a fair and efficient taxation system. This paper presents the first data mining application specifically aimed at detecting corporate residence fraud, and we show the predictive value of using both structured and fine-grained invoicing data. We further describe the problems involved in building such a fraud detection system, which are mainly data-related (e.g. data asymmetry, quality, volume, variety and velocity) and deployment-related (e.g. the need for explanations of the predictions made).
Index Terms
- Corporate residence fraud detection