research-article

Discovering Discontinuity in Big Financial Transaction Data

Published: 08 February 2018

Abstract

Business transactions are typically recorded in the company ledger. The primary purpose of such financial information is to accompany a monthly or quarterly report that executives use to make sound business decisions and set strategies for the next business period. These strategies often result in transitions that change underlying infrastructures and components, including the nomenclature system of the business components. As a result, the transaction stream of an affected component may be replaced by another stream with a different component name, producing a discontinuity in the financial stream of what is in fact the same component. Recently, advances in large-scale data mining technologies have enabled a set of critical applications to use knowledge extracted from vast amounts of existing data that would otherwise have been unused or underutilized. In the financial and services computing domains, recent studies have illustrated that historical financial data can be used to predict future revenues and profits and to optimize costs, among other potential applications. These prediction models rely on the long-term availability of historical data that traces back multiple years. However, the discontinuity of the financial transaction stream associated with a business component limits the learning capability of the prediction models. In this article, we propose a set of machine learning–based algorithms to automatically discover component name replacements, using information available in general ledger databases. The algorithms are designed to scale to massive numbers of data points, especially in large companies. Furthermore, the proposed algorithms are generalizable to other domains whose data are time series and share the same nature as the financial data available in business ledgers. A case study of real-world IBM service delivery data retrieved from four different geographical regions is used to validate the efficacy of the proposed methodology.

        • Published in

          ACM Transactions on Management Information Systems, Volume 9, Issue 1
          March 2018
          89 pages
          ISSN: 2158-656X
          EISSN: 2158-6578
          DOI: 10.1145/3146385

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 8 February 2018
          • Accepted: 1 October 2017
          • Revised: 1 July 2017
          • Received: 1 August 2016


          Qualifiers

          • research-article
          • Research
          • Refereed
