research-article

Discovering Discontinuity in Big Financial Transaction Data

Authors:
Suppawong Tuarob, Mahidol University, Salaya, Nakhon Pathom, Thailand (ORCID: 0000-0002-5201-5699)
Ray Strong, IBM Almaden Research Center, San Jose, CA
Anca Chandra, IBM Almaden Research Center, San Jose, CA
Conrad S. Tucker, Pennsylvania State University, University Park, PA

ACM Transactions on Management Information Systems, Volume 9, Issue 1, Article No. 3, pp. 1–26. https://doi.org/10.1145/3159445

Published: 8 February 2018


Abstract

Business transactions are typically recorded in the company ledger. The primary purpose of this financial information is to accompany monthly or quarterly reports that help executives make sound decisions and plan strategies for the next business period. These strategies often lead to transitions that change the underlying infrastructure and components, including changes to the naming scheme used for business components. As a result, the transaction stream of an affected component may be replaced by another stream under a different component name, breaking the continuity of the financial record for what is effectively the same component. Recent advances in large-scale data mining have enabled critical applications that exploit knowledge extracted from vast amounts of existing data that would otherwise remain unused or underutilized. In the financial and services computing domains, recent studies have shown that historical financial data can be used to predict future revenues and profits and to optimize costs, among other applications. These prediction models rely on historical data that spans multiple years. However, discontinuities in the transaction stream associated with a business component limit what such models can learn. In this article, we propose a set of machine learning–based algorithms that automatically discover component name replacements using information available in general ledger databases. The algorithms are designed to scale to the massive number of data points found in large companies. Furthermore, the proposed algorithms generalize to other domains whose data are time series with the same characteristics as the financial data in business ledgers. A case study using real-world IBM service delivery data from four geographical regions validates the efficacy of the proposed methodology.
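The abstract does not describe the algorithms themselves, so the following is only a minimal Python sketch of one plausible first step, generating candidate name replacements from ledger time series; it is not the authors' method. It assumes each component name maps to a list of monthly amounts aligned on the same periods, with 0 meaning no transactions that month. The function `find_replacement_candidates`, the `Candidate` record, the thresholds `max_gap` and `window`, and the scoring weights are all hypothetical. In a machine learning approach, the extracted features (gap, level ratio, name similarity) would typically be fed to a trained classifier rather than combined with fixed weights as done here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical layout: component name -> monthly amounts aligned on the same
# periods (index 0 = earliest month); 0 means no transactions that month.
Ledger = Dict[str, List[float]]


@dataclass
class Candidate:
    old_name: str
    new_name: str
    gap: int             # empty months between the old stream's end and the new stream's start
    level_ratio: float   # min/max of mean absolute amounts near the boundary (1.0 = same level)
    name_sim: float      # string similarity of the two names, in [0, 1]
    score: float         # hand-weighted stand-in for a trained classifier's output


def first_active(series: List[float]) -> int:
    return next(i for i, v in enumerate(series) if v != 0)


def last_active(series: List[float]) -> int:
    return max(i for i, v in enumerate(series) if v != 0)


def mean_abs_nonzero(values: List[float]) -> float:
    nonzero = [abs(v) for v in values if v != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0


def find_replacement_candidates(ledger: Ledger, max_gap: int = 2,
                                window: int = 3) -> List[Candidate]:
    """Pair streams that stop early with streams that start shortly afterwards."""
    active = {name: s for name, s in ledger.items() if any(v != 0 for v in s)}
    if not active:
        return []
    n_periods = max(len(s) for s in active.values())
    # Streams that go silent before the last period may have been renamed...
    ending = {n: last_active(s) for n, s in active.items() if last_active(s) < n_periods - 1}
    # ...and streams that first appear after the first period may be their replacements.
    starting = {n: first_active(s) for n, s in active.items() if first_active(s) > 0}

    candidates = []
    for old, end in ending.items():
        for new, start in starting.items():
            gap = start - end - 1
            if old == new or not (0 <= gap <= max_gap):
                continue
            # Both windows contain at least one nonzero month, so both levels are > 0.
            old_level = mean_abs_nonzero(ledger[old][max(0, end - window + 1):end + 1])
            new_level = mean_abs_nonzero(ledger[new][start:start + window])
            level_ratio = min(old_level, new_level) / max(old_level, new_level)
            name_sim = SequenceMatcher(None, old.lower(), new.lower()).ratio()
            # Hypothetical weights: similar amounts matter most, then similar names;
            # each empty month between the two streams costs a small penalty.
            score = 0.6 * level_ratio + 0.4 * name_sim - 0.1 * gap
            candidates.append(Candidate(old, new, gap, level_ratio, name_sim, score))
    return sorted(candidates, key=lambda c: c.score, reverse=True)


if __name__ == "__main__":
    ledger = {
        "Storage Services A": [100, 110, 105, 0, 0, 0],
        "Storage Svc A-2": [0, 0, 0, 108, 112, 109],
        "Network Support": [50, 52, 49, 51, 50, 48],
    }
    for c in find_replacement_candidates(ledger):
        print(f"{c.old_name} -> {c.new_name}: score={c.score:.2f}")
```

In this toy ledger only the storage pair is proposed: "Network Support" is active in every period, so it is neither an ending nor a starting stream. Restricting pairs to streams that end and start within a short gap keeps the number of comparisons far below all name pairs, which matters for the large ledgers the abstract targets.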



Recommendations

Discovering Traders' Heterogeneous Behavior in High-Frequency Financial Data
This paper develops a utility-based heterogeneous agent model for empirically investigating intraday traders' behaviors. Two types of agents, which consist of fundamental traders and technical analysts, are considered in the proposed model. They differ ...

The Role of Big Data, Data Science and Data Analytics in Financial Engineering
BDE '19: Proceedings of the 2019 International Conference on Big Data Engineering
Financial engineering is the process of creating innovative solutions for the existing financial problems of a company by using applications of mathematical methods. Financial engineering uses tools and knowledge from the fields of computer science, big ...

Consumers Financial Distress: Prediction and Prescription Using Machine Learning
Dynamics of Information Systems
This paper shows how transactional bank account data can be used to predict and to prevent financial distress in consumers. Machine learning methods were used to understand what are the most significant transactional behaviours that cause ...


Published in
ACM Transactions on Management Information Systems, Volume 9, Issue 1, March 2018, 89 pages
ISSN: 2158-656X
EISSN: 2158-6578
Issue DOI: 10.1145/3146385
Editor: Daniel Zeng, University of Arizona, USA
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 8 February 2018
- Accepted: 1 October 2017
- Revised: 1 July 2017
- Received: 1 August 2016

Author Tags
Services delivery
classification
machine learning
name replacement discovery
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total citations: 7
- Total downloads: 361
- Downloads (last 12 months): 11
- Downloads (last 6 weeks): 1
