Abstract
Business transactions are typically recorded in the company ledger. The primary purpose of this financial information is to support the monthly or quarterly reports that executives use to make sound business decisions and set strategies for the next business period. These strategies often result in transitions that change the underlying infrastructure and components, including the nomenclature system of the business components. As a result, the transaction stream of an affected component is replaced by another stream with a different component name, breaking the continuity of that component's financial stream. Recently, advances in large-scale data mining technologies have enabled a set of critical applications to use knowledge extracted from vast amounts of existing data that would otherwise have been unused or underutilized. In the financial and services computing domains, recent studies have illustrated that historical financial data can be used to predict future revenues and profits and to optimize costs, among other potential applications. These prediction models rely on the long-term availability of historical data that traces back multiple years. However, discontinuity in the financial transaction stream associated with a business component limits the learning capability of the prediction models. In this article, we propose a set of machine learning–based algorithms that automatically discover component name replacements using information available in general ledger databases. The algorithms are designed to scale to massive numbers of data points, especially in large companies. Furthermore, the proposed algorithms generalize to other domains whose data is time series and shares the same nature as the financial data available in business ledgers.
A case study of real-world IBM service delivery data retrieved from four different geographical regions is used to validate the efficacy of the proposed methodology.
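To make the discontinuity problem concrete, the following minimal Python sketch flags candidate name replacements in a toy ledger. It is not the machine learning method proposed in the article. It is a simple heuristic, assumed here only for illustration, that pairs a stream that stops with a stream that starts in the next period at a similar average amount. The function name `candidate_replacements`, the parameters `max_gap` and `max_rel_diff`, the record layout, and the component names are all hypothetical.

```python
from collections import defaultdict


def candidate_replacements(ledger, max_gap=1, max_rel_diff=0.25):
    """Return (old_name, new_name, relative_difference) candidates.

    ledger: iterable of (period, component, amount) records, where period
    is an integer index (an assumed layout, not the article's schema).
    """
    # Aggregate amounts into one per-period stream for each component name.
    streams = defaultdict(dict)
    for period, component, amount in ledger:
        streams[component][period] = streams[component].get(period, 0.0) + amount

    last_period = max(p for stream in streams.values() for p in stream)
    candidates = []
    for old, old_stream in streams.items():
        old_end = max(old_stream)
        if old_end == last_period:
            continue  # stream is still active, so it was not replaced
        old_mean = sum(old_stream.values()) / len(old_stream)
        for new, new_stream in streams.items():
            if new == old:
                continue
            # The new stream must start shortly after the old one ends.
            gap = min(new_stream) - old_end
            if not 1 <= gap <= max_gap:
                continue
            new_mean = sum(new_stream.values()) / len(new_stream)
            denom = max(abs(old_mean), abs(new_mean))
            if denom == 0:
                continue
            # Illustrative similarity test: compare average amounts.
            rel_diff = abs(old_mean - new_mean) / denom
            if rel_diff <= max_rel_diff:
                candidates.append((old, new, rel_diff))
    return sorted(candidates, key=lambda c: c[2])


# Toy data: SVC_A stops after period 3 and SVC_A2 starts in period 4.
toy_ledger = [
    (1, "SVC_A", 100.0), (2, "SVC_A", 110.0), (3, "SVC_A", 105.0),
    (4, "SVC_A2", 108.0), (5, "SVC_A2", 112.0), (6, "SVC_A2", 104.0),
    (1, "SVC_B", 500.0), (2, "SVC_B", 510.0), (3, "SVC_B", 495.0),
    (4, "SVC_B", 505.0), (5, "SVC_B", 500.0), (6, "SVC_B", 490.0),
]
for old, new, diff in candidate_replacements(toy_ledger):
    print(f"{old} -> {new} (relative difference {diff:.3f})")
```

Joining a flagged pair into a single stream would restore the multi-year history that a prediction model needs. A heuristic this simple would likely produce many false candidates on real ledgers, which is the kind of gap a learned model is meant to close.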