The data warehouse toolkit: practical techniques for building dimensional data warehouses
March 1996
Publisher:
  • John Wiley & Sons, Inc.
  • 605 Third Ave. New York, NY
  • United States
ISBN: 978-0-471-15337-5
Published: 15 March 1996
Pages: 388

Recommendations

Lou Agosta

When data warehouse construction is driven by the need to understand customers, products, and key business events, data warehousing fulfills the promise of the client/server initiative: to provide access to data in a timely, flexible, and accurate manner. For those wondering what a data warehouse is, whether they might already be operating one, how to tell if one is needed, and how to build one, Kimball provides the answers.
The fundamental distinction to be made in successful data warehouse construction is that between day-to-day operations and business strategy. The processes making up operations include highly granular transactions stored at the detail level, and online transaction processing (OLTP). This is the stuff of classic order entry, inventory control, and general ledger. Strategic decisions, by contrast, require insight and vision about the performance and objectives of the business, based on facts about the business, delivered fast. Thus, data warehousing consists of modeling the business in terms of basic central facts (units sold or delivered, captured and summarized from operations) in relation to the fundamental dimensions that constitute the business over time. Typically, this results in a multidimensional model: a fact structure surrounded by product, customer (market), and time (history). This is the celebrated “star schema” discussed in the popular trade journals.
Kimball claims his work is consistent with the online analytical processing (OLAP) movement, with one difference. His approach is “open,” employing de facto industry standard relational technology, whereas OLAP is still proprietary and, more important, not robust enough to scale to the enterprise level. Chapters 2 through 9 take the reader through a series of progressively more abstract examples of dimensional data modeling: in the grocery store, the warehouse, shipping, banking services, cable TV subscription services, and casualty insurance.
A word of caution: if you are interested in a specific industry, you should not jump immediately to that chapter, because each chapter builds on groundwork laid earlier. For computing professionals, the wealth of functional business distinctions, especially in connection with the relational database model, is instructive. There are useful suggestions on how the relational model can be extended, as well as on the steps needed in application code until such extensions occur. For example, measures that record a static level, such as inventory and financial account balances, are not additive across time. Balances cannot simply be added across periods; they must be averaged by time period. Since the SQL AVG function divides by the number of rows returned, not by the number of relevant time periods (periodavg), the period average must be calculated in an application or by means of a proprietary SQL extension.
The size of the performance challenge of data warehousing can be appreciated by considering product, customer, and time dimensions with an average of 10,000 distinct values each. Without sparsity (that is, if every combination occurred), the result would be on the order of 1 trillion rows (10,000 × 10,000 × 10,000). Naturally, the problem is made worse for phone companies and banks, which have millions of customers. (See chapter 6, “The Big Dimensions.”) Kimball claims that the limits of current relational technology, circa 1995, are reached at about 1 billion rows, or about 100 gigabytes.
The answer is considered in chapter 13, on aggregation. Since an endless horizon of business days tends to cause combinatorial explosion of the facts at an elementary level, it is useful to define aggregations (summations) that group 20 or more facts together: combine and store the data on a weekly or monthly basis, rather than daily. The tradeoff is more work up front, transforming data in long-running batch processes prior to loading, in exchange for quicker online response to queries submitted interactively.
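
The point about balances not being additive across time can be made concrete with a minimal sketch. It assumes SQLite, driven from Python's standard sqlite3 module, as a stand-in database; the inventory_fact table, its columns, and the sample figures are hypothetical and do not come from the book or its CD-ROM. The query contrasts the plain AVG, which divides by the number of rows, with a sum divided by the number of distinct days, which is one way to express the period average described above.

```python
import sqlite3

# Hypothetical daily inventory snapshot: one row per (day, product).
SNAPSHOT_ROWS = [
    ("1996-03-01", "apples", 100),
    ("1996-03-01", "pears", 50),
    ("1996-03-02", "apples", 80),
    ("1996-03-02", "pears", 70),
    ("1996-03-03", "apples", 60),
    ("1996-03-03", "pears", 90),
]


def inventory_averages(snapshot_rows):
    """Return (row-based average, period-based average) of stock on hand."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE inventory_fact ("
            "day TEXT, product TEXT, quantity_on_hand INTEGER)"
        )
        conn.executemany(
            "INSERT INTO inventory_fact VALUES (?, ?, ?)", snapshot_rows
        )
        # AVG divides the total by the number of rows (days x products).
        # The period average divides the same total by the number of days.
        return conn.execute(
            "SELECT AVG(quantity_on_hand), "
            "       SUM(quantity_on_hand) * 1.0 / COUNT(DISTINCT day) "
            "FROM inventory_fact"
        ).fetchone()
    finally:
        conn.close()


row_avg, period_avg = inventory_averages(SNAPSHOT_ROWS)
print(f"AVG over rows:    {row_avg}")
print(f"average per day:  {period_avg}")
```

With these six sample rows the row-based average is 75.0, while the average daily stock across both products is 150.0, so the two calculations answer different questions.
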
The book contains a wealth of practical advice for information technology practitioners. For example, when the relative size of the central fact table is compared with that of the surrounding dimension structures (differences of orders of magnitude are common), it is clear that little disk space can be saved by normalizing the latter.
The book is superbly prepared. It includes a complete glossary of terms, appendices, an index (though no bibliography), and a useful summary of the design principles of a dimensional data warehouse. It comes with a CD-ROM, which contains an Access version of the databases described in the book and sample queries against the databases.
The continuity with the disciplines of data modeling, data administration, and data mining is useful and productive. Much of what occurs in decomposing data into relational structure by means of Codd's normal forms is relevant here, but with a new spin. The structure of the data gives us insight into the kinds of questions that might be asked. Thus, the prospect of packaging a large but finite set of SQL queries can be envisioned. Kimball has taken the hype out of data warehousing and has shown its importance to business practices as an application of relational technology.

Alfs T. Berztiss

This excellent book describes established data warehouse technology: dependable data warehouses that can be built now. Far too often, writers try to combine coverage of established practice with the most recent research findings, which can leave readers confused. Although this author has outstanding research credentials, he has resisted the temptation to satisfy different reader interests at once. This book is not scholarly, so there are no references. There is also no mention of such “hot” topics as data mining or object-oriented databases. That is how it should be. The book is of primary interest to practitioners who want to know how to set up a data warehouse right now, but researchers will find in the book a good indication of the real needs of the business community.
Data processing has two aspects: online processing of incoming data in a transaction database, and the analysis of historical data in a data warehouse in support of management decision making. The thesis of this book is that the two kinds of activities are quite distinct, that the transaction database and the data warehouse should be separate, and that different software systems are needed to manage them.
The author expands on this main theme by examining several very large data warehouses he has himself designed for client companies (10 of 17 chapters). He then looks at the administrative side (5 chapters). In the final chapter, he takes a brief look into the future, but this is restricted to technological advances. Throughout the text, new design principles are introduced, which are then brought together in an appendix. Another appendix holds a detailed outline of the process for implementing a data warehouse. A CD-ROM supports the text. It contains sample data relating to the case studies, and software for managing the warehouse.
