Physical database design decision algorithms and concurrent reorganization for parallel database systems
Publisher:
  • University of Toronto
  • Computer Center Toronto, Ont. M5S 1A1
  • Canada
ISBN: 978-0-612-35386-2
Order Number: AAINQ35386
Pages: 277
Abstract

Stringent performance requirements in DB applications have led to the use of parallelism for database processing. To allow the database system to take advantage of the performance of parallel shared-nothing systems, the physical DB design must be appropriate for the DB structure and the workload.

We develop decision algorithms that select a good physical DB design both when the DB is first loaded into the system (static decision) and while the DB is being used by the workload (dynamic decision). Our decision algorithms take the database structure, workload, and system characteristics as inputs. The static (or initial) physical DB design decision algorithm involves: (1) selecting a partitioning attribute for each relation that determines how the relation is fragmented across the nodes (allowing for high I/O bandwidth); (2) selecting indexes on the relation attributes to allow faster accesses compared to sequential file scans; (3) selecting the attributes by which to cluster a relation in order to take advantage of the prefetching and caching involved in I/O access; (4) grouping of relations to allow DB operations (joins) on relation pairs to be executed locally at each node, thus reducing communication costs; (5) selecting the number of partitions per relation group (and thus per relation); and (6) assigning each partition of each relation to a specific system node. Our studies show that, among the algorithms we studied, an algorithm based on a branch-and-bound strategy finds designs with the lowest estimated workload average response time and requires an acceptable amount of computation.
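As an illustration only, the following minimal sketch shows how a branch-and-bound search over one of these decisions (the partitioning attribute per relation) might look. It is not the thesis's algorithm: the cost model, the function names `estimated_cost` and `best_design`, and the sample relations are all hypothetical. The sketch assumes each relation has a per-attribute scan cost and that a join incurs a communication penalty unless both relations are partitioned on their join attributes.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical inputs (assumptions, not from the thesis):
#   scan_cost[relation][attribute] -> estimated cost if partitioned on attribute
#   joins: (rel_a, attr_a, rel_b, attr_b, penalty); the penalty applies unless
#          rel_a is partitioned on attr_a AND rel_b on attr_b (join not local).
ScanCost = Dict[str, Dict[str, float]]
Join = Tuple[str, str, str, str, float]


def estimated_cost(design: Dict[str, str], scan_cost: ScanCost,
                   joins: List[Join]) -> float:
    """Toy cost of a (possibly partial) design: scan costs plus join penalties."""
    cost = sum(scan_cost[rel][attr] for rel, attr in design.items())
    for rel_a, attr_a, rel_b, attr_b, penalty in joins:
        both_assigned = rel_a in design and rel_b in design
        if both_assigned and not (design[rel_a] == attr_a
                                  and design[rel_b] == attr_b):
            cost += penalty
    return cost


def best_design(scan_cost: ScanCost,
                joins: List[Join]) -> Tuple[Optional[Dict[str, str]], float]:
    """Branch on one relation at a time; prune with an optimistic lower bound."""
    relations = list(scan_cost)
    # Cheapest scan cost per relation, used to bound unassigned relations.
    cheapest = {rel: min(scan_cost[rel].values()) for rel in relations}
    best = {"cost": float("inf"), "design": None}

    def search(i: int, design: Dict[str, str]) -> None:
        partial = estimated_cost(design, scan_cost, joins)
        # Lower bound: penalties are non-negative, so no completion is cheaper.
        bound = partial + sum(cheapest[rel] for rel in relations[i:])
        if bound >= best["cost"]:
            return  # prune: this branch cannot beat the best design so far
        if i == len(relations):
            best["cost"], best["design"] = partial, dict(design)
            return
        rel = relations[i]
        # Try cheaper attributes first so good designs are found early.
        for attr in sorted(scan_cost[rel], key=scan_cost[rel].get):
            design[rel] = attr
            search(i + 1, design)
            del design[rel]

    search(0, {})
    return best["design"], best["cost"]


if __name__ == "__main__":
    costs = {
        "orders": {"cust_id": 10.0, "order_id": 6.0},
        "customers": {"cust_id": 5.0, "region": 4.0},
    }
    join_list = [("orders", "cust_id", "customers", "cust_id", 20.0)]
    print(best_design(costs, join_list))
```

In this toy instance the locally cheapest attributes (`order_id`, `region`) are not the best overall choice, because partitioning both relations on `cust_id` avoids the join penalty. The bound lets the search discard partial designs whose optimistic cost already exceeds the best complete design found.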

The physical DB design reorganization problem, which is the dynamic DB design decision, involves determining how to change the current physical DB design based on new DB structure, workload, and system information. If a new physical DB design is chosen, a strategy to move from the old to the new design must also be identified by the algorithm. A formula is developed to calculate both the benefit resulting after a reorganization is complete and the cost of performing the reorganization. The value from this formula for a specific reorganization is called the (net) gain metric, and this metric is used by a decision algorithm to compare reorganizations and to select the reorganization for which the benefit most exceeds its cost. We also develop a method to estimate the costs of executing a reorganization concurrently with a workload, and we provide some decision algorithms. The selection of the priority level at which to run the reorganization processes concurrently with the workload is investigated. Our studies indicate that a low priority for the reorganization process compared to the priorities for the workload processes is often but not always best.
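The selection rule described above can be sketched as follows. The abstract does not give the formula itself, so this sketch assumes the simplest reading: net gain is estimated benefit minus estimated cost, both in the same units. The `Reorganization` class, the `select_reorganization` function, and the choice to keep the current design when no candidate has a positive net gain are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Reorganization:
    """A candidate move from the current design to a new one (hypothetical type)."""
    name: str
    benefit: float  # assumed: estimated workload time saved once complete
    cost: float     # assumed: estimated cost of performing the reorganization

    @property
    def net_gain(self) -> float:
        # Assumed form of the metric: benefit minus cost.
        return self.benefit - self.cost


def select_reorganization(
    candidates: Iterable[Reorganization],
) -> Optional[Reorganization]:
    """Return the candidate whose benefit most exceeds its cost.

    Returns None (keep the current design) if there are no candidates or
    if no candidate has a positive net gain; this rule is an assumption.
    """
    best = max(candidates, key=lambda r: r.net_gain, default=None)
    if best is None or best.net_gain <= 0:
        return None
    return best


if __name__ == "__main__":
    options = [
        Reorganization("repartition_orders", benefit=120.0, cost=90.0),
        Reorganization("add_index_customers", benefit=60.0, cost=10.0),
        Reorganization("move_partition_to_node_3", benefit=15.0, cost=40.0),
    ]
    chosen = select_reorganization(options)
    print(chosen.name if chosen else "keep current design")
```

Note that the candidate with the largest benefit is not necessarily selected: in the usage example, the index addition wins because its benefit exceeds its cost by the widest margin.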

Contributors
  • University of Toronto
  • International Business Machines
