skip to main content
10.1145/3035918.3064052acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Query Centric Partitioning and Allocation for Partially Replicated Database Systems

Published:09 May 2017Publication History

ABSTRACT

A key feature of database systems is to provide transparent access to stored data. In distributed database systems, this includes data allocation and fragmentation. Transparent access introduces data dependencies and increases system complexity and inter-process communication. Therefore, many developers are exchanging transparency for better scalability using sharding and similar techniques. However, explicitly managing data distribution and data flow requires a deep understanding of the distributed system and the data access, and it reduces the possibilities for optimizations.

To address this problem, we present an approach for efficient data allocation that features good scalability while keeping the data distribution transparent. We propose a workload-aware, query-centric, heterogeneity-aware analytical model. We formalize our approach and present an efficient allocation algorithm. The algorithm optimizes the partitioning and data layout for local query execution and balances the workload on homogeneous and heterogeneous systems according to the query history. In the evaluation, we demonstrate that our approach scales well in performance for OLTP- and OLAP-style workloads and reduces storage requirements significantly over replicated systems while guaranteeing configurable availability.

References

  1. R. Agrawal, S. Chaudhuri, A. Das, and V. R. Narasayya. Automating Layout of Relational Databases. In ICDE, pages 607--618, 2003.Google ScholarGoogle Scholar
  2. I. Ahmad, K. Karlapalem, Y.-K. Kwok, and S.-K. So. Evolutionary Algorithms for Allocating Data in Distributed Database Systems. Distributed and Parallel Databases, 11(1):5--32, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. Alba, V. Bhagwan, M. Ching, A. Cozzi, R. Desai, D. Gruhl, K. Haas, L. Kato, J. Kusnitz, B. Langston, F. Nagy, L. Nguyen, J. Pieper, S. Srinivasan, A. Stuart, and R. Tang. A Funny Thing Happened on the Way to a Billion... IEEE Data Engineering Bulletin, 31(4):27--36, 2006.Google ScholarGoogle Scholar
  4. P. A. Alsberg and J. D. Day. A Principle for Resilient Sharing of Distributed Resources. In ICSE, pages 562--570, 1976. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. G. M. Amdahl. Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities. In AFIPS, pages 483--485, 1967. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. R. R. Amossen. Vertical Partitioning of Relational OLTP Databases Using Integer Programming. In ICDEW, pages 93--98, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  7. C. Amza, A. L. Cox, and W. Zwaenepoel. Distributed Versioning: Consistent Replication for Scaling Back-End Databases of Dynamic Content Web Sites. In Middleware 2003, pages 282--304. Springer Berlin Heidelberg, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. P. M. G. Apers. Data Allocation in Distributed Database Systems. ACM Transactions on Database Systems, 13(3):263--304, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. L. A. Barroso and U. Hölzle. The Case for Engergy-Proportional Computing. IEEE Computer, 40(12):33--37, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. H.-G. Beyer. Theory of Evolution Strategies. Springer Berlin / Heidelberg, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  11. B. Bhattacharjee, M. Canim, C. A. Lang, G. A. Mihaila, and K. A. Ross. Storage Class Memory Aware Data Management. IEEE Data Engineering Bulletin, 33(4):35--40, 2010.Google ScholarGoogle Scholar
  12. A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. Burkard, M. Dell'Amico, and S. Martello. Assignment Problems. Society for Industrial and Applied Mathematics, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. B. Calder, C. Krintz, S. John, and T. Austin. Cache-Conscious Data Placement. SIGOPS Operation System Review, 32(5):139--149, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Canim, B. Bhattacharjee, G. A. Mihaila, C. A. Lang, and K. A. Ross. An Object Placement Advisor for DB2 Using Solid State Storage. PVLDB, 2(2):1318--1329, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. E. Cecchet. RAIDb: Redundant Array of Inexpensive Databases. In ISPA, pages 115--125, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. E. Cecchet, J. Marguerite, and W. Zwaenepoel. C-JDBC: Flexible Database Clustering Middleware. In USENIX, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. A. Ceri, M. Negri, and G. Pelagatti. Horizontal Data Partitioning in Database Design. In SIGMOD, pages 128--136, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. E. G. Coffman, M. R. Garey, and D. S. Johnson. Approximation Algorithms for Bin Packing: A Survey. In D. S. Hochbaum, editor, Approximation Algorithms for NP-Hard Problems, chapter 2. PWS Publishing Company, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. G. Copeland, W. Alexander, E. Boughter, and T. Keller. Data Placement in Bubba. SIGMOD Record, 17(3):99--108, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C. Curino, E. Jones, Y. Zhang, and S. Madden. Schism: A Workload-Driven Approach to Database Replication and Partitioning. PVLDB, 3(1--2):48--57, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. CACM, 51(1):107--113, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. J. DeWitt, R. H. Gerber, G. Graefe, M. L. Heytens, K. B. Kumar, and M. Muralikrishna. GAMMA - A High Performance Dataflow Database Machine. In VLDB, pages 228--237, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. Elnikety, S. Dropsho, and F. Pedone. Tashkent: Uniting Durability with Transaction Ordering for High-Performance Scalable Database Replication. In SIGOPS/EuroSys, pages 117--130, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. Elnikety, S. Dropsho, and W. Zwaenepoel. Tashkent: Memory-Aware Load Balancing and Update Filtering in Replicated Databases. In EuroSys, pages 399--412, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. Englert, J. Gray, T. Kocher, and P. Shah. A Benchmark of NonStop SQL Release 2 Demonstrating Near-Linear Speedup and Scaleup on Large Databases. Technical Report 89.4, Tandem Computers Inc., 1989.Google ScholarGoogle Scholar
  27. P. Ganesan, M. Bawa, and H. Garcia-Molina. Online Balancing of Range-Partitioned Data with Applications to Peer-to-Peer Systems. In VLDB, pages 444--455. VLDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. Gray. Why Do Computers Stop and What Can be Done About It? Technical Report 85.7, Tandem Computers, 1985.Google ScholarGoogle Scholar
  29. J. O. Hauglid, N. H. Ryeng, and K. Nørvåg. DYFRAM: Dynamic Fragmentation and Replica Management in Distributed Database Systems. Distributed and Parallel Databases, 28(2):157--185, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. A. A. Helal, A. A. Heddaya, and B. B. Bhargava. Replication Techniques in Distributed Systems. Springer, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J. M. Hellerstein, M. Stonebraker, and J. Hamilton. Architecture of a Database System. Foundations and Trends in Databases, 1(2):141--259, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. B. Kemme and G. Alonso. Don't Be Lazy, Be Consistent: Postgres-R, A New Way to Implement Database Replication. In VLDB, pages 134--143, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. H. W. Kuhn. The Hungarian Method for the Assignment Problem. Naval Research Logistic, 52(1):7--21, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  34. R. Ladin, B. Liskov, L. Shrira, and S. Ghemawat. Providing High Availability Using Lazy Replication. ACM Transactions on Computer Systems, 10:360--391, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. SIGOPS Operating Systems Review, 44(2):35--40, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. J. Leverich and C. Kozyrakis. On the Energy (In)efficiency of Hadoop Clusters. In HotPower, New York, NY, USA, 2009. ACM.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. J. Li, J. Naughton, and R. V. Nehme. Resource Bricolage for Parallel Database Systems. PVLDB, 8:25--36, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. J. M. Milan-Franco, R. Jiménez-Peris, M. P. no Martínez, and B. Kemme. Adaptive Middleware for Data Replication. In Middleware, pages 175--194, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. P. Moscato. On Evolution, Search, Optimization, Genetic Algorithms and Martial Arts: Towards Memetic Algorithms. Technical Report Caltech Concurrent Computation Program 158--79, California Institute of Technology, Pasadena, CA, USA, 1989.Google ScholarGoogle Scholar
  40. J. Munkres. Algorithms for the Assignment and Transportation Problems. Journal of the Society of Industrial and Applied Mathematics, 5(1):32--38, 1957.Google ScholarGoogle ScholarCross RefCross Ref
  41. O. Ozmen, K. Salem, J. Schindler, and S. Daniel. Workload-Aware Storage Layout for Database Systems. In SIGMOD, pages 939--950, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. T. Özsu and P. Valduriez. Principles of Distributed Database Systems, Third Edition. Springer New York, Upper Saddle River, NJ, USA, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. S. Papadomanolakis, D. Dash, and A. Ailamaki. Efficient Use of the Query Optimizer for Automated Physical Design. In VLDB, pages 1093--1104. VLDB Endowment, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. M. Patino-Martínez, R. Jimínez-Peris, B. Kemme, and G. Alonso. MIDDLE-R: Consistent Database Replication at the Middleware Level. ACM Transactions on Computer Systems, 23(4):375--423, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. A. Pavlo, C. Curino, and S. Zdonik. Skew-aware Automatic Database Partitioning in Shared-nothing, Parallel OLTP Systems. In SIGMOD, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. C. Plattner, G. Alonso, and M. T. Özsu. Extending DBMSs with Satellite Databases. VLDBJ, 17(4):657--682, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. A. Quamar, K. A. Kumar, and A. Deshpande. SWORD: Scalable Workload-aware Data Placement for Transactional Workloads. In EDBT, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. T. Rabl. Efficiency in Cluster Database Systems. PhD thesis, University of Passau, 2011.Google ScholarGoogle Scholar
  49. P. Rubel, M. Gillen, J. Loyall, R. Schantz, A. Gokhale, J. Balasubramanian, A. Paulos, and P. Narasimhan. Fault Tolerant Approaches for Distributed Real-time and Embedded Systems. In MILCOM, pages 1--8. IEEE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  50. C. Rusu, A. Ferreira, C. Scordino, and A. Watson. Energy-Efficient Real-Time Heterogeneous Server Clusters. In RTAS, pages 418--428, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. D. Sacca and G. Wiederhold. Database Partitioning in a Cluster of Processors. ACM Transactions on Database Systems, 10(1):29--56, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. P. Scheuermann, G. Weikum, and P. Zabback. Data Partitioning and Load Balancing in Parallel Disk Systems. VLDBJ, 7(1):48--66, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. H.-P. Schwefel and G. Rudolph. Contemporary Evolution Strategies. In Proceedings of the Third European Conference on Advances in Artificial Life, volume 929 of Lecture Notes in Computer Science, pages 893--907, Berlin, Germany, 1995. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. S. Shankland. Google spotlights data center inner workings, May 2008. https://www.cnet.com/news/google-spotlights-data-center-inner -workings/.Google ScholarGoogle Scholar
  55. M. Stonebraker, D. Abadi, D. J. DeWitt, S. Madden, E. Paulson, A. Pavlo, and A. Rasin. MapReduce and Parallel DBMSs: Friends or Foes? CACM, 53(1):64--71, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. B. Zdonik. C-Store: A Column-oriented DBMS. In VLDB, pages 553--564. VLDB Endowment, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. M. Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell, C. Staelin, and A. Yu. Mariposa: A Wide-Area Distributed Database System. VLDBJ, 5(1):48--63, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. A. Syropoulos. Mathematics of Multisets. In WMP, volume 2235 of Lecture Notes in Computer Science, pages 347--358, Berlin, Germany, 2001. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. S. Voß. Meta-heuristics: The State of the Art. In ECAI, pages 1--23, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. E. Zamanian, C. Binnig, and A. Salama. Locality-aware partitioning in parallel database systems. In SIGMOD, pages 17--30, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Query Centric Partitioning and Allocation for Partially Replicated Database Systems

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
          May 2017
          1810 pages
          ISBN:9781450341974
          DOI:10.1145/3035918

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 9 May 2017

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader