DOI: 10.1109/MICRO.2012.24

Vector Extensions for Decision Support DBMS Acceleration

Published: 01 December 2012

ABSTRACT

Database management systems (DBMS) have become an essential tool for industry and research and are often a significant component of data centres. As a result of this criticality, efficient execution of DBMS engines has become an important area of investigation. This work takes a top-down approach to accelerating decision support systems (DSS) on x86-64 microprocessors using vector ISA extensions. In the first step, a leading DSS DBMS is analysed for potential data-level parallelism. We discuss why the existing multimedia SIMD extensions (SSE/AVX) are not suitable for capturing this parallelism and propose a complementary instruction set reminiscent of classical vector architectures. The instruction set is implemented using unintrusive modifications to a modern x86-64 microarchitecture tailored for DSS DBMS. The ISA and microarchitecture are evaluated using a cycle-accurate x86-64 microarchitectural simulator coupled with a highly detailed memory simulator. We have found that a single operator is responsible for 41% of total execution time for the TPC-H DSS benchmark. Our results show performance speedups between 1.94x and 4.56x for an implementation of this operator run with our proposed hardware modifications.
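
The reported speedups apply to the single operator, not to the whole benchmark. As a rough back-of-the-envelope reading, and not a result from the paper, Amdahl's law gives a ceiling on whole-benchmark speedup if one assumes the remaining 59% of execution time is unaffected by the hardware changes. The function name and the choice to treat the 41% share as a fixed fraction are assumptions made for this sketch.

```python
def amdahl_speedup(accelerated_fraction: float, component_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated.

    Assumption: the non-accelerated part of the runtime is unchanged.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if component_speedup <= 0.0:
        raise ValueError("component_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / component_speedup)


# Figures taken from the abstract: operator share and operator speedup range.
OPERATOR_SHARE = 0.41
OPERATOR_SPEEDUPS = (1.94, 4.56)

if __name__ == "__main__":
    for s in OPERATOR_SPEEDUPS:
        overall = amdahl_speedup(OPERATOR_SHARE, s)
        print(f"operator speedup {s:.2f}x -> estimated overall {overall:.2f}x")
    # Limit if the operator cost vanished entirely: 1 / (1 - 0.41)
    print(f"upper bound: {1.0 / (1.0 - OPERATOR_SHARE):.2f}x")
```

Under these assumptions the estimate works out to roughly 1.25x to 1.47x overall, with a ceiling near 1.69x however fast the operator becomes. The actual end-to-end effect depends on details the abstract does not give.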

