ABSTRACT
Database management systems (DBMS) have become an essential tool for industry and research and are often a significant component of data centres. Because they are so critical, the efficient execution of DBMS engines has become an important area of investigation. This work takes a top-down approach to accelerating decision support systems (DSS) on x86-64 microprocessors using vector ISA extensions. First, a leading DSS DBMS is analysed for potential data-level parallelism. We discuss why the existing multimedia SIMD extensions (SSE/AVX) are not suitable for capturing this parallelism and propose a complementary instruction set reminiscent of classical vector architectures. The instruction set is implemented using unintrusive modifications to a modern x86-64 microarchitecture tailored for DSS DBMS. The ISA and microarchitecture are evaluated using a cycle-accurate x86-64 microarchitectural simulator coupled with a highly detailed memory simulator. We find that a single operator is responsible for 41% of total execution time for the TPC-H DSS benchmark. Our results show speedups between 1.94x and 4.56x for an implementation of this operator run with our proposed hardware modifications.
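To put the operator-level figures in context, the following is a rough back-of-the-envelope sketch using Amdahl's law. It assumes (this is not stated in the abstract) that the reported operator speedups would carry over unchanged to the full TPC-H run and that the remaining 59% of execution time is unaffected. The function name `overall_speedup` is hypothetical and chosen only for illustration.

```python
def overall_speedup(fraction: float, operator_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only `fraction` of the
    execution time is accelerated by a factor of `operator_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if operator_speedup <= 0.0:
        raise ValueError("operator_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / operator_speedup)


# Figures taken from the abstract: one operator accounts for 41% of
# TPC-H execution time and is sped up by between 1.94x and 4.56x.
# ASSUMPTION: the rest of the workload runs at its original speed.
OPERATOR_FRACTION = 0.41

if __name__ == "__main__":
    for s in (1.94, 4.56):
        estimate = overall_speedup(OPERATOR_FRACTION, s)
        print(f"operator speedup {s:.2f}x -> whole-benchmark estimate {estimate:.2f}x")
    # Limit as the operator's cost approaches zero.
    print(f"upper bound: {1.0 / (1.0 - OPERATOR_FRACTION):.2f}x")
```

Under these assumptions the whole-benchmark estimate falls between roughly 1.25x and 1.47x, with a ceiling of about 1.69x however fast the operator becomes. These are illustrative estimates only, not results reported by the authors.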
Index Terms
- Vector Extensions for Decision Support DBMS Acceleration