Vector Extensions for Decision Support DBMS Acceleration

MICRO-45: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, December 2012, Pages 166–176. https://doi.org/10.1109/MICRO.2012.24

Published: 01 December 2012

ABSTRACT

Database management systems (DBMS) have become an essential tool for industry and research and are often a significant component of data centres. Because of this, efficient execution of DBMS engines has become an important area of investigation. This work takes a top-down approach to accelerating decision support systems (DSS) on x86-64 microprocessors using vector ISA extensions. In the first step, a leading DSS DBMS is analysed for potential data-level parallelism. We discuss why the existing multimedia SIMD extensions (SSE/AVX) are not suitable for capturing this parallelism and propose a complementary instruction set reminiscent of classical vector architectures. The instruction set is implemented using unintrusive modifications to a modern x86-64 microarchitecture tailored for DSS DBMS. The ISA and microarchitecture are evaluated using a cycle-accurate x86-64 microarchitectural simulator coupled with a highly detailed memory simulator. We find that a single operator is responsible for 41% of total execution time for the TPC-H DSS benchmark. Our results show performance speedups between 1.94x and 4.56x for an implementation of this operator run with our proposed hardware modifications.
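
To put the two figures in the abstract side by side, the following is a minimal back-of-envelope sketch using Amdahl's law. It assumes, purely for illustration, that the reported operator speedups would carry over unchanged to the 41% share of TPC-H execution time. The abstract does not claim this: its speedups are reported for an implementation of the operator itself. The function name `amdahl_speedup` and the constant `OPERATOR_FRACTION` are hypothetical names introduced here.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of the runtime is accelerated by `local_speedup`.

    Amdahl's law: S = 1 / ((1 - f) + f / s).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Share of total TPC-H execution time attributed to one operator (from the abstract).
OPERATOR_FRACTION = 0.41

# Operator speedup range reported in the abstract. Applying it to whole-benchmark
# time is an assumption of this sketch, not a result from the paper.
for operator_speedup in (1.94, 4.56):
    overall = amdahl_speedup(OPERATOR_FRACTION, operator_speedup)
    print(f"operator {operator_speedup:.2f}x -> overall {overall:.2f}x")

# Ceiling if the operator cost nothing at all: 1 / (1 - 0.41).
print(f"upper bound: {1.0 / (1.0 - OPERATOR_FRACTION):.2f}x")
```

Under that assumption the overall gain would be roughly 1.25x to 1.47x, with a ceiling near 1.69x no matter how fast the operator becomes. These are illustrative estimates, not measurements.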

Published in

MICRO-45: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
December 2012, 487 pages
ISBN: 9780769549248
Publisher: IEEE Computer Society, United States

Author Tags

database, dbms, decision, dlp, microarchitecture, parallelism, simd, support, vector