Vector microprocessors
Publisher:
  • University of California, Berkeley
ISBN: 978-0-591-99087-4
Order Number: AAI9901978
Pages: 278
Abstract

Most previous research into vector architectures has concentrated on supercomputing applications and small enhancements to existing vector supercomputer implementations. This thesis expands the body of vector research by examining designs appropriate for single-chip full-custom vector microprocessor implementations targeting a much broader range of applications.

I present the design, implementation, and evaluation of T0 (Torrent-0): the first single-chip vector microprocessor. T0 is a compact but highly parallel processor that can sustain over 24 operations per cycle while issuing only a single 32-bit instruction per cycle. T0 demonstrates that vector architectures are well suited to full-custom VLSI implementation and that they perform well on many multimedia and human-machine interface tasks.

The remainder of the thesis contains proposals for future vector microprocessor designs. I show that the most area-efficient vector register file designs have several banks with several ports, rather than many banks with few ports as used by traditional vector supercomputers, or one bank with many ports as used by superscalar microprocessors. To extend the range of vector processing, I propose a vector flag processing model which enables speculative vectorization of "while" loops. To improve the performance of inexpensive vector memory systems, I introduce virtual processor caches, a new form of primary vector cache which can convert some forms of strided and indexed vector accesses into unit-stride bursts.
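As a rough illustration of what speculative vectorization of a "while" loop involves, the sketch below processes a data-dependent-exit loop in fixed-size chunks: each chunk is loaded and computed speculatively, a flag vector records where the exit condition fires, and only elements before the first set flag are committed. This is a minimal sketch under stated assumptions, not the flag processing model proposed in the thesis or T0 code; `VLEN`, the function names, and the doubling operation are all hypothetical and chosen only for illustration.

```python
VLEN = 8  # hypothetical vector length (elements per vector operation)


def scalar_double_until_zero(src):
    """Reference "while" loop: double elements until a zero is seen."""
    out = []
    i = 0
    while i < len(src) and src[i] != 0:
        out.append(src[i] * 2)
        i += 1
    return out


def vector_double_until_zero(src, vlen=VLEN):
    """Chunked version that mimics speculative, flag-masked execution."""
    out = []
    for base in range(0, len(src), vlen):
        chunk = src[base:base + vlen]           # speculative load of a whole chunk
        exit_flags = [x == 0 for x in chunk]    # flag vector: exit condition per element
        # Index of the first set flag, or the chunk length if no flag is set.
        first = exit_flags.index(True) if True in exit_flags else len(chunk)
        mask = [j < first for j in range(len(chunk))]  # elements the scalar loop would reach
        doubled = [x * 2 for x in chunk]        # computed for every element, speculatively
        out.extend(v for v, m in zip(doubled, mask) if m)  # commit only under the mask
        if first < len(chunk):
            break                               # exit condition fired inside this chunk
    return out
```

The chunked version computes results for elements past the loop exit and then discards them under the mask, which is the essence of the speculation; on real hardware the speculative loads would also need a way to suppress or defer exceptions, which this sketch does not model.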

Cited By

  1. Şuşu A Compiling Efficiently with Arithmetic Emulation for the Custom-Width Connex Vector Processor Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing, (1-8)
  2. Stanic M, Palomar O, Hayes T, Ratkovic I, Cristal A, Unsal O and Valero M (2017). An Integrated Vector-Scalar Design on an In-Order ARM Core, ACM Transactions on Architecture and Code Optimization, 14:2, (1-26), Online publication date: 21-Jul-2017.
  3. Stanic M, Palomar O, Hayes T, Ratkovic I, Unsal O, Cristal A and Valero M POSTER Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, (447-448)
  4. Ratković I, Palomar O, Stanić M, Unsal O, Cristal A and Valero M A Fully Parameterizable Low Power Design of Vector Fused Multiply-Add Using Active Clock-Gating Techniques Proceedings of the 2016 International Symposium on Low Power Electronics and Design, (362-367)
  5. Stanic M, Palomar O, Hayes T, Ratkovic I, Unsal O and Cristal A Towards low-power embedded vector processor Proceedings of the ACM International Conference on Computing Frontiers, (339-342)
  6. Stanic M, Palomar O, Ratkovic I, Duric M, Unsal O and Cristal A VALib and SimpleVector Proceedings of the 11th ACM Conference on Computing Frontiers, (1-10)
  7. Duric M, Palomar O, Smith A, Unsal O, Cristal A, Valero M and Burger D EVX Proceedings of the conference on Design, Automation & Test in Europe, (1-4)
  8. Lee Y, Avizienis R, Bishara A, Xia R, Lockhart D, Batten C and Asanović K (2013). Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators, ACM Transactions on Computer Systems, 31:3, (1-38), Online publication date: 1-Aug-2013.
  9. Vaidya A, Shayesteh A, Woo D, Saharoy R and Azimi M (2013). SIMD divergence optimization through intra-warp compaction, ACM SIGARCH Computer Architecture News, 41:3, (368-379), Online publication date: 26-Jun-2013.
  10. Vaidya A, Shayesteh A, Woo D, Saharoy R and Azimi M SIMD divergence optimization through intra-warp compaction Proceedings of the 40th Annual International Symposium on Computer Architecture, (368-379)
  11. Soliman M (2013). Design, implementation, and evaluation of a low-complexity vector-core for executing scalar/vector instructions, Journal of Parallel and Distributed Computing, 73:6, (836-850), Online publication date: 1-Jun-2013.
  12. Hayes T, Palomar O, Unsal O, Cristal A and Valero M Vector Extensions for Decision Support DBMS Acceleration Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, (166-176)
  13. Takano S (2012). Design and analysis of adaptive processor, ACM Transactions on Reconfigurable Technology and Systems, 5:1, (1-34), Online publication date: 1-Mar-2012.
  14. Chou C, Severance A, Brant A, Liu Z, Sant S and Lemieux G VEGAS Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays, (15-24)
  15. Yiannacouras P, Steffan J and Rose J Fine-grain performance scaling of soft vector processors Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, (97-106)
  16. Yu J, Eagleston C, Chou C, Perreault M and Lemieux G (2009). Vector Processing as a Soft Processor Accelerator, ACM Transactions on Reconfigurable Technology and Systems, 2:2, (1-34), Online publication date: 1-Jun-2009.
  17. Yu J, Lemieux G and Eagleston C Vector processing as a soft-core CPU accelerator Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays, (222-232)
  18. Chouliaras V, Dwyer V, Agha S, Nunez-Yanez J, Reisis D, Nakos K and Manolopoulos K (2008). Customization of an embedded RISC CPU with SIMD extensions for video encoding, Integration, the VLSI Journal, 41:1, (135-152), Online publication date: 1-Jan-2008.
  19. Yang H, Ziavras S and Hu J (2007). Reconfiguration support for vector operations, International Journal of High Performance Systems Architecture, 1:2, (89-97), Online publication date: 1-Oct-2007.
  20. Raghavan P, Lambrechts A, Jayapala M, Catthoor F, Verkest D and Corporaal H Very wide register Proceedings of the conference on Design, automation and test in Europe, (1066-1071)
  21. Sasanka R, Li M, Adve S, Chen Y and Debes E (2007). ALP, ACM Transactions on Architecture and Code Optimization, 4:1, (3-es), Online publication date: 1-Mar-2007.
  22. Hampton M and Asanović K Implementing virtual memory in a vector processor with software restart markers Proceedings of the 20th annual international conference on Supercomputing, (135-144)
  23. Oikonomakos P, Fournier J and Moore S Implementing cryptography on TFT technology for secure display applications Proceedings of the 7th IFIP WG 8.8/11.2 international conference on Smart Card Research and Advanced Applications, (32-47)
  24. Fournier J and Moore S A vector approach to cryptography implementation Proceedings of the First international conference on Digital Rights Management: technologies, Issues, Challenges and Systems, (277-297)
  25. Kozyrakis C and Patterson D Overcoming the limitations of conventional vector processors Proceedings of the 30th annual international symposium on Computer architecture, (399-409)
  26. Kozyrakis C and Patterson D (2003). Overcoming the limitations of conventional vector processors, ACM SIGARCH Computer Architecture News, 31:2, (399-409), Online publication date: 1-May-2003.
  27. Khailany B, Dally W, Rixner S, Kapasi U, Owens J and Towles B Exploring the VLSI Scalability of Stream Processors Proceedings of the 9th International Symposium on High-Performance Computer Architecture
  28. Corbal J, Espasa R and Valero M Three-dimensional memory vectorization for high bandwidth media memory systems Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, (149-160)
  29. Pajuelo A, González A and Valero M Speculative dynamic vectorization Proceedings of the 29th annual international symposium on Computer architecture, (271-280)
  30. Pajuelo A, González A and Valero M (2002). Speculative dynamic vectorization, ACM SIGARCH Computer Architecture News, 30:2, (271-280), Online publication date: 1-May-2002.
  31. Soliman M and Sedukhin S (2002). Trident, Australian Computer Science Communications, 24:3, (91-99), Online publication date: 1-Jan-2002.
  32. Soliman M and Sedukhin S Trident Proceedings of the seventh Asia-Pacific conference on Computer systems architecture, (91-99)
  33. Chiou D, Jain P, Rudolph L and Devadas S Application-specific memory management for embedded systems using software-controlled caches Proceedings of the 37th Annual Design Automation Conference, (416-419)
  34. Corbal J, Valero M and Espasa R Exploiting a new level of DLP in multimedia applications Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture, (72-79)
  35. Lee C and Stoodley M Simple vector microprocessors for multimedia applications Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, (25-36)
Contributors
  • University of California, Berkeley
