skip to main content
research-article
Free Access

Virtual machine showdown: Stack versus registers

Published:30 January 2008Publication History
Skip Abstract Section

Abstract

Virtual machines (VMs) enable the distribution of programs in an architecture-neutral format, which can easily be interpreted or compiled. A long-running question in the design of VMs is whether a stack architecture or register architecture can be implemented more efficiently with an interpreter. We extend existing work on comparing virtual stack and virtual register architectures in three ways. First, our translation from stack to register code and optimization are much more sophisticated. The result is that we eliminate an average of more than 46% of executed VM instructions, with the bytecode size of the register machine being only 26% larger than that of the corresponding stack one. Second, we present a fully functional virtual-register implementation of the Java virtual machine (JVM), which supports Intel, AMD64, PowerPC and Alpha processors. This register VM supports inline-threaded, direct-threaded, token-threaded, and switch dispatch. Third, we present experimental results on a range of additional optimizations such as register allocation and elimination of redundant heap loads. On the AMD64 architecture the register machine using switch dispatch achieves an average speedup of 1.48 over the corresponding stack machine. Even using the more efficient inline-threaded dispatch, the register VM achieves a speedup of 1.15 over the equivalent stack-based VM.

References

  1. Alexander, W. and Wortman, D. 1975. Static and dynamic chracteristics of XPL programs. Computer 8, 11 (Nov.), 41--46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Antonioli, D. N. and Pilz, M. 1998. Analysis of the Java class file format. Tech. rep. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Arnold, M., Hind, M., and Ryder, B. G. 2002. Online feedback-directed optimization of Java. In OOPSLA '02: Proceedings of the 17th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM Press, New York. 111--129. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bell, J. R. 1973. Threaded code. Commun. ACM 16, 6, 370--372. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Berndl, M., Vitale, B., Zaleski, M., and Brown, A. D. 2005. Context threading: A flexible and efficient dispatch technique for virtual machine interpreters. In 2005 International Symposium on Code Generation and Optimization. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Briggs, P., Cooper, K. D., and Torczon, L. 1994. Improvements to graph coloring register allocation. ACM Transactions on Programming Languages and Systems 16, 3 (May), 428--455. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Bull, M., Smith, L., Westhead, M., Henty, D., and Davey, R. 2000. Benchmarking Java Grande applications. In Second Ineternational Conference and Exhibtion on the Practical Application of Java. Manchester, UK.Google ScholarGoogle Scholar
  8. Choi, J.-D., Grove, D., Hind, M., and Sarkar, V. 1999. Efficient and precise modeling of exceptions for the analysis of Java programs. In PASTE '99: Proceedings of the 1999 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering. ACM Press, New York. 21--31. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Clausen, L. R., Schultz, U. P., Consel, C., and Muller, G. 2000. Java bytecode compression for low-end embedded systems. ACM Trans. Program. Lang. Syst. 22, 3, 471--489. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., and Zadeck, F. K. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems 13, 4 (Oct.), 451--490. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Davis, B., Beatty, A., Casey, K., Gregg, D., and Waldron, J. 2003. The case for virtual register machines. In Interpreters, Virtual Machines and Emulators (IVME '03). 41--49. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Diwan, A., McKinley, K. S., and Moss, J. E. B. 1998. Type-based alias analysis. In SIGPLAN Conference on Programming Language Design and Implementation. 106--117. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Eller, H. 2005. Optimizing interpreters with superinstructions. M.S. thesis, Institut für Computersprachen, Technische Universität Wien. http://www.complang.tuwien.ac.at/Dimplomarbeiten/eller05.ps.gz.Google ScholarGoogle Scholar
  14. Ertl, M. A. 1995. Stack caching for interpreters. In SIGPLAN '95 Conference on Programming Language Design and Implementation. 315--327. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Ertl, M. A. and Gregg, D. 2003. The structure and performance of efficient interpreters. The Journal of Instruction-Level Parallelism 5. http://www.jilp.org/vol5/.Google ScholarGoogle Scholar
  16. Ertl, M. A., Gregg, D., Krall, A., and Paysan, B. 2002. vmgen---A generator of efficient virtual machine interpreters. Software---Practice and Experience 32, 3, 265--294. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Ertl, M. A., Thalinger, C., and Krall, A. 2006. Superinstructions and replication in the Cacao JVM interpreter. Journal of .NET Technologies 4, 25--32. Journal papers from .NET Technologies 2006 conference.Google ScholarGoogle Scholar
  18. Fink, S., Knobe, K., and Sarkar, V. 2000. Unified analysis of array and object references in strongly typed languages. Lecture Notes in Computer Science Volume 1824 (Feb.), 155--174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Gosling, J. 1995. Java Intermediate Bytecodes. In Proc. ACM SIGPLAN Workshop on Intermediate Representations. ACM Sigplan Notices, vol. 30:3. San Francisco, CA. 111--118. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Gregg, D., Beatty, A., Casey, K., Davis, B., and Nisbet, A. 2005. The case for virtual register machines. Science of Computer Programming, Special Issue on Interpreters Virtual Machines and Emulators 57, 319--338. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Ierusalimschy, R. L., de Figueiredo, H., and Celes, W. 1996. Lua---an extensible extension language. Software: Practice and Experience 26, 6, 635--652. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Ierusalimschy, R., de Figueiredo, L., and Celes, W. 2005. The implementation of Lua 5.0. Journal of Universal Computer Science 11, 7, 1159--1176. http://www.jucs.org/jucs_11_7/the_implementation_of_lua.Google ScholarGoogle Scholar
  23. Kistler, T. and Franz, M. 1999. A tree-based alternative to Java byte-codes. International Journal of Parallel Programming 27, 1, 21--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Klint, P. 1981. Interpretation techniques. Software---Practice and Experience 11, 963--973.Google ScholarGoogle ScholarCross RefCross Ref
  25. Krall, A. and Grafl, R. 1997. Cacao---a 64-bit JavaVM just-in-time compiler. Concurrency---Practice and Experience 9, 11, 1017--1030.Google ScholarGoogle ScholarCross RefCross Ref
  26. Lengauer, T. and Tarjan, R. E. 1979. A fast algorithm for finding dominators in a flowgraph. ACM Trans. Program. Lang. Syst. 1, 1, 121--141. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. McGlashan, B. and Bower, A. 1999. The interpreter is dead (slow). Isn't it? In OOPSLA'99 Workshop: Simplicity, Performance and Portability in Virtual Machine Design.Google ScholarGoogle Scholar
  28. Mössenböck, H. 2000. Adding static single assignment form and a graph coloring register allocator to the Java Hotspot client compiler. Technical Report TR-15, Johannes Kepler University Linz Institute for Practical Computer Science, Altenbergerstra 69, A-4040 Linz.Google ScholarGoogle Scholar
  29. Muchnick, S. S. 1997. Advanced compiler design and implementation. Morgan Kaufmann, San Francisco, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Myers, G. J. 1977. The case against stack-oriented instruction sets. Computer Architecture News 6, 3 (Aug.), 7--10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Patterson, D. and Ditzel, D. 1980. The case for the reduced instruction set computer. Computer Architecture News 8, 6 (Oct.), 25--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Piumarta, I. and Riccardi, F. 1998. Optimizing direct threaded code by selective inlining. In SIGPLAN '98 Conference on Programming Language Design and Implementation. 291--300. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Pugh, W. 1999. Compressing Java class files. In PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation. ACM Press, New York. 247--258. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Radhakrishnan, R., Vijaykrishnan, N., John, L. K., and Sivasubramaniam, A. 2000. Architectural issues in Java runtime systems. In Proceedings of the Sixth International Symposium on High-Performance Computer Architecture (8--12 Jan.). Toulouse, France. 387--398.Google ScholarGoogle Scholar
  35. Schulthess, P. and Mumprecht, E. 1977. Reply to the case against stack-oriented instruction sets. Computer Architecture News 6, 5 (Dec.), 24--27. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Shi, Y., Gregg, D., Beatty, A., and Ertl, M. A. 2005. Virtual machine showdown: stack versus registers. In ACM/SIGPLAN Conference on Virtual Execution Environments. ACM Press, New York. 153--163. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Shustek, L. 1978. Aanalysis and performance of computer instruction sets. Ph.D. thesis, Stanford University. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. SPEC. 1998. SPEC releases SPEC JVM98, first industry-standard benchmark for measuring Java virtual machine performance. Press Release. http://www.spec.org/jvm98/press.html.Google ScholarGoogle Scholar
  39. Sugalski, D. 2002. Parrot in detail. In Yet Another Perl Conference (YAPC 02). Saint Louis, Missouri. http://www.parrotcode.org/talks/ParrotInDetail2.pdf.Google ScholarGoogle Scholar
  40. Sun-Microsystems. 2001. The Java Hotspot virtual machine. Tech. rep., Sun Microsystems Inc.Google ScholarGoogle Scholar
  41. Tip, F., Sweeney, P. F., Laffra, C., Eisma, A., and Streeter, D. 2002. Practical extraction techniques for Java. ACM Trans. Program. Lang. Syst. 24, 6, 625--666. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Vallée-Rai, R., Hendren, L., Sundaresan, V., Lam, P., Gagnon, E., and Co, P. 1999. Soot---a Java optimization framework. In Proceedings of CASCON 1999. 125--135.Google ScholarGoogle Scholar
  43. Vitale, B. and Abdelrahman, T. S. 2004. Catenation and specialization for Tcl virtual machine performance. In IVME '04: Proceedings of the 2004 workshop on Interpreters, Virtual Machines and Emulators. ACM Press, New York. 42--50. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Wiecek, C. 1982. A case study of the VAX 11 instruction set usage for compiler execution. In Proceedings of the Symposium on Architectural Support for Programming Languages and Operation Systems. IEEE/ACM, Palo Alto, California. 177--184. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Winterbottom, P. and Pike, R. 1997. The design of the Inferno virtual machine. In IEEE Compcon 97 Proceedings. San Jose, California. 241--244. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Virtual machine showdown: Stack versus registers

    Recommendations

    Reviews

    Charles Robert Morgan

    Programming language interpreters represent the executing program as an abstract machine. The major representation for Java is an abstract stack machine called Java bytecodes. The alternative representation is an abstract register machine. This paper is a thorough comparison of the benefits of each representation. It comes to the conclusion that the stack machine is more space efficient, while the register machine provides faster interpretation. The paper is remarkable. This subject is usually hotly contested, frequently with emotional zeal. The authors implement both representations and perform a careful analysis, taking care to make the comparison as honest as possible. The abstract stack machine contains instructions that load operands onto an execution stack. The computational and comparison operations take values from the top of the stack and return results on top of the stack. This representation is space efficient, since the operands are not encoded in the individual instructions; however, the representation uses more instructions, since values must be loaded onto the stack before use and redundant expressions elimination is less effective. The register machine looks much like a modern processor, with each instruction taking register numbers to indicate the registers holding the operands. Each instruction is larger than in the stack machine; however, there are fewer instructions, since most of the load operations can be eliminated. The difference in interpreter performance can be attributed to the cost of dispatching the instructions for execution and the retrieval of operands from the stack. The paper analyzes a number of different techniques for dispatching instructions, ranging from using a C-style switch statement to machine-dependent techniques that minimize the cost of dispatch. The C-style switch statement is the most costly and gives the register machine the biggest advantage. As the dispatch mechanisms become more efficient, the performance advantage for the register machine decreases. This is an excellent example of performing a difficult comparison. Shi et al. anticipate the questions that the reader might raise and carefully study them. I expect that some of the results surprised the authors themselves. The paper is well worth reading, both for the information and the comparison methodology. Online Computing Reviews Service

    Access critical reviews of Computing literature here

    Become a reviewer for Computing Reviews.

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Architecture and Code Optimization
      ACM Transactions on Architecture and Code Optimization  Volume 4, Issue 4
      January 2008
      187 pages
      ISSN:1544-3566
      EISSN:1544-3973
      DOI:10.1145/1328195
      Issue’s Table of Contents

      Copyright © 2008 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 30 January 2008
      • Accepted: 1 April 2007
      • Revised: 1 February 2007
      • Received: 1 August 2006
      Published in taco Volume 4, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader