DOI: 10.1145/291069.291032

Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy

Published: 01 October 1998

ABSTRACT

The RAMpage memory hierarchy is an alternative to the traditional division between cache and main memory: main memory is moved up a level and DRAM is used as a paging device. The idea behind RAMpage is to reduce hardware complexity, albeit at the cost of software complexity, with a view to allowing more flexible memory system design. This paper investigates some issues in choosing between RAMpage and a conventional cache architecture, with a view to illustrating trade-offs which can be made in choosing whether to place complexity in the memory system in hardware or in software. Performance results in this paper are based on a simple Rambus implementation of DRAM, with performance characteristics of Direct Rambus, which should be available in 1999. This paper explores the conditions under which it becomes feasible to perform a context switch on a miss in the RAMpage model, and the conditions under which RAMpage is a win over a conventional cache architecture: as the CPU-DRAM speed gap grows, RAMpage becomes more viable.
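
The closing claim of the abstract can be illustrated with a back-of-the-envelope model. The sketch below is not taken from the paper: the function names, the fixed DRAM latency, and the context-switch cost are all hypothetical values chosen for illustration. It only shows why a context switch on a miss becomes more attractive as the CPU clock rises while DRAM latency stays roughly constant.

```python
def miss_penalty_cycles(dram_latency_ns: float, cpu_clock_mhz: float) -> float:
    """CPU cycles that elapse while one DRAM access completes."""
    # 1 MHz is one cycle per 1000 ns, hence the division by 1000.
    return dram_latency_ns * cpu_clock_mhz / 1000.0


def switch_on_miss_pays_off(dram_latency_ns: float,
                            cpu_clock_mhz: float,
                            switch_cost_cycles: float) -> bool:
    """True when waiting for DRAM costs more cycles than switching context.

    Assumption (not from the paper): the switch is worthwhile as soon as
    its cost is below the stall time it would otherwise hide.
    """
    return miss_penalty_cycles(dram_latency_ns, cpu_clock_mhz) > switch_cost_cycles


if __name__ == "__main__":
    DRAM_LATENCY_NS = 50.0      # hypothetical, held constant
    SWITCH_COST_CYCLES = 100.0  # hypothetical software overhead
    for clock_mhz in (200, 1000, 4000):
        penalty = miss_penalty_cycles(DRAM_LATENCY_NS, clock_mhz)
        worthwhile = switch_on_miss_pays_off(
            DRAM_LATENCY_NS, clock_mhz, SWITCH_COST_CYCLES)
        print(f"{clock_mhz} MHz: miss = {penalty:.0f} cycles, "
              f"switch worthwhile: {worthwhile}")
```

Under these assumed numbers the miss penalty measured in cycles grows linearly with the clock rate, so a fixed-cost context switch that is wasteful at a low clock rate becomes worthwhile at a high one. The paper's own results come from simulation, not from a model this simple.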

                Published in

                ASPLOS VIII: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
                October 1998
                326 pages
                ISBN: 1581131070
                DOI: 10.1145/291069

                Copyright © 1998 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Acceptance Rates

                ASPLOS VIII Paper Acceptance Rate: 28 of 123 submissions, 23%
                Overall Acceptance Rate: 535 of 2,713 submissions, 20%
