ABSTRACT
The RAMpage memory hierarchy is an alternative to the traditional division between cache and main memory: main memory is moved up a level and DRAM is used as a paging device. The idea behind RAMpage is to reduce hardware complexity, albeit at the cost of software complexity, with a view to allowing more flexible memory system design. This paper investigates some issues in choosing between RAMpage and a conventional cache architecture, to illustrate the trade-offs involved in placing memory-system complexity in hardware or in software. Performance results in this paper are based on a simple Rambus implementation of DRAM, with the performance characteristics of Direct Rambus, which should be available in 1999. The paper explores the conditions under which it becomes feasible to perform a context switch on a miss in the RAMpage model, and the conditions under which RAMpage is a win over a conventional cache architecture: as the CPU-DRAM speed gap grows, RAMpage becomes more viable.
Index Terms
- Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy