Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy

Authors:
Philip Machanick, Pierre Salverda, Lance Pompe

Department of Computer Science, University of the Witwatersrand, Private Bag 3, 2050 Wits, South Africa

ASPLOS VIII: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
October 1998
Pages 105–114
https://doi.org/10.1145/291069.291032

Published: 1 October 1998

ABSTRACT

The RAMpage memory hierarchy is an alternative to the traditional division between cache and main memory: main memory is moved up a level and DRAM is used as a paging device. The idea behind RAMpage is to reduce hardware complexity, albeit at the cost of software complexity, with a view to allowing more flexible memory system design. This paper investigates some issues in choosing between RAMpage and a conventional cache architecture, with a view to illustrating trade-offs which can be made in choosing whether to place complexity in the memory system in hardware or in software. Performance results in this paper are based on a simple Rambus implementation of DRAM, with the performance characteristics of Direct Rambus, which should be available in 1999. This paper explores the conditions under which it becomes feasible to perform a context switch on a miss in the RAMpage model, and the conditions under which RAMpage is a win over a conventional cache architecture: as the CPU-DRAM speed gap grows, RAMpage becomes more viable.
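The context-switch-on-miss condition mentioned in the abstract can be illustrated with a rough back-of-envelope model. The sketch below is not the paper's simulation: the function names, the cost model (fixed DRAM latency plus page transfer time), and every numeric value are hypothetical assumptions chosen only to show why a growing CPU-DRAM speed gap favours switching to another process rather than stalling.

```python
# Toy model (an assumption, not the paper's method): compare the cycles a CPU
# would idle on a miss to DRAM with a fixed, hypothetical context-switch cost.

def miss_penalty_cycles(latency_ns, page_bytes, bandwidth_gb_s, cpu_ghz):
    """CPU cycles spent waiting for one page to arrive from DRAM.

    1 GB/s equals 1 byte/ns, so page_bytes / bandwidth_gb_s is in nanoseconds.
    """
    transfer_ns = page_bytes / bandwidth_gb_s
    return (latency_ns + transfer_ns) * cpu_ghz


def switch_on_miss_wins(latency_ns, page_bytes, bandwidth_gb_s, cpu_ghz,
                        switch_cycles):
    """True when a context switch costs fewer cycles than stalling on the miss."""
    penalty = miss_penalty_cycles(latency_ns, page_bytes, bandwidth_gb_s, cpu_ghz)
    return switch_cycles < penalty


if __name__ == "__main__":
    # All values below are illustrative placeholders, not measured results.
    LATENCY_NS = 50.0       # hypothetical DRAM access latency
    PAGE_BYTES = 1024       # hypothetical SRAM page size
    BANDWIDTH_GB_S = 1.6    # hypothetical DRAM channel bandwidth
    SWITCH_CYCLES = 400     # hypothetical context-switch cost

    # DRAM timing is held fixed while the CPU clock rises, widening the gap.
    for cpu_ghz in (0.2, 1.0, 4.0):
        penalty = miss_penalty_cycles(LATENCY_NS, PAGE_BYTES, BANDWIDTH_GB_S, cpu_ghz)
        wins = switch_on_miss_wins(LATENCY_NS, PAGE_BYTES, BANDWIDTH_GB_S,
                                   cpu_ghz, SWITCH_CYCLES)
        print(f"{cpu_ghz:.1f} GHz: miss penalty {penalty:.0f} cycles, "
              f"switch on miss pays off: {wins}")
```

With DRAM timing held constant, the miss penalty measured in CPU cycles scales with clock rate, so a fixed-cost context switch moves from being more expensive than a stall to being cheaper. The model ignores effects such as cache pollution by the switched-in process and the availability of runnable processes.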

Published in
ASPLOS VIII: Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
October 1998
326 pages
ISBN: 1581131070
DOI: 10.1145/291069
Chairmen: Dileep Bhandarkar (Intel), Anant Agarwal (Massachusetts Institute of Technology, Cambridge)

ACM SIGPLAN Notices, Volume 33, Issue 11
Nov. 1998
309 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/291006
Chairmen: Dileep Bhandarkar (Intel), Anant Agarwal (Massachusetts Institute of Technology, Cambridge)

ACM SIGOPS Operating Systems Review, Volume 32, Issue 5
Dec. 1998
309 pages
ISSN: 0163-5980
DOI: 10.1145/384265
Editor: William M. Waite (Univ. of Colorado, Boulder)

Copyright © 1998 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States

Acceptance Rates
ASPLOS VIII Paper Acceptance Rate: 28 of 123 submissions, 23%
Overall Acceptance Rate: 535 of 2,713 submissions, 20%