DOI: 10.1145/2485922.2485943
research-article

Efficient virtual memory for big memory servers

Published: 23 June 2013

ABSTRACT

Our analysis shows that many "big-memory" server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for page-based virtual memory. They consume as much as 10% of execution cycles on TLB misses, even using large pages. On the other hand, we find that these workloads use read-write permission on most pages, are provisioned not to swap, and rarely benefit from the full flexibility of page-based virtual memory.

To remove the TLB miss overhead for big-memory workloads, we propose mapping part of a process's linear virtual address space with a direct segment, while page mapping the rest of the virtual address space. Direct segments use minimal hardware (base, limit, and offset registers per core) to map contiguous virtual memory regions directly to contiguous physical memory. They eliminate the possibility of TLB misses for key data structures such as database buffer pools and in-memory key-value stores. Memory mapped by a direct segment may be converted back to paging when needed.
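
The translation path described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `DirectSegment` and `translate`, the 4 KiB page size, the dictionary standing in for a page table, and the convention that `limit` is exclusive and `offset` is added to the virtual address are all assumptions made for illustration. The abstract states only that base, limit, and offset registers map a contiguous virtual region to contiguous physical memory, with the rest of the address space page-mapped.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PAGE_SIZE = 4096  # assumed base page size (illustrative)


@dataclass
class DirectSegment:
    """Hypothetical model of the per-core direct-segment registers."""
    base: int    # first virtual address covered by the segment
    limit: int   # first virtual address past the segment (exclusive, assumed)
    offset: int  # value added to a virtual address to form the physical address


def translate(va: int, seg: Optional[DirectSegment],
              page_table: Dict[int, int]) -> int:
    """Translate a virtual address to a physical address.

    Addresses inside the direct segment need only a range check and an add,
    so they never depend on a TLB or page-table entry. All other addresses
    fall back to paging, modeled here as a dict from virtual page number to
    physical frame number.
    """
    if seg is not None and seg.base <= va < seg.limit:
        return va + seg.offset
    vpn, page_off = divmod(va, PAGE_SIZE)
    if vpn not in page_table:
        raise KeyError(f"page fault: no mapping for virtual page {vpn:#x}")
    return page_table[vpn] * PAGE_SIZE + page_off


# Usage: one large region is segment-mapped, one small page is page-mapped.
seg = DirectSegment(base=0x1000_0000, limit=0x2000_0000, offset=0x3000_0000)
page_table = {0x1: 0x7}  # virtual page 1 -> physical frame 7

pa_segment = translate(0x1000_0010, seg, page_table)  # inside the segment
pa_paged = translate(0x1004, seg, page_table)         # outside, uses paging
```

In this model, converting segment-mapped memory back to paging amounts to shrinking or clearing the segment (for example, passing `seg=None`) and populating `page_table` for the affected pages.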

We prototype direct-segment software support for x86-64 in Linux and emulate direct-segment hardware. For our workloads, direct segments eliminate almost all TLB misses and reduce the execution time wasted on TLB misses to less than 0.5%.

    • Published in

      ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
      June 2013
      686 pages
      ISBN:9781450320795
      DOI:10.1145/2485922
      • ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
        ISCA '13
        June 2013
        666 pages
        ISSN:0163-5964
        DOI:10.1145/2508148

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      ISCA '13 Paper Acceptance Rate: 56 of 288 submissions, 19%
      Overall Acceptance Rate: 543 of 3,203 submissions, 17%