Remix: online detection and repair of cache contention for the JVM

Published: 02 June 2016

Abstract

As ever more computation shifts onto multicore architectures, it is increasingly critical to find effective ways of dealing with multithreaded performance bugs like true and false sharing. Previous approaches to fixing false sharing in unmanaged languages have employed highly invasive runtime program modifications. We observe that managed language runtimes, with garbage collection and JIT code compilation, present unique opportunities to repair such bugs directly, mirroring the techniques used in manual repairs. We present Remix, a modified version of the Oracle HotSpot JVM which can detect cache contention bugs and repair false sharing at runtime. Remix's detection mechanism leverages recent performance counter improvements on Intel platforms, which allow for precise, unobtrusive monitoring of cache contention at the hardware level. Remix can detect and repair known false sharing issues in the LMAX Disruptor high-performance inter-thread messaging library and the Spring Reactor event-processing framework, automatically providing 1.5-2x speedups over unoptimized code and matching the performance of hand-optimization. Remix also finds a new false sharing bug in SPECjvm2008, and uncovers a true sharing bug in the HotSpot JVM that, when fixed, improves the performance of three NAS Parallel Benchmarks by 7-25x. Remix incurs no statistically significant performance overhead on other benchmarks that do not exhibit cache contention, making Remix practical for always-on use.
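As background for the "manual repairs" the abstract mentions, the following is a minimal Java sketch of false sharing and a hand-written padding fix. It is an illustration only and is not Remix's implementation. The class and method names (`FalseSharingSketch`, `Unpadded`, `Padded`, `runUnpadded`, `runPadded`) are hypothetical, and the sketch assumes 64-byte cache lines and a JVM that lays out superclass fields before subclass fields, so the padding actually sits between the two counters.

```java
// Hypothetical illustration of false sharing and a manual padding repair.
class FalseSharingSketch {

    // Two independent counters that will likely share one cache line.
    static class Unpadded {
        volatile long a;
        volatile long b;
    }

    // Manual repair: seven unused longs (56 bytes) separate a from b.
    // The class hierarchy is used because a JVM may reorder fields
    // declared within a single class (assumption: superclass fields
    // are laid out before subclass fields).
    static class CounterA { volatile long a; }
    static class Padding extends CounterA { long p1, p2, p3, p4, p5, p6, p7; }
    static class Padded extends Padding { volatile long b; }

    // Runs both workers on their own threads and returns elapsed nanoseconds.
    static long timeNanos(Runnable w1, Runnable w2) {
        Thread t1 = new Thread(w1);
        Thread t2 = new Thread(w2);
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    // Each field has exactly one writer thread, so the increments are safe.
    static long runUnpadded(long iters) {
        Unpadded c = new Unpadded();
        timeNanos(() -> { for (long i = 0; i < iters; i++) c.a++; },
                  () -> { for (long i = 0; i < iters; i++) c.b++; });
        return c.a + c.b;
    }

    static long runPadded(long iters) {
        Padded c = new Padded();
        timeNanos(() -> { for (long i = 0; i < iters; i++) c.a++; },
                  () -> { for (long i = 0; i < iters; i++) c.b++; });
        return c.a + c.b;
    }

    public static void main(String[] args) {
        long iters = 50_000_000L;
        Unpadded u = new Unpadded();
        long unpaddedNs = timeNanos(
            () -> { for (long i = 0; i < iters; i++) u.a++; },
            () -> { for (long i = 0; i < iters; i++) u.b++; });
        Padded p = new Padded();
        long paddedNs = timeNanos(
            () -> { for (long i = 0; i < iters; i++) p.a++; },
            () -> { for (long i = 0; i < iters; i++) p.b++; });
        System.out.println("unpadded ns: " + unpaddedNs);
        System.out.println("padded ns:   " + paddedNs);
    }
}
```

Both variants compute the same result; only the memory layout differs. Any timing difference depends on the hardware, JVM version, and object layout, so the numbers printed by `main` should be treated as indicative rather than as a benchmark.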

      • Published in

        ACM SIGPLAN Notices, Volume 51, Issue 6
        PLDI '16
        June 2016
        726 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/2980983
        • Editor: Andy Gill

        PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
        June 2016
        726 pages
        ISBN: 9781450342612
        DOI: 10.1145/2908080
        • General Chair: Chandra Krintz
        • Program Chair: Emery Berger

        Copyright © 2016 Owner/Author

        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 2 June 2016

        Qualifiers

        • article
