skip to main content
10.1145/3218603.3218631acmconferencesArticle/Chapter ViewAbstractPublication PagesislpedConference Proceedingsconference-collections
research-article
Public Access

GAS: A Heterogeneous Memory Architecture for Graph Processing

Published:23 July 2018Publication History

ABSTRACT

Graph processing has become important for various applications in today's big data era. However, most graph processing applications suffer from large memory overhead due to random memory accesses. Such random memory access pattern provides little temporal and spatial locality which cannot be accelerated by the conventional hierarchical memory system. In this work, we propose GAS, a heterogeneous memory architecture, to accelerate graph applications implemented in message-based vertex program model, which is widely used in various graph processing systems. GAS utilizes the specialized content-addressable memory (CAM) to store random data, and determine exact access patterns by a series of associative search. Thus, GAS not only removes the inefficiency of random accesses but also reduces the memory access latency by accurate prefetching. We test the efficiency of GAS with three important graph processing kernels on five well-known graphs. Our experimental results show that GAS can significantly reduce cache miss rate and improve the bandwidth utilization as compared to a conventional system with a state-of-the-art graph-specific prefetching mechanism. These enhancements result in 34% and 27% reduction in energy consumption and execution time, respectively.

References

  1. Y. Low et al., "Distributed graphlab: a framework for machine learning and data mining in the cloud," Proceedings of the VLDB Endowment, vol. 5, no. 8, pp. 716--727, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. G. Malewicz et al., "Pregel: a system for large-scale graph processing," in Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp. 135--146, ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. Beamer et al., "Locality exists in graph processing: Workload characterization on an ivy bridge server," in IEEE IISWC, pp. 56--65, IEEE, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. J. Ham et al., "Graphicionado: A high-performance and energy-efficient accelerator for graph analytics," in IEEE/ACM Micro, pp. 1--13, IEEE, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. M. Ozdal et al., "Energy efficient architecture for graph analytics accelerators," in ACM/IEEE ISCA, pp. 166--177, IEEE, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. N. Sundaram et al., "Graphmat: High performance graph analytics made productive," Proceedings of the VLDB Endowment, vol. 8, no. 11, pp. 1214--1225, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Imani et al., "Ultra-efficient processing in-memory for data intensive applications," in ACM DAC, p. 6, ACM, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Imani et al., "Mpim: Multi-purpose in-memory processing using configurable resistive memory," in IEEE ASP-DAC, pp. 757--763, IEEE, 2017.Google ScholarGoogle Scholar
  9. M. Imani et al., "Nvquery: Efficient query processing in non-volatile memory," IEEE TCAD, 2018.Google ScholarGoogle Scholar
  10. M. Imani et al., "Efficient query processing in crossbar memory," in IEEE ISLPED, pp. 1--6, IEEE, 2017.Google ScholarGoogle Scholar
  11. J. Sim et al., "Lupis:latch-up based ultra efficient processing in-memory system," in IEEE ISQED, pp. 1--6, IEEE, 2018.Google ScholarGoogle Scholar
  12. S. Salehi et al., "Mitigating process variability for non-volatile cache resilience and yield," IEEE Transactions on Emerging Topics in Computing, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  13. N. Khoshavi et al., "Variation-immune resistive non-volatile memory using self-organized sub-bank circuit designs," in IEEE ISQED, pp. 52--57, IEEE, 2017.Google ScholarGoogle Scholar
  14. L. Nai et al., "Graphpim: Enabling instruction-level pim offloading in graph computing frameworks," in IEEE HPCA, pp. 457--468, IEEE, 2017.Google ScholarGoogle Scholar
  15. L. Song et al., "Graphr: Accelerating graph processing using reram," arXiv preprint arXiv:1708.06248, 2017.Google ScholarGoogle Scholar
  16. S. Ainsworth et al., "Graph prefetching using data structure knowledge," in Proceedings of the 2016 International Conference on Supercomputing, p. 39, ACM, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. X. Yu et al., "Imp: Indirect memory prefetcher," in Proceedings of the 48th International Symposium on Microarchitecture, pp. 178--190, ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. S. Riazi et al., "Camsure: Secure content-addressable memory for approximate search," ACM TECS, vol. 16, no. 5s, p. 136, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Sharad et al., "Ultra low power associative computing with spin neurons and resistive crossbar memory," in ACM DAC, p. 107, ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Imani et al., "Exploring hyperdimensional associative memory," in IEEE HPCA, pp. 445--456, IEEE, 2017.Google ScholarGoogle Scholar
  21. M. Imani et al., "Masc: Ultra-low energy multiple-access single-charge tcam for approximate computing," in IEEE/ACM DATE, pp. 373--378, EDA Consortium, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Imani et al., "Approximate computing using multiple-access single-charge associative memory," IEEE TETC, 2016.Google ScholarGoogle Scholar
  23. T. E. Carlson et al., "An evaluation of high-level mechanistic core models," ACM TACO, vol. 11, no. 3, p. 28, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. C.-K. Luk et al., "Pin: building customized program analysis tools with dynamic instrumentation," in Acm sigplan notices, vol. 40, pp. 190--200, ACM, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. Li et al., "Mcpat: an integrated power, area, and timing modeling framework for multicore and manycore architectures," in Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM International Symposium on, pp. 469--480, IEEE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. W. Heirman et al., "Power-aware multi-core simulation for early design stage hardware/software co-optimization," in IEEE PACT, pp. 3--12, IEEE, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. P. Shivakumar et al., "Cacti 3.0: An integrated cache timing, power, and area model," 2001.Google ScholarGoogle Scholar
  28. S. Kvatinsky et al., "Vteam: A general model for voltage-controlled memristors," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 8, pp. 786--790, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  29. X. Dong et al., "Nvsim: A circuit-level performance, energy, and area model for emerging non-volatile memory," in Emerging Memory Technologies, pp. 15--50, Springer, 2014.Google ScholarGoogle Scholar
  30. P. Boldi et al., "Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks," in ACM WWW. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. P. Boldi et al., "The WebGraph framework I: Compression techniques," in ACM WWW, (Manhattan, USA), pp. 595--601, ACM Press, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. S. Beamer et al., "The gap benchmark suite," arXiv preprint arXiv:1508.03619, 2015.Google ScholarGoogle Scholar
  33. "Graph500 benchmark." http://graph500.org/.Google ScholarGoogle Scholar

Index Terms

  1. GAS: A Heterogeneous Memory Architecture for Graph Processing

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        ISLPED '18: Proceedings of the International Symposium on Low Power Electronics and Design
        July 2018
        327 pages
        ISBN:9781450357043
        DOI:10.1145/3218603

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 23 July 2018

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate398of1,159submissions,34%

        Upcoming Conference

        ISLPED '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader