research-article

Public Access

GAS: A Heterogeneous Memory Architecture for Graph Processing

Authors:
Minxuan Zhou

CSE Department, UC San Diego, La Jolla, CA, USA

CSE Department, UC San Diego, La Jolla, CA, USA
View Profile

,
Mohsen Imani

CSE Department, UC San Diego, La Jolla, CA, USA

CSE Department, UC San Diego, La Jolla, CA, USA
View Profile

,
Saransh Gupta

CSE Department, UC San Diego, La Jolla, CA, USA

CSE Department, UC San Diego, La Jolla, CA, USA
View Profile

,
Tajana Rosing

CSE Department, UC San Diego, La Jolla, CA, USA

CSE Department, UC San Diego, La Jolla, CA, USA
View Profile

ISLPED '18: Proceedings of the International Symposium on Low Power Electronics and DesignJuly 2018Article No.: 27Pages 1–6https://doi.org/10.1145/3218603.3218631

Published:23 July 2018Publication History

ISLPED '18: Proceedings of the International Symposium on Low Power Electronics and Design

Pages 1–6

ABSTRACT

Graph processing has become important for various applications in today's big data era. However, most graph processing applications suffer from large memory overhead due to random memory accesses. Such random memory access pattern provides little temporal and spatial locality which cannot be accelerated by the conventional hierarchical memory system. In this work, we propose GAS, a heterogeneous memory architecture, to accelerate graph applications implemented in message-based vertex program model, which is widely used in various graph processing systems. GAS utilizes the specialized content-addressable memory (CAM) to store random data, and determine exact access patterns by a series of associative search. Thus, GAS not only removes the inefficiency of random accesses but also reduces the memory access latency by accurate prefetching. We test the efficiency of GAS with three important graph processing kernels on five well-known graphs. Our experimental results show that GAS can significantly reduce cache miss rate and improve the bandwidth utilization as compared to a conventional system with a state-of-the-art graph-specific prefetching mechanism. These enhancements result in 34% and 27% reduction in energy consumption and execution time, respectively.

References

Y. Low et al., "Distributed graphlab: a framework for machine learning and data mining in the cloud," Proceedings of the VLDB Endowment, vol. 5, no. 8, pp. 716--727, 2012. Google ScholarDigital Library
G. Malewicz et al., "Pregel: a system for large-scale graph processing," in Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp. 135--146, ACM, 2010. Google ScholarDigital Library
S. Beamer et al., "Locality exists in graph processing: Workload characterization on an ivy bridge server," in IEEE IISWC, pp. 56--65, IEEE, 2015. Google ScholarDigital Library
T. J. Ham et al., "Graphicionado: A high-performance and energy-efficient accelerator for graph analytics," in IEEE/ACM Micro, pp. 1--13, IEEE, 2016. Google ScholarDigital Library
M. M. Ozdal et al., "Energy efficient architecture for graph analytics accelerators," in ACM/IEEE ISCA, pp. 166--177, IEEE, 2016. Google ScholarDigital Library
N. Sundaram et al., "Graphmat: High performance graph analytics made productive," Proceedings of the VLDB Endowment, vol. 8, no. 11, pp. 1214--1225, 2015. Google ScholarDigital Library
M. Imani et al., "Ultra-efficient processing in-memory for data intensive applications," in ACM DAC, p. 6, ACM, 2017. Google ScholarDigital Library
M. Imani et al., "Mpim: Multi-purpose in-memory processing using configurable resistive memory," in IEEE ASP-DAC, pp. 757--763, IEEE, 2017.Google Scholar
M. Imani et al., "Nvquery: Efficient query processing in non-volatile memory," IEEE TCAD, 2018.Google Scholar
M. Imani et al., "Efficient query processing in crossbar memory," in IEEE ISLPED, pp. 1--6, IEEE, 2017.Google Scholar
J. Sim et al., "Lupis:latch-up based ultra efficient processing in-memory system," in IEEE ISQED, pp. 1--6, IEEE, 2018.Google Scholar
S. Salehi et al., "Mitigating process variability for non-volatile cache resilience and yield," IEEE Transactions on Emerging Topics in Computing, 2018.Google ScholarCross Ref
N. Khoshavi et al., "Variation-immune resistive non-volatile memory using self-organized sub-bank circuit designs," in IEEE ISQED, pp. 52--57, IEEE, 2017.Google Scholar
L. Nai et al., "Graphpim: Enabling instruction-level pim offloading in graph computing frameworks," in IEEE HPCA, pp. 457--468, IEEE, 2017.Google Scholar
L. Song et al., "Graphr: Accelerating graph processing using reram," arXiv preprint arXiv:1708.06248, 2017.Google Scholar
S. Ainsworth et al., "Graph prefetching using data structure knowledge," in Proceedings of the 2016 International Conference on Supercomputing, p. 39, ACM, 2016. Google ScholarDigital Library
X. Yu et al., "Imp: Indirect memory prefetcher," in Proceedings of the 48th International Symposium on Microarchitecture, pp. 178--190, ACM, 2015. Google ScholarDigital Library
M. S. Riazi et al., "Camsure: Secure content-addressable memory for approximate search," ACM TECS, vol. 16, no. 5s, p. 136, 2017. Google ScholarDigital Library
M. Sharad et al., "Ultra low power associative computing with spin neurons and resistive crossbar memory," in ACM DAC, p. 107, ACM, 2013. Google ScholarDigital Library
M. Imani et al., "Exploring hyperdimensional associative memory," in IEEE HPCA, pp. 445--456, IEEE, 2017.Google Scholar
M. Imani et al., "Masc: Ultra-low energy multiple-access single-charge tcam for approximate computing," in IEEE/ACM DATE, pp. 373--378, EDA Consortium, 2016. Google ScholarDigital Library
M. Imani et al., "Approximate computing using multiple-access single-charge associative memory," IEEE TETC, 2016.Google Scholar
T. E. Carlson et al., "An evaluation of high-level mechanistic core models," ACM TACO, vol. 11, no. 3, p. 28, 2014. Google ScholarDigital Library
C.-K. Luk et al., "Pin: building customized program analysis tools with dynamic instrumentation," in Acm sigplan notices, vol. 40, pp. 190--200, ACM, 2005. Google ScholarDigital Library
S. Li et al., "Mcpat: an integrated power, area, and timing modeling framework for multicore and manycore architectures," in Microarchitecture, 2009. MICRO-42. 42nd Annual IEEE/ACM International Symposium on, pp. 469--480, IEEE, 2009. Google ScholarDigital Library
W. Heirman et al., "Power-aware multi-core simulation for early design stage hardware/software co-optimization," in IEEE PACT, pp. 3--12, IEEE, 2012. Google ScholarDigital Library
P. Shivakumar et al., "Cacti 3.0: An integrated cache timing, power, and area model," 2001.Google Scholar
S. Kvatinsky et al., "Vteam: A general model for voltage-controlled memristors," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 8, pp. 786--790, 2015.Google ScholarCross Ref
X. Dong et al., "Nvsim: A circuit-level performance, energy, and area model for emerging non-volatile memory," in Emerging Memory Technologies, pp. 15--50, Springer, 2014.Google Scholar
P. Boldi et al., "Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks," in ACM WWW. Google ScholarDigital Library
P. Boldi et al., "The WebGraph framework I: Compression techniques," in ACM WWW, (Manhattan, USA), pp. 595--601, ACM Press, 2004. Google ScholarDigital Library
S. Beamer et al., "The gap benchmark suite," arXiv preprint arXiv:1508.03619, 2015.Google Scholar
"Graph500 benchmark." http://graph500.org/.Google Scholar

Index Terms

GAS: A Heterogeneous Memory Architecture for Graph Processing
1. Computer systems organization
  1. Architectures
2. Hardware
  1. Emerging technologies

Recommendations

Optimizing Program Locality Through CMEs and GAs
PACT '03: Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques

Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data ...
Read More
Design and Optimization of Large Size and Low Overhead Off-Chip Caches

Large off-chip L3 caches can significantly improve the performance of memory-intensive applications. However, conventional L3 SRAM caches are facing two issues as those applications require increasingly large caches. First, an SRAM cache has a limited ...
Read More
Banshee: bandwidth-efficient DRAM caching via software/hardware cooperation
MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

Placing the DRAM in the same package as a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising use of in-...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in

ISLPED '18: Proceedings of the International Symposium on Low Power Electronics and Design
July 2018
327 pages
ISBN:9781450357043
DOI:10.1145/3218603

Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 July 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
- Research
- Refereed limited
Conference

Acceptance Rates
Overall Acceptance Rate398of1,159submissions,34%
Upcoming Conference
ISLPED '24

Sponsor:

sigda

ACM/IEEE International Symposium on Low Power Electronics and Design

August 5 - 7, 2024

Newport Beach , CA , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 11
  Total Citations
  View Citations
- 575
  Total Downloads
- Downloads (Last 12 months)73
- Downloads (Last 6 weeks)10
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

GAS: A Heterogeneous Memory Architecture for Graph Processing

ISLPED '18: Proceedings of the International Symposium on Low Power Electronics and Design

ABSTRACT

References

Cited By

Index Terms

Recommendations

Optimizing Program Locality Through CMEs and GAs

Design and Optimization of Large Size and Low Overhead Off-Chip Caches

Banshee: bandwidth-efficient DRAM caching via software/hardware cooperation

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

GAS: A Heterogeneous Memory Architecture for Graph Processing

ISLPED '18: Proceedings of the International Symposium on Low Power Electronics and Design

ABSTRACT

References

Cited By

Index Terms

Recommendations

Optimizing Program Locality Through CMEs and GAs

Design and Optimization of Large Size and Low Overhead Off-Chip Caches

Banshee: bandwidth-efficient DRAM caching via software/hardware cooperation

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media