PDES-A: Accelerators for Parallel Discrete Event Simulation Implemented on FPGAs

Authors:
Shafiur Rahman, Nael Abu-Ghazaleh, and Walid Najjar
University of California Riverside, CA, USA

ACM Transactions on Modeling and Computer Simulation, Volume 29, Issue 2, Article No. 12, pp. 1–25. https://doi.org/10.1145/3302259

Published: 18 April 2019

Abstract

In this article, we present our experiences implementing a general Parallel Discrete Event Simulation (PDES) accelerator on a Field Programmable Gate Array (FPGA). The accelerator can be specialized to any particular simulation model by defining the object states and the event handling code, which are then synthesized into a custom accelerator for the given model. The accelerator consists of several event processors that process events in parallel while maintaining the dependencies between them. Events are sorted automatically by a self-sorting event queue. The accelerator supports optimistic simulation by automatically tracking event history and supporting rollbacks. Locally, the scalability of the architecture is limited by the communication and port bandwidth of its structures; however, the design allows multiple accelerators to be connected to scale up the simulation. We evaluate the design and explore several design trade-offs and optimizations. We show that the accelerator scales up to 64 concurrent event processors, measured relative to the performance of a single event processor. At that point, scalability becomes limited by contention on the shared structures within the datapath. To alleviate this bottleneck, we also develop a new version of the datapath that partitions the state and event space of the simulation while allowing these partitions to share the event processors. The new design substantially reduces contention and improves the speedup with 64 processors from 49x to 62x relative to a single-processor design. We went through two iterations of the design of PDES-A, first in Verilog and then in Chisel (for the partitioned version of the design), and we report observations on the differences between prototyping accelerators in these two languages. PDES-A outperforms the ROSS simulator running on a 12-core Intel Xeon machine by a factor of 3.2x while consuming less than 15% of the power.
Our future work includes building multiple interconnected PDES-A cores.
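As a software illustration of the optimistic execution described above (tracking event history and rolling back), the following is a minimal Python sketch of Time Warp style state saving and rollback for a single simulation object. It is not PDES-A's hardware design: the `Event` fields, the order-sensitive handler `state * 2 + delta`, and the names `LogicalProcess`, `receive`, `rollback`, `process_next`, and `sequential` are hypothetical. For brevity, the sketch also omits anti-messages (events here schedule no new events), global virtual time (GVT), and fossil collection.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    timestamp: int
    seq: int                           # tie-breaker so equal timestamps still have a total order
    delta: int = field(compare=False)  # hypothetical payload consumed by the event handler


class LogicalProcess:
    """One simulation object executed optimistically, Time Warp style (illustrative only)."""

    def __init__(self, initial_state: int) -> None:
        self.state = initial_state
        self.pending: list[Event] = []              # min-heap of unprocessed events
        self.history: list[tuple[Event, int]] = []  # (processed event, state saved before it)
        self.rollbacks = 0

    @staticmethod
    def handle(state: int, ev: Event) -> int:
        # Hypothetical, order-sensitive handler: processing order changes the result.
        return state * 2 + ev.delta

    def receive(self, ev: Event) -> None:
        # A straggler is older than the most recently processed event.
        if self.history and ev < self.history[-1][0]:
            self.rollback(ev)
        heapq.heappush(self.pending, ev)

    def rollback(self, straggler: Event) -> None:
        self.rollbacks += 1
        # Undo, newest first, every processed event that belongs after the straggler.
        while self.history and self.history[-1][0] > straggler:
            ev, saved_state = self.history.pop()
            self.state = saved_state           # ends at the state before the oldest undone event
            heapq.heappush(self.pending, ev)   # the undone event will be re-executed later

    def process_next(self) -> bool:
        if not self.pending:
            return False
        ev = heapq.heappop(self.pending)
        self.history.append((ev, self.state))  # save state before executing the event
        self.state = self.handle(self.state, ev)
        return True


def sequential(initial_state: int, events: list[Event]) -> int:
    # Reference result: process all events strictly in timestamp order.
    state = initial_state
    for ev in sorted(events):
        state = LogicalProcess.handle(state, ev)
    return state


# Event 20 arrives after event 30 has already been processed optimistically.
arrivals = [Event(10, 0, 1), Event(30, 1, 2), Event(20, 2, 3)]
lp = LogicalProcess(0)
for ev in arrivals:
    lp.receive(ev)
    lp.process_next()        # optimistically execute the earliest event known so far
while lp.process_next():     # drain events re-enqueued by the rollback
    pass

print(lp.state, lp.rollbacks, sequential(0, arrivals))  # 12 1 12
```

The straggler at timestamp 20 forces one rollback of the event at timestamp 30. After re-execution, the final state matches strict timestamp-order execution. Saving the full state before every event keeps rollback trivial but costs memory. A complete Time Warp kernel would also reclaim history older than GVT.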


Index Terms

1. Computing methodologies
  1. Modeling and simulation
    1. Simulation types and techniques
      1. Discrete-event simulation
2. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators


Published in
ACM Transactions on Modeling and Computer Simulation, Volume 29, Issue 2 (Special Issue on PADS 2017), April 2019, 105 pages
ISSN: 1049-3301
EISSN: 1558-1195
DOI: 10.1145/3320014
Editor: Adelinde M. Uhrmacher, University of Rostock, Germany
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 April 2019
- Accepted: 1 December 2018
- Revised: 1 September 2018
- Received: 1 December 2017

Author Tags
FPGA
PDES
accelerator
coprocessor
parallel simulation
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 6
- Total Downloads: 567
- Downloads (Last 12 months): 141
- Downloads (Last 6 weeks): 21