ABSTRACT
The computational efficiency of a state-of-the-art ab initio quantum transport (QT) solver, capable of revealing the coupled electrothermal properties of atomically resolved nano-transistors, has been improved by up to two orders of magnitude through a data-centric reorganization of the application. The approach yields coarse- and fine-grained data-movement characteristics that can be used for performance and communication modeling, communication avoidance, and dataflow transformations. The resulting code has been tuned for two top-6 hybrid supercomputers, reaching a sustained performance of 85.45 Pflop/s on 4,560 nodes of Summit (42.55% of the peak) in double precision, and 90.89 Pflop/s in mixed precision. These computational achievements enable the restructured QT simulator to treat realistic nanoelectronic devices made of more than 10,000 atoms in a 14x shorter time than the original code needs for a system with 1,000 atoms, on the same number of CPUs/GPUs and with the same physical accuracy.
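The performance figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers quoted in the abstract; the variable and function names are illustrative, and the abstract does not say which peak value the percentage refers to, so the result is only the peak implied by the quoted figures.

```python
# Figures quoted in the abstract (no other data is assumed).
SUSTAINED_DOUBLE_PFLOPS = 85.45   # double precision, 4,560 Summit nodes
SUSTAINED_MIXED_PFLOPS = 90.89    # mixed precision
FRACTION_OF_PEAK = 0.4255         # "42.55% of the peak"
NODES = 4560


def implied_peak(sustained_pflops: float, fraction: float) -> float:
    """Peak performance implied by a sustained rate and its fraction of peak."""
    return sustained_pflops / fraction


# Peak (in Pflop/s) that the 42.55% figure is measured against.
peak_pflops = implied_peak(SUSTAINED_DOUBLE_PFLOPS, FRACTION_OF_PEAK)

# Average sustained double-precision rate per node, in Tflop/s.
per_node_tflops = SUSTAINED_DOUBLE_PFLOPS * 1000 / NODES

# Relative gain of the mixed-precision run over the double-precision run.
mixed_gain = SUSTAINED_MIXED_PFLOPS / SUSTAINED_DOUBLE_PFLOPS

print(f"implied peak:        {peak_pflops:.1f} Pflop/s")
print(f"sustained per node:  {per_node_tflops:.2f} Tflop/s")
print(f"mixed/double ratio:  {mixed_gain:.3f}")
```

The same pattern applies to any sustained-versus-peak claim: dividing the sustained rate by the stated fraction recovers the reference peak, which makes it easy to see what baseline a percentage was computed against.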
Index Terms
- A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations