Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition, 2nd Edition, July 2016
Publisher: Morgan Kaufmann Publishers Inc., 340 Pine Street, Sixth Floor, San Francisco, CA, United States
ISBN: 978-0-12-809194-4
Published: 01 July 2016
Pages: 662
Abstract

This book is an all-in-one source of information for programming the second-generation Intel Xeon Phi product family, also called Knights Landing. The authors provide detailed and timely Knights Landing-specific details, programming advice, and real-world examples. They distill their years of Xeon Phi programming experience, coupled with insights from many expert customers, Intel Field Engineers, Application Engineers, and Technical Consulting Engineers, to create this authoritative book on the essentials of programming for Intel Xeon Phi products. Intel Xeon Phi Processor High-Performance Programming is useful even before you ever program a system with an Intel Xeon Phi processor. To help ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system, whether based on Intel Xeon processors, Intel Xeon Phi processors, or other high-performance microprocessors. Applying these techniques will generally increase your program's performance on any system and better prepare you for Intel Xeon Phi processors.
  • A practical guide to the essentials for programming Intel Xeon Phi processors
  • Definitive coverage of the Knights Landing architecture
  • Presents best practices for portable, high-performance computing and a familiar and proven threads and vectors programming model
  • Includes real-world code examples that highlight usages of the unique aspects of this new highly parallel and high-performance computational product
  • Covers use of MCDRAM, AVX-512, Intel Omni-Path fabric, many cores (up to 72), and many threads (4 per core)
  • Covers software developer tools, libraries, and programming models
  • Covers using Knights Landing as a processor and a coprocessor

Cited By

  1. Bylina B, Bylina J, Chabudziński Ł, Karpowicz K, Klisowski M, Oleszczuk P, Potiopa J and Stpiczyński P (2024). Fast slope algorithm with the use of vectorization and parallelization for multicore architectures, Geoinformatica, 28:1, (145-175), Online publication date: 1-Jan-2024.
  2. Park Y, Kim R, Nguyen T and Choi J (2023). Improving blocked matrix-matrix multiplication routine by utilizing AVX-512 instructions on intel knights landing and xeon scalable processors, Cluster Computing, 26:5, (2539-2549), Online publication date: 1-Oct-2023.
  3. Arunachalam A, Kundu S, Raha A, Banerjee S, Natarajan S and Basu K (2023). A Novel Low-Power Compression Scheme for Systolic Array-Based Deep Learning Accelerators, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42:4, (1085-1098), Online publication date: 1-Apr-2023.
  4. Silva R and Sobral J (2023). Efficient High-Level Programming in Plain Java, International Journal of Parallel Programming, 51:1, (22-42), Online publication date: 1-Feb-2023.
  5. Behnam P and Bojnordi M (2022). Adaptively Reduced DRAM Caching for Energy-Efficient High Bandwidth Memory, IEEE Transactions on Computers, 71:10, (2675-2686), Online publication date: 1-Oct-2022.
  6. Dmitruk B and Stpiczyński P Parallel Vectorized Implementations of Compensated Summation Algorithms Parallel Processing and Applied Mathematics, (63-74)
  7. Krishnan G, Mandal S, Chakrabarti C, Seo J, Ogras U and Cao Y (2021). Impact of On-chip Interconnect on In-memory Acceleration of Deep Neural Networks, ACM Journal on Emerging Technologies in Computing Systems, 18:2, (1-22), Online publication date: 30-Apr-2022.
  8. Krishnan G, Mandal S, Pannala M, Chakrabarti C, Seo J, Ogras U and Cao Y (2021). SIAM: Chiplet-based Scalable In-Memory Acceleration with Mesh for Deep Neural Networks, ACM Transactions on Embedded Computing Systems, 20:5s, (1-24), Online publication date: 31-Oct-2021.
  9. Mandal S, Ayoub R, Kishinevsky M, Islam M and Ogras U (2021). Analytical Performance Modeling of NoCs under Priority Arbitration and Bursty Traffic, IEEE Embedded Systems Letters, 13:3, (98-101), Online publication date: 1-Sep-2021.
  10. Szustak L, Wyrzykowski R, Olas T and Mele V (2020). Correlation of Performance Optimizations and Energy Consumption for Stencil-Based Application on Intel Xeon Scalable Processors, IEEE Transactions on Parallel and Distributed Systems, 31:11, (2582-2593), Online publication date: 1-Nov-2020.
  11. Sanz V, Pousa A, Naiouf M and De Giusti A Accelerating Pattern Matching on Intel Xeon Phi Processors Algorithms and Architectures for Parallel Processing, (262-274)
  12. Mazumdar S and Scionti A (2019). Ring-mesh: a scalable and high-performance approach for manycore accelerators, The Journal of Supercomputing, 76:9, (6720-6752), Online publication date: 1-Sep-2020.
  13. Dmitruk B and Stpiczyński P High Performance Portable Solver for Tridiagonal Toeplitz Systems of Linear Equations Euro-Par 2020: Parallel Processing Workshops, (172-184)
  14. Behnam P and Bojnordi M RedCache Proceedings of the 57th ACM/EDAC/IEEE Design Automation Conference, (1-6)
  15. Herruzo J, González-Navarro S, Ibáñez-Marín P, Viñals-Yúfera V, Alastruey-Benedé J and Plata O (2020). Accelerating Sequence Alignments Based on FM-Index Using the Intel KNL Processor, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 17:4, (1093-1104), Online publication date: 1-Jul-2020.
  16. Arima E and Schulz M Pattern-Aware Staging for Hybrid Memory Systems High Performance Computing, (474-495)
  17. Arima E, Hanawa T, Trinitis C and Schulz M Footprint-Aware Power Capping for Hybrid Memory Based Systems High Performance Computing, (347-369)
  18. Choi J, Park G and Nam D (2019). Interference-aware co-scheduling method based on classification of application characteristics from hardware performance counter using data mining, Cluster Computing, 23:1, (57-69), Online publication date: 1-Mar-2020.
  19. Dongarra J, Tourancheau B, Kim J and Vetter J (2020). Implementing efficient data compression and encryption in a persistent key-value store for HPC, International Journal of High Performance Computing Applications, 33:6, (1098-1112), Online publication date: 1-Nov-2019.
  20. Kronbichler M and Kormann K (2019). Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators, ACM Transactions on Mathematical Software, 45:3, (1-40), Online publication date: 30-Sep-2019.
  21. Royuela S, Serrano M, Garcia-Gasulla M, Mateo Bellido S, Labarta J and Quiñones E The Cooperative Parallel: A Discussion About Run-Time Schedulers for Nested Parallelism OpenMP: Conquering the Full Hardware Spectrum, (171-185)
  22. Jin Z and Finkel H Simulation of Random Network of Hodgkin and Huxley Neurons with Exponential Synaptic Conductances on an FPGA Platform Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, (653-657)
  23. Feng Z, Qiu S, Wang L and Luo Q Accelerating Long Read Alignment on Three Processors Proceedings of the 48th International Conference on Parallel Processing, (1-10)
  24. Zlateski A, Jia Z, Li K and Durand F The anatomy of efficient FFT and winograd convolutions on modern CPUs Proceedings of the ACM International Conference on Supercomputing, (414-424)
  25. Perdacher M, Plant C and Böhm C Cache-oblivious High-performance Similarity Join Proceedings of the 2019 International Conference on Management of Data, (87-104)
  26. Kronbichler M and Ljungkvist K (2019). Multigrid for Matrix-Free High-Order Finite Element Computations on Graphics Processors, ACM Transactions on Parallel Computing, 6:1, (1-32), Online publication date: 24-Jun-2019.
  27. Radulovic M, Sánchez Verdejo R, Carpenter P, Radojković P, Jacob B and Ayguadé E (2019). PROFET, Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3:2, (1-33), Online publication date: 19-Jun-2019.
  28. Thaler F, Moosbrugger S, Osuna C, Bianco M, Vogt H, Afanasyev A, Mosimann L, Fuhrer O, Schulthess T and Hoefler T Porting the COSMO Weather Model to Manycore CPUs Proceedings of the Platform for Advanced Scientific Computing Conference, (1-11)
  29. Sakai Y, Mendez S, Strandenes H, Ohlerich M, Pasichnyk I, Allalen M and Manhart M Performance Optimisation of the Parallel CFD Code MGLET across Different HPC Platforms Proceedings of the Platform for Advanced Scientific Computing Conference, (1-13)
  30. Laso R, Rivera F and Cabaleiro J Influence of Architectural Features of the SNC-4 Mode of the Intel Xeon Phi KNL on Matrix Multiplication Computational Science – ICCS 2019, (483-490)
  31. Horro M, Kandemir M, Pouchet L, Rodríguez G and Touriño J Effect of Distributed Directories in Mesh Interconnects Proceedings of the 56th Annual Design Automation Conference 2019, (1-6)
  32. Mencagli G, França F, Bentes C, Justen Marzulo L, Lima Pilla M, Wyrzykowski R, Deelman E, Szustak L and Bratek P (2020). Performance portable parallel programming of heterogeneous stencils across shared-memory platforms with modern Intel processors, International Journal of High Performance Computing Applications, 33:3, (534-553), Online publication date: 1-May-2019.
  33. Miao H, Jeon M, Pekhimenko G, McKinley K and Lin F StreamBox-HBM Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, (167-181)
  34. Jaleel A, Ebrahimi E and Duncan S (2019). DUCATI, ACM Transactions on Architecture and Code Optimization, 16:1, (1-24), Online publication date: 8-Mar-2019.
  35. Liu Y, Hong D, Wu J, Fu S and Hsu W (2019). Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary Translation, ACM Transactions on Architecture and Code Optimization, 16:1, (1-24), Online publication date: 8-Mar-2019.
  36. Park G, Rho S, Kim J and Nam D (2019). Towards optimal scheduling policy for heterogeneous memory architecture in many-core system, Cluster Computing, 22:1, (121-133), Online publication date: 1-Mar-2019.
  37. Kim R, Choi J and Lee M Optimizing parallel GEMM routines using auto-tuning with Intel AVX-512 Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, (101-110)
  38. Zhao H, Chen Q, Qiu Y, Wu M, Shen Y, Leng J, Li C and Guo M (2018). Bandwidth and Locality Aware Task-stealing for Manycore Architectures with Bandwidth-Asymmetric Memory, ACM Transactions on Architecture and Code Optimization, 15:4, (1-26), Online publication date: 31-Dec-2019.
  39. Lim R, Lee Y, Kim R and Choi J (2018). An implementation of matrix-matrix multiplication on the Intel KNL processor with AVX-512, Cluster Computing, 21:4, (1785-1795), Online publication date: 1-Dec-2018.
  40. Huang H and Chow E Accelerating quantum chemistry with vectorized and batched integrals Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, (1-14)
  41. Malakar P, Munson T, Knight C, Vishwanath V and Papka M Topology-aware space-shared co-analysis of large-scale molecular dynamics simulations Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, (1-15)
  44. Peng Z, Powell A, Wu B, Bicer T and Ren B Graphphi Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, (1-14)
  45. Ruhela A, Subramoni H, Chakraborty S, Bayatpour M, Kousha P and Panda D Efficient Asynchronous Communication Progress for MPI without Dedicated Resources Proceedings of the 25th European MPI Users' Group Meeting, (1-11)
  46. Bouter A, Alderliesten T, Bel A, Witteveen C and Bosman P Large-scale parallelization of partial evaluations in evolutionary algorithms for real-world problems Proceedings of the Genetic and Evolutionary Computation Conference, (1199-1206)
  47. Jin Z and Finkel H Nuclear Reactor Simulation on OpenCL FPGA Proceedings of the International Workshop on OpenCL, (1-9)
  48. Kynigos M, Navaridas J, Plana L and Furber S Network-on-chip evaluation for a novel neural architecture Proceedings of the 15th ACM International Conference on Computing Frontiers, (216-219)
  49. Stpiczyński P (2018). Language-based vectorization and parallelization using intrinsics, OpenMP, TBB and Cilk Plus, The Journal of Supercomputing, 74:4, (1461-1472), Online publication date: 1-Apr-2018.
  50. Jia Z, Zlateski A, Durand F and Li K (2018). Optimizing N-dimensional, winograd-based convolution for manycore CPUs, ACM SIGPLAN Notices, 53:1, (109-123), Online publication date: 23-Mar-2018.
  51. Jia Z, Zlateski A, Durand F and Li K Optimizing N-dimensional, winograd-based convolution for manycore CPUs Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (109-123)
  52. Cai Y, Li G and Liu W (2018). Parallelized implementation of an explicit finite element method in many integrated core (MIC) architecture, Advances in Engineering Software, 116:C, (50-59), Online publication date: 1-Feb-2018.
  53. Stpiczyński P (2018). Vectorized algorithm for multidimensional Monte Carlo integration on modern GPU, CPU and MIC architectures, The Journal of Supercomputing, 74:2, (936-952), Online publication date: 1-Feb-2018.
  54. Lim R, Lee Y, Kim R and Choi J OpenMP-based parallel implementation of matrix-matrix multiplication on the intel knights landing Proceedings of Workshops of HPC Asia, (63-66)
  55. Malakar P, Knight C, Munson T, Vishwanath V and Papka M Scalable In situ Analysis of Molecular Dynamics Simulations Proceedings of the In Situ Infrastructures on Enabling Extreme-Scale Analysis and Visualization, (1-6)
  56. Li A, Liu W, Kristensen M, Vinter B, Wang H, Hou K, Marquez A and Song S Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (1-14)
  57. Cheng X, He B, Du X and Lau C A Study of Main-Memory Hash Joins on Many-core Processor Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, (657-666)
  58. Yu X, Hughes C, Satish N, Mutlu O and Devadas S Banshee Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, (1-14)
  59. Tang X, Kislal O, Kandemir M and Karakoy M Data movement aware computation partitioning Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, (730-744)
  60. Egawa R, Komatsu K, Momose S, Isobe Y, Musa A, Takizawa H and Kobayashi H (2017). Potential of a modern vector supercomputer for practical applications, The Journal of Supercomputing, 73:9, (3948-3976), Online publication date: 1-Sep-2017.
  61. Grelck C and Sarris N Towards Compiling SAC for the Xeon Phi Knights Corner and Knights Landing Architectures Proceedings of the 29th Symposium on the Implementation and Application of Functional Programming Languages, (1-12)
  62. Holmen J, Humphrey A, Sunderland D and Berzins M Improving Uintah's Scalability Through the Use of Portable Kokkos-Based Data Parallel Tasks Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact, (1-8)
  63. Ouermi T, Knoll A, Kirby R and Berzins M OpenMP 4 Fortran Modernization of WSM6 for KNL Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact, (1-8)
  64. Arora R and Koesterke L Interactive Code Adaptation Tool for Modernizing Applications for Intel Knights Landing Processors Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact, (1-8)
  65. Zapletal J, Merta M and Malý L (2017). Boundary element quadrature schemes for multi- and many-core architectures, Computers & Mathematics with Applications, 74:1, (157-173), Online publication date: 1-Jul-2017.
  66. Kronbichler M, Kormann K, Pasichnyk I and Allalen M Fast Matrix-Free Discontinuous Galerkin Kernels on Modern Computer Architectures High Performance Computing, (237-255)
  67. Zlateski A and Seung H Compile-time optimized and statically scheduled N-D convnet primitives for multi-core and many-core (Xeon Phi) CPUs Proceedings of the International Conference on Supercomputing, (1-10)
  68. Lawson G, Sosonkina M, Ezer T and Shen Y Empirical Mode Decomposition for Modeling of Parallel Applications on Intel Xeon Phi Processors Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, (1000-1008)
  69. Zivanovic D, Pavlovic M, Radulovic M, Shin H, Son J, Mckee S, Carpenter P, Radojković P and Ayguadé E (2017). Main Memory in HPC, ACM Transactions on Architecture and Code Optimization, 14:1, (1-26), Online publication date: 31-Mar-2017.
  70. Khaldi D and Chapman B Towards automatic HBM allocation using LLVM Proceedings of the Third Workshop on LLVM Compiler Infrastructure in HPC, (12-20)
  71. Lawson G, Sundriyal V, Sosonkina M and Shen Y Runtime power limiting of parallel applications on Intel Xeon Phi processors Proceedings of the 4th International Workshop on Energy Efficient Supercomputing, (39-45)
Contributors
  • University of Massachusetts Lowell
  • Intel Corporation
  • Intel Corporation
