Programming Massively Parallel Processors: A Hands-on Approach
February 2010
Publisher:
  • Morgan Kaufmann Publishers Inc.
  • 340 Pine Street, Sixth Floor
  • San Francisco
  • CA
  • United States
ISBN: 978-0-12-381472-2
Published: 05 February 2010
Pages: 280
Abstract

Multi-core processors are no longer the future of computing; they are the present-day reality. A typical mass-produced CPU features multiple processor cores, while a GPU (Graphics Processing Unit) may have hundreds or even thousands of cores. With the rise of multi-core architectures has come the need to teach advanced programmers a new and essential skill: how to program massively parallel processors.

Programming Massively Parallel Processors: A Hands-on Approach shows both student and professional alike the basic concepts of parallel programming and GPU architecture. Various techniques for constructing parallel programs are explored in detail. Case studies demonstrate the development process, which begins with computational thinking and ends with effective and efficient parallel programs.

  • Teaches computational thinking and problem-solving techniques that facilitate high-performance parallel computing.
  • Utilizes CUDA (Compute Unified Device Architecture), NVIDIA's software development tool created specifically for massively parallel environments.
  • Shows you how to achieve both high performance and high reliability using the CUDA programming model as well as OpenCL.

Cited By

  1. Faqir-Rhazoui Y and García C (2023). Exploring the performance and portability of the k-means algorithm on SYCL across CPU and GPU architectures, The Journal of Supercomputing, 79:16, (18480-18506), Online publication date: 1-Nov-2023.
  2. Valdez A, Wee F, Odasco A, Rey M and Cabarle F (2023). GPU simulations of spiking neural P systems on modern web browsers, Natural Computing: an international journal, 22:1, (171-180), Online publication date: 1-Mar-2023.
  3. Khairy M, Alawneh A, Barnes A and Rogers T SIMR: Single Instruction Multiple Request Processing for Energy-Efficient Data Center Microservices Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, (441-463)
  4. Gulo C, Sementille A and Tavares J (2022). Optimizing a medical image registration algorithm based on profiling data for real-time performance, Multimedia Tools and Applications, 81:2, (2603-2620), Online publication date: 1-Jan-2022.
  5. Strubytska I and Strubytskyi P (2021). Efficiency of Parallelization Using GPU in Discrete Dynamic Models Construction Process, SN Computer Science, 2:3, Online publication date: 1-May-2021.
  6. Li L and Chen X (2019). Optimization of kernel learning algorithm based on parallel architecture, Computing, 102:8, (1881-1907), Online publication date: 1-Aug-2020.
  7. Biswas B, Ghosh S and Ghosh A (2019). A novel CT image segmentation algorithm using PCNN and Sobolev gradient methods in GPU frameworks, Pattern Analysis & Applications, 23:2, (837-854), Online publication date: 1-May-2020.
  8. Do C, Choi H, Chung S and Kim C (2019). A novel warp scheduling scheme considering long-latency operations for high-performance GPUs, The Journal of Supercomputing, 76:4, (3043-3062), Online publication date: 1-Apr-2020.
  9. Sang J, Lee C, Rego V and King C (2019). Experiences with implementing parallel discrete-event simulation on GPU, The Journal of Supercomputing, 75:8, (4132-4149), Online publication date: 1-Aug-2019.
  10. Kakooei M and Tabatabaei A (2019). A Fast Parallel GPS Acquisition Algorithm Based on Hybrid GPU and Multi-core CPU, Wireless Personal Communications: An International Journal, 104:4, (1355-1366), Online publication date: 1-Feb-2019.
  11. Jing S, Li G, Zeng K, Pan W and Liu C (2018). Efficient parallel algorithm for computing rough set approximation on GPU, Soft Computing - A Fusion of Foundations, Methodologies and Applications, 22:22, (7553-7569), Online publication date: 1-Nov-2018.
  12. Nie B, Yang L, Jog A and Smirni E Fault site pruning for practical reliability analysis of GPGPU applications Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, (749-761)
  13. Li X, Wu C, Dong S, Dy J and Kaeli D Interactive kernel dimension alternative clustering on GPUs Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, (885-892)
  14. Bliss N, Duff T, Leykin A and Sommars J Monodromy Solver Proceedings of the 2018 ACM International Symposium on Symbolic and Algebraic Computation, (87-94)
  15. Cuenca C, González E, Trujillo A, Esclarín J, Mazorra L, Alvarez L, Martínez-Mera J, Tahoces P and Carreira J (2018). Fast and accurate circle tracking using active contour models, Journal of Real-Time Image Processing, 14:4, (793-802), Online publication date: 1-Apr-2018.
  16. Jiang B (2018). Real-time multi-resolution edge detection with pattern analysis on graphics processing unit, Journal of Real-Time Image Processing, 14:2, (293-321), Online publication date: 1-Feb-2018.
  17. Wan H, Gao X, Long X and Jiang B Introducing parallel computing concepts in computer system related courses 2017 IEEE Frontiers in Education Conference (FIE), (1-7)
  18. Salvador J, Ruiz Z and Garcia-Rodriguez J (2017). A Review of Infrastructures to Process Big Multimedia Data, International Journal of Computer Vision and Image Processing, 7:3, (54-64), Online publication date: 1-Jul-2017.
  19. Klionskiy D, Kaplun D, Kupriyanov M, Dorokhov A, Geppener V and Golubkov A (2017). Vibrational and hydroacoustic signal processing in the frequency domain and its software-hardware implementation, Pattern Recognition and Image Analysis, 27:3, (588-598), Online publication date: 1-Jul-2017.
  20. (2017). High Performance Computing education in an Indian engineering institute, Journal of Parallel and Distributed Computing, 105:C, (73-82), Online publication date: 1-Jul-2017.
  21. Georgiev P, Lane N, Mascolo C and Chu D Accelerating Mobile Audio Sensing Algorithms through On-Chip GPU Offloading Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, (306-318)
  22. Abdelkafi O, Idoumghar L, Lepagnot J and Paillaud J MEmory Genetic Algorithm Hybridized for Zeolites 2017 IEEE Congress on Evolutionary Computation (CEC), (233-240)
  23. Wang G, Cai X, Ju L, Zang C, Zhao M and Jia Z Shared last-level cache management for GPGPUs with hybrid main memory Proceedings of the Conference on Design, Automation & Test in Europe, (25-30)
  24. Spataro D, D'Ambrosio D, Filippone G, Rongo R, Spataro W and Marocco D (2017). The new SCIARA-fv3 numerical model and acceleration by GPGPU strategies, International Journal of High Performance Computing Applications, 31:2, (163-176), Online publication date: 1-Mar-2017.
  25. Shehab E, Algergawy A and Sarhan A (2017). Accelerating relational database operations using both CPU and GPU co-processor, Computers and Electrical Engineering, 57:C, (69-80), Online publication date: 1-Jan-2017.
  26. Fernández E, Aguerre J, Beckers B and Besuievsky G Optimizing window shape for daylighting Proceedings of the Eurographics Workshop on Urban Data Modelling and Visualisation, (37-43)
  27. Yoon M, Kim K, Lee S, Ro W and Annavaram M (2016). Virtual thread, ACM SIGARCH Computer Architecture News, 44:3, (609-621), Online publication date: 12-Oct-2016.
  28. Torun M, Yilmaz O and Akansu A (2016). FPGA, GPU, and CPU implementations of Jacobi algorithm for eigenanalysis, Journal of Parallel and Distributed Computing, 96:C, (172-180), Online publication date: 1-Oct-2016.
  29. Maximo A (2016). Efficient finite impulse response filters in massively-parallel recursive systems, Journal of Real-Time Image Processing, 12:3, (603-611), Online publication date: 1-Oct-2016.
  30. Su L, Huang Y, Gibeaut J and Li L (2016). The index array approach and the dual tiled similarity algorithm for UAS hyper-spatial image processing, Geoinformatica, 20:4, (859-878), Online publication date: 1-Oct-2016.
  31. Gounalakis O, Lytos A and Dasygenis M Leveraging Parallelization Opportunities by an Online CAD Tool Proceedings of the SouthEast European Design Automation, Computer Engineering, Computer Networks and Social Media Conference, (25-31)
  32. Silberstein M, Kim S, Huh S, Zhang X, Hu Y, Wated A and Witchel E (2016). GPUnet, ACM Transactions on Computer Systems, 34:3, (1-31), Online publication date: 17-Sep-2016.
  33. Cruz L and Ramos E (2016). General Template Units for the Finite Volume Method in Box-Shaped Domains, ACM Transactions on Mathematical Software, 43:1, (1-32), Online publication date: 29-Aug-2016.
  34. Khan A, Al-Mouhamed M, Fatayer A and Mohammad N (2016). Optimizing the Matrix Multiplication Using Strassen and Winograd Algorithms with Limited Recursions on Many-Core, International Journal of Parallel Programming, 44:4, (801-830), Online publication date: 1-Aug-2016.
  35. Alhadeff A, Leon S, Celes W and Paulino G (2016). Massively parallel adaptive mesh refinement and coarsening for dynamic fracture simulations, Engineering with Computers, 32:3, (533-552), Online publication date: 1-Jul-2016.
  36. Jog A, Kayiran O, Pattnaik A, Kandemir M, Mutlu O, Iyer R and Das C (2016). Exploiting Core Criticality for Enhanced GPU Performance, ACM SIGMETRICS Performance Evaluation Review, 44:1, (351-363), Online publication date: 30-Jun-2016.
  37. Rościszewski P, Czarnul P, Lewandowski R and Schally-Kacprzak M (2016). KernelHive, Concurrency and Computation: Practice & Experience, 28:9, (2586-2607), Online publication date: 25-Jun-2016.
  38. Yoon M, Kim K, Lee S, Ro W and Annavaram M Virtual thread Proceedings of the 43rd International Symposium on Computer Architecture, (609-621)
  39. Abdelfattah A, Keyes D and Ltaief H (2016). KBLAS, ACM Transactions on Mathematical Software, 42:3, (1-31), Online publication date: 15-Jun-2016.
  40. Jog A, Kayiran O, Pattnaik A, Kandemir M, Mutlu O, Iyer R and Das C Exploiting Core Criticality for Enhanced GPU Performance Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, (351-363)
  41. Lamas-Rodríguez J, Heras D, Argüello F, Kainmueller D, Zachow S and Bóo M (2016). GPU-accelerated level-set segmentation, Journal of Real-Time Image Processing, 12:1, (15-29), Online publication date: 1-Jun-2016.
  42. Kawakatsu T, Kinoshita A, Takasu A and Adachi J Divide-and-Conquer Parallelism for Learning Mixture Models Transactions on Large-Scale Data- and Knowledge-Centered Systems XXVIII - Volume 9940, (23-47)
  43. Codreanu V, Dröge B, Williams D, Yasar B, Yang P, Liu B, Dong F, Surinta O, Schomaker L, Roerdink J and Wiering M (2016). Evaluating automatically parallelized versions of the support vector machine, Concurrency and Computation: Practice & Experience, 28:7, (2274-2294), Online publication date: 1-May-2016.
  44. Pereira P, Albuquerque H, Marques H, Silva I, Carvalho C, Cordeiro L, Santos V and Ferreira R Verifying CUDA programs using SMT-based context-bounded model checking Proceedings of the 31st Annual ACM Symposium on Applied Computing, (1648-1653)
  45. Zeno L, Mendelson A and Silberstein M GPUpIO Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, (63-71)
  46. Mansouri F, Huet S and Houzet D (2016). A domain-specific high-level programming model, Concurrency and Computation: Practice & Experience, 28:3, (750-767), Online publication date: 10-Mar-2016.
  47. Sengupta P, Nguyen J, Kwan J, Menon P, Heien E and Rundle J (2015). Accelerating earthquake simulations on general-purpose graphics processors, Concurrency and Computation: Practice & Experience, 27:17, (5460-5471), Online publication date: 10-Dec-2015.
  48. Neelima B and Li J Introducing high performance computing concepts into engineering undergraduate curriculum Proceedings of the Workshop on Education for High-Performance Computing, (1-8)
  49. Jo Y, Kim S and Bae D Efficient Sparse Matrix Multiplication on GPU for Large Social Network Analysis Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, (1261-1270)
  50. Agosta G, Barenghi A, Di Federico A and Pelosi G (2015). OpenCL performance portability for general-purpose computation on graphics processor units, Concurrency and Computation: Practice & Experience, 27:14, (3633-3660), Online publication date: 25-Sep-2015.
  51. Soares T, Xavier M, Pigozzo A, Campos R, Santos R and Lobosco M Performance Evaluation of a Human Immune System Simulator on a GPU Cluster Proceedings of the 13th International Conference on Parallel Computing Technologies - Volume 9251, (458-468)
  52. Bailey M Fundamentals seminar ACM SIGGRAPH 2015 Courses, (1-129)
  53. Grelck C Single Assignment C (SAC) Central European Functional Programming School, (207-282)
  54. Martínez-del-Amor M, García-Quismondo M, Macías-Ramos L, Valencia-Cabrera L, Riscos-Núñez A and Pérez-Jiménez M (2015). Simulating P Systems on GPU Devices, Fundamenta Informaticae, 136:3, (269-284), Online publication date: 1-Jul-2015.
  55. Orts-Escolano S, Garcia-Rodriguez J, Serra-Perez J, Jimeno-Morenilla A, Garcia-Garcia A, Morell V and Cazorla M (2015). 3D model reconstruction using neural gas accelerated on GPU, Applied Soft Computing, 32:C, (87-100), Online publication date: 1-Jul-2015.
  56. Moore N, Leeser M and King L (2015). Kernel Specialization Provides Adaptable GPU Code for Particle Image Velocimetry, IEEE Transactions on Parallel and Distributed Systems, 26:4, (1049-1058), Online publication date: 1-Apr-2015.
  57. Xue-Xin Liu, Kuangya Zhai, Zao Liu, Kai He, Tan S and Wenjian Yu (2015). Parallel Thermal Analysis of 3-D Integrated Circuits With Liquid Cooling on CPU-GPU Platforms, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23:3, (575-579), Online publication date: 1-Mar-2015.
  58. (2015). Novel 3D GPU based numerical parallel diffusion algorithms in cylindrical coordinates for health care simulation, Mathematics and Computers in Simulation, 109:C, (1-19), Online publication date: 1-Mar-2015.
  59. Sarkar S and Mitra S A Profile Guided Approach to Optimize Branch Divergence While Transforming Applications for GPUs Proceedings of the 8th India Software Engineering Conference, (176-185)
  60. Li C, Yang Y, Lin Z and Zhou H Automatic data placement into GPU on-chip memory resources Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (23-33)
  61. Khairy M, Zahran M and Wassal A Efficient utilization of GPGPU cache hierarchy Proceedings of the 8th Workshop on General Purpose Processing using GPUs, (36-47)
  62. Pereira M and Cruvinel P (2015). A model for soil computed tomography based on volumetric reconstruction, Wiener filtering and parallel processing, Computers and Electronics in Agriculture, 111:C, (151-163), Online publication date: 1-Feb-2015.
  63. Abdellah M, Eldeib A and Sharawi A (2015). High performance GPU-Based fourier volume rendering, Journal of Biomedical Imaging, 2015, (2-2), Online publication date: 1-Jan-2015.
  64. Kim G, Lee M, Jeong J and Kim J Multi-GPU System Design with Memory Networks Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, (484-495)
  65. White A and Lee S (2014). Derivation of optimal input parameters for minimizing execution time of matrix-based computations on a GPU, Parallel Computing, 40:10, (628-645), Online publication date: 1-Dec-2014.
  66. Souza A, Macedo M and Apolinário A Multi-frame adaptive non-rigid registration for markerless augmented reality Proceedings of the 13th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry, (7-16)
  67. Silberstein M, Ford B and Witchel E (2014). GPUfs, Communications of the ACM, 57:12, (68-79), Online publication date: 26-Nov-2014.
  68. Taskov B Optimizing large scale CUDA applications using input data specific optimizations Proceedings of the 11th European Conference on Visual Media Production, (1-6)
  69. Zhang Y and Eick C Novel clustering and analysis techniques for mining spatio-temporal data Proceedings of the 1st ACM SIGSPATIAL PhD Workshop, (1-5)
  70. Kim S, Huh S, Hu Y, Zhang X, Witchel E, Wated A and Silberstein M GPUnet Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation, (201-216)
  71. Jo Y, Kim S and Bae D GPU-based matrix multiplication methods for social networks analysis Proceedings of the 2014 Conference on Research in Adaptive and Convergent Systems, (309-313)
  72. Dai Y, He D, Fang Y and Yang L (2014). Accelerating 2D orthogonal matching pursuit algorithm on GPU, The Journal of Supercomputing, 69:3, (1363-1381), Online publication date: 1-Sep-2014.
  73. Mansouri F, Huet S and Houzet D A Visual Programming Model to Implement Coarse-Grained DSP Applications on Parallel and Heterogeneous Clusters Revised Selected Papers, Part I, of the Euro-Par 2014 International Workshops on Parallel Processing - Volume 8805, (141-152)
  74. Jablin J, Jablin T, Mutlu O and Herlihy M Warp-aware trace scheduling for GPUs Proceedings of the 23rd international conference on Parallel architectures and compilation, (163-174)
  75. Silberstein M (2014). GPUs: High-performance Accelerators for Parallel Applications, Ubiquity, 2014:August, (1-13), Online publication date: 1-Aug-2014.
  76. Olaya J and Romero R Runtime Pipeline Scheduling System for Heterogeneous Architectures Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, (1-7)
  77. Chaparala A, Novoa C and Qasem A A SIMD Solution for the Quadratic Assignment Problem with GPU Acceleration Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, (1-8)
  78. Chang L, El-Araby E, Dang V and Dao L (2014). GPU acceleration of nonlinear diffusion tensor estimation using CUDA and MPI, Neurocomputing, 135:C, (328-338), Online publication date: 5-Jul-2014.
  79. Oden L, Klenk B and Fröning H Energy-efficient collective reduce and allreduce operations on distributed GPUs Proceedings of the 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, (483-492)
  80. Whetstone B, Limpasuvan V and Larkins D GPU Acceleration of the Advanced Regional Prediction System (ARPS) Proceedings of the 2014 ACM Southeast Regional Conference, (1-6)
  81. Gomez L, Cappello F, Carro L, DeBardeleben N, Fang B, Gurumurthi S, Pattabiraman K, Rech P and Reorda M GPGPUs Proceedings of the conference on Design, Automation & Test in Europe, (1-9)
  82. Jog A, Bolotin E, Guz Z, Parker M, Keckler S, Kandemir M and Das C Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications Proceedings of Workshop on General Purpose Processing Using GPUs, (1-8)
  83. Jog A, Bolotin E, Guz Z, Parker M, Keckler S, Kandemir M and Das C Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications Proceedings of Workshop on General Purpose Processing Using GPUs, (1-8)
  84. Ozsoy A, Swany M and Chauhan A (2014). Optimizing LZSS compression on GPGPUs, Future Generation Computer Systems, 30:C, (170-178), Online publication date: 1-Jan-2014.
  85. Niemeyer K and Sung C (2014). Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs, Journal of Computational Physics, 256:C, (854-871), Online publication date: 1-Jan-2014.
  86. Stanek S, Tavanapong W, Wong J, Oh J, Nawarathna R, Muthukudage J and de Groen P (2013). SAPPHIRE, Computer Methods and Programs in Biomedicine, 112:3, (407-421), Online publication date: 1-Dec-2013.
  87. Paz I, Hernández Gress N and González Mendoza M Pattern Recognition with Spiking Neural Networks Proceedings of the 12th Mexican International Conference on Advances in Soft Computing and Its Applications - Volume 8266, (279-288)
  88. Cassagnes A, Chen Y and Ohashi H Heterogeneous COS pricing of rainbow options Proceedings of the 6th Workshop on High Performance Computational Finance, (1-7)
  89. Kayıran O, Jog A, Kandemir M and Das C Neither more nor less Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, (157-166)
  90. Miranda N, Chávez E, Piccoli M and Reyes N Very Fast All k-Nearest Neighbors in Metric and Non Metric Spaces without Indexing Proceedings of the 6th International Conference on Similarity Search and Applications - Volume 8199, (300-311)
  91. Martínez-Zarzuela M, Gómez C, Díaz-Pernas F, Fernández A and Hornero R (2013). Cross-Approximate Entropy parallel computation on GPUs for biomedical signal analysis. Application to MEG recordings, Computer Methods and Programs in Biomedicine, 112:1, (189-199), Online publication date: 1-Oct-2013.
  92. Proctor A, Stevens C and Cho S GPU-Optimized Hybrid Neighbor/Cell List Algorithm for Coarse-Grained MD Simulations of Protein and RNA Folding and Assembly Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics, (633-640)
  93. Cinque L, Dondi P and Lombardi L Automatic selection of regions of interest in a video by a depth-color image matting Proceedings of the International Workshop on Video and Image Ground Truth in Computer Vision Applications, (1-8)
  94. Jog A, Kayiran O, Mishra A, Kandemir M, Mutlu O, Iyer R and Das C (2013). Orchestrated scheduling and prefetching for GPGPUs, ACM SIGARCH Computer Architecture News, 41:3, (332-343), Online publication date: 26-Jun-2013.
  95. Jog A, Kayiran O, Mishra A, Kandemir M, Mutlu O, Iyer R and Das C Orchestrated scheduling and prefetching for GPGPUs Proceedings of the 40th Annual International Symposium on Computer Architecture, (332-343)
  96. Rech P, Pilla L, Silvestri F, Navaux P and Carro L Neutron sensitivity and software hardening strategies for matrix multiplication and FFT on graphics processing units Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale, (13-20)
  97. Demir I and Westermann R Progressive high-quality response surfaces for visually guided sensitivity analysis Proceedings of the 15th Eurographics Conference on Visualization, (21-30)
  98. Calazan R, Nedjah N and de Macedo Mourelle L Three alternatives for parallel GPU-based implementations of high performance particle swarm optimization Proceedings of the 12th international conference on Artificial Neural Networks: advances in computational intelligence - Volume Part I, (241-252)
  99. Jog A, Kayiran O, Chidambaram Nachiappan N, Mishra A, Kandemir M, Mutlu O, Iyer R and Das C (2013). OWL, ACM SIGPLAN Notices, 48:4, (395-406), Online publication date: 23-Apr-2013.
  100. Jog A, Kayiran O, Chidambaram Nachiappan N, Mishra A, Kandemir M, Mutlu O, Iyer R and Das C (2013). OWL, ACM SIGARCH Computer Architecture News, 41:1, (395-406), Online publication date: 29-Mar-2013.
  101. Jog A, Kayiran O, Chidambaram Nachiappan N, Mishra A, Kandemir M, Mutlu O, Iyer R and Das C OWL Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems, (395-406)
  102. Prakash A, Chaudhury A and Ramachandran R (2013). Parallel simulation of population balance model-based particulate processes using multicore CPUs and GPUs, Modelling and Simulation in Engineering, 2013, (2-2), Online publication date: 1-Jan-2013.
  103. Khan M, Basu P, Rudy G, Hall M, Chen C and Chame J (2013). A script-based autotuning compiler system to generate high-performance CUDA code, ACM Transactions on Architecture and Code Optimization, 9:4, (1-25), Online publication date: 1-Jan-2013.
  104. Jiménez J and Ruiz de Miras J (2012). Fast box-counting algorithm on GPU, Computer Methods and Programs in Biomedicine, 108:3, (1229-1242), Online publication date: 1-Dec-2012.
  105. Rietmann M, Messmer P, Nissen-Meyer T, Peter D, Basini P, Komatitsch D, Schenk O, Tromp J, Boschi L and Giardini D Forward and adjoint simulations of seismic wave propagation on emerging large-scale GPU architectures Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-11)
  106. Li P, Li G and Gopalakrishnan G Parametric flows Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-10)
  107. Hawe G, Coates G, Wilson D and Crouch R (2012). Agent-based simulation for large-scale emergency response, ACM Computing Surveys, 45:1, (1-51), Online publication date: 1-Nov-2012.
  108. O'Rourke J and Burns J CUDA-Enabled Optimisation of Technical Analysis Parameters Proceedings of the 2012 IEEE/ACM 16th International Symposium on Distributed Simulation and Real Time Applications, (221-227)
  109. Mache J and Karavanic K (2012). Teaching parallelism with GPUS and a Game of life assignment, Journal of Computing Sciences in Colleges, 28:1, (200-202), Online publication date: 1-Oct-2012.
  110. Sabino T, Andrade P, Gonzales Clua E, Montenegro A and Pagliosa P A hybrid GPU rasterized and ray traced rendering pipeline for real time rendering of per pixel effects Proceedings of the 11th international conference on Entertainment Computing, (292-305)
  111. Tan J and Fu X RISE Proceedings of the 21st international conference on Parallel architectures and compilation techniques, (191-200)
  112. Steuwer M, Gorlatch S, Buß M and Breuer S Using the SkelCL library for high-level GPU programming of 2d applications Proceedings of the 18th international conference on Parallel processing workshops, (370-380)
  113. Cotronis Y, Konstantinidis E, Louka M and Missirlis N Parallel SOR for solving the convection diffusion equation using GPUs with CUDA Proceedings of the 18th international conference on Parallel Processing, (575-586)
  114. Zhang J, Kamga C, Gong H and Gruenwald L U2SOD-DB Proceedings of the ACM SIGKDD International Workshop on Urban Computing, (163-171)
  115. Joshi P, Bourges-Sévenier M, Russell K and Mo Z Graphics programming for the web ACM SIGGRAPH 2012 Courses, (1-75)
  116. Shao S, Liu X, Zhou M, Zhan J, Liu X, Chu Y and Chen H A GPU-based implementation of an enhanced GEP algorithm Proceedings of the 14th annual conference on Genetic and evolutionary computation, (999-1006)
  117. Qin A, Raimondo F, Forbes F and Ong Y An improved CUDA-based implementation of differential evolution on GPU Proceedings of the 14th annual conference on Genetic and evolutionary computation, (991-998)
  118. Wang K, Huai Y, Lee R, Wang F, Zhang X and Saltz J (2012). Accelerating pathology image data cross-comparison on CPU-GPU hybrid systems, Proceedings of the VLDB Endowment, 5:11, (1543-1554), Online publication date: 1-Jul-2012.
  119. Kim J, Seo S, Lee J, Nah J, Jo G and Lee J SnuCL Proceedings of the 26th ACM international conference on Supercomputing, (341-352)
  120. Sah S, Vanek J, Roh Y and Wasnik R GPU accelerated real time rotation, scale and translation invariant image registration method Proceedings of the 9th international conference on Image Analysis and Recognition - Volume Part I, (224-233)
  121. Igounet P, Alfaro P, Usera G and Ezzatti P GPU acceleration of the caffa3d.MB model Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part IV, (530-542)
  122. Rocha P, Xavier M, Pigozzo A, de M. Quintela B, Macedo G, dos Santos R and Lobosco M A three-dimensional computational model of the innate immune system Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I, (691-706)
  123. Calazan R, Nedjah N and de Macedo Mourelle L Swarm grid Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I, (148-160)
  124. Lee D, Dinov I, Dong B, Gutman B, Yanovsky I and Toga A (2012). CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms, Computer Methods and Programs in Biomedicine, 106:3, (175-187), Online publication date: 1-Jun-2012.
  125. Rao V, Agrawal N and Maity S C-DAC's efforts Proceedings of the ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way?, (1-4)
  126. Burkitt M, Walker D, Romano D and Fazeli A (2012). Constructing Complex 3D Biological Environments from Medical Imaging Using High Performance Computing, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9:3, (643-654), Online publication date: 1-May-2012.
  127. Bustamam A, Burrage K and Hamilton N (2012). Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Computing on GPU with CUDA and ELLPACK-R Sparse Format, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9:3, (679-692), Online publication date: 1-May-2012.
  128. Jaros J and Pospichal P A fair comparison of modern CPUs and GPUs running the genetic algorithm under the knapsack benchmark Proceedings of the 2012 European conference on Applications of Evolutionary Computation, (426-435)
  129. Duchowski A, Price M, Meyer M and Orero P Aggregate gaze visualization with real-time heatmaps Proceedings of the Symposium on Eye Tracking Research and Applications, (13-20)
  130. Fazackerley S, McAvoy S and Lawrence R GPU accelerated AES-CBC for database applications Proceedings of the 27th Annual ACM Symposium on Applied Computing, (873-878)
  131. Liu X, Tan S, Wang H and Yu H A GPU-accelerated envelope-following method for switching power converter simulation Proceedings of the Conference on Design, Automation and Test in Europe, (1349-1354)
  132. Liu X, Tan S and Wang H Parallel statistical analysis of analog circuits by GPU-accelerated graph-based approach Proceedings of the Conference on Design, Automation and Test in Europe, (852-857)
  133. Cai Y, Li G, Wang H, Zheng G and Lin S (2012). Development of parallel explicit finite element sheet forming simulation system based on GPU architecture, Advances in Engineering Software, 45:1, (370-379), Online publication date: 1-Mar-2012.
  134. Lupo C, Wood Z and Victorino C Cross teaching parallelism and ray tracing Proceedings of the 43rd ACM technical symposium on Computer Science Education, (523-528)
  135. Jaros J, Treeby B and Rendell A Use of multiple GPUs on shared memory multiprocessors for ultrasound propagation simulations Proceedings of the Tenth Australasian Symposium on Parallel and Distributed Computing - Volume 127, (43-52)
  136. Kraus J and Förster M Efficient AMG on heterogeneous systems Facing the Multicore-Challenge II, (133-146)
  137. Kawanami K and Fujimoto N GPU accelerated computation of the longest common subsequence Facing the Multicore-Challenge II, (84-95)
  138. Ivanov L (2012). The right balance, Journal of Computing Sciences in Colleges, 27:3, (115-121), Online publication date: 1-Jan-2012.
  139. ACM
    Bailey M and Cunningham S Introduction to computer graphics SIGGRAPH Asia 2011 Courses, (1-58)
  140. ACM
    Nehab D, Maximo A, Lima R and Hoppe H GPU-efficient recursive filtering and summed-area tables Proceedings of the 2011 SIGGRAPH Asia Conference, (1-12)
  141. ACM
    Narasiman V, Shebanow M, Lee C, Miftakhutdinov R, Mutlu O and Patt Y Improving GPU performance via large warps and two-level warp scheduling Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, (308-317)
  142. ACM
    Nehab D, Maximo A, Lima R and Hoppe H (2011). GPU-efficient recursive filtering and summed-area tables, ACM Transactions on Graphics, 30:6, (1-12), Online publication date: 1-Dec-2011.
  143. Barenghi A, Bertoni G, Breveglieri L, Pelosi G and Palomba A Fault attack to the elliptic curve digital signature algorithm with multiple bit faults Proceedings of the 4th international conference on Security of information and networks, (63-72)
  144. Meng J, Morozov V, Kumaran K, Vishwanath V and Uram T GROPHECY Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11)
  145. Zhang H, Yan T, Wong M and Patel S Accelerating aerial image simulation with GPU Proceedings of the International Conference on Computer-Aided Design, (178-184)
  146. Zhang J Speeding up large-scale geospatial polygon rasterization on GPGPUs Proceedings of the ACM SIGSPATIAL Second International Workshop on High Performance and Distributed Geographic Information Systems, (10-17)
  147. Cabarle F, Adorna H, Martínez-del-Amor M and Pérez-Jiménez M Spiking neural P system simulations on a high performance GPU platform Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II, (99-108)
  148. Nery A, Nedjah N, França F and Jozwiak L Massively parallel identification of intersection points for GPGPU ray tracing Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part II, (14-23)
  149. Samra S, El-Mahdy A, Gomaa W, Wada Y and Shoukry A Efficient parallel implementations of controlled optimization of traffic phases Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I, (270-281)
  150. Mitchell C, Mache J and Karavanic K Learning CUDA Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion, (201-202)
  151. Yi Y, Lai C, Petrov S and Keutzer K Efficient parallel CKY parsing on GPUs Proceedings of the 12th International Conference on Parsing Technologies, (175-185)
  152. Matuszak M, Miekisz J and Schreiber T Smooth conditional transition paths in dynamical Gaussian networks Proceedings of the 34th Annual German conference on Advances in artificial intelligence, (204-215)
  153. Burkitt M, Walker D, Romano D and Fazeli A Modelling sperm behaviour in a 3D environment Proceedings of the 9th International Conference on Computational Methods in Systems Biology, (141-149)
  154. Dondi P, Lombardi L and Cinque L RDVideo Proceedings of the 16th international conference on Image analysis and processing - Volume Part II, (158-167)
  155. Konstantinidis E and Cotronis Y Accelerating the red/black SOR method using GPUs with CUDA Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I, (589-598)
  156. Jung H, Yi Y and Ha S Automatic CUDA code synthesis framework for multicore CPU and GPU architectures Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I, (579-588)
  157. Kim J, Kim H, Lee J and Lee J (2011). Achieving a single compute device image in OpenCL for multiple GPUs, ACM SIGPLAN Notices, 46:8, (277-288), Online publication date: 7-Sep-2011.
  158. Filelis-Papadopoulos C, Gravvanis G, Matskanidis P and Giannoutakis K (2011). On the GPGPU parallelization issues of finite element approximate inverse preconditioning, Journal of Computational and Applied Mathematics, 236:3, (294-307), Online publication date: 1-Sep-2011.
  159. Cabarle F, Adorna H and Martínez M A spiking neural P system simulator based on CUDA Proceedings of the 12th international conference on Membrane Computing, (87-103)
  160. Pedemonte M, Alba E and Luna F Bitwise operations for GPU implementation of genetic algorithms Proceedings of the 13th annual conference companion on Genetic and evolutionary computation, (439-446)
  161. Pospichal P, Murphy E, O'Neill M, Schwarz J and Jaros J Acceleration of grammatical evolution using graphics processing units Proceedings of the 13th annual conference companion on Genetic and evolutionary computation, (431-438)
  162. Ernst D Preparing students for future architectures with an exploration of multi- and many-core performance Proceedings of the 16th annual joint conference on Innovation and technology in computer science education, (57-62)
  163. Wittek P and Darányi S Introducing scalable quantum approaches in language representation Proceedings of the 5th international conference on Quantum interaction, (2-12)
  164. Pacifici L, Nalli D, Skouteris D and Laganà A Time dependent quantum reactive scattering on GPU Proceedings of the 2011 international conference on Computational science and its applications - Volume Part III, (428-441)
  165. Passerat-Palmbach J, Mazel C and Hill D Pseudo-Random Number Generation on GP-GPU Proceedings of the 2011 IEEE Workshop on Principles of Advanced and Distributed Simulation, (1-8)
  166. Grelck C Single assignment C (SAC) high productivity meets high performance Proceedings of the 4th Summer School conference on Central European Functional Programming School, (207-278)
  167. Gou C and Gaydadjiev G Elastic pipeline Proceedings of the 8th ACM International Conference on Computing Frontiers, (1-11)
  168. Cárdenas-Montes M, Vega-Rodríguez M, Rodríguez-Vázquez J and Gómez-Iglesias A Effect of the block occupancy in GPGPU over the performance of particle swarm algorithm Proceedings of the 10th international conference on Adaptive and natural computing algorithms - Volume Part I, (310-319)
  169. Bußler M, Rick T, Kelle-Emden A, Hentschel B and Kuhlen T Interactive particle tracing in time-varying tetrahedral grids Proceedings of the 11th Eurographics conference on Parallel Graphics and Visualization, (71-80)
  170. Thall A Fast Mersenne prime testing on the GPU Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, (1-8)
  171. Kim J, Kim H, Lee J and Lee J Achieving a single compute device image in OpenCL for multiple GPUs Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, (277-288)
  172. Guo J, Thiyagalingam J and Scholz S Breaking the GPU programming barrier with the auto-parallelising SAC compiler Proceedings of the sixth workshop on Declarative aspects of multicore programming, (15-24)
  173. Thiyagalingam J, Goodman D, Schnabel J, Trefethen A and Grau V (2011). On the usage of GPUs for efficient motion estimation in medical image sequences, Journal of Biomedical Imaging, 2011, (1-15), Online publication date: 1-Jan-2011.
  174. Bailey M and Cunningham S Introduction to computer graphics ACM SIGGRAPH ASIA 2010 Courses, (1-100)
  175. Gopalakrishnan G and Kirby R Top ten ways to make formal methods for HPC practical Proceedings of the FSE/SDP workshop on Future of software engineering research, (137-142)
  176. Li G and Gopalakrishnan G Scalable SMT-based verification of GPU kernel functions Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering, (187-196)
  177. Zhang J, You S and Gruenwald L Indexing large-scale raster geospatial data using massively parallel GPGPU computing Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, (450-453)
  178. Zhang J Towards personal high-performance geospatial computing (HPC-G) Proceedings of the ACM SIGSPATIAL International Workshop on High Performance and Distributed Geographic Information Systems, (3-10)
  179. Anderson N, Mache J and Watson W Learning CUDA Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion, (183-188)
  180. Rudy G, Khan M, Hall M, Chen C and Chame J A programming language interface to describe transformations and code generation Proceedings of the 23rd international conference on Languages and compilers for parallel computing, (136-150)
  181. Zhu K, Butenuth M and d'Angelo P Comparison of dense stereo using CUDA Proceedings of the 11th European conference on Trends and Topics in Computer Vision - Volume Part II, (398-410)
  182. Breitbart J Static GPU threads and an improved scan algorithm Proceedings of the 2010 conference on Parallel processing, (373-380)
  183. Pigozzo A, Lobosco M and Dos Santos R Parallel implementation of a computational model of the human immune system Proceedings of the 2010 conference on Parallel processing, (217-224)
  184. Gladkov D, Tapia J and D'Souza R Preliminary work on graphics processing unit based direct simulation Monte Carlo Proceedings of the 2010 Conference on Grand Challenges in Modeling & Simulation, (59-65)
  185. van Werkhoven B, Maassen J and Seinstra F Towards user transparent parallel multimedia computing on GPU-Clusters Proceedings of the 2010 international conference on Computer Architecture, (28-39)
  186. Briseid S, Dokken T and Hagen T Heterogeneous spline surface intersections Proceedings of the 26th Spring Conference on Computer Graphics, (141-148)