ABSTRACT
For the sake of software compatibility, simulations are often parallelized without much code rewriting. Performances can be further improved by optimizing codes so that to use the maximum power offered by parallel architectures. While this approach can provide some speed-up, performance of parallelized codes can be strongly limited a priori because traditional algorithms have been designed for sequential technologies. Thus, additional increase of performance should ultimately rely on some redesign of algorithms.
Here, we redesign an algorithm that has traditionally been used to simulate the folding properties of polymers. We address the issue of performance in the context of biological applications, more particularly in the active field of chromosome modelling. Due to the strong confinement of chromosomes in the cells, simulation of their motion is slowed down by the laborious search for the next valid states to progress. Our redesign, that we call the Possible Futures Algorithm (PFA), relies on the parallel computation of possible evolutions of the same state, which effectively increases the probability to obtain a valid state at each step. We apply PFA on a GPU-based architecture, allowing us to optimally reduce the latency induced by the computation overhead of possible futures. We show that compared to the initial sequential model the acceptance rate of new states significantly increases without impacting the execution time. In particular, the stronger the confinement of the chromosome, the more efficient PFA becomes, making our approach appealing for biological applications.
While most of our results were obtained using Fermi architecture GPUs from NVIDIA, we highlight improved performance on the cutting-edge Kepler architecture K20 GPUs.
- G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, spring joint computer conference, pages 483--485. ACM, 1967. Google ScholarDigital Library
- L. Dagum and R. Menon. Openmp: an industry standard api for shared-memory programming. Computational Science & Engineering, IEEE, 5(1):46--55, 1998. Google ScholarDigital Library
- P.-G. de Gennes. Scaling concept in polymer physics. IthacaNY: Cornell University Press, third edition, 1988.Google Scholar
- J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008. Google ScholarDigital Library
- G. Fudenberg and L. A. Mirny. Higher-order chromatin structure: bridging physics and biology. Current Opinion in Genetics & Development, 22(2):115--124, 2012.Google ScholarCross Ref
- J. L. Gustafson. Reevaluating amdahl's law. Communications of the ACM, 31(5):532--533, 1988. Google ScholarDigital Library
- B. He, W. Fang, Q. Luo, N. K. Govindaraju, and T. Wang. Mars: a mapreduce framework on graphics processors. In Proceedings of the 17th international conference on Parallel architectures and compilation techniques, pages 260--269. ACM, 2008. Google ScholarDigital Library
- M. D. Hill and M. R. Marty. Amdahl's law in the multicore era. Computer, 41(7):33--38, 2008. Google ScholarDigital Library
- I. Junier, R. K. Dale, C. Hou, F. Képès, and A. Dean. CTCF-mediated transcriptional regulation through cell type-specific chromosome organization in the -globin locus. Nucleic Acids Research, 2012.Google ScholarCross Ref
- I. Junier, O. Martin, and F. Képès. Spatial and topological organization of dna chains induced by gene co-localization. PLoS computational biology, 6(2):e1000678, 2010.Google Scholar
- J. Langowski and D. Heermann. Computational modeling of the chromatin fiber. Seminars in Cell & Developmental Biology, 18(5):659--667, 2007.Google ScholarCross Ref
- J. K. Lee and A. J. Smith. Branch prediction strategies and branch target buffer design. Computer, 17(1):6--22, 1984. Google ScholarDigital Library
- Message Passing Interface Forum. Document for a standard message-passing interface. Technical report, University of Tennessee, 1993. Google ScholarDigital Library
- NVIDIA. Nvidia's next generation cuda compute architecture: Kepler gk110. Technical report, 2012.Google Scholar
- D. Reith, L. Mirny, and P. Virnau. Gpu based molecular dynamics simulations of polymer rings in concentrated solution: Structure and scaling. Progress of Theoretical Physics Supplement, 191:135--145, 2011.Google ScholarCross Ref
- S. Ryoo, C. I. Rodrigues, S. S. Baghsorkhi, S. S. Stone, D. B. Kirk, and W.-m. W. Hwu. Optimization principles and application performance evaluation of a multithreaded gpu using cuda. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pages 73--82. ACM, 2008. Google ScholarDigital Library
- J. E. Smith. A study of branch prediction strategies. In 25 years of the international symposia on Computer architecture (selected papers), pages 202--215. ACM, 1998. Google ScholarDigital Library
- T. R. Strick, M.-N. Dessinges, G. Charvin, N. H. Dekker, J.-F. Allemand, D. Bensimon, and V. Croquette. Stretching of macromolecules and proteins. Reports on Progress in Physics, 66:1, 2003.Google ScholarCross Ref
Index Terms
- Parallel stepwise stochastic simulation: harnessing GPUs to explore possible futures states of a chromosome folding model thanks to the possible futures algorithm (PFA)
Recommendations
Gillespie's Stochastic Simulation Algorithm on MIC coprocessors
To investigate the behavior of biochemical systems, many runs of Gillespie's Stochastic Simulation Algorithm (SSA) are generally needed, causing excessive computational costs on Central Processing Units (CPUs). Since all SSA runs are independent, the ...
GPU Acceleration for Simulating Massively Parallel Many-Core Platforms
Emerging massively parallel architectures such as a general-purpose processor plus many-core programmable accelerators are creating an increasing demand for novel methods to perform their architectural simulation. Most state-of-the-art simulation ...
Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting
GPGPU-5: Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing UnitsModern system-on-chips are evolving towards complex and heterogeneous platforms with general purpose processors coupled with massively parallel manycore accelerator fabrics (e.g. embedded GPUs). Platform developers are looking for efficient full-system ...
Comments