DOI: 10.1145/1646468.1646481
research-article

Ensemble dispatching on an IBM Blue Gene/L for a bioinformatics knowledge environment

Published: 16 November 2009

ABSTRACT

This paper discusses our work on supporting the processing of a large number of short tasks, carried out as part of our development of a collaborative bioinformatics knowledge environment for structural biologists, environmental microbiologists, and evolutionary biologists. We have designed and implemented a new ensemble-based task dispatching system and deployed it on a Blue Gene/L system in conjunction with the Blue Gene's High Throughput Computing (HTC) capability. Unlike our prior general database-backed HTC task dispatching system, the ensemble-based task dispatching system is able to efficiently process and dispatch large numbers of very short tasks to over a thousand cores. We also investigate the scalability of HTC on the IBM Blue Gene/L in general, identifying and eliminating processor-reboot inefficiencies for very short tasks for specific applications, making the Blue Gene/L a feasible processing system for this bioinformatics workload.


Published in

MTAGS '09: Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
November 2009
131 pages
ISBN: 9781605587141
DOI: 10.1145/1646468

Copyright © 2009 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
