Scientific workflows have become an increasingly popular computing paradigm that lets scientists design and execute complex, distributed scientific processes, enabling and accelerating scientific discovery. Although several scientific workflow management systems (SWFMSs) have been developed, there is a great need for an integrated scientific workflow system that supports the design and execution of higher-level scientific workflows, which integrate heterogeneous scientific workflows enacted by existing SWFMSs. On the one hand, science is becoming increasingly collaborative, requiring an integrated solution that combines the features and capabilities of different SWFMSs, each of which is typically developed and optimized for a single discipline. On the other hand, such an integrated environment can immediately leverage the existing and emerging techniques and strengths of various SWFMSs and their supported execution environments, such as Cluster, Grid, and Cloud. The main contributions of this dissertation are: (1) we propose a scientific workflow system, called GENOMEFLOW, to design, develop, and execute higher-level scientific workflows whose tasks are themselves scientific workflows enacted by existing SWFMSs; (2) we propose a workflow scheduling algorithm, called GSA, to enable the parallel execution of such heterogeneous scientific workflows in their native heterogeneous environments; and (3) we implemented GENOMEFLOW for the life science community and developed several GENOMEFLOW scientific workflows to demonstrate the capabilities of our system for genome data analysis applications.