Techniques for efficient execution of large-scale scientific workflows in distributed environments

Selim Kalayci, Florida International University

Abstract

Scientific exploration in many fields demands heavy use of computational resources for large-scale, in-depth analysis. The complexity or sheer scale of a computational study can sometimes be encapsulated in a workflow made up of numerous dependent components. Because a scientific workflow is decomposable and parallelizable, its components may be mapped over a distributed resource infrastructure to reduce time to results. However, the resource infrastructure may be heterogeneous, dynamic, and under diverse administrative control. Workflow management tools help manage various aspects of the lifecycle of such complex applications. One fundamental aspect that has to be handled as smoothly and efficiently as possible is the run-time coordination of workflow activities (i.e., workflow orchestration). Our efforts in this study focus on improving the workflow orchestration process in such dynamic and distributed resource environments. We tackle three main aspects of this process and provide contributions in each of them. Our first contribution increases scalability and site autonomy in situations where the mapped components of a workflow span several heterogeneous administrative domains. We devise and implement a generic decentralization framework for orchestration of workflows under such conditions. Our second contribution addresses the issues that arise from the dynamic nature of such environments. We provide generic adaptation mechanisms that are highly transparent and substantially less intrusive with respect to the rest of the workflow in execution. Our third contribution improves the efficiency of orchestration of large-scale parameter-sweep workflows. By exploiting their specific characteristics, we provide generic optimization patterns that are applicable to most instances of such workflows. We also discuss the implementation issues and details that arise in each of these contributions.

Subject Area

Engineering|Computer science

Recommended Citation

Kalayci, Selim, "Techniques for efficient execution of large-scale scientific workflows in distributed environments" (2014). ProQuest ETD Collection for FIU. AAI3705075.
https://digitalcommons.fiu.edu/dissertations/AAI3705075
