Characterizing workflow-based activity on a production e-infrastructure using provenance data.

S. Madougou, S. Shahand, M. Santcroos, B. van Schaik, A. Benabdelkader, A. van Kampen, S. Olabarriaga

Research output: Contribution to journal, Article, Academic, peer-review

8 Citations (Scopus)

Abstract

Grid computing and workflow management systems emerged as solutions to the challenges arising from processing and storing the sheer volumes of data generated by modern simulations and data acquisition devices. Workflow management systems usually document the workflow execution process either as structured provenance information or as log files. Provenance is recognized as an important feature of workflow management systems; however, there are still few reports on its use in practical cases. In this paper we present the provenance system implemented in our platform, and then use the information captured by this system during 8 months of platform operation to analyze platform usage and to perform multilevel error pattern analysis. We exploit this large amount of structured data, drawing on the explanatory potential of statistical approaches, to find properties of workflows, jobs and resources that are related to workflow failure. Such an analysis enables us to characterize workflow executions on the infrastructure and to understand workflow failures. The approach is generic and applicable to other e-infrastructures to gain insight into operational incidents.
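To make the idea of relating job or resource properties to failure more concrete, here is a minimal sketch, assuming a simplified provenance record with one outcome flag per job. The `JobRecord` type, its fields and the helper functions are hypothetical illustrations, not the platform's provenance schema. The chi-squared test of independence used here is one simple statistical option and is not claimed to be the method used in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    # Hypothetical, simplified provenance record for a single job.
    workflow_id: str
    resource: str  # e.g. the compute site that executed the job
    failed: bool


def failure_table(records):
    """Count [failed, succeeded] jobs per resource."""
    table = defaultdict(lambda: [0, 0])
    for r in records:
        table[r.resource][0 if r.failed else 1] += 1
    return dict(table)


def failure_rates(table):
    """Fraction of failed jobs per resource."""
    return {res: f / (f + s) for res, (f, s) in table.items()}


def chi_square(table):
    """Pearson chi-squared statistic for a (resources x {failed, succeeded}) table.

    Under independence of resource and outcome, the statistic is approximately
    chi-squared distributed with (number of resources - 1) degrees of freedom.
    """
    rows = list(table.values())
    col_totals = [sum(row[j] for row in rows) for j in range(2)]
    grand = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / grand
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return stat
```

For example, with 10 jobs on a hypothetical `site-A` (2 failed) and 10 on `site-B` (6 failed), the failure rates are 0.2 and 0.6 and the statistic is 10/3 (about 3.33). With one degree of freedom this is below the 5% critical value of 3.841, so such a small sample would not by itself show a significant association between resource and failure.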
Original language: English
Pages (from-to): 1931-1942
Journal: Future Generation Computer Systems
Volume: 29
Issue number: 8
Publication status: Published - 2013
