Scientific workflow system

A scientific workflow system is a specialized form of a workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or workflow, in a scientific application.[1]

Applications

Distributed scientists can collaborate on large-scale scientific experiments and knowledge discovery applications using distributed systems of computing resources, data sets, and devices. Scientific workflow systems play an important role in enabling this vision.

More specialized scientific workflow systems provide a visual programming front end enabling users to easily construct their applications as a visual graph by connecting nodes together, and tools have also been developed to build such applications in a platform-independent manner.[2] Each directed edge in the graph of a workflow typically represents a connection from the output of one application to the input of the next. A sequence of such edges may be called a pipeline.
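The graph view described above can be illustrated with a minimal sketch in Python. It is not taken from any particular workflow system: the function `run_workflow`, the step names, and the convention that each step receives a list of upstream outputs are all assumptions made for illustration. Each `(source, target)` pair plays the role of a directed edge carrying one step's output to the next step's input.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, edges, inputs):
    """Run every step after all of its upstream steps have finished.

    steps:  name -> function taking a list of upstream outputs
    edges:  (source, target) pairs; the output of source feeds target
    inputs: name -> initial value for steps with no incoming edge
    """
    # Map each step to the steps whose output it consumes.
    upstream = {name: [] for name in steps}
    for source, target in edges:
        upstream[target].append(source)

    results = {}
    # static_order() yields each step only after its predecessors.
    for name in TopologicalSorter(upstream).static_order():
        args = [results[src] for src in upstream[name]]
        if not args:
            args = [inputs[name]]  # source step: use the supplied input
        results[name] = steps[name](args)
    return results


# A three-step pipeline: load -> square -> total (hypothetical steps).
steps = {
    "load": lambda args: args[0],
    "square": lambda args: [x * x for x in args[0]],
    "total": lambda args: sum(args[0]),
}
edges = [("load", "square"), ("square", "total")]
results = run_workflow(steps, edges, {"load": [1, 2, 3]})
```

Here the two edges form a pipeline in the sense used above. Real systems add scheduling, data staging, and error handling on top of this basic ordering.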

A bioinformatics workflow management system is a specialized scientific workflow system focused on bioinformatics.

Scientific workflows

The simplest computerized scientific workflows are scripts that call in data, programs, and other inputs and produce outputs that might include visualizations and analytical results. These may be implemented in programs such as R or MATLAB, using a scripting language such as Python with a command-line interface, or more recently using open-source web applications such as Jupyter Notebook.
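As a rough illustration of such a script, the following Python sketch loads data, analyzes it, and produces a textual result. The data set, column names, and function names are hypothetical, and the data is embedded in the script only to keep the example self-contained.

```python
import csv
import io
import statistics

# Hypothetical input data; a real script would typically read a file.
RAW_DATA = """sample,temperature
a,20.5
b,21.5
c,19.5
"""


def load(text):
    """Parse the CSV text and return the temperature column as floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["temperature"]) for row in reader]


def analyze(values):
    """Compute a small summary of the measurements."""
    return {"n": len(values), "mean": statistics.mean(values)}


def report(summary):
    """Format the summary as a one-line analytical result."""
    return f"n={summary['n']} mean={summary['mean']:.2f}"


summary = analyze(load(RAW_DATA))
print(report(summary))
```

The order of the steps is fixed by the script itself, which is what distinguishes this simplest form from systems that represent the workflow explicitly.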

There are many motives for differentiating scientific workflows from traditional business process workflows. These include:

  • providing an easy-to-use environment for individual application scientists themselves to create their own workflows.
  • providing interactive tools for the scientists enabling them to execute their workflows and view their results in real-time.
  • simplifying the process of sharing and reusing workflows between the scientists.
  • enabling scientists to track the provenance of the workflow execution results and the workflow creation steps.

By focusing on the scientists, the design of a scientific workflow system shifts away from workflow scheduling activities, which grid computing environments typically consider in order to optimize the execution of complex computations on predefined resources. The emphasis moves instead to a domain-specific view of which data types, tools, and distributed resources should be made available to the scientists, and how to make them easily accessible while meeting specific quality-of-service requirements.[3]

Scientific workflows are now recognized as a crucial element of cyberinfrastructure, facilitating e-Science. Typically sitting on top of a middleware layer, scientific workflows are a means by which scientists can model, design, execute, debug, reconfigure, and re-run their analysis and visualization pipelines. Part of the established scientific method is to create a record of the origins of a result: how it was obtained, the experimental methods used, machine calibrations and parameters, and so on. The same holds in e-Science, except that provenance data are a record of the workflow activities invoked, services and databases accessed, data sets used, and so forth. Such information helps a scientist interpret their workflow results and helps other scientists establish trust in the experimental result.[4]
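A provenance record of this kind can be sketched in a few lines of Python. This is an illustrative assumption, not the format used by any specific workflow system: the helper names `fingerprint` and `run_with_provenance` and the fields of the log entry are invented for the example.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(value):
    """Return a short, stable hash of a JSON-serializable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_with_provenance(name, func, inputs, params, log):
    """Run one workflow activity and append a provenance entry to log."""
    output = func(inputs, **params)
    log.append({
        "activity": name,                     # which activity was invoked
        "parameters": params,                 # settings used for this run
        "input_hash": fingerprint(inputs),    # identifies the data set used
        "output_hash": fingerprint(output),   # identifies the result
        "finished_at": datetime.now(timezone.utc).isoformat(),
    })
    return output


def scale(values, factor):
    """Hypothetical analysis step: multiply every value by factor."""
    return [v * factor for v in values]


provenance_log = []
scaled = run_with_provenance(
    "scale", scale, [1, 2, 3], {"factor": 2}, provenance_log
)
```

Hashing inputs and outputs rather than storing them keeps the log small while still letting another scientist check that a re-run used the same data.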

Sharing workflows

Social networking communities such as myExperiment have been developed to facilitate sharing and collaborative development of scientific workflows. Galaxy provides collaborative mechanisms for editing and publishing workflow definitions and workflow results directly on the Galaxy installation.

Analysis

A key assumption underlying all scientific workflow systems is that the scientists themselves will be able to use a workflow system to develop their applications based on visual flowcharting, logic diagramming, or, as a last resort, writing code to describe the workflow logic. Powerful workflow systems make it easy for non-programmers to first sketch out workflow steps using simple flowcharting tools, and then hook in various data acquisition, analysis, and reporting tools. For maximum productivity, details of the underlying programming code should normally be hidden.

Workflow analysis techniques can be used to verify certain properties of such workflows before executing them. An example of a formal analysis framework for verifying and profiling the control-flow and data-flow aspects of scientific workflows, developed for the Discovery Net system, is described in the paper "The design and implementation of a workflow analysis tool" by Curcin et al.[5]

The authors note that introducing program analysis and verification into the workflow world requires a detailed understanding of the execution semantics of the workflow language, including the execution properties of nodes and arcs in the workflow graph, an understanding of functional equivalences between workflow patterns, and many other issues. Such analysis is difficult. Addressing these issues requires building on formal methods used in computer science research (e.g. Petri nets) to develop user-level tools for reasoning about the properties of both workflows and workflow systems. The lack of such tools in the past stopped automated workflow management solutions from maturing from nice-to-have academic toys into production-level tools used outside the narrow circle of early adopters and workflow enthusiasts.
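To give a flavour of checking a workflow before it runs, the sketch below performs two simple structural checks on a workflow graph in Python. It is not the framework described by Curcin et al. and does not use Petri nets; it only assumes a workflow given as a set of node names and a list of `(source, target)` edges, and the function name `check_workflow` is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def check_workflow(nodes, edges):
    """Return a list of problems found without running any step."""
    problems = []

    # Check 1: every edge must connect two declared nodes.
    for source, target in edges:
        for endpoint in (source, target):
            if endpoint not in nodes:
                problems.append(
                    f"edge ({source} -> {target}) uses unknown node {endpoint!r}"
                )

    # Check 2: the graph must be acyclic, otherwise no execution order exists.
    upstream = {name: set() for name in nodes}
    for source, target in edges:
        if source in nodes and target in nodes:
            upstream[target].add(source)
    try:
        TopologicalSorter(upstream).prepare()  # raises CycleError on a cycle
    except CycleError:
        problems.append("workflow graph contains a cycle")

    return problems


valid = check_workflow({"load", "clean", "plot"},
                       [("load", "clean"), ("clean", "plot")])
cyclic = check_workflow({"a", "b"}, [("a", "b"), ("b", "a")])
dangling = check_workflow({"a"}, [("a", "c")])
```

Checks like these cover only graph structure. Verifying data types on each edge or functional equivalence between patterns needs the richer formal models discussed above.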

Notable systems

Notable scientific workflow systems include:[6]

  • Anduril, bioinformatics and image analysis
  • Apache Airavata, a general purpose workflow management system[7]
  • Apache Airflow, a general purpose workflow management system
  • Apache Taverna, widely used in bioinformatics, astronomy, biodiversity
  • BioBIKE, a cloud-based bioinformatics platform
  • Bioclipse, a graphical workbench with a scripting environment that lets users perform complex actions as a kind of workflow
  • Collective Knowledge, a Python-based general workflow and experiment crowdsourcing framework with JSON API and cross-platform package manager
  • Common Workflow Language, a community-developed YAML-based workflow language, supported by multiple engine implementations.
  • Cuneiform, a functional workflow language.
  • Discovery Net, one of the earliest examples of a scientific workflow system
  • Galaxy, initially targeted at genomics
  • GenePattern, a powerful scientific workflow system that provides access to hundreds of genomic analysis tools.[8]
  • Kepler, a scientific workflow management system
  • KNIME, an open-source data analytics platform
  • Pegasus, an open-source scientific workflow management system[9]
  • OnlineHPC, online scientific workflow designer and high performance computing toolkit
  • Orange, open source data visualization and analysis
  • Pipeline Pilot, graphical programming with many tools to address cheminformatics workflows[10]
  • Swift parallel scripting language, a scripting language with many of the capabilities of scientific workflow systems built-in.
  • VisTrails, a scientific workflow system developed in Python

More than 280 computational data analysis workflow systems have been identified,[11] although the distinction between data analysis workflows and scientific workflows is fluid, as not all analysis workflow systems are used for scientific purposes.

See also

  • Bioinformatics workflow management systems
  • e-Science
  • Grid computing
  • Workflow engine

References

  1. ^ Liew, Chee Sun; Atkinson, Malcolm P.; Galea, Michelle; Ang, Tan Fong; Martin, Paul; Van Hemert, Jano I. (2016-12-12). "Scientific Workflows". ACM Computing Surveys. 49 (4): 1–39. doi:10.1145/3012429. hdl:20.500.11820/774ef69e-a499-4bd2-a609-09f050e682ae. S2CID 9408644.
  2. ^ D. Johnson; et al. (December 2009). "A middleware independent Grid workflow builder for scientific applications" (PDF). 2009 5th IEEE International Conference on E-Science Workshops. pp. 86–91. doi:10.1109/ESCIW.2009.5407993. ISBN 978-1-4244-5946-9. S2CID 3339794.
  3. ^ Kyriazis, Dimosthenis; Tserpes, Konstantinos; Menychtas, Andreas; Litke, Antonis; Varvarigou, Theodora (2008). "An innovative workflow mapping mechanism for Grids in the frame of Quality of Service". Future Generation Computer Systems. 24 (6): 498–511. doi:10.1016/j.future.2007.07.009.
  4. ^ Automatic capture and efficient storage of e-Science experiment provenance. Concurrency Computat.: Pract. Exper. 2008; 20:419–429
  5. ^ Curcin, V.; Ghanem, M.; Guo, Y. (2010). "The design and implementation of a workflow analysis tool". Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 368 (1926): 4193–4208. Bibcode:2010RSPTA.368.4193C. doi:10.1098/rsta.2010.0157. PMID 20679131.
  6. ^ Barker, Adam; Van Hemert, Jano (2008), "Scientific Workflow: A Survey and Research Directions", Parallel Processing and Applied Mathematics, 7th International Conference, PPAM 2007, Revised Selected Papers, Lecture Notes in Computer Science, vol. 4967, Gdansk, Poland: Springer Berlin / Heidelberg, pp. 746–753, CiteSeerX 10.1.1.105.4605, doi:10.1007/978-3-540-68111-3_78, ISBN 978-3-540-68105-2
  7. ^ Marru, Suresh; Gardler, Ross; Slominski, Aleksander; Douma, Ate; Perera, Srinath; Weerawarana, Sanjiva; Gunathilake, Lahiru; Herath, Chathura; Tangchaisin, Patanachai; Pierce, Marlon; Mattmann, Chris; Singh, Raminder; Gunarathne, Thilina; Chinthaka, Eran (2011-11-18). Proceedings of the 2011 ACM workshop on Gateway computing environments - GCE '11. p. 21. doi:10.1145/2110486.2110490. ISBN 9781450311236. S2CID 18341808.
  8. ^ Reich, Michael; Liefeld, Ted; Gould, Joshua; Lerner, Jim; Tamayo, Pablo; Mesirov, Jill P (2006). "GenePattern 2.0". Nature Genetics. 38 (5): 500–501. doi:10.1038/ng0506-500. PMID 16642009. S2CID 5503897.
  9. ^ Deelman, Ewa; Vahi, Karan; Juve, Gideon; Rynge, Mats; Callaghan, Scott; Maechling, Philip J.; Mayani, Rajiv; Chen, Weiwei; Ferreira da Silva, Rafael; Livny, Miron; Wenger, Kent (May 2015). "Pegasus, a workflow management system for science automation". Future Generation Computer Systems. 46: 17–35. doi:10.1016/j.future.2014.10.008.
  10. ^ "BIOVIA Pipeline Pilot | Scientific Workflow Authoring Application for Data Analysis". Accelrys.com. Retrieved 2016-12-04.
  11. ^ "Existing Workflow systems". Common Workflow Language wiki. Archived from the original on 2019-10-17.

External links

  • Yu, Jia; Buyya, Rajkumar (2005). "A taxonomy of scientific workflow systems for grid computing". ACM SIGMOD Record. 34 (3): 44. CiteSeerX 10.1.1.63.3176. doi:10.1145/1084805.1084814. S2CID 538714.
  • "Scientific workflow systems - can one size fit all?", a paper in CIBEC'08 comparing the features of multiple scientific workflow systems.
  • List of software tools related to scientific workflows on the DataONE website
