Abstract

Background: Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the emergence of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom-built for specific applications, which typically makes them difficult to modify, extend, and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface for designing and modifying workflows, with a clear display of parameter values. It supports developing new computational components in R, Python, and Java, all of which are widely used for bioinformatics algorithm development, and it can also invoke external applications and web services.

Results: We developed a series of fully functional bioinformatics pipelines in the Kepler workflow environment that address common tasks in microarray processing. These comprise a set of tools for processing GFF files from NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets, together with more comprehensive workflows for Affymetrix gene expression microarray analysis and for basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects; they are designed as a toolkit and a starting point for specific applications.
These workflows illustrate a workflow programming paradigm that focuses on local resources (programs and data) and is therefore close to traditional shell scripting or R/Bioconductor scripting approaches to pipeline design. Finally, we suggest that workflows for microarray data processing tasks may provide a basis for future example-based comparison of different workflow systems.

Conclusions: We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which offers the advantages of a clear graphical display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.
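The GFF processing tools mentioned in the Results are built as Kepler workflow components. As a rough, standalone illustration of what such a step involves, the Python sketch below reads nine-column, tab-separated GFF records and keeps probes whose score passes a threshold. It assumes, as is common for NimbleGen ChIP-chip GFF exports, that the score column holds a per-probe log2 enrichment ratio. The names `parse_gff` and `enriched_probes`, the sample records, and the 1.0 cutoff are hypothetical; this is not the Kepler actor interface or the authors' implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GffRecord:
    """One GFF feature line (nine tab-separated columns)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: float  # assumed here to be a log2 ChIP/input ratio
    strand: str
    frame: str
    attributes: str


def parse_gff(lines: Iterable[str]) -> List[GffRecord]:
    """Parse GFF lines, skipping blank lines and '#' comment/header lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
        seqname, source, feature, start, end, score, strand, frame, attrs = fields
        # GFF uses '.' for a missing score; map it to NaN so it never passes a cutoff.
        value = float(score) if score != "." else float("nan")
        records.append(GffRecord(seqname, source, feature, int(start), int(end),
                                 value, strand, frame, attrs))
    return records


def enriched_probes(records: List[GffRecord], min_score: float) -> List[GffRecord]:
    """Keep records whose score is at least min_score (hypothetical filter step)."""
    return [r for r in records if r.score >= min_score]


# Hypothetical sample input, not real NimbleGen data.
SAMPLE = [
    "##gff-version 3",
    "chr1\tNimbleGen\tprobe\t100\t150\t1.8\t.\t.\tprobe_id=P1",
    "chr1\tNimbleGen\tprobe\t200\t250\t0.2\t.\t.\tprobe_id=P2",
]

if __name__ == "__main__":
    for rec in enriched_probes(parse_gff(SAMPLE), min_score=1.0):
        print(rec.seqname, rec.start, rec.end, rec.score, rec.attributes)
```

In a workflow setting, parsing and filtering would typically be separate components so that the threshold appears as a visible, editable parameter, in the spirit of the parameter display described above.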
