The main goal of BioStatFlow is to facilitate access to statistical tools for biologists who are not specialists in statistics. It has been designed to execute statistical analyses sequentially, i.e. as a linear chain of statistical processing steps, called a workflow in BioStatFlow. From a set of identified use cases (mainly in proteomics and metabolomics for the default workflow, see References), BioStatFlow is built around the typical workflow shown below:
(Figure: typical BioStatFlow workflow; Steinfath M. et al., 2008)
A set of analyses is first proposed as a static sequence in order to normalize the dataset. At this stage, users have to follow the order of the sequence. Because technical issues with the experimental equipment can prevent the levels of some analytical variables from being determined, or because different experiments need to be compared, missing value estimation and data scaling are helpful pre-processing steps. This is the default use case (default workflow).

Users can then choose any of the additional methods, depending on the dataset and the corresponding experimental design (i.e. factors), in order to i) visualize the whole dataset, ii) reveal biomarkers, iii) analyse interactions between factors, iv) discriminate groups, and so on. Each treatment takes the output of the previous treatment as its input. If a treatment generates a data table (matrix) as output, that table is used as input to the next step. Otherwise, if the treatment only generates results (text and images) but does not change the input matrix, the latter is passed on unchanged as the output. Each treatment can be written as an R script (most common) or as a Perl script, possibly embedding binary tools (such as compiled Matlab scripts).
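The chaining rule above can be sketched in a few lines. The code below is a minimal illustration, not BioStatFlow's actual engine: the treatment names (`impute_missing`, `autoscale`, `summarize`) and the `run_workflow` driver are hypothetical, and column-mean imputation with unit-variance scaling stand in for whichever estimation and scaling methods a real workflow would use. A treatment that returns a new matrix feeds the next step; a results-only treatment returns `None` and the input matrix passes through unchanged.

```python
import numpy as np

def impute_missing(X):
    """Replace missing values (NaN) with the column mean -- one
    common, simple missing-value estimation method."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def autoscale(X):
    """Data scaling step: center each variable and scale it to
    unit variance so that experiments become comparable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def summarize(X):
    """Results-only treatment: produces output (text here, could be
    images) but returns no new matrix, so the input passes through."""
    print("shape:", X.shape, "grand mean: %.3f" % X.mean())
    return None

def run_workflow(X, treatments):
    """Apply treatments sequentially, forwarding each treatment's
    output matrix -- or the unchanged input -- to the next step."""
    for treat in treatments:
        result = treat(X)
        if result is not None:  # treatment produced a new data table
            X = result          # ...which becomes the next step's input
    return X

# Toy dataset: 3 samples x 2 variables, one missing value.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, 6.0]])
final = run_workflow(X, [impute_missing, autoscale, summarize])
```

After the run, `final` contains no missing values and each variable is centered and scaled, mirroring the static pre-processing sequence before any optional analysis is applied.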