The main goal of BioStatFlow is to facilitate access to statistical tools for biologists who are not specialists in statistics. It has been designed to execute statistical analyses sequentially, i.e. as a linear chain of statistical processing steps, called a workflow in BioStatFlow. From a set of identified use cases (mainly in proteomics and metabolomics for the default workflow, see References), BioStatFlow is built around the typical workflow shown below:
(Figure: typical BioStatFlow workflow; Steinfath M. et al., 2008)
A set of analyses is first proposed as a static sequence in order to normalize the dataset. At this stage, users have to follow the order of the sequence. Because technical issues with the experimental equipment can prevent the levels of some analytical variables from being determined, or because different experiments need to be compared, missing value estimation and data scaling are helpful pre-processing steps. This is the default use case (default workflow).

Users can then choose any of the additional methods, depending on the dataset and the corresponding experimental design (i.e. factors), in order to i) visualize the whole dataset, ii) reveal biomarkers, iii) analyse interactions between factors, iv) discriminate groups, and so on. Each treatment takes the output of the previous treatment as its input. If a treatment generates a data table (matrix) as output, that table is used as input to the next step. Otherwise, if the treatment only generates results (text and images) but does not change the input matrix, the latter is passed on unchanged as the output. Each treatment can be written as an R script (most common) or as a Perl script, possibly embedding binary tools (such as compiled Matlab scripts).
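The chaining rule above can be sketched in a few lines. The code below is a minimal illustration, not BioStatFlow's actual engine: the treatment names (`impute_missing`, `autoscale`, `summarize`) and the `run_workflow` driver are hypothetical, and column-mean imputation with unit-variance scaling stand in for whichever estimation and scaling methods a real workflow would use. A treatment that returns a new matrix feeds the next step; a results-only treatment returns `None` and the input matrix passes through unchanged.

```python
import numpy as np

def impute_missing(X):
    """Replace missing values (NaN) with the column mean -- one
    common, simple missing-value estimation method."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def autoscale(X):
    """Data scaling step: center each variable and scale it to
    unit variance so that experiments become comparable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def summarize(X):
    """Results-only treatment: produces output (text here, could be
    images) but returns no new matrix, so the input passes through."""
    print("shape:", X.shape, "grand mean: %.3f" % X.mean())
    return None

def run_workflow(X, treatments):
    """Apply treatments sequentially, forwarding each treatment's
    output matrix -- or the unchanged input -- to the next step."""
    for treat in treatments:
        result = treat(X)
        if result is not None:  # treatment produced a new data table
            X = result          # ...which becomes the next step's input
    return X

# Toy dataset: 3 samples x 2 variables, one missing value.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, 6.0]])
final = run_workflow(X, [impute_missing, autoscale, summarize])
```

After the run, `final` contains no missing values and each variable is centered and scaled, mirroring the static pre-processing sequence before any optional analysis is applied.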