A cartoon representation of the SBSE analysis procedure. Panel A: The top row illustrates how the algorithm evaluates each of the six observations (a-f), represented by coloured blocks, in turn. The bottom row shows the -log(p-value) associated with each hypothesis update. Column (a) represents the first evaluation, column (b) the second, and so on. Using the hypothetical data described in the main text, the algorithm first encounters a grey box (i.e. an up-regulated gene without a seed sequence match). It estimates that there is a one-half chance that the observed differential expression profile is best explained by the differential distribution of the seed motif. Next, it encounters a green box (i.e. a down-regulated gene containing a seed sequence match) and updates the hypothesis accordingly. The algorithm continues to update the hypothesis until all of the data have been processed and a p-value has been calculated for each successive observation. The estimates for the complete dataset are combined and summarised in Panel B. Panel B: The x-axis represents each of the six differentially expressed genes, sorted from most up-regulated to most down-regulated (i.e. left to right). A green shaded column indicates the presence of the miRNA seed motif and grey shading indicates its absence. Each row (a-f) represents the hypothesis evaluated at the corresponding step of the analysis procedure, as described for Panel A. In each row of the central section of the plot, the black vertical line marks the optimal division of the data at that step. The uppermost section (U) of the plot summarises the -log of the estimated p-values. A faint dashed vertical blue line (i*) rises from the point of the most significant p-value and marks the optimum partition of the data. The rightmost section (R) of the plot also summarises the -log of the estimated p-values associated with each hypothesis update.
The faint horizontal blue line (j*) marks the most significant p-value and identifies the transcripts used to estimate i*. The uppermost and rightmost plots share the same scaled axes and can be used to choose the best partition of the data for further focussed analyses. In this theoretical expression profile, data from the top four transcripts give the best estimate of the most significant differential distribution of the miRNA seed motif. By inference, any direct miRNA effect is restricted to the transcript represented by column six, which lies to the right of i*, the point of the largest enrichment score. The order in which observations enter the analysis is dictated by the vector of transcripts ranked by absolute expression change. For large, normally distributed datasets, the main section of the summary plot therefore forms a triangle as the algorithm processes the data from the most to the least dysregulated transcript.
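The caption does not state which statistic is computed at each step, so the following is only a minimal sketch of one reading of the procedure. It is not the authors' SBSE estimator. As a plainly swapped-in stand-in, it assumes a one-sided hypergeometric test for seed enrichment among the genes to the right of (i.e. more down-regulated than) each candidate split. Genes are added in order of absolute fold change. After each addition, the genes seen so far are re-ranked from most up- to most down-regulated, and the split with the smallest p-value is recorded as that row's optimal division. The row with the most significant p-value plays the role of j*, and its split plays the role of i*. The function names (`hypergeom_sf`, `best_partition`, `sequential_scan`) and the six-gene toy profile are hypothetical illustrations.

```python
import math
from typing import List, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K with a seed match, n drawn)."""
    upper = min(K, n)
    hits = sum(math.comb(K, x) * math.comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / math.comb(N, n)


def best_partition(flags: List[bool]) -> Tuple[int, float]:
    """Scan every split point i of genes ordered most up- to most down-regulated.

    Assumption: genes at positions >= i (the down-regulated side) are tested
    for seed enrichment. Returns (i, p) for the smallest p; ties keep the first i.
    """
    N, K = len(flags), sum(flags)
    best_i, best_p = 0, 1.0
    for i in range(N):
        right = flags[i:]
        p = hypergeom_sf(sum(right), N, K, len(right))
        if p < best_p:
            best_i, best_p = i, p
    return best_i, best_p


def sequential_scan(log_fc: List[float], has_seed: List[bool]):
    """Add genes one at a time, most dysregulated (largest |log_fc|) first.

    After each addition, re-rank the genes seen so far by signed fold change
    and record (ranked gene indices, best split, p, -log10 p); one record per row.
    """
    order = sorted(range(len(log_fc)), key=lambda g: abs(log_fc[g]), reverse=True)
    seen, rows = [], []
    for g in order:
        seen.append(g)
        ranked = sorted(seen, key=lambda h: log_fc[h], reverse=True)
        i, p = best_partition([has_seed[h] for h in ranked])
        rows.append((ranked, i, p, -math.log10(p)))
    # j*: the step whose best split is most significant (largest -log10 p).
    j_star = max(range(len(rows)), key=lambda r: rows[r][3])
    return rows, j_star


if __name__ == "__main__":
    # Hypothetical six-gene profile; NOT the data described in the main text.
    log_fc = [3.0, 1.2, 0.4, -0.8, -1.5, -2.5]
    has_seed = [False, False, True, False, True, True]
    rows, j_star = sequential_scan(log_fc, has_seed)
    for step, (ranked, i, p, score) in zip("abcdef", rows):
        print(step, ranked, "split before position", i, f"p={p:.3f}")
    print("j* =", "abcdef"[j_star], "i* =", rows[j_star][1])
```

Row j contains exactly j genes, which is why the central section of the summary plot is triangular. For this toy input, the most significant step is row e (five genes seen, p = 0.1). Its split falls before position 3, leaving the two seed-bearing, most down-regulated genes on the right. Re-sorting and rescanning at every step is quadratic or worse and is meant only for illustration. A genome-scale implementation would maintain running seed counts instead.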
Wilson and Plucinski BMC Genomics 2011 12:250 doi:10.1186/1471-2164-12-250