|
Figure 1.
A cartoon representation of the SBSE analysis procedure. Panel A: The top row illustrates how each of the six observations (a-f), represented by coloured
blocks, is sequentially evaluated by the algorithm. The bottom row represents the
-log(p-value) associated with each hypothesis update. Column (a) represents the first
evaluation, column (b) the second evaluation, and so on. Using the hypothetical data described in the main text, the algorithm first encounters
a grey box (i.e. an up-regulated gene not containing a seed sequence match) and estimates that there
is a one-half chance that the observed differential expression profile can be best
explained by the differential distribution of the seed motif. Next, a green box (i.e. a down-regulated gene containing a seed sequence match) is encountered and the hypothesis
is updated accordingly. The algorithm continues to update the hypothesis until all
of the data have been processed and a p-value has been calculated for each successive observation.
The estimates for the complete dataset are combined and summarised as illustrated
in Panel B: Each of the six differentially expressed genes, sorted from most up-regulated to most
down-regulated (i.e. left-to-right), is represented along the x-axis, with a green shaded column indicating
the presence of the miRNA seed motif and grey shading indicating the absence of the
seed motif. Each row (a-f) represents the hypotheses evaluated at each step of the
analysis procedure as described for panel A. The black vertical lines in each row
of the central section of the plot indicate the optimal division of the data at that
juncture. The uppermost section (U) of the plot summarises the -log of the estimated p-values. The optimum partition
of the data is indicated by a faint vertical dashed blue line (i*) emerging from the point of the most significant p-value. The rightmost section
(R) of the plot also summarises the -log of the estimated p-values associated with each
hypothesis update. The faint horizontal blue line (j*) marks the most significant p-value and identifies those transcripts considered
important in our estimate of i*. Both the uppermost and rightmost plots use the same
scaled axes and may be used to best partition the data for further focussed analyses.
In this theoretical expression profile, the most significant differential distribution
of the miRNA seed motif is best estimated using data from the top four transcripts
and, by inference, any direct miRNA effect is restricted to the transcript represented
by column six, which is located to the right of i*, the largest enrichment score. Note
that the order in which each observation is incorporated into the analysis is dictated
by the absolute ranked vector and that, for large and normally distributed datasets,
the main section of the summary plot will form a triangle as the algorithm processes
the data from the most to the least dysregulated transcript.
Wilson and Plucinski BMC Genomics 2011 12:250 doi:10.1186/1471-2164-12-250 |
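
The stepwise scan described in the caption can be illustrated with a minimal Python sketch. It is not the SBSE implementation. The scoring rule (a one-sided hypergeometric tail for motif enrichment to the right of each candidate split), the toy fold-changes, the motif flags and all function names are assumptions made for illustration, and the output is not meant to reproduce the values in Figure 1. Genes are added from most to least dysregulated (rows), every split of the genes seen so far is scored (columns), and the smallest p-value gives the row j* and column i*.

```python
from math import comb, log10

def hypergeom_tail(k, total, marked, drawn):
    """P(X >= k) when `drawn` items are taken from `total`, of which `marked` carry the motif."""
    upper = min(marked, drawn)
    hits = sum(comb(marked, x) * comb(total - marked, drawn - x)
               for x in range(k, upper + 1))
    return hits / comb(total, drawn)

def scan_partitions(fold_changes, has_motif):
    """Return one (best p-value, column of first gene right of the best split) per step."""
    n = len(fold_changes)
    # Columns: genes sorted from most up-regulated to most down-regulated.
    columns = sorted(range(n), key=lambda g: -fold_changes[g])
    # Rows: genes enter the analysis from most to least dysregulated.
    order = sorted(range(n), key=lambda g: -abs(fold_changes[g]))
    included, rows = set(), []
    for gene in order:
        included.add(gene)
        visible = [g for g in columns if g in included]
        marked = sum(has_motif[g] for g in visible)
        best_p, best_col = 1.0, None
        for split in range(1, len(visible)):
            right = visible[split:]  # the more down-regulated side
            k = sum(has_motif[g] for g in right)
            p = hypergeom_tail(k, len(visible), marked, len(right))
            if p < best_p:
                best_p, best_col = p, columns.index(visible[split])
        rows.append((best_p, best_col))
    return rows

# Hypothetical toy profile: six genes, motif present only in the two most down-regulated.
fold_changes = [2.8, 1.8, 0.4, -0.3, -1.9, -2.5]
has_motif = [False, False, False, False, True, True]

rows = scan_partitions(fold_changes, has_motif)
best_index = min(range(len(rows)), key=lambda j: rows[j][0])
best_j = best_index + 1        # number of genes used at the most significant step (j*)
best_p, best_i = rows[best_index]  # p-value and 0-based column of the split (i*)

for step, (p, col) in enumerate(rows, start=1):
    print(step, col, round(-log10(p), 3))  # log base 10 is an assumption
```

In this toy profile the first gene alone allows no split, so its row carries a p-value of 1. A real analysis would replace `hypergeom_tail` with the statistic defined in the main text.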