Gaussian process fit on the expression profile of gene Cyp1b1 in the experimental mouse data. Figure 5: A GP fitted on the centred profile of the gene Cyp1b1 (probeID 1416612_at in the GSE10562 dataset) with different settings of the lengthscale hyperparameter ℓ². The blue crosses represent zero-mean hybridised gene expression over time (log₂ ratios between treatment and control), and the shaded area indicates the point-wise mean plus/minus two standard deviations (approximately a 95% confidence region). (a) The mean function is constant as ℓ² → ∞ (zero inverse lengthscale in eq. (14)), and all of the observed data variance is attributed to noise. (b) The lengthscale is manually set to the larger of the two local optima (ℓ² = 30), so the mean function roughly fits the data points. The observed data variance is attributed equally to signal and noise; consequently, the GP has high uncertainty in its predictive curve. (c) The lengthscale is manually set to the smaller of the two local optima (ℓ² = 15.6), so the mean function tightly fits the data points with high certainty. Under this covariance function, the interpretation is that the profile contains minimal noise and that most of the observed data variance is attributed to the underlying signal. (d) Contour of the corresponding log marginal likelihood (LML) function, plotted from an exhaustive search over ℓ² and SNR values. The two main local optima are indicated by the green dots. A third optimum, which corresponds to panel (a), appears almost flat in the contour; its vicinity spans the whole lengthscale axis for very small values of SNR (i.e. when SNR ≈ 0, the lengthscale has no effect on the fit).
Kalaitzis and Lawrence BMC Bioinformatics 2011 12:180 doi:10.1186/1471-2105-12-180
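The exhaustive search behind panel (d) can be illustrated with a minimal sketch. It is not the authors' code: it assumes a squared-exponential covariance, fixes the total variance to the sample variance of the profile, and splits it into signal and noise through the SNR. The function names and the synthetic time series are hypothetical stand-ins, not the Cyp1b1 data.

```python
import numpy as np

def rbf_kernel(t, lengthscale_sq, signal_var):
    # Squared-exponential covariance (assumed form): s^2 * exp(-(t - t')^2 / (2 * l^2))
    d2 = (t[:, None] - t[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / lengthscale_sq)

def log_marginal_likelihood(t, y, lengthscale_sq, snr):
    # Assumption: signal_var + noise_var equals the sample variance of y,
    # and snr = signal_var / noise_var.
    total_var = np.var(y)
    noise_var = total_var / (1.0 + snr)
    signal_var = total_var - noise_var
    K = rbf_kernel(t, lengthscale_sq, signal_var) + noise_var * np.eye(len(t))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # LML = -0.5 y^T K^{-1} y - 0.5 log|K| - 0.5 n log(2 pi)
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(t) * np.log(2.0 * np.pi))

def lml_grid(t, y, lengthscales_sq, snrs):
    # Rows index SNR values, columns index squared lengthscales.
    return np.array([[log_marginal_likelihood(t, y, l2, s)
                      for l2 in lengthscales_sq]
                     for s in snrs])

# Synthetic, centred profile used purely for illustration.
t = np.linspace(0.0, 48.0, 13)
y = np.sin(t / 8.0)
y = y - y.mean()

lengthscales_sq = np.linspace(1.0, 60.0, 30)
snrs = np.linspace(0.0, 20.0, 21)
grid = lml_grid(t, y, lengthscales_sq, snrs)

# Location of the best grid cell (one optimum only; local optima need a
# neighbourhood comparison or a contour plot of `grid`).
best_snr_idx, best_l2_idx = np.unravel_index(np.argmax(grid), grid.shape)
```

In this parameterisation the row of `grid` at SNR = 0 is constant across all lengthscales, because the signal variance vanishes and the kernel drops out of the covariance. This mirrors the flat region described for panel (d).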