Researchers dealing with gene microarray data face daunting quantities of data that conceal important information, including transcription factor activity profiles. We developed a model-based technique, HVDM (Hidden Variable Dynamic Modelling), which uses data from a small training set of known transcription factor (TF) targets, plus a single anchoring degradation measurement, to deduce the activity profile of the transcription factor, which is the hidden variable in the system. Using this activity profile, other targets of the same TF can then be identified by running the same model. Both stages rely on time course expression data obtained from microarrays. The sampling rate can be irregular, replicates are not required, and measurement errors are explicitly taken into account so that results are ranked according to confidence, a must when dealing with noisy data. We tested HVDM on the DNA damage response network, focusing on p53, an important transcription factor. An independent experiment confirmed the accuracy of our predictions.
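The abstract does not write the model out. As an illustration only, and as an assumption rather than a statement taken from the text above, a linear kinetic model of the following form is one common way to link a hidden TF activity to the expression of its targets. The symbols $x_j$, $f$, $B_j$, $S_j$ and $D_j$ are introduced here for the sketch.

```latex
% Assumed illustrative model: expression x_j(t) of target gene j driven by
% a shared hidden activity f(t).
%   B_j : basal transcription rate
%   S_j : sensitivity of gene j to the transcription factor
%   D_j : degradation rate of the transcript
\begin{equation}
  \frac{\mathrm{d}x_j(t)}{\mathrm{d}t} = B_j + S_j\, f(t) - D_j\, x_j(t)
\end{equation}
```

Under this assumed form, the training stage would estimate $f(t)$ together with $(B_j, S_j, D_j)$ for the known targets, and the screening stage would hold $f(t)$ fixed while fitting $(B_j, S_j, D_j)$ for each candidate gene. An independently measured degradation rate for one training gene would then help to anchor parameters that expression data alone leave poorly determined.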
We have now generated an R/Bioconductor version of HVDM, called rHVDM. The original implementation, written in C, used time-consuming algorithms both for the optimisation step (Nelder-Mead) and for the determination of confidence intervals (Markov Chain Monte Carlo). In contrast, rHVDM uses a fast, gradient-based optimisation step (Levenberg-Marquardt), from which accurate confidence intervals can also be obtained. As a result, a thousand genes can be screened in about five minutes on a standard current personal computer. Additionally, rHVDM includes an HTML report generator, which allows visual quality assessment at each stage in the process.
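To make the link between a Levenberg-Marquardt fit and its confidence intervals concrete, here is a minimal sketch in plain Python. It is not rHVDM code and does not use the rHVDM API. It assumes a linear kinetic model dx/dt = B + S f(t) - D x(t) with a known activity profile f, holds f constant between samples, and uses synthetic noise-free data. All function names and numerical values are hypothetical.

```python
import math

def simulate(params, times, activity):
    """Assumed model dx/dt = B + S*f - D*x, with f held at its interval mean."""
    B, S, D = params
    x = [(B + S * activity[0]) / D]          # assume steady state at the first sample
    for k in range(1, len(times)):
        h = times[k] - times[k - 1]          # sampling may be irregular
        f_mid = 0.5 * (activity[k - 1] + activity[k])
        x_ss = (B + S * f_mid) / D
        x.append(x_ss + (x[-1] - x_ss) * math.exp(-D * h))
    return x

def residuals(params, data):
    """Model minus observation, weighted by the measurement error."""
    times, activity, observed, sigma = data
    model = simulate(params, times, activity)
    return [(m - o) / s for m, o, s in zip(model, observed, sigma)]

def jacobian(params, data, eps=1e-6):
    """Forward-difference derivatives; rows[i][k] = d r_k / d p_i."""
    base = residuals(params, data)
    rows = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += eps
        rows.append([(a - b) / eps for a, b in zip(residuals(shifted, data), base)])
    return base, rows

def gram(J):
    """J^T J for the row layout used above."""
    n = len(J)
    return [[sum(a * b for a, b in zip(J[i], J[j])) for j in range(n)] for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_gene(data, start=(1.0, 1.0, 1.0), iters=100):
    """Levenberg-Marquardt fit of (B, S, D); returns estimates and standard errors."""
    p, lam, n = list(start), 1e-3, len(start)
    for _ in range(iters):
        r, J = jacobian(p, data)
        cost = sum(v * v for v in r)
        JTJ = gram(J)
        grad = [sum(a * b for a, b in zip(J[i], r)) for i in range(n)]
        damped = [[JTJ[i][j] + (lam * JTJ[i][i] if i == j else 0.0)
                   for j in range(n)] for i in range(n)]
        step = solve(damped, [-g for g in grad])
        trial = [a + d for a, d in zip(p, step)]
        # Reject steps that raise the cost or make the degradation rate non-positive.
        if trial[2] > 0 and sum(v * v for v in residuals(trial, data)) < cost:
            p, lam = trial, lam / 10
        else:
            lam *= 10
        if max(abs(d) for d in step) < 1e-12:
            break
    # Covariance approximated by (J^T J)^-1 of the weighted residuals at the optimum.
    _, J = jacobian(p, data)
    JTJ = gram(J)
    var = [solve(JTJ, [1.0 if k == i else 0.0 for k in range(n)])[i] for i in range(n)]
    return p, [math.sqrt(v) for v in var]

# Synthetic example: irregular sampling, hypothetical activity profile and parameters.
times = [0.0, 1.0, 2.0, 4.0, 6.0, 9.0, 12.0]
activity = [0.0, 1.5, 2.5, 1.8, 1.0, 0.5, 0.2]
true_params = (0.5, 2.0, 0.8)
observed = simulate(true_params, times, activity)
sigma = [0.1] * len(times)
estimate, stderr = fit_gene((times, activity, observed, sigma))
for name, value, err in zip("BSD", estimate, stderr):
    print(f"{name} = {value:.4f} +/- {1.96 * err:.4f} (approx. 95% interval)")
```

Because the interval widths come from the same Jacobian that drives the optimisation, no separate sampling run is needed, which is the kind of saving the paragraph above attributes to replacing Markov Chain Monte Carlo. Genes could then be ranked by, for example, the ratio of the fitted sensitivity to its standard error, though that ranking rule is also an assumption of this sketch.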
rHVDM is applicable to large time course expression data sets, where identification and further exploitation of hidden variables can reveal critical information about network dynamics.
rHVDM can be downloaded from the Bioconductor website: http://bioconductor.org/packages/2.0/bioc/html/rHVDM.html