How cyanobacteria pose new problems to old methods: challenges in microarray time series analysis
1 Institute for Theoretical Biology, Humboldt University Berlin, Invalidenstraße 43, Berlin, D-10115, Germany
2 Institute for Theoretical Biology, Charité Universitätsmedizin, Invalidenstraße 43, Berlin, D-10115, Germany
3 Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, Vienna, A-1090, Austria
4 Faculty of Biology and Freiburg Initiative in Systems Biology, University of Freiburg, Schänzlestraße 1, Freiburg, D-79104, Germany
5 Global Change Research Center – CzechGlobe, Academy of Sciences of the Czech Republic, Belidla 986/4a, 60300 Brno, Czech Republic
BMC Bioinformatics 2013, 14:133 doi:10.1186/1471-2105-14-133
Published: 21 April 2013
The transcriptomes of several cyanobacterial strains have been shown to exhibit diurnal oscillation patterns reflecting the diurnal phototrophic lifestyle of the organisms. The analysis of such genome-wide transcriptional oscillations is often facilitated by the use of clustering algorithms in conjunction with a number of pre-processing steps. Biological interpretation is usually focused on the time and phase of expression of the resulting groups of genes. However, the use of microarray technology in such studies requires normalization of the data during pre-processing, and the impact of this step on the qualitative and quantitative features of the derived information, namely the number of oscillating transcripts and their respective phases, is unclear.
A microarray-based evaluation of diurnal expression in the cyanobacterium Synechocystis sp. PCC 6803 is presented. As expected, the temporal expression patterns reveal strong oscillations in transcript abundance. We compare the Fourier transformation-based expression phase before and after the application of quantile normalization, median polishing, cyclic LOESS, and least oscillating set (LOS) normalization. Whereas LOS normalization mostly preserves the phases of the raw data, the remaining methods introduce systematic biases. In particular, quantile normalization is found to introduce a phase shift of 180°, effectively changing night-expressed genes into day-expressed ones. Comparison of a large number of clustering results of differently normalized data shows that the normalization method determines the result. Subsequent steps, such as the choice of data transformation, similarity measure, and clustering algorithm, play only minor roles. We find that standardization and the discrete Fourier transform (DFT) transformation are more favorable for the clustering of time series than the log2 mean ratio transformation. We use the cluster-wise functional enrichment of a clustering derived using LOS normalization, DFT transformation, and flowClust clustering to derive the diurnal biological program of Synechocystis sp.
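The two operations compared above, a Fourier-based expression phase and quantile normalization, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names `expression_phase` and `quantile_normalize`, the `cycles` parameter, the sign convention for the phase, and the rows-as-genes, columns-as-time-points layout are all hypothetical choices made for the example.

```python
import numpy as np

def expression_phase(series, cycles=1):
    """Phase in degrees (0-360) of one DFT component of a time series.

    Assumption: the series is evenly sampled and spans `cycles` full
    periods, so the DFT coefficient with index `cycles` captures the
    oscillation of interest. The sign is chosen so that a profile
    peaking at the first time point has phase 0.
    """
    x = np.asarray(series, dtype=float)
    coeff = np.fft.rfft(x - x.mean())[cycles]
    return (-np.degrees(np.angle(coeff))) % 360.0

def quantile_normalize(matrix):
    """Basic quantile normalization (rows: genes, columns: time points).

    Every column is forced onto the same distribution: the mean of the
    column-wise sorted values. Ties are broken arbitrarily here, which
    is a simplification of this sketch.
    """
    m = np.asarray(matrix, dtype=float)
    ranks = np.argsort(np.argsort(m, axis=0), axis=0)  # rank of each value in its column
    mean_sorted = np.sort(m, axis=0).mean(axis=1)      # shared reference distribution
    return mean_sorted[ranks]

# Usage: a profile sampled at 12 time points that peaks at t = 3,
# i.e. a quarter of the way through the cycle (about 90 degrees).
t = np.arange(12)
profile = 5 + 2 * np.cos(2 * np.pi * (t - 3) / 12)
print(expression_phase(profile))
```

Applying `expression_phase` to each row of a matrix before and after `quantile_normalize` gives the kind of per-gene phase comparison described above. Because quantile normalization replaces values by rank-based quantities within each time point, a gene's normalized profile depends on how all other genes behave at that time point.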
Application of quantile normalization, median polishing, and also cyclic LOESS normalization to the presented cyanobacterial dataset leads to increased numbers of oscillating genes and a systematic shift of the expression phase. The LOS normalization minimizes the observed detrimental effects. As previous analyses employed a variety of different normalization methods, direct comparisons of their results must be treated with caution.