Department of Chemical Engineering and Material Science, Michigan State University, East Lansing, MI 48824, USA

Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA

Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA

Biomedical Engineering Department, Boston University, Boston, MA 02215, USA

Abstract

The detection and analysis of steady-state gene expression has become routine. Time-series microarrays are of growing interest to systems biologists for deciphering the dynamic nature and complex regulation of biosystems. Most temporal microarray datasets contain only a limited number of time points, giving rise to short time-series data, which pose challenges for traditional methods of extracting meaningful information. Obtaining useful information from the wealth of short time-series data requires addressing the problems that arise from limited sampling. Current efforts have shown promise in improving the analysis of short time-series microarray data, although challenges remain. This commentary addresses recent advances in methods for short time-series analysis, including simplification-based approaches and the integration of multi-source information. Nevertheless, further studies and development of computational methods are needed to provide practical solutions that fully exploit the potential of these data.

Background

Microarray technology has enabled the interrogation of gene expression data in a global and parallel fashion, and has become the most popular platform in the era of systems biology

Time-series microarrays capture multiple expression profiles at discrete time points (e.g., minutes, hours, or days) of a continuous cellular process. These data can characterize complex dynamics and regulation in the form of differential gene expression as a function of time. Numerous time-series microarray experiments have been performed to study such biological processes as the biological rhythms or circadian clock of

A significant challenge in dealing with time-series data comes from limited sampling, that is, the small number of time points taken, which gives rise to short time-series data. In the growing pool of temporal microarray datasets, a typical time-series record has fewer than ten time points

Limited sampling accentuates the difficulties associated with static or standard time-series analyses. First, the problems arising due to high dimensionality accompanied by a small sample size, such as matrix singularity and model over-fitting

Improving short time-series analysis requires addressing the problems that arise due to limited sampling. Recent efforts by investigators to overcome the difficulties associated with limited sampling include decreasing the complexity of continuous time-series data based on simplification strategies

**The general process of time-series expression analysis starts with data collection from microarray experiments.** The data then undergoes

Simplification strategies

Simplification strategies reduce time-series data from continuous to discrete representations prior to analysis. These strategies usually transform the raw temporal profiles into a set of symbols

Simplification methods have the side benefit of reducing, to some degree, the noise in the original data as they decrease the dimension of the time-series data, thus making the subsequent analysis more robust to noise. This was demonstrated by Sacchi et al.
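As a minimal sketch of the kind of simplification described above, and not the method of any particular study, the following Python example maps each temporal profile to a string of trend symbols and groups genes that share the same symbolic pattern. The function names (`symbolize`, `group_by_pattern`), the symbol alphabet (U, D, S), the threshold of 0.5, and the expression values are all assumptions chosen for illustration.

```python
def symbolize(profile, threshold=0.5):
    """Map a temporal profile to a string of trend symbols.

    Each pair of consecutive time points becomes 'U' (up), 'D' (down),
    or 'S' (steady), depending on whether the change between them
    exceeds the (hypothetical) threshold in either direction.
    """
    symbols = []
    for previous, current in zip(profile, profile[1:]):
        change = current - previous
        if change > threshold:
            symbols.append("U")
        elif change < -threshold:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def group_by_pattern(profiles, threshold=0.5):
    """Group gene names by their symbolic temporal pattern."""
    groups = {}
    for gene, profile in profiles.items():
        pattern = symbolize(profile, threshold)
        groups.setdefault(pattern, []).append(gene)
    return groups


# Hypothetical log-ratio profiles at five time points (illustrative only).
profiles = {
    "geneA": [0.0, 1.2, 2.1, 2.0, 0.4],
    "geneB": [0.1, 1.0, 2.3, 2.2, 0.9],
    "geneC": [0.0, -0.9, -1.0, -0.2, 0.0],
}

patterns = group_by_pattern(profiles)
for pattern, genes in patterns.items():
    print(pattern, genes)
```

In this sketch, fluctuations smaller than the threshold collapse into the steady symbol, which is one simple way in which a discrete representation can absorb part of the measurement noise. The choice of threshold is itself a pre-defined parameter and would need to be justified for real data.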

A key challenge with simplification strategies is how to pre-define these

Incorporating multi-source information

Incorporating multi-source information, including prior knowledge (e.g., pathway information)

Different types of prior knowledge have been used to improve the computational analysis of short time-series data. They include applying a prior noise distribution to the expression data

In addition, pre-defined gene sets involving specific pathways or functional categories have focused on pattern changes of sets of genes rather than individual genes and helped to enhance our understanding of cellular processes _{2 }on the physiology of

A key challenge with integrating different datasets is the heterogeneity of the data: each dataset may have its own sampling rates, time-scales, cell types, and sample populations, as well as its own level of measurement noise. This heterogeneity increases the difficulty of extracting meaningful results. To maximize the usefulness and minimize the heterogeneity of publicly available data, stricter standardization methods should be defined and imposed on procedures such as data collection and pre-processing. Indeed, standards such as MIAME (Minimum information about a microarray experiment), MIAPE (Minimum information about a proteomics experiment), MSI (Metabolomics standards initiative), and MIMIx (Minimum information required for reporting a molecular interaction experiment) have been proposed and implemented for presenting and exchanging gene expression

Conclusion

In summary, the analysis of short time-series microarrays is still at an early stage. Most studies using short time-series data have applied methods that were developed for static or long time-series microarray data, which tend to perform poorly with limited temporal sampling. Current efforts, including simplification approaches and the integration of multi-source information, have shown promise for improving the analysis of short time-series microarray data.

Future studies could combine both of these strategies to decrease the complexity of continuous time-series representations while offsetting the information loss of simplification-based approaches by increasing the information content of the data. Gene-module-level analysis is one potential way to combine them efficiently: the concept of modularity not only plays a central role in incorporating multi-source biological information, but also reflects a simplification strategy that focuses on groups of genes rather than individual ones.
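To make the idea of module-level analysis concrete, the sketch below averages the temporal profiles of the genes assigned to each module, so that subsequent analysis operates on a few module profiles rather than on many individual genes. It is an illustration under stated assumptions only: the function name `module_profiles`, the use of a simple arithmetic mean, the module memberships, and the expression values are all hypothetical, and real module definitions would come from prior knowledge such as pathway annotations.

```python
def module_profiles(expression, modules):
    """Average the temporal profiles of the genes in each module.

    `expression` maps gene name -> list of values, one per time point.
    `modules` maps module name -> list of member gene names.
    Genes without expression data are ignored; modules with no
    measured members are omitted from the result.
    """
    result = {}
    for module, genes in modules.items():
        members = [expression[gene] for gene in genes if gene in expression]
        if not members:
            continue
        # zip(*members) iterates over time points across all member genes.
        result[module] = [sum(values) / len(members) for values in zip(*members)]
    return result


# Hypothetical expression values at three time points (illustrative only).
expression = {
    "g1": [0.0, 1.0, 2.0],
    "g2": [0.0, 3.0, 4.0],
    "g3": [1.0, 1.0, 1.0],
}

# Hypothetical module memberships, e.g. derived from pathway annotations.
modules = {
    "M1": ["g1", "g2"],
    "M2": ["g3", "g_missing"],
    "M3": ["g_absent"],
}

averaged = module_profiles(expression, modules)
for module, profile in averaged.items():
    print(module, profile)
```

Averaging is only one possible summary; it reduces dimensionality and gene-level noise, but it can also mask members of a module that respond in opposite directions.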

A recent study by Hirose et al

Thus far, the predominant focus has been on lower levels of analysis, such as detecting differentially expressed genes or clustering genes with similar temporal profiles, whereas few higher-level analyses, e.g., network construction, have been reported. With the rapid growth in the availability of short time-series data, more theoretical and technical studies are urgently needed to provide practical solutions that fully exploit the potential of this wealth of data.

Acknowledgements

We thank Professor Neil T. Wright for providing critical comments on the content, and the editors for their valuable comments and suggestions for improving the paper. C.C. is supported in part by the National Institutes of Health (1R01GM079688-01), the National Science Foundation (BES 0425821), and the MSU Foundation on the Center for Systems Biology.