This article is part of the supplement: 2006 International Workshop on Multiscale Biological Imaging, Data Mining and Informatics.

Open Access Research

Automatic image analysis for gene expression patterns of fly embryos

Hanchuan Peng1, Fuhui Long1, Jie Zhou2, Garmay Leung3, Michael B Eisen3,4 and Eugene W Myers1

1Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, VA 20147, USA

2Department of Computer Science, Northern Illinois University, DeKalb, IL 60115, USA

3Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA

4Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

BMC Cell Biology 2007, 8(Suppl 1):S7 doi:10.1186/1471-2121-8-S1-S7

Published: 10 July 2007

Abstract

Background

Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a D. melanogaster embryo delivers the detailed spatio-temporal pattern of expression of the gene. Many biological problems such as the detection of co-expressed genes, co-regulated genes, and transcription factor binding motifs rely heavily on the analyses of these image patterns. The increasing availability of ISH image data motivates the development of automated computational approaches to the analysis of gene expression patterns.

Results

We have developed algorithms and associated software that extract a feature representation of a gene expression pattern from an ISH image, cluster genes sharing the same spatio-temporal pattern of expression, suggest transcription factor binding (TFB) site motifs for genes that appear to be co-regulated (based on the clustering), and automatically identify the anatomical regions that express a gene given a training set of annotations. We developed three different feature representations, based on Gaussian Mixture Models (GMM), Principal Component Analysis (PCA), and wavelet functions, each having different merits with respect to the tasks above. For clustering image patterns, we developed a minimum spanning tree method (MSTCUT), and for proposing TFB sites we used standard motif finders on clustered/co-expressed genes, with the added requirement of conservation across the genomes of 8 related fly species. Lastly, we trained a suite of binary classifiers that operate on the wavelet feature representation, one for each anatomical annotation term in a controlled vocabulary or ontology. We report the results of applying these methods to the Berkeley Drosophila Genome Project (BDGP) gene expression database.
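To make the idea of minimum-spanning-tree clustering concrete, the following is a minimal sketch in Python. It is not MSTCUT itself: the abstract does not state MSTCUT's cut criterion, so this sketch assumes the simplest variant, which builds the tree over Euclidean distances between feature vectors and removes the k-1 heaviest edges. The function names `minimum_spanning_tree` and `mst_clusters` and the toy feature vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def minimum_spanning_tree(points: Sequence[Sequence[float]]) -> List[Tuple[float, int, int]]:
    """Prim's algorithm on the complete Euclidean graph; returns (weight, u, v) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each node
    parent = [-1] * n       # tree node that achieves that cheapest distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax distances of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Assumed cut rule: drop the k-1 heaviest MST edges, then label the components."""
    n = len(points)
    edges = sorted(minimum_spanning_tree(points))
    kept = edges[: max(0, len(edges) - (k - 1))]
    root = list(range(n))   # union-find forest over the points

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]   # path halving
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    # Number the components in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Toy feature vectors standing in for per-image pattern features.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mst_clusters(features, k=2))
```

In practice each point would be the feature vector of one expression pattern (for example its GMM, PCA, or wavelet representation), and the choice of distance and cut rule would need to follow the full paper rather than this sketch.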

Conclusion

Our automatic image analysis methods recapitulate known co-regulated genes and give correct developmental-stage classifications with 99+% accuracy despite variations in morphology, orientation, and focal plane, suggesting that these techniques form a set of useful tools for the large-scale computational analysis of fly embryonic gene expression patterns.

