Software (Open Access)

Identifying spatially similar gene expression patterns in early stage fruit fly embryo images: binary feature versus invariant moment digital representations

Rajalakshmi Gurunathan1,2, Bernard Van Emden1,3, Sethuraman Panchanathan2 and Sudhir Kumar1,3*

Author Affiliations

1 Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA

2 Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287-8809, USA

3 School of Life Sciences, Arizona State University, Tempe, AZ 85287-4501, USA

BMC Bioinformatics 2004, 5:202  doi:10.1186/1471-2105-5-202

Published: 16 December 2004



Modern developmental biology relies heavily on the analysis of embryonic gene expression patterns. Investigators manually inspect hundreds or thousands of expression patterns to identify those that are spatially similar and, ultimately, to infer potential gene interactions. However, the rapid accumulation of gene expression pattern data over the last two decades, facilitated by high-throughput techniques, has created a need for efficient approaches that compare the images directly, rather than their textual descriptions, to identify spatially similar expression patterns.


We compared the effectiveness of the Binary Feature Vector (BFV) and Invariant Moment Vector (IMV) digital representations of gene expression patterns in finding biologically meaningful patterns, using a small (226 images) and a large (1819 images) dataset. For each dataset, a list of images ordered by similarity to a query image was generated to identify overlapping and similar gene expression patterns, in a manner comparable to what a developmental biologist might do. The results showed that the BFV representation consistently outperforms the IMV representation in finding biologically meaningful matches when the spatial overlap of the gene expression patterns and the genes involved are considered. Furthermore, we explored the value of conducting image-content-based searches in a dataset where the individual expression components (or domains) of multi-domain expression patterns were also included separately. We found that this technique improves the performance of both IMV-based and BFV-based searches.
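To make the idea of an ordered list for a query image concrete, the following is a minimal sketch of ranking binary feature vectors by spatial overlap. It is an illustration under stated assumptions, not the scoring used in the paper: the abstract does not specify the similarity measure, so the Jaccard overlap, the function names `jaccard` and `rank_by_overlap`, and the toy six-pixel vectors are all hypothetical.

```python
from typing import Dict, List, Tuple


def jaccard(a: List[int], b: List[int]) -> float:
    """Shared 'on' positions divided by positions 'on' in either vector.

    Assumption: this overlap measure is chosen for illustration only;
    the abstract does not state which measure the authors used.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two empty patterns have no expression to overlap; score them 0.
    return both / either if either else 0.0


def rank_by_overlap(
    query: List[int], database: Dict[str, List[int]]
) -> List[Tuple[str, float]]:
    """Return (image name, score) pairs, best match first."""
    scores = [(name, jaccard(query, vec)) for name, vec in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data: each vector stands in for a flattened, binarized embryo image
# (1 = expression present at that position). Names are hypothetical.
database = {
    "img_a": [1, 1, 0, 0, 0, 0],
    "img_b": [0, 1, 1, 1, 0, 0],
    "img_c": [0, 0, 0, 0, 1, 1],
}
query = [1, 1, 1, 0, 0, 0]

ranking = rank_by_overlap(query, database)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy case the image sharing the most expressed positions with the query is listed first and the non-overlapping image last, which mirrors the kind of ordered result list described above.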


We conclude that the BFV representation consistently produces a more extensive and better list of biologically useful patterns than the IMV representation. The high quality of the results is maintained as the search database becomes larger, which encourages efforts to build automated image query and retrieval systems for spatial gene expression patterns.