Identifying spatially similar gene expression patterns in early stage fruit fly embryo images: binary feature versus invariant moment digital representations
1 Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287-5301, USA
2 Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287-8809, USA
3 School of Life Sciences, Arizona State University, Tempe, AZ 85287-4501, USA
BMC Bioinformatics 2004, 5:202 doi:10.1186/1471-2105-5-202Published: 16 December 2004
Modern developmental biology relies heavily on the analysis of embryonic gene expression patterns. Investigators manually inspect hundreds or thousands of expression patterns to identify those that are spatially similar and to ultimately infer potential gene interactions. However, the rapid accumulation of gene expression pattern data over the last two decades, facilitated by high-throughput techniques, has produced a need for the development of efficient approaches for direct comparison of images, rather than their textual descriptions, to identify spatially similar expression patterns.
The effectiveness of the Binary Feature Vector (BFV) and Invariant Moment Vector (IMV) based digital representations of the gene expression patterns in finding biologically meaningful patterns was compared for a small (226 images) and a large (1819 images) dataset. For each dataset, an ordered list of images, with respect to a query image, was generated to identify overlapping and similar gene expression patterns, in a manner comparable to what a developmental biologist might do. The results showed that the BFV representation consistently outperforms the IMV representation in finding biologically meaningful matches when spatial overlap of the gene expression pattern and the genes involved are considered. Furthermore, we explored the value of conducting image-content based searches in a dataset where individual expression components (or domains) of multi-domain expression patterns were also included separately. We found that this technique improves performance of both IMV and BFV based searches.
We conclude that the BFV representation consistently produces a more extensive and better list of biologically useful patterns than the IMV representation. The high quality of results obtained scales well as the search database becomes larger, which encourages efforts to build automated image query and retrieval systems for spatial gene expression patterns.