New concepts for building vocabulary for cell image ontologies
Biochemical Science Division, National Institute of Standards and Technology Gaithersburg MD 20899, USA
BMC Bioinformatics 2011, 12:487 doi:10.1186/1471-2105-12-487Published: 21 December 2011
There are significant challenges associated with the building of ontologies for cell biology experiments including the large numbers of terms and their synonyms. These challenges make it difficult to simultaneously query data from multiple experiments or ontologies. If vocabulary terms were consistently used and reused across and within ontologies, queries would be possible through shared terms. One approach to achieving this is to strictly control the terms used in ontologies in the form of a pre-defined schema, but this approach limits the individual researcher's ability to create new terms when needed to describe new experiments.
Here, we propose the use of a limited number of highly reusable common root terms, and rules for an experimentalist to locally expand terms by adding more specific terms under more general root terms to form specific new vocabulary hierarchies that can be used to build ontologies. We illustrate the application of the method to build vocabularies and a prototype database for cell images that uses a visual data-tree of terms to facilitate sophisticated queries based on a experimental parameters. We demonstrate how the terminology might be extended by adding new vocabulary terms into the hierarchy of terms in an evolving process. In this approach, image data and metadata are handled separately, so we also describe a robust file-naming scheme to unambiguously identify image and other files associated with each metadata value. The prototype database http://sbd.nist.gov/ webcite consists of more than 2000 images of cells and benchmark materials, and 163 metadata terms that describe experimental details, including many details about cell culture and handling. Image files of interest can be retrieved, and their data can be compared, by choosing one or more relevant metadata values as search terms. Metadata values for any dataset can be compared with corresponding values of another dataset through logical operations.
Organizing metadata for cell imaging experiments under a framework of rules that include highly reused root terms will facilitate the addition of new terms into a vocabulary hierarchy and encourage the reuse of terms. These vocabulary hierarchies can be converted into XML schema or RDF graphs for displaying and querying, but this is not necessary for using it to annotate cell images. Vocabulary data trees from multiple experiments or laboratories can be aligned at the root terms to facilitate query development. This approach of developing vocabularies is compatible with the major advances in database technology and could be used for building the Semantic Web.