Open Access Software

Annotation and query of tissue microarray data using the NCI Thesaurus

Nigam H Shah1*, Daniel L Rubin1, Inigo Espinosa2, Kelli Montgomery2 and Mark A Musen1

Author Affiliations

1 Stanford Medical Informatics, School of Medicine, Stanford University, Stanford, CA 94305, USA

2 Department of Pathology, School of Medicine, Stanford University, Stanford, CA 94305, USA

BMC Bioinformatics 2007, 8:296  doi:10.1186/1471-2105-8-296

Published: 8 August 2007

Abstract

Background

The Stanford Tissue Microarray Database (TMAD) is a repository of data serving a consortium of pathologists and biomedical researchers. The tissue samples in TMAD are annotated with multiple free-text fields, specifying the pathological diagnoses for each sample. These text annotations are not structured according to any ontology, making future integration of this resource with other biological and clinical data difficult.

Results

We developed methods to map these annotations to the NCI Thesaurus (NCI-T). Using the NCI-T, we can effectively represent annotations for about 86% of the samples. We demonstrate how this mapping enables ontology-driven integration and querying of tissue microarray data. We have deployed the mapping and ontology-driven querying tools at the TMAD site for general use.
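
As a rough illustration of what ontology-driven querying means here, the following minimal Python sketch maps free-text diagnoses to terms in a toy is-a hierarchy and then retrieves samples by a term or any of its descendants. The hierarchy, the synonym table, the sample records, and the function names are all hypothetical stand-ins chosen for this example; they are not taken from the NCI-T, from TMAD, or from the authors' tools.

```python
# Hypothetical toy is-a hierarchy (child -> parent); not real NCI-T content.
PARENT = {
    "Carcinoma": "Neoplasm",
    "Adenocarcinoma": "Carcinoma",
    "Sarcoma": "Neoplasm",
}

# Hypothetical lookup from normalized free text to an ontology term.
SYNONYMS = {
    "adenocarcinoma": "Adenocarcinoma",
    "adeno ca": "Adenocarcinoma",
    "sarcoma": "Sarcoma",
}


def map_annotation(text):
    """Lowercase, collapse whitespace, and look the text up; None if unmapped."""
    return SYNONYMS.get(" ".join(text.lower().split()))


def term_and_ancestors(term):
    """Return the term followed by all of its ancestors in the hierarchy."""
    chain = [term]
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def query(samples, term):
    """Return sorted IDs of samples mapped to `term` or to one of its descendants."""
    hits = []
    for sample_id, text in samples.items():
        mapped = map_annotation(text)
        if mapped is not None and term in term_and_ancestors(mapped):
            hits.append(sample_id)
    return sorted(hits)


# Hypothetical sample records: ID -> free-text diagnosis.
samples = {"s1": "Adeno  CA", "s2": "sarcoma", "s3": "unclear lesion"}

print(query(samples, "Carcinoma"))
print(query(samples, "Neoplasm"))
```

In this sketch, a query for a general term also returns samples annotated with more specific terms, which a plain string match on the free text would miss. The unmapped sample is simply never returned, which mirrors the situation of samples whose annotations have no corresponding ontology term.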

Conclusion

We have demonstrated that we can effectively map the diagnosis-related terms describing a sample in TMAD to the NCI-T. The NCI Thesaurus has wide coverage, providing terms for about 86% of the samples. In our opinion, the NCI Thesaurus can facilitate integration of this resource with other biological data.