Annotation and query of tissue microarray data using the NCI Thesaurus
1 Stanford Medical Informatics, School of Medicine, Stanford University, Stanford, CA 94305, USA
2 Department of Pathology, School of Medicine, Stanford University, Stanford, CA 94305, USA
BMC Bioinformatics 2007, 8:296 doi:10.1186/1471-2105-8-296
Published: 8 August 2007
The Stanford Tissue Microarray Database (TMAD) is a repository of data serving a consortium of pathologists and biomedical researchers. The tissue samples in TMAD are annotated with multiple free-text fields that specify the pathological diagnoses for each sample. These text annotations are not structured according to any ontology, which makes future integration of this resource with other biological and clinical data difficult.
We developed methods to map these annotations to the NCI Thesaurus (NCI-T). Using the NCI-T, we can effectively represent annotations for about 86% of the samples. We demonstrate how this mapping enables ontology-driven integration and querying of tissue microarray data. We have deployed the mapping and ontology-driven querying tools at the TMAD site for general use.
We have demonstrated that the diagnosis-related terms describing a sample in TMAD can be mapped effectively to the NCI Thesaurus (NCI-T). The NCI Thesaurus has wide coverage, providing terms for about 86% of the samples. In our opinion, the NCI Thesaurus can facilitate integration of this resource with other biological data.
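The idea of mapping free-text diagnoses to ontology terms and then querying through the term hierarchy can be illustrated with a small sketch. This is not the TMAD implementation: the is-a hierarchy, the synonym table, the sample records, and the function names (`map_annotation`, `descendants`, `query`) are all hypothetical, and the matching is a simple exact lookup chosen for brevity.

```python
from collections import defaultdict

# Toy is-a hierarchy (child -> parent). Illustrative terms only; these are
# assumptions for the example, not entries copied from the NCI Thesaurus.
PARENT = {
    "carcinoma": "neoplasm",
    "adenocarcinoma": "carcinoma",
    "breast adenocarcinoma": "adenocarcinoma",
    "sarcoma": "neoplasm",
}

# Hypothetical synonym table: free-text variant -> preferred term.
SYNONYMS = {
    "adenoca": "adenocarcinoma",
    "breast adenoca": "breast adenocarcinoma",
}


def normalize(text):
    """Lowercase, drop commas, and collapse runs of whitespace."""
    return " ".join(text.lower().replace(",", " ").split())


def map_annotation(text):
    """Return the ontology term for a free-text annotation, or None."""
    key = normalize(text)
    key = SYNONYMS.get(key, key)
    known = set(PARENT) | set(PARENT.values())
    return key if key in known else None


def descendants(term):
    """Return the term together with everything below it in the hierarchy."""
    children = defaultdict(set)
    for child, parent in PARENT.items():
        children[parent].add(child)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def query(samples, term):
    """Return sorted IDs of samples annotated with the term or a subtype."""
    wanted = descendants(term)
    return sorted(
        sample_id
        for sample_id, text in samples.items()
        if map_annotation(text) in wanted
    )


# Hypothetical sample annotations (sample ID -> free-text diagnosis).
SAMPLES = {
    "s1": "Breast  Adenoca",
    "s2": "Sarcoma",
    "s3": "adenocarcinoma",
    "s4": "unclassifiable lesion",
}

if __name__ == "__main__":
    print(query(SAMPLES, "carcinoma"))
    mapped = [s for s in SAMPLES.values() if map_annotation(s) is not None]
    print(f"coverage: {len(mapped) / len(SAMPLES):.0%}")
```

In this sketch a query for a general term also retrieves samples annotated with more specific subtypes, which is the benefit an ontology offers over plain string matching. The coverage figure printed at the end is the fraction of toy samples that received a term, analogous in spirit to the coverage statistic reported above but computed only on the made-up data.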