Anatomical information is crucial to human biomedical research, but not all research is based on human tissues. Exploiting discoveries made in model organisms such as the mouse at a systems level, where metabolic and developmental networks operate within tissues, requires identifying the links between human and model organism anatomies. The question is: can we exploit similarities between mouse and human anatomy to associate data between them automatically?
We start with the current anatomy ontologies for mouse (the Mouse Anatomy Nomenclature) and for human. Consider the arterial system (EHDAA.1024) and its counterpart in the mouse ontology (EMAPA.16371):
Human/ extraembryonic component/ vascular component/ arterial system
Mouse/ extraembryonic component/ cardiovascular system/ arterial system
Are the two tissues similar? Two paths may be structurally different yet made up of lexically similar terms, or structurally similar yet made up of some different terms. Moreover, a tissue path in one species may have no corresponding tissue path in the other.
We use language processing to normalize the ontologies' paths. This includes regularizing spelling variants, removing stop words, stemming and lemmatizing content words, and treating the descriptors in an individual node label as a set. Next, tissue pairs above a similarity threshold are assessed structurally. This involves viewing the ontologies as graphs with directed but unlabelled edges.
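The normalization and lexical comparison steps might be sketched as follows. This is an illustrative sketch only, not the procedure actually used in this work: the stop-word list, the spelling map, the crude suffix-stripping stemmer (a stand-in for real stemming and lemmatization), the choice to drop the leading species name, and the use of Jaccard similarity are all assumptions made for the example.

```python
import re

# Hypothetical resources; the real stop-word list and spelling map are not given.
STOP_WORDS = {"of", "the", "and", "in"}
SPELLING = {"oesophagus": "esophagus"}

HUMAN = "Human/ extraembryonic component/ vascular component/ arterial system"
MOUSE = "Mouse/ extraembryonic component/ cardiovascular system/ arterial system"


def crude_stem(word):
    """Toy stand-in for a real stemmer/lemmatizer: handles simple plurals only."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith(("ss", "us", "is")) and len(word) > 3:
        return word[:-1]
    return word


def normalize_label(label):
    """Turn one node label into a set of normalized descriptors."""
    tokens = re.findall(r"[a-z]+", label.lower())
    tokens = [SPELLING.get(t, t) for t in tokens]  # regularize spelling variants
    return frozenset(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def normalize_path(path):
    """Split a '/'-separated path into normalized node labels.

    Assumes the first component is the species name and drops it.
    """
    nodes = [part.strip() for part in path.split("/") if part.strip()]
    return [normalize_label(node) for node in nodes[1:]]


def path_similarity(path_a, path_b):
    """Jaccard similarity over all descriptors in two paths (an assumed measure)."""
    a = set().union(*normalize_path(path_a))
    b = set().union(*normalize_path(path_b))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(round(path_similarity(HUMAN, MOUSE), 3))  # prints 0.667
```

Under these assumptions, the two arterial system paths share four of six distinct descriptors, so they would fall below a 90% threshold on this particular measure. A per-node or weighted measure could score the same pair quite differently.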
At similarity thresholds above 90%, we found the percentage of structurally compatible matches varied between 84.7% and 92.8%.