Supervised segmentation of phenotype descriptions for the human skeletal phenome using hybrid methods
1 School of ITEE, The University of Queensland, Brisbane, Australia
2 Bone Dysplasia Research Group, UQ Centre for Clinical Research (UQCCR), University of Queensland, Brisbane, Australia
3 Genetic Health Queensland, Royal Brisbane and Women’s Hospital, Herston, Brisbane, Australia
BMC Bioinformatics 2012, 13:265 doi:10.1186/1471-2105-13-265
Published: 15 October 2012
Over the last few years, a significant amount of research has addressed the ontology-based formalization of phenotype descriptions. In order to fully capture the intrinsic value and knowledge expressed within them, we need to take advantage of their inner structure, which implicitly combines qualities and anatomical entities. The first step in this process is the segmentation of the phenotype descriptions into their atomic elements.
We present a two-phase hybrid segmentation method that combines a series of individual classifiers using different aggregation schemes (set operations and simple majority voting). The approach is tested on a corpus of skeletal phenotype descriptions drawn from the Human Phenotype Ontology. Experimental results show that the best hybrid method achieves an F-Score of 97.05% in the first phase and F-Scores of 97.16% / 94.50% in the second phase.
The performance of the initial segmentation of anatomical entities and qualities (phase I) is not affected by the presence or absence of external resources, such as domain dictionaries. More generally, hybrid methods may not always improve the segmentation accuracy, as they are heavily dependent on the goal and data characteristics.
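The aggregation schemes named above (set operations and simple majority voting) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label names ("Q" for quality, "A" for anatomical entity, "O" as a fallback), the span representation as (start, end) token index pairs, the tie-breaking rule, and all function names are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(predictions, fallback="O"):
    """Combine per-token labels from several classifiers by simple majority.

    predictions: list of label sequences, one per classifier, equal length.
    A label wins only with a strict majority; otherwise `fallback` is used
    (the fallback rule is an assumption, not taken from the paper).
    """
    voted = []
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        voted.append(label if count > len(labels) / 2 else fallback)
    return voted


def union_spans(span_sets):
    """Keep every segment proposed by at least one classifier."""
    return set().union(*span_sets)


def intersection_spans(span_sets):
    """Keep only the segments proposed by all classifiers."""
    return set.intersection(*map(set, span_sets))


# Hypothetical outputs of three classifiers for a three-token description.
token_labels = [
    ["Q", "A", "A"],
    ["Q", "Q", "A"],
    ["Q", "A", "A"],
]
segments = [
    {(0, 1), (1, 3)},
    {(0, 1), (2, 3)},
    {(0, 1), (1, 3)},
]

voted_labels = majority_vote(token_labels)
all_segments = union_spans(segments)
agreed_segments = intersection_spans(segments)
```

In this sketch, intersection favours precision (only unanimously proposed segments survive), union favours recall, and majority voting sits between the two, which is one way to see why the best aggregation scheme depends on the goal and the data.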