This article is part of the supplement: Selected articles from The 8th Annual Biotechnology and Bioinformatics Symposium (BIOT-2011)

Open Access Research

Graph-based signal integration for high-throughput phenotyping

Jorge R Herskovic1*, Devika Subramanian2, Trevor Cohen1, Pamela A Bozzo-Silva1,3, Charles F Bearden1 and Elmer V Bernstam1,4

Author affiliations

1 School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA

2 Department of Computer Science, Rice University, Houston, TX, USA

3 Escuela de Medicina, Universidad de Los Andes, Santiago, Chile

4 Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA



BMC Bioinformatics 2012, 13(Suppl 13):S2. doi:10.1186/1471-2105-13-S13-S2

Published: 24 August 2012



Electronic Health Records aggregated in Clinical Data Warehouses (CDWs) promise to revolutionize Comparative Effectiveness Research and to open new avenues of inquiry. However, the effectiveness of CDWs is diminished by the lack of properly labeled data. We present a novel approach to high-throughput phenotyping that integrates knowledge from the CDW, the biomedical literature, and the Unified Medical Language System (UMLS). In this paper, we automatically construct a graphical knowledge model and then use it to phenotype breast cancer patients. We compare the performance of this approach with that of MetaMap for labeling patient records.


MetaMap identified breast cancer patients with an overall accuracy of 51.1% (n = 428), with recall = 85.4%, precision = 26.2%, and F1 = 40.1%. Our unsupervised graph-based high-throughput phenotyping achieved an accuracy of 84.1%, with recall = 46.3%, precision = 61.2%, and F1 = 52.8%.
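The four figures above follow the standard definitions for a binary labeling task: accuracy counts all correct labels, precision and recall concern the positive (breast cancer) class, and F1 is the harmonic mean of precision and recall. The minimal Python sketch below restates those definitions; the function names are illustrative and not taken from the paper. It recomputes F1 only from the rounded precision and recall reported above, because the underlying confusion-matrix counts are not given in this abstract.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all records that received the correct label."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 for the positive class from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    # Rounded precision/recall values as reported in the abstract.
    metamap_f1 = f1_score(precision=0.262, recall=0.854)
    graph_f1 = f1_score(precision=0.612, recall=0.463)
    print(f"MetaMap F1: {metamap_f1:.1%}")     # 40.1%
    print(f"Graph-based F1: {graph_f1:.1%}")   # 52.7%
```

From the rounded inputs, the MetaMap F1 matches the reported 40.1%. The graph-based value comes out at 52.7% rather than 52.8%. That difference falls within the rounding of the published precision and recall. The sketch also shows why MetaMap's high recall did not produce a high F1: a harmonic mean is pulled toward the smaller of its two inputs, here the 26.2% precision.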


We conclude that our approach is a promising alternative for unsupervised high-throughput phenotyping.