Open Access Methodology article

Learning virulent proteins from integrated query networks

Eithon Cadag1*, Peter Tarczy-Hornoch2 and Peter J Myler2,3

Author affiliations

1 Ayasdi Inc, Palo Alto, CA, USA 94301

2 Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA 98195

3 Seattle Biomedical Research Institute, Seattle, WA, USA 98109

Citation and License

BMC Bioinformatics 2012, 13:321  doi:10.1186/1471-2105-13-321

Published: 2 December 2012



Methods of weakening and attenuating pathogens' abilities to infect and propagate in a host, thus allowing the natural immune system to more easily decimate invaders, have gained attention as alternatives to broad-spectrum targeting approaches. The following work describes a technique for identifying proteins involved in virulence that relies on latent information gathered computationally across biological repositories and is applicable to both generic and specific virulence categories.


A lightweight method for data integration is used, which links information regarding a protein via a path-based query graph. A weighting method is then applied to the query graphs so that they can serve as input to various statistical classification methods for discrimination, and the combined use of data integration and learning methods is tested on the problems of both generalized and specific virulence function prediction.
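As an illustration only, the idea of turning a weighted query graph into classifier input might be sketched as follows. Everything here is an assumption made for the example rather than the scheme used in this work: the toy graph, the node names, the edge confidences, the choice of scoring a path by the product of its edge weights, and the helper names `path_weights` and `feature_vector` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical query graph: each edge links a record to a related record
# in another source and carries an assumed confidence in (0, 1].
EDGES = {
    "query_protein": [("hit_A", 0.9), ("hit_B", 0.5)],
    "hit_A": [("term_virulence", 0.8)],
    "hit_B": [("term_virulence", 0.9), ("term_other", 0.6)],
}


def path_weights(edges, source):
    """For each node reachable from source, return the largest product of
    edge weights over all simple paths (an assumed scoring rule)."""
    best = defaultdict(float)
    # Each stack entry: (current node, weight of path so far, nodes on path).
    stack = [(source, 1.0, frozenset([source]))]
    while stack:
        node, weight, seen = stack.pop()
        for nxt, w in edges.get(node, []):
            if nxt in seen:  # skip cycles so only simple paths are scored
                continue
            score = weight * w
            if score > best[nxt]:
                best[nxt] = score
            stack.append((nxt, score, seen | {nxt}))
    return dict(best)


def feature_vector(weights, vocabulary):
    """Map path weights onto a fixed term vocabulary; unseen terms get 0.0."""
    return [weights.get(term, 0.0) for term in vocabulary]


# Hypothetical vocabulary of annotation terms shared by all proteins.
VOCAB = ["term_virulence", "term_other", "term_absent"]
weights = path_weights(EDGES, "query_protein")
features = feature_vector(weights, VOCAB)
```

A vector such as `features`, built one per protein over a shared vocabulary, is the kind of fixed-length input that a standard statistical classifier accepts. Exhaustively enumerating simple paths is only practical for small graphs like this toy one.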


This approach improves the coverage of functional data for a protein. Moreover, although it depends largely on noisy and potentially non-curated data from public sources, we find that it outperforms other techniques for identifying general virulence factors, as well as baseline remote homology detection methods for specific virulence categories.