Computational reconstruction of proteome-wide protein interaction networks between HTLV retroviruses and Homo sapiens
1 Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
2 Software College, Shenyang Normal University, Shenyang 110034, China
BMC Bioinformatics 2014, 15:245 doi:10.1186/1471-2105-15-245Published: 18 July 2014
Human T-cell leukemia viruses (HTLV) tend to induce some fatal human diseases like Adult T-cell Leukemia (ATL) by targeting human T lymphocytes. To indentify the protein-protein interactions (PPI) between HTLV viruses and Homo sapiens is one of the significant approaches to reveal the underlying mechanism of HTLV infection and host defence. At present, as biological experiments are labor-intensive and expensive, the identified part of the HTLV-human PPI networks is rather small. Although recent years have witnessed much progress in computational modeling for reconstructing pathogen-host PPI networks, data scarcity and data unavailability are two major challenges to be effectively addressed. To our knowledge, no computational method for proteome-wide HTLV-human PPI networks reconstruction has been reported.
In this work we develop Multi-instance Adaboost method to conduct homolog knowledge transfer for computationally reconstructing proteome-wide HTLV-human PPI networks. In this method, the homolog knowledge in the form of gene ontology (GO) is treated as auxiliary homolog instance to address the problems of data scarcity and data unavailability, while the potential negative knowledge transfer is automatically attenuated by AdaBoost instance reweighting. The cross validation experiments show that the homolog knowledge transfer in the form of independent homolog instances can effectively enrich the feature information and substitute for the missing GO information. Moreover, the independent tests show that the method can validate 70.3% of the recently curated interactions, significantly exceeding the 2.1% recognition rate by the HT-Y2H experiment. We have used the method to reconstruct the proteome-wide HTLV-human PPI networks and further conducted gene ontology based clustering of the predicted networks for further biomedical research. The gene ontology based clustering analysis of the predictions provides much biological insight into the pathogenesis of HTLV retroviruses.
The Multi-instance AdaBoost method can effectively address the problems of data scarcity and data unavailability for the proteome-wide HTLV-human PPI interaction networks reconstruction. The gene ontology based clustering analysis of the predictions reveals some important signaling pathways and biological modules that HTLV retroviruses are likely to target.