Open Access Methodology article

Classification of viral zoonosis through receptor pattern analysis

Se-Eun Bae1,2 and Hyeon Seok Son1,2*

Author Affiliations

1 Laboratory of Computational Biology & Bioinformatics, Institute of Health and Environment, Graduate School of Public Health, Seoul National University, 599 Gwanak-ro, Gwanak-gu, Seoul 151-742, Korea

2 Interdisciplinary Graduate Program in Bioinformatics, College of Natural Science, Seoul National University, 599 Gwanak-ro, Gwanak-gu, Seoul 151-742, Korea

BMC Bioinformatics 2011, 12:96  doi:10.1186/1471-2105-12-96

The electronic version of this article is the complete one and can be found online at:

Received: 7 September 2010
Accepted: 13 April 2011
Published: 13 April 2011

© 2011 Bae and Son; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



Abstract

Viral zoonosis, the transmission of a virus from its primary vertebrate reservoir species to humans, requires ubiquitous cellular proteins known as receptor proteins. Zoonosis can occur not only through direct transmission from vertebrates to humans, but also through intermediate reservoirs or other environmental factors. Viruses can be categorized according to genome type (ssDNA, dsDNA, ssRNA and dsRNA viruses). Among them, the RNA viruses exhibit particularly high mutation rates and are especially problematic for this reason. Most zoonotic viruses are RNA viruses that change their envelope proteins to facilitate binding to various receptors of host species. In this study, we sought to predict zoonotic propensity through the analysis of receptor characteristics. We hypothesized that the major barrier to interspecies virus transmission is that receptor sequences vary among species; in other words, the specific amino acid sequence of the receptor determines the ability of the viral envelope protein to attach to the cell.


We analysed host-cell receptor sequences for their hydrophobicity/hydrophilicity characteristics. We then analysed these properties for similarities among receptors of different species and used a statistical discriminant analysis to predict the likelihood of transmission among species.


This study is an attempt to predict zoonosis through simple computational analysis of receptor sequence differences. Our method may be useful in predicting the zoonotic potential of newly discovered viral strains.


Background

Viral zoonosis, the transmission of a virus from its primary vertebrate reservoir species to humans, requires ubiquitous cellular proteins known as receptor proteins [1]. Zoonosis can occur not only through direct transmission, but also through intermediate reservoirs or other environmental factors [2-4]. Zoonotic viruses can be categorized according to genome type; of the various classes of viruses, the RNA viruses exhibit the highest mutation rates [5]. Most zoonotic viruses are RNA viruses that change their envelope proteins to facilitate binding to various receptors of host species [6,7]. The high mutation rate of envelope proteins [5] hinders the development of accurate vaccines, as does the great ability of the RNA viruses to infect host species in order to exploit host proteins for viral reproduction [8].

Lacking the ability to self-replicate, viruses must utilize the replication apparatus of their host cells [9]. Viral infection of a cell begins with attachment of the virus to the cell surface [6,10,11]. During attachment to the cell membrane, the viral envelope protein (a structural protein) interacts with the host-cell receptor protein(s) [12]. In non-enveloped viruses, the capsid plays this role. The cell receptors that play a major role in viral attachment are predominantly membrane proteins of the immunoglobulin superfamily [13-15]. The identification of virus-binding cellular receptors accelerated rapidly in the late 1980s owing to developments in the use of monoclonal antibodies and molecular cloning techniques [15]. The various receptors that have been found are surface matrix structures containing carbohydrate, lipid, and protein moieties [1,16,17]. In some cases, viral attachment also exploits co-receptors. For example, HIV, which uses the CD4 molecule as its receptor, uses the CXCR4 and CCR5 co-receptors to strengthen the effectiveness of infection [1,14,18,19]. Similarly, hepatitis C virus utilizes CD81 as a receptor and LDLR as a co-receptor [20].

Since the host-cell range of a specific virus is predetermined by its ability to recognize specific receptors, the similarities between the receptors of its primary reservoir host cell and the potential human host cell play a major role in determining the likelihood of viral zoonosis. Here, we analysed zoonotic and non-zoonotic RNA viruses along with their cellular receptors in human and (non-human) primary reservoir species to extract the receptor characteristics common to zoonosis. Viruses not previously reported to infect humans were classified as non-zoonotic viruses. We excluded all viruses known to utilize co-receptors; i.e., only virus-receptor interactions occurring through virus tropism and pathogenesis were considered [5,21]. The receptors and viruses examined in this study are listed in Table 1.

Table 1. Similarity scores of host receptor pairs.

We hypothesized that the major barrier to the transmission of viruses between species is the difference in cellular receptor sequences. In other words, the specific amino acid sequence of the receptor should be the major determinant of the ability of the viral envelope protein to attach to the cell. Ordinary sequence alignment reports overall sequence similarity, which we considered useful but insufficient, because most receptors are membrane proteins and membrane proteins consist of distinct hydrophobic and hydrophilic parts. Therefore, we analysed host-cell receptor sequences for their hydrophobicity/hydrophilicity characteristics. We then analysed these properties for similarities among receptors of different species to predict the likelihood of transmission across species, including humans. To the best of our knowledge, this study is the first attempt to predict zoonosis through a simple analysis of receptor sequence similarities and differences. This method may be useful in predicting the zoonotic potential of newly discovered viral strains.

Results and Discussion

The pair-wise receptor sequence similarities (gSi,1, gSi,2, and gSi,3) between host-species pairs for each virus family are shown in Table 1. To allow meaningful comparisons, each virus entry includes at least one infected host (the primary reservoir, designated as "#" in Table 1). As shown in Table 1, the similarity scores for the infected group (g = 1) were high, ranging from 0.790 to 0.988 for 1Si,1, from 0.841 to 0.996 for 1Si,2, and from 0.794 to 0.962 for 1Si,3. All pair-wise comparisons in group 1 (human vs. primary reservoir, primary reservoir vs. host, and human vs. host) yielded high similarity scores, indicating a high similarity among receptor sequences. The similarity scores were comparatively low in the non-infection group (g = 2), ranging from 0.092 to 0.440 for 2Si,1, from 0.108 to 0.432 for 2Si,2, and from 0.130 to 0.416 for 2Si,3. For group 2, both the primary host species and non-infected species are listed to illustrate the differences in similarity. In pair-wise comparisons, all the non-infection cases yielded low similarity values, i.e., the receptor sequences differed substantially from each other.

We assume that a low similarity in receptor sequences disfavors infection despite the existence of a common receptor. For example, enterovirus infects only Sus scrofa (pig); it does not infect Rattus norvegicus (rat) or Homo sapiens (human) because of the high transmission barrier. Similarly, for leukovirus, only Gallus gallus (chicken) is infected as a primary reservoir; because of the high transmission barrier, R. norvegicus and H. sapiens are not infected. These results imply that for non-infection cases, species barriers exist, and the propensity to cross the barrier is determined by the sequence similarity between the potential and primary host receptors.

Similarity scores for rabies virus were low between Canis lupus familiaris (domestic dog) and Bos taurus (domestic cow) (2Si,1 = 0.280, 2Si,2 = 0.373, and 2Si,3 = 0.366) and also between B. taurus and H. sapiens (2Si,1 = 0.267, 2Si,2 = 0.371, and 2Si,3 = 0.416) but were high between C. l. familiaris and H. sapiens (1Si,1 = 0.947, 1Si,2 = 0.985, and 1Si,3 = 0.962). Clearly, C. l. familiaris is the primary reservoir, and transmission of the disease to H. sapiens is possible only because of the high human/dog receptor similarity. Thus, for particular viruses, transmission of disease may be species-selective, although common receptors exist among species. Furthermore, infection specificity may be determined by the species barrier, which results from receptor differences.

The values in Table 1 are plotted in Figure 1 to illustrate the differences among groups. The x- and y-axes denote gSi,1 and gSi,2, respectively, where "g" is the group classification. All pair-wise similarity scores are shown. Groups 1, 2 and 3 are each well separated in the colour-coded two-dimensional space. The results provide clear evidence that the receptor sequences from cases of cross-species infection are well separated from those of other infection cases. From these observations, we conclude that receptor differences are a major contributing factor to the potential of a specific viral strain to cross species barriers for transmission. In other words, the species dependence of infection is indirectly related to the receptor sequence similarity. This finding implies that once the receptor sequences of the primary reservoir and possible hosts are known, we might be able to predict the likelihood of viral disease transmission. The accuracy of these classifications can be judged by subsequent assessment of cases of actual zoonotic transmission to humans.

Figure 1. Similarity scores among groups. Three kinds of pair-wise similarity scores (gSi,1, gSi,2, gSi,3) are plotted in two-dimensional space to show clear differences among groups. Groups 1, 2 and 3 are each well separated; the results show clearly that the receptor sequences from cases of cross-species infection are well distinguished from those of other infection cases.

Our analysis revealed significant differences in receptor similarity between infection and non-infection cases. The similarity values and the experimentally determined group categories were fed into a statistical discriminant analysis to predict infection (or zoonosis, in the case of human infection). As described in the Materials and Methods section, the values $D_i^2$ (i = 1, 2, 3) were calculated from the data in Table 1 to yield the results of the discriminant analysis.

The statistical discriminant analysis was verified using a test set of four viruses that were deliberately excluded from the training set. The viruses whose groups were predicted using the discriminant analysis are shown in Table 2. The first virus, feline immunodeficiency virus (FIV), uses Felis catus (domestic cat) as its primary host and CD4 as its receptor. According to the literature [22,23], FIV infection of humans is rare but has been reported. Our method categorized this case as near-infection (G = 3). The second virus, classical swine fever virus, is known to be non-zoonotic and was classified as such by our method (G = 2). The third virus, encephalomyocarditis virus, infects S. scrofa but has been known to cause sporadic infections in H. sapiens; it was classified as group 1 (G = 1) by our method. Finally, the Lassa virus is known to be zoonotic and was classified as group 1 (G = 1) by our method.

Table 2. Virus group prediction.

In Table 2, the hydrophilic similarity scores (S1) are less consistent with the predicted groups (G) than the hydrophobic scores (S2). This result suggests that the hydrophobic characteristics of the receptor sequence might be the key contributor to the prediction. However, this observation should be interpreted with caution, because the variables (S1, S2, S3) are complementary in the statistical process.


Conclusions

Our analysis of viral receptor sequences shows that the likelihood of viral infection correlates with the similarity in sequence of the primary and host receptors. This result is not surprising, because viral infection also inversely correlates with the inhibition of viral coat protein binding to the receptors. Importantly, we were able to establish this relationship at the amino acid sequence level, allowing for the prediction of possible human infection at an early stage of a viral outbreak, before the structures of viral coat proteins and receptors are known. Therefore, once the receptor sequences of the primary reservoir and the potential host are known, the likelihood of viral infection can be predicted if the virus does not mutate too abruptly. Our simple approach needs further refinement because the complex processes of host tropism of viruses are largely ignored in our current method. For example, the host immune response could be included for better prediction of zoonosis. Although further refinements of our methods and analyses of larger databases are needed, this simple conceptual approach may be useful, even now, as a basic tool for the classification of zoonosis of new viral species.


Methods

Data collection

Viral infection requires the insertion of viral genes into host cells. Such a process begins with the binding of coat proteins to host receptors, and in some cases, co-receptors [24]. Ten RNA viruses (seven zoonotic viruses and three non-zoonotic viruses) were investigated. Viruses that use co-receptors were excluded from the study. Receptor sequence data for each virus were collected from the National Center for Biotechnology Information, and the research literature was examined to determine the specific species tropism of each virus [25]. The viruses, host species, receptors, receptor sequences, and infection information for each host are shown in Table 1. We selected viruses that are each a representative of a different family, with different primary reservoirs. Viruses with unknown or poorly defined host receptors (particularly human receptors) were excluded from the study. Orthologues of the human receptor sequences for the non-zoonotic viruses were collected to allow for clear comparison with zoonosis cases.

Discriminant analysis for data analysis

To calculate sequence similarities among host receptors for each virus, we first conducted a pair-wise sequence alignment using Clustal X [26,27]. We verified the alignment results with BLAST [28] and prank [29]; both alignment tools produced the same results as Clustal X. From the resulting alignment, we counted the numbers of matched amino acids and calculated three kinds of sequence similarity scores. The total sequence similarity scores were defined as:
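One form consistent with the definitions that follow, under the assumption that each score is the fraction of matched residues within its residue class (with ${}^{g}S_{i,j}$ corresponding to gSi,j in the running text), is:

```latex
\begin{aligned}
N_{tot} &= N_{phi} + N_{pho} + N_{others},\\
{}^{g}S_{i,1} &= \frac{n_{phi}}{N_{phi}},\qquad
{}^{g}S_{i,2} = \frac{n_{pho}}{N_{pho}},\qquad
{}^{g}S_{i,3} = \frac{n_{tot}}{N_{tot}}.
\end{aligned}
```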


where Ntot is the total number of amino acids in one sequence string; ntot is the total number of matched amino acids in the sequence; Nphi and Npho are the numbers of hydrophilic and hydrophobic amino acids in the sequence, respectively; Nothers is the number of deleted amino acids (gaps/insertions in sequence) plus the number of amino acids with undetermined properties; nphi and npho are the numbers of hydrophilic and hydrophobic amino acids matched, respectively; and gSi,1 is the similarity score for hydrophilic residues of the ith row of infection group g (gSi,2 and gSi,3 are the corresponding scores for hydrophobic residues and for the whole sequence). The interspecies infection information was classified into three infection states: group 1 (g = 1) represents infection; group 2 (g = 2) represents non-infection; and group 3 (g = 3) represents near-infection. By definition, if a group 1 species pair includes humans, then the infection is zoonotic. Decisions for grouping were made on the basis of experimental and epidemiological studies reported in the literature [4,30-33].
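The counting described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `similarity_scores`, the hydrophobic/hydrophilic residue sets, and the choice to count over alignment columns of the first sequence are all hypothetical, since the article does not spell out its residue classification.

```python
# Hypothetical residue classes: the article does not list its classification,
# so this split is an assumption made only for illustration.
HYDROPHOBIC = set("AVLIMFWPGC")
HYDROPHILIC = set("RNDQEHKSTY")


def similarity_scores(ref, other):
    """Return (S1, S2, S3) for two aligned sequences of equal length.

    S1 = n_phi / N_phi, S2 = n_pho / N_pho, S3 = n_tot / N_tot, where the
    N counts are taken over alignment columns classified by `ref` and the
    n counts are the columns in which both sequences carry the same residue.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    N_phi = N_pho = N_tot = 0
    n_phi = n_pho = n_tot = 0
    for a, b in zip(ref.upper(), other.upper()):
        N_tot += 1
        matched = a == b and a != "-"
        n_tot += matched
        if a in HYDROPHILIC:
            N_phi += 1
            n_phi += matched
        elif a in HYDROPHOBIC:
            N_pho += 1
            n_pho += matched
        # anything else (gap, unknown residue) belongs to N_others

    def ratio(n, N):
        # Avoid division by zero when a class is absent from the sequence.
        return n / N if N else 0.0

    return ratio(n_phi, N_phi), ratio(n_pho, N_pho), ratio(n_tot, N_tot)
```

For the toy alignment `similarity_scores("AKLD-E", "AKVDGE")`, the three hydrophilic residues all match (S1 = 1.0), one of two hydrophobic residues matches (S2 = 0.5), and four of six columns match (S3 = 4/6).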

The variables (shown in Table 1) were arranged in matrices to allow for discriminant analysis, a method of multivariate analysis that assigns an observation to a group on the basis of its variables [34]. Each group matrix has three columns and l, m, or n rows, depending on the number of variable sets. Here, the matrix for group 1 is defined as:

Similarly, 2S and 3S were defined as:


All of the related variables were tabulated as shown in Table 1. From the above matrices, three averages were found for each group:

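The averages ${}^{2}\bar{S}_{1}$, ${}^{2}\bar{S}_{2}$, and ${}^{2}\bar{S}_{3}$ for group 2 and ${}^{3}\bar{S}_{1}$, ${}^{3}\bar{S}_{2}$, and ${}^{3}\bar{S}_{3}$ for group 3 were calculated similarly.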
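Reading the row count l and the three score columns from the description above, the group 1 matrix and its column averages can be written as follows. This is a reconstruction from the surrounding definitions rather than the authors' typeset equations; groups 2 and 3 follow by replacing l with m or n.

```latex
{}^{1}S =
\begin{pmatrix}
{}^{1}S_{1,1} & {}^{1}S_{1,2} & {}^{1}S_{1,3}\\
\vdots & \vdots & \vdots\\
{}^{1}S_{l,1} & {}^{1}S_{l,2} & {}^{1}S_{l,3}
\end{pmatrix},
\qquad
{}^{1}\bar{S}_{j} = \frac{1}{l}\sum_{i=1}^{l} {}^{1}S_{i,j},\quad j = 1, 2, 3.
```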

Three covariance matrices were constructed as:





Similar treatments yielded the 2C and 3C matrices, resulting in three covariance matrices (1C, 2C, and 3C). We then created a pooled within-class covariance matrix P. If we define L = 3l-1, M = 3m-1, and N = 3n-1, then:



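We next found the inverse matrix I, where $I = P^{-1}$. Because there were three groups in our study, we predicted the likelihood of infection for a virus of unknown infection condition by calculating the Mahalanobis distance to each group (generally $D^2 = d^{\mathsf{T}} C^{-1} d$, where d is the difference between an observation and a group mean and C is a covariance matrix).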
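The covariance and pooling steps described above can be sketched as follows. The function names, the divisor r - 1 in the covariance, and the use of L, M, and N as weights in a weighted average are assumptions made for illustration; the authors' ZOO program is written in Java and may compute these quantities differently.

```python
def column_means(rows):
    """Column averages of an (r x 3) list of similarity rows."""
    r = len(rows)
    return [sum(row[j] for row in rows) / r for j in range(3)]


def covariance(rows):
    """3 x 3 sample covariance matrix of one group (divisor r - 1, assumed)."""
    r = len(rows)
    mean = column_means(rows)
    return [
        [
            sum((row[j] - mean[j]) * (row[k] - mean[k]) for row in rows) / (r - 1)
            for k in range(3)
        ]
        for j in range(3)
    ]


def pooled_covariance(groups):
    """Weighted average of the group covariance matrices.

    Each group with r rows gets the weight 3*r - 1, mirroring the L, M, N
    defined in the text; treating them as averaging weights is an assumption.
    """
    weights = [3 * len(rows) - 1 for rows in groups]
    covs = [covariance(rows) for rows in groups]
    total = sum(weights)
    return [
        [sum(w * c[j][k] for w, c in zip(weights, covs)) / total for k in range(3)]
        for j in range(3)
    ]
```

Each group is passed as a list of `[S1, S2, S3]` rows, so `pooled_covariance([group1, group2, group3])` returns the 3 × 3 matrix P as nested lists.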

Here, expansion of $D^2$ yielded three equations:


where S1, S2, and S3 are the input variables; here, they were similarity variables of a virus of an unknown infection group.
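Assuming the standard Mahalanobis form with the shared inverse matrix I, and writing $\mathbf{s} = (S_1, S_2, S_3)^{\mathsf{T}}$ for the input vector and ${}^{g}\bar{\mathbf{S}}$ for the vector of column averages of group g (symbols introduced here for illustration), the three distances can be written compactly as:

```latex
D_g^{2} = \left(\mathbf{s} - {}^{g}\bar{\mathbf{S}}\right)^{\mathsf{T}}
          I
          \left(\mathbf{s} - {}^{g}\bar{\mathbf{S}}\right),
\qquad g = 1, 2, 3.
```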

Group classification (G) was identified using the criterion:

For example, if $D_1^2$ is the minimum among the three values from the above set of three equations, then G = 1; i.e., "group 1" is the group classification. To automate the mathematical process described above, we developed a Java computer program named ZOO. To evaluate the accuracy of our method and software, we analysed a test data set (described in the Results & Discussion section).
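The whole procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' ZOO program: the function names, the covariance divisor, the pooling weights, and the reading of the criterion as "choose the group with the smallest $D_g^2$" are assumptions drawn from the description above.

```python
def column_means(rows):
    """Column averages of an (r x 3) list of similarity rows."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(3)]


def covariance(rows):
    """3 x 3 sample covariance of one group (divisor r - 1, assumed)."""
    mean = column_means(rows)
    return [[sum((r[j] - mean[j]) * (r[k] - mean[k]) for r in rows) / (len(rows) - 1)
             for k in range(3)] for j in range(3)]


def pooled(groups):
    """Weighted average of group covariances, weights 3*r - 1 (assumed)."""
    weights = [3 * len(rows) - 1 for rows in groups]
    covs = [covariance(rows) for rows in groups]
    return [[sum(w * c[j][k] for w, c in zip(weights, covs)) / sum(weights)
             for k in range(3)] for j in range(3)]


def inverse3(m):
    """Inverse of a 3 x 3 matrix by the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def mahalanobis_sq(x, mean, inv):
    """Squared Mahalanobis distance (x - mean)^T inv (x - mean)."""
    d = [x[j] - mean[j] for j in range(3)]
    return sum(d[j] * inv[j][k] * d[k] for j in range(3) for k in range(3))


def classify(groups, x):
    """Return G (1-based): the group whose mean is nearest to x."""
    inv = inverse3(pooled(groups))
    distances = [mahalanobis_sq(x, column_means(rows), inv) for rows in groups]
    return distances.index(min(distances)) + 1
```

Here `groups` holds the three training matrices (lists of `[S1, S2, S3]` rows for groups 1, 2, and 3) and `x` is the score triple of the virus to be classified; `classify(groups, x)` returns 1, 2, or 3.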

Authors' contributions

SEB and HSS have developed the methods and have conducted subsequent data analysis. Both authors have drafted, read and approved the manuscript.


Acknowledgements

We acknowledge the invaluable contribution of the researchers who have made their data publicly available. We thank K.T. No (Yonsei University) for his support. This work was partly supported by the Brain Korea 21 project.


References

1. Baranowski E, Ruiz-Jarabo CM, Domingo E: Evolution of Cell Recognition by Viruses. Science 2001, 292:1102-1105.

2. Schwabe CW: Veterinary medicine and human health. Baltimore, Williams & Wilkins; 1984.

3. Webber R: Communicable disease epidemiology and control. Am J Epidemiol 1998, 147:791-792.

4. Hugh-Jones ME, Hubbert WT, Hagstad HV: Zoonoses: recognition, control and prevention. Iowa: Iowa State University Press; 2008.

5. Schneider-Schaulies J: Cellular receptors for viruses: links to tropism and pathogenesis. J Gen Virol 2000, 81:1413-1429.

6. Dimmock NJ: Initial Stages in infection with Animal viruses. J Gen Virol 1982, 59:1-22.

7. Wiley DC, Wilson IA, Skehel JJ: Structural identification of the antibody-binding sites of Hong Kong influenza haemagglutinin and their involvement in antigenic variation. Nature 1981, 289:373-378.

8. Duffy S, Shackelton LA, Holmes EC: Rates of evolutionary change in viruses: patterns and determinants. Nat Rev Genet 2008, 1-10.

9. Horsfall FL Jr, Hardy PH Jr, Davenport FM: The significance of combinations between viruses and host cells. Bull N Y Acad 1948, 24:470-475.

10. Dales S: Early Events in Cell-Animal Virus Interactions. Bacteriol Rev 1973, 37:103-135.

11. Lentz TL: The recognition event between virus and host cell receptor: a target for antiviral agents. J Gen Virol 1990, 71:751-766.

12. Vrublevskaya VV, Korney AN, Smirnow SV, Morenkov OS: Cell-binding properties of glycoprotein B of Aujeszky's disease virus. Virus Res 2002, 86:7-19.

13. Myszka DG, Sweet RW, Hensley P, Brigham-Burke M, Kwong PD, Hendrickson WA, Wyatt R, Sodroski J, Doyle ML: Energetics of the HIV gp120-CD4 binding reaction. PNAS 2000, 97:9026-9031.

14. Wu L, Gerard NP, Wyatt R, Choe H, Parolin C, Ruffing N, Borsetti A, Cardoso AA, Desjardin E, Newman W, Gerard C, Sodroski J: CD4-induced interaction of primary HIV-1 gp120 glycoproteins with the chemokine receptor CCR-5. Nature 1996, 384:179-183.

15. Hyypiä T: Virus Host Cell Receptors. Encyclopedia of life science 2006, 1-8.

16. Wang JH: Protein recognition by cell surface receptors: physiological receptors versus virus interactions. Trends Biochem Sci 2002, 27:122-126.

17. Wimmer E: Cellular receptors for animal viruses. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; 1994.

18. Haywood AM: Virus Receptors: Binding, Adhesion Strengthening, and Changes in Viral Structure. J Virol 1994, 68:1-5.

19. Reeves JD, Gallo SA, Ahmad N, Miamidian JL, Harvey PE, Sharron M, Pohlmann S, Sfakianos JN, Derdeyn CA, Blumenthal R, Hunter E, Doms RW: Sensitivity of HIV-1 to entry inhibitors correlates with envelope/coreceptor affinity, receptor density, and fusion kinetics. Proc Natl Acad Sci 2002, 99:16249-16254.

20. Pileri P, Uematsu Y, Campagnoli S, Galli G, Falugi F, Petracca R, Weiner AJ, Houghton M, Rosa D, Grandi G, Abrignani S: Binding of Hepatitis C Virus to CD81. Science 1998, 282:938-941.

21. Gareth MJ, Andrew R, Pybus OG, Holmes EC: Rates of Molecular Evolution in RNA Viruses: A Quantitative Phylogenetic Analysis. J Mol Evol 2002, 54:156-165.

22. Faure E: Could FIV zoonosis responsible of the breakdown of the pathocenosis which has reduced the European CCR5-Delta32 allele frequencies? Virol J 2008, 5:119.

23. VandeWoude S, Apetrei C: Going Wild: Lessons from Naturally Occurring T-Lymphotropic Lentiviruses. Clin Microbiol Rev 2006, 19:728-762.

24. Berger EA, Murphy PM, Farber JM: Chemokine receptors as HIV-1 coreceptors; roles in viral entry, tropism, and disease. Annu Rev Immunol 1999, 17:657-700.

25. Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA: 8th Reports of the international committee on Taxonomy of viruses. Academic Press;

26. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 1997, 25:4876-4882.

27. Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 2003, 31:3497-3500.

28. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402.

29. Löytynoja A, Goldman N: An algorithm for progressive multiple alignment of sequences with insertions. PNAS 2005, 102:10557-10562.

30. Greger M: The Human/Animal Interface: emergence and resurgence of zoonotic infectious diseases. Crit Rev Microbiology 2007, 33:243-299.

31. Ryou WS: Virology. Life Science publishing; 2007.

32. Woolhouse MEJ, Gowtage-Sequeria S: Host Range and Emerging and Reemerging Pathogens. Emerg Infect Dis 2005, 11:1842-1847.

33. Baltimore D: Expression of Animal Virus Genomes. Bacteriol Rev 1971, 35:235-241.

34. Fisher RA: The use of multiple measurement in taxonomic problems. Ann Eugenics 2 1936, 179-188.