Open Access Research article

Relationship of SARS-CoV to other pathogenic RNA viruses explored by tetranucleotide usage profiling

Yee Leng Yap1, Xue Wu Zhang1 and Antoine Danchin2

HKU-Pasteur Research Centre, Dexter H.C. Man Building, 8 Sassoon Road Pokfulam, Hong Kong

Institute Pasteur, Unité de Génétique des Génomes Bactériens, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France


BMC Bioinformatics 2003, 4:43 doi:10.1186/1471-2105-4-43

Published: 20 September 2003

Abstract

Background

The exact origin of the causative agent of Severe Acute Respiratory Syndrome (SARS) is still an open question. The genomic sequence relationship of SARS-CoV with 30 different single-stranded RNA (ssRNA) viruses of various families was studied using two non-standard approaches. Both approaches began with the vectorial profiling of the tetra-nucleotide usage pattern V for each virus. In approach one, a distance measure between the vectors V, based on the correlation coefficient, was devised to construct a relationship tree by the neighbor-joining algorithm. In approach two, a multivariate factor analysis was performed to derive the embedded tetra-nucleotide usage patterns. These patterns were subsequently used to classify the selected viruses.
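As a rough illustration of the profiling step and of approach one, the sketch below builds a 256-element tetra-nucleotide frequency vector for a sequence and turns the Pearson correlation between two such vectors into a distance. The function names (`tetra_profile`, `correlation_distance`) and the choice of distance 1 - r are assumptions made for this example; the abstract does not give the exact distance formula used by the authors.

```python
from itertools import product
import math

ALPHABET = "ACGU"
# All 4^4 = 256 tetra-nucleotide words in a fixed order.
WORDS = ["".join(p) for p in product(ALPHABET, repeat=4)]


def tetra_profile(seq):
    """Return the relative frequency of each tetra-nucleotide (hypothetical helper).

    Overlapping windows are counted; windows containing characters
    outside A, C, G, U are skipped. T is treated as U.
    """
    seq = seq.upper().replace("T", "U")
    counts = dict.fromkeys(WORDS, 0)
    total = 0
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid tetra-nucleotide")
    return [counts[w] / total for w in WORDS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """Assumed distance: 1 - r, so identical profiles are at distance 0."""
    return 1.0 - pearson(x, y)
```

A matrix of such pairwise distances could then be passed to any neighbor-joining implementation to obtain a relationship tree.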

Results

Both approaches yielded relationship outcomes that are consistent with the known virus classification. They also indicated that the genomes of RNA viruses from the same family conform to a specific pattern of word usage. Based on the correlation of the overall tetra-nucleotide usage patterns, the Transmissible Gastroenteritis Virus (TGV) and the Feline CoronaVirus (FCoV) are closest to SARS-CoV. Surprisingly, the RNA viruses that do not go through a DNA stage also displayed a remarkable discrimination against the CpG and UpA di-nucleotides (z = -77.31 and -52.48, respectively) and selection for UpG and CpA (z = 65.79 and 49.99, respectively). Potential factors influencing these biases are discussed.
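To make the sign of the z-values concrete, here is a minimal sketch of one common way to score di-nucleotide over- or under-representation. It assumes independent nucleotides and a binomial approximation for the variance; this is an assumption for illustration, not necessarily the statistic the authors used, and it is not expected to reproduce the values reported above. The function name `dinucleotide_z` is hypothetical.

```python
import math


def dinucleotide_z(seq, dinuc):
    """Approximate z-score for a di-nucleotide count (illustrative only).

    Expected count assumes the two nucleotides occur independently at
    their observed single-nucleotide frequencies; the variance uses a
    binomial approximation that ignores overlap effects.
    """
    seq = seq.upper().replace("T", "U")
    dinuc = dinuc.upper().replace("T", "U")
    n = len(seq)
    if n < 2 or len(dinuc) != 2:
        raise ValueError("need a sequence of length >= 2 and a 2-letter word")
    positions = n - 1  # number of overlapping di-nucleotide windows
    observed = sum(1 for i in range(positions) if seq[i:i + 2] == dinuc)
    p = (seq.count(dinuc[0]) / n) * (seq.count(dinuc[1]) / n)
    expected = positions * p
    variance = positions * p * (1.0 - p)
    if variance == 0.0:
        # The word cannot occur or vary under the model; report no deviation.
        return 0.0
    return (observed - expected) / math.sqrt(variance)
```

Under this convention a negative z means the di-nucleotide occurs less often than expected (as reported for CpG and UpA), and a positive z means it occurs more often (as reported for UpG and CpA).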

Conclusion

The study of genomic word usage is a powerful method to classify RNA viruses. The congruence of the relationship outcomes with the known classification indicates that there exist phylogenetic signals in the tetra-nucleotide usage patterns, which are most prominent in the replicase open reading frames.

