Identification and investigation of ORFans in the viral world
1 Computer Science and Engineering Dept, 201 Bell Hall, University at Buffalo, Buffalo, NY 14260-2000, USA
2 Bioinformatics/Dept. of Computer Science, Ben Gurion University, Beer-Sheva 84015, Israel
BMC Genomics 2008, 9:24 doi:10.1186/1471-2164-9-24Published: 19 January 2008
Genome-wide studies have already shed light into the evolution and enormous diversity of the viral world. Nevertheless, one of the unresolved mysteries in comparative genomics today is the abundance of ORFans – ORFs with no detectable sequence similarity to any other ORF in the databases. Recently, studies attempting to understand the origin and functions of bacterial ORFans have been reported. Here we present a first genome-wide identification and analysis of ORFans in the viral world, with focus on bacteriophages.
Almost one-third of all ORFs in 1,456 complete virus genomes correspond to ORFans, a figure significantly larger than that observed in prokaryotes. Like prokaryotic ORFans, viral ORFans are shorter and have a lower GC content than non-ORFans. Nevertheless, a statistically significant lower GC content is found only on a minority of viruses. By focusing on phages, we find that 38.4% of phage ORFs have no homologs in other phages, and 30.1% have no homologs neither in the viral nor in the prokaryotic world. Phages with different host ranges have different percentages of ORFans, reflecting different sampling status and suggesting various diversities. Similarity searches of the phage ORFeome (ORFans and non-ORFans) against prokaryotic genomes shows that almost half of the phage ORFs have prokaryotic homologs, suggesting the major role that horizontal transfer plays in bacterial evolution. Surprisingly, the percentage of phage ORFans with prokaryotic homologs is only 18.7%. This suggests that phage ORFans play a lesser role in horizontal transfer to prokaryotes, but may be among the major players contributing to the vast phage diversity.
Although the current sampling of viral genomes is extremely low, ORFans and near-ORFans are likely to continue to grow in number as more genomes are sequenced. The abundance of phage ORFans may be partially due to the expected vast viral diversity, and may be instrumental in understanding viral evolution. The functions, origins and fates of the majority of viral ORFans remain a mystery. Further computational and experimental studies are likely to shed light on the mechanisms that have given rise to so many bacterial and viral ORFans.