Why so many unknown genes? Partitioning orphans from a representative transcriptome of the lone star tick Amblyomma americanum
1 Department of Biology, Indiana University, Bloomington, IN, 47405, USA
2 The Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN, 47405, USA
3 Current address: School of Biosciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
BMC Genomics 2013, 14:135. doi:10.1186/1471-2164-14-135. Published: 27 February 2013
Genomic resources within the phylum Arthropoda are largely limited to the true insects but are beginning to include unexplored subphyla, such as the Crustacea and Chelicerata. Investigations of these understudied taxa uncover high frequencies of orphan genes, which lack detectable sequence homology to genes in pre-existing databases. The ticks (Acari: Chelicerata) are one such understudied taxon for which genomic resources are urgently needed. Ticks are obligate blood-feeders that vector major diseases of humans, domesticated animals, and wildlife. In analyzing a transcriptome of the lone star tick Amblyomma americanum, one of the most abundant disease vectors in the United States, we find a high representation of unannotated sequences. We apply a general framework for quantifying the origin and true representation of unannotated sequences in a dataset and for evaluating the biological significance of orphan genes.
Expressed sequence tags (ESTs) were derived from different life stages and populations of A. americanum and combined with ESTs available from GenBank to produce 14,310 ESTs, over twice the number previously available. The vast majority (71%) have no detectable sequence homology to proteins archived in UniProtKB. We show that poor sequence or assembly quality is not a major contributor to this high representation of orphan genes. Moreover, most unannotated sequences are functional: a microarray experiment demonstrates that 59% of functional ESTs are unannotated. Finally, we attempt to further annotate our EST dataset using genomic datasets from other members of the Acari, including Ixodes scapularis, four other tick species, and the mite Tetranychus urticae. We find low homology with these species, consistent with significant divergence within this subclass.
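The partitioning described above (ESTs with homology to UniProtKB, ESTs annotated only by other Acari datasets, and a residual set of unannotated candidate orphans) can be sketched as a tiered homology filter. This is a minimal illustration, not the authors' pipeline. It assumes BLAST+ tabular output (`-outfmt 6`, with the E-value in column 11) and an E-value cutoff of 1e-5 to define "homology"; both the cutoff and the first-match-wins tiering are assumptions, and the database names and hit identifiers in the demo are hypothetical.

```python
from __future__ import annotations

from typing import Iterable

EVALUE_CUTOFF = 1e-5  # assumed homology threshold, not taken from the paper


def annotated_ids(hit_lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> set[str]:
    """Return query IDs with at least one hit whose E-value is <= cutoff.

    Expects BLAST+ tabular output (-outfmt 6): column 1 is the query ID,
    column 11 is the E-value.
    """
    ids: set[str] = set()
    for line in hit_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= cutoff:
            ids.add(fields[0])
    return ids


def partition(est_ids: list[str],
              searches: list[tuple[str, Iterable[str]]],
              cutoff: float = EVALUE_CUTOFF) -> tuple[dict[str, list[str]], list[str]]:
    """Credit each EST to the first database (in list order) that annotates it.

    Returns (ESTs annotated per database, ESTs left unannotated).
    """
    remaining = list(est_ids)
    by_db: dict[str, list[str]] = {}
    for db_name, hit_lines in searches:
        hits = annotated_ids(hit_lines, cutoff)
        by_db[db_name] = [e for e in remaining if e in hits]
        remaining = [e for e in remaining if e not in hits]
    return by_db, remaining


if __name__ == "__main__":
    ests = ["est1", "est2", "est3", "est4"]
    # Hypothetical 12-column BLAST+ tabular hits.
    uniprot = ["est1\tuniprot_hit1\t45.0\t100\t55\t0\t1\t100\t1\t100\t1e-20\t80.0"]
    acari = [
        "est2\tacari_hit1\t60.0\t90\t36\t0\t1\t90\t1\t90\t1e-8\t50.0",
        "est3\tacari_hit2\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20.0",  # above cutoff
    ]
    by_db, orphans = partition(ests, [("UniProtKB", uniprot), ("Acari", acari)])
    print(by_db)    # {'UniProtKB': ['est1'], 'Acari': ['est2']}
    print(orphans)  # ['est3', 'est4']
    print(f"{len(orphans) / len(ests):.0%} unannotated")  # 50% unannotated
```

Because each tier only searches what earlier tiers left unannotated, the order of `searches` matters: an EST with hits in both UniProtKB and an Acari dataset is credited to whichever database is listed first.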
We conclude that the abundance of orphan genes in A. americanum likely results from 1) taxonomic isolation stemming from divergence within the tick lineage and limited genomic resources for ticks and 2) lineage-specific genes, whose association with the unique biology of ticks remains to be evaluated by functional genomic studies. The EST sequences described here will contribute substantially to the development of tick genomics. Moreover, the framework provided for the evaluation of orphan genes can guide analyses of future transcriptome sequencing projects.