Identification, characterization, and utilization of single copy genes in 29 angiosperm genomes
Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Beijing 100193, PR China
BMC Genomics 2014, 15:504 doi:10.1186/1471-2164-15-504Published: 21 June 2014
Single copy genes are common across angiosperm genomes. With the sufficiently high quality sequenced genomes, the identification of large-scale single copy genes among multiple species is possible. Although some characteristics have been reported, our study provides novel insights into single copy genes.
We identified single copy genes across 29 angiosperm genomes. A significant negative correlation was found between the number of duplicate blocks and the number of single copy genes. We found that a considerable number of single copy genes are located in organelles, showing a preference for binding and catalytic activity. The analysis of effective number of codons (Nc) illustrates that single copy genes have a stronger codon bias than non-single copy genes in eudicots. The relative high expression level of single copy genes was partially confirmed by the RNA-seq data, rather than the Codon Adaptation Index (CAI). Unlike in most other species, a strongly negatively correlation occurs between Nc and GC3 among single copy genes in grass genomes. When compared to all non-single copy genes, single copy genes indicate more conservation (as indicated by Ka and Ks values). But our alternative splicing (AS) results reveal that selective constraints are weaker in single copy genes than in low copy family genes (1–10 in-paralogs) and stronger than high copy family genes (>10 in-paralogs). Using concatenated shared single copy genes, we obtained a well-resolved phylogenetic tree. With the addition of intron sequences, the branch support is improved, but striking incongruences are also evident. Therefore, it is noteworthy that inclusion of intron sequences seems more appropriate for the phylogenetic reconstruction at lower taxonomic levels.
Our analysis provides insight into the evolutionary characteristics of single copy genes across 29 angiosperm genomes. The results suggest that there are key differences in evolutionary constraints between single copy genes and non-single copy genes. And to some extent, these evolutionary constraints show some species-specific differences, especially between eudicots and monocots. Our preliminary evidence also suggests that the concatenated shared single copy genes are well suited for use in resolving phylogenetic relationships.