Table 4

Taxonomic distribution of unigene blastx hits in the nr database

                                 Best blastx hit    Lowest common ancestor
Taxonomic category                 Number  Percent     Number  Percent
Eukaryotes                         33,776    97.2%     32,059    92.3%
  Green plants                     33,406    96.2%     31,373    90.3%
    "Green algae"                     175     0.5%         78     0.2%
    Land plants                    33,231    95.7%     30,822    88.7%
      "Bryophytes"                    394     1.1%      2,197     6.3%
      Vascular plants              32,837    94.5%     16,731    48.2%
        Lycophytes                     74     0.2%         13     0.0%
        Ferns                         928     2.7%        435     1.3%
        Seed plants                31,835    91.6%     16,015    46.1%
          Gymnosperms               8,000    23.0%        866     2.5%
          Angiosperms              23,835    68.6%     10,572    30.4%
  Animals                             288     0.8%         63     0.2%
  Fungi                                 0     0.0%          4     0.0%
  Other eukaryotes                     77     0.2%         12     0.0%
Bacteria                               22     0.1%         91     0.3%
Artificial sequences, hits             20     0.1%        216     0.6%
  don't pass threshold, or taxon
  not assigned

Number = number of unigenes; Percent = percent of the 34,740 unigenes with a blastx hit.


Unigenes were searched against the NCBI nr protein database using blastx with an e-value threshold of 1e-10, keeping the best ten hits. Of the 56,256 unigenes, 34,740 (61.8%) had at least one hit. The lowest common ancestor (LCA) of each unigene was calculated using the LCA algorithm implemented in MEGAN v3.9 [61], based on at least three blastx hits with a bitscore greater than 75 and within 10% of the best bitscore. Note: the predicted proteins from Selaginella moellendorffii are not currently included in the nr database and are therefore not reflected in these results.
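The hit filtering and LCA assignment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not MEGAN's implementation: the toy taxonomy `PARENT`, the `Hit` record, and the functions `lineage`, `lowest_common_ancestor`, and `assign_unigene` are hypothetical. It assumes that "at least three blastx hits" means at least three hits per unigene that pass both score filters, and that "within 10% of the best bitscore" means a bitscore of at least 90% of the best hit's; MEGAN may define its own parameters differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical, heavily simplified taxonomy (child -> parent) built from the
# categories in Table 4; a real analysis would use the full NCBI taxonomy.
PARENT: Dict[str, str] = {
    "Eukaryotes": "root",
    "Bacteria": "root",
    "Green plants": "Eukaryotes",
    "Animals": "Eukaryotes",
    "Green algae": "Green plants",
    "Land plants": "Green plants",
    "Bryophytes": "Land plants",
    "Vascular plants": "Land plants",
    "Lycophytes": "Vascular plants",
    "Ferns": "Vascular plants",
    "Seed plants": "Vascular plants",
    "Gymnosperms": "Seed plants",
    "Angiosperms": "Seed plants",
}


@dataclass
class Hit:
    """One blastx hit: taxon of the subject protein, e-value, bitscore."""
    taxon: str
    evalue: float
    bitscore: float


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while path[-1] != "root":
        path.append(parent[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa: List[str], parent: Dict[str, str]) -> str:
    """Return the deepest node shared by the root-to-taxon paths of all `taxa`."""
    paths = [lineage(t, parent) for t in taxa]
    common = "root"
    # zip stops at the shortest path, so only shared depths are compared
    for nodes in zip(*paths):
        if len(set(nodes)) != 1:
            break
        common = nodes[0]
    return common


def assign_unigene(
    hits: List[Hit],
    parent: Dict[str, str],
    max_evalue: float = 1e-10,   # blastx e-value threshold
    max_hits: int = 10,          # keep the best ten hits
    min_bitscore: float = 75.0,  # bitscore must be greater than this
    top_percent: float = 10.0,   # within 10% of the best bitscore
    min_hits: int = 3,           # assumed: qualifying hits required per unigene
) -> Optional[str]:
    """Return the LCA taxon for one unigene, or None if it cannot be assigned."""
    kept = sorted(
        (h for h in hits if h.evalue <= max_evalue),
        key=lambda h: h.bitscore,
        reverse=True,
    )[:max_hits]
    if not kept:
        return None  # no hit passes the e-value threshold
    cutoff = kept[0].bitscore * (1 - top_percent / 100)
    good = [h for h in kept if h.bitscore > min_bitscore and h.bitscore >= cutoff]
    if len(good) < min_hits:
        return None  # too few supporting hits
    return lowest_common_ancestor([h.taxon for h in good], parent)


if __name__ == "__main__":
    example = [
        Hit("Angiosperms", 1e-50, 200.0),
        Hit("Gymnosperms", 1e-45, 190.0),
        Hit("Angiosperms", 1e-40, 185.0),
        Hit("Bryophytes", 1e-20, 120.0),  # below 90% of 200, so ignored
    ]
    print(assign_unigene(example, PARENT))  # Seed plants
```

In the example, the Bryophytes hit falls outside the 10% window (cutoff 180), and the three remaining hits span Angiosperms and Gymnosperms, so the unigene is assigned to their common ancestor, Seed plants, rather than to either group.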

Der et al. BMC Genomics 2011 12:99   doi:10.1186/1471-2164-12-99
