Table 2

Correspondence to mammalian genes and estimated efficiencies of cloning of start codons of EST assemblies
Species  Unique Gene ID     Unique       Assemblies matched to        Assemblies estimated to
         (without           HomoloGene   protein sequences            include start codons
         HomoloGene ID)     ID           Total    Contigs  Singlets   Total    Contigs  Singlets
Human    13,691 (754)       12,911       64,011   12,056   51,955     47,229   9,635    37,594
Mouse    12,955 (730)       12,137       63,444   12,028   51,416     45,539   9,588    35,951
Cattle   13,445 (1,935)     11,341       63,718   12,035   51,683     47,118   9,634    37,484
Dog      12,293 (763)       11,410       62,815   11,871   50,944     37,193   8,090    29,103
Pig      14,275             -            63,169   11,917   51,252     46,063   9,396    36,667

The table shows the numbers of genes that had unique NCBI Gene IDs and corresponded to contigs and singlets generated by assembly of expressed sequence tags (ESTs). Also shown are the numbers that had unique Gene IDs in the NCBI HomoloGene database (a database of orthologs among species) and corresponded to the contigs and singlets generated. Numbers in parentheses are the numbers of Gene IDs that had no corresponding HomoloGene IDs. HomoloGene IDs are not shown for pigs because there is no HomoloGene ID database for pig genes.

EST assemblies were estimated to contain start codons if the length upstream of the matches (BLAST score >50) in the assemblies was greater than that between the start base of the coding sequence and the matched region of the corresponding gene. Numbers of assemblies (contigs and singlets) corresponding to protein sequences in humans, mice, cattle, dogs, and pigs are also shown.
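The start-codon criterion above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the `Match` record, its field names, and the function `likely_contains_start_codon` are hypothetical, and the sketch assumes that both lengths are expressed in the same unit (nucleotides) and that match coordinates are 1-based.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One assembly-to-gene match (hypothetical record for illustration)."""
    score: float         # BLAST score of the match
    assembly_start: int  # 1-based position in the assembly where the match begins
    cds_offset: int      # length between the start base of the coding sequence
                         # and the matched region of the corresponding gene


def likely_contains_start_codon(match: Match, min_score: float = 50) -> bool:
    """Apply the footnote's rule: the match must score above the threshold,
    and the assembly must extend further upstream of the match than the
    distance from the CDS start to the matched region in the gene."""
    if match.score <= min_score:
        return False  # only matches with score > 50 are considered
    upstream_length = match.assembly_start - 1  # bases before the match
    return upstream_length > match.cds_offset


# Example: 199 bases lie upstream of the match, and the matched region
# begins 90 bases after the start base of the coding sequence.
print(likely_contains_start_codon(Match(score=120, assembly_start=200, cds_offset=90)))
```

The comparison is strict in both places because the footnote says "greater than" and "score >50". An assembly whose upstream length exactly equals the offset is therefore not counted.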


Uenishi et al. BMC Genomics 2012 13:581   doi:10.1186/1471-2164-13-581
