Table 1

Predicted clusters of readthrough proteins

| Cluster description | Codon | Size | Example organism (locus) |
|---|---|---|---|
| Selenocysteine | | | |
| Formate dehydrogenase α subunit | TGA | 45 | Escherichia coli (b1474) |
| Selenide water dikinase | TGA | 12 | Haemophilus influenzae (HI0200m) |
| Glycine reductase complex selenoprotein A | TGA | 6 | Treponema denticola (TDE0745) |
| Glycine reductase complex selenoprotein B | TGA | 6 | Treponema denticola (TDE0078) |
| Heterodisulfide reductase subunit A | TGA | 6 | Methanococcus jannaschii (MJ1190m) |
| Coenzyme F420-reducing hydrogenase δ subunit | TGA | 5 | Methanococcus jannaschii (MJ1190a) |
| Formylmethanofuran dehydrogenase subunit B | TGA | 4 | Methanococcus jannaschii (MJ1194m) |
| Glutaredoxin-like | TGA | 3 | Carboxydothermus hydrogenoformans (CHY_0740) |
| Thioredoxin | TGA | 3 | Geobacter sulfurreducens (GSU3446) |
| Coenzyme F420-reducing hydrogenase α subunit | TGA | 3 | Methanococcus jannaschii (MJ0029) |
| HesB family | TGA | 3 | Desulfovibrio vulgaris (DVU_1382) |
| HesB family | TGA | 2 | Methanococcus maripaludis (MMP0252 + upstream) |
| Fe-S oxidoreductase | TGA | 2 | Desulfotalea psychrophila (DP1009) |
| DsbA-like | TGA | 2 | Desulfovibrio desulfuricans (Dde_1263 + upstream) |
| Periplasmic [NiFeSe] hydrogenase large subunit | TGA | 2 | Desulfovibrio vulgaris (DVU_1918) |
| Pyrrolysine | | | |
| Monomethylamine methyltransferase | TAG | 7 | Methanosarcina acetivorans (MA0144) |
| Dimethylamine methyltransferase | TAG | 7 | Methanosarcina acetivorans (MA0532) |
| Trimethylamine methyltransferase | TAG | 6 | Methanosarcina acetivorans (MA0528) |
| Transcriptional regulator, TetR family | TAG | 2 | Methanosarcina acetivorans (MA2902) |
| Unknown | | | |
| Cytochrome c family protein | TGA | 2 | Geobacter sulfurreducens (GSU2937 + GSU2936) |
| Hypothetical protein | TAG | 2 | Geobacter sulfurreducens (GSU2293 + downstream) |


A plus sign in a locus indicates that the genomic coordinates of the iORF are described by concatenating two genes or regions. For example, "GSU2293 + downstream" means that the iORF consists of the gene GSU2293 and its downstream sequence. The two HesB family clusters were not merged into a single cluster because their sequences were too short and too divergent.
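
The plus-sign notation in the locus column can be read mechanically. The following is a minimal sketch, assuming only the convention described in the note above; the function names `parse_locus` and `is_concatenated` are hypothetical and are not part of the original study.

```python
def parse_locus(locus: str) -> list[str]:
    """Split a Table 1 locus string into its concatenated parts.

    A part is either a gene identifier (e.g. "GSU2293") or a flanking
    region keyword as written in the table ("upstream", "downstream").
    """
    return [part.strip() for part in locus.split("+")]


def is_concatenated(locus: str) -> bool:
    """Return True if the iORF spans more than one gene or region."""
    return len(parse_locus(locus)) > 1


# Locus strings copied from Table 1.
for locus in ("DVU_1918", "GSU2937 + GSU2936", "GSU2293 + downstream"):
    print(locus, "->", parse_locus(locus), is_concatenated(locus))
```

A locus without a plus sign yields a single-element list, so the same code path handles both single-gene and concatenated entries.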

Fujita et al. BMC Bioinformatics 2007 8:225   doi:10.1186/1471-2105-8-225
