Open Access Highly Accessed Methodology article

Missing genes in the annotation of prokaryotic genomes

Andrew S Warren1,2*, Jeremy Archuleta2, Wu-chun Feng2 and João Carlos Setubal1,2*

Author Affiliations

1 Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA, USA

2 Department of Computer Science, Virginia Tech, Blacksburg, VA, USA

BMC Bioinformatics 2010, 11:131  doi:10.1186/1471-2105-11-131

Published: 15 March 2010

Additional files

Additional file 1:

Table S1. Criteria for classifying ORFs. An ORF must meet all the requirements for a particular category to be classified in that category.

Format: PDF. Size: 41 KB.

Additional file 2:

Table S2. Details for all missing genes. Includes the NCBI RefSeq ID for the replicon of origin, a unique (per replicon) gene ID, start bp coordinate, stop bp coordinate, length (AA), α score, cluster ID, a boolean indicating whether the representative sequence has a hit to nr-aa, a boolean indicating whether it has a hit to InterPro, taxonomic order, family, and genus, and whether the sequence has a predicted upstream RBS.

Format: PDF. Size: 135 KB.
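
For readers who want to work with Table S2 programmatically, the column list above could be modelled roughly as follows. This is only a sketch: it assumes the PDF table has first been exported to tab-separated text with the columns in the order listed, that order, family, and genus are three separate columns, and that booleans appear as values such as "yes" or "true". The names `MissingGene`, `parse_bool`, and `parse_row` are hypothetical and do not come from the article.

```python
from dataclasses import dataclass

# Assumed number of columns: order, family and genus are counted separately.
N_FIELDS = 13


@dataclass
class MissingGene:
    """One row of Table S2 (field names are illustrative, not official)."""
    replicon_refseq_id: str
    gene_id: str
    start_bp: int
    stop_bp: int
    length_aa: int
    alpha_score: float
    cluster_id: str
    hit_nr_aa: bool
    hit_interpro: bool
    order: str
    family: str
    genus: str
    upstream_rbs: bool


def parse_bool(text: str) -> bool:
    # Assumption: truthy values are spelled in one of these ways.
    return text.strip().lower() in {"1", "true", "yes", "y"}


def parse_row(line: str) -> MissingGene:
    """Parse one tab-separated line into a MissingGene record."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != N_FIELDS:
        raise ValueError(f"expected {N_FIELDS} fields, got {len(parts)}")
    return MissingGene(
        replicon_refseq_id=parts[0],
        gene_id=parts[1],
        start_bp=int(parts[2]),
        stop_bp=int(parts[3]),
        length_aa=int(parts[4]),
        alpha_score=float(parts[5]),
        cluster_id=parts[6],
        hit_nr_aa=parse_bool(parts[7]),
        hit_interpro=parse_bool(parts[8]),
        order=parts[9],
        family=parts[10],
        genus=parts[11],
        upstream_rbs=parse_bool(parts[12]),
    )
```

Calling `parse_row` on each line of such an export would give a list of records that can be filtered, for example by `hit_nr_aa` or `upstream_rbs`. If the actual export uses a different column order or boolean spelling, the parser would need to be adjusted accordingly.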

Additional file 3:

Table S3. Details for all missing gene groups. Includes cluster ID, average α-score, average length, average percent identity, whether the representative sequence had a hit against nr-aa, whether the sequence had a domain result from InterProScan, the E-value of the hit to nr-aa, the percent identity of the hit to nr-aa, the number of replicons in the group, the number of chromosomes, the number of plasmids, the average MUMi value between families in the group, and whether the multiple alignment of the group indicated a region of ultra-conservation.

Format: PDF. Size: 35 KB.

Additional file 4:

AA sequences file. Amino acid sequences for the missing genes.

Format: TXT. Size: 134 KB.

Additional file 5:

NT sequences file. Representative nucleotide sequences for each missing gene group.

Format: TXT. Size: 91 KB.

Additional file 6:

InterPro domain results. InterProScan results for the representative amino acid sequence for each group.

Format: PDF. Size: 13 KB.