BMC Genomics

Open Access Database

Specialized microbial databases for inductive exploration of microbial genome sequences

Gang Fang1,3, Christine Ho1, Yaowu Qiu1, Virginie Cubas1, Zhou Yu1, Cédric Cabau1, Frankie Cheung1, Ivan Moszer2,3 and Antoine Danchin1,3*

Author Affiliations

1 HKU-Pasteur Research Centre, Dexter HC Man Building, 8, Sassoon Road, Pokfulam, Hong Kong, China

2 Plate-forme Intégration et Analyse Génomiques, Genopole, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France

3 Unité de Génétique des Génomes Bactériens, CNRS URA2171, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France

BMC Genomics 2005, 6:14 doi:10.1186/1471-2164-6-14

Published: 7 February 2005

Abstract

Background

The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Query facilities must include search tools that permit function identification through exploration of related objects.

Methods

The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subqueries, have been implemented.
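As an illustration of how a nested subquery can be emulated when the database engine does not support one, the sketch below runs the inner query first and splices its results into the outer query as an explicit IN list. This is not the GenoList implementation, which the abstract does not describe; the table names (gene, annotation), the column names, the sample rows and the function names are all hypothetical, and Python's built-in sqlite3 module stands in for MySQL so that the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE annotation (gene_id INTEGER, keyword TEXT)")
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [(1, "dnaA"), (2, "dnaN"), (3, "recF"), (4, "gyrB")],
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?)",
        [(1, "replication"), (2, "replication"), (4, "topoisomerase")],
    )
    return conn


def genes_with_keyword(conn, keyword):
    """Emulate
        SELECT name FROM gene
        WHERE gene_id IN (SELECT gene_id FROM annotation WHERE keyword = ?)
    without using a subquery, in two separate steps.
    """
    # Step 1: run the inner query on its own and collect its result column.
    inner_rows = conn.execute(
        "SELECT gene_id FROM annotation WHERE keyword = ?", (keyword,)
    ).fetchall()
    gene_ids = [row[0] for row in inner_rows]
    if not gene_ids:
        # An empty IN () list is not valid SQL, so return early.
        return []

    # Step 2: build the outer query with one placeholder per inner result.
    placeholders = ", ".join("?" for _ in gene_ids)
    outer_sql = (
        "SELECT name FROM gene WHERE gene_id IN (%s) ORDER BY gene_id"
        % placeholders
    )
    return [row[0] for row in conn.execute(outer_sql, gene_ids)]
```

Calling genes_with_keyword(build_demo_db(), "replication") returns the names of the two genes annotated with that keyword. The two-step approach costs an extra round trip to the server, but it only needs plain SELECT statements.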

Results

Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore (http://bioinfo.hku.hk/genochore.html), a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya), has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition, they provide weekly updates of searches against the worldwide protein sequence data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, and they provide extra facilities to query sequences using predefined sequence patterns.
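To make the idea of a pattern search in DNA concrete, the sketch below translates a pattern written with standard IUPAC nucleotide ambiguity codes into a regular expression and reports every match, including overlapping ones. It is only a minimal illustration and does not reproduce the GenoChore search tools, whose pattern syntax is not given here; the function names and the example sequence are hypothetical, and the search covers the given strand only.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC_TO_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern such as 'TATAWT' into a regex string."""
    return "".join(IUPAC_TO_CLASS[base] for base in pattern.upper())


def find_pattern(sequence, pattern):
    """Return (start, matched_text) for every occurrence of the pattern.

    Start positions are 0-based. A lookahead is used so that overlapping
    occurrences are all reported.
    """
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]
```

For example, find_pattern("GGTATAATGCTATAATCC", "TATAWT") reports matches starting at positions 2 and 10. A fuller tool would also scan the reverse complement and restrict hits to chosen regions, such as the sequence upstream of an operon.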

Conclusion

This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList, Staphylococcus epidermidis), associated with related organisms for comparison.