Comparative analysis of Mycobacterium and related actinomycetes yields insight into the evolution of Mycobacterium tuberculosis pathogenesis
1 Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
2 DOE Joint Genome Institute, Walnut Creek, CA, USA
3 Department of Biomedical Engineering, Boston University, Boston, MA, USA
4 Departments of Microbiology and National Emerging Infectious Diseases Laboratories, Boston University, Boston, MA, USA
5 VIB Department of Plant Systems Biology, Ghent University, Technologiepark 927, 9052 Ghent, Belgium
6 Stanford University, Palo Alto, CA, USA
7 FLIR, Chem-Bio Detection, 505 Coast Boulevard South, Suite 309, La Jolla, CA 92037, USA
8 Department of Systems Biology, Harvard Medical School, 200 Longwood Ave., Boston, MA 02115, USA
9 The Broad Institute, 7 Cambridge Center, Cambridge, MA 02142, USA
BMC Genomics 2012, 13:120 doi:10.1186/1471-2164-13-120
Published: 28 March 2012
The sequence of the pathogen Mycobacterium tuberculosis (Mtb) strain H37Rv has been available for over a decade, but the biology of the pathogen remains poorly understood. Genome sequences from other Mtb strains and closely related bacteria present an opportunity to apply the power of comparative genomics to understand the evolution of Mtb pathogenesis. We conducted a comparative analysis using 31 genomes from the Tuberculosis Database (TBDB.org), including 8 strains of Mtb and M. bovis, 11 additional Mycobacteria, 4 Corynebacteria, 2 Streptomyces, Rhodococcus jostii RHA1, Nocardia farcinica, Acidothermus cellulolyticus, Rhodobacter sphaeroides, Propionibacterium acnes, and Bifidobacterium longum.
Our results highlight the functional importance of lipid metabolism and its regulation, and reveal variation between the evolutionary profiles of genes implicated in saturated and unsaturated fatty acid metabolism. They also suggest that DNA repair and molybdopterin cofactors are important in pathogenic Mycobacteria. By analyzing sequence conservation and gene expression data, we identify nearly 400 conserved noncoding regions. These include 37 predicted promoter regulatory motifs, of which 14 correspond to previously validated motifs, as well as 50 potential noncoding RNAs, of which we experimentally confirm the expression of four.
Our analysis of protein evolution highlights gene families that are associated with the adaptation of environmental Mycobacteria to obligate pathogenesis. These families include those involved in fatty acid metabolism, DNA repair, and molybdopterin biosynthesis. Our analysis reinforces recent findings suggesting that small noncoding RNAs are more common in Mycobacteria than previously expected. Our data provide a foundation for understanding the genome and biology of Mtb in a comparative context, and are available online and through TBDB.org.