Open Access Research article

IS-seq: a novel high throughput survey of in vivo IS6110 transposition in multiple Mycobacterium tuberculosis genomes

Alejandro Reyes12, Andrea Sandoval2, Andrés Cubillos-Ruiz2, Katherine E Varley1, Ivan Hernández-Neuta3, Sofía Samper45, Carlos Martín56, María Jesús García7, Viviana Ritacco8, Lucelly López9, Jaime Robledo1011, María Mercedes Zambrano2, Robi D Mitra1* and Patricia Del Portillo113*

Author Affiliations

1 Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, 63108, USA

2 Molecular Genetics, Corporación Corpogen, Bogotá, DC, Colombia

3 Molecular Biotechnology, Corporación Corpogen, Bogotá, DC, Colombia

4 Hospital Universitario Miguel Servet. IIS Aragón, Zaragoza, Spain

5 CIBER de Enfermedades Respiratorias (CIBERES), Instituto de Salud Carlos III, Madrid, Spain

6 Departamento de Microbiología, Medicina Preventiva y Salud Pública, Universidad de Zaragoza, Zaragoza, Spain

7 Departamento de Medicina Preventiva, Facultad de Medicina, Universidad Autónoma de Madrid, Madrid, Spain

8 Instituto Nacional de Enfermedades Infecciosas Carlos G Malbrán, Buenos Aires, Argentina

9 Departamento de epidemiología, Universidad de Antioquia, Medellín, Colombia

10 Laboratorio de micobacterias, Corporación para Investigaciones Biológicas y, Universidad Pontificia Bolivariana, Medellín, Colombia

11 Centro Colombiano de Investigación en Tuberculosis (CCITB), Medellín, Colombia

BMC Genomics 2012, 13:249  doi:10.1186/1471-2164-13-249

Published: 15 June 2012

Additional files

Additional file 1:

Table S1. RFLP, barcode and sequencing information for strains. Table S2. Loci with identified IS6110 insertions. Table S3. 2 kb and 50 kb window analysis for IS distribution. Table S4. Loci with insertion sites in multiple strains. Table S5. Characteristic insertion sites associated with specific M. tuberculosis lineages. Table S6. Classification of strains based on spoligotyping, RFLP and IS-seq. Table S7. Classification of sequenced M. tuberculosis genomes using IS-seq data. Table S8. Association between characteristic IS6110 and biological traits by mutual information. Table S9. Barcodes used in primers and adapters. Table S10. Primers used for insertion site validation.

Format: XLS. Size: 378 KB.

Open Data

Additional file 2:

Figure S1. Distribution of Barcodes in Sequenced Samples. Out of approximately 14 million reads (2 x 35/60 bp), 13,243,263 contained a barcode from the adapter (A) and 8,313,986 contained both the barcode and the IS6110-specific primer (B). Different colors represent the different barcodes used. The barcode in the IS6110-specific primer was more evenly distributed (332,559 ± 84,930 reads per barcode) than the barcode in the adapter (551,803 ± 322,632 reads per RFLP band). In the latter case, the outlier barcodes were CCGG and CACGA, which can potentially form a hairpin with the adapter sequence and thus hamper the ligation reaction.

Format: PDF. Size: 1.6 MB.

Additional file 3:

Figure S2. Detection of Insertion Sites Using IS-seq. The difference between the observed number of sites (insertions determined by IS-seq) and the expected number (RFLP bands) is plotted against the coverage (reads per strain) obtained with IS-seq. Error bars indicate standard deviation. The red line marks the coverage limit below which specificity of detection was reduced.

Format: PDF. Size: 384 KB.

Additional file 4:

Figure S3. Rarefaction Analysis of Insertion Sites. Random subsampling of the insertion sites was performed at different depths, and the number of sites sampled was plotted against the number of genes interrupted. Plots were generated for all the insertions observed (green) and for unique insertions (blue). Non-linear fits to an exponential decay function were estimated (orange and black lines). The coefficient of determination (R²) for each regression is shown, together with the estimate of the K parameter, where K represents the maximum theoretical limit of each regression model. The two K estimates correspond to the lower and upper limits of the maximum number of genes predicted to be susceptible to IS6110 insertion in vivo.

Format: PDF. Size: 193 KB.
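
The rarefaction procedure described for Figure S3 can be illustrated with a minimal sketch. It is not the authors' code. The caption says only that an exponential function with a limiting parameter K was fitted, so the model form K(1 - exp(-rate * x)), the input format (one gene identifier per insertion site), the function names, and the grid search used in place of the unspecified non-linear fitting routine are all assumptions made here for illustration. The demo data are synthetic.

```python
import math
import random


def rarefaction_curve(insertions, depths, n_reps=100, seed=0):
    """Mean number of distinct genes hit when subsampling insertion sites.

    insertions: list of gene identifiers, one entry per insertion site
                (hypothetical input format, assumed for this sketch).
    depths:     subsample sizes, each no larger than len(insertions).
    """
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        total = 0
        for _ in range(n_reps):
            # Sample sites without replacement, then count distinct genes.
            total += len(set(rng.sample(insertions, depth)))
        curve.append(total / n_reps)
    return curve


def saturating_model(x, k, rate):
    """Assumed model: approaches the plateau K as x grows."""
    return k * (1.0 - math.exp(-rate * x))


def fit_saturation(depths, genes, candidate_rates):
    """Least-squares fit of the assumed model by grid search over rate.

    For a fixed rate the model is linear in K, so the best K has a
    closed form; the rate with the smallest squared error is kept.
    """
    best = None
    for rate in candidate_rates:
        basis = [1.0 - math.exp(-rate * x) for x in depths]
        denom = sum(b * b for b in basis)
        if denom == 0.0:
            continue
        k = sum(b * y for b, y in zip(basis, genes)) / denom
        sse = sum((y - k * b) ** 2 for b, y in zip(basis, genes))
        if best is None or sse < best[0]:
            best = (sse, k, rate)
    if best is None:
        raise ValueError("no usable candidate rate")
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic demo: 300 insertion sites spread over 120 made-up genes.
    demo_rng = random.Random(1)
    sites = [f"gene{demo_rng.randrange(120)}" for _ in range(300)]
    depths = list(range(10, 301, 10))
    curve = rarefaction_curve(sites, depths, n_reps=50)
    rates = [i / 1000 for i in range(1, 101)]
    k_hat, rate_hat = fit_saturation(depths, curve, rates)
    print(f"estimated plateau K = {k_hat:.1f} genes (rate = {rate_hat:.3f})")
```

Running the fit once on all insertions and once on unique insertions would give two K estimates, analogous to the lower and upper limits mentioned in the caption.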
