Open Access Highly Accessed Methodology article

De novo 454 sequencing of barcoded BAC pools for comprehensive gene survey and genome analysis in the complex genome of barley

Burkhard Steuernagel1, Stefan Taudien2, Heidrun Gundlach3, Michael Seidel3, Ruvini Ariyadasa1, Daniela Schulte1, Andreas Petzold2, Marius Felder2, Andreas Graner1, Uwe Scholz1, Klaus FX Mayer3, Matthias Platzer2 and Nils Stein1*

Author Affiliations

1 Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Correnstr. 3, D-06466 Gatersleben, Germany

2 Leibniz Institute for Age Research, Fritz Lipmann Institute (FLI), Beutenbergstr. 11, D-07745 Jena, Germany

3 MIPS/IBIS, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Ingolstädter Landstr. 1, D-85764 Neuherberg, Germany


BMC Genomics 2009, 10:547  doi:10.1186/1471-2164-10-547

Published: 20 November 2009

Additional files

Additional file 1:

Table S1. Clone and sequencing read data. Information on clones, raw data and assemblies.

Format: XLS Size: 84KB

Open Data

Additional file 2:

Table S2. BlastN of 454 reads against BAC reference sequences. Raw 454 reads were mapped onto Sanger reference sequences via BlastN. The table lists the number of covered and uncovered positions on the reference.

Format: XLS Size: 23KB

Additional file 3:

Figure S1. Heat maps of N50 lengths of different assemblies. Heat maps visualizing the assembly results of 454 sequences of the four complete reference BACs (top: set 1; bottom: set 2) by MIRA under different combinations of hss (hash saving steps, X-axis) and bph (bases per hash, Y-axis). BACs from left to right are: 184G09, 259I16, 631P08, 711N16. Black fields indicate the hss/bph combinations resulting in the highest N50 values for the respective BAC. Dark to light gray fields mark values producing a contig with >90%, >50% and <50% of these values, respectively. White fields represent meaningless combinations (hss > bph).

Format: PNG Size: 83KB

Additional file 4:

Figure S2. Heat maps of N80 lengths of different assemblies. Heat maps visualizing the assembly results of 454 sequences of the four complete reference BACs (top: set 1; bottom: set 2) by MIRA under different combinations of hss (hash saving steps, X-axis) and bph (bases per hash, Y-axis). BACs from left to right are: 184G09, 259I16, 631P08, 711N16. Black fields indicate the hss/bph combinations resulting in the highest N80 values for the respective BAC. Dark to light gray fields mark values producing a contig with >90%, >50% and <50% of these values, respectively. White fields represent meaningless combinations (hss > bph).

Format: PNG Size: 83KB

Additional file 5:

Figure S3. Heat maps of N90 lengths of different assemblies. Heat maps visualizing the assembly results of 454 sequences of the four complete reference BACs (top: set 1; bottom: set 2) by MIRA under different combinations of hss (hash saving steps, X-axis) and bph (bases per hash, Y-axis). BACs from left to right are: 184G09, 259I16, 631P08, 711N16. Black fields indicate the hss/bph combinations resulting in the highest N90 values for the respective BAC. Dark to light gray fields mark values producing a contig with >90%, >50% and <50% of these values, respectively. White fields represent meaningless combinations (hss > bph).

Format: PNG Size: 71KB
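
The N50, N80 and N90 values shown in Figures S1 to S3 can be illustrated with a short sketch. It assumes the common definition of Nx (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches x% of the total assembled length); the source does not spell out its exact definition. The function names `n_stat`, `valid_grid` and `shade` are hypothetical, and the handling of values exactly at the 90% and 50% boundaries is an assumption.

```python
def n_stat(contig_lengths, percent=50):
    """Return the Nx contig length (N50 for percent=50, N80 for 80, ...).

    Assumed definition: sort contigs from longest to shortest and return
    the length of the contig at which the running sum first reaches
    `percent` % of the total assembled length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length
    return lengths[-1]


def valid_grid(hss_values, bph_values):
    """List (hss, bph) pairs, skipping the meaningless ones (hss > bph)."""
    return [(hss, bph) for bph in bph_values for hss in hss_values if hss <= bph]


def shade(value, best):
    """Classify one heat-map field relative to the best value for a BAC.

    Boundary handling (exactly 90% or 50% of the best value) is an
    assumption; the captions only give '>90%', '>50%' and '<50%'.
    """
    if value == best:
        return "black"
    if value * 10 > best * 9:
        return "dark gray"
    if value * 2 > best:
        return "medium gray"
    return "light gray"


if __name__ == "__main__":
    lengths = [40, 30, 20, 10]  # toy contig lengths, not data from the study
    print(n_stat(lengths, 50), n_stat(lengths, 80), n_stat(lengths, 90))
    print(valid_grid([1, 2, 4], [2, 3]))
```

In such a sketch, one assembly would be run per pair returned by `valid_grid`, and `n_stat` would be applied to the contig lengths of each run to fill one field of the heat map.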

Additional file 6:

Figure S4. Tupleplots of best MIRA assemblies versus Sanger reference sequence. Tupleplots show comparisons of best MIRA assembly contigs > 1 kb (y-axis) to the complete Sanger reference sequences (x-axis). a) 184G09 Set1/AY268139; b) 184G09 Set2/AY268139; c) 259I16 Set1/AF474373; d) 259I16 Set2/AF474373; e) 631P08 Set1/DQ249273; f) 631P08 Set2/DQ249273; g) 711N16 Set1/AF427791 (pos. 1...112.920); h) 711N16 Set2/AF427791 (pos. 1...112.920).

Format: PNG Size: 443KB

Additional file 7:

Figure S5. Tupleplots of default MIRA assemblies versus Sanger reference sequence. Tupleplots show comparisons of best MIRA assembly contigs > 1 kb (y-axis) to the complete Sanger reference sequences (x-axis). i) 184G09 Set1/AY268139; j) 184G09 Set2/AY268139; k) 259I16 Set1/AF474373; l) 259I16 Set2/AF474373; m) 631P08 Set1/DQ249273; n) 631P08 Set2/DQ249273; o) 711N16 Set1/AF427791 (pos. 1...112.920); p) 711N16 Set2/AF427791 (pos. 1...112.920).

Format: PNG Size: 455KB

Additional file 8:

Table S3. Sequencing errors in MIRA assemblies. Assessment of the sequencing error rate by comparing contigs of the best MIRA assemblies with the Sanger reference sequences.

Format: XLS Size: 19KB

Additional file 9:

Table S4. Assembly status of outliers within BAC set 1 and insert sizes estimated by fingerprinting. Simple statistics on the outliers in BAC set 1 explain four of the six cases: low coverage led to a high number of contigs. The low coverage resulted from either a very low number of reads or a very large insert size of the BAC.

Format: XLS Size: 17KB

Additional file 10:

Figure S6. Coverage Diagram of BAC 551K24. Coverage of the best MIRA assembly contigs by 454 reads from BAC 551K24, set1. Black vertical lines separate the contigs; red horizontal lines indicate the median and the twofold median, respectively.

Format: PNG Size: 40KB

Additional file 11:

Figure S7. Coverage Diagram of BAC 569H14. Coverage of the best MIRA assembly contigs by 454 reads from BAC 569H14, set1. Black vertical lines separate the contigs; red horizontal lines indicate the median and the twofold median, respectively.

Format: PNG Size: 45KB
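
The two reference lines in the coverage diagrams (median and twofold median) can be reproduced from a per-base coverage vector, as in the minimal sketch below. The function names `coverage_reference_lines` and `regions_above` are hypothetical, the input is assumed to be a plain list of read depths per contig position, and listing the stretches that exceed the twofold median is an illustrative addition rather than a step described in the source.

```python
from statistics import median


def coverage_reference_lines(depths):
    """Return (median, twofold median) of a per-base coverage vector."""
    if not depths:
        raise ValueError("coverage vector is empty")
    m = median(depths)
    return m, 2 * m


def regions_above(depths, threshold):
    """Return half-open (start, end) intervals where depth > threshold."""
    regions = []
    start = None
    for pos, depth in enumerate(depths):
        if depth > threshold:
            if start is None:
                start = pos  # a new high-coverage stretch begins here
        elif start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        # The last stretch runs to the end of the vector.
        regions.append((start, len(depths)))
    return regions


if __name__ == "__main__":
    depths = [10, 12, 11, 30, 31, 12, 10, 9, 25]  # toy values, not study data
    med, twofold = coverage_reference_lines(depths)
    print(med, twofold, regions_above(depths, twofold))
```

Positions are 0-based and intervals are half-open, so `(3, 5)` covers positions 3 and 4.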

Additional file 12:

Table S5. Coverages (completeness) of genes and gene parts, calculated by comparing the best MIRA assembly contigs to the protein databases of B. distachyon, O. sativa and S. bicolor using GenomeThreader.

Format: XLS Size: 71KB

Additional file 13:

Table S6. Number of misassemblies/chimeric contigs using different MIRA assembly metrics. Misassemblies and chimeric contigs were counted by visual inspection of tupleplots against the Sanger reference sequences. The largest contig, the largest N50, N80 and N90 values, and the default parameter setting were considered as possible metrics.

Format: XLS Size: 25KB

Additional file 14:

Table S7. Barcodes for BACs of set2. The table shows the list of barcodes that were used for tagging different BACs in set 2.

Format: XLS Size: 19KB
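
How a barcode list such as Table S7 might be used to sort pooled reads back to their BACs can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: it assumes each read starts with its barcode, requires an exact match, and uses made-up barcode sequences and BAC labels. The function name `demultiplex` is hypothetical.

```python
def demultiplex(reads, barcodes):
    """Assign reads to BACs by an exact barcode match at the read start.

    reads    -- iterable of (read_id, sequence) pairs
    barcodes -- dict mapping barcode sequence -> BAC label
    Returns (bins, unassigned); the barcode is trimmed from binned reads.
    """
    bins = {label: [] for label in barcodes.values()}
    unassigned = []
    for read_id, seq in reads:
        for tag, label in barcodes.items():
            if seq.startswith(tag):
                bins[label].append((read_id, seq[len(tag):]))
                break
        else:
            # No barcode matched; keep the read untouched for inspection.
            unassigned.append((read_id, seq))
    return bins, unassigned


if __name__ == "__main__":
    # Hypothetical barcodes and labels, NOT those listed in Table S7.
    barcodes = {"ACGTAC": "BAC_A", "TGCATG": "BAC_B"}
    reads = [
        ("r1", "ACGTACGGGTTT"),
        ("r2", "TGCATGAAACCC"),
        ("r3", "CCCCCCGGGGGG"),
    ]
    bins, unassigned = demultiplex(reads, barcodes)
    print(bins)
    print(unassigned)
```

Exact prefix matching keeps the sketch simple; real 454 reads contain sequencing errors, so a tolerant matcher would likely be needed in practice.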