Open Access Research article

Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing

Shrinivasrao P Mane1, Clive Evans1, Kristal L Cooper1, Oswald R Crasta1, Otto Folkerts1, Stephen K Hutchison2, Timothy T Harkins3, Danielle Thierry-Mieg4, Jean Thierry-Mieg4 and Roderick V Jensen5*

Author Affiliations

1 Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA

2 454 Life Sciences, Inc., 20 Commercial Street, Branford, CT 06405, USA

3 Roche Applied Science, Indianapolis, IN 46250, USA

4 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA

5 Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA

BMC Genomics 2009, 10:264  doi:10.1186/1471-2164-10-264

Published: 12 June 2009

Additional files

Additional file 1:

Supplementary Materials. Contains detailed Supplementary Methods for the preparation of the samples for Transcriptome Sequencing, the Supplementary Analysis for determining the scaling properties of the depth of coverage curves, the Supplementary Figure comparing the relative accuracy of the ExpressSeq results with different microarray platforms, and the Supplementary Table containing the guide to the different sequencing runs and data files.

Format: DOC Size: 176KB

Additional file 2:

ExpressSeq Read Counts for the A and B Samples. This Excel Workbook provides the NM "hit" counts for each of the 22 sequencing regions on the 11 full GS FLX sequencing plates described in Supplementary Table 1 in Additional file 1. It contains four Worksheets, giving the hit counts for the sequencing runs of the A and B samples processed using either the TSEQ or ODT protocols. In each Worksheet, the first two columns contain the RefSeq NM number and description, the third column gives the RefSeq transcript length, and the subsequent columns give the number of reads that hit each transcript with E-values < 1.0e-20 in each sequencing region.

Format: XLS Size: 11.3MB
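
The column layout described for Additional file 2 (NM number, description, transcript length, then one read-count column per sequencing region) can be summarized per transcript with a few lines of Python. The following is only a minimal sketch under stated assumptions: it assumes a Worksheet has been exported to tab-delimited text with any header row removed, and the function name `total_hits_per_transcript`, the accession strings, and the counts in the sample are hypothetical placeholders, not values from the actual file.

```python
import csv
import io


def total_hits_per_transcript(rows):
    """Sum read counts across sequencing regions for each transcript.

    Each row is assumed to follow the layout described for the Worksheets:
    [NM number, description, transcript length, count_1, ..., count_n].
    Returns a dict mapping NM number -> (transcript length, total hits).
    """
    totals = {}
    for row in rows:
        nm_id = row[0]
        length = int(row[2])
        # Columns after the third hold per-region hit counts; skip empty cells.
        counts = [int(cell) for cell in row[3:] if cell != ""]
        totals[nm_id] = (length, sum(counts))
    return totals


# Hypothetical two-row excerpt with three sequencing regions (made-up values).
sample = (
    "NM_000001\texample transcript A\t1500\t3\t0\t7\n"
    "NM_000002\texample transcript B\t800\t0\t0\t1\n"
)
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
totals = total_hits_per_transcript(rows)
```

Keeping the transcript length next to the summed count makes it straightforward to derive length-normalized values afterwards, should that be needed.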

Additional file 3:

All Standard Exon Junctions. This compressed, tab-delimited text file contains the genomic locations (by chromosome number and position) of each of the 137,899 standard exon junctions identified by the stringent AceView alignment of the 3.6 million reads for the MAQC A and B samples generated in this study. In addition, the supporting reads are listed for each junction using the unique alphanumeric names for each sequencing run. [See the Supplementary Table of Additional file 1 for a guide.]

Format: GZ Size: 5.6MB

Additional file 4:

New Cassette Exons and Skipped Exons. The first Worksheet of this MS Excel file contains the genomic locations (by chromosome number and position) and the identities of the supporting reads for each end of the 912 new cassette exons found inside a RefSeq intron, as identified by the stringent AceView alignment of the 3.6 million reads for the MAQC A and B samples generated in this study. The second Worksheet contains the locations of the 249 novel candidate cassette exons that had not previously been identified by either RefSeq or AceView. The third and fourth Worksheets provide the genomic locations and supporting reads of the 504 new skipped exons absent from RefSeq and the 192 novel skipped exons missing from both RefSeq and AceView. The sequences of the supporting reads for each exon boundary can be found in the Short Read Archive under accession number [NCBI:SRA003647], with the assistance of the Supplementary Table in Additional file 1. The last two Worksheets list the diseases associated with the gene loci where these new cassette and skipped exons are found.

Format: XLS Size: 438KB