An approach to comparing tiling array and high throughput sequencing technologies for genomic transcript mapping
1 Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT 06520, USA
2 Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305, USA
3 Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
4 Department of Computer Science, Yale University, New Haven, CT 06520, USA
BMC Research Notes 2009, 2:150 doi:10.1186/1756-0500-2-150
Published: 24 July 2009
There are two main technologies for transcriptome profiling: tiling microarrays and high-throughput sequencing. Recently there has been tremendous excitement about the latter because of the advent of next-generation sequencing technologies and their promise. Consequently, the question of the moment is how these two technologies compare. Here we attempt to develop an approach for a fair comparison of transcripts identified from tiling microarray and MPSS sequencing data.
This comparison is challenging because sequencing data are discrete while tiling array data are continuous. We use published rice and Arabidopsis datasets, which currently provide the best-matched sets of array and sequencing experiments and use a slightly earlier generation of sequencing, the MPSS tag sequencing technology. After scoring the arrays consistently in both organisms, a first-pass comparison reveals a surprisingly small overlap in transcripts: 22% in rice and 66% in Arabidopsis. However, a more detailed analysis shows that this is an underestimate. In particular, when we map the probe intensities onto the sequencing tags and examine their intensity distribution, we see that it is very similar to that of exons. Furthermore, restricting the comparison to protein-coding gene loci reveals a very good overlap between the two technologies.
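The two comparisons described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a single chromosome, half-open integer coordinates, array-detected transcribed regions and sequencing tags given as `(start, end)` pairs, and probes given as `(position, intensity)` pairs. The function names `overlap_fraction` and `probe_intensities_in` are hypothetical.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def overlap_fraction(tags, regions):
    """Fraction of tags that overlap at least one array-detected region.

    This corresponds to a first-pass overlap between the discrete
    sequencing tags and the thresholded array calls.
    """
    merged = merge_intervals(regions)
    starts = [s for s, _ in merged]
    hits = 0
    for t_start, t_end in tags:
        # Index of the last merged region that starts before the tag ends.
        i = bisect_right(starts, t_end - 1) - 1
        if i >= 0 and merged[i][1] > t_start:
            hits += 1
    return hits / len(tags) if tags else 0.0


def probe_intensities_in(features, probes):
    """Collect intensities of probes whose position lies inside any feature.

    Applying this to tags and to exons gives two intensity samples whose
    distributions can then be compared without any threshold on the array.
    """
    merged = merge_intervals(features)
    starts = [s for s, _ in merged]
    selected = []
    for position, intensity in probes:
        i = bisect_right(starts, position) - 1
        if i >= 0 and merged[i][1] > position:
            selected.append(intensity)
    return selected
```

Collecting raw intensities under the tags, rather than counting thresholded calls, is what allows the tag and exon distributions to be compared independently of the array scoring cutoff.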
Our approach to comparing genome tiling microarray and MPSS sequencing data suggests that the transcripts identified by the two technologies actually overlap reasonably well. The apparent overlap is distorted by the scoring and thresholding steps used to call transcripts from the tiling arrays.