<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-8-333</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>Deep analysis of cellular transcriptomes &#8211; LongSAGE versus classic MPSS</p>
         </title>
         <aug>
            <au id="A1" ce="yes">
               <snm>Hene</snm>
               <fnm>Lawrence</fnm>
               <insr iid="I1"/>
               <email>lawrenceh@hotmail.com</email>
            </au>
            <au id="A2" ce="yes">
               <snm>Sreenu</snm>
               <mi>B</mi>
               <fnm>Vattipally</fnm>
               <insr iid="I1"/>
               <email>sreenu.vattipally@imm.ox.ac.uk</email>
            </au>
            <au id="A3">
               <snm>Vuong</snm>
               <mi>T</mi>
               <fnm>Mai</fnm>
               <insr iid="I1"/>
               <email>mai.vuong@ndm.ox.ac.uk</email>
            </au>
            <au id="A4">
               <snm>Abidi</snm>
               <mi>I</mi>
               <fnm>S Hussain</fnm>
               <insr iid="I1"/>
               <email>hussain.abidi@ccc.ox.ac.uk</email>
            </au>
            <au id="A5">
               <snm>Sutton</snm>
               <mi>K</mi>
               <fnm>Julian</fnm>
               <insr iid="I1"/>
               <email>julian.sutton@ndm.ox.ac.uk</email>
            </au>
            <au id="A6">
               <snm>Rowland-Jones</snm>
               <mi>L</mi>
               <fnm>Sarah</fnm>
               <insr iid="I1"/>
               <email>sarah.rowland-jones@ndm.ox.ac.uk</email>
            </au>
            <au id="A7" ca="yes">
               <snm>Davis</snm>
               <mi>J</mi>
               <fnm>Simon</fnm>
               <insr iid="I1"/>
               <email>simon.davis@ndm.ox.ac.uk</email>
            </au>
            <au ca="yes" id="A8">
               <snm>Evans</snm>
               <mi>J</mi>
               <fnm>Edward</fnm>
               <insr iid="I1"/>
               <email>edward.evans@ndm.ox.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, The University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DS, UK</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>333</fpage>
         <url>http://www.biomedcentral.com/1471-2164/8/333</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17892551</pubid>
               <pubid idtype="doi">10.1186/1471-2164-8-333</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>09</day>
               <month>3</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>24</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>24</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Hene et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Deep transcriptome analysis will underpin a large fraction of post-genomic biology. 'Closed' technologies, such as microarray analysis, only detect the set of transcripts chosen for analysis, whereas 'open' <it>e.g</it>. tag-based technologies are capable of identifying all possible transcripts, including those that were previously uncharacterized. Although new technologies are now emerging, at present the major resources for open-type analysis are the many publicly available SAGE (serial analysis of gene expression) and MPSS (massively parallel signature sequencing) libraries. These technologies have never been compared for their utility in the context of deep transcriptome mining.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We used a single LongSAGE library of 503,431 tags and a "classic" MPSS library of 1,744,173 tags, both prepared from the same T cell-derived RNA sample, to compare the ability of each method to probe, at considerable depth, a human cellular transcriptome. We show that even though LongSAGE is more error-prone than MPSS, our LongSAGE library nevertheless generated 6.3-fold more genome-matching (and therefore likely error-free) tags than the MPSS library. An analysis of a set of 8,132 known genes detectable by both methods, and for which there is no ambiguity about tag matching, shows that MPSS detects only about half (54%) as many transcripts as SAGE (1,955 versus 3,617). Analysis of two additional MPSS libraries shows that each library samples a different subset of transcripts, and that in combination the three MPSS libraries (4,274,992 tags in total) still only detect 73% of the genes identified in our test set using SAGE. The fraction of transcripts detected by MPSS is likely to be even lower for uncharacterized transcripts, which tend to be more weakly expressed. The source of the loss of complexity in MPSS libraries compared to SAGE is unclear, but its effects become more severe with each sequencing cycle (<it>i.e</it>. as MPSS tag length increases).</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We show that MPSS libraries are significantly less complex than much smaller SAGE libraries, revealing a serious bias in the generation of MPSS data unlikely to have been circumvented by later technological improvements. Our results emphasize the need for the rigorous testing of new expression profiling technologies.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>In recent years, a number of techniques have emerged for large-scale gene expression analysis. Most are designed to compare the expression of many genes between cell types or under a number of different conditions. However, there has also been interest in techniques capable of identifying the complete transcriptome of a given cell or tissue. 'Closed' architecture systems, such as microarrays, are less suited to this application because they are limited by the extent to which global transcriptome coverage has been achieved. Even in organisms such as <it>Homo sapiens </it>where a complete genome sequence is now available, there remains uncertainty regarding the actual number of transcribed regions. This is true in the case of conventional genes and even more so if regions thought to yield polyadenylated non-coding RNAs are included <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. Thus, at the present time, it would in principle be necessary to represent the whole genome on an array in order to test for all possible transcripts, which presents two major difficulties. First, there is the sheer number of probes required to fully cover the human genome using tiling arrays: 51,874,388 probes on 134 arrays were required even for non-overlapping coverage of non-repetitive regions in a study undertaken in 2004 <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Second, there are the technical difficulties associated with designing consistently good probes covering the whole genome (discussed in, <it>e.g</it>., <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>). It may, therefore, be some time before all human genes can be confidently sampled in a conventional laboratory setting using such methodologies.</p>
         <p>Much use has therefore been made of 'open' gene-expression profiling methods requiring no <it>a priori </it>knowledge of the genes likely to be of interest <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Many of these techniques are based on the sequencing of short tags created from pooled transcripts. Until recently, tag-based expression profiling technologies had a key advantage over more traditional 'open' technologies such as expressed sequence tag (EST) or cDNA sequencing insofar as they efficiently and relatively inexpensively sample large numbers of transcripts. In SAGE, between 12 and 20 transcripts are sampled per sequencing reaction, compared to one EST or a fraction of a cDNA, whilst in MPSS all tags in a library (usually >1 million) are sequenced simultaneously. New sequencing techniques, such as LCM-454 technology <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, may allow rapid sequencing of very large EST libraries <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, but these may lack the quantitative nature of tag-based techniques because production and capture of the ESTs are likely to be length and/or sequence dependent. These technologies could, however, be used to sequence extremely large SAGE libraries.</p>
         <p>An additional advantage of 'open' technologies is that sensitivity can be improved to a great extent simply by increasing library size, allowing the identification of very weakly expressed transcripts. Such transcripts may be expressed at levels much less than one copy per cell because they are only present at, <it>e.g</it>., very specific points in the cell cycle or in response to particular levels of cellular stress that only apply to subsets of the cell population. One caveat to this is that background noise in the data, <it>e.g</it>. due to contaminating species, degradation or mis-priming, may limit the maximum sensitivity that can be achieved (for an example of how this can affect comparative analyses, see <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>). In contrast, for microarrays, sensitivity is limited by the inherent signal:noise ratio of the read-out technology itself, rather than only biological noise. Best estimates of the sensitivity for cDNA or long oligonucleotide arrays vary from 50 to 400 transcripts per million, whereas, using the same type of analysis, species present at an average count as low as 5 transcripts per million could be reliably identified as being differentially expressed in large-scale tag-sequencing experiments <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Estimates of detection sensitivity for short oligonucleotide arrays have not been calculated in the same manner, but others have claimed the reliable detection of transcripts expressed as weakly as 6&#8211;20 per million using Affymetrix GeneChips <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. However, this sensitivity was somewhat dependent on comparisons to a 'mismatched control' oligonucleotide, the results of which were compromised by variable cross-reactivity with the mismatched oligonucleotide.</p>
         <p>The disadvantage of sequencing very short tags is that it compromises identification of the transcripts corresponding to each tag. Ideally, every tag would map uniquely to both the genome and the transcriptome, and every transcript would be represented by at least one tag. Short sequence tag-based profiling was pioneered by Velculescu <it>et al</it>. in the form of conventional SAGE <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, which produces 14 bp tags from the 3'-most occurrence of an "anchoring" restriction site (usually <it>Nla</it>III) in polyadenylated transcripts. This might be thought to be sufficient to map uniquely to the transcriptome <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, but because transcript sequences are non-random, such tags are often too short to distinguish similar sequences. In addition, genomic mapping of the tags usually generates multiple hits, making the identification of novel genes extremely difficult.</p>
         <p>Other tag-based methodologies, especially those for gene identification and establishing transcriptional start points, have since been developed that generate longer tags from the 3' or 5' ends of transcripts, or both (reviewed in <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>). Until now, the most common techniques that have been used for tag-based global expression analysis are LongSAGE <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and massively parallel signature sequencing (MPSS) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Little, if any, data has been generated or made available in the public databases with the newer sequencing technologies. These include LCM-454, which would in principle allow rapid production of extremely large EST and SAGE libraries <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, and Solexa's SBS technology <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, which has been adapted to tag-based expression profiling <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
         <p>LongSAGE is a modification of the standard SAGE protocol using a different type II restriction enzyme (<it>Mme</it>I rather than <it>BsmF</it>I) to generate a 21 bp tag at the anchor site (which remains <it>Nla</it>III). MPSS generated 20 bp tags anchored at the 3'-most <it>Dpn</it>II sites in transcripts, in a similar manner to SAGE. The unique feature of MPSS was the proprietary, bead-based sequencing technology, which was more efficient than standard Sanger sequencing and yielded far larger tag counts. As both methods significantly increase tag length compared to conventional SAGE, they were expected to improve the prospects for unique genome and transcriptome tag mapping, as suggested by the pilot-scale use of LongSAGE for genome annotation <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Because MPSS was a proprietary technology that the parent company has ceased to offer, new MPSS libraries can no longer be generated. Nevertheless, large amounts of MPSS data are still being made available (see <it>e.g</it>. <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>).</p>
         <p>Although the ability of the MPSS and LongSAGE methods to identify abundant or differentially expressed genes has been compared, their capacity to provide complete transcriptome coverage has not. The number of transcripts expressed in a single cell can vary considerably depending on cell type, among other factors, but it has been estimated that a 'typical' human somatic cell contains ~400,000 mRNA molecules <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Given that a particular transcript species could be present at less than one copy per cell, <it>i.e</it>. less than two tags per million (tpm), full transcriptome coverage using tag-based methods can only be guaranteed if libraries containing several times this many tags are fully sequenced. Due to the efficiency of MPSS sequencing, it became feasible to sequence well in excess of 1 million tags per sample at a fraction of the cost of sequencing a similar number of LongSAGE tags. It has seemed, therefore, that MPSS was the technology most likely to offer the depth of sampling required for whole transcriptome coverage, but this has not been adequately tested.</p>
         <p>We previously analysed a CD8<sup>+ </sup>T-cell clone using conventional SAGE <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> and found that a library of only 71,174 tags contained sequences corresponding to most, if not all, the transcripts encoding the surface molecules from that cell. However, some of these tags were found only once in the library and it is likely that transcripts from many other functional classes were not sampled at all. Similar-sized libraries have been generated from other leukocyte populations <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>, and extensive microarray analysis has identified large numbers of transcripts differentially expressed among leukocyte subsets <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. Herein, we compare the ability of SAGE and MPSS data to provide, as far as is currently feasible, access to the entire transcriptome of a T cell.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Systematic limitations</p>
            </st>
            <p>Before undertaking a direct comparison of the two transcriptome-profiling methods, we consider the systematic limitations of the methods, as previously done in a generalised way <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. First, for a restriction site-based tagging method to detect a given transcript, the transcript must contain that site. Using <it>Nla</it>III (as in SAGE) or <it>Dpn</it>II (as in MPSS), which each have four-base recognition sites, the recognition site ought to be present, on average, every 256 base pairs. However, some transcripts will not have these sites and both SAGE and MPSS are expected to be similarly affected. There are 13,665,294 and 410,369 <it>Nla</it>III sites in the human genome and in all the human sequences in Release 19 of the RefSeq database <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, respectively. The numbers for <it>Dpn</it>II sites are 7,112,355 and 253,936, so this site is rarer, suggesting that the ability of MPSS to tag more transcripts is in this way compromised. In RefSeq, excluding predicted transcripts from the genome, the proportion of cDNAs lacking the LongSAGE recognition site is less than 0.6% (144/24,261) whereas the proportion lacking the MPSS site is substantially higher, at ~2.3% (552). In terms of the total pool of transcripts, these numbers are relatively small, but cannot be overlooked if the entire transcriptome of a cell is to be identified. A better strategy would involve a combination of sites: only 39 of the 24,261 human sequences in RefSeq Release 19 lack both <it>Nla</it>III and <it>Dpn</it>II recognition sites.</p>
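            <p>The arithmetic behind these figures can be checked with a short sketch. The Python below is illustrative only: the expected site spacing assumes independent, equally frequent bases (an idealisation that real genomes do not satisfy), the function names are hypothetical, and the counts are simply those quoted above.</p>

```python
# Illustrative sketch only; function names are hypothetical and the counts
# are the RefSeq Release 19 figures quoted in the text.

def expected_site_spacing(site_length, alphabet_size=4):
    """Mean distance (bp) between occurrences of a fixed recognition site,
    assuming independent, equally frequent bases."""
    return alphabet_size ** site_length


def percent_lacking(n_lacking, n_total):
    """Percentage of sequences that lack a given recognition site."""
    return 100.0 * n_lacking / n_total


REFSEQ_CDNAS = 24261  # human cDNAs, excluding transcripts predicted from the genome
NO_NLAIII = 144       # cDNAs without an NlaIII site (LongSAGE anchor)
NO_DPNII = 552        # cDNAs without a DpnII site (MPSS anchor)
NO_EITHER = 39        # cDNAs lacking both sites

print(expected_site_spacing(4))                             # 256
print(round(percent_lacking(NO_NLAIII, REFSEQ_CDNAS), 2))   # 0.59
print(round(percent_lacking(NO_DPNII, REFSEQ_CDNAS), 2))    # 2.28
print(round(percent_lacking(NO_EITHER, REFSEQ_CDNAS), 2))   # 0.16
```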
            <p>A second limitation of tag-based methods is the difficulty of matching each tag to a unique transcript. The single most important benefit of open expression technologies is their ability to identify previously uncharacterised genes, which requires that novel tags can be linked to sequenced transcripts or, if they have not been previously identified, to the genome. Analysis of <it>Nla</it>III and <it>Dpn</it>II sites in the human genome demonstrates the effect of tag length on transcript identification [see Additional file <supplr sid="S1">1</supplr>]. The vast majority (>95%) of all potential LongSAGE and MPSS tags are unique in the genome and transcriptome, compared with only 9% of potential conventional (14 bp) SAGE tags, indicating that these technologies significantly reduce the problem of unique transcript identification. These results reinforce and extend the results of Unneberg <it>et al</it>. <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, obtained before LongSAGE was in use, which suggested that tags of at least 17 bp would be needed to find unique matches among human Unigene clusters. Nevertheless, the identification of novel genes using LongSAGE and MPSS tags is not straightforward, because apparently novel tags may arise via sequencing errors or genetic polymorphisms. The combination of LongSAGE and MPSS data should provide a powerful approach for identifying new transcriptional loci, since regions of the genome that are not known to code for any genes, but which contain matches to tags derived by both methods, would be highly likely to encode novel transcripts.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Effect of tag length on frequency of matches to the genome and transcriptome. Additional figure showing a histogram of the frequencies of every tag found in the Ensembl genome and transcriptome for various combinations of tagging enzyme and tag length.</p>
               </text>
               <file name="1471-2164-8-333-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Library production</p>
            </st>
            <p>LongSAGE and MPSS libraries were prepared from a single sample of RNA extracted from a CD4<sup>+ </sup>T-cell clone (clone 29) activated with beads coated in anti-CD3 and anti-CD28 antibodies. Clone 29 was derived from the peripheral blood mononuclear cells of a subject given a modified vaccinia virus Ankara (MVA) vaccine containing a polyprotein made from HIV-1 gag fused to a string of cytotoxic T-cell epitopes as part of a vaccine trial <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. It was established in culture with IL-7 and the overlapping HIV-1 gag peptides, KRWIILGLNKIVRMY and GEIYKRWIILGLNKI, but was shown to respond specifically to the former peptide. The likely clonality of the population was confirmed by analysis of TCR V&#946; chain usage, which showed exclusive expression of V&#946;17 (JS, SRJ and SHIA, unpublished). A single library of 503,431 LongSAGE tags, and 3 libraries containing a total of 4,274,992 MPSS tags, were sequenced.</p>
            <p>FACS analysis indicated that, prior to library generation, the activated T-cell clone expressed CD4, CD28, CD45 and CD69, but not CD27 or CD62L (data not shown). The LongSAGE data perfectly matched the FACS results and revealed the expression of each of the classical T-cell markers, <it>i.e</it>. all TCR/CD3 components, CD2, CD4, CD5, CD6, CD11a (LFA-1a), CD43, CD45 and CD53. The MPSS library, however, lacked tags corresponding to both CD3&#947; and CD69. The LongSAGE CD69 transcript tag derived from the 3' untranslated region (UTR), upstream of the only <it>Dpn</it>II site in the full length cDNA. Between the <it>Nla</it>III site and the <it>Dpn</it>II site there is a potential polyadenylation signal, suggesting that alternative polyadenylation could be responsible for the absence of a CD69 MPSS tag. Even though CD3&#947; is the most weakly expressed transcript of those tested here, the lack of any MPSS tags derived from transcripts of CD3&#947; is very surprising, given the supposedly increased depth of the MPSS libraries compared to SAGE. Taken in isolation, this finding could have implied that there is an additional region of the CD3&#947; 3' UTR containing a potential MPSS tag that is not recorded in the main DNA sequence databases. However, in a second MPSS library made from the same mRNA sample the CD3&#947; tag was represented at 9.5 tags per million. Thus, the tag is produced and, given its expression level, should be found in every library of this size provided that every transcript is equally likely to be sampled. This provided the first indication of MPSS sampling problems, despite the size of MPSS libraries.</p>
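            <p>The argument that a tag present at 9.5 tags per million should appear in every library of this size can be made concrete with a simple sampling model. The sketch below assumes that every tag in the library is an independent draw from the transcript pool and that the true abundance equals the value measured in the second library; both are assumptions made for illustration, and the function names are hypothetical. The library size used is that of the first MPSS library (1,744,173 tags).</p>

```python
import math

# Illustrative sketch only: assumes independent, unbiased sampling of tags.

def expected_count(abundance_tpm, library_size):
    """Expected number of tags for a transcript at the given abundance
    (tags per million) in a library of the given size."""
    return abundance_tpm * 1e-6 * library_size


def prob_not_sampled(abundance_tpm, library_size):
    """Probability that no tag at all is observed, i.e. (1 - f) ** N,
    evaluated via logarithms for numerical stability."""
    fraction = abundance_tpm * 1e-6
    return math.exp(library_size * math.log1p(-fraction))


MPSS_LIBRARY_SIZE = 1744173  # tags in the first MPSS library
CD3_GAMMA_TPM = 9.5          # abundance observed in the second MPSS library

print(expected_count(CD3_GAMMA_TPM, MPSS_LIBRARY_SIZE))
print(prob_not_sampled(CD3_GAMMA_TPM, MPSS_LIBRARY_SIZE))
```

            <p>Under these assumptions the expected count is about 17 tags, so the complete absence of the tag would be extremely unlikely to arise by chance alone.</p>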
         </sec>
         <sec>
            <st>
               <p>Analysis of known genes</p>
            </st>
            <p>Ideally, the level of expression of every distinct transcript identified by the two methods would be compared. However, ambiguities in tag to gene mapping and differences in tag anchoring sites mean that different populations of potential tags will be sampled in each case, making such comparisons non-trivial. Therefore, a set of test transcripts that contain both <it>Nla</it>III and <it>Dpn</it>II sites, and for which the potential tags at all such sites are unique in both the human genome and the Ensembl transcriptome, was extracted from Ensembl <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. This set was called UTBS (Unique Transcripts for Both Sites) and consisted of 8,132 transcripts. The Spearman correlation coefficient for expression of UTBS transcripts in the two libraries was 0.66. As expected, this correlation is significantly higher than that obtained for comparisons of the MPSS library with other, <it>i.e</it>. non-CD4<sup>+ </sup>T cell-derived LongSAGE libraries; for example comparison with an activated CD8<sup>+ </sup>T cell-derived LongSAGE library (83,553 tags; SHIA <it>et al</it>., unpublished) yielded a correlation of 0.55. Importantly, however, the correlation between libraries produced from the one RNA sample using the two methods was far lower than that for LongSAGE libraries produced from distinct cell populations. The coefficient obtained for a comparison of our activated CD4<sup>+ </sup>T-cell LongSAGE library with the activated CD8<sup>+ </sup>T-cell library referred to above, for example, is 0.76 and when our library is compared to a second LongSAGE library of similar size generated from the same cells in the "resting" state, <it>i.e</it>. prior to activation with anti-CD3 and anti-CD28 antibody coated beads (501,343 tags; MTV <it>et al</it>., unpublished), the correlation coefficient is 0.88.</p>
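            <p>For readers unfamiliar with the statistic, the following sketch shows one way a Spearman rank correlation between two sets of tag counts can be computed: ties receive their average rank and the Pearson correlation is then taken over the ranks. It is not the code used in this study, and the toy counts are invented purely for illustration.</p>

```python
import math

# Illustrative sketch only; toy counts are invented, not data from this study.

def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while len(order) > i:
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while len(order) > j + 1 and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical tag counts for five transcripts in two libraries.
toy_longsage = [120, 40, 0, 7, 3]
toy_mpss = [300, 10, 0, 25, 2]
print(round(spearman(toy_longsage, toy_mpss), 2))  # 0.9
```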
            <p>Given that both methods are believed to be generally reproducible <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>, the larger-than-expected differences between the LongSAGE and MPSS libraries generated from the same RNA sample are suggestive of a systematic bias intrinsic to one or other of the methods. To identify the source of this bias, and to establish which of the methods is the more reliable, the libraries were compared at the levels of sampling depth and breadth.</p>
         </sec>
         <sec>
            <st>
               <p>Depth of sampling</p>
            </st>
            <p>The rate of addition of novel tag sequences to the library provides a measure of whether a given library is large enough to identify every potential tag sequence in the initial sample, since, when all existing tags have been sequenced, this rate should approach zero. As expected, given their relative sizes, this appears to be the case for the MPSS but not the LongSAGE library. However, the rate of novel tag addition is likely to be artificially increased in the LongSAGE library due to sequencing error accumulation <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>; MPSS has been reported to have much lower error rates than LongSAGE, <it>i.e</it>. ~0.25% <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> versus ~0.7% per base <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. A simple filter was used to remove tag sequences generated by errors: <it>i.e</it>. only tags that matched either the genome or the known transcriptome were kept. Some genuine tags carrying polymorphisms unrepresented in the databases, or for which no cDNA sequence is available and a splice junction or polyadenylation occurs within the tag, are likely to be removed. However, as these are comparatively rare events, this is not expected to have a large effect on library complexity <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Removal of the error-derived tags dramatically reduces the rate of novel tag addition in the LongSAGE library (Fig. <figr fid="F1">1</figr>), although it is still not asymptotic. The abundances of tags corresponding to UTBS transcripts were used to examine the effect of library size on the sampling of known genes (Fig. <figr fid="F2">2</figr>). It is clear from this that the MPSS library has sampled virtually all UTBS transcripts present, and that the rate of transcript discovery by LongSAGE falls to a very low level, suggesting that the LongSAGE library is probably large enough to identify most known transcripts within the transcriptome of this cell.</p>
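            <p>The effect of the per-base error rates quoted above on whole tags can be estimated with a simple calculation. The sketch below assumes that errors occur independently at every position and that all bases of the nominal tag lengths (21 bp for LongSAGE, 20 bp for MPSS) are equally susceptible; both are simplifying assumptions, and the function name is hypothetical.</p>

```python
# Illustrative sketch only: assumes independent errors at every tag position.

def prob_tag_has_error(per_base_error, tag_length):
    """Probability that a tag contains at least one sequencing error."""
    return 1.0 - (1.0 - per_base_error) ** tag_length


LONGSAGE_ERROR = 0.007   # ~0.7% per base, as quoted in the text
MPSS_ERROR = 0.0025      # ~0.25% per base, as quoted in the text
LONGSAGE_LENGTH = 21     # nominal tag length in bp
MPSS_LENGTH = 20         # nominal tag length in bp

print(prob_tag_has_error(LONGSAGE_ERROR, LONGSAGE_LENGTH))
print(prob_tag_has_error(MPSS_ERROR, MPSS_LENGTH))
```

            <p>Under these assumptions roughly 14% of LongSAGE tags and 5% of MPSS tags would carry at least one error, which illustrates why unfiltered LongSAGE libraries continue to accumulate apparently novel tag sequences.</p>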
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Effect of total number of tags sequenced on number of distinct tag sequences identified</p>
               </caption>
               <text>
                  <p><it>Effect of total number of tags sequenced on number of distinct tag sequences identified</it>. LongSAGE (<b><it>A</it></b>) and MPSS (<b><it>B</it></b>) libraries produced from an activated CD4<sup>+ </sup>T-cell clone were sampled at various sizes to examine the effect of library size on the number of distinct tag sequences identified. If the library is large enough to sample all available tags, then increasing the library size will not increase the number of sequences detected. Closed diamonds represent all tags in the library. Open circles represent only those tags that exactly match either the genome or the transcriptome (<it>i.e</it>. excluding possible sequencing errors but also polymorphisms and some tags crossing splice junctions).</p>
               </text>
               <graphic file="1471-2164-8-333-1"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Number of transcripts in the UTBS dataset identified by LongSAGE and MPSS</p>
               </caption>
               <text>
                  <p><it>Number of transcripts in the UTBS dataset identified by LongSAGE and MPSS</it>. The UTBS dataset consists of transcripts containing both <it>Nla</it>III and <it>Dpn</it>II restriction sites and for which all extracted tags are unique in both the transcriptome and the genome. The LongSAGE (<b><it>A</it></b>) and MPSS (<b><it>B</it></b>) libraries were sampled at various sizes and the numbers of transcripts from the UTBS dataset for which tags were identified were calculated.</p>
               </text>
               <graphic file="1471-2164-8-333-2"/>
            </fig>
            <p>The analysis using all tags matching the genome (Fig. <figr fid="F1">1</figr>) and that based on the UTBS transcript set (Fig. <figr fid="F2">2</figr>) provide different answers to the question of how many tags need to be sequenced in order to sample the entire transcriptome. Both analyses suggest that the MPSS library is large enough to sample all the readable sequence species present on the microbeads (<it>i.e</it>. the number of unique sequences identified has reached its maximum). On the other hand, while the all-tag analysis suggests that a LongSAGE library needs to be substantially larger than 500,000 tags to sample all transcripts in the cDNA pool, the analysis of known genes does not. This difference is not surprising because known genes are likely, on average, to be expressed at a higher level than novel transcripts, aiding their initial identification <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>. However, it is also possible that many LongSAGE tags are derived from unconventional, <it>i.e</it>. non-protein encoding, transcriptional units absent from gene databases. In this case, larger SAGE libraries would be required to identify a full set of such unconventional transcripts.</p>
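            <p>The kind of subsampling analysis discussed here, in which a library is sampled at various sizes and the number of distinct tag sequences is counted, can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the library is shuffled once and nested prefixes serve as random subsamples without replacement, and the toy tag sequences and counts are invented.</p>

```python
import random

# Illustrative sketch only; toy tags and counts are invented.

def rarefaction_curve(tags, sizes, seed=0):
    """Return (sample size, number of distinct tags) pairs.

    The library is shuffled once; each prefix of the shuffled list is a
    random subsample drawn without replacement."""
    rng = random.Random(seed)
    shuffled = list(tags)
    rng.shuffle(shuffled)
    return [(size, len(set(shuffled[:size]))) for size in sorted(sizes)]


# Hypothetical library: tag sequence mapped to its observed count.
toy_counts = {"AAAA": 50, "CCCC": 30, "GGGG": 15, "TTTT": 4, "ACGT": 1}
toy_library = [tag for tag, n in toy_counts.items() for _ in range(n)]

for size, distinct in rarefaction_curve(toy_library, [10, 25, 50, 100]):
    print(size, distinct)
```

            <p>A curve that flattens before the full library size is reached indicates that further sequencing would add few new tag species, which is the criterion applied to the MPSS and LongSAGE libraries above.</p>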
         </sec>
         <sec>
            <st>
               <p>Breadth of sampling</p>
            </st>
            <p>Great sampling depth is only of value if the open expression technology identifies transcripts irrespective of their sequence. There is a large discrepancy in the number of different sequences identified by the two methods. At the same sampling depth (<it>i.e</it>. 500,000 tags) there are many more distinct tag sequences in the LongSAGE library than in the MPSS library (151,794 vs. 12,140). Allowing for differences in sequencing error rate by considering only tags that match the human genome, LongSAGE identifies 7.4-fold more unique tag sequences than MPSS (71,838 vs. 9,723). Even using the entire MPSS library, which is 3 times the size of the SAGE library, MPSS identifies 6.3-fold fewer tags than SAGE. Although up to half of this difference may be accounted for by the lower number of <it>Dpn</it>II sites in the genome, a ~3-fold reduction in number of distinct species identified by a method intended to analyse samples to a greater depth is unexpected. This large difference suggests either that the LongSAGE library contains many spurious tags randomly matching genomic sequences or that the MPSS library lacks many genuine tags, despite the sequencing of tags from every captured transcript.</p>
            <p>A simple explanation for the greater complexity of the LongSAGE library is that SAGE samples more tags per transcript than MPSS. It is not possible to directly convert the number of tag sequences found in a library to the number of genes being profiled, for two reasons. First, most genes will have multiple tags owing to polymorphisms, alternative polyadenylation, internal polyadenosine stretches, antisense expression and incomplete cleavage by the restriction enzymes. Second, in some cases, especially within gene families, multiple genes will share the same tag sequence. It should be possible, however, to make a rough estimate of the number of transcriptional loci identified by examining the number of 'tag clusters' found in the genome. As we wanted to compare the two methods rather than identify specific genes or determine an exact gene number, a very simple set of criteria was used to define a tag cluster. Briefly, all the tags matching the genome only once were sorted by chromosome position. Each tag match was analysed in turn and was considered to be part of a new transcriptional locus if it was greater than X bases from the nearest previous tag match on the chromosome or more than Y bases from the first tag match in the previous transcriptional locus. Various values were used for X and Y, but regardless of the exact value used, the LongSAGE library identified 2.8&#8211;3.8 fold more clusters, and therefore presumably loci, than the MPSS library [see Additional file <supplr sid="S2">2</supplr>]. For example, if X is 10 kb, LongSAGE identifies ~27,000 loci and MPSS identifies ~8,200 loci regardless of the value of Y, once the data have been corrected for tags that had to be excluded from the analysis because they matched multiple loci. 
Thus, the difference in the number of tag species identified by the two methods probably does reflect a real difference in the number of expressed genes sampled, rather than a trivial difference in the number of potential tags sampled per transcript.</p>
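            <p>The clustering rule described above can be sketched as follows. This is a minimal illustration rather than the program used in this study: the function name count_tag_clusters, the (chromosome, position) input format and the decision to ignore strand are all assumptions made for the example.</p>

```python
from collections import defaultdict


def count_tag_clusters(matches, x, y):
    """Count tag clusters among tags that match the genome exactly once.

    matches: iterable of (chromosome, position) pairs (hypothetical format).
    x: largest gap allowed between a tag match and the previous tag match.
    y: largest distance allowed from the first tag match of the cluster.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in matches:
        by_chrom[chrom].append(pos)

    clusters = 0
    for positions in by_chrom.values():
        positions.sort()  # analyse matches in chromosome order
        first = prev = None
        for pos in positions:
            # Start a new locus if the match is too far from the previous
            # match, or too far from the first match of the current locus.
            if prev is None or pos - prev > x or pos - first > y:
                clusters += 1
                first = pos
            prev = pos
    return clusters


# Toy data only; these positions are not taken from the libraries.
example = [("chr1", 100), ("chr1", 5000), ("chr1", 30000), ("chr2", 200)]
print(count_tag_clusters(example, x=10000, y=50000))
```

            <p>In this sketch, tightening Y splits long runs of closely spaced tags into several loci, whereas X controls how large a gap is tolerated within one locus.</p>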
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>Number of transcriptional loci identified. Additional table showing the number of different active transcriptional loci identified in the same cell sample by either SAGE or MPSS according to the method described in the text when various alternative parameters are used.</p>
               </text>
               <file name="1471-2164-8-333-S2.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Examination of a set of known transcripts should help determine whether these differences are due to erroneous LongSAGE tags or to the absence of genuine tags from the MPSS library. Analysis of the tags matching the UTBS transcript set yielded the same trend as the analysis of all tags, with the LongSAGE library identifying almost twice as many UTBS transcripts as the MPSS library (Fig. <figr fid="F2">2</figr>). These data can be extrapolated to estimate the number of genes sampled in each library: since 17.3% of all the tag sequences in the MPSS library represented 1,955 known transcripts and 7.0% of the LongSAGE tag sequences represented 3,617 transcripts, it can be estimated that ~11,300 transcripts are sampled in the complete MPSS library and ~51,700 in the entire LongSAGE library. These numbers are expected to be underestimates given that known transcripts are likely to be expressed at a higher level than uncharacterized transcripts. However, these numbers are much higher than those obtained when estimating the numbers of transcriptional loci even using a maximum distance between tags (X) of just 5 kb [see Additional file <supplr sid="S2">2</supplr>]. The likely explanation for this is that sequencing errors are artificially increasing the number of apparently unique tags in the libraries. Using matches to the genome to define genuine tags, extrapolation from the number of UTBS transcripts found suggests that SAGE identified a total of ~24,600 transcripts and MPSS ~8,700 transcripts, in good agreement with the estimates of loci number allowing 5,000&#8211;15,000 bases between tags defining each locus [see Additional file <supplr sid="S2">2</supplr>].</p>
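            <p>The first extrapolation above is a simple proportional scaling, which can be reproduced as a short worked example. The function name extrapolate_transcripts is hypothetical; the input figures are those quoted in the text for all tag sequences.</p>

```python
def extrapolate_transcripts(known_transcripts, fraction_of_tag_sequences):
    """Scale a count of known transcripts up to the whole library.

    If a given fraction of the distinct tag sequences accounts for the
    known transcripts, the full set of tag sequences is assumed to
    represent proportionally more transcripts.
    """
    return known_transcripts / fraction_of_tag_sequences


mpss_estimate = extrapolate_transcripts(1955, 0.173)  # MPSS: 17.3% of tag sequences
sage_estimate = extrapolate_transcripts(3617, 0.070)  # LongSAGE: 7.0% of tag sequences

# Rounded to the nearest hundred, as in the text.
print(round(mpss_estimate, -2), round(sage_estimate, -2))
```

            <p>The scaling assumes that known and uncharacterized transcripts yield similar numbers of distinct tag sequences per transcript, which is an approximation.</p>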
            <p>However the data are analysed, MPSS seems to underestimate transcriptome complexity. In this context, it is revealing to examine the expression level of the different classes of transcripts in the UTBS dataset. The average representation level for all LongSAGE tags corresponding to each sense transcript in this set (3,617 transcripts expressed in total) is 45 tags per million (tpm), whereas for transcripts identified by both methods the average total SAGE tag count for a transcript is 65 tpm (1,855 transcripts) and for those identified by LongSAGE only, it is 23 tpm (1,762 transcripts). This suggests that MPSS fails to detect weakly expressed transcripts. Since this is not what is expected of a method capable of sampling many more tags than SAGE, it implies that there are systematic biases in MPSS sequencing, or in library production, or both.</p>
            <p>A trivial explanation for these results is that there is DNA contamination of the LongSAGE library but not the MPSS library. It is of course very difficult to prove that there has been no contamination of a library when deep transcriptome analysis of the given cell has not been undertaken previously. Clearly, every care was taken to ensure that there was no contamination of the libraries at any stage. However, if the SAGE library was contaminated after the initial RNA sample was divided, there are three possible sources of contaminating DNA that could explain our results (<it>i.e</it>. that generated matches to the human genome): human genomic DNA, DNA from other human transcripts or ditags from previously generated human SAGE libraries. The only LongSAGE library previously produced in our laboratory was derived from anti-CD3 antibody-treated CD8<sup>+ </sup>T-cells (SHIA <it>et al</it>., unpublished). In this library, there are very high tag counts for tags derived from transcripts encoding CD8 (934 tpm total) and several other molecules that are completely absent from the CD4<sup>+ </sup>T cell-derived SAGE library. Similarly, tags that are extremely abundant in both the activated CD8<sup>+ </sup>T cell-derived library and our activated CD4<sup>+ </sup>T cell-derived library are completely absent in another large resting CD4<sup>+ </sup>T cell-derived library (MTV <it>et al</it>., unpublished), <it>e.g</it>. CCL4L1 at 1688 tpm, 2029 tpm and 0 tpm, respectively. Thus, library cross-contamination seems unlikely. In addition, the new libraries did not contain any tags derived from transcripts encoding markers of cells that are likely sources of cDNA contamination in our laboratory, <it>e.g</it>. B cells (CD19, CD20, CD21, CD22), myeloid cells (CD14, CD32) or keratinocytes (KRT5, KRT9, KRT14, KRT17). 
Finally, in the case of genomic DNA contamination, the abundance of contaminating tags would be expected to correlate directly with the number of copies of that sequence found in the human genome. However, the abundance distribution for SAGE tags from UTBS transcripts detected by SAGE only is equivalent to that for all tags matching UTBS transcripts detected by SAGE, and to that for tags matching transcripts detected by both SAGE and MPSS [see Additional file <supplr sid="S3">3</supplr>]. Thus, there do not appear to be any differences in the distribution of tags detected only by SAGE that can be attributed to genomic contamination.</p>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>Comparisons of tag abundance distributions for LongSAGE tags from the activated CD4<sup>+ </sup>T-cell library matching UTBS transcripts according to whether the transcripts are also detected by MPSS. Additional figure comparing the apparent frequency distributions of transcripts from known genes according to whether their corresponding tags were found by SAGE and MPSS or exclusively by one of these techniques in order to demonstrate that transcripts detected only by SAGE did not represent a fixed level of genomic contamination.</p>
               </text>
               <file name="1471-2164-8-333-S3.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Analysis of MPSS bias</p>
            </st>
            <p>In order to try to identify the nature of MPSS bias, we analysed additional MPSS libraries and examined the complexities of one library at different stages of sequencing. Two more MPSS 'bead libraries' were produced from the same cDNA sample used in the production of the library considered up to this point. Since analysis of the first bead library revealed substantial sequence redundancy, <it>i.e</it>. virtually no new sequences were added as the library size increased (Fig. <figr fid="F1">1</figr>), we expected the tag composition of additional bead libraries to be essentially identical. Instead, addition of tags from the new libraries causes dramatic increases in the number of different species (Fig. <figr fid="F3">3A</figr>). Each library has similar numbers of distinct tag sequences (~14,100 to ~14,900 per library), but the majority (71%) of these are only found in one of the three libraries, even after excluding tags that do not match the genome (<it>i.e</it>. potential sequencing errors; Fig. <figr fid="F3">3B</figr>). The tags found in any one library are present at much lower levels than those found in all three libraries (<it>i.e</it>. averaging 9.4 tpm vs. 191.3 tpm). This suggests that random sampling during MPSS library preparation has a large effect on the resulting 'bead library', profoundly reducing its complexity. Since only 2,646 transcripts are identified in the UTBS dataset when the three MPSS libraries are combined (Fig. <figr fid="F3">3B</figr>), versus the 3,617 identified by LongSAGE, more than three MPSS libraries would be required for comprehensive transcriptome analysis using MPSS.</p>
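            <p>The comparison of tag sets between bead libraries can be illustrated with a short sketch. It assumes each library is available as a collection of distinct tag sequences; the function names and the toy tags are hypothetical and do not reproduce the analysis code used here.</p>

```python
from collections import Counter


def library_membership(libraries):
    """Map each distinct tag to the number of libraries that contain it."""
    membership = Counter()
    for tags in libraries:
        membership.update(set(tags))  # count each tag once per library
    return membership


def fraction_in_one_library(libraries):
    """Fraction of distinct tags that are found in exactly one library."""
    membership = library_membership(libraries)
    if not membership:
        raise ValueError("no tags supplied")
    singles = sum(1 for n in membership.values() if n == 1)
    return singles / len(membership)


# Toy tag sets standing in for three bead libraries.
libs = [{"a", "b", "c"}, {"a", "d"}, {"a", "b", "e"}]
print(fraction_in_one_library(libs))
```

            <p>Restricting each input set to genome-matching tags before calling the function corresponds to the filtered comparison described above.</p>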
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Comparison of tag sequences in three MPSS libraries produced from the same RNA sample</p>
               </caption>
               <text>
                  <p><it>Comparison of tag sequences in three MPSS libraries produced from the same RNA sample</it>. <b>A. </b>The three libraries were sampled to various sizes in a step-wise fashion to examine the effect of library size on the number of distinct tag sequences identified (as done for single SAGE and MPSS libraries in Fig. 1). Closed diamonds represent random sampling of tags from all three libraries combined. Open diamonds represent sampling of each library in turn. Clearly, although the number of distinct species identified by each library (with the possible exception of the third) appears to approach saturation, each library is sampling a different subset of sequences from the initial RNA pool. <b>B. </b>Venn diagrams showing the distribution of tag sequences between the three MPSS libraries. The library represented by the blue circle is the one used in most of the analyses presented in this study. Diagram (i) represents all the different tag sequences in the libraries. Diagram (ii) represents only those tags that match the genome; this reduces the influence of sequencing errors. In both comparisons, the majority of distinct sequences are found in only one library. Diagram (iii) represents known transcripts in the UTBS dataset found expressed in the sense direction. Here the pattern is less marked, but still only half the transcripts were observed in all three libraries (1,312/2,646). The improvement in the correlation of the libraries for known transcripts (<it>i.e</it>. those in the UTBS) was expected because more highly expressed transcripts are more likely to have been previously identified, and therefore known transcripts tend to be more abundant and have a greater chance of being sampled.</p>
               </text>
               <graphic file="1471-2164-8-333-3"/>
            </fig>
            <p>The raw MPSS data were provided in three forms, extracted at different stages of sequencing: <it>i.e</it>. after the sequencing of tags of 14 bp, 17 bp and 20 bp. A comparison of the alternate tag extractions from the first bead library (1.74 M tags) suggests that sequencing length has an effect on the complexity of the library: the longer the tag sequence, the smaller the number of unique tags that are sequenced (Table <tblr tid="T1">1</tblr>). The 14 bp library was ~24% more complex than the 20 bp library, contrary to expectation: a 14 bp library generated <it>in silico </it>from the 20 bp data is ~16% less complex than the 20 bp tag library. It is possible that the additional 14 bp tag sequences that are absent from the libraries of longer tags are 'bad sequencing reads' that are filtered out in the last rounds of sequencing. If this were the case, the 20 bp library should constitute an improvement on the libraries of shorter tags, and a larger proportion of the long tags ought to match the genome sequence. Instead, we found the opposite: as tag length increased, a smaller proportion of tags matched the genome (Table <tblr tid="T1">1</tblr>). There are two explanations for the apparent drop in 'sequence quality' as tag length increases. First, as tag length increases, the chance that an error, polymorphism or splice junction may occur within the sequence also increases. Second, shorter tags are more likely to match the genome due to chance even if they contain an error (analogous to the chance of a genuine tag from one gene randomly matching other locations in the genome; see Additional file <supplr sid="S1">1</supplr>). The loss of library complexity during successive sequencing cycles can only exacerbate the much larger loss of complexity resulting from sampling error at the stage of bead library construction revealed by our comparison of multiple MPSS libraries.</p>
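            <p>Generating a shorter-tag library <it>in silico </it>amounts to truncating each tag and merging the counts of tags that become identical. The sketch below illustrates this under the assumption that a library is held as a mapping from tag sequence to count; the function names and toy tags are hypothetical.</p>

```python
def truncate_library(tag_counts, length):
    """Collapse a tag-count table onto the first `length` bases of each tag."""
    shortened = {}
    for tag, count in tag_counts.items():
        prefix = tag[:length]
        shortened[prefix] = shortened.get(prefix, 0) + count
    return shortened


def complexity(tag_counts):
    """Library complexity, taken here as the number of distinct tags."""
    return len(tag_counts)


# Toy 20-base tags, each starting with the DpnII site GATC.
library_20 = {
    "GATC" + "A" * 16: 10,
    "GATC" + "A" * 15 + "C": 2,
    "GATC" + "A" * 10 + "C" * 6: 5,
    "GATC" + "T" * 16: 1,
}
for length in (20, 17, 14):
    print(length, complexity(truncate_library(library_20, length)))
```

            <p>Truncation can only merge tags, so a library shortened in this way can never be more complex than its parent; this is why the greater complexity of the directly extracted 14 bp and 17 bp data is unexpected.</p>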
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Effect of tag length on MPSS library complexity</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>Tag length sequenced (bp)</p>
                     </c>
                     <c ca="center">
                        <p>Length of tags analysed</p>
                     </c>
                     <c ca="center">
                        <p>Number of unique tags</p>
                     </c>
                     <c ca="center">
                        <p>Tags matching genome sequence</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>14,894</p>
                     </c>
                     <c ca="center">
                        <p>11,489 (77%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>13,576</p>
                     </c>
                     <c ca="center">
                        <p>11,934 (88%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>12,509</p>
                     </c>
                     <c ca="center">
                        <p>12,372 (99%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>18,084</p>
                     </c>
                     <c ca="center">
                        <p>14,307 (79%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>15,190</p>
                     </c>
                     <c ca="center">
                        <p>14,944 (98%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>19,931</p>
                     </c>
                     <c ca="center">
                        <p>19,402 (97%)</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>MPSS tags can be extracted from the same initial dataset to produce tags of different lengths; in this case 14, 17 and 20 bp tags were extracted. After the extractions, tag lengths can be computationally shortened to see if there is a difference in complexity between the different tag extractions. Decreasing the tag length sequenced was, unexpectedly, found to increase the complexity of the library. For example, 14,894 different 20 base tags were produced, which contained 13,576 different 17 base sequences if the last 3 bases were ignored. However, if the tags were initially extracted at 17 bases (<it>i.e</it>. ignoring the last annealing step in sequencing) then a library of 18,084 different tag sequences was produced; 4,508 distinct species are therefore lost in this last sequencing step. The last column shows how many of the distinct tag species have perfect matches in the human genome, and this is also expressed as the proportion of the species identified (in brackets).</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Identification of potential novel transcriptional loci</p>
            </st>
            <p>The deep sampling by LongSAGE and to a smaller degree by MPSS means that tags can be matched to genomic regions where transcripts have not previously been identified or predicted by Ensembl (>10,000 tags match such regions). However, many of these matches are unlikely to correspond to actual transcriptional loci, as tags may match more than one genomic site or may represent sequencing errors arising fortuitously from more abundant tags that match the genome elsewhere. On the other hand, it is likely that loci identified by both methods will represent genuine regions of transcription. To investigate the likely numbers of new transcriptional loci identifiable using this approach, strict criteria were used to identify regions where transcription was detected by both methods. Tags were required to match the genome only once, at a position where no known Ensembl genes are annotated within 5,000 bases in the sense or antisense direction. Of all the tags, only 5,975 unique LongSAGE tags and 392 MPSS tags satisfied these criteria (using only the first of the three MPSS libraries). The genomic matches to the tags in both lists were then examined in order to ascertain whether they could be part of the same gene. If a LongSAGE tag matched the genome within 5,000 bases of, and on the same strand as, an MPSS tag, this pair of tags was considered to define a potentially new transcriptional locus. This procedure identified only 147 tag pairs, none of which occur within 5,000 bases of predicted genes in Ensembl Release 40 (<it>i.e</it>. genes predicted without direct cDNA sequence data <it>e.g</it>. from comparison to other genomes). These loci therefore represent possible transcriptional loci for which no clear evidence has previously been obtained. The pairs of tags are listed in Additional file <supplr sid="S4">4</supplr>. 
Interestingly, the average abundance of MPSS tags that match the genome once and have some form of gene annotation is 59.8 tpm but for those in this novel gene list, it is 20.2 tpm. This confirms that novel genes tend to be expressed at a lower level than those already discovered. It is also interesting to note that roughly half these tag pairs (<it>i.e</it>. 72) are found in genomic regions masked in Ensembl, which are more difficult to analyse by other methods owing to the presence of repetitive elements. Identifying the transcripts corresponding to all these novel loci should be relatively simple using both tags as primers for direct PCR or nested 5' rapid amplification of cDNA ends (RACE) <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Overall, however, our observations suggest that relatively few <it>bona fide </it>new transcriptional loci remain to be discovered.</p>
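            <p>The pairing criterion used to define candidate novel loci can be sketched as follows. The sketch assumes that each filtered tag has already been reduced to a single (chromosome, strand, position) match; the function name pair_tags and the toy coordinates are hypothetical, and the filtering against Ensembl annotation is not shown.</p>

```python
def pair_tags(sage_hits, mpss_hits, max_distance=5000):
    """Pair LongSAGE and MPSS genome matches that may share a locus.

    Each hit is a (chromosome, strand, position) tuple. A pair is kept
    when both hits lie on the same chromosome and strand and are no more
    than max_distance bases apart.
    """
    pairs = []
    for sage in sage_hits:
        for mpss in mpss_hits:
            same_strand = sage[:2] == mpss[:2]
            if same_strand and max_distance >= abs(sage[2] - mpss[2]):
                pairs.append((sage, mpss))
    return pairs


# Toy coordinates only.
sage_hits = [("chr1", "+", 1000), ("chr1", "-", 1000), ("chr2", "+", 50000)]
mpss_hits = [("chr1", "+", 5500), ("chr2", "+", 60000)]
print(pair_tags(sage_hits, mpss_hits))
```

            <p>An all-against-all comparison is adequate for lists of this size (a few thousand LongSAGE tags and a few hundred MPSS tags); sorting by position would be preferable for much larger inputs.</p>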
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p>Novel loci of transcription identified by combining LongSAGE and MPSS. Additional table listing all the pairs of SAGE and MPSS tags found close together in genomic regions with no previously annotated transcriptional locus nearby.</p>
               </text>
               <file name="1471-2164-8-333-S4.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>We have described the production of large LongSAGE and MPSS libraries from a single RNA sample and considered their usefulness for identifying the complete transcriptome of a clonal population of cells, including transcripts not expressed in all cells and hence present on average at less than one copy per cell. The two methods give very different estimates of the number of genes expressed by a single cell. Whether estimated by counting the number of genomic loci represented or by extrapolation from the number of known genes found, the SAGE tags sequenced represent 20,000&#8211;30,000 transcripts, whereas the MPSS tags represent 7,000&#8211;9,000 transcripts. The total number of genes in the human genome is still being debated, but the current consensus places it under 30,000 protein encoding genes <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> (and perhaps below 25,000 <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>). Estimates of the number of different transcripts expressed in a single cell vary widely. Early studies on mouse brain suggested that there are between ~10,000 <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> and ~100,000 <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> transcript species per cell. However, gene expression is a stochastic process <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, so it might be expected that, if a pure population of cells could be sampled deeply enough, transcription from every gene would be detected.</p>
         <p>The main conclusion of our work is that although MPSS yields large amounts of expression data very rapidly, sampling problems severely limit transcriptome coverage and bias library complexity towards genes transcribed at higher levels. Our analysis suggests that, despite the very significant problem of sequencing errors in large SAGE libraries, until large amounts of data from new techniques such as SBS <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> become available, LongSAGE will remain the best source of available data for the deep mining of cellular transcriptomes. In organisms for which the complete genome sequence is available, most sequencing errors can be removed by excluding tags that fail to match either the genome or any known transcript. This is likely to remove a few genuine tags because databases of expressed sequences are not complete and splicing, polyadenylation and sequence polymorphisms mean that expressed sequence is not always identical to genomic sequence. However, because most transcripts contain more than one SAGE tag, very few expressed genes will remain unidentified. Other methods for removing sequencing errors from SAGE libraries are mostly based on identifying tags in the library related by simple mutations (single base changes, insertions or deletions). In our hands, these methods removed several genuine transcripts of interest without removing as large a proportion of the tag sequences as the genome-based approach (data not shown). For organisms for which complete genome sequences are unavailable, only these methods would allow meaningful lists of tags representing truly novel transcripts to be compiled.</p>
         <p>We are uncertain about the source of the reduced complexity of the MPSS data. It seems clear that a lack of sampling due to insufficient sequencing is not the major problem. For each bead library virtually all the distinct sequences are sampled after sequencing fewer tags than are obtained in an average library produced using the standard protocol. From our comparison of three MPSS libraries prepared from the same RNA sample, it would appear that the largest amount of complexity is lost at the stage of bead library preparation. Each library samples a small fraction of the transcriptome and, even in combination, the three MPSS libraries fail to identify as many known genes as an >8-fold smaller LongSAGE library, let alone as many different transcript species overall. At present it is not obvious whether it is the production of tags from cDNA or tag to bead ligation that is most responsible for reducing library complexity prior to bead loading and sequencing.</p>
         <p>It should be noted, however, that some transcripts are lost in the course of sequencing, as demonstrated by the loss of library complexity in terms of the number of species identified as tag sequences are extended from 14 to 17 to 20 bp. We are not the first to identify such effects. MPSS sequencing proceeds in four-base steps yielding four-base "words" and it has been noted by Meyers <it>et al</it>. that MPSS has problems with palindromic words (<it>e.g</it>. TTAA) <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B43">43</abbr></abbrgrp>, leading to bias against detection of these sequences. However, Meyers <it>et al</it>. <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> estimated that only ~8% of sequences would be affected by this source of bias and in our libraries this effect will have been suppressed by sequencing in two staggered phases, since this increases the number of tags that do not contain 4 base palindromes in at least one of the phases. The extent of the palindrome effect is therefore not enough to explain the large decrease in library complexity we observe when comparing the 14, 17 and 20 bp extractions of the sequencing data. Either the palindrome-dependent effect is larger than expected or there is some additional, currently unidentified, systematic bias in MPSS sequencing.</p>
         <p>A potential source of bias in classical MPSS data is that the cDNA species immobilised on the beads following <it>Dpn</it>II cleavage vary significantly in length (<it>i.e</it>. the distance between the cleavage site and the end of the cDNA). The effect of tag-position within the cDNA on the observed abundance of MPSS tags has been analysed by Chen and Rattray <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> who found that this was a significant source of bias for both "classical" and "signature" MPSS, although it was more serious for classical MPSS. Signature MPSS is a refinement of the method wherein cDNAs are cleaved with <it>Mme</it>I after cleavage with <it>Dpn</it>II and ligation of a linker so that the same length of sequence, <it>i.e</it>., the tag, is immobilised on the beads in each case. This approach is analogous to the SAGE process, which ensures that all ditags amplify uniformly. Tag-position bias is likely to affect the observed abundance of many tags in our libraries. However, if the library is sequenced to completion, as our data suggest (Fig. <figr fid="F1">1</figr>), all different tag sequences on the beads should have been sampled even if the frequency of sampling does not correlate with abundance. It is possible that there is a maximum length of cDNA species beyond which tags are never (or hardly ever) observed, but this has not been demonstrated and is unlikely to account for the level of inter-library variation we observed.</p>
         <p>Analysis of the GC content of observed SAGE and MPSS tags, compared to that expected by random sampling of tags from known genes <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, also pointed to bias in MPSS but not SAGE tag identification. LongSAGELite <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>, which has an additional amplification step, may introduce bias, however. Several signature MPSS libraries analysed by Siddiqui <it>et al</it>. <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> were found to be biased towards GC-rich tags. In contrast, the few classical MPSS libraries that were analysed seemed to have a small bias towards AT rich tags. Using a similar approach to Siddiqui <it>et al</it>., we find that the GC content of both our SAGE and MPSS libraries is higher than that seen in random sampling by ~13 standard deviations and ~57 standard deviations, respectively. These deviations are larger than those observed by Siddiqui <it>et al</it>. <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, which may reflect differences in the details of the sampling procedure or the Refseq pool used. The difference we see between the two methods is consistent with their data for signature MPSS libraries and LongSAGE libraries, <it>i.e</it>. that the MPSS method appears to be significantly biased in favour of GC rich tags. It remains unclear whether this accounts for the complete absence of large numbers of AT rich tags from an MPSS bead library, but it at least partly explains the loss of complexity at the sequencing stage.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Our results suggest that MPSS data ought to be used cautiously. Although conventionally sequenced SAGE datasets therefore constitute the only reliable sources of quantitative digital gene expression data at present, this situation is almost certainly set to change. New DNA sequencing technologies, based on polony sequencing <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B15">15</abbr></abbrgrp>, are likely to provide additional, very large datasets. These methods will reduce both the cost and the time taken to generate large SAGE and EST libraries, making these methods even more accessible. It is unlikely that the use of new sequencing technologies <it>per se </it>would introduce unforeseen biases into expression libraries, but our findings suggest that researchers will need to verify this. SBS has been adapted by Solexa, who also own the now redundant MPSS technology, as an alternative method for the parallel production and sequencing of signature sequences <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. This technique is quite different from MPSS, in that it does not use beads, does not have the 4-base sequencing cycles, and will be available as an instrument rather than a "black box" service. However, when SBS signature expression libraries become available it will be necessary to exclude the unexpected, and largely unexplained, lack of complexity we have encountered in MPSS libraries.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Library preparation and sequencing</p>
            </st>
            <p>Our LongSAGE library was generated according to the standard protocol using the I-SAGE Long kit from Invitrogen (Groningen, the Netherlands) and was sequenced on an ABI 3700 capillary DNA sequencer (Applied Biosystems, Foster City, CA) using BigDye v3 terminators (Applied Biosystems) to a depth of 503,431 tags. MPSS libraries were produced from the same RNA sample as the LongSAGE library by Lynx Therapeutics Inc (now Solexa Inc, Hayward, CA) under their standard service agreement. They initially provided a library of 1,744,173 tags and then two further libraries of 1,573,952 and 956,867 tags, respectively, giving a total of 4,274,992 MPSS tags. All three libraries were provided as 20 bp reads (including the <it>Dpn</it>II restriction site sequence GATC) sequenced using "steppers" 2 and 4 <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B33">33</abbr></abbrgrp>. On request, data captured from the initial MPSS library in the two sequencing cycles before the final one, <it>i.e</it>. tag lengths 14 and 17 bases including the <it>Dpn</it>II site, were also provided.</p>
         </sec>
         <sec>
            <st>
               <p>Data storage</p>
            </st>
            <p>The data were stored as flat files or as flat tables in a MySQL database to simplify error checking and optimise the speed of access. Data were processed either with Perl scripts, using the DBI module to interact with the MySQL database, or with C programs working on the flat files. The programs used are currently undergoing optimisation but can be made available on request.</p>
         </sec>
         <sec>
            <st>
               <p>Tag extractions from expression libraries</p>
            </st>
            <p>Sequencing runs from the LongSAGE libraries were initially processed with Phred to remove obvious sequencing errors <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. When choosing a Phred setting one has to balance the need for high quality sequence against the risk of losing genuine sequences. Analysis of sequences of a similar length to SAGE ditags led Prosdocimi <it>et al</it>. <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> to conclude that low Phred settings allowed the optimal ratio of genuine sequences retained to errors removed. Therefore, for this analysis, sequences with Phred scores of 10 and above were kept. The Phred screening and tag extraction from ditags were done using Perl scripts written by A.G. McArthur (Marine Biological Laboratory, Woods Hole, USA). Tags were imported into a MySQL database and those possibly derived from linker sequence were removed. Tag counts were then normalised to tags per million (tpm), and the un-rounded normalised counts were used for inter-library comparisons. Public LongSAGE libraries were downloaded from GEO and imported into the MySQL database using the same normalisation and linker removal scripts as for libraries sequenced in-house. MPSS tags were directly imported into the database from tag files provided by Lynx Therapeutics Inc.</p>
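            <p>The normalisation to tags per million described above amounts to a simple rescaling of the raw counts by the library size. The following minimal sketch is an illustration only, not the Perl or C code used in this study; the function name and the dictionary layout (tag sequence mapped to raw count) are assumptions.</p>

```python
def normalise_to_tpm(tag_counts):
    """Convert raw tag counts to un-rounded tags per million (tpm).

    tag_counts maps each tag sequence to its raw count in one library.
    """
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {tag: count * 1_000_000 / total for tag, count in tag_counts.items()}
```

            <p>Keeping the values un-rounded, as in the text, avoids discarding information for low-abundance tags when libraries of different depth are compared.</p>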
         </sec>
         <sec>
            <st>
               <p>Tag extraction from the human genome and known transcriptome</p>
            </st>
            <p>The data source for genome and transcriptome data was Ensembl <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> (version 40, NCBI human genome sequence assembly 36). Tags were extracted from each chromosome using both masked and unmasked genomic sequence data and from the mitochondrial DNA at all possible restriction sites (of <it>Nla</it>III and <it>Dpn</it>II). Further information was then extracted for each tag: three windows were examined, both up- and downstream of the tag, for the presence of gene annotation (at the restriction site, up to 1,000 bases from the site and 1,001&#8211;5,000 bases from the site). For each window, all Ensembl genes and predicted genes were recorded. The genomic tags were then classed as being outside any known gene (default), or as exonic, intronic or boundary (<it>i.e</it>. crossing an exon-intron boundary), or as matching multiple genes.</p>
            <p>Tags were also extracted from all transcripts in the Ensembl Genes dataset <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> at all restriction sites (<it>Nla</it>III and <it>Dpn</it>II) in both the sense and antisense direction. If a tag extended beyond the known 3' end of a transcript, it was extended along the genome unless the transcript was predicted to contain a polyA site (as defined in the supplementary material by Caron <it>et al</it>. <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>), in which case adenosines were added to the 3' end of the tag to complete the length of the tag.</p>
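            <p>The extraction of tags at every restriction site, in both orientations, can be sketched as below. This is a simplified illustration rather than the code used in this study: the function names are hypothetical, the tag length is passed in by the caller and includes the restriction site (for example, 20 bases for the MPSS reads described above), and tags that would run past the end of the sequence are simply dropped instead of being extended along the genome or with adenosines.</p>

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tags_on_strand(seq, site, tag_len):
    """Collect the tag starting at every occurrence of the restriction site."""
    tags = []
    pos = seq.find(site)
    while pos != -1:
        tag = seq[pos:pos + tag_len]
        if len(tag) == tag_len:  # drop tags truncated by the sequence end
            tags.append(tag)
        pos = seq.find(site, pos + 1)
    return tags


def extract_tags(seq, site, tag_len):
    """Extract sense and antisense tags of length tag_len (site included)."""
    seq = seq.upper()
    return {
        "sense": tags_on_strand(seq, site, tag_len),
        "antisense": tags_on_strand(reverse_complement(seq), site, tag_len),
    }
```

            <p>For example, <it>Dpn</it>II tags would be requested with the site GATC and <it>Nla</it>III tags with the site CATG; because both sites are palindromic, the same site string is searched for on each strand.</p>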
         </sec>
         <sec>
            <st>
               <p>Tag-to-gene mapping</p>
            </st>
            <p>Information from the extractions described above was combined for automated tag-to-gene-mapping. First, frequencies for each tag, in both the genome and transcriptome, were calculated, and then each tag was matched to the genome and classified as one of the following: single match, multiple match, no match or excess matches (more than 20 hits to the genome). No further analysis was undertaken for the excess matches. For single matches and multiple matches where only one match occurred in or near a known gene, tags were further annotated as matching the gene or the region downstream of a gene in a sense or antisense direction. Tags matching the known transcriptome were also categorised as matching a known transcript in the sense or antisense direction or matching multiple known transcripts.</p>
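            <p>The first classification step described above depends only on the number of genomic hits for each tag. A minimal sketch, using a hypothetical function name and the threshold of 20 hits given in the text, is:</p>

```python
EXCESS_THRESHOLD = 20  # more than this many genomic hits counts as "excess"


def classify_genome_matches(hit_count):
    """Assign a tag to a match class from its number of genomic hits."""
    if hit_count == 0:
        return "no match"
    if hit_count == 1:
        return "single match"
    if hit_count > EXCESS_THRESHOLD:
        return "excess matches"
    return "multiple match"
```

            <p>Tags in the "excess matches" class would be set aside at this point, whereas single and multiple matches would go on to the gene-level annotation.</p>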
         </sec>
         <sec>
            <st>
               <p>Gene-to-tag mapping</p>
            </st>
            <p>The UTBS transcript set described in the text was produced by identifying all known Ensembl transcripts that contained at least one restriction site for each enzyme (<it>Dpn</it>II and <it>Nla</it>III) and for which all sense and antisense tags in all exons of the gene encoding that transcript were unique within the transcriptome and within the genome. Some tags may be absent from the genome due to splicing, polyadenylation and the fact that the genome is not complete. This set consisted of 8,132 genes. For each gene, all possible tags derived using either method were extracted, and expression of the gene was calculated as the sum of the abundance of each of its corresponding tags.</p>
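            <p>The gene-level expression value described above is the sum of the abundances of all tags assigned to a gene. The sketch below illustrates this under stated assumptions: the function name and both dictionary layouts are hypothetical, abundances are taken to be in tpm, and tags that were not observed in a library are assumed to contribute zero.</p>

```python
def gene_expression(gene_to_tags, tag_abundance):
    """Sum the abundance of every tag assigned to each gene.

    gene_to_tags maps a gene identifier to the list of its possible tags;
    tag_abundance maps an observed tag to its abundance (e.g. in tpm).
    """
    return {
        gene: sum(tag_abundance.get(tag, 0.0) for tag in tags)
        for gene, tags in gene_to_tags.items()
    }
```

            <p>Because every tag in the UTBS set is unique within the transcriptome and the genome, no abundance is counted towards more than one gene.</p>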
         </sec>
         <sec>
            <st>
               <p>Statistical comparisons of expression libraries</p>
            </st>
            <p>Spearman correlation coefficients were calculated for the data using the statistical analysis program R <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. Spearman correlation coefficients are suitable for examining large-scale gene expression experiments because the calculation uses rank data rather than absolute values and is therefore robust to outliers <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>.</p>
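            <p>To illustrate why a rank-based coefficient is robust to outliers, the calculation can be sketched as a Pearson correlation applied to the ranks of the data. The analysis in this study was carried out in R; the Python sketch below is only an illustration with hypothetical function names, and it assumes that tied values receive the average of their ranks.</p>

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # extend the block while the next value ties with the current one
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

            <p>In this sketch a single extreme abundance changes only the rank of that tag, not the distances between the remaining ranks, so two libraries that order their tags identically are perfectly correlated regardless of the absolute counts.</p>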
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>SAGE, serial analysis of gene expression; MPSS, massively parallel signature sequencing; tpm, tags per million; UTR, untranslated region; UTBS, Unique Transcripts for Both Sites; RACE, rapid amplification of cDNA ends.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>LH devised the basic computational approaches, carried out the initial data analysis, identified the discrepancies discussed and drafted an initial report. VBS devised an improved bioinformatic strategy, undertook the rigorous testing of these results and participated in formulating the manuscript. MTV and SHIA designed and carried out the library production and data acquisition procedures. JKS and SLRJ produced the biological samples necessary for the work and undertook their analysis and testing. SJD conceived the study, participated in its design and co-wrote the manuscript. EJE coordinated and planned the detailed study, participated in the data analysis and interpretation and co-wrote the manuscript.</p>
         <p>All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We are grateful for the advice of Dr. S Taylor and the Oxford Computational Biology Group. LH, VBS, JKS and SLRJ are supported by the UK Medical Research Council and MTV, EJE and SJD by the Wellcome Trust. The SAGE and MPSS libraries described in this study have been deposited at the Gene Expression Omnibus (series accession number GSE8612).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The transcriptional landscape of the mammalian genome</p>
            </title>
            <aug>
               <au>
                  <snm>Carninci</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kasukawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Katayama</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Frith</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Maeda</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Oyama</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ravasi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lenhard</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Wells</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kodzius</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shimokawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bajic</snm>
                  <fnm>VB</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Batalov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Forrest</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Zavolan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Wilming</snm>
                  <fnm>LG</fnm>
               </au>
               <au>
                  <snm>Aidinis</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Allen</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Ambesi-Impiombato</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Aturaliya</snm>
                  <fnm>RN</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Bansal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Baxter</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Beisel</snm>
                  <fnm>KW</fnm>
               </au>
               <au>
                  <snm>Bersano</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bono</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Chalk</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Chiu</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>Choudhary</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Christoffels</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Clutterbuck</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Crowe</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Dalla</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dalrymple</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>de Bono</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Della Gatta</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>di Bernardo</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Down</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Engstrom</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Fagiolini</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Faulkner</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Fletcher</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Fukushima</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Furuno</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Futaki</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gariboldi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Georgii-Hemming</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gingeras</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Gojobori</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Gustincich</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Harbers</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hayashi</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hensch</snm>
                  <fnm>TK</fnm>
               </au>
               <au>
                  <snm>Hirokawa</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Huminiecki</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Iacono</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ikeo</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Iwama</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ishikawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Jakt</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kanapin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Katoh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kawasawa</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kelso</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kitamura</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kitano</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kollias</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Krishnan</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Kruger</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kummerfeld</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Kurochkin</snm>
                  <fnm>IV</fnm>
               </au>
               <au>
                  <snm>Lareau</snm>
                  <fnm>LF</fnm>
               </au>
               <au>
                  <snm>Lazarevic</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lipovich</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Liuni</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>McWilliam</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Madan Babu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Madera</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marchionni</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Matsuda</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Matsuzawa</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Miki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mignone</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Miyake</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mottagui-Tabar</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mulder</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Nakano</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Nakauchi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ng</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Nilsson</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Nishiguchi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nishikawa</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nori</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ohara</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Okazaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Orlando</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Pang</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Pavan</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Pavesi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Petrovsky</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Piazza</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Reed</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Reid</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Ring</snm>
                  <fnm>BZ</fnm>
               </au>
               <au>
                  <snm>Ringwald</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ruan</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Sandelin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Schonbach</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sekiguchi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Semple</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Seno</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sessa</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sheng</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Shibata</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Shimada</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shimada</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Silva</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Sinclair</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sperling</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Stupka</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Sugiura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Sultana</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Takenaka</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Taki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tammoja</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Tegner</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Teichmann</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Ueda</snm>
                  <fnm>HR</fnm>
               </au>
               <au>
                  <snm>van Nimwegen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Verardo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Yagi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Yamanishi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Zabarovsky</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zimmer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hide</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Grimmond</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Teasdale</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>ET</fnm>
               </au>
               <au>
                  <snm>Brusic</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wahlestedt</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mattick</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Hume</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Kai</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sasaki</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tomaru</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Fukuda</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kanamori-Katayama</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Aoki</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Arakawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Iida</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Imamura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Itoh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kato</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kawaji</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kawagashira</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kawashima</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kojima</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kondo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Konno</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nakano</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ninomiya</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Nishio</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Okada</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Plessy</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Shibata</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shiraki</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tagami</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Waki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Watahiki</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Okamura-Oho</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kawai</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hayashizaki</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>309</volume>
            <fpage>1559</fpage>
            <lpage>1563</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1112014</pubid>
                  <pubid idtype="pmpid" link="fulltext">16141072</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Global identification of human transcribed sequences with genome tiling arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Bertone</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Stolc</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Royce</snm>
                  <fnm>TE</fnm>
               </au>
               <au>
                  <snm>Rozowsky</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Urban</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Rinn</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Tongprasit</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Samanta</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weissman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Snyder</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>306</volume>
            <fpage>2242</fpage>
            <lpage>2246</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1103388</pubid>
                  <pubid idtype="pmpid" link="fulltext">15539566</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution</p>
            </title>
            <aug>
               <au>
                  <snm>Cheng</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kapranov</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Drenkow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dike</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brubaker</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Patel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stern</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tammana</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Helt</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sementchenko</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Piccolboni</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bekiranov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Ganesh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ghosh</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bell</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Gerhard</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Gingeras</snm>
                  <fnm>TR</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>308</volume>
            <fpage>1149</fpage>
            <lpage>1154</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1108625</pubid>
                  <pubid idtype="pmpid" link="fulltext">15790807</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Applications of DNA tiling arrays for whole-genome analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Mockler</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sundaresan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Jacobsen</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Ecker</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2005</pubdate>
            <volume>85</volume>
            <fpage>1</fpage>
            <lpage>15</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ygeno.2004.10.005</pubid>
                  <pubid idtype="pmpid" link="fulltext">15607417</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Open systems: panoramic views of gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Green</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Simons</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Taillon</snm>
                  <fnm>BE</fnm>
               </au>
               <au>
                  <snm>Lewin</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>J Immunol Methods</source>
            <pubdate>2001</pubdate>
            <volume>250</volume>
            <fpage>67</fpage>
            <lpage>79</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-1759(01)00306-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">11251222</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Genome sequencing in microfabricated high-density picolitre reactors</p>
            </title>
            <aug>
               <au>
                  <snm>Margulies</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Egholm</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>WE</fnm>
               </au>
               <au>
                  <snm>Attiya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bader</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Bemben</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Berka</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Braverman</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>YJ</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Dewell</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Du</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Fierro</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Gomes</snm>
                  <fnm>XV</fnm>
               </au>
               <au>
                  <snm>Godwin</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Helgesen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ho</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Irzyk</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Jando</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Alenquer</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Jarvie</snm>
                  <fnm>TP</fnm>
               </au>
               <au>
                  <snm>Jirage</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Knight</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Lanza</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Leamon</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Lefkowitz</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Lei</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lohman</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Makhijani</snm>
                  <fnm>VB</fnm>
               </au>
               <au>
                  <snm>McDade</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>McKenna</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Nickerson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Nobile</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Plant</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Puc</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Ronan</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>GT</fnm>
               </au>
               <au>
                  <snm>Sarkis</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Simons</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Srinivasan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tartaro</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Tomasz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vogt</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Volkmer</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Weiner</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Begley</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Rothberg</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>437</volume>
            <fpage>376</fpage>
            <lpage>380</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1464427</pubid>
                  <pubid idtype="pmpid" link="fulltext">16056220</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Gene discovery and annotation using LCM-454 transcriptome sequencing</p>
            </title>
            <aug>
               <au>
                  <snm>Emrich</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Barbazuk</snm>
                  <fnm>WB</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Schnable</snm>
                  <fnm>PS</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <fpage>69</fpage>
            <lpage>73</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1716268</pubid>
                  <pubid idtype="pmpid" link="fulltext">17095711</pubid>
                  <pubid idtype="doi">10.1101/gr.5145806</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Statistical analysis of MPSS measurements: application to the study of LPS-activated macrophage gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Stolovitzky</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Kundaje</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Held</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Duggar</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Haudenschild</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Vasicek</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Aderem</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Roach</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>1402</fpage>
            <lpage>1407</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">547838</pubid>
                  <pubid idtype="pmpid" link="fulltext">15668391</pubid>
                  <pubid idtype="doi">10.1073/pnas.0406555102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>A rapid method for computationally inferring transcriptome coverage and microarray sensitivity</p>
            </title>
            <aug>
               <au>
                  <snm>Reverter</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>McWilliam</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Barris</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Dalrymple</snm>
                  <fnm>BP</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>80</fpage>
            <lpage>89</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth472</pubid>
                  <pubid idtype="pmpid" link="fulltext">15308544</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Assessment of the relationship between signal intensities and transcript concentration for Affymetrix GeneChip arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Chudin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Walker</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kosaka</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>SX</fnm>
               </au>
               <au>
                  <snm>Rabert</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Chang</snm>
                  <fnm>TK</fnm>
               </au>
               <au>
                  <snm>Kreder</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>RESEARCH0005</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">150452</pubid>
                  <pubid idtype="pmpid" link="fulltext">11806828</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Serial analysis of gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Velculescu</snm>
                  <fnm>VE</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Vogelstein</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kinzler</snm>
                  <fnm>KW</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>484</fpage>
            <lpage>487</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.270.5235.484</pubid>
                  <pubid idtype="pmpid" link="fulltext">7570003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Tag-based approaches for transcriptome research and genome annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Harbers</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Carninci</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nat Methods</source>
            <pubdate>2005</pubdate>
            <volume>2</volume>
            <fpage>495</fpage>
            <lpage>502</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nmeth768</pubid>
                  <pubid idtype="pmpid" link="fulltext">15973418</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Using the transcriptome to annotate the genome</p>
            </title>
            <aug>
               <au>
                  <snm>Saha</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sparks</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Rago</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Akmaev</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Vogelstein</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kinzler</snm>
                  <fnm>KW</fnm>
               </au>
               <au>
                  <snm>Velculescu</snm>
                  <fnm>VE</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2002</pubdate>
            <volume>20</volume>
            <fpage>508</fpage>
            <lpage>512</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt0502-508</pubid>
                  <pubid idtype="pmpid" link="fulltext">11981567</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Brenner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bridgham</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Golda</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lloyd</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Luo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>McCurdy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Foy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ewan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>George</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eletr</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Albrecht</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Vermaas</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Moon</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Burcham</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Pallas</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>DuBridge</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Kirchner</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fearon</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mao</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Corcoran</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2000</pubdate>
            <volume>18</volume>
            <fpage>630</fpage>
            <lpage>634</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/76469</pubid>
                  <pubid idtype="pmpid" link="fulltext">10835600</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Toward the 1,000 dollars human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Bennett</snm>
                  <fnm>ST</fnm>
               </au>
               <au>
                  <snm>Barnes</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Davies</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Pharmacogenomics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>373</fpage>
            <lpage>382</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1517/14622416.6.4.373</pubid>
                  <pubid idtype="pmpid" link="fulltext">16004555</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Protocol for Whole Genome Sequencing using Solexa Technology</p>
            </title>
            <aug>
               <au>
                  <cnm>Solexa Inc</cnm>
               </au>
            </aug>
            <source>Biotechniques Protocol Guide 2007</source>
            <publisher>Informa Life Sciences</publisher>
            <pubdate>2006</pubdate>
            <fpage>29</fpage>
         </bibl>
         <bibl id="B17">
            <title>
               <p>An expression atlas of rice mRNAs and small RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Nobuta</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Venu</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Belo</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vemaraju</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kulkarni</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Pillay</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Meyers</snm>
                  <fnm>BC</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2007</pubdate>
            <volume>25</volume>
            <fpage>473</fpage>
            <lpage>477</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1291</pubid>
                  <pubid idtype="pmpid" link="fulltext">17351617</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>The expression of three abundance classes of messenger RNA in mouse tissues</p>
            </title>
            <aug>
               <au>
                  <snm>Hastie</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Bishop</snm>
                  <fnm>JO</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1976</pubdate>
            <volume>9</volume>
            <fpage>761</fpage>
            <lpage>774</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(76)90139-2</pubid>
                  <pubid idtype="pmpid">1017013</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>The T cell surface--how well do we know it?</p>
            </title>
            <aug>
               <au>
                  <snm>Evans</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Hene</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sparks</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Dong</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Retiere</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fennelly</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Manso-Sancho</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Powell</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Braud</snm>
                  <fnm>VM</fnm>
               </au>
               <au>
                  <snm>Rowland-Jones</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>McMichael</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Immunity</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>213</fpage>
            <lpage>223</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1074-7613(03)00198-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">12932355</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Biological insights into TCR&#947;&#948;+ and TCR&#945;&#946;+ intraepithelial lymphocytes provided by serial analysis of gene expression (SAGE)</p>
            </title>
            <aug>
               <au>
                  <snm>Shires</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Theodoridis</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hayday</snm>
                  <fnm>AC</fnm>
               </au>
            </aug>
            <source>Immunity</source>
            <pubdate>2001</pubdate>
            <volume>15</volume>
            <fpage>419</fpage>
            <lpage>434</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1074-7613(01)00192-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">11567632</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Gene expression profile in human leukocytes</p>
            </title>
            <aug>
               <au>
                  <snm>Hashimoto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nagai</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sese</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Obata</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sato</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Toyoda</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Dong</snm>
                  <fnm>HY</fnm>
               </au>
               <au>
                  <snm>Kurachi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nagahata</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Shizuno</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Morishita</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Matsushima</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Blood</source>
            <pubdate>2003</pubdate>
            <volume>101</volume>
            <fpage>3509</fpage>
            <lpage>3513</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1182/blood-2002-06-1866</pubid>
                  <pubid idtype="pmpid" link="fulltext">12522010</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Stage-dependent gene expression profiles during natural killer cell development</p>
            </title>
            <aug>
               <au>
                  <snm>Kang</snm>
                  <fnm>HS</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yoon</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Kawamura</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>YC</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Myung</snm>
                  <fnm>PK</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Choi</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Genomics</source>
            <pubdate>2005</pubdate>
            <volume>86</volume>
            <fpage>551</fpage>
            <lpage>565</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ygeno.2005.06.010</pubid>
                  <pubid idtype="pmpid" link="fulltext">16054799</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Signatures of the immune response</p>
            </title>
            <aug>
               <au>
                  <snm>Shaffer</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Rosenwald</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hurt</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Giltnane</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Lam</snm>
                  <fnm>LT</fnm>
               </au>
               <au>
                  <snm>Pickeral</snm>
                  <fnm>OK</fnm>
               </au>
               <au>
                  <snm>Staudt</snm>
                  <fnm>LM</fnm>
               </au>
            </aug>
            <source>Immunity</source>
            <pubdate>2001</pubdate>
            <volume>15</volume>
            <fpage>375</fpage>
            <lpage>385</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1074-7613(01)00194-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">11567628</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Identification of T cell-restricted genes, and signatures for different T cell responses, using a comprehensive collection of microarray datasets</p>
            </title>
            <aug>
               <au>
                  <snm>Chtanova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Newton</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Weininger</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Young</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Silva</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Bertoni</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Rinaldi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chappaz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sallusto</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Rolph</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Mackay</snm>
                  <fnm>CR</fnm>
               </au>
            </aug>
            <source>J Immunol</source>
            <pubdate>2005</pubdate>
            <volume>175</volume>
            <fpage>7837</fpage>
            <lpage>7847</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16339519</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Transcript identification by analysis of short sequence tags &#8211; influence of tag length, restriction site and transcript database</p>
            </title>
            <aug>
               <au>
                  <snm>Unneberg</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Wennborg</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Larsson</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>2217</fpage>
            <lpage>2226</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">153741</pubid>
                  <pubid idtype="pmpid" link="fulltext">12682372</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg313</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>RefSeq and LocusLink: NCBI gene-centered resources</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>137</fpage>
            <lpage>140</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29787</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125071</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.137</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>A human immunodeficiency virus 1 (HIV-1) clade A vaccine in clinical trials: stimulation of HIV-specific T-cell responses by DNA and recombinant modified vaccinia virus Ankara (MVA) vaccines in humans</p>
            </title>
            <aug>
               <au>
                  <snm>Mwau</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cebere</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chikoti</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Winstone</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Wee</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Beattie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>YH</fnm>
               </au>
               <au>
                  <snm>Dorrell</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>McShane</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Schmidt</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Brooks</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Patel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Conlon</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rowland-Jones</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Bwayo</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>McMichael</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Hanke</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Gen Virol</source>
            <pubdate>2004</pubdate>
            <volume>85</volume>
            <fpage>911</fpage>
            <lpage>919</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1099/vir.0.19701-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">15039533</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The Ensembl genome database project</p>
            </title>
            <aug>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cameron</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Cuff</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Curwen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Down</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eyras</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hammond</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Huminiecki</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kasprzyk</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lehvaslaiho</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lijnzaad</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Melsopp</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mongin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Pettett</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Pocock</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Potter</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rust</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schmidt</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Searle</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Slater</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Spooner</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Stabenau</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stalker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stupka</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ureta-Vidal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vastrik</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>38</fpage>
            <lpage>41</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99161</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752248</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.38</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>MicroSAGE is highly representative and reproducible but reveals major differences in gene expression among samples obtained from similar tissues</p>
            </title>
            <aug>
               <au>
                  <snm>Blackshaw</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kuo</snm>
                  <fnm>WP</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Tsujikawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gunnersen</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>HS</fnm>
               </au>
               <au>
                  <snm>Boon</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Cepko</snm>
                  <fnm>CL</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>R17</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">153457</pubid>
                  <pubid idtype="pmpid" link="fulltext">12620102</pubid>
                  <pubid idtype="doi">10.1186/gb-2003-4-3-r17</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Massively parallel signature sequencing (MPSS) as a tool for in-depth quantitative gene expression profiling in all organisms</p>
            </title>
            <aug>
               <au>
                  <snm>Reinartz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bruyns</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>JZ</fnm>
               </au>
               <au>
                  <snm>Burcham</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bowen</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kramer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Woychik</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Brief Funct Genomic Proteomic</source>
            <pubdate>2002</pubdate>
            <volume>1</volume>
            <fpage>95</fpage>
            <lpage>104</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bfgp/1.1.95</pubid>
                  <pubid idtype="pmpid" link="fulltext">15251069</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Reproducibility, bioinformatic analysis and power of the SAGE method to evaluate changes in transcriptome</p>
            </title>
            <aug>
               <au>
                  <snm>Dinel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bolduc</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Belleau</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Boivin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yoshioka</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Calvo</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Piedboeuf</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Snyder</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Labrie</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>St-Amand</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>e26</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">549424</pubid>
                  <pubid idtype="pmpid" link="fulltext">15716308</pubid>
                  <pubid idtype="doi">10.1093/nar/gni025</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Detecting the impact of sequencing errors on SAGE data</p>
            </title>
            <aug>
               <au>
                  <snm>Colinge</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Feger</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>840</fpage>
            <lpage>842</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.9.840</pubid>
                  <pubid idtype="pmpid" link="fulltext">11590101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The use of MPSS for whole-genome transcriptional analysis in Arabidopsis</p>
            </title>
            <aug>
               <au>
                  <snm>Meyers</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Tej</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Vu</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Haudenschild</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Agrawal</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Edberg</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Ghazal</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Decola</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>1641</fpage>
            <lpage>1653</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">509274</pubid>
                  <pubid idtype="pmpid" link="fulltext">15289482</pubid>
                  <pubid idtype="doi">10.1101/gr.2275604</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Analysis of human transcriptomes</p>
            </title>
            <aug>
               <au>
                  <snm>Velculescu</snm>
                  <fnm>VE</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lash</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rago</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Beaudry</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Ciriello</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Cook</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Dufault</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Ferguson</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Gao</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Hermeking</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hiraldo</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Hwang</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Luderer</snm>
                  <fnm>HF</fnm>
               </au>
               <au>
                  <snm>Mathews</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Petroziello</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Polyak</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Zawel</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Haluska</snm>
                  <fnm>FG</fnm>
               </au>
               <au>
                  <snm>Jen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sukumar</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Landes</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Riggins</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Vogelstein</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kinzler</snm>
                  <fnm>KW</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>1999</pubdate>
            <volume>23</volume>
            <fpage>387</fpage>
            <lpage>388</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/70487</pubid>
                  <pubid idtype="pmpid" link="fulltext">10581018</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The impact of SNPs on the interpretation of SAGE and MPSS experimental data</p>
            </title>
            <aug>
               <au>
                  <snm>Silva</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>De Souza</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Galante</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Riggins</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>De Souza</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Camargo</snm>
                  <fnm>AA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>6104</fpage>
            <lpage>6110</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">534621</pubid>
                  <pubid idtype="pmpid" link="fulltext">15562001</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh937</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>A comprehensive transcript index of the human genome generated using microarrays and computational approaches</p>
            </title>
            <aug>
               <au>
                  <snm>Schadt</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>GuhaThakurta</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Holder</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ying</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Svetnik</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Leonardson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>KW</fnm>
               </au>
               <au>
                  <snm>Russell</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Cavet</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Castle</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>McDonagh</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kan</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kasarskis</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Margarint</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Caceres</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Armour</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Garrett-Engele</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Tsinoremas</snm>
                  <fnm>NF</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>DD</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R73</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">545593</pubid>
                  <pubid idtype="pmpid" link="fulltext">15461792</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-10-r73</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Detecting novel low-abundant transcripts in Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bao</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Shapiro</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Shi</snm>
                  <fnm>RZ</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>YC</fnm>
               </au>
               <au>
                  <snm>Wing</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tseng</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Du</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>CI</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2005</pubdate>
            <volume>11</volume>
            <fpage>939</fpage>
            <lpage>946</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1370778</pubid>
                  <pubid idtype="pmpid" link="fulltext">15923377</pubid>
                  <pubid idtype="doi">10.1261/rna.7239605</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer</p>
            </title>
            <aug>
               <au>
                  <snm>Frohman</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Dush</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>GR</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <fpage>8998</fpage>
            <lpage>9002</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">282649</pubid>
                  <pubid idtype="pmpid" link="fulltext">2461560</pubid>
                  <pubid idtype="doi">10.1073/pnas.85.23.8998</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Has the yo-yo stopped? An assessment of human protein-coding gene number</p>
            </title>
            <aug>
               <au>
                  <snm>Southan</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Proteomics</source>
            <pubdate>2004</pubdate>
            <volume>4</volume>
            <fpage>1712</fpage>
            <lpage>1726</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/pmic.200300700</pubid>
                  <pubid idtype="pmpid" link="fulltext">15174140</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Finishing the euchromatic sequence of the human genome</p>
            </title>
            <aug>
               <au>
                  <cnm>International Human Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>431</volume>
            <fpage>931</fpage>
            <lpage>945</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03001</pubid>
                  <pubid idtype="pmpid" link="fulltext">15496913</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Complexity and characterization of polyadenylated RNA in the mouse brain</p>
            </title>
            <aug>
               <au>
                  <snm>Bantle</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Hahn</snm>
                  <fnm>WE</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1976</pubdate>
            <volume>8</volume>
            <fpage>139</fpage>
            <lpage>150</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(76)90195-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">986249</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Transcription of individual genes in eukaryotic cells occurs randomly and infrequently</p>
            </title>
            <aug>
               <au>
                  <snm>Ross</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>Browne</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Hume</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>Immunol Cell Biol</source>
            <pubdate>1994</pubdate>
            <volume>72</volume>
            <fpage>177</fpage>
            <lpage>185</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/icb.1994.26</pubid>
                  <pubid idtype="pmpid">8200693</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Analysis of the transcriptional complexity of Arabidopsis thaliana by massively parallel signature sequencing</p>
            </title>
            <aug>
               <au>
                  <snm>Meyers</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Vu</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Tej</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Ghazal</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Matvienko</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Agrawal</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Ning</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Haudenschild</snm>
                  <fnm>CD</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2004</pubdate>
            <volume>22</volume>
            <fpage>1006</fpage>
            <lpage>1011</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt992</pubid>
                  <pubid idtype="pmpid" link="fulltext">15247925</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Analysis of tag-position bias in MPSS technology</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rattray</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>77</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1533822</pubid>
                  <pubid idtype="pmpid" link="fulltext">16603069</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-7-77</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Sequence biases in large scale gene expression profiling data</p>
            </title>
            <aug>
               <au>
                  <snm>Siddiqui</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Delaney</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Schnerch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Griffith</snm>
                  <fnm>OL</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Marra</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>e83</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1524917</pubid>
                  <pubid idtype="pmpid" link="fulltext">16840527</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl404</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Comprehensive transcript analysis in small quantities of mRNA by SAGE-lite</p>
            </title>
            <aug>
               <au>
                  <snm>Peters</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Kassam</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Yonas</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>O'Hare</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Ferrell</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Brufsky</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>e39</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148762</pubid>
                  <pubid idtype="pmpid" link="fulltext">10572191</pubid>
                  <pubid idtype="doi">10.1093/nar/27.24.e39</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Base-calling of automated sequencer traces using phred. I. Accuracy assessment</p>
            </title>
            <aug>
               <au>
                  <snm>Ewing</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hillier</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wendl</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>175</fpage>
            <lpage>185</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9521921</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Evaluation of window cohabitation of DNA sequencing errors and lowest PHRED quality values</p>
            </title>
            <aug>
               <au>
                  <snm>Prosdocimi</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Peixoto</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Ortega</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Genet Mol Res</source>
            <pubdate>2004</pubdate>
            <volume>3</volume>
            <fpage>483</fpage>
            <lpage>492</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15688315</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>An overview of Ensembl</p>
            </title>
            <aug>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Bevan</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Caccamo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Coates</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Cuff</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Curwen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cutts</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Down</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Eyras</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Fernandez-Suarez</snm>
                  <fnm>XM</fnm>
               </au>
               <au>
                  <snm>Gane</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gibbins</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gilbert</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hammond</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hotz</snm>
                  <fnm>HR</fnm>
               </au>
               <au>
                  <snm>Iyer</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Jekosch</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kahari</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kasprzyk</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Keefe</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Keenan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lehvaslaiho</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>McVicker</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Melsopp</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Meidl</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Mongin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Pettett</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Potter</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Proctor</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rae</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Searle</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Slater</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Smedley</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Spooner</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Stabenau</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stalker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Storey</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ureta-Vidal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Woodwark</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Cameron</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>925</fpage>
            <lpage>928</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">479121</pubid>
                  <pubid idtype="pmpid" link="fulltext">15078858</pubid>
                  <pubid idtype="doi">10.1101/gr.1860604</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>The Ensembl automatic gene annotation system</p>
            </title>
            <aug>
               <au>
                  <snm>Curwen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Eyras</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mongin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Searle</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>942</fpage>
            <lpage>950</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">479124</pubid>
                  <pubid idtype="pmpid" link="fulltext">15123590</pubid>
                  <pubid idtype="doi">10.1101/gr.1858004</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>The human transcriptome map: clustering of highly expressed genes in chromosomal domains</p>
            </title>
            <aug>
               <au>
                  <snm>Caron</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>van Schaik</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>van der Mee</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Baas</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Riggins</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>van Sluis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hermus</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>van Asperen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Boon</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Voute</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Heisterkamp</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>van Kampen</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Versteeg</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>291</volume>
            <fpage>1289</fpage>
            <lpage>1292</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1056794</pubid>
                  <pubid idtype="pmpid" link="fulltext">11181992</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>The R project</p>
            </title>
            <aug>
               <au>
                  <cnm>The R Foundation</cnm>
               </au>
            </aug>
            <url>http://www.r-project.org</url>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Analysis of host response to bacterial infection using error model based gene expression microarray experiments</p>
            </title>
            <aug>
               <au>
                  <snm>Stekel</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Sarti</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Trevino</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Salmon</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Buckley</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Stevens</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pallen</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Penn</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Falciani</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>e53</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1072804</pubid>
                  <pubid idtype="pmpid" link="fulltext">15800204</pubid>
                  <pubid idtype="doi">10.1093/nar/gni050</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
