<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2001-2-3-preprint0002</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Deposited research article</dochead>
      <bibl>
         <title>
            <p>Improving SAGE di-tag processing</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Colinge</snm>
               <fnm>Jacques</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A2" ca="yes">
               <snm>Feger</snm>
               <fnm>Georg</fnm>
               <insr iid="I1"/>
               <email>yuan.33@osu.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Serono Pharmaceutical Research Institute, Ch. des Aulx 14, CH-1228 Plan-les-Ouates, Switzerland</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2001</pubdate>
         <volume>2</volume>
         <issue>3</issue>
         <fpage>preprint0002.1</fpage>
         <lpage>preprint0002.10</lpage>
         <url>http://genomebiology.com/2001/2/3/preprint/0002</url>
         <note>This is the first version of this article to be made available publicly, and no other version is available at present. The article was submitted to <it>Genome Biology</it> for peer review.</note>
         <xrefbib>
            <pubid idtype="doi">10.1186/gb-2001-2-3-preprint0002</pubid>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>13</day>
               <month>2</month>
               <year>2001</year>
            </date>
         </rec>
         <pub>
            <date>
               <day>22</day>
               <month>2</month>
               <year>2001</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2001</year>
         <collab>BioMed Central Ltd</collab>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>SAGE is a genome-wide method for obtaining gene expression profiles. It generates tags 10 nucleotides in length, which are assumed to identify the corresponding gene transcript. In practice, however, this length is not always sufficient to identify a gene uniquely.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We propose an improved processing of SAGE sequences that allows us to obtain one extra base for reasonably abundant tags. This method includes a statistical test for controlling the relevance of extra base predictions.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>The improved SAGE sequence processing we present reduces the uncertainty in SAGE tag-to-gene mapping and can be applied to any SAGE library.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010013">Methods</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Serial Analysis of Gene Expression (SAGE) is a method for measuring the relative abundance of gene transcripts in different mRNA samples. It isolates a short tag from each individual transcript and concatenates the tags into long DNA molecules, which are then sequenced. By counting these tags one can estimate, for example, the expression of genes in a cell [<abbr bid="B1">1</abbr>]. The popularity of SAGE is growing fast, and many public data sets are accessible on the Internet [<abbr bid="B2">2</abbr>].</p>
         <p>Processing of SAGE sequences is described in [<abbr bid="B1">1</abbr>], [<abbr bid="B2">2</abbr>] and [<abbr bid="B3">3</abbr>]. The usual length of a SAGE tag is 10 bases. In practice, this length is not sufficient to identify each gene uniquely: several genes can share the same tag. The SAGE method uses tag pairs (di-tags) to avoid bias introduced by PCR amplification. As pointed out in [<abbr bid="B2">2</abbr>], the observed length of the di-tags is not constant: it varies between 20 and 26 bases (see Table <tblr tid="T1">1</tblr>) because of a certain flexibility in the enzyme used. The usual processing of SAGE sequences does not take advantage of these longer di-tags. Here we present a new method to predict an 11<sup>th</sup> base for sufficiently abundant tags, thereby increasing precision in gene identification (the number of possible genes to which a tag is mapped is divided by 4 on average).</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Di-tag length distribution</p>
            </caption>
            <tblbdy cols="3">
               <r>
                  <c ca="center">
                     <p>Di-tag length</p>
                  </c>
                  <c ca="center">
                     <p>Number detected</p>
                  </c>
                  <c ca="center">
                     <p>Percentage</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>233</p>
                  </c>
                  <c ca="center">
                     <p>0.5%</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>21</p>
                  </c>
                  <c ca="center">
                     <p>2524</p>
                  </c>
                  <c ca="center">
                     <p>5.3%</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>22</p>
                  </c>
                  <c ca="center">
                     <p>25502</p>
                  </c>
                  <c ca="center">
                     <p>53.3%</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>23</p>
                  </c>
                  <c ca="center">
                     <p>17052</p>
                  </c>
                  <c ca="center">
                     <p>35.7%</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>24</p>
                  </c>
                  <c ca="center">
                     <p>2151</p>
                  </c>
                  <c ca="center">
                     <p>4.5%</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>25</p>
                  </c>
                  <c ca="center">
                     <p>129</p>
                  </c>
                  <c ca="center">
                     <p>0.3%</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>26</p>
                  </c>
                  <c ca="center">
                     <p>191</p>
                  </c>
                  <c ca="center">
                     <p>0.4%</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Example of di-tag length distribution obtained for a human white matter SAGE library [<abbr bid="B5">5</abbr>]. Distributions obtained for other libraries follow the same pattern.</p>
            </tblfn>
         </tbl>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <p>We use di-tags of sufficient length to compute the frequencies of the four possible extra bases (A, C, G and T) for every tag. Then we use contingency tables and hypothesis testing [<abbr bid="B4">4</abbr>] to determine relevant extra bases. The null hypothesis we apply is that every possible extra base has the same probability of being sequenced.</p>
         <p>As an example, we took from the publicly available data set [<abbr bid="B5">5</abbr>] a SAGE library made for <it>Homo sapiens</it> normal white matter [<abbr bid="B6">6</abbr>], and we used the di-tag list of this library to illustrate the usefulness of our method. The library [<abbr bid="B6">6</abbr>] contains 51640 di-tags of length between 20 and 26 bases. We rejected 3856 suspect repeated di-tags (see [<abbr bid="B2">2</abbr>] and [<abbr bid="B3">3</abbr>]). From the remaining 47784 di-tags we extracted 32668 different tags. The number of tags for which we could predict an 11<sup>th</sup> base by applying our method is given in Table <tblr tid="T2">2</tblr>.</p>
         <tbl id="T2">
            <title>
               <p>Table 2</p>
            </title>
            <caption>
               <p>Number of 11<sup>th</sup> base predictions</p>
            </caption>
            <tblbdy cols="4">
               <r>
                  <c ca="center">
                     <p>Relevance</p>
                  </c>
                  <c ca="center">
                     <p>Number of predictions</p>
                  </c>
                  <c ca="center">
                     <p>Average count</p>
                  </c>
                  <c ca="center">
                     <p>Median count</p>
                  </c>
               </r>
               <r>
                  <c cspan="4">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>95.0%</p>
                  </c>
                  <c ca="center">
                     <p>1700 (432)</p>
                  </c>
                  <c ca="center">
                     <p>28.8 (8.5)</p>
                  </c>
                  <c ca="center">
                     <p>25 (7)</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>99.0%</p>
                  </c>
                  <c ca="center">
                     <p>1268 (488)</p>
                  </c>
                  <c ca="center">
                     <p>35.7 (10.7)</p>
                  </c>
                  <c ca="center">
                     <p>20 (13)</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>99.9%</p>
                  </c>
                  <c ca="center">
                     <p>780</p>
                  </c>
                  <c ca="center">
                     <p>51.4</p>
                  </c>
                  <c ca="center">
                     <p>22</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Number of 11<sup>th</sup> base predictions for human normal white matter SAGE library [<abbr bid="B5">5</abbr>]. Statistics about tag abundance for each relevance degree are given both as the average and median counts. Statistics for a specific relevance degree only are in parentheses.</p>
            </tblfn>
         </tbl>
         <p>For illustration purposes, we identified these tags by extracting SAGE tags of UniGene [<abbr bid="B7">7</abbr>] clusters (build 108). We only considered tags at the end of the UniGene sequences, i.e. we consider UniGene sequences as 5' oriented. Other identification strategies are possible; see for instance [<abbr bid="B2">2</abbr>]. An example of a tag is CAAGCATCCC, observed 1510 times with 5 extra As (the base A was observed at the 11<sup>th</sup> position in the di-tag), 1426 extra Cs, 13 extra Gs and 13 extra Ts. We uniquely identified this tag in UniGene as Hs.250444 <it>small inducible cytokine A7 (monocyte chemotactic protein 3).</it> The 11<sup>th</sup> base found in the UniGene cluster sequence matches the dominant extra C mentioned above. According to the method we propose (see Materials and Methods), the prediction of C as an extra base is relevant at the 99.9% level.</p>
         <p>An example of a tag shared by two genes, one of which is apparently not expressed, is provided by GGGCTGGGGT, observed 86 times with 5 extra As and 80 extra Cs. GGGCTGGGGT<b>A</b> is identified in UniGene [<abbr bid="B7">7</abbr>] as Hs.90436 <it>sperm acrosomal protein (SPAG7)</it> and GGGCTGGGGT<b>C</b> as Hs.183698 <it>ribosomal protein L29 (RPL29).</it> According to the extra bases observed, it seems that only RPL29 is expressed (99.9% relevant; SPAG7 is possibly weakly expressed).</p>
         <p>The special situation of several expressed genes sharing the same 10-base tag is illustrated by GTGAAACCCC, observed 422 times with 161 extra As, 22 extra Cs, 202 extra Gs and 12 extra Ts. Under the null hypothesis (equiprobability of every extra base), both A and G are relevant at a level higher than 95% (99.9% in this case). We can estimate a count of 161/(161+202) &#215; 422 &#8776; 187 for GTGAAACCCC<b>A</b> and 202/(161+202) &#215; 422 &#8776; 235 for GTGAAACCCC<b>G</b>. We subsequently found that this tag is shared by many UniGene [<abbr bid="B7">7</abbr>] clusters: 49 clusters with extra A, 7 clusters with extra C and 54 clusters with extra G.</p>
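         <p>The count estimate above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name split_tag_count and its arguments are hypothetical, and rounding to the nearest integer is our assumption about how the estimates were rounded.</p>
         <p><![CDATA[
```python
def split_tag_count(total, extra_counts, relevant):
    """Apportion a tag's total count among its relevant extra bases.

    total        -- number of times the 10-base tag was observed
    extra_counts -- mapping base -> times it was seen at the 11th position
    relevant     -- extra bases retained by the relevance test
    """
    kept = sum(extra_counts[b] for b in relevant)
    # Each relevant base receives a share proportional to its extra-base count.
    return {b: round(total * extra_counts[b] / kept) for b in relevant}


# Tag GTGAAACCCC: observed 422 times; A and G are the relevant extra bases.
counts = {"A": 161, "C": 22, "G": 202, "T": 12}
print(split_tag_count(422, counts, ["A", "G"]))  # {'A': 187, 'G': 235}
```
]]></p>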
         <p>[<abbr bid="B2">2</abbr>] proposes the assignment of a score to each identification, in order to characterize its reliability. If a tag comes with a predicted extra base, the latter should be checked with the database sequence and the result included in the score computation.</p>
         <p>The complex situation of tag GTGAAACCCC above suggests a possible extension of our method. We test the relevance of predicted extra bases by comparing (hypothesis testing) the observed frequencies with the hypothetical situation of equiprobability. An alternative would be (1) to choose a method for identifying tags, as we did with UniGene [<abbr bid="B7">7</abbr>], and to take the extra-base distribution it yields as a new null hypothesis, and (2) to estimate the relevance of the possible extra bases according to this new null hypothesis. Returning to the example of tag GTGAAACCCC, none of the four possible extra bases significantly departs from the distribution obtained from UniGene. This implies that no extra base can be selected reliably and, consequently, every possible extra base should be considered. We cannot obtain any simplification of the data in that case, contrary to what we found with the equiprobability null hypothesis.</p>
         <p>We do not apply the latter extension of the method in practice for two reasons: first, this extension is dependent on the method for identifying tags and, second, considering the difficulty in analyzing SAGE data, we prefer to concentrate on dominantly abundant extra bases for the sake of simplicity.</p>
         <p>We have presented a method that allows the prediction of one extra base for sufficiently abundant tags (at least 7 occurrences). The method applies to every SAGE library, without any special preparation. The relevance of the predictions can be controlled by using appropriate hypothesis testing techniques. The longer tags permit a better identification of expressed genes.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and Methods</p>
         </st>
         <p>We assume that, in the case of di-tags of length 20 or more, the first 10 bases belong to the first tag and the last 10 bases belong to the second tag. Since the tags are linked into di-tags randomly, the extra available bases, in the middle of a di-tag of length 22 or more (see Figure <figr fid="F1">1</figr>), belong to each tag with a probability that is symmetrical. Accordingly, we propose a new di-tag processing.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>A di-tag of length 24.</p>
            </caption>
            <text>
               <p>A di-tag of length 24. The two 10-base tags are made of the bases 1 to 10 and 15 to 24. The bottom part of the figure shows an idealization of the probability that each base belongs to a specific tag (blue line for the first tag, red line for the second tag).</p>
            </text>
            <graphic file="gb-2001-2-3-preprint0002-1"/>
         </fig>
         <sec>
            <st>
               <p>Algorithm</p>
            </st>
            <p>Let c(t) denote the counter associated with a tag t. Let A(t), C(t), G(t), T(t) denote the counters associated with each of the 4 possible extra bases of tag t. We denote by R(s) the operation of taking the reverse complement of s (read s in reverse order and exchange letters: 'A' with 'T', 'C' with 'G').</p>
            <p>1. For each di-tag d of length k:</p>
            <p>Take 10 bases at each end of d in order to obtain the two tags t<sub>1</sub> and t<sub>2</sub> it contains. Namely, we have t<sub>1</sub>=d [1..10] and t<sub>2</sub>=R(d [k-9..k]). Increment the counters c(t<sub>1</sub>) and c(t<sub>2</sub>). If k &#8805; 22, then extract one extra base for each tag: b<sub>1</sub>=d [11] and b<sub>2</sub>=R(d [k-10]). These extra bases are used to increment the counters A, C, G, T: if b<sub>1</sub>='A' then increment A(t<sub>1</sub>), if b<sub>1</sub>='C' then increment C(t<sub>1</sub>), etc. The same is done for b<sub>2</sub> and t<sub>2</sub>.</p>
            <p>2. We choose a degree of relevance, typically 95% or 99%. Then, for each different tag t that has at least one of its extra base counters different from 0, we test whether each possible extra base is relevant (it is possible that more than one extra base is relevant). This is achieved by using contingency tables and hypothesis testing [<abbr bid="B4">4</abbr>].</p>
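            <p>Step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names process_ditags and reverse_complement are hypothetical, di-tags are assumed to be upper-case strings over A, C, G and T, and di-tags shorter than 20 bases are simply skipped. The first tag is the first 10 bases; the second tag is the reverse complement of the last 10 bases; for di-tags of length 22 or more, the extra bases are the base that follows the first tag and the complement of the base that precedes the last 10 bases.</p>
            <p><![CDATA[
```python
from collections import Counter, defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(s):
    """R(s): read s in reverse order and exchange A with T, C with G."""
    return "".join(COMPLEMENT[b] for b in reversed(s))


def process_ditags(ditags, tag_len=10, min_len_for_extra=22):
    """Count tags and their candidate 11th bases (step 1 only)."""
    tag_counts = Counter()               # c(t)
    extra_counts = defaultdict(Counter)  # tag -> counters A(t), C(t), G(t), T(t)
    for d in ditags:
        k = len(d)
        if k < 2 * tag_len:
            continue  # assumption: too short to hold two full tags
        t1 = d[:tag_len]
        t2 = reverse_complement(d[k - tag_len:])
        tag_counts[t1] += 1
        tag_counts[t2] += 1
        if k >= min_len_for_extra:
            # Base right after the first tag (position 11, counting from 1).
            extra_counts[t1][d[tag_len]] += 1
            # Complement of the base right before the last 10 bases.
            extra_counts[t2][COMPLEMENT[d[k - tag_len - 1]]] += 1
    return tag_counts, extra_counts


# Made-up di-tag of length 22 whose two tags are identical.
tags, extras = process_ditags(["AAAAAAAAAACGTTTTTTTTTT"])
print(tags["AAAAAAAAAA"], dict(extras["AAAAAAAAAA"]))  # 2 {'C': 2}
```
]]></p>
            <p>The counters returned by this sketch are the input of step 2, which is not covered here.</p>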
         </sec>
         <sec>
            <st>
               <p>Hypothesis testing</p>
            </st>
            <p>We describe in detail a possible method for implementing Step 2. We apply hypothesis testing, namely contingency table methods [<abbr bid="B4">4</abbr>], to decide whether an extra base is relevant. Our null hypothesis is that every possible extra base has the same probability of being sequenced. Let D denote the counter of the extra base to test and Q the sum of the three other counters. If D+Q is not a multiple of 4, we add 1, 2 or 3 to Q so that N=D+Q is a multiple of 4. Testing the null hypothesis then amounts to testing whether D is significantly different from N/4. Since we are interested in extra bases that are in excess of N/4, we consider extra bases with D &#8804; N/4 as non-relevant. The situation is summarized in a contingency table (see Table <tblr tid="T3">3</tblr>).</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Contingency table</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>To test</p>
                     </c>
                     <c ca="center">
                        <p>Others</p>
                     </c>
                     <c ca="center">
                        <p>Total</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Observed counts</p>
                     </c>
                     <c ca="center">
                        <p>D</p>
                     </c>
                     <c ca="center">
                        <p>Q</p>
                     </c>
                     <c ca="center">
                        <p>N=D+Q</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Null hypothesis</p>
                     </c>
                     <c ca="center">
                        <p>N/4</p>
                     </c>
                     <c ca="center">
                        <p>3N/4</p>
                     </c>
                     <c ca="center">
                        <p>N</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Contingency table for testing the relevance of a possible extra base.</p>
               </tblfn>
            </tbl>
            <p>Chi-squared statistics allow us to estimate the significance of the departure from the null hypothesis. This can be done, for instance, by using the Chi-squared distribution with 1 degree of freedom or, when D &lt; 5, Fisher's exact test (see [<abbr bid="B4">4</abbr>]).</p>
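            <p>One way to carry out this test is sketched below in Python, using only the standard library. Several points are our assumptions rather than part of the method as stated: the function names are hypothetical, the statistic is computed in its goodness-of-fit form (observed D and Q against expected N/4 and 3N/4) without continuity correction, and Fisher's exact test for small counts is not implemented, so the sketch should not be relied upon when D is below 5.</p>
            <p><![CDATA[
```python
import math


def chi2_p_value(d, q):
    """P-value for an extra-base count d against the sum q of the other three.

    q is padded so that N = d + q is a multiple of 4; (d, q) is then compared
    with the expected (N/4, 3N/4) using a chi-squared statistic with 1 degree
    of freedom. Assumption: goodness-of-fit form, no continuity correction.
    """
    q += (-(d + q)) % 4  # add 0, 1, 2 or 3 to q
    n = d + q
    if n == 0 or d <= n / 4:
        return 1.0  # not in excess of N/4: treated as non-relevant
    expected_d, expected_q = n / 4, 3 * n / 4
    chi2 = (d - expected_d) ** 2 / expected_d + (q - expected_q) ** 2 / expected_q
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def is_relevant(d, q, level=0.95):
    """True if the extra base is relevant at the given degree of relevance."""
    return chi2_p_value(d, q) < 1 - level


# Tag GTGAAACCCC: 161 extra As, 22 extra Cs, 202 extra Gs, 12 extra Ts.
counts = {"A": 161, "C": 22, "G": 202, "T": 12}
total = sum(counts.values())
for base, d in counts.items():
    print(base, is_relevant(d, total - d, level=0.999))
```
]]></p>
            <p>With the counts of tag GTGAAACCCC, this sketch reports A and G as relevant at the 99.9% level and C and T as non-relevant, in agreement with the discussion of this tag.</p>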
            <p>In our algorithm we only consider di-tags of a length of at least 22 for extra base prediction. We do not use 21-base long di-tags for extra base prediction because (1) the distribution of di-tag lengths (Table <tblr tid="T1">1</tblr>) shows that there are enough 22-base long di-tags, and (2) this would generate too many wrong 11<sup>th</sup> base counts, hence making the application of hypothesis testing more difficult.</p>
         </sec>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to acknowledge Mark Iberson for reading an early version of this paper. We also thank Massimo de Francesco for his help and his support.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Serial Analysis of Gene Expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Velculescu</snm>
                  <fnm>VE</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Vogelstein</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kinzler</snm>
                  <fnm>KW</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>484</fpage>
            <lpage>487</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7570003</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>SAGEmap: A Public Gene Expression Resource.</p>
            </title>
            <aug>
               <au>
                  <snm>Lash</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Tolstoshev</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Wagner</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Schuler</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Strausberg</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Riggins</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
            </aug>
            <source>Genome Res.</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>1051</fpage>
            <lpage>1060</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.10.7.1051</pubid>
                  <pubid idtype="pmpid" link="fulltext">10899154</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>eSAGE: Managing and Analyzing Data Generated with Serial Analysis of Gene Expression (SAGE).</p>
            </title>
            <aug>
               <au>
                  <snm>Margulies</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Innis</snm>
                  <fnm>JW</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16(7)</volume>
            <fpage>650</fpage>
            <lpage>651</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.7.650</pubid>
                  <pubid idtype="pmpid" link="fulltext">11038335</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The Analysis of Contingency Tables.</p>
            </title>
            <aug>
               <au>
                  <snm>Everitt</snm>
                  <fnm>BS</fnm>
               </au>
            </aug>
            <source>London: Chapman and Hall</source>
            <pubdate>1977</pubdate>
         </bibl>
         <bibl id="B5">
            <title>
               <p>CGAP (Cancer Genome Anatomy Project)</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/CGAP/</url>
         </bibl>
         <bibl id="B6">
            <title>
               <p>SAGE library for human normal white matter</p>
            </title>
            <url>ftp://ncbi.nlm.nih.gov/pub/sage/extr/SAGE_BB542_whitematter/</url>
         </bibl>
         <bibl id="B7">
            <title>
               <p>A Gene Map of the Human Genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Schuler</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Boguski</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Stewart</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>Stein</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Gyapay</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Rodriguez-Tome</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Aggarwal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bajorek</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bentolila</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Birren</snm>
                  <fnm>BB</fnm>
               </au>
               <au>
                  <snm>Butler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Castle</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Chiannilkulchai</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Chu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Clee</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cowles</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Dibling</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Drouot</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Dunham</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Duprat</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>East</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hudson</snm>
                  <fnm>TJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>1996</pubdate>
            <volume>274(5287)</volume>
            <fpage>540</fpage>
            <lpage>546</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.274.5287.540</pubid>
                  <pubid idtype="pmpid" link="fulltext">8849440</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
