<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-10-435</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Assessing the genomic evidence for conserved transcribed pseudogenes under selection</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Khachane</snm>
               <mi>N</mi>
               <fnm>Amit</fnm>
               <insr iid="I1"/>
               <email>amit.khachane@mcgill.ca</email>
            </au>
            <au ca="yes" id="A2">
               <snm>Harrison</snm>
               <mi>M</mi>
               <fnm>Paul</fnm>
               <insr iid="I1"/>
               <email>paul.harrison@mcgill.ca</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Biology, McGill University, Stewart Biology Building, 1205 Docteur Penfield Ave., Montreal, QC, H3A 1B1 Canada</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2009</pubdate>
         <volume>10</volume>
         <issue>1</issue>
         <fpage>435</fpage>
         <url>http://www.biomedcentral.com/1471-2164/10/435</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19754956</pubid>
               <pubid idtype="doi">10.1186/1471-2164-10-435</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>11</day>
               <month>3</month>
               <year>2009</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>15</day>
               <month>9</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>15</day>
               <month>9</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Khachane and Harrison; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p><it>Transcribed pseudogenes </it>are copies of protein-coding genes that have accumulated indicators of coding-sequence decay (such as frameshifts and premature stop codons), but nonetheless remain transcribed. Recent experimental evidence indicates that transcribed pseudogenes may regulate the expression of homologous genes, through antisense interference, or generation of small interfering RNAs (siRNAs). Here, we assessed the genomic evidence for such transcribed pseudogenes of potential functional importance, in the human genome. The most obvious indicators of such functional importance are significant evidence of conservation and selection pressure.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>A variety of pseudogene annotations from multiple sources were pooled and filtered to obtain a subset of sequences that have significant mid-sequence disablements (frameshifts and premature stop codons), and that have clear evidence of full-length mRNA transcription. We found 1750 such transcribed pseudogene annotations (TPAs) in the human genome (corresponding to ~11.5% of human pseudogene annotations). We checked for syntenic conservation of TPAs in other mammals (rhesus monkey, mouse, rat, dog and cow). About half of the human TPAs are conserved in rhesus monkey, but strikingly, very few in mouse (~3%). The TPAs conserved in rhesus monkey show evidence of selection pressure (relative to surrounding intergenic DNA) on: <it>(i) </it>their GC content, and <it>(ii) </it>their rate of nucleotide substitution. This is in spite of distributions of Ka/Ks (ratios of non-synonymous to synonymous substitution rates), congruent with a lack of protein-coding ability. Furthermore, we have identified 68 human TPAs that are syntenically conserved in at least two other mammals. Interestingly, we observe three TPA sequences conserved in dog that have intermediate character (<it>i.e.</it>, evidence of both protein-coding ability and pseudogenicity), and discuss the implications of this.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Through evolutionary analysis, we have identified candidate sequences for functional human transcribed pseudogenes, and have pinpointed 68 strong candidates for further investigation as potentially functional transcribed pseudogenes across multiple mammal species.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification id="endnote" subtype="user_supplied_xml" type="bmc"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Pseudogenes (derived from protein-coding genes) are gene copies that show signs diagnostic of protein-coding deficiency. Such signs commonly include premature stop codons and coding-sequence frameshifts, or neutral codon substitution patterns <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Pseudogenes can arise in two chief ways: <it>(i) </it>from retrotransposition, <it>i.e.</it>, reverse transcription of a cellular messenger RNA, followed by reintegration into the genomic DNA <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>, or <it>(ii) </it>from decay of genes that originated (however long ago) from duplication <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B6">6</abbr></abbrgrp>. These genomic entities have generally been believed to be non-functional. Historically, there were some early individual reports of transcribed pseudogenes in the scientific literature <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. Examples included the human leukocyte interferon (LeIFN) pseudogene <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, the glyceraldehyde-3-phosphate dehydrogenase pseudogene <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, the glucocerebrosidase pseudogene <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, the steroid 21-hydroxylase pseudogene <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, the glutamine synthetase pseudogene <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, and the tumor suppressor pseudogene &#936;PTEN <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
         <p>More recently, genome-wide screens have detected transcription evidence for many retropseudogenes (>200) in humans <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. In mouse oocytes, transcribed pseudogenes have been shown to play a significant role in the generation of small interfering RNAs (siRNAs) <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>, which regulate the expression of homologous genes.</p>
         <p>Collectively, these reports indicate that an unknown cohort of human transcribed pseudogenes could be potentially functional in regulation of gene transcription. A key indicator of such function is significant conservation in other mammalian genomes. Svensson <it>et al</it>. <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> have explored conservation of apparent pseudogenes in three mammals (human, chimpanzee and mouse). Their analysis revealed 30 cases of transcribed pseudogenes that are preserved in mouse. Here, we analyze the distribution of transcribed pseudogene annotations (TPAs) in the human and mouse genomes, examine their conservation in an expanded panel of mammals (rhesus monkey, mouse, rat, dog and cow), and assess evidence for significant selection pressures. TPAs that are conserved in rhesus monkey show evidence of significant selection pressure, despite also displaying codon substitution patterns characteristic of non-protein-coding sequences. Also, we have discovered a short-list of 68 putative human transcribed pseudogenes that are syntenically conserved in at least two other mammals from our panel. These sequences represent a strong subset of candidates for further investigation as functional transcribed pseudogenes.</p>
      </sec>
      <sec>
         <st>
            <p>Results &amp; Discussion</p>
         </st>
         <sec>
            <st>
               <p>Derivation of transcribed pseudogene annotations (TPAs) in the human genome</p>
            </st>
            <p>To identify transcribed pseudogenes, transcript sequences from the Unigene, RefSeq and H-InvDB databases were mapped onto the human genome and were examined for overlap with pseudogene annotations. These pseudogene annotations were taken from the VEGA <url>http://vega.sanger.ac.uk/</url> and <url>http://pseudogene.org</url> websites (see <it>Methods </it>for details). We pooled these datasets with re-mappings of: <it>(i) </it>'disrupted mRNAs' (dmRNAs) <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, and <it>(ii) </it>transcribed processed pseudogenes <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, from previous analyses <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B16">16</abbr></abbrgrp>. Overlap of the transcripts with pseudogenes was verified by using the positions of mid-sequence disablements (frameshifts and premature stop codons) as 'anchors'. That is, at least one disablement position was required to occur in both the genomic DNA and the transcript sequence (see <it>Methods </it>for further details).</p>
            <p>We found that ~11.5% (1750/15000) of human pseudogenes are transcribed (after correcting for pseudogene annotation overlaps within and between the various data sets) [see Additional file <supplr sid="S1">1</supplr>]. Table <tblr tid="T1">1</tblr> summarises the numbers of transcribed pseudogene annotations (TPAs) in different categories and data sets. The number of processed pseudogenes that were identified to be transcribed is 3-4 times higher than in our previous analysis <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Interestingly, in humans, regardless of category, only a small fraction of TPAs are transcribed completely in the antisense direction (~3-6%). Such a finding of significant avoidance of antisense transcription (Table <tblr tid="T1">1</tblr>) is surprising, especially for retropseudogenes. Retrotransposed sequences are inserted back into the genomic DNA irrespective of the position of existing local promoters. Thus, one would expect equal numbers of sense and antisense transcripts. However, the above finding indicates a general selection pressure against antisense transcribed pseudogenes, thus generally limiting the possibilities for complementary hybridization with transcripts and RNA elements from homologous genes.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>List of transcribed human pseudogenes</b>. Genomic coordinates of the transcribed pseudogenes found in the human genome.</p>
               </text>
               <file name="1471-2164-10-435-S1.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Percentages of TPAs in human and mouse.</p>
               </caption>
               <tblbdy cols="9">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Dataset</b>
                        </p>
                     </c>
                     <c ca="left" cspan="6">
                        <p>
                           <b>Transcribed (human)</b>
                        </p>
                     </c>
                     <c ca="left" cspan="2">
                        <p>
                           <b>Transcribed (mouse)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>VEGA</b>
                        </p>
                     </c>
                     <c ca="left" cspan="2">
                        <p>Total = 866/8160 (10.6%)</p>
                     </c>
                     <c ca="left" cspan="2">
                        <p># Processed = 383/3737 (10.24%)</p>
                     </c>
                     <c ca="left" cspan="2">
                        <p># Non-processed = 71/1078 (6.58%)</p>
                     </c>
                     <c ca="left" cspan="2">
                        <p>Total = 71/4187 (1.7%)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>828 (sense)</p>
                     </c>
                     <c ca="left">
                        <p>38 (antisense)</p>
                     </c>
                     <c ca="left">
                        <p>371 (sense)</p>
                     </c>
                     <c ca="left">
                        <p>12 (antisense)</p>
                     </c>
                     <c ca="left">
                        <p>61 (sense)</p>
                     </c>
                     <c ca="left">
                        <p>4 (antisense)</p>
                     </c>
                     <c ca="left">
                        <p>49 (sense)</p>
                     </c>
                     <c ca="left">
                        <p>22 (antisense)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><b><url>http://Pseudogene.org</url></b>(excluding ambiguous pseudogenes)</p>
                     </c>
                     <c ca="left" cspan="2">
                        <p>Total = 1035/13354 (7.75%)</p>
                     </c>
                     <c ca="left" cspan="2">
                        <p># Processed = 767/11072 (6.93%)</p>
                     </c>
                     <c ca="left" cspan="2">
                        <p># Non-processed = 268/2282 (11.74%)</p>
                     </c>
                     <c ca="left" cspan="2">
                        <p>Total = 65/15064 (0.5%)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>977 (sense)</p>
                     </c>
                     <c ca="left">
                        <p>58 (antisense)</p>
                     </c>
                     <c ca="left">
                        <p>724 (sense)</p>
                     </c>
                     <c ca="left">
                        <p>43 (antisense)</p>
                     </c>
                     <c ca="left">
                        <p>253 (sense)</p>
                     </c>
                     <c ca="left">
                        <p>15 (antisense)</p>
                     </c>
                     <c ca="left">
                        <p>53 (sense)</p>
                     </c>
                     <c ca="left">
                        <p>12 (antisense)</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>We performed a similar survey for TPAs in the mouse genome. Surprisingly, we found a much lower percentage of TPAs in mouse (&lt;2%) than in human (P &lt;&lt; 0.001 for the likelihood of the number in mouse, given the human percentage as an expectation, using binomial statistics). This is despite the two species having similar amounts of pseudogene annotation data (Table <tblr tid="T1">1</tblr>) and transcript data (203,785 transcript sequences in total for human, and 203,550 for mouse). This indicates that transcribed pseudogenes are rarer in mice than in humans.</p>
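            <p>The binomial comparison described above can be sketched as follows. This is a minimal illustration, not the authors' actual calculation: it assumes that the VEGA counts from Table 1 (866/8160 for human, 71/4187 for mouse) are the inputs and that a one-sided lower tail is the quantity of interest. The function names are hypothetical.</p>

```python
import math

def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability of exactly k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def binom_lower_tail(k, n, p):
    """P(X is at most k) for X ~ Binomial(n, p); each term is computed in log space."""
    return sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k + 1))

# Assumption: VEGA counts from Table 1 are used as the inputs.
human_rate = 866 / 8160          # fraction of human VEGA pseudogenes that are transcribed
mouse_n, mouse_k = 4187, 71      # mouse VEGA pseudogenes, and those found transcribed

expected_mouse = human_rate * mouse_n
p_value = binom_lower_tail(mouse_k, mouse_n, human_rate)

print(f"expected transcribed in mouse at the human rate: {expected_mouse:.1f}")
print(f"observed: {mouse_k}; one-sided binomial P-value: {p_value:.3g}")
```

            <p>Under these assumptions the expected mouse count at the human rate is several hundred, so an observed count of 71 lies far in the lower tail, in line with the P &lt;&lt; 0.001 quoted in the text.</p>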
         </sec>
         <sec>
            <st>
               <p>Identification of orthologous pseudogenes in mammals</p>
            </st>
            <p>Transcription of pseudogenes <it>per se </it>does not necessarily indicate functionality. It has been shown that transcriptional activation at a particular genomic locus has a ripple effect on the neighboring loci <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. It is therefore possible that many transcribed pseudogenes arise simply because of this. However, of the various identified human TPAs in our present study, those that are evolutionarily conserved across mammals due to natural selection are more likely to be biologically functional. Therefore, we set out to identify a list of such orthologous transcribed pseudogenes that are conserved in &#8805;2 of our panel of mammals.</p>
            <p>Certain gene families are known to spawn large numbers of pseudogenes. Examples include olfactory receptors <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, ribosomal-protein genes <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, human thioredoxin and glutaredoxin <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, ABC transporters <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, and heat shock proteins <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. In such cases, identifying orthologs in other mammals using the standard bi-directional best-hit procedure is problematic, since the rates of sequence evolution may vary in different lineages and genomic regions. Furthermore, such a procedure does not work well for pseudogenes, since these sequences are not evolving like protein-coding genes, which are under strong purifying selection. Because of this, the best match obtained using <it>blastp </it>to a pseudogene query is expectedly the parent protein-coding gene or a pseudogene recently evolved from the parent gene. Thus, the standard bi-directional best-hit procedure alone is not sufficient. Therefore, here, we have used synteny information between two organisms to pin-point pseudogene orthologs. We have used synteny maps along with homology-based searches to identify conserved orthologs in five mammals (rhesus monkey, mouse, rat, dog and cow) (see <it>Methods </it>for details). We identified a set of 68 human TPAs that are conserved in at least two of these mammals, representing potentially functional candidates (see Additional file <supplr sid="S2">2</supplr>: Table S2). In general, although approximately half (742/1750) of the human TPAs are syntenically conserved in rhesus monkey, only 3% are syntenically conserved in mouse. This suggests that a large number of human transcribed pseudogenes are primate-specific.</p>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p><b>Table S2</b>. List of human TPAs that are conserved in other mammals.</p>
               </text>
               <file name="1471-2164-10-435-S2.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>A multiple sequence alignment of orthologous sequences for an example taken from Additional file <supplr sid="S2">2</supplr>: Table S2, is shown in Figure <figr fid="F1">1(a)</figr>. The corresponding phylogenetic tree is drawn in Figure <figr fid="F1">1(b)</figr>. This example is a human pseudogene named '<it>urn:lsid:pseudogene.org:9606.Pseudogene:4346</it>', that is homologous to human ADP-ribose pyrophosphatase. In this case, one can see clearly that disablements at several positions in the alignment are conserved in divergent species, and parsimoniously would be assigned in the ancestral sequence. Also, in this phylogenetic tree, dog clusters closer to primates, than rodents do; this may be due to variance in local genomic mutation rates. Interestingly, we find that a significantly higher number of human transcribed pseudogenes were conserved in dog, compared to in mouse (Fisher's exact test, <it>P</it>-value: 0.0086) (Figure <figr fid="F2">2</figr>). There is some debate regarding whether human is phylogenetically closer to rodents than to dog, although most data analysis indicates a rodent-primate grouping <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Multiple sequence alignment and phylogenetic analysis of a human transcribed pseudogene that has orthologs in &#8805;2 other mammals</p>
               </caption>
               <text>
                  <p><b>Multiple sequence alignment and phylogenetic analysis of a human transcribed pseudogene that has orthologs in &#8805;2 other mammals</b>. (a) Multiple sequence alignment of conceptually-translated ortholog sequences (<it>urn:lsid:pseudogene.org:9606.Pseudogene:4346</it>; see Additional file <supplr sid="S2">2</supplr>: Table S2) from different mammals along with the human parental protein sequence (human ADP-ribose pyrophosphatase, swissprot accession id: NUDT9_HUMAN). The positions of stop codons in the alignment are denoted by '<b>X</b>' and frameshifts by '<b>B</b>'. (b) A rooted phylogenetic tree constructed from the most conserved segment from a multiple nucleotide sequence alignment between ortholog sequences (human parental protein sequence - ADP-ribose pyrophosphatase, swissprot accession id: NUDT9_HUMAN). As an outgroup, we chose a protein sequence from <it>C. elegans </it>from the 'nudix' hydrolase family but belonging to another subfamily (NDX2_CAEEL), identified based on BLAST matching. PHYLIP bootstrap support values out of 1000 iterations are indicated at each node.</p>
               </text>
               <graphic file="1471-2164-10-435-1"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Number of shared orthologs between human and other mammals</p>
               </caption>
               <text>
                  <p><b>Number of shared orthologs between human and other mammals</b>. (a) For the 68 conserved TPAs (orthologs in &#8805;2 mammals); (b) for all conserved TPAs between human and the other mammals. The shared number of cases between dog and human is highlighted in red to indicate that this number is higher than for human/rodents.</p>
               </text>
               <graphic file="1471-2164-10-435-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Conserved transcribed pseudogenes are overrepresented on chromosome X</p>
            </st>
            <p>It is noteworthy that the highest number of conserved TPAs is on the human chromosome X (13 out of the total of 68; Figure <figr fid="F3">3</figr> and Additional file <supplr sid="S2">2</supplr>: Table S2), followed by 11 cases on chromosome 6. There is also a significant over-representation of human conserved TPAs on these chromosomes after normalizing for the chromosome size and dosage in gametes (&#967;<sup>2 </sup>test, d.f. = 1, <it>P</it>-value &lt; 10<sup>-3</sup>). Furthermore, it is chromosome X that is consistently and most significantly overrepresented in the whole population of TPAs, and in the datasets of TPAs that are conserved in monkey and in mouse (calculations not shown). This finding is in line with the observation that over the course of evolution there has been some extensive gene trafficking to/from the mammalian X chromosome <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>A scatter plot showing the number of conserved TPAs on each human chromosome versus the chromosome size (in Mb)</p>
               </caption>
               <text>
                  <p><b>A scatter plot showing the number of conserved TPAs on each human chromosome versus the chromosome size (in Mb)</b>. Chromosome X is circled. The collective size of the human genome, excluding chromosome X, is 3,098,124,053 bp, and the size of chromosome X after normalizing for chromosome dosage in gametes, as done by Emerson <it>et al</it>. <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, is 116,185,316 bp (<it>i.e</it>., 0.75 times the original size of 154,913,754 bp). Chromosome X harbors 13 transcribed preserved pseudogenes. Since the rest of the genome is about 27 times larger than the normalized chromosome X, we expect 2 or 3 preserved pseudogenes on chromosome X. The presence of 13 cases, more than 4.5 times the expected number, indicates a statistically significant overrepresentation (exact binomial test of goodness-of-fit, <it>P</it>-value &lt; 10<sup>-5</sup>, for the null hypothesis (<it>H</it><sub>0</sub>) that there is no difference between the observed and expected frequencies of transcribed preserved pseudogenes on chromosome X).</p>
               </text>
               <graphic file="1471-2164-10-435-3"/>
            </fig>
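            <p>The arithmetic in the legend of Figure 3 can be reproduced with a short sketch. It assumes that, under the null hypothesis, each of the 68 conserved TPAs falls on chromosome X with probability equal to the normalized X size divided by the sum of the rest-of-genome size and the normalized X size, and it uses a one-sided upper tail. These modelling choices and the variable names are illustrative assumptions, not a description of the authors' exact procedure.</p>

```python
import math

def binom_upper_tail(k, n, p):
    """P(X is at least k) for X ~ Binomial(n, p), by direct summation."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k, n + 1))

# Sizes in base pairs, as quoted in the Figure 3 legend.
REST_OF_GENOME = 3_098_124_053   # human genome excluding chromosome X
CHR_X = 154_913_754              # chromosome X, original size
x_normalized = 0.75 * CHR_X      # dosage-normalized size of chromosome X

TOTAL_CONSERVED = 68             # conserved TPAs genome-wide
OBSERVED_ON_X = 13               # conserved TPAs on chromosome X

# Assumed null model: placement probability proportional to normalized size.
p_x = x_normalized / (REST_OF_GENOME + x_normalized)
expected_on_x = TOTAL_CONSERVED * p_x
fold = OBSERVED_ON_X / expected_on_x
p_value = binom_upper_tail(OBSERVED_ON_X, TOTAL_CONSERVED, p_x)

print(f"size ratio (rest of genome / normalized X): {REST_OF_GENOME / x_normalized:.1f}")
print(f"expected on X: {expected_on_x:.2f}; observed: {OBSERVED_ON_X}; fold: {fold:.1f}")
print(f"one-sided exact binomial P-value: {p_value:.3g}")
```

            <p>With these inputs the expected count falls between 2 and 3, as stated in the legend, and the observed 13 cases exceed it by more than 4.5-fold.</p>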
         </sec>
         <sec>
            <st>
               <p>Analysis of TPAs for selection to maintain them</p>
            </st>
            <p>Human TPAs from the VEGA data set, that have syntenically-conserved orthologs in rhesus monkey, were analyzed for significant selection pressure to maintain them. This was assessed through comparison of the nucleotide percentage sequence identity between orthologs, with the highest nucleotide percentage sequence identity for the immediately flanking genomic regions, as illustrated in Figure <figr fid="F4">4</figr>. We chose the VEGA data set for this analysis, since the genomic coordinates of this pseudogene annotation data set are more precisely annotated.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>A schematic representation of the procedure for identifying syntenic noncoding (flanking) regions</p>
               </caption>
               <text>
                  <p><b>A schematic representation of the procedure for identifying syntenic noncoding (flanking) regions</b>. Flanking regions (10000 nts) are scanned with a sliding window (equal in length to the conceptual human pseudogene transcript), by globally aligning each window to the upstream/downstream regions of the human pseudogene. The best-scoring window is identified, which corresponds to the syntenic (flanking) region in the monkey genome.</p>
               </text>
               <graphic file="1471-2164-10-435-4"/>
            </fig>
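            <p>The sliding-window scan illustrated in Figure 4 can be sketched in simplified form. The sketch below replaces global alignment with ungapped percentage identity so that it stays short and self-contained; that substitution, the toy sequences, and the function names are all assumptions made for illustration, and a real analysis would score each window with a proper global aligner.</p>

```python
def percent_identity(a, b):
    """Ungapped percentage identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def best_window(query, flank):
    """Slide a window of len(query) along flank; return (start, identity) of the best window.

    query: a human flanking segment (hypothetical input).
    flank: the candidate flanking region in the other genome that is scanned.
    Ties are resolved in favour of the leftmost window.
    """
    width = len(query)
    best_start, best_score = -1, -1.0
    for start in range(len(flank) - width + 1):
        score = percent_identity(query, flank[start:start + width])
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score

# Toy example: the query occurs exactly once, at offset 3 of the scanned flank.
start, identity = best_window("ACGT", "TTTACGTTT")
print(start, identity)
```

            <p>In this simplified form the identity of the best window plays the role of the "highest nucleotide percentage sequence identity" for a flanking region, against which the identity of the orthologous TPA pair is compared.</p>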
            <p>Our analysis indicated that TPA orthologs have higher sequence identity than their syntenic flanking regions. Average sequence identities between human and rhesus monkey for the different syntenic regions are as follows: 75.0% (s.d. 12.6%) in the 5' areas, 81.0% (s.d. 13.5%) in the 3' areas, and 86.7% (s.d. 9.6%) in the TPA sequences. The difference in the sequence identities between pseudogenes and the respective flanking regions is statistically significant (Wilcoxon signed rank test, <it>P</it>-value &lt; 5 &#215; 10<sup>-50</sup>). In the majority of cases (~86%, 293/341), the percentage sequence identity between orthologous TPA sequences is greater than that of the flanking regions. This suggests that significant selection pressure exists to maintain them. We note that a similar analysis comparing the human and chimpanzee genomes is not informative because the species are too similar. Comparisons of the human genome with the other mammals in our panel are not informative either, because the appropriate regions cannot be aligned accurately or significantly.</p>
         </sec>
         <sec>
            <st>
               <p>Conserved TPAs tend to be GC rich</p>
            </st>
            <p>We examined whether there existed any sequence feature that distinguished conserved TPAs from the rest of the human pseudogene population. A positive finding would indicate that these pseudogenes are not evolving neutrally. It has been widely observed that genes tend to be GC-rich in comparison to non-transcribed genomic segments <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Here, we examined whether TPAs and annotations of non-transcribed pseudogenes showed any difference in their GC contents relative to their flanking regions (GC<sub>diff</sub>). Pseudogene populations from a variety of organisms have been shown to relax to the composition of intergenic DNA over evolutionary time <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>. Thus, the GC content of neutrally-evolving pseudogenes would be expected to relax to the background genomic GC content. Interestingly, GC content calculations revealed that 84% (327/391) of the human TPAs derived from the VEGA pseudogene data set that are conserved in rhesus monkey have GC content greater than their flanking regions (Table <tblr tid="T2">2</tblr>). This compares to 74% (5289/7154) for the non-transcribed cases (Table <tblr tid="T2">2</tblr>). This difference is statistically significant (&#967;<sup>2 </sup>test, d.f. = 1, <it>P</it>-value &lt; 10<sup>-4</sup>). A similar result is obtained if we examine the whole population of TPAs, or if we examine only conserved transcribed <it>processed </it>pseudogenes. There are, however, no such significant differences for transcribed pseudogenes of the 'nonprocessed' and 'unclassified' pseudogene categories (Table <tblr tid="T2">2</tblr>). This shows that there is a greater tendency for the conserved TPAs to be GC-rich than for non-transcribed cases, and that this tendency arises primarily because of transcription of processed pseudogenes. This finding on GC content is further evidence that such transcribed pseudogenes are not evolving neutrally. Such GC trends can be explained by selection for transcriptional efficiency, as noted above <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
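            <p>The &#967;<sup>2 </sup>comparison of the two proportions quoted above (327/391 versus 5289/7154) can be sketched as a 2 x 2 contingency test with one degree of freedom. The sketch assumes a Pearson statistic without continuity correction; whether the authors applied a correction is not stated, and the function name is hypothetical.</p>

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p_value). For one degree of freedom the upper-tail
    probability of a chi-square value x equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Counts quoted in the text: pseudogenes with GC content above their flanks, and the remainder.
tpa_higher, tpa_total = 327, 391          # transcribed, conserved in rhesus monkey
non_higher, non_total = 5289, 7154        # non-transcribed

stat, p_value = chi2_2x2(tpa_higher, tpa_total - tpa_higher,
                         non_higher, non_total - non_higher)
print(f"chi-square = {stat:.2f}, d.f. = 1, P-value = {p_value:.2g}")
```

            <p>Under these assumptions the P-value falls below 10<sup>-4</sup>, consistent with the significance level reported in the text.</p>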
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Proportions of pseudogenes with GC content higher than that of their flanking regions, for different categories of VEGA-annotated human pseudogenes.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Pseudogene category</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Transcribed and conserved (monkey) pseudogenes</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Non-transcribed pseudogenes</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b><it>P</it>-value for statistical difference*</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Processed</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>165/188 (87.77%)</p>
                     </c>
                     <c ca="center">
                        <p>2473/3274 (75.53%)</p>
                     </c>
                     <c ca="center">
                        <p>&lt; 10<sup>-4</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Nonprocessed</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>9/14 (64.29%)</p>
                     </c>
                     <c ca="center">
                        <p>577/910 (63.40%)</p>
                     </c>
                     <c ca="center">
                        <p>0.7807</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Unclassified</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>153/189 (80.95%)</p>
                     </c>
                     <c ca="center">
                        <p>2239/2970 (75.39%)</p>
                     </c>
                     <c ca="center">
                        <p>0.0961</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Total</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>327/391 (83.63%)</p>
                     </c>
                     <c ca="center">
                        <p>5289/7154 (73.93%)</p>
                     </c>
                     <c ca="center">
                        <p>&lt; 10<sup>-4</sup></p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>* &#967;<sup>2 </sup>test with <it>d.f</it>. = 1.</p>
               </tblfn>
            </tbl>
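            <p>As an illustration of the comparison reported for the Total row of Table 2, the following Python sketch computes a Pearson &#967;<sup>2</sup> statistic for a 2 &#215; 2 table with one degree of freedom. It assumes no continuity correction, which the text does not specify, and the function name 'chi2_2x2' is hypothetical. It is not the authors' code and need not reproduce every <it>P</it>-value listed in the table.</p>

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and P-value
    for the 2x2 table [[a, b], [c, d]] with one degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For d.f. = 1 the chi-square upper tail equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Total row of Table 2: counts with GC content above the flanks, and totals.
tpa_hi, tpa_n = 327, 391
non_hi, non_n = 5289, 7154
chi2, p = chi2_2x2(tpa_hi, tpa_n - tpa_hi, non_hi, non_n - non_hi)
print(f"chi2 = {chi2:.2f}, P = {p:.2e}")
```

            <p>Under these assumptions the Total row yields a <it>P</it>-value below 10<sup>-4</sup>, in line with the threshold reported in the table.</p>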
            <p>We checked whether pseudogene age could account for the GC content differences noted above. To do this, we examined GC<sub>diff</sub> in the following (VEGA) subsets of TPAs: <it>(i) </it>TPAs unique to humans; <it>(ii) </it>TPAs conserved in rhesus monkey only; <it>(iii) </it>TPAs conserved in more divergent mammals such as mouse, rat, dog and cow. We found that 82.5% (381/462) of set (i) have GC<sub>diff</sub> &gt; 0 (<it>i.e.</it>, GC content of the pseudogene greater than that of the flanking region). Similar percentages were observed in the other classes: 84.1% (317/377) in set (ii), and 77.8% (21/27) in set (iii). There was no statistically significant difference in GC<sub>diff</sub> between any two of the classes (&#967;<sup>2 </sup>test, <it>P</it>-value &gt; 0.55), suggesting that pseudogene age does not influence the observed GC content differences.</p>
         </sec>
         <sec>
            <st>
               <p>Ka/Ks trends for TPAs that are conserved in rhesus monkey</p>
            </st>
            <p>We assessed the genomic evidence for a lack of protein-coding ability in the human TPAs that are syntenically conserved in rhesus monkey. We compared TPA characteristics to the characteristics of two other groupings: <it>(i) </it>known human protein-coding genes with orthologs in rhesus monkey; <it>(ii) </it>populations of simulated sequences that are randomly mutating without coding-sequence selection pressures. The human TPA sequences are used as starting sequences for these simulations. The protocol for these simulations is described in '<it>Methods: Ka/Ks ratio calculations</it>'.</p>
            <p>The ratio of non-synonymous to synonymous substitution rates (Ka/Ks) provides a measure of selection pressure for protein-coding ability on nucleotide sequences. Classically, values well below 1.0 indicate purifying selection, whereas sequences without coding ability theoretically yield values near 1.0. We examined the trends for Ka/Ks in the population of human TPAs that are conserved syntenically in the rhesus monkey. Ka/Ks was calculated using the PAML package (as described in <it>Methods</it>). The distribution of Ka/Ks is shown for TPAs, split into two groups: those that have a disrupted protein domain of known three-dimensional structure (TPA<sub>DD</sub>, blue bar, Figure <figr fid="F5">5</figr>), and those that do not (TPA<sub>NDD</sub>, red bar, Figure <figr fid="F5">5</figr>).</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Distributions of Ka/Ks for TPA<sub>DD</sub>s (blue bar), and TPA<sub>NDD</sub>s (red bar)</p>
               </caption>
               <text>
                  <p><b>Distributions of Ka/Ks for TPA<sub>DD</sub>s (blue bar), and TPA<sub>NDD</sub>s (red bar)</b>. Also shown are: the distribution of Ka/Ks for simulated sequences that have randomly mutated without coding-sequence selection pressures (green curve; derived as described in <it>Methods</it>), and the Ka/Ks distribution for orthologous pairs of known protein-coding genes from rhesus monkey and human (blue curve).</p>
               </text>
               <graphic file="1471-2164-10-435-5"/>
            </fig>
            <p>In addition, we calculated the distribution of Ka/Ks values for sequences that are randomly mutating without coding-sequence selection pressures. These sequences were generated using the simulation protocol described in '<it>Methods: Ka/Ks ratio calculations'</it>, using the human TPAs as starting sequences (green curve, Figure <figr fid="F5">5</figr>). The Ka/Ks distribution for these simulated sequences does not peak at ~1.0, as would be classically expected. This is likely due to some inaccuracy in modeling the expected frequency for the different possible nucleotide substitutions, which varies for different genomic areas <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The distribution for TPA<sub>DD</sub>s peaks in the range 0.6 to 1.0. This peak is similar for the randomly-mutating sequences (Figure <figr fid="F5">5</figr>). For the TPA<sub>NDD</sub>s, the peak is at lower Ka/Ks values (0.4-0.6).</p>
            <p>As a further comparison, we have calculated the Ka/Ks curve for orthologous pairs of protein-coding genes from the rhesus monkey and the human (blue curve, Figure <figr fid="F5">5</figr>). Clearly, these protein-coding sequences behave very differently from the TPAs, with a substantial mode in the range 0.0 to 0.2. In summary, these Ka/Ks trends indicate that the substitution patterns in the TPAs generally behave like non-protein-coding sequences, and <it>not </it>like protein-coding ones. This is despite the overall significant conservation relative to surrounding intergenic genomic DNA that was discussed in the previous section.</p>
         </sec>
         <sec>
            <st>
               <p>Analysis of the ratio of non-synonymous to synonymous substitution rates (Ka/Ks) relative to orthologous TPAs in dog and in mouse</p>
            </st>
            <p>To gain a more complete picture, we also examined Ka/Ks values for TPAs that are conserved in two more divergent species, the dog and the mouse. We compared Ka/Ks values for orthologous TPA pairs (termed Ka/Ks<sub><it>&#936;</it>-<it>ortho</it></sub>), with the corresponding Ka/Ks values for their parent genes (Ka/Ks<sub><it>parent</it>-<it>ortho</it></sub>) (Figure <figr fid="F6">6</figr>). These were calculated for human/dog (Figure <figr fid="F6">6(a)</figr>), and human/mouse comparisons (Figure <figr fid="F6">6(b)</figr>). For human/dog comparisons, the substantial majority (83%) have Ka/Ks<sub><it>&#936;</it>-<it>ortho </it></sub>> Ka/Ks<sub><it>parent</it>-<it>ortho</it></sub>, whereas for human/mouse all of the pseudogene pairs have larger Ka/Ks values than their corresponding parent pairs.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Scatter plots showing Ka/Ks ratio comparisons between TPA sequences and their respective orthologous parental protein coding genes for: (a) human/dog comparisons, (b) human/mouse comparisons</p>
               </caption>
               <text>
                  <p><b>Scatter plots showing Ka/Ks ratio comparisons between TPA sequences and their respective orthologous parental protein coding genes for: (a) human/dog comparisons, (b) human/mouse comparisons</b>. Ka/Ks values for TPAs that are significantly lower than the values obtained by simulation without coding-sequence selection pressures are shown as circles; all other Ka/Ks values are shown as squares. TPAs that have a disrupted protein domain of known three-dimensional structure are shown with unfilled symbols; those without such a domain are shown with filled symbols.</p>
               </text>
               <graphic file="1471-2164-10-435-6"/>
            </fig>
            <p>The Ka/Ks results suggest that these transcribed pseudogenes have been relaxing to higher Ka/Ks values since their origination from their parent genes. But why do they not have Ka/Ks values of ~1.0? We suggest that this is chiefly because: <it>(i) </it>there may be some inaccuracy in modelling the expected frequency for the different possible nucleotide substitutions, which varies for different genomic areas (as noted in the previous section); <it>(ii) </it>in some cases, the transcribed pseudogenes were originally protein-coding, and became disabled subsequently in multiple lineages; <it>(iii) </it>some of them maintain an imprint of the original coding sequence because of selection pressure for regulation of homologous genes <it>via </it>antisense interference (<it>e.g.</it>, through genesis of siRNAs); <it>(iv) </it>selection pressures on non-synonymous codon substitution rates in protein-coding genes may have relaxed in the pseudogenes, contributing to an apparent relative increase in Ks; <it>(v) </it>it is also possible that some of these sequences are currently protein-coding, and have evolved through multiple coding-sequence disablements, as discussed previously <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>.</p>
            <p>To examine these data more closely, we calculated whether the Ka/Ks<sub><it>&#936;</it>-<it>ortho </it></sub>values are significantly less than would be expected for mutation without coding-sequence selection pressures (using the simulation analysis described in the <it>Methods </it>section). Several cases with such significant values (which may indicate purifying selection typical of protein-coding sequences) are observed (represented by circles in the Figure <figr fid="F6">6</figr> plots). These Ka/Ks values (which apparently indicate protein-coding ability) may arise for the reasons listed in the preceding paragraph.</p>
            <p>In addition, we examined whether the TPAs contain a protein domain of known three-dimensional structure, that is disabled by a frameshift or a premature stop codon (denoted 'TPA<sub><it>DD</it></sub>s'; see <it>Methods </it>for details of annotation of such domains). The TPA<sub><it>DD</it></sub>s are indicated by unfilled symbols in parts (a) and (b) of Figure <figr fid="F6">6</figr>. Interestingly, in the human-dog comparisons, there are three cases of TPA orthologous pairs that have such a disabled protein domain, despite Ka/Ks values that indicate apparent purifying selection. These sequences are thus of 'intermediate' character, <it>i.e.</it>, they have evidence of both protein-coding ability and pseudogenicity.</p>
         </sec>
         <sec>
            <st>
               <p>Antisense homologies of human pseudogenes to other full-length human cDNAs</p>
            </st>
            <p>Transcribed pseudogenes can regulate the expression of other genes by RNA interference mechanisms. For example, antisense transcribed RNA from a NOS pseudogene regulates the expression of neural nitric oxide synthase (nNOS) protein through formation of an RNA duplex <abbrgrp><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>. Therefore, we investigated how many of the TPAs have antisense homology to the annotated full-length human cDNAs (<it>E</it>-value &lt; 10<sup>-10</sup> and alignment length &#8805; 100 nucleotides). A small proportion (8.3%, 69/828) of the human (VEGA) pseudogenic transcripts have either complete or partial, but significant, reverse complement (antisense) homology to human cDNAs. Of these, 63 have short, strong antisense homology to human cDNAs (alignment length &#8805; 20, mismatches &#8804; 2). However, there is no significant association of such antisense homologies with pseudogene transcription, since non-transcribed pseudogenes have similar levels of antisense homology (7.65%, &#967;<sup>2 </sup>test <it>P</it>-value = 0.5).</p>
            <p>Of the 68 identified human conserved TPAs, 3 have antisense homology to human cDNAs (<it>E</it>-value &lt; 10<sup>-10</sup> and alignment length &#8805; 100 nucleotides; 5 if the alignment length threshold is lowered to &#8805; 50 nucleotides) (Table <tblr tid="T3">3</tblr>). These are cases that may generate small interfering RNAs (siRNAs) that could potentially regulate the expression levels of their homologous genes. Pseudogenes have been implicated in the negative regulation of parental genes (for a review, see ref. <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>) and in the <it>Dicer</it>-mediated generation of small RNAs <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. It would be interesting to verify experimentally whether these pseudogene transcripts can indeed generate small interfering RNAs, through the action of <it>Dicer</it>.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Human conserved TPAs that have antisense homology to human full length cDNAs.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Pseudogene/transcript id</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>ENSEMBL transcript id</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Antisense identity (%)</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Alignment length</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b><it>E</it>-value</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OTTHUMT00000269970</p>
                     </c>
                     <c ca="left">
                        <p>ENST00000323294</p>
                     </c>
                     <c ca="left">
                        <p>94.92</p>
                     </c>
                     <c ca="left">
                        <p>118</p>
                     </c>
                     <c ca="left">
                        <p>7.00e-46</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OTTHUMT00000270027</p>
                     </c>
                     <c ca="left">
                        <p>ENST00000379565</p>
                     </c>
                     <c ca="left">
                        <p>87.59</p>
                     </c>
                     <c ca="left">
                        <p>282</p>
                     </c>
                     <c ca="left">
                        <p>2.00e-70</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>chr10_Q96RG0.4_-</p>
                     </c>
                     <c ca="left">
                        <p>ENST00000315032</p>
                     </c>
                     <c ca="left">
                        <p>83.23</p>
                     </c>
                     <c ca="left">
                        <p>161</p>
                     </c>
                     <c ca="left">
                        <p>1.00e-17</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>urn:lsid:pseudogene.org:9606.Pseudogene:18315</p>
                     </c>
                     <c ca="left">
                        <p>ENST00000344386</p>
                     </c>
                     <c ca="left">
                        <p>96.3</p>
                     </c>
                     <c ca="left">
                        <p>81</p>
                     </c>
                     <c ca="left">
                        <p>3.00e-31</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OTTHUMT00000082689</p>
                     </c>
                     <c ca="left">
                        <p>ENST00000343936</p>
                     </c>
                     <c ca="left">
                        <p>91.38</p>
                     </c>
                     <c ca="left">
                        <p>58</p>
                     </c>
                     <c ca="left">
                        <p>2.00e-12</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Small RNA mappings to pseudogenes</p>
            </st>
            <p>Transcribed pseudogenes can also regulate the transcription of genes by producing siRNAs that have antisense homology <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Because genome-wide human siRNA data were unavailable, we used the siRNA data for the mouse genome from Tam <it>et al</it>. <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> and Watanabe <it>et al</it>. <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> to check how many of the mouse transcribed pseudogenes that we identified have small RNAs mapping to them. Interestingly, 24 out of 136 (17.6%) mouse TPAs had siRNA mappings, compared to ~1% (178/18168) of the total mouse pseudogenes. This difference is statistically significant (<it>P</it>-value &lt; 0.05, using normal statistics for the distribution of the mean number of transcribed pseudogenes in a sample of 136 cases). This indicates that transcribed pseudogenes are significantly more likely than pseudogenes in general to generate siRNAs in mouse. For comparison, in <it>Arabidopsis thaliana</it>, ~40% of 572 pseudogenes have small RNA mappings <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>.</p>
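            <p>One possible reading of the 'normal statistics' comparison above is a normal approximation to the binomial count of pseudogenes with siRNA mappings in a sample of 136, with the genome-wide fraction (178/18168) taken as the expected rate. That reading is an assumption, as is the function name 'enrichment_z'; the exact procedure is not given in the text. A minimal Python sketch under that assumption:</p>

```python
import math

def enrichment_z(hits, sample_size, background_hits, background_size):
    """z-score and one-sided P-value for observing `hits` in a sample,
    given the background rate, via a normal approximation to the binomial."""
    p0 = background_hits / background_size
    expected = sample_size * p0
    sd = math.sqrt(sample_size * p0 * (1.0 - p0))
    z = (hits - expected) / sd
    # Upper-tail probability of a standard normal variable.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value

# 24 of 136 mouse TPAs versus 178 of 18168 mouse pseudogenes overall.
z, p_value = enrichment_z(24, 136, 178, 18168)
print(f"z = {z:.1f}, one-sided P = {p_value:.3g}")
```

            <p>Because the expected count here is only about 1.3, the normal approximation is rough, and an exact binomial tail would be a more cautious choice for such a comparison.</p>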
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In this study, we identified hundreds of cases of putative transcribed pseudogene annotations (TPAs) in the human genome. Importantly, we detected evidence for selection pressure on these transcribed elements. These findings therefore draw wider attention to the potential functionality of these genomic elements. In addition, we found that 68 human TPAs are conserved in at least two other studied mammals. These human TPAs have ancient origins dating back more than 120 million years, as evidenced by their conservation patterns across distantly related mammals. These pseudogenes represent novel genomic elements of potential functional relevance.</p>
         <p>We have shown that human TPAs that are syntenically conserved in rhesus monkey generally behave like non-protein-coding sequences, despite significant selection pressure on them, relative to the surrounding genomic DNA. Examination of Ka/Ks values for TPAs that are conserved in more divergent species (mouse and dog), indicated that some TPAs might actually be protein-coding. However, we cannot rule out other reasons for these low Ka/Ks values. For example, it is possible that some of these sequences had phases of protein-coding ability at some evolutionary stage. Also, it is possible that there is an imprint of purifying selection on these sequences because of selection pressure to form small interfering RNAs with homologous protein-coding genes. Ultimately, these questions can only be answered by detailed experimental characterization of these molecules; our analysis here provides a rich data source for prioritizing likely candidates of functional importance as transcribed pseudogenes.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>1. Collection of data</p>
            </st>
            <p>Complete genome sequences of mammals were obtained from <url>http://www.ensembl.org</url> (Ensembl release 47 for human genome; Ensembl release 48 for other mammals, namely, rhesus monkey, mouse, rat, cow and dog). Pseudogene annotations for both the processed and nonprocessed categories were obtained from [<url>http://www.pseudogene.org</url>; <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>]; VEGA pseudogenes from <url>http://vega.sanger.ac.uk/</url>; disrupted mRNAs (dmRNAs) from Harrison and Yu <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>; and other transcribed processed pseudogenes from Harrison <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The Blastx program <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> was used to determine the parent protein coding genes for VEGA pseudogenes (using <it>E</it>-value &lt; 10<sup>-9</sup> as the significance threshold), whereas for the other datasets the annotations were readily available at the respective websites mentioned above.</p>
         </sec>
         <sec>
            <st>
               <p>2. Screening for putative transcribed pseudogenes</p>
            </st>
            <p>Transcription data for human and mouse were taken from the RefSeq database <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, the Unigene database at the NCBI <url>http://www.ncbi.nlm.nih.gov</url>, the H-InvDB database <url>http://www.h-invitational.jp/</url> and the Fantom3 database <url>http://fantom3.gsc.riken.jp/</url>. To identify putative transcribed pseudogenes, individual transcript sequences were mapped onto the respective genomes using the GMAP software <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> with match criteria of >99% sequence identity and >99% sequence coverage. Transcript sequences that mapped to pseudogenes were aligned to the parent protein sequences of the respective pseudogenes to identify disablements such as frameshifts or premature stop codons, using the 'GeneWise' program (Wise2 - version 2.1.20 package downloaded from the European Bioinformatics Institute, <url>http://www.ebi.ac.uk/Tools/Wise2/index.html</url>) <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. The disablement positions in pseudogenes and transcript sequences were then used as 'anchors' to confirm the transcription of pseudogenes as in previous analyses <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B43">43</abbr></abbrgrp>. Additional data file (file 1) contains the list of transcribed human pseudogenes. For a schematic representation of the annotation pipeline, see Figure <figr fid="F7">7</figr>.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Annotation pipeline for human transcribed and conserved pseudogenes</p>
               </caption>
               <text>
                  <p><b>Annotation pipeline for human transcribed and conserved pseudogenes</b>. (Note: 'dmRNA' represents disrupted mRNA dataset from ref. <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, 'tppg' represents transcribed processed pseudogenes from ref. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.)</p>
               </text>
               <graphic file="1471-2164-10-435-7"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>3. Identification of orthologous pseudogenes in various sequenced mammalian genomes</p>
            </st>
            <p>Orthologous counterparts to a human pseudogene are detected by the presence of a homologous sequence at the syntenic position in the other mammalian genome. Based on this criterion, a search was carried out within 100 kb of the exact syntenic coordinate in the target mammal, as indicated in the synteny maps, to locate the orthologous pseudogenes (a window is used because genes can shuffle locally). The 'GeneWise' tool <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> was used to align the genomic DNA sequence thus obtained with the human parental protein sequence, and to detect disablements in the alignment. The following mammals were included in the analysis: monkey, mouse, rat, cow and dog. The pairwise synteny map data for the various mammals were obtained from <url>http://genome.ucsc.edu/</url>.</p>
         </sec>
         <sec>
            <st>
               <p>4. Analysis for pseudogene sequence conservation</p>
            </st>
            <p>Flanking sequences 5' and 3' of human pseudogenes were individually obtained, of length equal to the length of the human pseudogene, and were each globally aligned using the 'needle' module of the EMBOSS package <url>http://www.ebi.ac.uk</url> to the corresponding flanking region sequences (10000 nucleotides 5' and 3') of monkey in a sliding window of size also equal to the length of the human pseudogene. The window giving the best identity score was taken as the optimal alignment between the flanking regions, representing syntenic regions. The Wilcoxon signed rank test was used to assess the statistical significance of the difference between the degree of homology calculated between two orthologous pseudogenes and that between the respective (orthologous) flanking regions. Cases with pairwise sequence identities &lt;40% were excluded.</p>
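            <p>The sliding-window search described above can be sketched in Python as follows. For brevity the sketch scores each window by ungapped percent identity, whereas the analysis itself used global alignment with EMBOSS 'needle'; that substitution and the function names 'identity' and 'best_window' are assumptions made for illustration only.</p>

```python
def identity(a, b):
    """Percent identity of two equal-length sequences, compared without gaps."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

def best_window(query, target):
    """Slide a window of len(query) along target.

    Returns (best_identity, offset) for the first window with the highest
    ungapped identity, or (-1.0, -1) if target is shorter than query.
    """
    width = len(query)
    best = (-1.0, -1)
    for offset in range(len(target) - width + 1):
        score = identity(query, target[offset:offset + width])
        if score > best[0]:
            best = (score, offset)
    return best

# Toy example: a 4-nt human flank searched within a 12-nt monkey flank.
print(best_window("ACGT", "TTTTACGTTTTT"))
```

            <p>In the setting described above, the query would be a human flanking sequence of pseudogene length and the target the 10000-nucleotide monkey flanking region.</p>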
         </sec>
         <sec>
            <st>
               <p>5. Analysis of lengths and GC percentage of pseudogenes and their flanking regions</p>
            </st>
            <p>For sequence length and GC percentage calculations, only the exonic segments of pseudogenes were considered. One thousand nucleotides upstream and downstream of a pseudogene were considered as flanking regions. GC content is calculated as the sum of guanine and cytosine nucleotides divided by the total number of nucleotides, expressed as a percentage.</p>
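            <p>A minimal Python sketch of this calculation is given below. It assumes 0-based, half-open exon coordinates on a chromosome string, pools the upstream and downstream flanks into a single flanking GC value, and counts every character (including any ambiguous bases) in the denominator. These choices, and the names 'gc_percent' and 'gc_diff', are illustrative assumptions rather than details stated in the text.</p>

```python
def gc_percent(seq):
    """GC content of seq as a percentage of all its nucleotides."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def gc_diff(genome, exons, flank=1000):
    """GC% of the concatenated exons minus GC% of the pooled flanking regions.

    genome: chromosome sequence; exons: sorted list of (start, end) intervals,
    0-based and half-open, for a single pseudogene.
    """
    exonic = "".join(genome[s:e] for s, e in exons)
    first_start, last_end = exons[0][0], exons[-1][1]
    upstream = genome[max(0, first_start - flank):first_start]
    downstream = genome[last_end:last_end + flank]
    return gc_percent(exonic) - gc_percent(upstream + downstream)

# Toy example: a GC-only 'pseudogene' embedded in AT-only flanks.
toy = "AT" * 5 + "GGCC" + "AT" * 5
print(gc_diff(toy, [(10, 14)], flank=10))
```

            <p>A positive return value corresponds to GC<sub>diff</sub> &gt; 0, that is, a pseudogene that is more GC-rich than its flanking regions.</p>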
         </sec>
         <sec>
            <st>
               <p>6. Ka/Ks ratio calculations</p>
            </st>
            <p>'PAL2NAL' <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> was used to construct codon alignments between protein sequences (conceptual amino acid translation sequences in the case of pseudogenes) and corresponding DNA sequences, separately, for orthologous pseudogenes and parental protein coding genes. The 'PAML 4' package <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> was used to calculate Ka/Ks ratios. Orthologs of human parental protein coding genes were identified using an approach similar to that used for pseudogene orthologs discussed above, and were also obtained from the Ensembl database.</p>
            <p>We derived a simulation protocol to calculate Ka/Ks values for evolution without coding-sequence selection pressures. This simulation protocol is as follows: <it>(i) </it>the nucleotide distance (D<sub>nt</sub>) between a sequence and its ortholog was calculated, using the program DNADIST <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>; <it>(ii) </it>for each sequence, samples of 500 simulated sequences were generated, by randomly mutating the human sequence until the D<sub>nt </sub>value was reached; <it>(iii) </it>Ka/Ks was calculated using PAML <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, for each simulated sequence compared to the original human sequence; <it>(iv) </it>those original human sequences whose Ka/Ks values are lower than those of 95% of the corresponding simulated sequences were labeled as potentially under significant purifying selection. For these simulations, all Ka/Ks calculations are performed on the longest ORF in the sequence.</p>
            <p>We also analysed simulated distributions of Ka/Ks for populations of sequences mutating without coding-sequence selection pressures, starting from the human TPA sequences. These were derived simply by merging the simulated distributions of Ka/Ks for each individual TPA.</p>
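            <p>Steps (ii) and (iv) of the simulation protocol can be sketched in Python as below. The sketch makes several assumptions that are not in the text: the distance is taken as the simple proportion of differing sites rather than the model-based DNADIST distance, all substitutions are drawn uniformly, sequences contain only uppercase A, C, G and T, and the Ka/Ks values are supplied externally (for example from PAML), since the Ka/Ks calculation itself is not reproduced here. The function names are hypothetical.</p>

```python
import random

def mutate_to_distance(seq, target, rng):
    """Apply random point substitutions to seq until the proportion of sites
    differing from the original reaches target (a simple p-distance)."""
    if target > 0.7 or 0.0 > target:
        raise ValueError("target distance must lie between 0 and 0.7")
    bases = "ACGT"
    current = list(seq)
    needed = target * len(seq)
    diffs = 0
    while needed > diffs:
        i = rng.randrange(len(current))
        was_diff = current[i] != seq[i]
        # Draw a base different from the one currently at this site.
        current[i] = rng.choice([b for b in bases if b != current[i]])
        now_diff = current[i] != seq[i]
        diffs += int(now_diff) - int(was_diff)
    return "".join(current)

def below_simulated(observed, simulated, fraction=0.95):
    """True if the observed Ka/Ks is lower than at least `fraction` of the
    simulated Ka/Ks values."""
    higher = sum(1 for value in simulated if value > observed)
    return higher >= fraction * len(simulated)

rng = random.Random(0)
start = "ACGT" * 50
sample = [mutate_to_distance(start, 0.1, rng) for _ in range(5)]
print(len(sample), below_simulated(0.2, [0.5] * 96 + [0.1] * 4))
```

            <p>In the protocol described above, 500 such sequences would be generated per TPA, and the simulated Ka/Ks values for all TPAs could then be merged to give the population-level distribution.</p>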
         </sec>
         <sec>
            <st>
               <p>7. Annotation of disrupted protein domains</p>
            </st>
            <p>Protein domains were assigned to the TPAs, using protein structure domain sequences downloaded from the ASTRAL SCOP database <url>http://astral.berkeley.edu</url>, as described previously <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Protein domain sequences were aligned to the TPA nucleotide sequences to assess for disablement by a frameshift or premature stop codon at least 15 amino acids from the end of the aligned subsequence. Disablements were required to be detected both by blast/bl2seq and by the TFASTX program <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B39">39</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>8. Antisense homology</p>
            </st>
            <p>Transcribed human pseudogenes were aligned to full-length annotated human cDNAs to check for antisense homology, using the sequence-searching program BlastN from the BLAST package (<it>E</it>-value &lt; 10<sup>-10</sup>).</p>
         </sec>
         <sec>
            <st>
               <p>9. Small RNA (siRNA) mapping</p>
            </st>
            <p>siRNAs have previously been determined in the mouse genome <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Using these data, we mapped the siRNA sequences onto the mouse genome using the GMAP software <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, and counted how many of them overlap with the annotations of transcribed mouse pseudogenes.</p>
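            <p>The overlap check can be sketched as a simple interval comparison. The following Python fragment is only an illustration under stated assumptions: mapped siRNAs and pseudogene annotations are represented as (chromosome, start, end) tuples with inclusive coordinates, strand is ignored, and the function names are hypothetical rather than part of GMAP or of the original pipeline.</p>

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base
    (assumption: inclusive coordinates, strand ignored)."""
    return a[0] == b[0] and min(a[2], b[2]) >= max(a[1], b[1])

def count_overlapping(sirnas, pseudogenes):
    """Count mapped siRNA loci that overlap at least one pseudogene annotation."""
    return sum(any(overlaps(s, p) for p in pseudogenes) for s in sirnas)

if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    sirnas = [("chr1", 100, 121), ("chr1", 500, 521), ("chr2", 100, 121)]
    pseudogenes = [("chr1", 110, 300)]
    print(count_overlapping(sirnas, pseudogenes))
```

            <p>For genome-scale data, an all-against-all comparison of this kind would normally be replaced by sorted intervals or an interval tree; the pairwise form is kept here for clarity.</p>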
         </sec>
         <sec>
            <st>
               <p>10. Phylogenetic analysis</p>
            </st>
            <p>Orthologous sequences of the human transcribed ADP-ribose pyrophosphatase pseudogene (urn:lsid:pseudogene.org:9606.Pseudogene:4346; see Table <tblr tid="T1">1</tblr>) were obtained from the mammals studied and aligned using the online ClustalW tool <url>http://www.ebi.ac.uk/clustalw/</url>. The most conserved segment, corresponding to positions 257-396 of the human pseudogene, was used for the phylogenetic analysis. A phylogenetic tree was constructed using the 'PHYLIP' software <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. The tree was evaluated statistically using 1000 bootstrap replicates and visualized using the 'NJplot' tool <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>.</p>
         </sec>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>A.N.K. and P.M.H. gratefully acknowledge funding support from the Natural Sciences and Engineering Research Council of Canada (NSERC), and from Les Fonds Qu&#233;b&#233;cois de la Recherche sur la Nature et les Technologies (FQRNT).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Studying genomes through the aeons: protein families, pseudogenes and proteome evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Harrison</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>318</volume>
            <issue>5</issue>
            <fpage>1155</fpage>
            <lpage>1174</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(02)00109-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">12083509</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Pseudogenes: are they "junk" or functional DNA?</p>
            </title>
            <aug>
               <au>
                  <snm>Balakirev</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Ayala</snm>
                  <fnm>FJ</fnm>
               </au>
            </aug>
            <source>Annual review of genetics</source>
            <pubdate>2003</pubdate>
            <volume>37</volume>
            <fpage>123</fpage>
            <lpage>151</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genet.37.040103.103949</pubid>
                  <pubid idtype="pmpid" link="fulltext">14616058</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>12</issue>
            <fpage>2541</fpage>
            <lpage>2558</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.1429003</pubid>
                  <pubid idtype="pmcid">403796</pubid>
                  <pubid idtype="pmpid" link="fulltext">14656962</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability</p>
            </title>
            <aug>
               <au>
                  <snm>Harrison</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Carriero</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>8</issue>
            <fpage>2374</fpage>
            <lpage>2383</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gki531</pubid>
                  <pubid idtype="pmpid" link="fulltext">15860774</pubid>
                  <pubid idtype="pmcid">1087782</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Analysis of the role of retrotransposition in gene evolution in vertebrates</p>
            </title>
            <aug>
               <au>
                  <snm>Yu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Morais</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ivanga</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>PM</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>308</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-8-308</pubid>
                  <pubid idtype="pmcid">2048973</pubid>
                  <pubid idtype="pmpid" link="fulltext">17718914</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22</p>
            </title>
            <aug>
               <au>
                  <snm>Harrison</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Hegyi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Balasubramanian</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Luscombe</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Bertone</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Echols</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>2</issue>
            <fpage>272</fpage>
            <lpage>280</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.207102</pubid>
                  <pubid idtype="pmcid">155275</pubid>
                  <pubid idtype="pmpid" link="fulltext">11827946</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Abundant adrenal-specific transcription of the human P450c21A "pseudogene"</p>
            </title>
            <aug>
               <au>
                  <snm>Bristow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gitelman</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Tee</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Staels</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1993</pubdate>
            <volume>268</volume>
            <issue>17</issue>
            <fpage>12919</fpage>
            <lpage>12924</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7685353</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>An expressed pseudogene regulates the messenger-RNA stability of its homologous coding gene</p>
            </title>
            <aug>
               <au>
                  <snm>Hirotsune</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yoshida</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Garrett</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Sugiyama</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Takahashi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yagami</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Wynshaw-Boris</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yoshiki</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>423</volume>
            <issue>6935</issue>
            <fpage>91</fpage>
            <lpage>96</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01535</pubid>
                  <pubid idtype="pmpid" link="fulltext">12721631</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The structure of eight distinct cloned human leukocyte interferon cDNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Goeddel</snm>
                  <fnm>DV</fnm>
               </au>
               <au>
                  <snm>Leung</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Dull</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Gross</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lawn</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>McCandliss</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Seeburg</snm>
                  <fnm>PH</fnm>
               </au>
               <au>
                  <snm>Ullrich</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yelverton</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Gray</snm>
                  <fnm>PW</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1981</pubdate>
            <volume>290</volume>
            <issue>5801</issue>
            <fpage>20</fpage>
            <lpage>26</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/290020a0</pubid>
                  <pubid idtype="pmpid">6163083</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Isolation and characterization of rat and human glyceraldehyde-3-phosphate dehydrogenase cDNAs: genomic complexity and molecular evolution of the gene</p>
            </title>
            <aug>
               <au>
                  <snm>Tso</snm>
                  <fnm>JY</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>XH</fnm>
               </au>
               <au>
                  <snm>Kao</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Reece</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1985</pubdate>
            <volume>13</volume>
            <issue>7</issue>
            <fpage>2485</fpage>
            <lpage>2502</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/13.7.2485</pubid>
                  <pubid idtype="pmcid">341170</pubid>
                  <pubid idtype="pmpid" link="fulltext">2987855</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>High level transcription of the glucocerebrosidase pseudogene in normal subjects and patients with Gaucher disease</p>
            </title>
            <aug>
               <au>
                  <snm>Sorge</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gross</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>West</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Beutler</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>J Clin Invest</source>
            <pubdate>1990</pubdate>
            <volume>86</volume>
            <issue>4</issue>
            <fpage>1137</fpage>
            <lpage>1141</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1172/JCI114818</pubid>
                  <pubid idtype="pmcid">296842</pubid>
                  <pubid idtype="pmpid" link="fulltext">1698821</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Detection of a functional promoter/enhancer in an intron-less human gene encoding a glutamine synthetase-like enzyme</p>
            </title>
            <aug>
               <au>
                  <snm>Chakrabarti</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>McCracken</snm>
                  <fnm>JB</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Chakrabarti</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Souba</snm>
                  <fnm>WW</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>1995</pubdate>
            <volume>153</volume>
            <issue>2</issue>
            <fpage>163</fpage>
            <lpage>199</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0378-1119(94)00751-D</pubid>
                  <pubid idtype="pmpid" link="fulltext">7875583</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Transcriptional analysis of the PTEN/MMAC1 pseudogene, psiPTEN</p>
            </title>
            <aug>
               <au>
                  <snm>Fujii</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Morimoto</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Berson</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Bolen</snm>
                  <fnm>JB</fnm>
               </au>
            </aug>
            <source>Oncogene</source>
            <pubdate>1999</pubdate>
            <volume>18</volume>
            <issue>9</issue>
            <fpage>1765</fpage>
            <lpage>1769</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.onc.1202492</pubid>
                  <pubid idtype="pmpid" link="fulltext">10208437</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability</p>
            </title>
            <aug>
               <au>
                  <snm>Harrison</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Carriero</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>2374</fpage>
            <lpage>2383</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gki531</pubid>
                  <pubid idtype="pmcid">1087782</pubid>
                  <pubid idtype="pmpid" link="fulltext">15860774</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Integrated pseudogene annotation for human chromosome 22: evidence for transcription</p>
            </title>
            <aug>
               <au>
                  <snm>Zheng</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Karro</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Carriero</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>349</volume>
            <issue>1</issue>
            <fpage>27</fpage>
            <lpage>45</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2005.02.072</pubid>
                  <pubid idtype="pmpid" link="fulltext">15876366</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Frame disruptions in human mRNA transcripts, and their relationship with splicing and protein structures</p>
            </title>
            <aug>
               <au>
                  <snm>Harrison</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>371</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2164-8-371</pubid>
                  <pubid idtype="pmcid">2194788</pubid>
                  <pubid idtype="pmpid" link="fulltext">17937804</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Evolutionary fate of retroposed gene copies in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Vinckenbosch</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Dupanloup</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kaessmann</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2006</pubdate>
            <volume>103</volume>
            <issue>9</issue>
            <fpage>3220</fpage>
            <lpage>3225</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0511307103</pubid>
                  <pubid idtype="pmcid">1413932</pubid>
                  <pubid idtype="pmpid" link="fulltext">16492757</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes</p>
            </title>
            <aug>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Totoki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Toyoda</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kaneda</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kuramochi-Miyagawa</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Obata</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Chiba</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kohara</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kono</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nakano</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2008</pubdate>
            <volume>453</volume>
            <issue>7194</issue>
            <fpage>539</fpage>
            <lpage>543</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature06908</pubid>
                  <pubid idtype="pmpid" link="fulltext">18404146</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes</p>
            </title>
            <aug>
               <au>
                  <snm>Tam</snm>
                  <fnm>OH</fnm>
               </au>
               <au>
                  <snm>Aravin</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Stein</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Girard</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Murchison</snm>
                  <fnm>EP</fnm>
               </au>
               <au>
                  <snm>Cheloufi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hodges</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Anger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sachidanandam</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>RM</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2008</pubdate>
            <volume>453</volume>
            <issue>7194</issue>
            <fpage>534</fpage>
            <lpage>538</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature06904</pubid>
                  <pubid idtype="pmpid" link="fulltext">18404147</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Genome-wide survey for biologically functional pseudogenes</p>
            </title>
            <aug>
               <au>
                  <snm>Svensson</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Arvestad</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lagergren</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>5</issue>
            <fpage>e46</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pcbi.0020046</pubid>
                  <pubid idtype="pmcid">1456316</pubid>
                  <pubid idtype="pmpid" link="fulltext">16680195</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Ripples from neighbouring transcription</p>
            </title>
            <aug>
               <au>
                  <snm>Ebisuya</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yamamoto</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nakajima</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nishida</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nat Cell Biol</source>
            <pubdate>2008</pubdate>
            <volume>10</volume>
            <issue>9</issue>
            <fpage>1106</fpage>
            <lpage>1113</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ncb1771</pubid>
                  <pubid idtype="pmpid">19160492</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Human specific loss of olfactory receptor genes</p>
            </title>
            <aug>
               <au>
                  <snm>Gilad</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Man</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Paabo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lancet</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <issue>6</issue>
            <fpage>3324</fpage>
            <lpage>3327</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.0535697100</pubid>
                  <pubid idtype="pmcid">152291</pubid>
                  <pubid idtype="pmpid" link="fulltext">12612342</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>10</issue>
            <fpage>1466</fpage>
            <lpage>1482</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.331902</pubid>
                  <pubid idtype="pmcid">187539</pubid>
                  <pubid idtype="pmpid" link="fulltext">12368239</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A genome-wide survey of human thioredoxin and glutaredoxin family pseudogenes</p>
            </title>
            <aug>
               <au>
                  <snm>Spyrou</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Padilla</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Holmgren</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Miranda-Vizuete</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Hum Genet</source>
            <pubdate>2001</pubdate>
            <volume>109</volume>
            <issue>4</issue>
            <fpage>429</fpage>
            <lpage>439</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s004390100597</pubid>
                  <pubid idtype="pmpid" link="fulltext">11702225</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>The human ABC transporter pseudogene family: Evidence for transcription and gene-pseudogene interference</p>
            </title>
            <aug>
               <au>
                  <snm>Piehler</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Hellum</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wenzel</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Kaminski</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Haug</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Kierulf</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kaminski</snm>
                  <fnm>WE</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>165</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2164-9-165</pubid>
                  <pubid idtype="pmcid">2329642</pubid>
                  <pubid idtype="pmpid" link="fulltext">18405356</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>hsp70 genes in the human genome: Conservation and differentiation patterns predict a wide array of overlapping and specialized functions</p>
            </title>
            <aug>
               <au>
                  <snm>Brocchieri</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Conway de Macario</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Macario</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2008</pubdate>
            <volume>8</volume>
            <fpage>19</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2148-8-19</pubid>
                  <pubid idtype="pmcid">2266713</pubid>
                  <pubid idtype="pmpid" link="fulltext">18215318</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Parallel adaptive radiations in two major clades of placental mammals</p>
            </title>
            <aug>
               <au>
                  <snm>Madsen</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Scally</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Douady</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Kao</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>DeBry</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Adkins</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Amrine</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Stanhope</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>de Jong</snm>
                  <fnm>WW</fnm>
               </au>
               <au>
                  <snm>Springer</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <issue>6820</issue>
            <fpage>610</fpage>
            <lpage>614</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35054544</pubid>
                  <pubid idtype="pmpid" link="fulltext">11214318</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Molecular phylogenetics and the origins of placental mammals</p>
            </title>
            <aug>
               <au>
                  <snm>Murphy</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Eizirik</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>WE</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>YP</fnm>
               </au>
               <au>
                  <snm>Ryder</snm>
                  <fnm>OA</fnm>
               </au>
               <au>
                  <snm>O'Brien</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>409</volume>
            <issue>6820</issue>
            <fpage>614</fpage>
            <lpage>618</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35054550</pubid>
                  <pubid idtype="pmpid" link="fulltext">11214319</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>A phylogenomic study of human, dog, and mouse</p>
            </title>
            <aug>
               <au>
                  <snm>Cannarozzi</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gonnet</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <issue>1</issue>
            <fpage>e2</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pcbi.0030002</pubid>
                  <pubid idtype="pmcid">1761043</pubid>
                  <pubid idtype="pmpid" link="fulltext">17206860</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Extensive gene traffic on the mammalian X chromosome</p>
            </title>
            <aug>
               <au>
                  <snm>Emerson</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Kaessmann</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Betran</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>303</volume>
            <issue>5657</issue>
            <fpage>537</fpage>
            <lpage>540</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1090042</pubid>
                  <pubid idtype="pmpid" link="fulltext">14739461</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>The sequence of the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Mural</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Yandell</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Evans</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Holt</snm>
                  <fnm>RA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>291</volume>
            <issue>5507</issue>
            <fpage>1304</fpage>
            <lpage>1351</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1058040</pubid>
                  <pubid idtype="pmpid" link="fulltext">11181995</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Amelioration of bacterial genomes: rates of change and exchange</p>
            </title>
            <aug>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1997</pubdate>
            <volume>44</volume>
            <issue>4</issue>
            <fpage>383</fpage>
            <lpage>397</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/PL00006158</pubid>
                  <pubid idtype="pmpid" link="fulltext">9089078</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes</p>
            </title>
            <aug>
               <au>
                  <snm>Echols</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Balasubramanian</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Luscombe</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Bertone</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <issue>11</issue>
            <fpage>2515</fpage>
            <lpage>2523</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/30.11.2515</pubid>
                  <pubid idtype="pmcid">117176</pubid>
                  <pubid idtype="pmpid" link="fulltext">12034841</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Neuronal expression of neural nitric oxide synthase (nNOS) protein is suppressed by an antisense RNA transcribed from an NOS pseudogene</p>
            </title>
            <aug>
               <au>
                  <snm>Korneev</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Park</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>O'Shea</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Neurosci</source>
            <pubdate>1999</pubdate>
            <volume>19</volume>
            <issue>18</issue>
            <fpage>7711</fpage>
            <lpage>7720</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10479675</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Timed and targeted differential regulation of nitric oxide synthase (NOS) and anti-NOS genes by reward conditioning leading to long-term memory formation</p>
            </title>
            <aug>
               <au>
                  <snm>Korneev</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Straub</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Kemenes</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Korneeva</snm>
                  <fnm>EI</fnm>
               </au>
               <au>
                  <snm>Ott</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Benjamin</snm>
                  <fnm>PR</fnm>
               </au>
               <au>
                  <snm>O'Shea</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Neurosci</source>
            <pubdate>2005</pubdate>
            <volume>25</volume>
            <issue>5</issue>
            <fpage>1188</fpage>
            <lpage>1192</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1523/JNEUROSCI.4671-04.2005</pubid>
                  <pubid idtype="pmpid" link="fulltext">15689555</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Elucidation of the small RNA component of the transcriptome</p>
            </title>
            <aug>
               <au>
                  <snm>Lu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Tej</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Luo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Haudenschild</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Meyers</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>PJ</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>309</volume>
            <issue>5740</issue>
            <fpage>1567</fpage>
            <lpage>1569</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1114112</pubid>
                  <pubid idtype="pmpid" link="fulltext">16141074</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>PseudoPipe: an automated pseudogene identification pipeline</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Carriero</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Karro</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>12</issue>
            <fpage>1437</fpage>
            <lpage>1439</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl116</pubid>
                  <pubid idtype="pmpid" link="fulltext">16574694</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Karro</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Yan</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Carriero</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Cayting</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <issue>35 Database</issue>
            <fpage>D55</fpage>
            <lpage>60</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkl851</pubid>
                  <pubid idtype="pmcid">1669708</pubid>
                  <pubid idtype="pmpid" link="fulltext">17099229</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <issue>3</issue>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2231712</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <issue>35 Database</issue>
            <fpage>D61</fpage>
            <lpage>65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkl842</pubid>
                  <pubid idtype="pmcid">1716718</pubid>
                  <pubid idtype="pmpid" link="fulltext">17130148</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>GMAP: a genomic mapping and alignment program for mRNA and EST sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Wu</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>CK</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>9</issue>
            <fpage>1859</fpage>
            <lpage>1875</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti310</pubid>
                  <pubid idtype="pmpid" link="fulltext">15728110</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>GeneWise and Genomewise</p>
            </title>
            <aug>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>5</issue>
            <fpage>988</fpage>
            <lpage>995</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.1865504</pubid>
                  <pubid idtype="pmcid">479130</pubid>
                  <pubid idtype="pmpid" link="fulltext">15123596</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>The evolutionary fate of MULE-mediated duplications of host gene fragments in rice</p>
            </title>
            <aug>
               <au>
                  <snm>Juretic</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Hoen</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Huynh</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Bureau</snm>
                  <fnm>TE</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>9</issue>
            <fpage>1292</fpage>
            <lpage>1297</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.4064205</pubid>
                  <pubid idtype="pmcid">1199544</pubid>
                  <pubid idtype="pmpid" link="fulltext">16140995</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments</p>
            </title>
            <aug>
               <au>
                  <snm>Suyama</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Torrents</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <issue>34 Web Server</issue>
            <fpage>W609</fpage>
            <lpage>612</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkl315</pubid>
                  <pubid idtype="pmcid">1538804</pubid>
                  <pubid idtype="pmpid" link="fulltext">16845082</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>PAML 4: phylogenetic analysis by maximum likelihood</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2007</pubdate>
            <volume>24</volume>
            <issue>8</issue>
            <fpage>1586</fpage>
            <lpage>1591</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msm088</pubid>
                  <pubid idtype="pmpid" link="fulltext">17483113</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>PHYLIP - Phylogeny Inference Package (Version 3.2)</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Cladistics</source>
            <pubdate>1989</pubdate>
            <volume>5</volume>
            <fpage>164</fpage>
            <lpage>166</lpage>
         </bibl>
         <bibl id="B47">
            <title>
               <p>WWW-query: an on-line retrieval system for biological sequence banks</p>
            </title>
            <aug>
               <au>
                  <snm>Perriere</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gouy</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Biochimie</source>
            <pubdate>1996</pubdate>
            <volume>78</volume>
            <issue>5</issue>
            <fpage>364</fpage>
            <lpage>369</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0300-9084(96)84768-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">8905155</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
