<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2005-6-4-r30</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>Genome-wide prediction and identification of <it>cis</it>-natural antisense transcripts in <it>Arabidopsis thaliana</it></p>
			</title>
			<aug>
				<au id="A1">
					<snm>Wang</snm>
					<fnm>Xiu-Jie</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<email>wangx@rockefeller.edu</email>
				</au>
				<au id="A2">
					<snm>Gaasterland</snm>
					<fnm>Terry</fnm>
					<insr iid="I1"/>
					<insr iid="I3"/>
					<email>gaasterland@mail.rockefeller.edu</email>
				</au>
				<au id="A3" ca="yes">
					<snm>Chua</snm>
					<fnm>Nam-Hai</fnm>
					<insr iid="I4"/>
					<email>chua@mail.rockefeller.edu</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Laboratory of Computational Genomics, The Rockefeller University, New York, NY 10021, USA</p>
				</ins>
				<ins id="I2">
					<p>Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China</p>
				</ins>
				<ins id="I3">
					<p>Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093, USA</p>
				</ins>
				<ins id="I4">
					<p>Laboratory of Plant Molecular Biology, The Rockefeller University, New York, NY 10021, USA</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2005</pubdate>
			<volume>6</volume>
			<issue>4</issue>
			<fpage>R30</fpage>
			<url>http://genomebiology.com/2005/6/4/R30</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">15833117</pubid><pubid idtype="doi">10.1186/gb-2005-6-4-r30</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>17</day>
					<month>12</month>
					<year>2004</year>
				</date>
			</rec>
			<revrec>
				<date>
					<day>7</day>
					<month>2</month>
					<year>2005</year>
				</date>
			</revrec>
			<acc>
				<date>
					<day>25</day>
					<month>2</month>
					<year>2005</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>15</day>
					<month>3</month>
					<year>2005</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2005</year>
			<collab>Wang et al.; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<shorttitle>
			<p>Identification of natural antisense transcripts in <it>Arabidopsis</it></p>
		</shorttitle>
		<shortabs>
			<p>A new computational method for predicting <it>cis</it>-encoded natural antisense transcripts (NATs) in <it>Arabidopsis</it> identified 1,340 potential NAT pairs. The expression of both sense and antisense transcripts of 957 NAT pairs was confirmed, and analysis of MPSS data suggested that for most pairs one of the two transcripts is predominantly expressed in a tissue-specific manner.</p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Natural antisense transcripts (NATs) are a class of endogenous coding or non-protein-coding RNAs with sequence complementarity to other transcripts. Several lines of evidence have shown that <it>cis</it>- and <it>trans</it>-NATs may participate in a broad range of gene regulatory events. Genome-wide identification of <it>cis</it>-NATs in human, mouse and rice has revealed their widespread occurrence in eukaryotes. However, little is known about <it>cis</it>-NATs in the model plant <it>Arabidopsis thaliana</it>.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>We developed a new computational method to predict and identify <it>cis</it>-encoded NATs in <it>Arabidopsis </it>and found 1,340 potential NAT pairs. The expression of both sense and antisense transcripts of 957 NAT pairs was confirmed using <it>Arabidopsis </it>full-length cDNAs and public massively parallel signature sequencing (MPSS) data. Three known or putative <it>Arabidopsis </it>imprinted genes have <it>cis</it>-antisense transcripts. Sequences and the genomic arrangement of two <it>Arabidopsis </it>NAT pairs are conserved in rice.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>We combined information from full-length cDNAs and <it>Arabidopsis </it>genome annotation in our NAT prediction work and reported <it>cis</it>-NAT pairs that could not be identified using either dataset alone. Analysis of MPSS data suggested that for most <it>Arabidopsis cis</it>-NAT pairs, there is predominant expression of one of the two transcripts in a tissue-specific manner.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010009">Genetics</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010019">Plant biology</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>In the past few years, several families of regulatory RNA molecules have been shown to be widely expressed in eukaryotes <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Natural antisense transcripts (NATs) belong to one such family. NATs are endogenous RNA molecules whose partial or entire sequences exhibit complementarity to other transcripts. There are two types of NATs. <it>Cis</it>-NATs are transcribed from the same genomic loci as their sense transcripts but on the opposite DNA strand. By contrast, <it>trans</it>-NATs are expressed from genomic regions distinct from those encoding their sense transcripts <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. <it>Cis</it>-NATs and their sense RNAs are usually related in a one-to-one fashion, whereas a single <it>trans</it>-NAT may target several sense transcripts; for example, one type of microRNA (miRNA) can regulate the expression of several distinct target mRNAs <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
			<p>Studies performed in various organisms have suggested that NATs can participate in a broad range of regulatory events, such as transcription occlusion, resulting in the reciprocal expression of sense-antisense RNAs <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>, and RNA interference (RNAi), which leads to the degradation of double-stranded sense-antisense transcript pairs <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. There is evidence for the involvement of NATs in alternative splicing <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>, RNA editing <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>, DNA methylation <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>, genomic imprinting <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp> and X-chromosome inactivation <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. NATs are also known to regulate expression of some circadian clock genes <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. However, because each of the above regulatory modes has been observed in only a few cases, the general biological functions and regulatory mechanisms of NATs are still unclear.</p>
			<p>Recent large-scale NAT identifications in several model organisms have revealed the widespread existence of <it>cis</it>-NATs in eukaryotes. Lehner <it>et al. </it>first reported 372 NATs in human by searching for overlapping mRNA sequences in public databases <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Using a public expressed sequence tag (EST) database, Shendure and Church also found 144 human NATs and 73 mouse NATs <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. In a later work, Yelin <it>et al. </it>predicted 2,667 NATs in human and concluded that around 1,600 NAT pairs were transcribed from both strands after experimental validation <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. The RIKEN group identified 2,481 NAT pairs and 899 non-antisense bidirectional transcript units from 60,770 mouse full-length cDNAs <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. A similar analysis by the same group uncovered 687 bidirectional transcript pairs from 32,127 rice (<it>Oryza sativa</it>) full-length cDNAs <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Antisense expression of about 7,600 annotated genes was observed in a recent work using whole-genome arrays to analyze the transcription activity of the <it>A. thaliana </it>genome. However, a detailed list of these <it>Arabidopsis </it>antisense RNAs and their complete analysis is not yet available <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. We note that in all previous investigations NAT prediction focused on <it>cis</it>-NATs only.</p>
			<p>Here, we present results of a genome-wide computational search to predict and identify <it>cis</it>-NATs in <it>Arabidopsis</it>. Combining sequence information of <it>Arabidopsis </it>full-length cDNAs from the public databases and <it>Arabidopsis </it>annotated genes from the <it>Arabidopsis </it>genome release, we have identified 1,340 potential <it>cis</it>-NAT pairs. Expression evidence for transcripts derived from both strands of 957 <it>cis</it>-NAT pairs was obtained from the <it>Arabidopsis </it>full-length cDNA and the public <it>Arabidopsis </it>massively parallel signature sequencing (MPSS) database.</p>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<sec>
				<st>
					<p>Prediction and identification of <it>Arabidopsis cis</it>-NAT pairs</p>
				</st>
				<p>To search for <it>cis</it>-encoded <it>Arabidopsis </it>natural antisense transcripts, we aligned all <it>Arabidopsis </it>full-length cDNA sequences collected in the UniGene and RIKEN datasets with the <it>Arabidopsis </it>genome sequences. Pairs of transcripts that satisfied the following criteria were selected as <it>cis</it>-encoded natural sense-antisense transcript pairs (referred to as NAT pairs hereafter): first, cDNAs of both transcripts can be uniquely mapped to the <it>Arabidopsis </it>genome with at least 96% sequence identity; second, the two transcripts are derived from opposite strands of the genome; third, both transcripts are encoded by overlapping genomic loci, and the overlap length is longer than 50 nucleotides; fourth, the sense and antisense transcripts have distinct splicing patterns. Applying all of the above criteria, we identified 332 sense-antisense pairs from <it>Arabidopsis </it>full-length cDNAs. These NAT pairs are referred to as cDNA-NATs.</p>
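				<p>The first three selection criteria above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in this study: the record layout (MappedTranscript) and the function names are hypothetical, and the fourth criterion (distinct splicing patterns) is omitted because it requires exon coordinates.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedTranscript:
    """A cDNA aligned to the genome (hypothetical record layout)."""
    name: str
    chromosome: str
    strand: str      # "+" or "-"
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    identity: float  # percent identity of the genome alignment
    unique: bool     # True if the cDNA maps to a single locus


def overlap_length(a, b):
    """Number of genomic positions shared by the two loci (0 if none)."""
    if a.chromosome != b.chromosome:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def is_candidate_nat_pair(a, b, min_identity=96.0, min_overlap=50):
    """Check unique mapping at the identity cutoff, opposite strands,
    and a locus overlap longer than min_overlap nucleotides."""
    well_mapped = all(t.unique and t.identity >= min_identity for t in (a, b))
    opposite_strands = a.strand != b.strand
    return well_mapped and opposite_strands and overlap_length(a, b) > min_overlap
```

				<p>In this sketch the overlap must be strictly longer than 50 nucleotides, following the wording of the third criterion; a pair overlapping by exactly 50 nucleotides would be rejected.</p>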
				<p>The 332 pairs of cDNA-NATs can be grouped into two categories. The first category contained 145 NAT pairs in which both the sense and antisense transcripts had nearly perfect annotated gene matches. The second category contained 187 NAT pairs in which at least one transcript had no corresponding annotated gene. This observation led us to hypothesize that additional NAT pairs, whose corresponding cDNAs were not included in the UniGene and RIKEN <it>Arabidopsis </it>full-length cDNA datasets, could be identified using the <it>Arabidopsis </it>genome annotation.</p>
				<p>To identify potential NAT pairs without full-length cDNA evidence, we compared the genomic loci of all <it>Arabidopsis </it>annotated genes to search for gene pairs that overlap in an antiparallel manner. Using the criteria described in Materials and methods, 952 putative NAT pairs were identified from the <it>Arabidopsis </it>genome and were named genomic-NATs. Among the 952 genomic-NATs, 145 pairs had corresponding full-length cDNA for both the sense and antisense genes, and therefore were also included in the cDNA-NAT set. The remaining 807 new NAT pairs were predicted using the <it>Arabidopsis </it>genome annotation only and are referred to as the unique genomic-NAT set in the following analysis (Figure <figr fid="F1">1a</figr>).</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Relationships between NAT pairs from different datasets</p>
					</caption>
					<text>
						<p>Relationships between NAT pairs from different datasets. <b>(a) </b>Overlap between cDNA-NAT pairs and genomic-NAT pairs. Among the 332 cDNA-NAT pairs, 145 pairs have corresponding annotated genes for both transcripts. For the other 187 cDNA-NAT pairs, at least one transcript has no counterpart in the current <it>Arabidopsis </it>genome annotation. <b>(b) </b>Overlap between cDNA-, genomic- and genomic-cDNA-NAT pairs. All cDNA-NAT pairs are included in genomic-cDNA-NAT pairs. Blue circle, cDNA-NATs; red circle, genomic-NATs; green circle, genomic-cDNA-NATs.</p>
					</text>
					<graphic file="gb-2005-6-4-r30-1"/>
				</fig>
				<p>For most NAT pairs in the second category of the cDNA-NAT set, only one transcript in each pair matched an annotated gene. This indicates that transcripts of some full-length cDNAs could form <it>cis</it>-NAT pairs with other transcripts, although their corresponding genes are not included in the current <it>Arabidopsis </it>genome annotation. In a search for such NAT pairs, we compared the genomic loci of the UniGene and RIKEN <it>Arabidopsis </it>full-length cDNAs with those of annotated genes and identified 1,291 full-length cDNAs whose transcripts could form <it>cis</it>-NAT pairs with potential transcripts of annotated genes (see Materials and methods for criteria). The 1,291 genomic-cDNA-NAT pairs included the 332 cDNA-NAT pairs and 758 unique genomic-NAT pairs. Therefore, 201 unique NAT pairs were predicted by the cDNA-genome comparison approach and are referred to as unique genomic-cDNA-NAT pairs hereafter (Figure <figr fid="F1">1b</figr>).</p>
				<p>In total, we found 1,340 potential NAT pairs in three categories: 332 pairs with cDNA evidence for both sense and antisense transcripts; 807 pairs based on the <it>Arabidopsis </it>genome annotation (including 758 pairs with full-length cDNA evidence for one strand); and another 201 genomic-cDNA pairs identified by combining genome annotation with full-length cDNA sequence information.</p>
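				<p>The bookkeeping behind these totals can be checked with simple arithmetic. The short Python sketch below uses only the counts quoted in the text; the variable names are ours and purely illustrative.</p>

```python
# Counts quoted in the Results text.
cdna_nat = 332                # pairs with cDNA evidence for both strands
genomic_nat = 952             # pairs predicted from the genome annotation
shared_cdna_genomic = 145     # genomic-NAT pairs already in the cDNA-NAT set
genomic_cdna_nat = 1291       # pairs from the cDNA-genome comparison
genomic_with_one_cdna = 758   # unique genomic-NAT pairs recovered by that comparison

# Pairs found only through the genome annotation.
unique_genomic = genomic_nat - shared_cdna_genomic

# Pairs found only through the cDNA-genome comparison.
unique_genomic_cdna = genomic_cdna_nat - cdna_nat - genomic_with_one_cdna

# Non-redundant total across the three categories.
total = cdna_nat + unique_genomic + unique_genomic_cdna

print(unique_genomic, unique_genomic_cdna, total)  # prints: 807 201 1340
```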
			</sec>
			<sec>
				<st>
					<p>Characterization of <it>Arabidopsis </it>NAT pairs</p>
				</st>
				<p>We classified the 1,340 unique NAT pairs according to the exon-intron structures of each transcript and their overlapping patterns (Table <tblr tid="T1">1</tblr>). The overlapping patterns of NAT pairs were determined by comparing the exon positions of both transcripts using sim4 <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> alignment results. Consistent with previous reports of NAT pairs in other organisms <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>, the majority of <it>Arabidopsis </it>NAT pairs (72.1%) overlapped at their 3' end. For almost all NAT pairs (99%), the overlapping region included exon sequences, with a few exceptions in which one transcript was transcribed entirely from the intronic sequences of the other. Figure <figr fid="F2">2</figr> shows the distribution of overlap lengths of NATs. No obvious chromosomal bias was observed for the genomic distribution of NATs (Table <tblr tid="T2">2</tblr>) <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
				<tbl id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Structure analysis of NAT pairs</p>
					</caption>
					<tblbdy cols="5">
						<r>
							<c ca="left">
								<p>Category</p>
							</c>
							<c cspan="4" ca="center">
								<p>Number of pairs</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c cspan="4">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>cDNA-NAT</p>
							</c>
							<c ca="center">
								<p>genomic-NAT</p>
							</c>
							<c ca="center">
								<p>genomic-cDNA-NAT</p>
							</c>
							<c ca="center">
								<p>Total</p>
							</c>
						</r>
						<r>
							<c cspan="5">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Tail to tail (3' to 3')</p>
							</c>
							<c ca="center">
								<p>181</p>
							</c>
							<c ca="center">
								<p>737</p>
							</c>
							<c ca="center">
								<p>48</p>
							</c>
							<c ca="center">
								<p>966 (72.1%)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Head to head (5' to 5')</p>
							</c>
							<c ca="center">
								<p>97</p>
							</c>
							<c ca="center">
								<p>31</p>
							</c>
							<c ca="center">
								<p>57</p>
							</c>
							<c ca="center">
								<p>185 (13.8%)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>One transcript contained entirely within the other transcript</p>
							</c>
							<c ca="center">
								<p>51</p>
							</c>
							<c ca="center">
								<p>35</p>
							</c>
							<c ca="center">
								<p>90</p>
							</c>
							<c ca="center">
								<p>176 (13.1%)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Two transcripts overlap only within introns</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>13 (1.0%)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Total</p>
							</c>
							<c ca="center">
								<p>332</p>
							</c>
							<c ca="center">
								<p>807</p>
							</c>
							<c ca="center">
								<p>201</p>
							</c>
							<c ca="center">
								<p>1,340 (100%)</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<tbl id="T2">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>Chromosomal distribution of NAT pairs</p>
					</caption>
					<tblbdy cols="6">
						<r>
							<c ca="left">
								<p>Chromosome</p>
							</c>
							<c cspan="4" ca="center">
								<p>Number of NAT pairs</p>
							</c>
							<c ca="center">
								<p>Chromosome size (Mb)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c cspan="4">
								<hr/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>cDNA-NAT</p>
							</c>
							<c ca="center">
								<p>genomic-NAT</p>
							</c>
							<c ca="center">
								<p>genomic-cDNA-NAT</p>
							</c>
							<c ca="center">
								<p>Total</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="center">
								<p>85</p>
							</c>
							<c ca="center">
								<p>216</p>
							</c>
							<c ca="center">
								<p>55</p>
							</c>
							<c ca="center">
								<p>356</p>
							</c>
							<c ca="center">
								<p>29.1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2</p>
							</c>
							<c ca="center">
								<p>41</p>
							</c>
							<c ca="center">
								<p>120</p>
							</c>
							<c ca="center">
								<p>40</p>
							</c>
							<c ca="center">
								<p>201</p>
							</c>
							<c ca="center">
								<p>19.6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3</p>
							</c>
							<c ca="center">
								<p>69</p>
							</c>
							<c ca="center">
								<p>142</p>
							</c>
							<c ca="center">
								<p>46</p>
							</c>
							<c ca="center">
								<p>257</p>
							</c>
							<c ca="center">
								<p>23.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4</p>
							</c>
							<c ca="center">
								<p>48</p>
							</c>
							<c ca="center">
								<p>129</p>
							</c>
							<c ca="center">
								<p>29</p>
							</c>
							<c ca="center">
								<p>206</p>
							</c>
							<c ca="center">
								<p>17.5</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>5</p>
							</c>
							<c ca="center">
								<p>89</p>
							</c>
							<c ca="center">
								<p>200</p>
							</c>
							<c ca="center">
								<p>31</p>
							</c>
							<c ca="center">
								<p>320</p>
							</c>
							<c ca="center">
								<p>26.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Total</p>
							</c>
							<c ca="center">
								<p>332</p>
							</c>
							<c ca="center">
								<p>807</p>
							</c>
							<c ca="center">
								<p>201</p>
							</c>
							<c ca="center">
								<p>1,340</p>
							</c>
							<c ca="center">
								<p>115.4</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Distribution of genomic overlap lengths of NATs</p>
					</caption>
					<text>
						<p>Distribution of genomic overlap lengths of NATs. The overlap length of each NAT pair in exons was calculated. The number of NAT pairs (<it>y</it>-axis) is plotted against the overlap lengths (in nucleotides) of exons in each NAT pair (<it>x</it>-axis).</p>
					</text>
					<graphic file="gb-2005-6-4-r30-2"/>
				</fig>
				<p>The sim4 cDNA alignment results showed that some <it>Arabidopsis </it>full-length cDNAs are non-spliced transcripts. To assess the quality of full-length cDNAs, we systematically compared the splicing pattern and coding potential of all full-length cDNAs used in this study to all predicted <it>Arabidopsis </it>genes. Our results showed that the proportion of non-spliced transcripts in UniGene and RIKEN full-length cDNAs was lower than the proportion of non-spliced transcripts in annotated genes, indicating that non-spliced cDNAs are likely to be derived from <it>bona fide </it>transcripts rather than genomic DNA contamination (Table <tblr tid="T3">3</tblr>).</p>
				<tbl id="T3">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>Splicing pattern and coding potential of <it>Arabidopsis </it>full-length cDNAs and annotated genes</p>
					</caption>
					<tblbdy cols="4">
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>UniGene cDNAs</p>
							</c>
							<c ca="center">
								<p>RIKEN cDNAs</p>
							</c>
							<c ca="center">
								<p>The <it>Arabidopsis </it>genome</p>
							</c>
						</r>
						<r>
							<c cspan="4">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Total transcripts</p>
							</c>
							<c ca="center">
								<p>20,683</p>
							</c>
							<c ca="center">
								<p>13,181</p>
							</c>
							<c ca="center">
								<p>29,993</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Number of transcripts with perfect genome match</p>
							</c>
							<c ca="center">
								<p>17,814</p>
							</c>
							<c ca="center">
								<p>12,877</p>
							</c>
							<c ca="center">
								<p>29,993</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Number of transcripts with ORFs</p>
							</c>
							<c ca="center">
								<p>16,621</p>
							</c>
							<c ca="center">
								<p>12,544</p>
							</c>
							<c ca="center">
								<p>26,207</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Number of non-spliced transcripts with ORFs</p>
							</c>
							<c ca="center">
								<p>2,534</p>
							</c>
							<c ca="center">
								<p>1,555</p>
							</c>
							<c ca="center">
								<p>4,722</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Number of transcripts without ORFs</p>
							</c>
							<c ca="center">
								<p>1,193</p>
							</c>
							<c ca="center">
								<p>333</p>
							</c>
							<c ca="center">
								<p>3,786</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Number of non-spliced transcripts without ORFs</p>
							</c>
							<c ca="center">
								<p>466</p>
							</c>
							<c ca="center">
								<p>130</p>
							</c>
							<c ca="center">
								<p>3,786</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>The splicing pattern of each transcript was obtained by aligning its corresponding cDNA sequences to the <it>Arabidopsis </it>genome using sim4. The coding potential of the genomic sequence of each transcript was examined by GeneScan.</p>
					</tblfn>
				</tbl>
			</sec>
			<sec>
				<st>
					<p>Expression analysis of NAT pairs using public <it>Arabidopsis </it>MPSS data</p>
				</st>
				<p>To investigate the expression of our predicted NAT pairs, we used the public <it>Arabidopsis </it>MPSS data at the University of Delaware <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. MPSS is a bead-based sequencing technology that identifies a sequence of 17-20 nucleotides from each transcript. This sequencing technique is capable of identifying new, rarely expressed transcripts. MPSS can also quantitatively measure the expression level of a transcript because the transcripts per million (TPM) value for a transcript in the sequencing results reflects its <it>in vivo </it>abundance <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>.</p>
				<p>The public <it>Arabidopsis </it>MPSS database contains 87,705 'trusted' signature sequences from 14 cDNA libraries. By aligning these MPSS sequences to the <it>Arabidopsis </it>genome and the 1,340 NAT pairs, we identified 455 NAT pairs with unique MPSS matches on both the sense and antisense strands, including 103 cDNA-NAT pairs, 293 genomic-NAT pairs and 59 genomic-cDNA-NAT pairs. Because MPSS signatures are short 17-nucleotide sequences identified from each transcript, signatures that matched multiple genomic loci were excluded from our analysis to avoid ambiguity with respect to the origin of an MPSS signature and to ensure fidelity of assigning an MPSS signature to its corresponding transcript (see Materials and methods for details). Among the 455 NAT pairs with unambiguous MPSS data for both transcripts, the two transcripts of 78 pairs were found only in distinct libraries, indicating that these NAT pairs might have an exclusive transcription relationship. For the other 377 NAT pairs, expression of the sense and antisense transcripts was mainly observed in different libraries, or one transcript was dominantly expressed when both transcripts could be detected in the same library (Tables <tblr tid="T4">4</tblr> and <tblr tid="T5">5</tblr>). For a pair of NATs found in the same library, if the TPM value of one transcript is at least three times as high as that of the other transcript, we consider that transcript to be dominantly expressed. The number of coexpressed and dominantly expressed transcripts in each library is shown in Figure <figr fid="F3">3</figr>. On average, coexpression was observed in only two of the 14 tested sample libraries for each of the 377 NAT pairs, whereas dominant expression of one transcript was observed in nine libraries. No expression was detected in the remaining libraries.</p>
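				<p>The threefold rule for dominant expression can be written as a small classifier. The following Python sketch is illustrative only: the function names and the category labels are ours, and treating a library in which only one transcript is detected as 'dominant' is an assumption made here for simplicity rather than a rule stated in the text.</p>

```python
def classify_library(sense_tpm, antisense_tpm, fold=3.0):
    """Classify one library for one NAT pair from its two TPM values."""
    if sense_tpm == 0 and antisense_tpm == 0:
        return "not expressed"
    # Assumption: a transcript detected alone also counts as dominant.
    if sense_tpm >= fold * antisense_tpm:
        return "sense dominant"
    if antisense_tpm >= fold * sense_tpm:
        return "antisense dominant"
    return "coexpressed"


def mutually_exclusive(sense_profile, antisense_profile):
    """True if the two transcripts are never detected in the same library."""
    pairs = zip(sense_profile, antisense_profile)
    return not any(s > 0 and a > 0 for s, a in pairs)
```

				<p>For example, under these assumptions classify_library(2, 6) returns "antisense dominant" because 6 is at least three times 2, whereas classify_library(4, 5) returns "coexpressed".</p>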
				<tbl id="T4">
					<title>
						<p>Table 4</p>
					</title>
					<caption>
						<p>Summary of MPSS matches for NAT pairs</p>
					</caption>
					<tblbdy cols="5">
						<r>
							<c>
								<p/>
							</c>
							<c cspan="4" ca="center">
								<p>Number of NAT pairs</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c cspan="4">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>cDNA-NAT</p>
							</c>
							<c ca="center">
								<p>genomic-NAT</p>
							</c>
							<c ca="center">
								<p>genomic-cDNA-NAT</p>
							</c>
							<c ca="center">
								<p>Total</p>
							</c>
						</r>
						<r>
							<c cspan="5">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Total NAT pairs</p>
							</c>
							<c ca="center">
								<p>332</p>
							</c>
							<c ca="center">
								<p>807</p>
							</c>
							<c ca="center">
								<p>201</p>
							</c>
							<c ca="center">
								<p>1,340</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Number of pairs with MPSS matches on both strands</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Total</p>
							</c>
							<c ca="center">
								<p>103</p>
							</c>
							<c ca="center">
								<p>293</p>
							</c>
							<c ca="center">
								<p>59</p>
							</c>
							<c ca="center">
								<p>455</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Expressed absolutely in different libraries</p>
							</c>
							<c ca="center">
								<p>14</p>
							</c>
							<c ca="center">
								<p>49</p>
							</c>
							<c ca="center">
								<p>15</p>
							</c>
							<c ca="center">
								<p>78</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Expressed mainly in different libraries, occasionally in same libraries</p>
							</c>
							<c ca="center">
								<p>89</p>
							</c>
							<c ca="center">
								<p>244</p>
							</c>
							<c ca="center">
								<p>44</p>
							</c>
							<c ca="center">
								<p>377</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<tbl id="T5">
					<title>
						<p>Table 5</p>
					</title>
					<caption>
						<p>Examples of NAT pairs with MPSS matches on both strands</p>
					</caption>
					<tblbdy cols="16">
						<r>
							<c ca="left">
								<p>ID</p>
							</c>
							<c ca="center">
								<p>Strand</p>
							</c>
							<c cspan="14" ca="center">
								<p>Libraries</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c cspan="14">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>CAF</p>
							</c>
							<c ca="center">
								<p>INF</p>
							</c>
							<c ca="center">
								<p>LEF</p>
							</c>
							<c ca="center">
								<p>ROF</p>
							</c>
							<c ca="center">
								<p>SIF</p>
							</c>
							<c ca="center">
								<p>AP1</p>
							</c>
							<c ca="center">
								<p>AP3</p>
							</c>
							<c ca="center">
								<p>AGM</p>
							</c>
							<c ca="center">
								<p>INS</p>
							</c>
							<c ca="center">
								<p>ROS</p>
							</c>
							<c ca="center">
								<p>SAP</p>
							</c>
							<c ca="center">
								<p>S04</p>
							</c>
							<c ca="center">
								<p>S52</p>
							</c>
							<c ca="center">
								<p>LES</p>
							</c>
						</r>
						<r>
							<c cspan="16">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Pair A</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>At1g09750</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>9</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>0</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>1</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>1</it>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>At1g09760</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c ca="center">
								<p>
									<it>70</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>39</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>32</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>46</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>30</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>240</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>125</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>139</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>208</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>170</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>56</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>48</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>48</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>45</it>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Pair B</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>At1g72060</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>
									<it>5</it>
								</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>
									<it>31</it>
								</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>
									<it>74</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>79</it>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>At1g72070</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c ca="center">
								<p>
									<it>0</it>
								</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>
									<it>8</it>
								</p>
							</c>
							<c ca="center">
								<p>N</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>N</it>
								</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Distinct expression of the sense and antisense transcripts of NAT pair A was observed in all but one library. In the library where both transcripts of pair A were expressed, the abundance of one transcript was significantly higher than that of the other. For NAT pair B, the sense and antisense transcripts were expressed differentially in different libraries. Libraries in which both transcripts of a NAT pair were expressed are shown in bold; libraries in which transcripts of only one gene of a NAT pair were expressed are shown in italics. Abbreviations for libraries: CAF, callus - actively growing, classic MPSS; INF, inflorescence - mixed stage, immature buds, classic MPSS; LEF, leaves - 21 day, untreated, classic MPSS; ROF, root - 21 day, untreated, classic MPSS; SIF, silique - 24-48 h post-fertilization, classic MPSS; AP1, ap1-10 inflorescence - mixed stage, immature buds; AP3, ap3-6 inflorescence - mixed stage, immature buds; AGM, agamous inflorescence - mixed stage, immature buds; INS, inflorescence - mixed stage, immature buds; ROS, root - 21 day, untreated; SAP, sup/ap1 inflorescence - mixed stage, immature buds; S04, leaves, 4 h after salicylic acid treatment; S52, leaves, 52 h after salicylic acid treatment; LES, leaves - 21 day, untreated.</p>
					</tblfn>
				</tbl>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Distribution of coexpressed and dominantly expressed NAT pairs in different libraries</p>
					</caption>
					<text>
						<p>Distribution of coexpressed and dominantly expressed NAT pairs in different libraries. The number of coexpressed NAT pairs in each library is shown as a blue bar and the number of dominantly expressed NAT pairs as a red bar. See the legend of Table 5 for library information.</p>
					</text>
					<graphic file="gb-2005-6-4-r30-3"/>
				</fig>
				<p>We also found an additional 222 genomic-NAT pairs and 51 genomic-cDNA-NAT pairs with full-length cDNA evidence for one transcript and MPSS data for the other. Together with the 332 cDNA-NAT pairs, we obtained either full-length cDNA or MPSS expression evidence for both transcripts of 957 NAT pairs, corresponding to 71.4% of the total of 1,340 pairs ((455 - 103) + 332 + 222 + 51 = 957).</p>
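				<p>The tally above can be rechecked with a few lines of Python. This is only an arithmetic sketch: the variable names are illustrative, and the reading of the subtracted 103 pairs as pairs already counted in another category is an assumption, since the text gives only the number.</p>

```python
# Counts exactly as quoted in the paragraph above; names are illustrative.
mpss_both_strands = 455        # pairs with MPSS evidence for both transcripts
subtracted_in_text = 103       # assumed: pairs already counted elsewhere
cdna_nat = 332                 # cDNA-NAT pairs
genomic_nat_mixed = 222        # genomic-NAT pairs, cDNA + MPSS evidence
genomic_cdna_nat_mixed = 51    # genomic-cDNA-NAT pairs, cDNA + MPSS evidence
total_pairs = 1340

supported = (
    (mpss_both_strands - subtracted_in_text)
    + cdna_nat
    + genomic_nat_mixed
    + genomic_cdna_nat_mixed
)
percent_supported = round(100 * supported / total_pairs, 1)

print(supported, percent_supported)  # 957 71.4
```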
			</sec>
			<sec>
				<st>
					<p>siRNA matches of NAT pairs</p>
				</st>
				<p>To investigate the possibility that <it>cis</it>-NAT pairs may generate siRNAs, we compared the NAT pairs with the short interfering RNA (siRNA) sequences collected in the <it>Arabidopsis </it>small RNA database. As in the MPSS alignment process, only siRNAs with unique loci on the <it>Arabidopsis </it>genome were used in the comparison to ensure unambiguous assignment. We found that 11 NAT pairs had uniquely mapped siRNA matches (Table <tblr tid="T6">6</tblr>). The siRNAs of all but one of these NAT pairs originated from the overlap region; the only exception was the pair At#S18901030 and At#S18898439, whose overlap was only 52 nucleotides long.</p>
				<tbl id="T6">
					<title>
						<p>Table 6</p>
					</title>
					<caption>
						<p>siRNA matches of NAT pairs</p>
					</caption>
					<tblbdy cols="5">
						<r>
							<c ca="left">
								<p>Category of NAT pairs</p>
							</c>
							<c ca="center">
								<p>Gene ID</p>
							</c>
							<c ca="center">
								<p>Strand</p>
							</c>
							<c ca="center">
								<p>Overlap length (nucleotides)</p>
							</c>
							<c ca="left">
								<p>Description</p>
							</c>
						</r>
						<r>
							<c cspan="5">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Genomic-NAT</p>
							</c>
							<c ca="center">
								<p>At2g06510</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>506</p>
							</c>
							<c ca="left">
								<p>Replication protein, putative</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At2g06520</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Membrane protein, putative</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At4g35850</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>360</p>
							</c>
							<c ca="left">
								<p>Pentatricopeptide (PPR) repeat-containing protein</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At4g35860</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Ras-related GTP-binding protein, putative</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At5g20720</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>294</p>
							</c>
							<c ca="left">
								<p>Chaperonin, chloroplast</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At5g20730</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Auxin-responsive factor</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At5g41680</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>587</p>
							</c>
							<c ca="left">
								<p>Protein kinase family protein</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At5g41685</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Mitochondrial import receptor subunit TOM7</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At5g48870</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>118</p>
							</c>
							<c ca="left">
								<p>Small nuclear ribonucleoprotein, putative</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At5g48880</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Acetyl-CoA C-acyltransferase 1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>cDNA-NAT</p>
							</c>
							<c ca="center">
								<p>RAFL19-56-G17</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>1,209</p>
							</c>
							<c ca="left">
								<p>No coding potential</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>RAFL09-70-E21</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Expressed protein</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At#S18901030</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>52</p>
							</c>
							<c ca="left">
								<p>Putative transcription factor</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At#S18898439</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Pentatricopeptide (PPR) repeat-containing protein</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At#S18900150</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>884</p>
							</c>
							<c ca="left">
								<p>No coding potential</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At#S18898471</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Expressed protein</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At#S18912025</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>1,149</p>
							</c>
							<c ca="left">
								<p>No coding potential</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At#S18898946</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>TCP family transcription factor</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Genomic-cDNA-NAT</p>
							</c>
							<c ca="center">
								<p>At1g07725</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>1,640</p>
							</c>
							<c ca="left">
								<p>Exocyst subunit EXO70 family protein</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At#S18898556</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>No coding potential</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At2g16587</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>379</p>
							</c>
							<c ca="left">
								<p>Expressed protein</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>RAFL19-48-E15</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>No coding potential</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
			</sec>
			<sec>
				<st>
					<p>Conservation of <it>Arabidopsis </it>NAT pairs in rice</p>
				</st>
				<p>To examine whether NAT pairs might be conserved during evolution, we compared the protein sequences of the 1,340 putative <it>Arabidopsis </it>NAT pairs with the protein sequences of the 687 predicted rice NAT pairs <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Orthologs of two <it>Arabidopsis </it>NAT pairs were also encoded by antiparallel genes originating from the same locus in rice (Table <tblr tid="T7">7</tblr>). In addition, for 392 <it>Arabidopsis </it>NAT pairs, a homolog of one of the two transcripts was found in the rice NAT set.</p>
				<tbl id="T7">
					<title>
						<p>Table 7</p>
					</title>
					<caption>
						<p>Conserved NAT pairs of <it>Arabidopsis </it>and rice</p>
					</caption>
					<tblbdy cols="7">
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>ID</p>
							</c>
							<c ca="center">
								<p>Strand</p>
							</c>
							<c ca="center">
								<p>Overlap pattern</p>
							</c>
							<c ca="center">
								<p>Overlap length (nucleotides)</p>
							</c>
							<c ca="left">
								<p>Description</p>
							</c>
						</r>
						<r>
							<c cspan="7">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>NAT pair 1</p>
							</c>
							<c ca="left">
								<p>
									<it>Arabidopsis</it>
								</p>
							</c>
							<c ca="center">
								<p>At5g02820</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>Tail to tail</p>
							</c>
							<c ca="center">
								<p>1,138</p>
							</c>
							<c ca="left">
								<p>DNA topoisomerase VIA</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At5g02830</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>PPR repeat-containing protein</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Rice</p>
							</c>
							<c ca="center">
								<p>J033010B03</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>Tail to tail</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>DNA topoisomerase VIA</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>J013135M09</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>PPR repeat-containing protein</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>NAT pair 2</p>
							</c>
							<c ca="left">
								<p>
									<it>Arabidopsis</it>
								</p>
							</c>
							<c ca="center">
								<p>At5g54270</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>Tail to tail</p>
							</c>
							<c ca="center">
								<p>1,047</p>
							</c>
							<c ca="left">
								<p>Chlorophyll A-B binding protein</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>At5g54280</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Myosin heavy chain</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Rice</p>
							</c>
							<c ca="center">
								<p>006-301-C08</p>
							</c>
							<c ca="center">
								<p>+</p>
							</c>
							<c ca="center">
								<p>Tail to tail</p>
							</c>
							<c ca="center">
								<p>4,425</p>
							</c>
							<c ca="left">
								<p>Chlorophyll A-B binding protein</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>J013155K02</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Myosin heavy chain</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<p>Although NATs are often seen in prokaryotes, their prevalence in eukaryotes was not recognized until the past few years <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B34">34</abbr></abbrgrp>. In this work, we combined sequence information on <it>Arabidopsis </it>full-length cDNAs with that from the <it>Arabidopsis </it>genome annotation and identified 1,340 potential <it>cis</it>-NAT pairs in <it>Arabidopsis </it>(Additional data files 1, 2 and 3).</p>
			<sec>
				<st>
					<p>Assessment of our NAT prediction methods</p>
				</st>
				<p>The 1,340 <it>Arabidopsis </it>NAT pairs were identified from three sources. First, by aligning full-length cDNA sequences to the <it>Arabidopsis </it>genome, we identified 332 cDNA-NAT pairs. However, comparison of these 332 cDNA-NAT pairs with <it>Arabidopsis </it>annotated genes showed that more than half of them had one partner that was not included in the current <it>Arabidopsis </it>genome annotation. Because traditional genome annotation mainly aims to identify protein-coding genes, non-coding antisense transcripts may be overlooked by currently trained gene finders. A recent report using a genome tiling array to examine the transcription activity of the entire <it>Arabidopsis </it>genome also supports this notion <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>.</p>
				<p>To search for potential NAT pairs not included in the current full-length <it>Arabidopsis </it>cDNA library, we compared the genomic coordinates of all annotated genes with each other and with those of full-length cDNAs. This approach uncovered another 807 overlapping genomic-NAT pairs based on the annotation of their corresponding genes, and 201 genomic-cDNA-NAT pairs, each including a transcript derived from an annotated gene on one strand and a transcript represented in the full-length cDNA database on the other strand. The full-length cDNAs included in genomic-cDNA-NAT pairs either had no annotated gene match or corresponded to annotated genes whose transcripts, as annotated, could not form <it>cis</it>-NAT pairs with transcripts of other genes. These results indicate that although the <it>Arabidopsis </it>genome is currently one of the best annotated eukaryotic genomes, much information is still missing. The identification in eukaryotes of several classes of regulatory RNA genes, such as those encoding natural antisense transcripts, which are the focus here, will not only further our understanding of genome structure and gene regulation, but will also open a new window for improved genome annotation.</p>
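				<p>The coordinate comparison described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: the <it>Transcript </it>record, the function names, the minimum-overlap parameter and the example coordinates are all hypothetical, and the all-against-all loop is chosen for clarity rather than speed.</p>

```python
from itertools import combinations
from typing import NamedTuple


class Transcript(NamedTuple):
    # Hypothetical record; coordinates are 1-based and inclusive.
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlap_length(a: Transcript, b: Transcript) -> int:
    """Shared bases if a and b are on opposite strands of one chromosome, else 0."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_cis_nat_pairs(transcripts, min_overlap=1):
    """Return (name, name, overlap) for every opposite-strand overlapping pair."""
    pairs = []
    for a, b in combinations(transcripts, 2):
        shared = overlap_length(a, b)
        if shared >= min_overlap:
            pairs.append((a.name, b.name, shared))
    return pairs


# Made-up coordinates, for illustration only.
example = [
    Transcript("geneA", "chr1", 100, 500, "+"),
    Transcript("geneB", "chr1", 450, 900, "-"),
    Transcript("geneC", "chr1", 1000, 1200, "+"),
    Transcript("cdnaD", "chr2", 100, 500, "-"),
]
nat_pairs = find_cis_nat_pairs(example)
print(nat_pairs)  # [('geneA', 'geneB', 51)]
```

				<p>For a whole genome, sorting transcripts by start position and sweeping along each chromosome would avoid the quadratic number of comparisons; the pairwise version is shown only because it mirrors the description most directly.</p>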
				<p>Most antisense prediction work reported to date has focused on identifying NATs from expressed cDNAs and ESTs <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. In this work, we avoided using ESTs because of the ambiguous orientation of some sequences. We also included sequence information of annotated <it>Arabidopsis </it>genes in our NAT prediction in order to provide a more complete picture of antisense transcripts in <it>Arabidopsis</it>. The reliability of our approach is supported by the following lines of evidence. First, the expression of both sense and antisense transcripts of 293 genomic-NAT pairs (36.3% of a total of 807) was observed in the public MPSS data, and another 222 genomic-NAT pairs (27.5% of a total of 807) have full-length cDNA evidence for one transcript and associated MPSS data for the other transcript. Second, the two NAT pairs that were conserved in rice were also identified in our <it>Arabidopsis </it>genomic-NAT dataset. Third, imprinted genes are thought likely to be subject to antisense regulation, and three of the six reported <it>Arabidopsis </it>imprinted genes <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>, <it>FIE</it>, <it>FIS2 </it>and <it>MSI1</it>, are included in our genomic-NAT sets. However, it remains possible that some genomic-NAT pairs are false positives if the lengths of their untranslated regions (UTRs) were annotated inaccurately.</p>
				<p>In rice, both transcripts of 86% of the NAT pairs have coding sequence (CDS) regions, whereas 28% of the predicted <it>Arabidopsis </it>NAT pairs include at least one transcript without coding potential. Non-protein-coding transcripts are more prevalent in cDNA- and genomic-cDNA-NAT pairs, in that 170 cDNA-NAT pairs and 156 genomic-cDNA-NAT pairs include one non-protein-coding transcript. We used Genscan to evaluate the coding potential of each transcript by screening its corresponding genomic DNA sequence for valid gene structures. Using annotated genes as controls, we estimated the false-negative rate of our definition of coding potential to be 2.3%. Unlike CDS-containing antisense transcripts, which may be translated into proteins under certain conditions, transcripts without any protein-coding potential could possess solely regulatory functions.</p>
				<p>In the work described here, and in all other genome-wide antisense transcript identification papers published so far <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>, the investigation was focused on <it>cis</it>-antisense RNAs, which are transcribed from the same genomic loci as their sense RNAs, but from the opposite genome strand. To ensure the <it>cis</it>-antisense relationship of the NATs reported here, only cDNAs with unique genomic loci were included in this study. We note that a certain number of <it>trans</it>-antisense transcripts also exist in cells. Examples include miRNAs and siRNAs, which are widely studied in most model organisms <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Genome-wide identification of <it>trans</it>-antisense transcripts in <it>Arabidopsis </it>is being attempted.</p>
			</sec>
			<sec>
				<st>
					<p>Evaluation of NAT expression using MPSS data</p>
				</st>
				<p>The non-gel-based nature of MPSS technology makes it an ideal resource for evaluating the expression profile of NAT pairs for the following reasons. First, because MPSS captures almost all polyadenylated transcripts within cells, it is theoretically capable of identifying new, rarely expressed transcripts without prior knowledge of their corresponding genes. Second, the digital output of MPSS reflects the expression pattern of a sequenced RNA molecule, and therefore provides a quantitative relationship between the sense and antisense transcripts of a NAT pair in different tissues. This information was not available in any previous NAT prediction work <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>.</p>
				<p>Using the full-length cDNA and public <it>Arabidopsis </it>MPSS data, we were able to obtain expression evidence for both transcripts of 957 NAT pairs. The digital nature of MPSS data enabled us to evaluate the expression relationship of the sense and antisense transcripts directly. Our results showed that the sense and antisense transcripts of a NAT pair tend to be expressed in different tissues or under different conditions. In addition, in cases where the sense and antisense transcripts of a NAT pair were expressed in the same library, one type of transcript was usually more abundant than the other. On average, transcripts of NAT pairs were found to be coexpressed in only two libraries, whereas dominant expression (the expression level of one transcript was at least three times higher than that of the other transcript) or absolute expression (only one transcript of a NAT pair was expressed) was observed in nine libraries. The tissue-specific expression of sense and antisense transcripts observed in this study is consistent with the <it>Arabidopsis </it>genome transcription study using a whole-genome tiling array, in which about 7,600 genes were found to have tissue-specific sense and antisense expression <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Although a detailed list of these 7,600 genes is not yet available, it is possible that for some genes not included in our list, the antisense transcription activity was contributed by <it>trans</it>-antisense transcripts. This could explain why we predicted fewer NAT pairs than that study, as our work focuses only on <it>cis</it>-antisense transcripts.</p>
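				<p>The three expression categories used above can be written as a small classifier over the two MPSS abundances observed in one library. This is a sketch under stated assumptions, not the code used in this study: the function name and labels are hypothetical, an undetected signature (None) and a count of 0 are both treated as "not expressed", and "coexpressed" is taken to mean that both transcripts are detected with less than a three-fold difference. The example values are taken from Table 5.</p>

```python
def classify_library(sense, antisense, fold=3):
    """Classify one library from the MPSS abundances of the two transcripts.

    sense / antisense: abundance, or None when no signature was detected.
    fold: threshold for dominant expression (three-fold, as in the text).
    """
    s = sense or 0       # assumption: None and 0 both mean "not expressed"
    a = antisense or 0
    if s == 0 and a == 0:
        return "not expressed"
    if s == 0 or a == 0:
        return "absolute"            # only one transcript of the pair detected
    high, low = max(s, a), min(s, a)
    if high >= fold * low:
        return "dominant"            # one transcript at least `fold` times higher
    return "coexpressed"


# Values from Table 5: pair A in AP1 and CAF, pair B in ROF and INF.
pair_a_ap1 = classify_library(9, 240)
pair_a_caf = classify_library(None, 70)
pair_b_rof = classify_library(2, 1)
pair_b_inf = classify_library(None, None)
print(pair_a_ap1, pair_a_caf, pair_b_rof, pair_b_inf)
```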
				<p>To ensure the MPSS sequences were indeed generated by their matching transcripts, all MPSS data were first aligned with the <it>Arabidopsis </it>genome and all annotated mRNAs to remove signatures with multiple genomic loci. Therefore, unless an MPSS signature sequence was derived from the joint-exon region of some transcripts that are not included in the current genome annotation, it should originate from its corresponding transcript.</p>
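				<p>The uniqueness filter described above amounts to keeping only those signatures that match a single genomic locus. A minimal sketch follows; the input layout (one tuple per genome match), the function name and the example signatures are assumptions made for illustration and do not describe the actual MPSS data files.</p>

```python
def unique_signatures(hits):
    """Keep signatures that match exactly one genomic locus.

    hits: iterable of (signature, chrom, position, strand) genome matches.
    Returns a dict mapping each uniquely placed signature to its locus.
    """
    loci = {}
    for signature, chrom, position, strand in hits:
        loci.setdefault(signature, set()).add((chrom, position, strand))
    return {
        signature: next(iter(places))
        for signature, places in loci.items()
        if len(places) == 1
    }


# Made-up 17-nucleotide signatures and positions, for illustration only.
example_hits = [
    ("GATCAAAAAAAAAAAAA", "chr1", 1200, "+"),
    ("GATCCCCCCCCCCCCCC", "chr1", 5300, "-"),
    ("GATCCCCCCCCCCCCCC", "chr3", 880, "+"),   # second locus: ambiguous
    ("GATCGGGGGGGGGGGGG", "chr5", 42, "-"),
]
kept = unique_signatures(example_hits)
print(sorted(kept))
```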
			</sec>
			<sec>
				<st>
					<p>Speculation on the function and origin of NATs</p>
				</st>
				<p>One possible function of NATs is to trigger the degradation of their sense transcripts via the RNAi pathway. However, in our study, we found only 11 NAT pairs with known siRNA matches. There are two possible explanations for this observation. First, the current public <it>Arabidopsis </it>siRNA database, which contains only 1,822 unique siRNA sequences, is small and does not cover all siRNAs associated with sequences of the NAT pairs reported here. Second, all NATs identified in this work are <it>cis</it>-antisense transcripts. siRNAs downregulate the expression levels of their target mRNAs to achieve a low protein concentration. <it>Cis</it>-antisense transcripts can accomplish the same goal by interfering with the transcription of their sense transcripts, and this might be a more energy-efficient mechanism to achieve local gene regulation. This hypothesis predicts that more siRNAs would be found associated with <it>trans</it>-antisense transcripts.</p>
				<p>For most NAT pairs with associated MPSS data for both transcripts, the expression of sense and antisense transcripts tends to occur in different tissues. In these cases, we could speculate that transcription of genes encoding these NAT transcript pairs may be regulated by similar factors but that the production of antisense transcripts might interfere with the transcription of their sense transcripts, resulting in reciprocal expression patterns. Another possibility is that the two genes of a NAT pair are subject to different transcriptional regulation and consequently they are never expressed in the same tissue at the same time. Functional analysis of all NAT pairs using gene ontology reveals no over-representation of any functional category compared to the <it>Arabidopsis </it>genome, indicating that <it>cis</it>-antisense regulation might be a global mechanism for all gene families. Further experiments are needed to investigate the validity of these hypotheses.</p>
				<p>Antiparallel transcription and antisense transcripts are known to be involved in genomic imprinting of the <it>Xist </it>gene in mouse and human <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. There is supporting evidence that the <it>MEA </it>and <it>PHE </it>genes of <it>Arabidopsis </it>are imprinted <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, and in addition, <it>FIS2</it>, <it>FIE</it>, <it>MSI1 </it>and <it>FWA </it>may also be imprinted, although the evidence for these four other genes is not unequivocal <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp>. Nonetheless, we found antisense transcription units for <it>FIS2</it>, <it>FIE </it>and <it>FWA</it>, suggesting that transcription of these three genes might be regulated by antisense transcripts, or that their antisense transcripts might be involved in silencing their expression. Genomic imprinting usually involves a chromosomal locus and, in certain cases, may even extend over a chromosomal region. Given the close proximity of the sense-antisense gene transcripts, if one member of the pair is imprinted, it is likely that the other would be subject to the same regulation. Unfortunately, because of the absence of data on imprinted genes in rice, we were unable to examine whether imprinted genes were also subject to antisense regulation in rice.</p>
				<p>We found that two <it>Arabidopsis </it>NAT pairs are conserved in rice. These conserved NAT pairs could be used to study the antisense regulatory mechanism and the origin of NATs in plants. Given over 150 million years of evolutionary distance between <it>Arabidopsis </it>and rice, the gene order on the two genomes has diverged quite significantly. Therefore, the conservation of these two NAT pairs might have some functional relevance. A closer comparison of the <it>Arabidopsis </it>and rice NAT pairs and the identification of additional conserved NAT pairs could help address this issue.</p>
				<p>Taken together, our results provide the first genome-wide identification and prediction of NATs in <it>Arabidopsis</it>. These results will facilitate functional studies of NATs in this model plant, as well as in other plant species, and help to unravel complex gene regulatory networks in eukaryotes.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Materials and methods</p>
			</st>
			<sec>
				<st>
					<p>Identification of sense-antisense transcript pairs from full-length cDNA datasets</p>
				</st>
				<p>The <it>Arabidopsis </it>UniGene (Build 45) dataset (file named At.seq.all) was downloaded from the National Center for Biotechnology Information (NCBI) UniGene Resources <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>. A total of 20,683 full-length cDNA sequences were extracted from the UniGene dataset by selecting sequences marked as 'Full-length/full-length cDNA'. The RIKEN <it>Arabidopsis </it>full-length cDNA dataset, which contains 13,181 sequences, was downloaded from the RIKEN BioResource Center (BRC) <abbrgrp><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>. The 20,683 UniGene and 13,181 RIKEN full-length cDNAs were aligned to the <it>Arabidopsis </it>genome sequences from The Institute for Genomic Research (TIGR) (release version 5) <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> by BLAT. The splicing pattern of the transcript derived from each cDNA was further confirmed using the sim4 sequence alignment program <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. cDNAs with at least 96% sequence identity to the <it>Arabidopsis </it>genome were used in the following analysis. For pairs of cDNAs encoded by opposite strands of the <it>Arabidopsis </it>genome and sharing overlapping genomic loci, if both their corresponding sense and antisense transcripts had no other genomic locations and exhibited different splicing patterns, they were selected as encoding sense-antisense transcript pairs and are referred to as cDNA-NAT pairs in the text.</p>
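				<p>The selection rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in this study: the Alignment record, its field names and the function names are hypothetical, 'different splicing patterns' is approximated by unequal exon coordinates, and overlap is tested on the genomic span of each aligned cDNA.</p>

```python
from dataclasses import dataclass
from itertools import combinations

MIN_IDENTITY = 96.0  # percent identity cutoff stated in the text


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of one cDNA aligned to the genome."""
    cdna_id: str
    chrom: str
    strand: str      # '+' or '-'
    exons: tuple     # ((start, end), ...) sorted by start, 1-based inclusive
    identity: float  # percent identity to the genome
    n_loci: int      # number of genomic locations the cDNA maps to


def loci_overlap(a, b):
    """True if the genomic spans of two alignments share at least one base."""
    if a.chrom != b.chrom:
        return False
    return min(a.exons[-1][1], b.exons[-1][1]) >= max(a.exons[0][0], b.exons[0][0])


def cdna_nat_pairs(alignments):
    """Return ID pairs of cDNAs that satisfy the cDNA-NAT criteria."""
    # Keep only uniquely mapped cDNAs that pass the identity cutoff.
    usable = [a for a in alignments
              if a.identity >= MIN_IDENTITY and a.n_loci == 1]
    # Assumption: unequal exon coordinates stand in for 'different splicing'.
    return [(a.cdna_id, b.cdna_id)
            for a, b in combinations(usable, 2)
            if a.strand != b.strand and loci_overlap(a, b) and a.exons != b.exons]
```

				<p>The all-against-all comparison is quadratic in the number of cDNAs; for a dataset of this size, grouping alignments by chromosome and sorting by start position first would be a natural refinement.</p>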
			</sec>
			<sec>
				<st>
					<p>Prediction of sense-antisense transcript pairs using the <it>Arabidopsis </it>genome annotation and full-length cDNAs</p>
				</st>
				<p>We used the <it>A. thaliana </it>genome annotations from TIGR (release version 5) in this study <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. Putative NAT pairs were identified on the basis of the annotated genomic loci of <it>Arabidopsis </it>genes. If a pair of overlapping genes was located on opposite strands of the <it>Arabidopsis </it>genome and at least one gene had no annotated UTR at the overlap end, their encoded transcripts were selected as a putative NAT pair regardless of the overlap length of the encoded transcripts. Otherwise, if both genes of an antiparallel overlapping pair had annotated UTRs at the overlap end, the overlap of their encoded transcripts had to be longer than 50 nucleotides for them to qualify as a NAT pair. NAT pairs from these two categories are both referred to as genomic-NAT pairs in the text.</p>
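				<p>As a minimal sketch of the two-category rule above (not the implementation used in this study), the decision for a single gene pair could be written as follows. The Gene record and its field names are hypothetical, and the overlap of the annotated genomic loci is used here as a stand-in for the overlap length of the encoded transcripts.</p>

```python
from dataclasses import dataclass

MIN_UTR_OVERLAP = 50  # nucleotides; threshold stated in the text


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (1-based, inclusive coordinates)."""
    name: str
    chrom: str
    strand: str               # '+' or '-'
    start: int
    end: int
    utr_at_overlap_end: bool  # True if a UTR is annotated at the overlapping end


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of shared genomic positions, or 0 if the loci do not overlap."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def is_genomic_nat_pair(a: Gene, b: Gene) -> bool:
    """Apply the two-category rule to one pair of annotated genes."""
    if a.strand == b.strand:
        return False
    n = overlap_length(a, b)
    if n == 0:
        return False
    # Category 1: at least one gene lacks an annotated UTR at the overlap end,
    # so any overlap length is accepted.
    if not (a.utr_at_overlap_end and b.utr_at_overlap_end):
        return True
    # Category 2: both UTRs are annotated, so the overlap must exceed 50 nt.
    return n > MIN_UTR_OVERLAP
```

				<p>For example, two opposite-strand genes that share only six nucleotides would be accepted if one of them lacks an annotated UTR at the overlap end, but rejected if both UTRs are annotated.</p>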
				<p>Genomic-cDNA-NAT pairs were identified by comparing the genomic loci of full-length cDNAs with those of annotated genes. UniGene and RIKEN full-length cDNAs with unique genomic locations and at least 96% sequence identity to the <it>Arabidopsis </it>genome were used in this step. Using the same criteria as for genomic-NAT pairs, if an annotated gene had an overlapping cDNA match on the opposite strand, and the transcript of the annotated gene and that derived from the antisense cDNA had different splicing patterns, the gene and its matching cDNA were selected as a genomic-cDNA-NAT pair.</p>
			</sec>
			<sec>
				<st>
					<p>Splicing pattern and coding potential evaluation of full-length cDNAs and annotated genes</p>
				</st>
				<p>Splicing patterns of transcripts encoded by full-length cDNAs were obtained by aligning the cDNA sequences to the <it>Arabidopsis </it>genome using the sim4 program <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Splicing patterns of transcripts derived from <it>Arabidopsis </it>annotated genes were extracted from the TIGR <it>Arabidopsis </it>genome annotation (release version 5) <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. To evaluate the coding potential of full-length cDNAs, their corresponding genomic sequences (determined from the BLAT and sim4 results) were extracted and screened with GENSCAN <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Identification of MPSS evidence for NAT pairs</p>
				</st>
				<p>We used the public <it>Arabidopsis </it>MPSS data at the University of Delaware <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> to evaluate the expression of NAT pairs. MPSS sequences from 14 different libraries of the <it>Arabidopsis </it>Columbia-0 (Col-0) ecotype were downloaded from <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Each MPSS library contained signature sequences identified from the same tissue. The quality of these MPSS sequences was evaluated according to the information provided by the database. Only MPSS sequences with 'reliable' (present in more than one sequencing run) and 'significant' (TPM, transcripts per million, &#8805; 4) expression patterns were considered 'trusted' signatures and used in this analysis.</p>
				<p>The public MPSS database contained 87,705 trusted signatures that satisfied the above expression criteria. These signatures were aligned to the sequences of the 1,340 putative NAT pairs to identify MPSS sequences derived from them. Signatures with multiple perfect matches to the <it>Arabidopsis </it>genome or to cDNAs had ambiguous origins and were not considered further. For a NAT pair, if both the sense and antisense transcripts had associated MPSS data and their expression values were both significant in one or more libraries, transcripts in this NAT pair were considered as coexpressed in the same tissue. On the other hand, if both transcripts had MPSS data but had no significant coexpression in any of the examined libraries, then the transcripts were considered as expressed, but in different libraries.</p>
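				<p>The coexpression call described above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions rather than the code used in this study: the function names and returned labels are hypothetical, each transcript is represented by a mapping from library name to its TPM value, and the 'reliable' and unique-match filters are assumed to have been applied beforehand.</p>

```python
MIN_TPM = 4  # 'significant' expression threshold stated in the text


def significant_libraries(tpm_by_library):
    """Return the set of libraries in which a transcript reaches MIN_TPM."""
    return {lib for lib, tpm in tpm_by_library.items() if tpm >= MIN_TPM}


def classify_nat_pair(sense_tpm, antisense_tpm):
    """Classify one NAT pair from the per-library TPM values of its transcripts.

    Each argument maps a library name to a TPM value; an empty mapping means
    that no trusted MPSS signature was assigned to that transcript.
    """
    if not sense_tpm or not antisense_tpm:
        return "insufficient MPSS data"
    shared = significant_libraries(sense_tpm).intersection(
        significant_libraries(antisense_tpm))
    if shared:
        # Both transcripts are significant in at least one common library.
        return "coexpressed"
    return "expressed in different libraries"
```

				<p>Under these assumptions, a pair whose sense transcript is significant only in one library and whose antisense transcript is significant only in another would be reported as expressed in different libraries.</p>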
			</sec>
			<sec>
				<st>
					<p>Homology comparison with reported rice NATs</p>
				</st>
				<p>Full-length cDNA sequences of the 687 rice NAT pairs were downloaded from the website described in <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. To facilitate protein sequence comparison, the rice and <it>Arabidopsis </it>cDNAs were mapped to their corresponding genomes by BLAT <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. Both the <it>A. thaliana </it>and <it>O. sativa </it>genomes were downloaded from TIGR <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. The corresponding genomic sequence of each cDNA was extracted according to its genomic coordinates in the BLAT results. Protein sequences were obtained by evaluating the genomic sequences of those cDNAs using GENSCAN <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. The protein sequences of rice NATs were aligned with those of <it>Arabidopsis </it>NATs using blastp <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Sequence pairs with an <it>E</it>-value of less than 10<sup>-30</sup> and alignment coverage greater than 50% of the query sequence were considered homologous.</p>
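				<p>The homology filter applied to the blastp results can be sketched as below. This is an illustrative sketch only: the function and parameter names are hypothetical, and coverage is assumed here to be the span of a single alignment on the query divided by the query length, which may differ from how coverage was computed in this study.</p>

```python
MAX_EVALUE = 1e-30        # E-value cutoff stated in the text
MIN_QUERY_COVERAGE = 0.5  # fraction of the query that must be aligned


def query_coverage(qstart, qend, qlen):
    """Fraction of the query covered by one alignment (1-based, inclusive)."""
    return (abs(qend - qstart) + 1) / qlen


def is_homologous(evalue, qstart, qend, qlen):
    """True if a blastp hit passes both the E-value and coverage thresholds."""
    evalue_ok = MAX_EVALUE > evalue  # strictly smaller than the cutoff
    coverage_ok = query_coverage(qstart, qend, qlen) > MIN_QUERY_COVERAGE
    return evalue_ok and coverage_ok
```

				<p>Both comparisons are strict, so a hit with an <it>E</it>-value exactly at the cutoff, or one covering exactly half of the query, would not be retained.</p>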
			</sec>
		</sec>
		<sec>
			<st>
				<p>Additional data files</p>
			</st>
			<p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is a table listing all genomic-NAT pairs. Additional data file <supplr sid="S2">2</supplr> is a table listing all cDNA-NATs. Additional data file <supplr sid="S3">3</supplr> is a table listing all genomic-cDNA-NATs.</p>
			<suppl id="S1">
				<title>
					<p>Additional File 1</p>
				</title>
				<caption>
					<p>A table listing genomic-NAT pairs. '+' strand refers to the forward strand of a chromosome; '-' strand refers to the reverse strand of a chromosome.</p>
				</caption>
				<text>
					<p>Classes of overlap patterns: 1. tail to tail (3' end overlap); 2. head to head (5' end overlap); 3. one transcript is contained entirely within the other transcript; 4. two transcripts overlap only within introns. Coding potential of a transcript: '+' with coding potential; '-' without coding potential.</p>
				</text>
				<file name="gb-2005-6-4-r30-S1.xls">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S2">
				<title>
					<p>Additional File 2</p>
				</title>
				<caption>
					<p>A table listing cDNA-NAT pairs. '+' strand refers to the forward strand of a chromosome; '-' strand refers to the reverse strand of a chromosome.</p>
				</caption>
				<text>
					<p>Classes of overlap patterns: 1. tail to tail (3' end overlap); 2. head to head (5' end overlap); 3. one transcript is contained entirely within the other transcript; 4. two transcripts overlap only within introns. Coding potential of a transcript: '+' with coding potential; '-' without coding potential.</p>
				</text>
				<file name="gb-2005-6-4-r30-S2.xls">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S3">
				<title>
					<p>Additional File 3</p>
				</title>
				<caption>
					<p>A table listing genomic-cDNA-NAT pairs. '+' strand refers to the forward strand of a chromosome; '-' strand refers to the reverse strand of a chromosome.</p>
				</caption>
				<text>
					<p>Classes of overlap patterns: 1. tail to tail (3' end overlap); 2. head to head (5' end overlap); 3. one transcript is contained entirely within the other transcript; 4. two transcripts overlap only within introns. Coding potential of a transcript: '+' with coding potential; '-' without coding potential.</p>
				</text>
				<file name="gb-2005-6-4-r30-S3.xls">
					<p>Click here for file</p>
				</file>
			</suppl>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>We thank Takatoshi Kiba and Siripong Thitamadee for fruitful discussions and Peter Hare and Yupu Liang for carefully reading the manuscript. This research was supported by NIH GM44640 to N-H.C. and DBI-9984882 to T.G.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Noncoding RNA transcripts.</p>
				</title>
				<aug>
					<au>
						<snm>Szymanski</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Barciszewska</snm>
						<fnm>MZ</fnm>
					</au>
					<au>
						<snm>Zywicki</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Barciszewski</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>J Appl Genet</source>
				<pubdate>2003</pubdate>
				<volume>44</volume>
				<fpage>1</fpage>
				<lpage>19</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12590177</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Non-coding RNA genes and the modern RNA world.</p>
				</title>
				<aug>
					<au>
						<snm>Eddy</snm>
						<fnm>SR</fnm>
					</au>
				</aug>
				<source>Nat Rev Genet</source>
				<pubdate>2001</pubdate>
				<volume>2</volume>
				<fpage>919</fpage>
				<lpage>929</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35103511</pubid>
						<pubid idtype="pmpid" link="fulltext">11733745</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>In search of antisense.</p>
				</title>
				<aug>
					<au>
						<snm>Lavorgna</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Dahary</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Lehner</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Sorek</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Sanderson</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Casari</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Trends Biochem Sci</source>
				<pubdate>2004</pubdate>
				<volume>29</volume>
				<fpage>88</fpage>
				<lpage>94</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/j.tibs.2003.12.002</pubid>
						<pubid idtype="pmpid" link="fulltext">15102435</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Antisense RNA: function and fate of duplex RNA in cells of higher eukaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Kumar</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Carmichael</snm>
						<fnm>GG</fnm>
					</au>
				</aug>
				<source>Microbiol Mol Biol Rev</source>
				<pubdate>1998</pubdate>
				<volume>62</volume>
				<fpage>1415</fpage>
				<lpage>1434</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">98951</pubid>
						<pubid idtype="pmpid" link="fulltext">9841677</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Do natural antisense transcripts make sense in eukaryotes?</p>
				</title>
				<aug>
					<au>
						<snm>Vanhee-Brossollet</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Vaquero</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>1998</pubdate>
				<volume>211</volume>
				<fpage>1</fpage>
				<lpage>9</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1119(98)00093-6</pubid>
						<pubid idtype="pmpid" link="fulltext">9573333</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>MicroRNAs: genomics, biogenesis, mechanism, and function.</p>
				</title>
				<aug>
					<au>
						<snm>Bartel</snm>
						<fnm>DP</fnm>
					</au>
				</aug>
				<source>Cell</source>
				<pubdate>2004</pubdate>
				<volume>116</volume>
				<fpage>281</fpage>
				<lpage>297</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0092-8674(04)00045-5</pubid>
						<pubid idtype="pmpid" link="fulltext">14744438</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Role of sequences within the first intron in the regulation of expression of eukaryotic initiation factor 2 alpha.</p>
				</title>
				<aug>
					<au>
						<snm>Silverman</snm>
						<fnm>TA</fnm>
					</au>
					<au>
						<snm>Noguchi</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Safer</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1992</pubdate>
				<volume>267</volume>
				<fpage>9738</fpage>
				<lpage>9742</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">1374407</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Naturally occurring antisense transcripts are present in chick embryo chondrocytes simultaneously with the down-regulation of the alpha 1 (I) collagen gene.</p>
				</title>
				<aug>
					<au>
						<snm>Farrell</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Lukens</snm>
						<fnm>LN</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1995</pubdate>
				<volume>270</volume>
				<fpage>3400</fpage>
				<lpage>3408</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.270.7.3400</pubid>
						<pubid idtype="pmpid" link="fulltext">7852426</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Specific interference with gene expression induced by long, double-stranded RNA in mouse embryonal teratocarcinoma cell lines.</p>
				</title>
				<aug>
					<au>
						<snm>Billy</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Brondani</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Muller</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Filipowicz</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2001</pubdate>
				<volume>98</volume>
				<fpage>14428</fpage>
				<lpage>14433</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">64698</pubid>
						<pubid idtype="pmpid" link="fulltext">11724966</pubid>
						<pubid idtype="doi">10.1073/pnas.261562698</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Inhibition of c-erbA mRNA splicing by a naturally occurring antisense RNA.</p>
				</title>
				<aug>
					<au>
						<snm>Munroe</snm>
						<fnm>SH</fnm>
					</au>
					<au>
						<snm>Lazar</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1991</pubdate>
				<volume>266</volume>
				<fpage>22083</fpage>
				<lpage>22086</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">1657988</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Characterization of multiple alternative RNAs resulting from antisense transcription of the PR264/SC35 splicing factor gene.</p>
				</title>
				<aug>
					<au>
						<snm>Sureau</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Soret</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Guyon</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Gaillard</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Dumon</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Keller</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Crisanti</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Perbal</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1997</pubdate>
				<volume>25</volume>
				<fpage>4513</fpage>
				<lpage>4522</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">147067</pubid>
						<pubid idtype="pmpid" link="fulltext">9358160</pubid>
						<pubid idtype="doi">10.1093/nar/25.22.4513</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>RNA editing and regulation of Drosophila 4f-rnp expression by sas-10 antisense readthrough mRNA transcripts.</p>
				</title>
				<aug>
					<au>
						<snm>Peters</snm>
						<fnm>NT</fnm>
					</au>
					<au>
						<snm>Rohrbach</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Zalewski</snm>
						<fnm>BA</fnm>
					</au>
					<au>
						<snm>Byrkett</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Vaughn</snm>
						<fnm>JC</fnm>
					</au>
				</aug>
				<source>RNA</source>
				<pubdate>2003</pubdate>
				<volume>9</volume>
				<fpage>698</fpage>
				<lpage>710</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1261/rna.2120703</pubid>
						<pubid idtype="pmpid" link="fulltext">12756328</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Widespread RNA editing of embedded alu elements in the human transcriptome.</p>
				</title>
				<aug>
					<au>
						<snm>Kim</snm>
						<fnm>DD</fnm>
					</au>
					<au>
						<snm>Kim</snm>
						<fnm>TT</fnm>
					</au>
					<au>
						<snm>Walsh</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Kobayashi</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Matise</snm>
						<fnm>TC</fnm>
					</au>
					<au>
						<snm>Buyske</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Gabriel</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2004</pubdate>
				<volume>14</volume>
				<fpage>1719</fpage>
				<lpage>1725</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">515317</pubid>
						<pubid idtype="pmpid" link="fulltext">15342557</pubid>
						<pubid idtype="doi">10.1101/gr.2855504</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Transcription of antisense RNA leading to gene silencing and methylation as a novel cause of human genetic disease.</p>
				</title>
				<aug>
					<au>
						<snm>Tufarelli</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Stanley</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Garrick</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Sharpe</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Ayyub</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Wood</snm>
						<fnm>WG</fnm>
					</au>
					<au>
						<snm>Higgs</snm>
						<fnm>DR</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2003</pubdate>
				<volume>34</volume>
				<fpage>157</fpage>
				<lpage>165</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/ng1157</pubid>
						<pubid idtype="pmpid" link="fulltext">12730694</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Imprinting on distal chromosome 7 in the placenta involves repressive histone methylation independent of DNA methylation.</p>
				</title>
				<aug>
					<au>
						<snm>Lewis</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Mitsuya</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Umlauf</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Smith</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Dean</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Walter</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Higgins</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Feil</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Reik</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2004</pubdate>
				<volume>36</volume>
				<fpage>1291</fpage>
				<lpage>1295</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/ng1468</pubid>
						<pubid idtype="pmpid" link="fulltext">15516931</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Multiple imprinted sense and antisense transcripts, differential methylation and tandem repeats in a putative imprinting control region upstream of mouse Igf2.</p>
				</title>
				<aug>
					<au>
						<snm>Moore</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Constancia</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Zubair</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Bailleul</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Feil</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Sasaki</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Reik</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1997</pubdate>
				<volume>94</volume>
				<fpage>12509</fpage>
				<lpage>12514</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">25020</pubid>
						<pubid idtype="pmpid" link="fulltext">9356480</pubid>
						<pubid idtype="doi">10.1073/pnas.94.23.12509</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>The non-coding Air RNA is required for silencing autosomal imprinted genes.</p>
				</title>
				<aug>
					<au>
						<snm>Sleutels</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Zwart</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Barlow</snm>
						<fnm>DP</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2002</pubdate>
				<volume>415</volume>
				<fpage>810</fpage>
				<lpage>813</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11845212</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Neurons but not glial cells show reciprocal imprinting of sense and antisense transcripts of Ube3a.</p>
				</title>
				<aug>
					<au>
						<snm>Yamasaki</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Joh</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Ohta</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Masuzaki</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Ishimaru</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Mukai</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Niikawa</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Ogawa</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Wagstaff</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Kishino</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Hum Mol Genet</source>
				<pubdate>2003</pubdate>
				<volume>12</volume>
				<fpage>837</fpage>
				<lpage>847</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/hmg/ddg106</pubid>
						<pubid idtype="pmpid" link="fulltext">12668607</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>An antisense RNA regulates the bidirectional silencing property of the Kcnq1 imprinting control region.</p>
				</title>
				<aug>
					<au>
						<snm>Thakur</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Tiwari</snm>
						<fnm>VK</fnm>
					</au>
					<au>
						<snm>Thomassin</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Pandey</snm>
						<fnm>RR</fnm>
					</au>
					<au>
						<snm>Kanduri</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Gondor</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Grange</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Ohlsson</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Kanduri</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Mol Cell Biol</source>
				<pubdate>2004</pubdate>
				<volume>24</volume>
				<fpage>7855</fpage>
				<lpage>7862</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">515059</pubid>
						<pubid idtype="pmpid" link="fulltext">15340049</pubid>
						<pubid idtype="doi">10.1128/MCB.24.18.7855-7862.2004</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>The mouse Murr1 gene is imprinted in the adult brain, presumably due to transcriptional interference by the antisense-oriented U2af1-rs1 gene.</p>
				</title>
				<aug>
					<au>
						<snm>Wang</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Joh</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Masuko</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Yatsuki</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Soejima</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Nabetani</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Beechey</snm>
						<fnm>CV</fnm>
					</au>
					<au>
						<snm>Okinami</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Mukai</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Mol Cell Biol</source>
				<pubdate>2004</pubdate>
				<volume>24</volume>
				<fpage>270</fpage>
				<lpage>279</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">303337</pubid>
						<pubid idtype="pmpid" link="fulltext">14673161</pubid>
						<pubid idtype="doi">10.1128/MCB.24.1.270-279.2004</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Tsix, a gene antisense to Xist at the X-inactivation centre.</p>
				</title>
				<aug>
					<au>
						<snm>Lee</snm>
						<fnm>JT</fnm>
					</au>
					<au>
						<snm>Davidow</snm>
						<fnm>LS</fnm>
					</au>
					<au>
						<snm>Warshawsky</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>1999</pubdate>
				<volume>21</volume>
				<fpage>400</fpage>
				<lpage>404</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/7734</pubid>
						<pubid idtype="pmpid" link="fulltext">10192391</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Circadian clocks and natural antisense RNA.</p>
				</title>
				<aug>
					<au>
						<snm>Crosthwaite</snm>
						<fnm>SK</fnm>
					</au>
				</aug>
				<source>FEBS Lett</source>
				<pubdate>2004</pubdate>
				<volume>567</volume>
				<fpage>49</fpage>
				<lpage>54</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/j.febslet.2004.04.073</pubid>
						<pubid idtype="pmpid" link="fulltext">15165892</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Antisense transcripts in the human genome.</p>
				</title>
				<aug>
					<au>
						<snm>Lehner</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Williams</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Campbell</snm>
						<fnm>RD</fnm>
					</au>
					<au>
						<snm>Sanderson</snm>
						<fnm>CM</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<fpage>63</fpage>
				<lpage>65</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(02)02598-2</pubid>
						<pubid idtype="pmpid" link="fulltext">11818131</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Computational discovery of sense-antisense transcription in the human and mouse genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Shendure</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Church</snm>
						<fnm>GM</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2002</pubdate>
				<volume>3</volume>
				<fpage>research0044.1</fpage>
				<lpage>research0044.14</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1186/gb-2002-3-9-research0044</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Widespread occurrence of antisense transcription in the human genome.</p>
				</title>
				<aug>
					<au>
						<snm>Yelin</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Dahary</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Sorek</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Levanon</snm>
						<fnm>EY</fnm>
					</au>
					<au>
						<snm>Goldstein</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Shoshan</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Diber</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Biton</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Tamir</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Khosravi</snm>
						<fnm>R</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nat Biotechnol</source>
				<pubdate>2003</pubdate>
				<volume>21</volume>
				<fpage>379</fpage>
				<lpage>386</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nbt808</pubid>
						<pubid idtype="pmpid" link="fulltext">12640466</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Antisense transcripts with FANTOM2 clone set and their implications for gene regulation.</p>
				</title>
				<aug>
					<au>
						<snm>Kiyosawa</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Yamanaka</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Osato</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Kondo</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Hayashizaki</snm>
						<fnm>Y</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>1324</fpage>
				<lpage>1334</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">403655</pubid>
						<pubid idtype="pmpid" link="fulltext">12819130</pubid>
						<pubid idtype="doi">10.1101/gr.982903</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Antisense transcripts with rice full-length cDNAs.</p>
				</title>
				<aug>
					<au>
						<snm>Osato</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Yamada</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Satoh</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Ooka</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Yamamoto</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Suzuki</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Kawai</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Carninci</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Ohtomo</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Murakami</snm>
						<fnm>K</fnm>
					</au>
					<etal/>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2003</pubdate>
				<volume>5</volume>
				<fpage>R5</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1186/gb-2003-5-1-r5</pubid>
						<pubid idtype="pmpid" link="fulltext">14709177</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Empirical analysis of transcriptional activity in the <it>Arabidopsis</it> genome.</p>
				</title>
				<aug>
					<au>
						<snm>Yamada</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Lim</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Dale</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Shinn</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Palm</snm>
						<fnm>CJ</fnm>
					</au>
					<au>
						<snm>Southwick</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Wu</snm>
						<fnm>HC</fnm>
					</au>
					<au>
						<snm>Kim</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Nguyen</snm>
						<fnm>M</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>2003</pubdate>
				<volume>302</volume>
				<fpage>842</fpage>
				<lpage>846</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1088305</pubid>
						<pubid idtype="pmpid" link="fulltext">14593172</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>A computer program for aligning a cDNA sequence with a genomic DNA sequence.</p>
				</title>
				<aug>
					<au>
						<snm>Florea</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Hartzell</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Rubin</snm>
						<fnm>GM</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>1998</pubdate>
				<volume>8</volume>
				<fpage>967</fpage>
				<lpage>974</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">310774</pubid>
						<pubid idtype="pmpid" link="fulltext">9750195</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Analysis of the genome sequence of the flowering plant <it>Arabidopsis thaliana</it>.</p>
				</title>
				<aug>
					<au>
						<cnm>The Arabidopsis Genome Initiative</cnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2000</pubdate>
				<volume>408</volume>
				<fpage>796</fpage>
				<lpage>815</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35048692</pubid>
						<pubid idtype="pmpid" link="fulltext">11130711</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>The Public <it>Arabidopsis</it> MPSS database</p>
				</title>
				<url>http://mpss.udel.edu</url>
			</bibl>
			<bibl id="B32">
				<title>
					<p>In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs.</p>
				</title>
				<aug>
					<au>
						<snm>Brenner</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Williams</snm>
						<fnm>SR</fnm>
					</au>
					<au>
						<snm>Vermaas</snm>
						<fnm>EH</fnm>
					</au>
					<au>
						<snm>Storck</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Moon</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>McCollum</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Mao</snm>
						<fnm>JI</fnm>
					</au>
					<au>
						<snm>Luo</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Kirchner</snm>
						<fnm>JJ</fnm>
					</au>
					<au>
						<snm>Eletr</snm>
						<fnm>S</fnm>
					</au>
					<etal/>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2000</pubdate>
				<volume>97</volume>
				<fpage>1665</fpage>
				<lpage>1670</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">26493</pubid>
						<pubid idtype="pmpid" link="fulltext">10677516</pubid>
						<pubid idtype="doi">10.1073/pnas.97.4.1665</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays.</p>
				</title>
				<aug>
					<au>
						<snm>Brenner</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Johnson</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Bridgham</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Golda</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Lloyd</snm>
						<fnm>DH</fnm>
					</au>
					<au>
						<snm>Johnson</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Luo</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>McCurdy</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Foy</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Ewan</snm>
						<fnm>M</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nat Biotechnol</source>
				<pubdate>2000</pubdate>
				<volume>18</volume>
				<fpage>630</fpage>
				<lpage>634</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/76469</pubid>
						<pubid idtype="pmpid" link="fulltext">10835600</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>Antisense RNA control in bacteria, phages, and plasmids.</p>
				</title>
				<aug>
					<au>
						<snm>Wagner</snm>
						<fnm>EG</fnm>
					</au>
					<au>
						<snm>Simons</snm>
						<fnm>RW</fnm>
					</au>
				</aug>
				<source>Annu Rev Microbiol</source>
				<pubdate>1994</pubdate>
				<volume>48</volume>
				<fpage>713</fpage>
				<lpage>742</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.mi.48.100194.003433</pubid>
						<pubid idtype="pmpid" link="fulltext">7826024</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>Maternal control of embryogenesis by MEDEA, a polycomb group gene in <it>Arabidopsis</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Grossniklaus</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Vielle-Calzada</snm>
						<fnm>JP</fnm>
					</au>
					<au>
						<snm>Hoeppner</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Gagliano</snm>
						<fnm>WB</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1998</pubdate>
				<volume>280</volume>
				<fpage>446</fpage>
				<lpage>450</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.280.5362.446</pubid>
						<pubid idtype="pmpid" link="fulltext">9545225</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>Mutations in <it>FIE</it>, a WD polycomb group gene, allow endosperm development without fertilization.</p>
				</title>
				<aug>
					<au>
						<snm>Ohad</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Yadegari</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Margossian</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Hannon</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Michaeli</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Harada</snm>
						<fnm>JJ</fnm>
					</au>
					<au>
						<snm>Goldberg</snm>
						<fnm>RB</fnm>
					</au>
					<au>
						<snm>Fischer</snm>
						<fnm>RL</fnm>
					</au>
				</aug>
				<source>Plant Cell</source>
				<pubdate>1999</pubdate>
				<volume>11</volume>
				<fpage>407</fpage>
				<lpage>416</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">144179</pubid>
						<pubid idtype="pmpid" link="fulltext">10072400</pubid>
						<pubid idtype="doi">10.1105/tpc.11.3.407</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<title>
					<p>Genes controlling fertilization-independent seed development in <it>Arabidopsis thaliana</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Luo</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Bilodeau</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Koltunow</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Dennis</snm>
						<fnm>ES</fnm>
					</au>
					<au>
						<snm>Peacock</snm>
						<fnm>WJ</fnm>
					</au>
					<au>
						<snm>Chaudhury</snm>
						<fnm>AM</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1999</pubdate>
				<volume>96</volume>
				<fpage>296</fpage>
				<lpage>301</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">15133</pubid>
						<pubid idtype="pmpid" link="fulltext">9874812</pubid>
						<pubid idtype="doi">10.1073/pnas.96.1.296</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p><it>Arabidopsis</it> MSI1 is a component of the MEA/FIE Polycomb group complex and required for seed development.</p>
				</title>
				<aug>
					<au>
						<snm>Köhler</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Hennig</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Bouveret</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Gheyselinck</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Grossniklaus</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Gruissem</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>EMBO J</source>
				<pubdate>2003</pubdate>
				<volume>22</volume>
				<fpage>4804</fpage>
				<lpage>4814</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">212713</pubid>
						<pubid idtype="pmpid" link="fulltext">12970192</pubid>
						<pubid idtype="doi">10.1093/emboj/cdg444</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B39">
				<title>
					<p>One-way control of FWA imprinting in <it>Arabidopsis</it> endosperm by DNA methylation.</p>
				</title>
				<aug>
					<au>
						<snm>Kinoshita</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Miura</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Choi</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Kinoshita</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Cao</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Jacobsen</snm>
						<fnm>SE</fnm>
					</au>
					<au>
						<snm>Fischer</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Kakutani</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2004</pubdate>
				<volume>303</volume>
				<fpage>521</fpage>
				<lpage>523</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1089835</pubid>
						<pubid idtype="pmpid" link="fulltext">14631047</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B40">
				<title>
					<p>UniGene: a unified view of the transcriptome.</p>
				</title>
				<aug>
					<au>
						<snm>Pontius</snm>
						<fnm>JU</fnm>
					</au>
					<au>
						<snm>Wagner</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Schuler</snm>
						<fnm>GD</fnm>
					</au>
				</aug>
				<source>The NCBI Handbook</source>
				<publisher>Bethesda, MD: National Center for Biotechnology Information</publisher>
				<pubdate>2003</pubdate>
				<url>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=books</url>
			</bibl>
			<bibl id="B41">
				<title>
					<p><it>Arabidopsis</it> UniGene dataset.</p>
				</title>
				<url>ftp://ftp.ncbi.nih.gov/repository/UniGene</url>
			</bibl>
			<bibl id="B42">
				<title>
					<p>Functional annotation of a full-length <it>Arabidopsis</it> cDNA collection.</p>
				</title>
				<aug>
					<au>
						<snm>Seki</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Narusaka</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kamiya</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Ishida</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Satou</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Sakurai</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Nakajima</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Enju</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Akiyama</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Oono</snm>
						<fnm>Y</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>2002</pubdate>
				<volume>296</volume>
				<fpage>141</fpage>
				<lpage>145</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1071006</pubid>
						<pubid idtype="pmpid" link="fulltext">11910074</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B43">
				<title>
					<p>RIKEN <it>Arabidopsis</it> full-length cDNA dataset</p>
				</title>
				<url>http://pfgweb.gsc.riken.go.jp/projects/raflcdna.html</url>
			</bibl>
			<bibl id="B44">
				<title>
					<p>The <it>Arabidopsis thaliana</it> genome sequences</p>
				</title>
				<url>ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/</url>
			</bibl>
			<bibl id="B45">
				<title>
					<p>BLAT - the BLAST-like alignment tool.</p>
				</title>
				<aug>
					<au>
						<snm>Kent</snm>
						<fnm>WJ</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>656</fpage>
				<lpage>664</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">187518</pubid>
						<pubid idtype="pmpid" link="fulltext">11932250</pubid>
						<pubid idtype="doi">10.1101/gr.229202</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B46">
				<title>
					<p>Prediction of complete gene structures in human genomic DNA.</p>
				</title>
				<aug>
					<au>
						<snm>Burge</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Karlin</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1997</pubdate>
				<volume>268</volume>
				<fpage>78</fpage>
				<lpage>94</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.1997.0951</pubid>
						<pubid idtype="pmpid" link="fulltext">9149143</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B47">
				<title>
					<p>PowerBLAST: a new network BLAST application for interactive or automated sequence analysis and annotation.</p>
				</title>
				<aug>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Madden</snm>
						<fnm>TL</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>1997</pubdate>
				<volume>7</volume>
				<fpage>649</fpage>
				<lpage>656</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">310664</pubid>
						<pubid idtype="pmpid" link="fulltext">9199938</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
