<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2148-7-S1-S6</ui>
	<ji>1471-2148</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>Rapid divergence of codon usage patterns within the rice genome</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Wang</snm>
					<fnm>Huai-Chun</fnm>
					<insr iid="I1"/>
					<email>hcwang@mathstat.dal.ca</email>
				</au>
				<au id="A2" ca="yes">
					<snm>Hickey</snm>
					<mi>A</mi>
					<fnm>Donal</fnm>
					<insr iid="I2"/>
					<email>dhickey@alcor.concordia.ca</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, B3H 2G1, Canada</p>
				</ins>
				<ins id="I2">
					<p>Department of Biology, Concordia University, 7141 Sherbrooke West, Montr&#233;al, Qu&#233;bec, H4B 1R6, Canada</p>
				</ins>
			</insg>
			<source>BMC Evolutionary Biology</source>
			<supplement>
				<title>
					<p>First International Conference on Phylogenomics</p>
				</title>
				<editor>Herv&#233; Philippe, Mathieu Blanchette</editor>
				<note>Proceedings</note>
			</supplement>
			<conference>
				<title>
					<p>First International Conference on Phylogenomics</p>
				</title>
				<location>Sainte-Ad&#232;le, Qu&#233;bec, Canada</location>
				<date-range>15&#8211;19 March, 2006</date-range>
				<url>http://www.bioinfo.umontreal.ca/evenements/phylogenomics.html</url>
			</conference>
			<issn>1471-2148</issn>
			<pubdate>2007</pubdate>
			<volume>7</volume>
			<issue>Suppl 1</issue>
			<fpage>S6</fpage>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">17288579</pubid><pubid idtype="doi">10.1186/1471-2148-7-S1-S6</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<pub>
				<date>
					<day>8</day>
					<month>2</month>
					<year>2007</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2007</year>
			<collab>Wang and Hickey; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Synonymous codon usage varies widely between genomes, and also between genes within genomes. Although there is now a large body of data on variations in codon usage, it is still not clear whether the observed patterns reflect the effects of positive Darwinian selection acting at the level of translational efficiency or are due simply to the effects of mutational bias. In this study, we have included both intra-genomic and inter-genomic comparisons of codon usage. This allows us to distinguish more efficiently between the effects of nucleotide bias and translational selection.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>We show that there is an extreme degree of heterogeneity in codon usage patterns within the rice genome, and that this heterogeneity is highly correlated with differences in nucleotide content (particularly GC content) between the genes. In contrast to the situation observed within the rice genome, <it>Arabidopsis </it>genes show relatively little variation in both codon usage and nucleotide content. By exploiting a combination of intra-genomic and inter-genomic comparisons, we provide evidence that the differences in codon usage among the rice genes reflect a relatively rapid evolutionary increase in the GC content of some rice genes. We also noted that the degree of codon bias was negatively correlated with gene length.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>Our results show that mutational bias can cause a dramatic evolutionary divergence in codon usage patterns within a period of approximately two hundred million years.</p>
					<p>The heterogeneity of codon usage patterns within the rice genome can be explained by a balance between genome-wide mutational biases and negative selection against these biased mutations. The strength of the negative selection is proportional to the length of the coding sequences. Our results indicate that the large variations in synonymous codon usage are not related to selection acting on the translational efficiency of synonymous codons.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Synonymous codon usage patterns can vary significantly among genomes <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. In addition, one can also observe differences in synonymous codon usage among different genes within a single genome (e.g., <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>). For prokaryotes and unicellular eukaryotes such as yeast, the variation in codon usage within a genome is thought to be due to natural selection acting to optimize protein production <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. Specifically, the most highly expressed genes use codons that are complementary to the most abundant tRNA anticodons (e.g., <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>). For multicellular eukaryotes, such as <it>Drosophila melanogaster </it>and <it>Caenorhabditis elegans</it>, there is also some evidence that codon bias might be caused by selection for translational efficiency <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. For the majority of multicellular organisms, however, it has been difficult to explain codon usage variation within a genome in terms of natural selection. Instead, the codon usage in mammalian genes appears to be correlated with the GC content of the chromosomal region that contains the genes <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. This correlation has generally been interpreted as meaning that the codon usage of mammalian genes reflects mutational bias, but a recent report <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> suggests that high GC content increases mRNA levels in mammalian cells. This would mean that selection for high gene expression is the primary factor determining the codon usage bias in this case. 
Thus, although the correlation between codon usage and nucleotide bias is well documented, the question of whether the nucleotide bias is a cause or a consequence of the biased codon usage remains a matter of debate.</p>
			<p>In this study, we examined the patterns of synonymous codon usage that are seen in the genomes of angiosperm plants. It is already known that monocot plant genomes have a higher average GC content than dicot genomes, and that this difference is reflected in an average difference in codon usage between monocots and dicots <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. Here, we focused on the heterogeneity in synonymous codon usage within the rice genome. In particular, we looked for intra-genomic correlations between codon usage and nucleotide bias, and we compared the results found for the rice genes with the results for their homologs in the <it>Arabidopsis </it>genome. All of the previous studies of codon usage have focused on either: (i) the comparison of genes within a single genome (typically, a comparison of highly expressed genes and lowly expressed genes); or (ii) differences between genomes, such as differences in codon usage between prokaryotes and eukaryotes, or between thermophiles and mesophiles. Here, we have combined a study of contrasting patterns of codon usage within a genome (rice) with a comparison of homologous gene sequences between two genomes (rice and <it>Arabidopsis</it>). This "factorial" design allows for a number of unique controls in the interpretation of the data.</p>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<sec>
				<st>
					<p>Nucleotide content of rice and <it>Arabidopsis </it>genes</p>
				</st>
				<p>The nucleotide content of rice and <it>Arabidopsis </it>coding sequences (expressed as percent GC) is summarized in Figure <figr fid="F1">1</figr>. The Figure shows that there is a distinctly bimodal distribution of GC content among the 14,005 rice genes, which is consistent with previous reports <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B17">17</abbr><abbr bid="B23">23</abbr></abbrgrp>. In contrast to this, the <it>Arabidopsis </it>genes are characterized by a unimodal distribution with a relatively low average value for GC content. In the Figure, the vertical line at 60% GC indicates the point at which we separated the rice genes into two classes: High GC genes and Low GC genes. The average GC content of these two classes, along with the average for all <it>Arabidopsis </it>genes, is shown in Table <tblr tid="T1">1</tblr>. From the Table, we can see that the GC content of the <it>Arabidopsis </it>genes (44.5%) is comparable to that of the Low GC rice genes (50.1%).</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>The distribution of GC contents in rice and <it>Arabidopsis </it>genes</p>
					</caption>
					<text>
						<p><b>The distribution of GC contents in rice and <it>Arabidopsis </it>genes</b>. The GC content of the 14,005 rice genes (shown in red) has a bimodal distribution, while the GC distribution of the 25,625 <it>Arabidopsis </it>genes (shown in blue) is unimodal. The vertical line (at 60% G+C) shows the point where we separated the rice genes into two classes: high GC and low GC rice genes.</p>
					</text>
					<graphic file="1471-2148-7-S1-S6-1"/>
				</fig>
				<tbl id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Average GC content of rice and <it>Arabidopsis </it>genes.</p>
					</caption>
					<tblbdy cols="3">
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>All three codon positions</p>
							</c>
							<c ca="center">
								<p>Third codon positions only</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c cspan="1">
								<hr/>
							</c>
							<c cspan="1">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>High GC rice genes (n = 6291)</p>
							</c>
							<c ca="center">
								<p>67.4 &#177; 0.05</p>
							</c>
							<c ca="center">
								<p>80.4 &#177; 0.14</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Low GC rice genes (n = 7714)</p>
							</c>
							<c ca="center">
								<p>50.1 &#177; 0.06</p>
							</c>
							<c ca="center">
								<p>52.7 &#177; 0.12</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><it>Arabidopsis </it>genes (n = 25625)</p>
							</c>
							<c ca="center">
								<p>44.5 &#177; 0.02</p>
							</c>
							<c ca="center">
								<p>42.8 &#177; 0.04</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>The values shown are percentages of G+C. Standard errors are included.</p>
						<p>High GC rice genes are defined as those that have a G+C content equal to or greater than 60%. Low GC rice genes have a G+C content less than 60%.</p>
					</tblfn>
				</tbl>
				<p>Table <tblr tid="T1">1</tblr> also presents the data for the third positions of codons only. In this case, we see the same trends as for all of the codon positions, but the differences are much greater. For instance, the GC content of the third codon positions of the High GC rice genes (80.4%) is almost twice the value for the <it>Arabidopsis </it>genes (42.8%). Given that variations in codon usage will affect the third codon position primarily, this result leads us to expect significant differences in codon usage between the two classes of rice genes. We investigated this using Correspondence Analysis (see below).</p>
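				<p>As a minimal illustrative sketch (not the code used in this study), the two quantities reported in Table 1 and the 60% class boundary could be computed from a single in-frame coding sequence as follows. The function names gc_content, gc3_content and gc_class are hypothetical, and this simple version counts every third codon position, including those of methionine, tryptophan and stop codons.</p>

```python
def gc_content(cds):
    """Percent G+C over all positions of a coding sequence."""
    cds = cds.upper()
    return 100.0 * (cds.count("G") + cds.count("C")) / len(cds)


def gc3_content(cds):
    """Percent G+C at third codon positions (every third base, in frame)."""
    # Slicing from index 2 in steps of 3 picks the third base of each codon.
    return gc_content(cds[2::3])


def gc_class(cds, threshold=60.0):
    """Assign a gene to the High GC or Low GC class using the 60% cutoff."""
    return "High GC" if gc_content(cds) >= threshold else "Low GC"
```

				<p>Applied to every coding sequence in a genome, gc_content would give a distribution of the kind plotted in Figure 1, and gc_class would reproduce the two-class split, assuming the sequences are in frame and contain only unambiguous bases.</p>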
				<p>We also wished to investigate the possible clustering of GC-rich genes within the rice genome. To do this, we took a sample of two rice chromosomes and plotted the GC content at the third codon positions (GC3) against the position of the genes along the chromosome. For comparison, we did the same analysis for the GC3 content of <it>Arabidopsis</it> genes along the chromosome. The results (see <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>) show that genes with varying levels of nucleotide composition are interspersed along the chromosome.</p>
			</sec>
			<sec>
				<st>
					<p>Correspondence analysis</p>
				</st>
				<p>Correspondence analysis <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> was used to explore the variation in Relative Synonymous Codon Usage (RSCU). Since there are a total of 59 synonymous codons (61 sense codons, less the unique methionine and tryptophan codons), this analysis partitions the variation along 59 orthogonal axes, with 41 degrees of freedom. The first axis is the one that captures most of the variation in codon usage, with each subsequent axis explaining a diminishing amount of the variance. In contrast to other types of variance component analysis, such as Principal Component Analysis (PCA), correspondence analysis has the advantage of allowing one to not only show the distribution of genes in the multidimensional space, but also to show the corresponding distribution of synonymous codons (as shown in Figures <figr fid="F2">2A</figr> and <figr fid="F2">2B</figr>). Correspondence Analysis is primarily designed for use with data tables containing counts, e.g., numbers of synonymous codons, whereas PCA is a general method of data reduction that is more suitable for continuous measurement data. Perriere and Thioulouse <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> have provided a critical review of the use of Correspondence Analysis for studies of codon usage.</p>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Correspondence Analysis of relative synonymous codon usage (RSCU) for all 14,005 rice genes</p>
					</caption>
					<text>
						<p><b>Correspondence Analysis of relative synonymous codon usage (RSCU) for all 14,005 rice genes</b>. Panel A. This panel shows the distribution of genes on the primary and secondary axes (accounting for 36.9% and 6.9% of the total variation, respectively). The two classes of genes (High GC and Low GC) are color coded; the high GC genes are shown in red and the low GC genes are shown in blue. Panel B. This panel shows the underlying distribution of codons on the same two axes as shown in Panel A. Codons ending with G or C are shown in red, and codons ending with A or U are shown in blue.</p>
					</text>
					<graphic file="1471-2148-7-S1-S6-2"/>
				</fig>
				<p>Figure <figr fid="F2">2</figr> shows a correspondence analysis of the synonymous codon usage (RSCU) among the rice genes. The origin in Figure <figr fid="F2">2A</figr> represents the average RSCU for all genes, with respect to the first two axes. The distance between genes on this plot is a reflection of their dissimilarity in RSCU, with respect to the two axes. In this case, the two axes account for 36.9% and 6.9% of the variation in the data, respectively. The third axis accounts for approximately 3% of the variation and the remaining axes for even smaller amounts of the variance each. Thus the first axis reflects the primary factor that explains the differences in codon usage among the rice genes. From Figure <figr fid="F2">2A</figr>, we can see that the rice High GC genes (colored red in the Figure) and Low GC genes (colored blue) separate along this primary axis. The corresponding distribution of synonymous codons (see Figure <figr fid="F2">2B</figr>) shows the separation of C/G-ending codons and A/U-ending codons along this same axis. This indicates that the variations in synonymous codon usage among the rice genes are based on the nucleotide content of the genes. The separation of genes on the second axis appears to be largely due to frequency differences in C-ending and G-ending codons among the GC rich genes (see right side of Fig. <figr fid="F2">2B</figr>).</p>
				<p>Although the color coding in Figure <figr fid="F2">2A</figr> suggests a general relationship between the nucleotide content of genes and their position on the first axis of the correspondence analysis, it does not give us any statistical measure of this relationship. To do this, we calculated the correlation between the GC content of individual rice genes and their location on the primary axis of the Correspondence Analysis. The results were highly significant (R = 0.96, p &lt; 0.00001), indicating that the variations in codon usage are strongly correlated with the nucleotide content (i.e., GC content) of the genes.</p>
			</sec>
			<sec>
				<st>
					<p>Effective number of codons</p>
				</st>
				<p>We further investigated the relationship between nucleotide content and codon usage by calculating the effective number of codons for each of the rice genes. The effective number of codons <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> is a measure of the evenness of codon usage among the 61 sense codons. At one extreme, if all codons are used equally frequently (given the observed frequencies of amino acids), the effective number of codons is 61. If, at the other extreme, only a single codon is used for each amino acid, then the effective number of codons is reduced to 20. In most cases, the observed number falls somewhere between these extremes. Figure <figr fid="F3">3</figr> shows the relationship between the effective number of codons (Nc) and the GC content at the third codon positions of each gene (GC3s). This Figure also contains a reference line, GC(ref), showing the expected position of genes whose codon usage is constrained solely by the nucleotide composition at the third codon position <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. From the Figure, it can be seen that the observed value of Nc tracks the reference line quite closely. This indicates that the nucleotide composition at the third codon position is a major determinant of the effective number of codons. A second-order polynomial regression of Nc on GC3s (not shown in the figure) fits the data very well (R<sup>2 </sup>= 0.82, p &lt; 0.00001). Essentially, the effective number of codons decreases as the GC content increases.</p>
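				<p>For readers who wish to draw such a reference line themselves, the sketch below uses the form in which Wright's expected-Nc curve is commonly written, Nc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s expressed as a fraction between 0 and 1. The function name expected_nc is hypothetical, and this is an illustration rather than the code used to produce Figure 3.</p>

```python
def expected_nc(gc3s):
    """Expected effective number of codons when only GC3s constrains codon usage.

    gc3s is the fraction (0 to 1) of G+C at synonymous third codon positions.
    """
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


# Sample the curve at 5% steps, e.g. for overlaying on an Nc versus GC3s plot.
reference_line = [(i / 20.0, expected_nc(i / 20.0)) for i in range(21)]
```

				<p>The curve peaks at GC3s = 0.5, where the expected Nc is 60.5, and falls toward 31 and 32 at the two extremes, which is why strongly GC-biased genes are expected to show low Nc values even in the absence of translational selection.</p>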
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>The effective number of codons (Nc) plotted for 14,005 rice genes</p>
					</caption>
					<text>
						<p><b>The effective number of codons (Nc) plotted for 14,005 rice genes</b>. The ribosomal protein genes are highlighted in red. The GC(ref) line &#8211; shown in green &#8211; is the expected position of genes whose codon usage is determined only by the GC content at the third positions of codons (GC3s).</p>
					</text>
					<graphic file="1471-2148-7-S1-S6-3"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Homologous gene pairs</p>
				</st>
				<p>Although the preceding results clearly establish a strong correlation between nucleotide content and codon usage within the rice genome, they do not tell us which of the two is the causal factor. In an effort to understand the biological basis of these differences in codon usage and nucleotide content within the rice genome, we compared these rice genes with their homologs in <it>Arabidopsis</it>. We used a BLAST search to identify 7,160 pairs of homologous genes in rice and <it>Arabidopsis </it>(see Methods).</p>
				<p>We first calculated the GC content of the homologous gene pairs. Among these gene pairs, we found that the GC content of the <it>Arabidopsis </it>homologs remains unimodal, as is shown for the full set of <it>Arabidopsis </it>genes in Figure <figr fid="F1">1</figr>, whereas the GC content of the rice homologs remains bimodal, again as shown in Figure <figr fid="F1">1</figr> for the complete rice data set. Thus, the overall differences between the genomes that are seen in Figure <figr fid="F1">1</figr> cannot be due to differences in gene content between the two species because they are still present in the homologous gene set.</p>
				<p>We then computed relative synonymous codon usage values for the homologous genes and performed a new correspondence analysis. The results are shown in Figure <figr fid="F4">4</figr>. Here we see that the High GC and Low GC rice genes (now defined within the homologous set) again separate along the first axis of the analysis, whereas all of the <it>Arabidopsis </it>genes are clustered on the left side of the plot. In other words, the use of homologous genes does not alter the result that we observed in Figure <figr fid="F2">2</figr>. Furthermore, all of the <it>Arabidopsis </it>genes have generally similar patterns of codon usage, regardless of whether they are homologs of High GC or Low GC rice genes. This suggests that the divergence in codon usage patterns among rice genes has occurred since the evolutionary divergence of the dicots and monocots approximately 200 million years (My) ago, i.e., over a relatively short evolutionary time.</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Correspondence Analysis of relative synonymous codon usage (RSCU) for 7,160 homologous gene pairs from rice and <it>Arabidopsis</it></p>
					</caption>
					<text>
						<p><b>Correspondence Analysis of relative synonymous codon usage (RSCU) for 7,160 homologous gene pairs from rice and <it>Arabidopsis</it></b>. The Figure shows the distribution of genes on the primary and secondary axes (accounting for 40.2% and 4.2% of the total variation, respectively). High GC rice genes are shown in red; Low GC rice genes are shown in blue; the <it>Arabidopsis </it>homologs are shown in yellow.</p>
					</text>
					<graphic file="1471-2148-7-S1-S6-4"/>
				</fig>
				<p>Although our results suggest that the GC content of the High GC rice genes has increased significantly since the divergence of the monocots and dicots, there remains the formal possibility that, instead, the <it>Arabidopsis</it> genes have recently converged toward a common, lower GC content. To distinguish between these possibilities, we extracted 92 homologous sequences from the genome of <it>Pinus taeda</it>, and we used these as an out-group to infer the direction of the change. Whereas the High GC rice genes have an average GC content of 66.6% (SE 0.08) and their <it>Arabidopsis </it>homologs have an average GC content of 46.0% (SE 0.07), the average GC content of the <it>P. taeda </it>homologs is 45.2% (SE 0.04). Thus we can infer that the ancestral condition was similar to that currently seen in <it>Arabidopsis</it>.</p>
			</sec>
			<sec>
				<st>
					<p>Correlation of gene length with GC content</p>
				</st>
				<p>Gene length has previously been shown to be negatively correlated with codon usage bias in <it>C. elegans</it>, <it>Drosophila </it>and <it>Arabidopsis </it><abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. We tested whether the same relationship holds true for rice genes. We compared the average gene lengths of the two groups of rice genes (High GC and Low GC), as defined in Figure <figr fid="F1">1</figr>. Not only did we find that the High GC genes were shorter, as suggested by previous studies in other species, but the magnitude of this length difference was surprisingly large and highly significant (p &lt; 0.0001). Specifically, the average length of the Low GC coding sequences (1417 &#177; 13 bp) is approximately 500 bp longer than the average for the High GC genes (921 &#177; 9 bp). Although there is a wide range of individual gene lengths within each class, this highly significant average difference suggests that the length of the rice genes was a significant factor in the evolutionary increase in GC content.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<p>Our survey of codon usage patterns among rice genes shows that there is a wide, multimodal distribution within this genome, in contrast to the much narrower, unimodal distribution of codon usage patterns seen among <it>Arabidopsis </it>genes. Our analysis of homologous gene pairs between the two species demonstrates that these contrasting patterns of codon usage cannot be explained by simple differences in gene content between the two genomes. The most parsimonious explanation is that, since the evolutionary divergence of the monocot and dicot plants approximately 200 My ago, there has been a general trend to increase the GC content of the coding sequences within the rice lineage. This increase, however, has occurred in only a subset of the genes. This heterogeneity in nucleotide content is correlated with a large difference in codon usage patterns among the rice genes. A previous study has noted a similar effect of GC content on codon usage in another monocot, <it>Zea mays </it><abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, and in the <it>Gramineae </it>in general <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>.</p>
			<p>This demonstration of a strong correlation between the nucleotide composition at the third codon positions (GC3) and codon usage suggests that the variation in codon usage among genes may be due to a mutational bias at the DNA level rather than natural selection acting at the level of mRNA translation. This correlation does not, by itself, prove that the cause is at the DNA level, however. Some inferences about the primary causes can be made by comparing the results seen in rice and <it>Arabidopsis</it>. If the large differences in codon usage among rice genes were primarily linked to broad functional classes <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, we would expect to see a parallel pattern among the <it>Arabidopsis </it>homologs &#8211; but this is not the case when we compare homologous gene pairs between the two species. Specifically, the <it>Arabidopsis </it>homologs do not fit into these two classes based on GC content. Moreover, previous studies have provided evidence that codon bias in <it>Arabidopsis </it>is correlated with gene expression levels rather than with variations in nucleotide content <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. These seemingly contradictory results can be reconciled if the patterns of codon usage in both rice and <it>Arabidopsis </it>are affected equally by weak translational selection. In <it>Arabidopsis</it>, the absence of strong mutational bias facilitates the detection of the effects of translational selection <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> but, in the rice genome, this translational effect is swamped by the much larger effect of nucleotide bias. This view is consistent with recent findings <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B28">28</abbr></abbrgrp> that the relative strength of translational selection can vary widely among genomes.</p>
			<p>The question of translational selection versus mutational bias can be approached in a number of other ways. For instance, if codon bias is due to positive selective pressures then we would expect those genes with higher codon bias to have lower rates of synonymous substitution <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Such a negative correlation has been observed in bacteria <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, <it>Drosophila</it> <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> and yeast <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. In contrast to these results, when we compare rice genes with their <it>Arabidopsis </it>homologs, we find instead that there is a positive correlation between codon bias and the rate of synonymous substitution. Specifically, there is a higher rate of synonymous substitutions between the High GC rice genes and their <it>Arabidopsis</it> homologs than between the Low GC rice genes and their homologs. In order to quantify this relationship between codon bias and divergence rate, we chose a sample of 895 <it>Arabidopsis</it> genes from chromosome 4 that had homologs in the rice genome. For each of the 895 rice homologs, we measured the effective number of codons (Nc) and calculated the rate of synonymous substitution (dS). We observed a significant negative correlation between Nc and dS (R = -0.27, p &lt; 0.00001). Since the value of Nc is inversely related to the level of codon bias, this means that there is a highly significant positive correlation between codon bias and divergence rate in this case. This provides further support for the view that the bias is not due to positive selection for translational efficiency in this case.</p>
			<p>Yet another way to distinguish between the effects of mutational bias and translational selection is to compare the nucleotide contents of synonymous and nonsynonymous sites. For instance, if the high GC content at the third codon position of some rice genes were due to translational selection, we would not expect to see any correlation between synonymous and nonsynonymous sites. However, in a previous study <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> we did find correlated patterns of variation in the GC content of nonsynonymous sites among rice genes. Finally, the fact that the highly expressed ribosomal protein genes are distributed throughout the entire range of GC contents (see Fig. <figr fid="F3">3</figr>) indicates that the codon bias is not correlated with gene expression level. In summary, it appears that the codon usage of the High GC rice genes is determined primarily by nucleotide bias.</p>
			<p>Although we have several lines of evidence that the variations in codon usage are due to the underlying variations in nucleotide content, we still need to explain why some rice genes have become extremely GC-rich while others remain relatively GC-neutral. We found that there is a strong negative correlation between the length of rice genes and their GC content. The reasons why longer genes are more resistant to increases in GC content remain to be elucidated, but one possibility is that the longer genes provide a larger mutational target at the sequence level and that, consequently, they are subject to more purifying selection that counteracts the mutational changes that result in the increased GC content in shorter genes <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Another possibility is that the transcription of AT-rich genes is, in general, more efficient than that of GC-rich genes and this efficiency difference would be more important for longer genes. But if this were the case, we would expect the same forces to be at work among the <it>Arabidopsis </it>homologs, where we observe the same length difference but without the associated difference in GC content.</p>
			<p>In summary, the simple observation of large differences in codon usage among rice genes might lead us to speculate on functional differences between genes as a basis for the variations in codon usage and GC content. The comparison with homologous sequences from <it>Arabidopsis</it>, however, has allowed us to "cross-check" such a prediction and has led us instead to the conclusion that most of the variation in codon usage among rice genes is <ul>not</ul> due to positive selection acting on synonymous codon positions. Rather, it is due to a balance between a directional mutational bias and negative selection acting at all nucleotide positions.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>Coding sequence data</p>
				</st>
				<p>A total of 14,005 rice coding sequences longer than 75 codons were obtained from the Gramene database <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> and EMBL, as previously described <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. For the <it>A. thaliana </it>coding sequences we used the file containing 25,625 <it>Arabidopsis </it>coding sequences (all longer than 75 codons) that we obtained previously <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. The homologous sequences from <it>Pinus taeda </it>(Loblolly pine) were extracted, using BLASTN searches with a cutoff Expect value of 1e-20, from the dataset of 14,198 <it>P. taeda </it>sequences in the NCBI Unigene database <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Identification of homologous sequences and computing synonymous substitution rates</p>
				</st>
				<p>Homologous pairs between <it>O. sativa </it>and <it>A. thaliana </it>were identified by performing BLASTP searches <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> of the rice protein sequences against <it>Arabidopsis </it>sequences with a cutoff Expect value of 1e-20. When a rice protein had more than one <it>Arabidopsis </it>protein hit, the pair with the lowest Expect value was retained. Using this method, we identified 7,160 homologous gene pairs between the two species, of which 895 gene pairs are rice genes homologous to <it>Arabidopsis </it>chromosome 4 genes. To examine the relationship between codon bias and the evolutionary rate, we used the method of Yang and Nielsen <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> to calculate the synonymous substitution rates for the 895 gene pairs.</p>
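				<p>The best-hit rule described above can be sketched as follows. This is a minimal illustration in Python, not the code used in the study: the function name best_hits, the (query, subject, Expect value) tuple layout, and the treatment of the 1e-20 cutoff as inclusive are all assumptions made for the example.</p>

```python
def best_hits(rows, cutoff=1e-20):
    """Return {query: (subject, evalue)} keeping the lowest Expect value per query.

    rows is an iterable of (query_id, subject_id, evalue) tuples, for example
    parsed from tabular BLAST output (hypothetical layout, assumed here).
    Hits whose Expect value exceeds the cutoff are discarded.
    """
    best = {}
    for query, subject, evalue in rows:
        if evalue > cutoff:
            continue  # hit is weaker than the cutoff, ignore it
        # Keep this hit if the query is new or the hit beats the stored one.
        if query not in best or best[query][1] > evalue:
            best[query] = (subject, evalue)
    return best
```

				<p>When two hits for the same query tie on Expect value, this sketch keeps the first one encountered; the original text does not say how ties were resolved.</p>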
			</sec>
			<sec>
				<st>
					<p>Statistical analyses</p>
				</st>
				<p>Relative synonymous codon usage (RSCU) is the observed frequency of a codon divided by the frequency expected if all synonyms for that amino acid were used equally. An RSCU value close to 1.0 indicates a lack of codon bias. The RSCU was computed for each gene. The RSCU values were then analyzed using correspondence analysis (see below).</p>
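				<p>As an illustration of the RSCU definition above, the following Python sketch computes RSCU values for a single coding sequence under the standard genetic code. It is not the CodonW implementation: the function name rscu and the choices to skip stop codons, single-codon amino acids (methionine and tryptophan), and amino acids absent from the gene are assumptions of this example.</p>

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for the synonymous codons of amino acids present in cds."""
    cds = cds.upper()
    # Split into complete codons, dropping any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    # Group sense codons into synonymous families, one per amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue  # amino acid absent, or no synonyms (Met, Trp)
        for c in family:
            # Observed count divided by the count expected under equal usage.
            values[c] = counts[c] * len(family) / total
    return values
```

				<p>For example, a gene whose four alanine codons are GCT, GCT, GCC and GCA gives RSCU values of 2.0 for GCT, 1.0 for GCC and GCA, and 0.0 for GCG.</p>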
				<p>The effective number of codons (Nc) is a commonly used measure of the codon usage bias of a gene <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Nc takes a value between 20, when only one synonymous codon is used for each amino acid, and 61, when all codons are used uniformly. Lower Nc values indicate stronger bias. Since Nc is constrained by the G+C content of the gene, it is often plotted against GC3s (the frequency of G+C at synonymous third codon positions) of the gene to investigate patterns of codon usage <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
				<p>Correspondence analysis <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> was used to explore the variation among genes in the RSCU values of the 59 sense codons that have synonyms (that is, the 61 sense codons other than the unique methionine and tryptophan codons). This multivariate statistical method creates a series of orthogonal axes to identify trends that explain the data variation, with each subsequent axis explaining a decreasing amount of the variation. The method, as implemented in CodonW version 1.4 <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, was used in this study. Correspondence analysis assigns an ordination to each gene and codon on these axes, and the ordinations of the genes and codons can be superimposed. Since the first two axes capture a larger fraction of the variance of the data than any of the other axes, genes and codons were plotted on these two axes only.</p>
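				<p>For readers who want to see how the bounds of 20 and 61 arise, the sketch below follows the general form of Wright's formula, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where each F is the mean codon homozygosity of the amino acids in one degeneracy class. It is an illustrative Python sketch rather than the CodonW code: the function name, the exclusion of non-positive homozygosity values, the averaging of F2 and F4 when isoleucine is absent, and returning None for genes too short to estimate every class are assumptions of this example.</p>

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def effective_number_of_codons(cds):
    """Estimate Nc for one coding sequence, or return None if it cannot be computed."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    # Homozygosity F per amino acid, grouped by family size (2, 3, 4 or 6 codons).
    f_by_class = defaultdict(list)
    for family in families.values():
        k = len(family)
        n = sum(counts[c] for c in family)
        if k >= 2 and n >= 2:
            sum_p2 = sum((counts[c] / n) ** 2 for c in family)
            f = (n * sum_p2 - 1) / (n - 1)
            if f > 0:  # assumption: skip estimates that are zero or negative
                f_by_class[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # Assumption: if isoleucine is missing, approximate F3 from F2 and F4.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2

    nc = 2.0  # methionine and tryptophan each contribute exactly one codon
    for k, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        if k not in mean_f:
            return None  # too few codons to estimate this degeneracy class
        nc += n_amino_acids / mean_f[k]
    return min(nc, 61.0)  # values above 61 are capped at 61
```

				<p>With one codon per amino acid every F equals 1, so the sum is 2 + 9 + 1 + 5 + 3 = 20; with uniform usage the estimate slightly exceeds 61 and is capped there.</p>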
			</sec>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>H-CW carried out the analyses and drafted the manuscript. DAH conceived of the study, participated in its design and coordination, and helped to draft the manuscript.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>This work was supported by a Research Grant from NSERC Canada (DAH) and an Ontario Graduate Scholarship (HCW).</p>
				<p>This article has been published as part of <it>BMC Evolutionary Biology </it>Volume 7 Supplement 1, 2007: First International Conference on Phylogenomics. The full contents of the supplement are available online at <url>http://www.biomedcentral.com/bmcevolbiol/7?issue=S1</url>.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Codon frequencies in 119 individual genes confirm consistent choices of degenerate bases according to genome type</p>
				</title>
				<aug>
					<au>
						<snm>Grantham</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Gautier</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Gouy</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1980</pubdate>
				<volume>8</volume>
				<fpage>1893</fpage>
				<lpage>1912</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">324046</pubid>
						<pubid idtype="pmpid" link="fulltext">6159596</pubid>
						<pubid idtype="doi">10.1093/nar/8.9.1893</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Analysis of Codon Usage</p>
				</title>
				<aug>
					<au>
						<snm>Peden</snm>
						<fnm>JF</fnm>
					</au>
				</aug>
				<source>PhD Thesis</source>
				<publisher>University of Nottingham</publisher>
				<pubdate>1999</pubdate>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Codon usage in <it>Escherichia coli</it>, <it>Bacillus subtilis</it>, <it>Saccharomyces cerevisiae</it>, <it>Schizosaccharomyces pombe</it>, <it>Drosophila melanogaster </it>and <it>Homo sapiens</it>; a review of the considerable within-species diversity</p>
				</title>
				<aug>
					<au>
						<snm>Sharp</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Cowe</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Higgins</snm>
						<fnm>DG</fnm>
					</au>
					<au>
						<snm>Shields</snm>
						<fnm>DC</fnm>
					</au>
					<au>
						<snm>Wolfe</snm>
						<fnm>KH</fnm>
					</au>
					<au>
						<snm>Wright</snm>
						<fnm>F</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1988</pubdate>
				<volume>16</volume>
				<fpage>8207</fpage>
				<lpage>8211</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">338553</pubid>
						<pubid idtype="pmpid" link="fulltext">3138659</pubid>
						<pubid idtype="doi">10.1093/nar/16.17.8207</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Analysis of codon usage patterns of bacterial genomes using the self-organizing map</p>
				</title>
				<aug>
					<au>
						<snm>Wang</snm>
						<fnm>HC</fnm>
					</au>
					<au>
						<snm>Badger</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Kearney</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2001</pubdate>
				<volume>18</volume>
				<fpage>792</fpage>
				<lpage>800</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11319263</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Codon usage in bacteria: correlation with gene expressivity</p>
				</title>
				<aug>
					<au>
						<snm>Gouy</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Gautier</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1982</pubdate>
				<volume>10</volume>
				<fpage>7055</fpage>
				<lpage>7074</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">326988</pubid>
						<pubid idtype="pmpid" link="fulltext">6760125</pubid>
						<pubid idtype="doi">10.1093/nar/10.22.7055</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Codon usage and genome evolution</p>
				</title>
				<aug>
					<au>
						<snm>Sharp</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Matassi</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Curr Opin Genet Dev</source>
				<pubdate>1994</pubdate>
				<volume>4</volume>
				<fpage>851</fpage>
				<lpage>860</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0959-437X(94)90070-1</pubid>
						<pubid idtype="pmpid">7888755</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Variation in the strength of selected codon usage bias among bacteria</p>
				</title>
				<aug>
					<au>
						<snm>Sharp</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Bailes</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Grocock</snm>
						<fnm>RJ</fnm>
					</au>
					<au>
						<snm>Peden</snm>
						<fnm>JF</fnm>
					</au>
					<au>
						<snm>Sockett</snm>
						<fnm>RE</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2005</pubdate>
				<volume>33</volume>
				<fpage>1141</fpage>
				<lpage>1153</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">549432</pubid>
						<pubid idtype="pmpid" link="fulltext">15728743</pubid>
						<pubid idtype="doi">10.1093/nar/gki242</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Correlation between the abundance of <it>Escherichia coli </it>transfer RNAs and the occurrence of the respective codons in its protein genes</p>
				</title>
				<aug>
					<au>
						<snm>Ikemura</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1981</pubdate>
				<volume>146</volume>
				<fpage>1</fpage>
				<lpage>21</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0022-2836(81)90363-6</pubid>
						<pubid idtype="pmpid" link="fulltext">6167728</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of <it>Bacillus subtilis </it>tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis</p>
				</title>
				<aug>
					<au>
						<snm>Kanaya</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Yamada</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Kudo</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Ikemura</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>1999</pubdate>
				<volume>238</volume>
				<fpage>143</fpage>
				<lpage>155</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1119(99)00225-5</pubid>
						<pubid idtype="pmpid" link="fulltext">10570992</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>"Silent" sites in <it>Drosophila </it>genes are not neutral: evidence of selection among synonymous codons</p>
				</title>
				<aug>
					<au>
						<snm>Shields</snm>
						<fnm>DC</fnm>
					</au>
					<au>
						<snm>Sharp</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Higgins</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Wright</snm>
						<fnm>F</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1988</pubdate>
				<volume>5</volume>
				<fpage>704</fpage>
				<lpage>716</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">3146682</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Codon usage in <it>Caenorhabditis elegans</it>: delineation of translational selection and mutational biases</p>
				</title>
				<aug>
					<au>
						<snm>Stenico</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Lloyd</snm>
						<fnm>AT</fnm>
					</au>
					<au>
						<snm>Sharp</snm>
						<fnm>PM</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1994</pubdate>
				<volume>22</volume>
				<fpage>2437</fpage>
				<lpage>2446</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">308193</pubid>
						<pubid idtype="pmpid" link="fulltext">8041603</pubid>
						<pubid idtype="doi">10.1093/nar/22.13.2437</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<aug>
					<au>
						<snm>Li</snm>
						<fnm>WH</fnm>
					</au>
				</aug>
				<source>Molecular Evolution</source>
				<publisher>Sunderland, MA: Sinauer Associates, Inc</publisher>
				<pubdate>1997</pubdate>
			</bibl>
			<bibl id="B13">
				<title>
					<p>High guanine and cytosine content increases mRNA levels in mammalian cells</p>
				</title>
				<aug>
					<au>
						<snm>Kudla</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Lipinski</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Caffin</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Helwak</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Zylicz</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>PLoS Biol</source>
				<pubdate>2006</pubdate>
				<volume>4</volume>
				<fpage>e180</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1463026</pubid>
						<pubid idtype="pmpid" link="fulltext">16700628</pubid>
						<pubid idtype="doi">10.1371/journal.pbio.0040180</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Synonymous codon usage in <it>Zea mays </it>L. nuclear genes is varied by levels of C and G-ending codons</p>
				</title>
				<aug>
					<au>
						<snm>Fennoy</snm>
						<fnm>SL</fnm>
					</au>
					<au>
						<snm>Bailey-Serres</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1993</pubdate>
				<volume>21</volume>
				<fpage>5294</fpage>
				<lpage>5300</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">310561</pubid>
						<pubid idtype="pmpid" link="fulltext">8265340</pubid>
						<pubid idtype="doi">10.1093/nar/21.23.5294</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Two classes of genes in plants</p>
				</title>
				<aug>
					<au>
						<snm>Carels</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Bernardi</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>2000</pubdate>
				<volume>154</volume>
				<fpage>1819</fpage>
				<lpage>1825</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1461008</pubid>
						<pubid idtype="pmpid" link="fulltext">10747072</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Gramene: a resource for comparative grass genomics</p>
				</title>
				<aug>
					<au>
						<snm>Ware</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Jaiswal</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Ni</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Pan</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Chang</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Clark</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Teytelman</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Schmidt</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Zhao</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Cartinhour</snm>
						<fnm>S</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>103</fpage>
				<lpage>105</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">99157</pubid>
						<pubid idtype="pmpid" link="fulltext">11752266</pubid>
						<pubid idtype="doi">10.1093/nar/30.1.103</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Mutational bias affects protein evolution in flowering plants</p>
				</title>
				<aug>
					<au>
						<snm>Wang</snm>
						<fnm>HC</fnm>
					</au>
					<au>
						<snm>Singer</snm>
						<fnm>GAC</fnm>
					</au>
					<au>
						<snm>Hickey</snm>
						<fnm>DA</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2004</pubdate>
				<volume>21</volume>
				<fpage>90</fpage>
				<lpage>96</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/molbev/msh003</pubid>
						<pubid idtype="pmpid" link="fulltext">14595101</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Basic local alignment search tool</p>
				</title>
				<aug>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Gish</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Myers</snm>
						<fnm>EW</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1990</pubdate>
				<volume>215</volume>
				<fpage>403</fpage>
				<lpage>410</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.1990.9999</pubid>
						<pubid idtype="pmpid" link="fulltext">2231712</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models</p>
				</title>
				<aug>
					<au>
						<snm>Yang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Nielsen</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2000</pubdate>
				<volume>17</volume>
				<fpage>32</fpage>
				<lpage>43</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10666704</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>The 'effective number of codons' used in a gene</p>
				</title>
				<aug>
					<au>
						<snm>Wright</snm>
						<fnm>F</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>1990</pubdate>
				<volume>87</volume>
				<fpage>23</fpage>
				<lpage>29</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0378-1119(90)90491-9</pubid>
						<pubid idtype="pmpid" link="fulltext">2110097</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<aug>
					<au>
						<snm>Greenacre</snm>
						<fnm>MJ</fnm>
					</au>
				</aug>
				<source>Theory and Applications of Correspondence Analysis</source>
				<publisher>London: Academic Press</publisher>
				<pubdate>1984</pubdate>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Use and misuse of correspondence analysis in codon usage studies</p>
				</title>
				<aug>
					<au>
						<snm>Perriere</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Thioulouse</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>4548</fpage>
				<lpage>4555</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">137129</pubid>
						<pubid idtype="pmpid" link="fulltext">12384602</pubid>
						<pubid idtype="doi">10.1093/nar/gkf565</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Compositional transitions between <it>Oryza sativa </it>and <it>Arabidopsis thaliana </it>genes are linked to the functional change of encoded proteins</p>
				</title>
				<aug>
					<au>
						<snm>Banerjee</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Gupta</snm>
						<fnm>SK</fnm>
					</au>
					<au>
						<snm>Ghosh</snm>
						<fnm>TC</fnm>
					</au>
				</aug>
				<source>Plant Sci</source>
				<pubdate>2006</pubdate>
				<volume>170</volume>
				<fpage>267</fpage>
				<lpage>273</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1016/j.plantsci.2005.08.012</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Accounting for background nucleotide composition when measuring codon usage bias</p>
				</title>
				<aug>
					<au>
						<snm>Novembre</snm>
						<fnm>JA</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2002</pubdate>
				<volume>19</volume>
				<fpage>1390</fpage>
				<lpage>1394</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12140252</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Expression pattern and, surprisingly, gene length shape codon usage in <it>Caenorhabditis</it>, <it>Drosophila</it>, and <it>Arabidopsis</it></p>
				</title>
				<aug>
					<au>
						<snm>Duret</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Mouchiroud</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1999</pubdate>
				<volume>96</volume>
				<fpage>4482</fpage>
				<lpage>4487</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">16358</pubid>
						<pubid idtype="pmpid" link="fulltext">10200288</pubid>
						<pubid idtype="doi">10.1073/pnas.96.8.4482</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Synonymous codon usage and gene function are strongly related in <it>Oryza sativa</it></p>
				</title>
				<aug>
					<au>
						<snm>Liu</snm>
						<fnm>Q</fnm>
					</au>
					<au>
						<snm>Dou</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ji</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Xue</snm>
						<fnm>Q</fnm>
					</au>
				</aug>
				<source>Biosystems</source>
				<pubdate>2005</pubdate>
				<volume>80</volume>
				<fpage>123</fpage>
				<lpage>131</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/j.biosystems.2004.10.008</pubid>
						<pubid idtype="pmpid" link="fulltext">15823411</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Classification of <it>Arabidopsis thaliana </it>gene sequences: clustering of coding sequences into two groups according to codon usage improves gene prediction</p>
				</title>
				<aug>
					<au>
						<snm>Mathe</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Peresetsky</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Dehais</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Van Montagu</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rouze</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1999</pubdate>
				<volume>285</volume>
				<fpage>1977</fpage>
				<lpage>1991</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.1998.2451</pubid>
						<pubid idtype="pmpid" link="fulltext">9925779</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Solving the riddle of codon usage preferences: a test for translational selection</p>
				</title>
				<aug>
					<au>
						<snm>Dos Reis</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Savva</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Wernisch</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<volume>32</volume>
				<fpage>5036</fpage>
				<lpage>5044</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">521650</pubid>
						<pubid idtype="pmpid" link="fulltext">15448185</pubid>
						<pubid idtype="doi">10.1093/nar/gkh834</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>The signature of selection mediated by expression on human genes</p>
				</title>
				<aug>
					<au>
						<snm>Urrutia</snm>
						<fnm>AO</fnm>
					</au>
					<au>
						<snm>Hurst</snm>
						<fnm>LD</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>2260</fpage>
				<lpage>2264</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">403694</pubid>
						<pubid idtype="pmpid" link="fulltext">12975314</pubid>
						<pubid idtype="doi">10.1101/gr.641103</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias</p>
				</title>
				<aug>
					<au>
						<snm>Sharp</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>WH</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1987</pubdate>
				<volume>4</volume>
				<fpage>222</fpage>
				<lpage>230</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">3328816</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Evolution of codon usage bias in <it>Drosophila</it></p>
				</title>
				<aug>
					<au>
						<snm>Powell</snm>
						<fnm>JR</fnm>
					</au>
					<au>
						<snm>Moriyama</snm>
						<fnm>EN</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1997</pubdate>
				<volume>94</volume>
				<fpage>7784</fpage>
				<lpage>7790</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">33704</pubid>
						<pubid idtype="pmpid" link="fulltext">9223264</pubid>
						<pubid idtype="doi">10.1073/pnas.94.15.7784</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Cytosine Usage Modulates the Correlation between CDS Length and CG Content in Prokaryotic Genomes</p>
				</title>
				<aug>
					<au>
						<snm>Xia</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Wang</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Xie</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Carullo</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Huang</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Hickey</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2006</pubdate>
				<volume>23</volume>
				<fpage>1450</fpage>
				<lpage>1454</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/molbev/msl012</pubid>
						<pubid idtype="pmpid" link="fulltext">16687416</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<url>http://www.mathstat.dal.ca/~hcwang/Research/Manuscript/riceCodonUsage/rice_ArabidopsisGConchromosome.xls</url>
			</bibl>
			<bibl id="B34">
				<url>ftp://ftp.ncbi.nih.gov/repository/UniGene/Pinus_taeda/</url>
			</bibl>
		</refgrp>
	</bm>
</art>
