<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-8-191</ui>
   <ji>1471-2164</ji>
   <fm>
		<dochead>Research article</dochead>
		<bibl>
			<title>
				<p>C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Austin</snm>
					<mi>S</mi>
					<fnm>Ryan</fnm>
					<insr iid="I1"/>
					<email>ryan.austin@utoronto.ca</email>
				</au>
				<au id="A2" ca="yes">
					<snm>Provart</snm>
					<mi>J</mi>
					<fnm>Nicholas</fnm>
					<insr iid="I1"/>
					<email>nicholas.provart@utoronto.ca</email>
				</au>
				<au id="A3">
					<snm>Cutler</snm>
					<mi>R</mi>
					<fnm>Sean</fnm>
					<insr iid="I2"/>
					<email>sean.cutler@ucr.edu</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Department of Cell &amp; Systems Biology, University of Toronto, 25 Willcocks St., Toronto, ON. M5S 3B2, Canada</p>
				</ins>
				<ins id="I2">
					<p>Center for Plant Cell Biology (CEPCEB), Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA</p>
				</ins>
			</insg>
			<source>BMC Genomics</source>
			<issn>1471-2164</issn>
			<pubdate>2007</pubdate>
			<volume>8</volume>
			<issue>1</issue>
			<fpage>191</fpage>
			<url>http://www.biomedcentral.com/1471-2164/8/191</url>
			<xrefbib>
				<pubidlist>
					<pubid idtype="pmpid">17594486</pubid>
					<pubid idtype="doi">10.1186/1471-2164-8-191</pubid>
				</pubidlist>
			</xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>31</day>
					<month>10</month>
					<year>2006</year>
				</date>
			</rec>
			<acc>
				<date>
					<day>26</day>
					<month>6</month>
					<year>2007</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>26</day>
					<month>6</month>
					<year>2007</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2007</year>
			<collab>Austin et al; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied and successfully reduced background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly over-represented in multiple species. To our knowledge, this approach predicted all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists between species, among kingdoms and across eukaryotes. Motifs of note include a serine-acidic peptide (DSD*) as well as several lysine enriched motifs found in nearly all eukaryotic genomes examined.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>We have successfully generated a high confidence representation of eukaryotic motifs anchored at the C-terminus. A high incidence of true-positives in our results suggests that several previously unidentified tripeptide patterns are strong candidates for representing novel peptide motifs of a widely employed nature in the C-terminal biology of eukaryotes. Our application of comparative genomics, statistical over-representation and the adjustment for protein family homology has generated several hypotheses concerning the C-terminal topology as it pertains to sorting and potential protein interaction signals. This approach to background reduction could be expanded for application to protein motif prediction in the protein interior. A parallel N-terminal analysis is presented as supplementary data.</p>
				</sec>
			</sec>
		</abs>
	</fm>
   <meta>
		<classifications>
			<classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
		</classifications>
	</meta>
   <bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>The carboxy tails of proteins are frequent sites of post-translational modification, protein-protein interaction domains and sub-cellular protein sorting motifs. This is presumably due to the high kinetic cost of burying the termini within the interior of the protein, which leaves the head and tail regions of many proteins exposed to the cytoplasm and free to engage in static or dynamic biochemical interactions <abbrgrp>
					<abbr bid="B1">1</abbr>
				</abbrgrp>. Although a variety of protein domains have been characterized to preferentially or even exclusively occur within the terminal regions, a class of signatures has been found to be effectively dependent upon their proximity to the C-terminal end for proper function. Members of this class of motifs include: the peroxisomal PTS1 signal (SKL-COOH), the ER retention signal (K/HDEL-COOH), the ER retrieval signal for membrane bound proteins (KKxx-COOH) and the protein C-terminal prenylation motif (CaaX-COOH). These motifs appear as a frequent sorting strategy in diverse protein groups and are mostly conserved throughout eukaryotes <abbrgrp>
					<abbr bid="B1">1</abbr>
					<abbr bid="B2">2</abbr>
					<abbr bid="B3">3</abbr>
					<abbr bid="B4">4</abbr>
					<abbr bid="B5">5</abbr>
				</abbrgrp>. Since such signals are often critical to proper function, they are likely to be under strong purifying selection and therefore evolutionarily conserved in numerous protein classes and species genomes. This conservation should be detectable in whole genome analysis as a statistical over-representation of motif-derived tripeptides against the tripeptide frequencies expected by chance alone.</p>
			<p>In general, protein motif prediction can be divided into two basic approaches: the <it>a priori </it>mapping of experimentally verified motifs to novel unannotated sequences (scanning) and the <it>ab initio </it>identification of potentially novel motifs without any prior knowledge of motif structure. Over the past decade, rapid advances in high-throughput proteomics and a large body of literature detailing the structure and function of numerous proteins in many species have focused protein motif prediction on the annotation of novel sequences using motif scanning from an <it>a priori </it>collection of protein domain knowledge in the literature <abbrgrp>
					<abbr bid="B6">6</abbr>
					<abbr bid="B7">7</abbr>
				</abbrgrp>. Effective sequence alignment algorithms and an abundance of coding sequence data have allowed for the identification of conserved sequence domains among orthologous proteins, limiting the need for <it>ab initio </it>protein motif prediction methods. Nevertheless, <it>ab initio </it>prediction methods are likely to play a significant role in our completion of a comprehensive protein domain grammar. In addition to <it>ab initio </it>prediction, integrative methods have applied protein-protein interaction maps, crystallography data, NMR results and amino acid frequencies to the prediction of novel functional domains in diverse classes of proteins <abbrgrp>
					<abbr bid="B8">8</abbr>
					<abbr bid="B9">9</abbr>
					<abbr bid="B10">10</abbr>
					<abbr bid="B11">11</abbr>
				</abbrgrp>. Heuristic approaches, enumerative measures, orthologous sequences, functional annotation and statistical over-representation have all been explored for the <it>ab initio </it>prediction of novel protein motifs from primary sequence using an integrative framework <abbrgrp>
					<abbr bid="B12">12</abbr>
					<abbr bid="B13">13</abbr>
					<abbr bid="B14">14</abbr>
					<abbr bid="B15">15</abbr>
				</abbrgrp>.</p>
			<p>Methods that assay sequence statistical over-representation apply chi-squared, p-values or z-scores to nmer frequencies, most often in association with one or more expectation values or a randomized background model. The reasoning behind such approaches is that motifs of critical functional significance are expected to be more highly conserved than benign stretches of primary sequence free from selective pressure. Thus short sequence stretches of critical function should exhibit higher statistical frequencies than non-critical regions more tolerant to changes and variation in residue make-up. Unfortunately, as a low signal-to-noise ratio is a frequent problem in sequence analysis, such studies require careful selection of a background model that will optimally reduce this biological 'noise' <abbrgrp>
					<abbr bid="B11">11</abbr>
					<abbr bid="B15">15</abbr>
				</abbrgrp>. Bayesian inference, sequence randomization and the use of hidden Markov models have all been explored to this effect. However, those approaches that most closely model the biological background appear to be the most effective in reducing the false positive rate <abbrgrp>
					<abbr bid="B16">16</abbr>
				</abbrgrp>. In addition to the complications of motif degeneracy, variability in the positioning of individual motifs along the length of genetic sequences introduces computationally expensive considerations into the analysis. Hence, the ability to define a biologically relevant reference point from which to examine sequence prevalence can greatly simplify statistical calculations <abbrgrp>
					<abbr bid="B17">17</abbr>
				</abbrgrp>. This has been applied to the prediction of transcription factor binding sites in relation to the transcription start site as well as in the examination of both nucleotide and peptide frequencies in relation to the protein termini <abbrgrp>
					<abbr bid="B14">14</abbr>
					<abbr bid="B18">18</abbr>
					<abbr bid="B19">19</abbr>
					<abbr bid="B20">20</abbr>
					<abbr bid="B21">21</abbr>
				</abbrgrp>.</p>
			<p>Statistical studies of nucleotide and peptide frequencies in the C-terminus of eukaryotic genomes have revealed non-random nucleotide, amino acid and short peptide biases <abbrgrp>
					<abbr bid="B17">17</abbr>
					<abbr bid="B18">18</abbr>
					<abbr bid="B19">19</abbr>
					<abbr bid="B20">20</abbr>
					<abbr bid="B22">22</abbr>
					<abbr bid="B23">23</abbr>
				</abbrgrp>. In 2003, Chung et al. tallied the frequencies of C-terminal 3mers and 4mers in several eukaryotic genomes to show that known targeting signals ranked highly in several species <abbrgrp>
					<abbr bid="B22">22</abbr>
				</abbrgrp>. In that same year, Gatto &amp; Berg likewise compared C-terminal tripeptide frequencies to a shuffled proteome to identify known motifs as over-represented in several eukaryotic proteomes <abbrgrp>
					<abbr bid="B18">18</abbr>
				</abbrgrp>. However, efforts to increase the low signal-to-noise ratio inherent in such analyses have not been fully explored and a high-confidence snapshot of biologically relevant C-terminal topology has yet to be determined. We therefore reasoned that the exploration of randomized sequence background models along with additional data that incorporates protein family information and comparative genomics could reduce background levels enough to accurately depict a collection of eukaryotic conserved C-terminal anchored protein motifs (CTAMs). As C-terminal sequence homology between common members of large protein families has been postulated to heavily contribute to individual nmer counts in frequency calculations <abbrgrp>
					<abbr bid="B18">18</abbr>
				</abbrgrp>, our test statistic (z-score) tallied any multiple tripeptide counts arising from members of a common gene family as a single instance. This effort was able to identify a collection of eukaryotic-conserved statistically over-represented C-terminal tripeptides (SOCTs), many of which correspond to known C-terminally anchored sequences, as well as several other novel and intriguing motif patterns within the C-terminal biology of the 7 species examined.</p>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<p>We applied a novel methodology to the prediction of biologically active sequence motifs at the C-terminus of 7 eukaryotic genomes (<it>A. thaliana, O. sativa, S. cerevisiae, C. elegans, D. melanogaster, M. musculus </it>and <it>H. sapiens</it>). Generally, our methodology applied a penalized z-statistic that disregarded tripeptide frequencies arising from simple sequences or from C-terminal homology among members of protein families. Comparative genomics of SOCT frequency between species pairs and across all species was then used to filter for C-terminal protein motifs potentially involved in generalized protein biology roles such as protein sorting and post-translational modification (see Fig <figr fid="F1">1</figr>).</p>
			<fig id="F1">
				<title>
					<p>Figure 1</p>
				</title>
				<caption>
					<p>Flowchart of the SOCT pipeline</p>
				</caption>
				<text>
					<p><b>Flowchart of the SOCT pipeline</b>. A combination of filters and pre-processing was performed against individual proteomes to obtain a comprehensive set of z-statistics for each possible tripeptide at all positions from the C-terminal end to 100 residues in from the C-terminus. Programs and scripts for data analysis are represented as barred boxes, while resulting datasets are depicted as polygons.</p>
				</text>
				<graphic file="1471-2164-8-191-1"/>
			</fig>
			<p>The general implementation of our method for each proteome is as follows:</p>
			<p>1) generate a randomized background of C-terminal peptide frequencies from proteome sequence</p>
			<p>2) mask low-complexity sequences within the C-terminal regions</p>
			<p>3) generate comprehensive position-specific z-statistics for all possible tripeptides occurring at positions from -3 to -100 residues in from the carboxy terminus</p>
			<p>4) determine gene family clusters for the proteome</p>
			<p>5) adjust z-scores to exclude duplicate tripeptide counts arising from within individual gene families</p>
			<p>Initial analysis, performed without sequence masking or protein family filtering, reconfirmed the strong terminal bias in tripeptide composition seen by Gatto &amp; Berg <abbrgrp>
					<abbr bid="B18">18</abbr>
				</abbrgrp>. This bias has also been observed at the levels of amino acid <abbrgrp>
					<abbr bid="B19">19</abbr>
					<abbr bid="B24">24</abbr>
				</abbrgrp>, nucleotide and codon composition <abbrgrp>
					<abbr bid="B20">20</abbr>
				</abbrgrp> and decamer peptides <abbrgrp>
					<abbr bid="B17">17</abbr>
				</abbrgrp>. Our results extend the confirmed presence of a terminal tripeptide bias to include the genomes of <it>O. sativa</it>, <it>C. elegans</it>, <it>D. melanogaster </it>and <it>M. musculus </it>[see Additional file <supplr sid="S7">7</supplr> for the N-terminal data set]. It would appear that this composition bias exists at all levels of analysis in all species from bacteria to higher eukaryotes. In this study, we represent the terminal bias by the presence of a disproportionate number of 'statistically over-represented C-terminal tripeptides' (SOCTs) in the extreme carboxy terminal positions (z &#8805; 3.0, see Fig <figr fid="F2">2</figr>).</p>
			<suppl id="S7">
				<title>
					<p>Additional File 7</p>
				</title>
				<text>
					<p>All filtered N-terminal tripeptide z-statistics for all species</p>
				</text>
				<file name="1471-2164-8-191-S7.zip">
					<p>Click here for file</p>
				</file>
			</suppl>
			<fig id="F2">
				<title>
					<p>Figure 2</p>
				</title>
				<caption>
					<p>Position-specific abundance of SOCTs in <it>A. thaliana</it></p>
				</caption>
				<text>
					<p><b>Position-specific abundance of SOCTs in <it>A. thaliana</it></b>. Graphical depictions of the number of statistically over-represented C-terminal tripeptides (z &#8805; 3) occurring in the C-terminal region (-3 to -100). <b>A</b>. The unfiltered assessment of statistical over-representation in the C-terminus, as compared to a randomized data set control. <b>B</b>. The reduction in site-specific SOCT abundance after successive rounds of filtering measures including sequence masking, protein family adjustment and the stipulation of at least 10 occurrences for each SOCT.</p>
				</text>
				<graphic file="1471-2164-8-191-2"/>
			</fig>
			<sec>
				<st>
					<p>Genomic data and sequence pre-filtering</p>
				</st>
				<p>Predicted protein databases for each species were downloaded in fasta format from NCBI with the exceptions of <it>A. thaliana</it>, which was obtained from TAIR, and <it>O. sativa</it>, which was downloaded from TIGR. As the <it>O. sativa </it>genome contains more than 17% transposable elements, these sequences represented a high potential for skewing tripeptide counts unfavourably and are recommended for removal from such whole genome analyses <abbrgrp>
						<abbr bid="B18">18</abbr>
					</abbrgrp>. The <it>O. sativa </it>dataset was therefore pre-filtered to remove all sequences annotated as transposable elements prior to the analysis. This measure dramatically reduced the level of background noise in our results. This is because the abundance of retro-element-type sequences in rice can not only obfuscate the biologically relevant background tripeptide frequencies, but also result in numerous clusters of transposon-derived gene families in our clustering efforts. These 'junk clusters' artificially inflate tripeptide counts and their respective z-scores. As rice was the only dataset to possess such an exceptionally large percentage of annotated retroelements, it was the only proteome pre-filtered in this manner [see Additional file <supplr sid="S2">2</supplr>].</p>
				<suppl id="S2">
					<title>
						<p>Additional File 2</p>
					</title>
					<text>
						<p>Background reduction from TE filtering (<it>O. sativa</it>)</p>
					</text>
					<file name="1471-2164-8-191-S2.png">
						<p>Click here for file</p>
					</file>
				</suppl>
				<p>Another confounding factor is simple sequences, which are stretches of low complexity residue repeats of a presumably benign or possibly structural function, and which are known to skew sequence statistics <abbrgrp>
						<abbr bid="B25">25</abbr>
					</abbrgrp>. Masking of these sequences prior to statistical analysis is a frequent strategy in sequence searching algorithms (e.g. <it>BLAST</it>) <abbrgrp>
						<abbr bid="B26">26</abbr>
					</abbrgrp>. Due to the presence of numerous simple sequence-like tripeptides with significant scores in our preliminary work and in prior studies <abbrgrp>
						<abbr bid="B18">18</abbr>
						<abbr bid="B22">22</abbr>
					</abbrgrp>, <it>seg </it>filtering was applied to each species proteome prior to obtaining individual tripeptide counts and comparison against the randomization model. It should be noted that the randomly generated fasta sets were not pre-filtered with <it>seg</it>. This measure results in greater background averages for simple-sequence-like tripeptides and translates into an increase in the stringency against such tripeptides via lower z-scores. Overall, these measures removed several simple-sequence-like tripeptides from our significant results and succeeded in lowering observed SOCT abundance levels slightly (see Fig <figr fid="F2">2</figr>).</p>
			</sec>
			<sec>
				<st>
					<p>Background randomization models</p>
				</st>
				<p>Our approach adopted the strategy of genome randomization for assessing expected tripeptide frequencies. Each species proteome was randomized 100 times in order to obtain a frequency distribution for each possible tripeptide at all C-terminal positions from -3 to -100. The mean and standard deviation derived from these random sets were compared to observed tripeptide counts in the actual proteome in order to derive a position-specific tripeptide z-score. Three peptide randomization models were tested for their ability to reduce the level of position-specific SOCT abundance. Briefly, peptide sequences of equal length to every protein in the proteome were iteratively generated using a program, <it>fastarand</it>, written in the C programming language. The randomization models included: 1) randomization based on amino acid frequencies for the entire proteome, 2) shuffling the amino acid composition of individual proteins, and 3) the sampling of tripeptide content in each protein, with the potential for the resampling of any particular tripeptide in the sequence. Methods 2 and 3 proceed iteratively on a protein-by-protein basis using the composition of each protein in the proteome to generate randomized versions of each sequence. Of the three methods, the third was chosen as the model for further filtering, as it resulted in the largest reduction in overall frequencies of statistically significant tripeptides at each position in the terminal tail [see Additional file <supplr sid="S1">1</supplr>]. Model 2 was the next best method at reducing background noise, with model 1 being least effective.</p>
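				<p>The z-score calculation described above can be illustrated with a minimal Python sketch. It is not the <it>fastarand </it>program: the function names are hypothetical, the reading of model 3 (drawing a protein's own overlapping tripeptides with replacement and concatenating them to the original length) is an assumption, and the use of the population standard deviation is likewise assumed.</p>

```python
import random
from collections import Counter
from statistics import mean, pstdev


def resample_tripeptides(seq, rng):
    """Build a random sequence of len(seq) from the protein's own overlapping
    tripeptides, drawn with replacement (an assumed reading of model 3)."""
    if not seq:
        return seq
    pool = [seq[i:i + 3] for i in range(len(seq) - 2)] or [seq]
    pieces, total = [], 0
    while len(seq) > total:
        piece = rng.choice(pool)
        pieces.append(piece)
        total += len(piece)
    return "".join(pieces)[:len(seq)]


def count_at_offset(proteins, offset):
    """Count tripeptides starting `offset` residues in from the C-terminus
    (offset 3 is the final tripeptide of each protein)."""
    counts = Counter()
    for seq in proteins:
        if len(seq) >= offset:
            start = len(seq) - offset
            counts[seq[start:start + 3]] += 1
    return counts


def z_scores(proteins, offset, n_random=100, seed=0):
    """Position-specific z-score for every observed tripeptide at one offset."""
    rng = random.Random(seed)
    observed = count_at_offset(proteins, offset)
    background = [
        count_at_offset([resample_tripeptides(s, rng) for s in proteins], offset)
        for _ in range(n_random)
    ]
    scores = {}
    for tri, obs in observed.items():
        sample = [b[tri] for b in background]
        sd = pstdev(sample)
        if sd > 0:  # z is undefined when the background never varies
            scores[tri] = (obs - mean(sample)) / sd
    return scores
```

				<p>In this sketch a tripeptide whose background count never varies across the randomized sets is left unscored; how the original program treats that case is not stated in the text.</p>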
				<suppl id="S1">
					<title>
						<p>Additional File 1</p>
					</title>
					<text>
						<p>Background reduction in randomization models (<it>A. thaliana</it>)</p>
					</text>
					<file name="1471-2164-8-191-S1.png">
						<p>Click here for file</p>
					</file>
				</suppl>
			</sec>
			<sec>
				<st>
					<p>Protein family prediction</p>
				</st>
				<p>In their 2003 analysis of C-terminal tripeptide frequencies, Gatto &amp; Berg identified several over-represented tripeptides as arising from homology within the C-termini of large protein families <abbrgrp>
						<abbr bid="B18">18</abbr>
					</abbrgrp>. In such instances, the tallies of individual tripeptides could be exaggerated beyond what is expected by chance. Since our objective was to predict general protein targeting or PTM signals occurring among many diverse proteins, the exclusion of tripeptide counts arising from large homologous protein families was used to lower the high position-specific SOCT frequencies seen in our unfiltered results (see Fig <figr fid="F2">2</figr>), an approach not taken by Bahir &amp; Linial <abbrgrp>
						<abbr bid="B23">23</abbr>
					</abbrgrp>. This evaluation of tripeptide frequencies at the level of the protein family instead of the individual protein then allows for the specific assaying for such signatures that occur as genome-wide over-represented signals due to generalized structural or functional requirements in C-terminal biology.</p>
				<p>To determine tripeptide significance levels across protein families, each proteome was first clustered into gene families using our short UNIX shell script <it>famMCL</it>. <it>famMCL</it>: 1) performs an all-against-all <it>BLASTP </it>comparison between proteins in a proteome; 2) parses the <it>BLAST </it>output for bitscore values (cutoff: E &lt; 1e-10); 3) submits an MCL matrix of bitscores to the <it>Markov Clustering Algorithm </it>(<it>MCL</it>) <abbrgrp>
						<abbr bid="B27">27</abbr>
					</abbrgrp>; and 4) renders the <it>MCL </it>output into a user-readable list of gene families. The data output format and interface to the <it>MCL </it>algorithm were modeled after Enright et al.'s work of 2002, using bitscores in place of E-values and adding in an automatic all-by-all blasting routine <abbrgrp>
						<abbr bid="B28">28</abbr>
					</abbrgrp>. Comprehensive bitscore parsing of <it>BLASTP </it>output provides for a straightforward implementation with more complete and accurate similarity matrices and overall better cluster approximations. This strategy is used in both the <it>MCL </it>implementation of gene family prediction <it>mclblast </it>as well as in the prediction of clusters of orthologous genes in <it>orthoMCL </it><abbrgrp>
						<abbr bid="B29">29</abbr>
					</abbrgrp>.</p>
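				<p>The parsing step of this pipeline might be sketched in Python as follows. This is not the <it>famMCL </it>shell script itself. The sketch assumes that the all-against-all <it>BLASTP </it>results are available in the standard 12-column tabular format (E-value in column 11, bitscore in column 12) and that <it>MCL </it>is given label-style input of the form "protein, protein, weight"; the function name, the removal of self-hits and the choice to keep the best bitscore per protein pair are illustrative assumptions.</p>

```python
def blast_to_abc(blast_lines, evalue_cutoff=1e-10):
    """Turn tabular BLAST hits into 'query TAB subject TAB bitscore' lines.

    Hits with an E-value at or above the cutoff are dropped, self-hits are
    dropped (an assumption of this sketch), and only the highest bitscore
    is kept for each ordered protein pair.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query == subject or evalue >= evalue_cutoff:
            continue
        key = (query, subject)
        if bitscore > best.get(key, 0.0):
            best[key] = bitscore
    return [f"{q}\t{s}\t{b:g}" for (q, s), b in sorted(best.items())]
```

				<p>The returned lines could be written to a file and handed to <it>MCL </it>in its label-input mode; the clustering parameters used in the original analysis are not given here and are therefore not assumed.</p>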
				<p>Clustering gene families in this manner, we obtained an average of almost 4000 gene families with 2 or more members for each of the 6 higher eukaryotes, with <it>C. elegans </it>possessing the fewest clusters at 2725, and <it>O. sativa </it>possessing the most at 5452. <it>S. cerevisiae</it>, in accordance with its smaller genome size, possessed considerably fewer predicted protein families at only 749 clusters with 2 or more members [see Additional file <supplr sid="S5">5</supplr>]. In each case, the number of 2-member clusters accounted for approximately 50% of the total cluster number, with <it>S. cerevisiae </it>having the most 2-member clusters (68%) and <it>A. thaliana </it>the least (42%). When gene families of at least 10 members were considered, the number of gene clusters dropped to 80 and 109 for the plants, 31 and 36 for the lower animals, 52 and 68 for the mammals and 3 for yeast. Individual protein identifiers from within separate clusters were then appended with their annotations to confirm consistencies in functional annotation and to ensure that the algorithm was working correctly (data not shown).</p>
				<suppl id="S5">
					<title>
						<p>Additional File 5</p>
					</title>
					<text>
						<p>famMCL generated protein families in all species</p>
					</text>
					<file name="1471-2164-8-191-S5.zip">
						<p>Click here for file</p>
					</file>
				</suppl>
				<p>The resulting clusters for each species were then used to assess tripeptide over-representation across protein families. All tripeptide frequencies were assessed in a manner that allowed only a single tripeptide count to arise from within any single gene family. This measure prevents multiple tripeptide counts due to C-terminal homology in gene families from artificially inflating our tripeptide frequencies and unrealistically skewing our over-representation statistics. Overall, these efforts improved the signal-to-noise ratio considerably, as is evident from the significantly reduced number of SOCTs at each C-terminal position (see Fig <figr fid="F2">2</figr>). Additionally, numerous CTAMs were now readily identifiable in the results in all species, and the C-terminal biases observed could represent targeting motifs, post-translational modification signals, protein-protein interaction domains or structural tendencies in C-terminal biology such as capping and orientation strategies. This technique of assaying peptide frequencies as they pertain to protein family tendencies appears to be an effective measure for the prediction of trends in biological sequence preferences at a genome-wide level and could be adapted to the prediction of protein domains in the protein interior.</p>
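				<p>The family-level counting rule can be sketched in a few lines of Python. The function and argument names are hypothetical, and proteins that do not belong to any cluster are assumed here to count as their own single-member family.</p>

```python
from collections import Counter


def family_adjusted_counts(proteins, family_of, offset=3):
    """Count each tripeptide at a C-terminal offset at most once per family.

    proteins:  dict mapping protein id to sequence
    family_of: dict mapping protein id to family id; unclustered proteins
               may be absent and are treated as singleton families
    """
    seen = set()
    for pid, seq in proteins.items():
        if offset > len(seq):
            continue  # protein too short to have a tripeptide at this offset
        start = len(seq) - offset
        tripeptide = seq[start:start + 3]
        family = family_of.get(pid, ("singleton", pid))
        seen.add((family, tripeptide))
    return Counter(tripeptide for _, tripeptide in seen)
```

				<p>For example, two paralogues in the same family that both end in SKL contribute one SKL count, whereas an unrelated protein ending in SKL contributes a second.</p>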
			</sec>
			<sec>
				<st>
					<p>Terminal biases persist after aggressive filtering</p>
				</st>
				<p>Our analysis defined the terminal bias as a dramatic rise in the number of statistically over-represented (z &#8805; 3) C-terminal tripeptides (SOCTs) in the last 15 to 20 tripeptide positions of each "C-terminome". In <it>A. thaliana</it>, the filtering of tripeptide tallies using a maximum count of 1 occurrence per protein family reduced background levels by approximately 70 SOCTs per position, while simple sequence masking reduced background noise by approximately 10 SOCTs per position (see Fig <figr fid="F2">2</figr>). Comparable results were seen in the other species examined. Interestingly, the ratio of extreme C-terminal SOCT count (-3) to average SOCT counts at positions proximal to the terminal region (-100 to -10) increases with each successive filtering step. We believe this reflects, with each round of filtering, a progression from a terminal bias arising due to genome-wide selective pressures in C-terminal residue composition to the most functionally distinct and biologically relevant C-terminal tripeptides.</p>
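				<p>The ratio referred to above is a simple quotient, sketched below in Python. The function name and the input layout (a mapping from C-terminal offset to the number of SOCTs found there) are assumptions made for illustration.</p>

```python
def terminal_bias_ratio(soct_counts, terminal=3, background=range(10, 101)):
    """Ratio of the SOCT count at the extreme terminal offset to the mean
    SOCT count over the background offsets (10 to 100 residues in).

    soct_counts: dict mapping offset to the number of SOCTs at that offset;
                 offsets missing from the dict count as zero.
    """
    values = [soct_counts.get(offset, 0) for offset in background]
    average = sum(values) / len(values)
    if average == 0:
        return float("inf")  # no background SOCTs at all
    return soct_counts.get(terminal, 0) / average
```

				<p>As a purely hypothetical illustration, 42 SOCTs at the terminal position against a uniform background of 2 SOCTs per position would give a ratio of 21.</p>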
			</sec>
			<sec>
				<st>
					<p>Eukaryotic protein tails share conserved tripeptides at positions -3 and -4</p>
				</st>
				<p>Overall, our analysis identified numerous statistically over-represented C-terminal tripeptides in all species, the majority of which existed in the C-terminal bias region from -3 to -5 (see Fig <figr fid="F2">2</figr>). Specifically, the number of SOCTs occurring in each species at the extreme terminal end was: <it>A. thaliana</it>, 42; <it>O. sativa</it>, 77; <it>S. cerevisiae</it>, 108; <it>D. melanogaster</it>, 25; <it>C. elegans</it>, 90; <it>M. musculus</it>, 41; and <it>H. sapiens</it>, 45. The elevated levels of SOCTs in worm and yeast may be a result of the smaller genome sizes. It was our assumption that many of these SOCTs would be false positives and since we wished to identify sequences conserved as general biological strategies, individual intersections of SOCT totals were taken between pairs of more closely related species.</p>
				<p>Using our final filtered z-scores for all species, comparisons were made in SOCT conservation between the plants, the lower animals and the mammals (see Fig <figr fid="F3">3</figr>). In each case, similar to the C-terminal bias, the number of SOCTs occurring in each species pair overlapped most frequently in the last 2 tripeptide positions (i.e. positions -3 and -4). Intersections of SOCTs between the two plant species (rice and Arabidopsis) and the two lower animals (fly and worm) showed the presence of several CaaX box motifs and the canonical PTS1 consensus of SKL. The ER retention signal [HK]DEL was conspicuously absent from the plant intersections, although this was due to its under-representation in <it>O. sativa</it>. As the <it>O. sativa </it>genome is the least well annotated of all the predicted proteomes, this lack of significance for the ER retention signal is likely an artifact. Indeed, a total of 40 proteins in the <it>O. sativa </it>genome match the ER retention consensus of [KH]DEL. The DEL SOCT was also absent in the lower animals due to its lack of significance in <it>C. elegans</it>. This may represent the presence of an alternate preferred ER retention consensus motif in worms. The 2 mammalian species (human and mouse) possessed CaaX motifs, several PTS1 consensus variants and the HDEL form of the ER retention signal (see Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr>).</p>
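				<p>The species intersections can be expressed as a set operation, as in the following Python sketch. The function name and data layout are hypothetical; each species is assumed to provide a mapping from a (tripeptide, offset) pair to its final filtered z-score.</p>

```python
def shared_socts(z_by_species, threshold=3.0):
    """Return the (tripeptide, offset) pairs whose z-score meets the
    threshold in every species supplied.

    z_by_species: dict mapping species name to a dict that maps
                  (tripeptide, offset) to a z-score
    """
    significant = [
        {key for key, z in scores.items() if z >= threshold}
        for scores in z_by_species.values()
    ]
    if not significant:
        return set()
    return set.intersection(*significant)
```

				<p>With the SKL values reported in Table 1 for the lower animals (8.4 in <it>C. elegans </it>and 3.2 in <it>D. melanogaster</it>, both at offset 3), SKL would be returned as shared, whereas a tripeptide below 3.0 in either species would not.</p>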
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>SOCT intersections between species</p>
					</caption>
					<text>
						<p><b>SOCT intersections between species</b>. Intersections of statistically over-represented tripeptides at the C-terminus of <b>A. </b>the two plant species (<it>A. thaliana, O. sativa</it>), <b>B</b>. the two lower animals (<it>C. elegans, D. melanogaster</it>) and <b>C</b>. the two mammalian proteomes (<it>H. sapiens, M. musculus</it>). The SOCT abundance at each C-terminal position is graphed for each species with the number of commonly occurring SOCTs between the two species depicted with blue boxes.</p>
					</text>
					<graphic file="1471-2164-8-191-3"/>
				</fig>
				<tbl id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>SOCT intersections between plants, lower animals and mammals &#8211; z statistics. Final filtered z-statistics for 7 eukaryotic C-terminal tripeptides (z &#8805; 3.0) occurring at positions -3 and -4 and intersected between the two plant species (<it>A. thaliana</it> &#8211; AT, <it>O. sativa</it> &#8211; OS), the two lower animals (<it>C. elegans</it> &#8211; CE, <it>D. melanogaster</it> &#8211; DM), and the mammals (<it>M. musculus</it> &#8211; MM, <it>H. sapiens</it> &#8211; HS). Species-specific lists of all SOCTs (z &#8805; 3.0) at the -3 and -4 positions and occurring in at least 10 genes in each respective species proteome are given in Table <tblr tid="T2">2</tblr>; the reader is referred to the Additional files section for the complete data set [see Additional file <supplr sid="S6">6</supplr>].</p>
					</caption>
					<tblbdy cols="4">
						<r>
							<c ca="left">
								<p><b><it>C. elegans &amp; D. melanogaster</it></b></p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>peptide</b></p>
							</c>
							<c ca="center">
								<p><b>offset</b></p>
							</c>
							<c ca="center">
								<p><b>CE z-score</b></p>
							</c>
							<c ca="center">
								<p><b>DM z-score</b></p>
							</c>
						</r>
						<r>
							<c cspan="4">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>SKL</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>8.4</p>
							</c>
							<c ca="center">
								<p>3.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>HKY</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.7</p>
							</c>
							<c ca="center">
								<p>3.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>GKK</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>5.8</p>
							</c>
							<c ca="center">
								<p>3.6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>RRK</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.3</p>
							</c>
							<c ca="center">
								<p>3.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>FNF</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>5.1</p>
							</c>
							<c ca="center">
								<p>4.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>KKK</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>10.0</p>
							</c>
							<c ca="center">
								<p>8.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>DSD</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>6.2</p>
							</c>
							<c ca="center">
								<p>3.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>RPW</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.6</p>
							</c>
							<c ca="center">
								<p>3.5</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>DED</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.3</p>
							</c>
							<c ca="center">
								<p>3.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>HDE</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>4.3</p>
							</c>
							<c ca="center">
								<p>5.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>CTI</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>4.3</p>
							</c>
							<c ca="center">
								<p>6.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>CSI</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>6.0</p>
							</c>
							<c ca="center">
								<p>3.9</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>CVI</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>6.9</p>
							</c>
							<c ca="center">
								<p>3.0</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b><it>O. sativa &amp; A. thaliana</it></b></p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>peptide</b></p>
							</c>
							<c ca="center">
								<p><b>offset</b></p>
							</c>
							<c ca="center">
								<p><b>OS z-score</b></p>
							</c>
							<c ca="center">
								<p><b>AT z-score</b></p>
							</c>
						</r>
						<r>
							<c cspan="4">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>SKL</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>7.8</p>
							</c>
							<c ca="center">
								<p>6.5</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>SIM</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>6.2</p>
							</c>
							<c ca="center">
								<p>4.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>DFM</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.1</p>
							</c>
							<c ca="center">
								<p>4.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>RCC</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>4.2</p>
							</c>
							<c ca="center">
								<p>3.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>KCP</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>5.7</p>
							</c>
							<c ca="center">
								<p>3.7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>YRY</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.9</p>
							</c>
							<c ca="center">
								<p>4.9</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>FYS</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.0</p>
							</c>
							<c ca="center">
								<p>3.6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>PKC</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>3.8</p>
							</c>
							<c ca="center">
								<p>3.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>CTI</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>4.7</p>
							</c>
							<c ca="center">
								<p>8.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>WWW</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>4.1</p>
							</c>
							<c ca="center">
								<p>4.7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>CCI</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>6.7</p>
							</c>
							<c ca="center">
								<p>3.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>CSI</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>9.1</p>
							</c>
							<c ca="center">
								<p>8.1</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b><it>M. musculus &amp; H. sapiens</it></b></p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>peptide</b></p>
							</c>
							<c ca="center">
								<p><b>offset</b></p>
							</c>
							<c ca="center">
								<p><b>MM z-score</b></p>
							</c>
							<c ca="center">
								<p><b>HS z-score</b></p>
							</c>
						</r>
						<r>
							<c cspan="4">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>THL</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>4.2</p>
							</c>
							<c ca="center">
								<p>5.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>DEL</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>8.1</p>
							</c>
							<c ca="center">
								<p>7.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>TEL</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>4.6</p>
							</c>
							<c ca="center">
								<p>4.6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>DEF</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.7</p>
							</c>
							<c ca="center">
								<p>4.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>TRL</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>6.2</p>
							</c>
							<c ca="center">
								<p>3.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>SRK</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.4</p>
							</c>
							<c ca="center">
								<p>3.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>KKK</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>4.3</p>
							</c>
							<c ca="center">
								<p>3.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>DSD</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>4.5</p>
							</c>
							<c ca="center">
								<p>4.6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>SCC</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.7</p>
							</c>
							<c ca="center">
								<p>4.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>YMW</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.1</p>
							</c>
							<c ca="center">
								<p>5.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>TTV</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>5.8</p>
							</c>
							<c ca="center">
								<p>7.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>RKK</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>5.3</p>
							</c>
							<c ca="center">
								<p>3.6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>TKL</b></p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>3.5</p>
							</c>
							<c ca="center">
								<p>5.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>HDE</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>6.5</p>
							</c>
							<c ca="center">
								<p>6.6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>FWW</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>3.1</p>
							</c>
							<c ca="center">
								<p>3.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>CTK</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>3.1</p>
							</c>
							<c ca="center">
								<p>4.6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>WRP</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>5.4</p>
							</c>
							<c ca="center">
								<p>5.0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>CTI</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>3.6</p>
							</c>
							<c ca="center">
								<p>7.7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>RWT</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>3.3</p>
							</c>
							<c ca="center">
								<p>4.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>QYN</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>3.2</p>
							</c>
							<c ca="center">
								<p>3.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>ESE</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>3.2</p>
							</c>
							<c ca="center">
								<p>3.1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><b>CVI</b></p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>5.1</p>
							</c>
							<c ca="center">
								<p>4.5</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<tbl id="T2">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>Species-specific lists of all SOCTs. Species-specific lists of all SOCTs (z &#8805; 3.0) at the -3 and -4 positions and occurring in at least 10 genes in each respective species proteome. Each entry gives the tripeptide followed by its offset (e.g. SKL.3). The restriction to at least 10 genes is applied for the sake of brevity, and the reader is referred to the Additional files section for the complete data set [see Additional file <supplr sid="S6">6</supplr>].</p>
					</caption>
					<tblbdy cols="9">
						<r>
							<c ca="left">
								<p><b><it>AT</it></b></p>
							</c>
							<c ca="left">
								<p><b><it>OS</it></b></p>
							</c>
							<c ca="left">
								<p><b><it>OS</it></b></p>
							</c>
							<c ca="left">
								<p><b><it>SC</it></b></p>
							</c>
							<c ca="left">
								<p><b><it>CE</it></b></p>
							</c>
							<c ca="left">
								<p><b><it>CE</it></b></p>
							</c>
							<c ca="left">
								<p><b><it>DM</it></b></p>
							</c>
							<c ca="left">
								<p><b><it>MM</it></b></p>
							</c>
							<c ca="left">
								<p><b><it>HS</it></b></p>
							</c>
						</r>
						<r>
							<c cspan="9">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ADS.3</p>
							</c>
							<c ca="left">
								<p>DIF.3</p>
							</c>
							<c ca="left">
								<p>CSV.4</p>
							</c>
							<c ca="left">
								<p>SKL.3</p>
							</c>
							<c ca="left">
								<p>FGK.3</p>
							</c>
							<c ca="left">
								<p>RFF.3</p>
							</c>
							<c ca="left">
								<p>SKL.3</p>
							</c>
							<c ca="left">
								<p>QSR.3</p>
							</c>
							<c ca="left">
								<p>THL.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>KRT.3</p>
							</c>
							<c ca="left">
								<p>QLI.3</p>
							</c>
							<c ca="left">
								<p>SYY.4</p>
							</c>
							<c ca="left">
								<p>LKK.3</p>
							</c>
							<c ca="left">
								<p>SIF.3</p>
							</c>
							<c ca="left">
								<p>FEF.3</p>
							</c>
							<c ca="left">
								<p>TEL.3</p>
							</c>
							<c ca="left">
								<p>TAL.3</p>
							</c>
							<c ca="left">
								<p>DEL.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>SKL.3</p>
							</c>
							<c ca="left">
								<p>IFL.3</p>
							</c>
							<c ca="left">
								<p>QFV.4</p>
							</c>
							<c ca="left">
								<p>SKK.3</p>
							</c>
							<c ca="left">
								<p>TRF.3</p>
							</c>
							<c ca="left">
								<p>VKN.3</p>
							</c>
							<c ca="left">
								<p>KSK.3</p>
							</c>
							<c ca="left">
								<p>THL.3</p>
							</c>
							<c ca="left">
								<p>TEL.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>DEL.3</p>
							</c>
							<c ca="left">
								<p>SKL.3</p>
							</c>
							<c ca="left">
								<p>KAN.4</p>
							</c>
							<c ca="left">
								<p>KKK.3</p>
							</c>
							<c ca="left">
								<p>PPQ.3</p>
							</c>
							<c ca="left">
								<p>RKK.3</p>
							</c>
							<c ca="left">
								<p>AKL.3</p>
							</c>
							<c ca="left">
								<p>DEL.3</p>
							</c>
							<c ca="left">
								<p>DEF.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>SRL.3</p>
							</c>
							<c ca="left">
								<p>TSN.3</p>
							</c>
							<c ca="left">
								<p>IEE.4</p>
							</c>
							<c ca="left">
								<p>DEL.3</p>
							</c>
							<c ca="left">
								<p>RSL.3</p>
							</c>
							<c ca="left">
								<p>RRF.3</p>
							</c>
							<c ca="left">
								<p>TKS.3</p>
							</c>
							<c ca="left">
								<p>TEL.3</p>
							</c>
							<c ca="left">
								<p>TKK.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>SIM.3</p>
							</c>
							<c ca="left">
								<p>FED.3</p>
							</c>
							<c ca="left">
								<p>HSK.4</p>
							</c>
							<c ca="left">
								<p>AKK.3</p>
							</c>
							<c ca="left">
								<p>QKI.3</p>
							</c>
							<c ca="left">
								<p>RRR.3</p>
							</c>
							<c ca="left">
								<p>GKK.3</p>
							</c>
							<c ca="left">
								<p>GSC.3</p>
							</c>
							<c ca="left">
								<p>TSL.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>FTS.3</p>
							</c>
							<c ca="left">
								<p>KIN.3</p>
							</c>
							<c ca="left">
								<p>DQE.4</p>
							</c>
							<c ca="left">
								<p>LSK.3</p>
							</c>
							<c ca="left">
								<p>SKL.3</p>
							</c>
							<c ca="left">
								<p>KSE.3</p>
							</c>
							<c ca="left">
								<p>RRK.3</p>
							</c>
							<c ca="left">
								<p>SKI.3</p>
							</c>
							<c ca="left">
								<p>KSN.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>RKR.3</p>
							</c>
							<c ca="left">
								<p>TKN.3</p>
							</c>
							<c ca="left">
								<p>IDK.4</p>
							</c>
							<c ca="left">
								<p>LLK.3</p>
							</c>
							<c ca="left">
								<p>TVE.3</p>
							</c>
							<c ca="left">
								<p>VSS.3</p>
							</c>
							<c ca="left">
								<p>LKK.3</p>
							</c>
							<c ca="left">
								<p>KDI.3</p>
							</c>
							<c ca="left">
								<p>TSV.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>QTL.3</p>
							</c>
							<c ca="left">
								<p>CIL.3</p>
							</c>
							<c ca="left">
								<p>PKK.4</p>
							</c>
							<c ca="left">
								<p>KKK.4</p>
							</c>
							<c ca="left">
								<p>KIN.3</p>
							</c>
							<c ca="left">
								<p>DED.3</p>
							</c>
							<c ca="left">
								<p>KKK.3</p>
							</c>
							<c ca="left">
								<p>GQS.3</p>
							</c>
							<c ca="left">
								<p>TRL.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>KQD.3</p>
							</c>
							<c ca="left">
								<p>KGN.3</p>
							</c>
							<c ca="left">
								<p>KII.4</p>
							</c>
							<c ca="left">
								<p>HDE.4</p>
							</c>
							<c ca="left">
								<p>SKK.3</p>
							</c>
							<c ca="left">
								<p>IGK.3</p>
							</c>
							<c ca="left">
								<p>DSD.3</p>
							</c>
							<c ca="left">
								<p>SHL.3</p>
							</c>
							<c ca="left">
								<p>SRK.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>KRR.3</p>
							</c>
							<c ca="left">
								<p>VTS.3</p>
							</c>
							<c ca="left">
								<p>YFL.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>VIN.3</p>
							</c>
							<c ca="left">
								<p>RKL.3</p>
							</c>
							<c ca="left">
								<p>DED.3</p>
							</c>
							<c ca="left">
								<p>ESH.3</p>
							</c>
							<c ca="left">
								<p>KKK.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>HSS.3</p>
							</c>
							<c ca="left">
								<p>SIM.3</p>
							</c>
							<c ca="left">
								<p>DYS.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>SKA.3</p>
							</c>
							<c ca="left">
								<p>TNN.3</p>
							</c>
							<c ca="left">
								<p>KAK.3</p>
							</c>
							<c ca="left">
								<p>AAS.3</p>
							</c>
							<c ca="left">
								<p>RRC.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>IFF.3</p>
							</c>
							<c ca="left">
								<p>MGI.3</p>
							</c>
							<c ca="left">
								<p>CSI.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>DEL.3</p>
							</c>
							<c ca="left">
								<p>FSF.3</p>
							</c>
							<c ca="left">
								<p>KIK.3</p>
							</c>
							<c ca="left">
								<p>KTT.3</p>
							</c>
							<c ca="left">
								<p>DSD.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>NQS.3</p>
							</c>
							<c ca="left">
								<p>RKN.3</p>
							</c>
							<c ca="left">
								<p>TTV.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>FKF.3</p>
							</c>
							<c ca="left">
								<p>KRK.3</p>
							</c>
							<c ca="left">
								<p>KNK.3</p>
							</c>
							<c ca="left">
								<p>TRL.3</p>
							</c>
							<c ca="left">
								<p>CCA.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>PSY.3</p>
							</c>
							<c ca="left">
								<p>SLY.3</p>
							</c>
							<c ca="left">
								<p>TNK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>IKN.3</p>
							</c>
							<c ca="left">
								<p>LFN.3</p>
							</c>
							<c ca="left">
								<p>KRR.4</p>
							</c>
							<c ca="left">
								<p>SNV.3</p>
							</c>
							<c ca="left">
								<p>SCL.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TRT.3</p>
							</c>
							<c ca="left">
								<p>FFS.3</p>
							</c>
							<c ca="left">
								<p>SSK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>IKF.3</p>
							</c>
							<c ca="left">
								<p>FGR.4</p>
							</c>
							<c ca="left">
								<p>KSN.4</p>
							</c>
							<c ca="left">
								<p>SRK.3</p>
							</c>
							<c ca="left">
								<p>SCC.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>RRH.3</p>
							</c>
							<c ca="left">
								<p>ISF.3</p>
							</c>
							<c ca="left">
								<p>QIR.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>DEE.3</p>
							</c>
							<c ca="left">
								<p>GTR.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>KKK.3</p>
							</c>
							<c ca="left">
								<p>KKN.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TKD.3</p>
							</c>
							<c ca="left">
								<p>SYY.3</p>
							</c>
							<c ca="left">
								<p>FQR.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>IEN.3</p>
							</c>
							<c ca="left">
								<p>GSR.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>DSD.3</p>
							</c>
							<c ca="left">
								<p>TTV.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>DSD.3</p>
							</c>
							<c ca="left">
								<p>LKH.3</p>
							</c>
							<c ca="left">
								<p>CVI.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>QKF.3</p>
							</c>
							<c ca="left">
								<p>NLK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>SCC.3</p>
							</c>
							<c ca="left">
								<p>NHL.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>NTN.3</p>
							</c>
							<c ca="left">
								<p>KNN.3</p>
							</c>
							<c ca="left">
								<p>YFF.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>DDE.3</p>
							</c>
							<c ca="left">
								<p>KSK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>TTV.3</p>
							</c>
							<c ca="left">
								<p>TDV.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TSH.3</p>
							</c>
							<c ca="left">
								<p>TVR.3</p>
							</c>
							<c ca="left">
								<p>KRK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>PSA.3</p>
							</c>
							<c ca="left">
								<p>ISK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>EEL.3</p>
							</c>
							<c ca="left">
								<p>RKK.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>RRR.3</p>
							</c>
							<c ca="left">
								<p>EIN.3</p>
							</c>
							<c ca="left">
								<p>LNY.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>FFN.3</p>
							</c>
							<c ca="left">
								<p>DDE.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>RKK.3</p>
							</c>
							<c ca="left">
								<p>KTD.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>MSL.3</p>
							</c>
							<c ca="left">
								<p>QQK.3</p>
							</c>
							<c ca="left">
								<p>TSR.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>SKN.3</p>
							</c>
							<c ca="left">
								<p>YLG.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>QES.3</p>
							</c>
							<c ca="left">
								<p>TKL.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>FYS.3</p>
							</c>
							<c ca="left">
								<p>PKY.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>KML.3</p>
							</c>
							<c ca="left">
								<p>VFD.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>TKL.3</p>
							</c>
							<c ca="left">
								<p>KRK.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>KPF.3</p>
							</c>
							<c ca="left">
								<p>YKL.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>KKI.3</p>
							</c>
							<c ca="left">
								<p>TKK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>SFY.3</p>
							</c>
							<c ca="left">
								<p>TSI.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>HDE.4</p>
							</c>
							<c ca="left">
								<p>HFL.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>LGP.3</p>
							</c>
							<c ca="left">
								<p>NSK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>HDE.4</p>
							</c>
							<c ca="left">
								<p>TVV.3</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>SNT.4</p>
							</c>
							<c ca="left">
								<p>IPK.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>TQF.3</p>
							</c>
							<c ca="left">
								<p>EDS.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>RKT.4</p>
							</c>
							<c ca="left">
								<p>HDE.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>SRR.4</p>
							</c>
							<c ca="left">
								<p>HRF.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>SRR.3</p>
							</c>
							<c ca="left">
								<p>PIN.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>SSM.4</p>
							</c>
							<c ca="left">
								<p>KKA.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>KKQ.4</p>
							</c>
							<c ca="left">
								<p>IQV.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>KNN.3</p>
							</c>
							<c ca="left">
								<p>GKK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>FSK.4</p>
							</c>
							<c ca="left">
								<p>ETV.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>CTI.4</p>
							</c>
							<c ca="left">
								<p>IRS.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>QIF.3</p>
							</c>
							<c ca="left">
								<p>SRK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>KPK.4</p>
							</c>
							<c ca="left">
								<p>CTI.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>RSR.4</p>
							</c>
							<c ca="left">
								<p>NQN.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>IDF.3</p>
							</c>
							<c ca="left">
								<p>KKK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>KTD.4</p>
							</c>
							<c ca="left">
								<p>RKI.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>DSD.4</p>
							</c>
							<c ca="left">
								<p>LIN.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>SKY.3</p>
							</c>
							<c ca="left">
								<p>SDS.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>ESE.4</p>
							</c>
							<c ca="left">
								<p>ETS.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>CSI.4</p>
							</c>
							<c ca="left">
								<p>KCP.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>PGY.3</p>
							</c>
							<c ca="left">
								<p>KKS.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>ISQ.4</p>
							</c>
							<c ca="left">
								<p>SCC.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>PSK.4</p>
							</c>
							<c ca="left">
								<p>KKN.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>KKQ.3</p>
							</c>
							<c ca="left">
								<p>NKK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>CVL.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>RRR.4</p>
							</c>
							<c ca="left">
								<p>HQS.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>TRL.3</p>
							</c>
							<c ca="left">
								<p>KKD.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>ESE.4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ISR.4</p>
							</c>
							<c ca="left">
								<p>RKK.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>IIN.3</p>
							</c>
							<c ca="left">
								<p>ETS.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>PPS.4</p>
							</c>
							<c ca="left">
								<p>LKL.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>GKK.3</p>
							</c>
							<c ca="left">
								<p>LRN.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>ISR.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>RRK.3</p>
							</c>
							<c ca="left">
								<p>CSI.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>SVM.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>TDF.3</p>
							</c>
							<c ca="left">
								<p>FKK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>LKV.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>FNF.3</p>
							</c>
							<c ca="left">
								<p>KKN.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>SDQ.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>RNN.3</p>
							</c>
							<c ca="left">
								<p>APG.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>LKR.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>KKK.3</p>
							</c>
							<c ca="left">
								<p>SSK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>QNV.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>RRH.3</p>
							</c>
							<c ca="left">
								<p>RRR.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>DKI.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>IRF.3</p>
							</c>
							<c ca="left">
								<p>SFR.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>KNK.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>TRR.3</p>
							</c>
							<c ca="left">
								<p>INF.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>VGH.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>GRK.3</p>
							</c>
							<c ca="left">
								<p>CVI.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>RHH.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>LLH.3</p>
							</c>
							<c ca="left">
								<p>PSN.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>CQL.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>LKI.3</p>
							</c>
							<c ca="left">
								<p>SIK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>VAW.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>LQN.3</p>
							</c>
							<c ca="left">
								<p>LQK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>PSH.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>DSD.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>MLR.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>SSN.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>MEK.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>KKL.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>FFS.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>FKK.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>CAI.4</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>KKN.3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<p>To examine SOCT co-occurrence across all 7 eukaryotic species, the SOCTs were filtered for statistical prevalence (z &#8805; 3) in at least 2 genomes and for presence in at least 10 proteins within a species. This latter stipulation was introduced to remove rare tripeptides possessing significant z-scores due solely to their genome-wide infrequency and not their terminal abundance. A total of 37 SOCTs emerged at the terminal and second-to-last carboxy positions (see Fig <figr fid="F4">4</figr>). To our knowledge, all C-terminal anchored motifs reported in the literature are readily identifiable in these data (Fig <figr fid="F4">4</figr>). These are the peroxisomal targeting PTS1 signal, the ER retrieval/retention signals, Caax box prenylation signals and the Rab protein prenylation motif variant. In addition, several SOCTs match the PTS1 consensus sequence (ACGST/HKLNR/ILMY*) identified by Mullen et al. <abbrgrp>
						<abbr bid="B5">5</abbr>
					</abbrgrp>; and several variants of the Caax box motif were present <abbrgrp>
						<abbr bid="B1">1</abbr>
						<abbr bid="B5">5</abbr>
						<abbr bid="B30">30</abbr>
					</abbrgrp>.</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Heatmap of SOCTs intersected across all genomes examined</p>
					</caption>
					<text>
						<p><b>Heatmap of SOCTs intersected across all genomes examined</b>. SOCTs present in at least two species and occurring in at least 10 genes in each proteome are represented in two blocks of heatmapped z-scores. Positions for the extreme terminal end (-3) and one position in (-4) are shown on the left and right respectively. SOCTs of interest are sorted in increasing significance row-wise with columns listing the species. Tripeptides matching characterized consensus sequences are highlighted. Generated with <it>Heatmapper </it>[55].</p>
					</text>
					<graphic file="1471-2164-8-191-4"/>
				</fig>
				<p>In total, 35% of the multi-species tripeptide signals occurring in the last two terminal positions matched well-characterized C-terminal anchored peptide motifs in the literature. As well as known C-terminal signals, a variety of uncharacterised and potentially functionally important motifs were identified. These motifs may represent as yet unidentified sorting signals but may also represent components of generic C-terminal biology ranging from structural strategies to protein-protein interaction and post-translational modification motifs. For a complete list of identified SOCTs the reader is referred to the supplemental data [see Additional file <supplr sid="S6">6</supplr>].</p>
				<suppl id="S6">
					<title>
						<p>Additional File 6</p>
					</title>
					<text>
						<p>All filtered C-terminal z-statistics for all species</p>
					</text>
					<file name="1471-2164-8-191-S6.xls">
						<p>Click here for file</p>
					</file>
				</suppl>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<p>The amino and carboxyl termini of proteins are critical components, uniquely positioned to fill a variety of roles in protein biology. Our study has focused on the prediction and identification of novel protein motifs dependent upon C-terminal proximity for proper function. Characterized protein motifs known to function in this manner are largely involved with protein sorting and lipidation <abbrgrp>
					<abbr bid="B1">1</abbr>
				</abbrgrp>. Using integrative genomics and active filtering at the level of sequence and gene family, we have been able to successfully predict a variety of CTAMs and their consensus variants in 7 eukaryotic genomes.</p>
			<p>Of all resulting SOCTs, the peroxisomal targeting signal SKL was most prominent (see Table <tblr tid="T1">1</tblr>, <tblr tid="T2">2</tblr>; Fig <figr fid="F4">4</figr>). Curiously, however, SKL was significantly represented in all species save mammals. Several other PTS1 consensus signals were present in the results and significant only in mammals. These are TKL, THL, TRL and SRK (TRL is also significant in <it>C. elegans</it>). Although these motifs have been demonstrated as conforming to the PTS1 consensus <abbrgrp>
					<abbr bid="B5">5</abbr>
				</abbrgrp>, it is unknown if their statistical significance represents a true functional PTS1 signal in mammals or possibly a functional preference for Thr among mammalian PTS1 signals. The addition of functional annotation and protein-protein interaction data could help distinguish between these possibilities.</p>
			<p>Given the efficiency with which our analysis was capable of identifying existing C-terminally anchored protein sorting signals, several SOCTs represented across species and within the results of <it>A. thaliana </it>were examined for their potential for targeting sufficiency. Unfortunately, none of the SOCTs tested (KNN, KPF, KRR, DSD, SDSD, SDSDSD) using C-terminal GFP fusions exhibited differential sub-cellular localization from an EGFP:AAA control during transient assays in <it>A. thaliana </it>and <it>N. benthamiana </it>(data not shown). However, other components of a low-level C-terminal protein grammar, such as structural strategies, protein-protein interaction or post-translational modification may be responsible for the high motif frequencies observed in these particular SOCTs.</p>
			<p>The terminal tripeptide DSD was highly significant in all species save the proteomes of rice and yeast and similar in significance level to SKL (see Fig <figr fid="F4">4</figr>). Moreover, 45% of all proteins possessing a DSD motif in all proteomes examined also possessed the terminal sequence of SDSD. Although Ser-Asp repeats did not seem to play a role in targeting, anti-GFP immunoblotting against constitutively expressing GFP and GFP:SDSDSD transgenic <it>A. thaliana </it>seedlings showed a slowed migration of a GFP:SDSDSD fusion protein [see Additional file <supplr sid="S4">4</supplr>]. This preliminary result suggests a potential PTM on the SDSDSD sequence. It is interesting to note that there is a high tendency for proximal serine and acidic residues in proteins possessing the DSD SOCT. Likewise, there are 11 significantly represented serine-acidic tripeptides occurring within the terminal 3 positions across all species. The phosphorylation of the terminal DSD in the tumour suppressing protein p53 is known to influence its ability to bind and linearly diffuse along DNA <abbrgrp>
					<abbr bid="B31">31</abbr>
				</abbrgrp>. Similarly, the serine-acidic high mobility group I (HMG1) domains that occur in the C-terminus of HMG proteins, are known to affect both DNA binding and protein stability <abbrgrp>
					<abbr bid="B32">32</abbr>
				</abbrgrp>. HMG proteins dHMGD and dHMGZ both possess the <it>H. sapiens </it>SOCT ESE. Also of potential interest are the DSD-6 in RNA polymerase II and ESD-8 in topoisomerase II alpha of <it>H. sapiens</it>. Modification of these residues has also been shown to influence DNA binding and protein stability in their respective proteins <abbrgrp>
					<abbr bid="B33">33</abbr>
					<abbr bid="B34">34</abbr>
				</abbrgrp>. Although any similarity between these examples and the DSD SOCT itself is uncertain at best, they are nonetheless interesting considering approximately one quarter of DSD possessing proteins in <it>A. thaliana </it>are functionally annotated (Gene Ontology) as nucleic acid binding [see Additional file <supplr sid="S3">3</supplr>]. It does not appear that the prevalence of DSD is a result of an underlying primary nucleotide sequence preference, as the codons in DSD possessing proteins roughly match the codon preferences for each species. However, DSD does conform to the consensus sequences for the di-acid ER export signal, caspase cleavage recognition signals and the CKII consensus sequence <abbrgrp>
					<abbr bid="B1">1</abbr>
				</abbrgrp>, the latter two of which are frequently C-terminal focused. In any case, it would seem that serine-acidic motifs in the C-termini of eukaryotes are likely functionally active and potentially fulfill a variety of roles such as PTM and signal transduction.</p>
			<suppl id="S3">
				<title>
					<p>Additional File 3</p>
				</title>
				<text>
					<p>GO annotation for DSD possessing genes in <it>A. thaliana</it></p>
				</text>
				<file name="1471-2164-8-191-S3.pdf">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S4">
				<title>
					<p>Additional File 4</p>
				</title>
				<text>
					<p>Immunoblotting against EGFP:SDSDSD transgenic <it>A. thaliana</it></p>
				</text>
				<file name="1471-2164-8-191-S4.pdf">
					<p>Click here for file</p>
				</file>
			</suppl>
			<p>An interesting, albeit unexpected, result within the SOCT intersections of <it>A. thaliana </it>and <it>O. sativa </it>was the presence of a highly conserved sequence (FSDENPNA-4) proximal to the Caax motif in a group of iso-prenylated plant metalloproteins [see Additional files <supplr sid="S5">5</supplr> and <supplr sid="S6">6</supplr>]. Although the highly divergent nature of the family prevented this motif from being filtered out, its proximity to a prenylation signal makes this conserved region of special interest. Recent bioinformatic analysis has suggested that residue biases in hydrophobicity exist in sequences proximal to many Caax boxes <abbrgrp>
					<abbr bid="B30">30</abbr>
				</abbrgrp>. Does the Caax proximal sequence play a role in the successful prenylation of these proteins? Based on its degree of conservation, it would appear to be critical to this metalloprotein family's function. There is evidence that the prenylation reaction performed by farnesyltransferase is dependent upon a metal ion nucleophile provided by a metalloprotein cofactor <abbrgrp>
					<abbr bid="B35">35</abbr>
				</abbrgrp>.</p>
			<p>There is a strong presence of Lys among many of the uncharacterised tripeptides at the terminal end of the 7 species "C-terminome". These include: KNK, KNN, KKN, KRK, RRK, KKK, GKK and LKK (see Fig <figr fid="F4">4</figr>). In 2003, Chung et al. proposed the C-terminal lysine preference in yeast was due to capping preferences in protein stability <abbrgrp>
					<abbr bid="B22">22</abbr>
				</abbrgrp>. Di-basic or C-terminal basic residues regulating a protein's trafficking have also been reported. Both the nucleotide receptor P2X7 and the GluR6 kainate receptor possess basic C-terminal tails in which the mutation or deletion of basic residues from the terminus motif disrupted proper protein targeting <abbrgrp>
					<abbr bid="B36">36</abbr>
					<abbr bid="B37">37</abbr>
				</abbrgrp>. Another basic motif involved in targeting is the di-Lys motif at -4, which assists in protein sorting via retrieval of proteins to the ER <abbrgrp>
					<abbr bid="B1">1</abbr>
				</abbrgrp>. The possibility exists that these basic SOCTs reflect a loose consensus for the core residues of a protein-protein interaction domain specific to a class of subcellular targeting chaperones.</p>
			<p>Overall, several intriguing patterns in peptide compositional preferences have been identified. Although the present analysis focuses on the C-terminus, it should be noted that an N-terminal examination was run in parallel and similar biases were observed at the N-terminus [see Additional file <supplr sid="S7">7</supplr>]. One observation of note among the N-terminal statistically over-represented tripeptides is the high prevalence of alanines at the penultimate position. This agrees with bias tendencies seen in other studies and corresponds to strategies in protein half-life as dictated by the N-end rule <abbrgrp>
					<abbr bid="B38">38</abbr>
				</abbrgrp>. A very prominent motif was the MASS motif, which has been implicated in transcript stability at the codon level <abbrgrp>
					<abbr bid="B39">39</abbr>
				</abbrgrp>. Data obtained from studies at both termini are available on the paper's web-site <abbrgrp>
					<abbr bid="B40">40</abbr>
				</abbrgrp> and are offered to the public for further study [see also Additional files <supplr sid="S6">6</supplr> and <supplr sid="S7">7</supplr>].</p>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>Several properties of the C-terminal class of anchored motifs make them attractive for <it>ab initio </it>motif discovery. Since the carboxyl group provides a point of reference, C-terminal anchored peptides should appear among peptide frequencies calculated at distinct C-terminal positions <abbrgrp>
					<abbr bid="B18">18</abbr>
				</abbrgrp>. Likewise, their low information content allows for a direct examination of short peptides (tripeptides in this study). These factors greatly simplify probability calculations, as complex considerations for motif size and positioning can be excluded. Additionally, as characterized C-terminal anchored motifs are known to function across a variety of proteins and families, the removal of tripeptide counts from large C-terminal conserved protein families should not affect the significance score of a true motif, but rather should reduce false positives arising from family-specific homology. Indeed, this filter proves most effective in improving the signal-to-noise ratio, as seen in Figure <figr fid="F2">2B</figr>. This integration of C-terminal tripeptide statistics with protein family information, in combination with simple sequence masking and comparative genomics, was successfully applied to the prediction of C-terminal specific motifs <it>ab initio</it>. Given our success in predicting known motifs, it seems likely that novel, as yet undefined motifs are present in the results. However, among the previously known motifs identified, the majority are widely prevalent with strong significance values. This suggests that any novel uncharacterised signals present in the data may function more specifically or subtly than other confirmed CTAMs present in the analysis.</p>
			<p>Since the C-terminus is a frequent site for protein regulation and is often utilized in recombinant protein experiments, it would seem that C-terminal peptide function will continue to increase in relevance as our knowledge of its biological importance progresses. The novel SOCTs identified in our analysis may represent C-terminal peptide motifs functioning in biological roles ranging from protein sorting and post-translational modification to capping and structural strategies. However, based on the prominence of known targeting signals and the lack of novel SOCTs with a distinct pattern, any protein sorting motifs that remain to be characterized are likely to be confined to a small number of protein families, exhibit species-specific functionality or possess a considerable degree of degeneracy. Overall, our results appear to depict a highly accurate representation of the statistical topography of the "C-terminome" and the methodology could be adapted to protein motif prediction efforts in the protein interior.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<p>Prior to statistical analysis, each predicted proteome was clustered into protein families using the shell script <it>famMCL </it>and masked for simple sequence stretches using the program <it>seg </it><abbrgrp>
					<abbr bid="B25">25</abbr>
				</abbrgrp>. Mean and standard deviation values, derived from randomized sets for each species, were used to calculate individual z-scores for each possible tripeptide at each position from the extreme C-terminal position to 100 residues in from the carboxyl group. This yielded a comprehensive collection of 776,000 C-terminal z-statistics (8000 possible tripeptides &#215; 97 positions: -100 to -3). Results were then intersected between species proteomes, to test for the presence of SOCTs (z &#8805; 3) in at least 2 species and tripeptide presence in at least 10 different proteins within each respective proteome.</p>
			<sec>
				<st>
					<p>Datasets</p>
				</st>
				<p>Translated datasets for each species were obtained in fasta format. All datasets were downloaded from NCBI with the exceptions of <it>A. thaliana </it>which was obtained from TAIR and <it>O. sativa</it>, which was downloaded from TIGR. <it>O. sativa </it>was downloaded in conjunction with a list of accessions corresponding to transposable elements. This list was then used to filter out transposable elements from the protein summary file with a short shell script.</p>
				<p><it>A. thaliana </it>downloaded from TAIR as ATH1_pep_cm_20040228. <abbrgrp>
						<abbr bid="B41">41</abbr>
					</abbrgrp></p>
				<p><it>C. elegans </it>by chromosome translated faa. <abbrgrp>
						<abbr bid="B42">42</abbr>
					</abbrgrp></p>
				<p><it>D. melanogaster </it>by chromosome translated faa. <abbrgrp>
						<abbr bid="B43">43</abbr>
					</abbrgrp></p>
				<p><it>H. sapiens </it>by protein summary. <abbrgrp>
						<abbr bid="B44">44</abbr>
					</abbrgrp></p>
				<p><it>M. musculus </it>by protein summary. <abbrgrp>
						<abbr bid="B45">45</abbr>
					</abbrgrp></p>
				<p><it>S. cerevisiae </it>by chromosome translated faa file. <abbrgrp>
						<abbr bid="B46">46</abbr>
					</abbrgrp></p>
				<p><it>O. sativa </it>by protein summary. <abbrgrp>
						<abbr bid="B47">47</abbr>
					</abbrgrp></p>
				<p><it>O. sativa </it>transposable element list by accession for the filtering of above. <abbrgrp>
						<abbr bid="B48">48</abbr>
					</abbrgrp></p>
			</sec>
			<sec>
				<st>
					<p>Proteome randomization</p>
				</st>
				<p>To generate a collection of randomized fasta sequences, the program <it>fastarand </it>was written in the C programming language <abbrgrp>
						<abbr bid="B49">49</abbr>
					</abbrgrp>. Given a fasta formatted file, <it>fastarand </it>will create an equal size fasta file on a sequence by sequence basis using one of three randomization models.</p>
				<p>1) shuffle the amino acids within each protein in the file</p>
				<p>2) generate each sequence based on the amino acid frequencies in the entire proteome</p>
				<p>3) resample nmers from the query protein until an equal length protein is reached</p>
				<p>The user is able to specify how many randomized proteomes are to be created with a commandline flag (set to 100 in our analysis). Model 3 using 3mers was employed in our study.</p>
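				<p>The three randomization models listed above can be illustrated with a minimal Python sketch. This is not the <it>fastarand </it>source, which is written in C and is not reproduced here; the function names, the use of overlapping n-mers in model 3, the trimming of any overshoot and the fallback for very short proteins are all assumptions made for illustration.</p>

```python
import random
from collections import Counter


def shuffle_model(seq, rng):
    """Model 1: shuffle the residues within a single protein."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def composition_model(seq, proteome, rng):
    """Model 2: draw residues from proteome-wide amino acid frequencies."""
    counts = Counter("".join(proteome))
    letters = sorted(counts)
    weights = [counts[a] for a in letters]
    return "".join(rng.choices(letters, weights=weights, k=len(seq)))


def nmer_resample_model(seq, rng, n=3):
    """Model 3: resample n-mers from the query protein until its length is reached."""
    if n > len(seq):
        return shuffle_model(seq, rng)  # assumption: fallback for very short proteins
    # assumption: n-mers are taken as overlapping windows of the query protein
    nmers = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    out = ""
    while len(seq) > len(out):
        out += rng.choice(nmers)
    return out[:len(seq)]  # assumption: trim any overshoot to the original length


rng = random.Random(0)           # fixed seed so the sketch is repeatable
protein = "MASSKTDLLHRSRDSD"     # hypothetical sequence, not taken from the datasets
randomized = nmer_resample_model(protein, rng)
print(len(randomized) == len(protein))
```

				<p>Under these assumptions, model 3 preserves local tripeptide composition within each protein while removing positional information, which is the property the background statistics rely on.</p>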
			</sec>
			<sec>
				<st>
					<p>Protein family classification</p>
				</st>
				<p>Each species proteome was clustered into gene families using the short shell script <it>famMCL</it>. <it>famMCL </it>should run on any POSIX based system and depends upon a functional installation of NCBI standalone blast <abbrgrp>
						<abbr bid="B50">50</abbr>
					</abbrgrp>, and the <it>MCL </it>clustering algorithm <abbrgrp>
						<abbr bid="B51">51</abbr>
					</abbrgrp>. <it>famMCL </it>and its supporting documentation are available under the GPL <abbrgrp>
						<abbr bid="B52">52</abbr>
					</abbrgrp>.</p>
				<p><it>famMCL </it>performs an all-by-all <it>BLASTP </it>of the provided proteome using the concise output format option of the NCBI standalone <it>BLAST </it>program (-m 8). All bit scores for individual protein comparisons are parsed from the <it>BLAST </it>results to produce an <it>MCL </it>format matrix that is submitted to the <it>MCL </it>clustering algorithm. The resulting <it>MCL </it>output is then parsed to generate lists of protein families by accession and corresponding cluster number. A <it>BLAST </it>similarity cutoff of E &#8804; 1e-10 and the default <it>MCL </it>granularity were used. Our strategy for implementation resulted in little to no variation in cluster composition in successive runs over the recommended range of <it>MCL </it>granularity settings, and fluctuations in cluster size and composition within <it>famMCL </it>were found to be primarily dependent upon selection of the E-value cutoff. As an all-by-all genome <it>BLAST </it>is computationally intensive, it was performed on an openSSI parallelized cluster of workstations running Debian GNU/Linux at the Botany Bioinformatics Cluster in the Department of Cell &amp; Systems Biology, University of Toronto.</p>
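				<p>As a rough illustration of the parsing step, the following Python sketch reads tabular <it>BLAST </it>output (-m 8, twelve tab-separated columns with the E-value in column 11 and the bit score in column 12) and emits query, subject and bit score triples for hits at or below the E-value cutoff. It is not the <it>famMCL </it>script itself; the function name is hypothetical, and writing label triples (the form read by <it>MCL </it>with its --abc option) rather than a native <it>MCL </it>matrix is an assumption.</p>

```python
import io


def blast_m8_to_abc(lines, max_evalue=1e-10):
    """Yield (query, subject, bit_score) triples from BLAST -m 8 tabular lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bit_score = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # drop hits weaker than the similarity cutoff
        yield query, subject, bit_score


# Two hypothetical hits: only the first passes the 1e-10 cutoff.
example = io.StringIO(
    "P1\tP2\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t350\n"
    "P1\tP3\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t22\n"
)
for q, s, score in blast_m8_to_abc(example):
    print(f"{q}\t{s}\t{score}")
```

				<p>The resulting three-column text could then be passed to the clustering step; the cutoff is exposed as a parameter because, as noted above, cluster composition depended mainly on the E-value threshold.</p>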
			</sec>
			<sec>
				<st>
					<p>The SOCT pipeline</p>
				</st>
				<p>The program <it>tripepper </it>was written in the C programming language to determine mean and standard deviation background statistics for respective tripeptides from 100 <it>fastarand </it>randomized proteomes. The Perl script <it>cluster adjuster </it>was then used to calculate position specific tripeptide counts for each simple sequence masked proteome and adjust these counts by subtracting duplicate tallies arising from members of a common gene family as derived from <it>famMCL</it>. These programs collectively constitute the SOCT pipeline software <abbrgrp>
						<abbr bid="B53">53</abbr>
					</abbrgrp> and return a comprehensive set of z-statistics for all 8000 possible tripeptides at each position from the terminal end (-3) to 97 positions in from the carboxy-terminus (-100). Protein family information is used to reduce tripeptide counts artificially inflated due to homologous gene clusters, using a penalized z-statistic as calculated by:</p>
				<p><display-formula>&#931;<sub>i</sub>&#931;<sub>j </sub>Z<sub>&#955; </sub>= (<it>k</it>
						<sub>ij </sub>- <it>x</it>
						<sub>ij </sub>- &#955;<sub>ij</sub>)/<it>s</it>
						<sub>ij</sub>
					</display-formula></p>
				<p>where i is a tripeptide permutation between AAA and YYY, j is the position in from the C-terminus (-3 to -100), <it>k </it>is the number of counts for the tripeptide in the masked proteome, &#955; is the number of duplicate tripeptide counts due to a common gene family and <it>x </it>and <it>s </it>are the mean and standard deviation respectively for occurrences of tripeptide<sub>ij </sub>across 100 randomized proteomes.</p>
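				<p>For a single tripeptide at a single position, the penalized z-statistic defined above reduces to a one-line calculation. The sketch below uses hypothetical numbers, and returning no value when the standard deviation is zero is an assumption rather than documented behaviour of the pipeline.</p>

```python
def penalized_z(k, lam, mean, sd):
    """Return (k - mean - lam) / sd for one tripeptide at one position."""
    if sd == 0:
        return None  # assumption: undefined when the randomized counts never vary
    return (k - mean - lam) / sd


# Hypothetical values: 40 observed occurrences, 12 of them family duplicates,
# against a randomized background with mean 10.0 and standard deviation 4.0.
z = penalized_z(k=40, lam=12, mean=10.0, sd=4.0)
print(z)  # (40 - 10 - 12) / 4 = 4.5
```

				<p>Without the family penalty the same tripeptide would score (40 - 10) / 4 = 7.5, which shows how the penalty lowers scores that are inflated by homologous gene clusters.</p>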
				<p>The program <it>tripepper </it>was given a <it>fastarand </it>generated directory of randomized proteomes and a corresponding proteome with stretches of simple sequences masked out. Masked proteomes were created by running <it>seg </it>at default settings <abbrgrp>
						<abbr bid="B25">25</abbr>
					</abbrgrp>. Note that the randomized sets are not masked and that the masked proteome as produced by <it>seg </it>is used to replace the original proteome read by <it>tripepper</it>. Protein clusters as determined by <it>famMCL </it>were then input to the Perl program <it>cluster adjuster</it>, which uses the <it>tripepper </it>results and the masked proteome to adjust total tripeptide counts by the number of common family occurrences and produce a set of final penalized z-scores.</p>
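				<p>The counting and family adjustment described above might be sketched as follows in Python. This is an interpretation, not the <it>tripepper </it>or <it>cluster adjuster </it>code: it assumes that position -3 denotes the last three residues, that masked residues appear as x or X and are skipped, that proteins missing from the family table are singletons, and that every occurrence of a tripeptide at a position beyond the first within a family counts as a duplicate. All names are hypothetical.</p>

```python
from collections import Counter


def iter_cterm_tripeptides(seq, max_offset=100):
    """Yield (tripeptide, position); position -3 is the last three residues."""
    for offset in range(3, min(max_offset, len(seq)) + 1):
        start = len(seq) - offset
        tripeptide = seq[start:start + 3]
        if "x" not in tripeptide.lower():  # assumption: masked residues are x or X
            yield tripeptide, -offset


def count_and_adjust(proteome, families, max_offset=100):
    """Return (k, lam): raw counts and within-family duplicate counts."""
    k, lam, seen = Counter(), Counter(), set()
    for acc, seq in proteome.items():
        fam = families.get(acc, acc)  # assumption: unclustered proteins are singletons
        for tripeptide, pos in iter_cterm_tripeptides(seq, max_offset):
            k[(tripeptide, pos)] += 1
            if (tripeptide, pos, fam) in seen:
                lam[(tripeptide, pos)] += 1  # repeat within the same family
            seen.add((tripeptide, pos, fam))
    return k, lam


# Hypothetical proteome: p1 and p2 belong to one family, p3 to another.
proteome = {"p1": "MKTASKL", "p2": "MRTASKL", "p3": "MQQDSKL"}
families = {"p1": 1, "p2": 1, "p3": 2}
k, lam = count_and_adjust(proteome, families)
print(k[("SKL", -3)], lam[("SKL", -3)])
```

				<p>In this toy example the terminal SKL is counted three times but one occurrence is attributed to shared family membership, so the adjusted count entering the z-score is two.</p>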
			</sec>
			<sec>
				<st>
					<p>Data integration, filtering and visualization of raw data</p>
				</st>
				<p>Comprehensive z-statistics generated for each eukaryotic proteome were processed using common UNIX shell scripting tools (e.g. <it>grep</it>, <it>sed</it>, <it>awk</it>) to identify all significant tripeptides (z-score &#8805; 3) occurring in at least 2 species and present in at least 10 genes in each proteome. These data were then analysed using the open source plotting program <it>gnuplot </it><abbrgrp>
						<abbr bid="B54">54</abbr>
					</abbrgrp> or fed to the web-based application <it>Heatmapper </it><abbrgrp>
						<abbr bid="B55">55</abbr>
					</abbrgrp> to generate a z-score based visual heatmap of all intersecting SOCTs (see Fig <figr fid="F4">4</figr>).</p>
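				<p>The intersection filter was implemented with shell tools; the same logic can be expressed compactly in Python, as in the sketch below. The data layout, the function name and the example values are hypothetical, and the sketch assumes that both thresholds (z &#8805; 3 and at least 10 proteins) must be met within a species for that species to count towards the two-species minimum.</p>

```python
def intersect_socts(zscores, protein_counts, z_min=3.0, min_species=2, min_proteins=10):
    """Keep (tripeptide, position) keys that pass both filters in enough species.

    zscores and protein_counts map species to {(tripeptide, position): value}.
    """
    hits = {}
    for species, table in zscores.items():
        for key, z in table.items():
            n = protein_counts.get(species, {}).get(key, 0)
            if z >= z_min and n >= min_proteins:
                hits.setdefault(key, {})[species] = z
    return {key: per_sp for key, per_sp in hits.items() if len(per_sp) >= min_species}


# Hypothetical values for two species and two terminal tripeptides.
zscores = {
    "A. thaliana": {("SKL", -3): 12.1, ("QNV", -3): 3.4},
    "S. cerevisiae": {("SKL", -3): 6.0, ("QNV", -3): 1.2},
}
protein_counts = {
    "A. thaliana": {("SKL", -3): 40, ("QNV", -3): 11},
    "S. cerevisiae": {("SKL", -3): 15, ("QNV", -3): 4},
}
print(sorted(intersect_socts(zscores, protein_counts)))
```

				<p>The per-species z-scores retained for each surviving tripeptide are the values that would be passed on for heatmap visualization.</p>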
			</sec>
		</sec>
		<sec>
			<st>
				<p>Abbreviations</p>
			</st>
			<p>CTAM &#8211; Carboxy-terminal anchored motif</p>
			<p>SOCT &#8211; Statistically Over-represented Carboxy-terminal Tripeptide</p>
			<p>PTM &#8211; Post-translational modification</p>
			<p>GPL &#8211; GNU General Public License</p>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>SRC conceived the concept; RSA performed the analyses under the supervision of SRC and NJP. RSA wrote the manuscript, which was edited by RSA, NJP and SRC. All authors read and approved the final manuscript.</p>
		</sec>
	</bdy>
   <bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>NJP and SRC are supported by grants from NSERC. RSA is funded in part by a University of Toronto fellowship. The Botany Beowulf Cluster was funded by a Genome Canada grant administered through the Ontario Genomics Institute.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Functional diversity of protein C-termini: more than zipcoding?</p>
				</title>
				<aug>
					<au>
						<snm>Chung</snm>
						<fnm>JJ</fnm>
					</au>
					<au>
						<snm>Shikano</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Hanyu</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Trends Cell Biol</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>146</fpage>
				<lpage>150</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0962-8924(01)02241-3</pubid>
						<pubid idtype="pmpid" link="fulltext">11859027</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Protein prenylation: molecular mechanisms and functional consequences</p>
				</title>
				<aug>
					<au>
						<snm>Zhang</snm>
						<fnm>FL</fnm>
					</au>
					<au>
						<snm>Casey</snm>
						<fnm>PJ</fnm>
					</au>
				</aug>
				<source>Annu Rev Biochem</source>
				<pubdate>1996</pubdate>
				<volume>65</volume>
				<fpage>241</fpage>
				<lpage>269</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.bi.65.070196.001325</pubid>
						<pubid idtype="pmpid" link="fulltext">8811180</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Opinion: peroxisomal-protein import: is it really that complex?</p>
				</title>
				<aug>
					<au>
						<snm>Gould</snm>
						<fnm>SJ</fnm>
					</au>
					<au>
						<snm>Collins</snm>
						<fnm>CS</fnm>
					</au>
				</aug>
				<source>Nat Rev Mol Cell Biol</source>
				<pubdate>2002</pubdate>
				<volume>3</volume>
				<fpage>382</fpage>
				<lpage>389</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nrm807</pubid>
						<pubid idtype="pmpid" link="fulltext">11988772</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Signal-mediated sorting of membrane proteins between the endoplasmic reticulum and the golgi apparatus</p>
				</title>
				<aug>
					<au>
						<snm>Teasdale</snm>
						<fnm>RD</fnm>
					</au>
					<au>
						<snm>Jackson</snm>
						<fnm>MR</fnm>
					</au>
				</aug>
				<source>Annu Rev Cell Dev Biol</source>
				<pubdate>1996</pubdate>
				<volume>12</volume>
				<fpage>27</fpage>
				<lpage>54</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.cellbio.12.1.27</pubid>
						<pubid idtype="pmpid">8970721</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Diverse amino acid residues function within the type 1 peroxisomal targeting signal. Implications for the role of accessory residues upstream of the type 1 peroxisomal targeting signal</p>
				</title>
				<aug>
					<au>
						<snm>Mullen</snm>
						<fnm>RT</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>MS</fnm>
					</au>
					<au>
						<snm>Flynn</snm>
						<fnm>CR</fnm>
					</au>
					<au>
						<snm>Trelease</snm>
						<fnm>RN</fnm>
					</au>
				</aug>
				<source>Plant Physiol</source>
				<pubdate>1997</pubdate>
				<volume>115</volume>
				<fpage>881</fpage>
				<lpage>889</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">158551</pubid>
						<pubid idtype="pmpid" link="fulltext">9390426</pubid>
						<pubid idtype="doi">10.1104/pp.115.3.881</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>InterProScan--an integration platform for the signature-recognition methods in InterPro</p>
				</title>
				<aug>
					<au>
						<snm>Zdobnov</snm>
						<fnm>EM</fnm>
					</au>
					<au>
						<snm>Apweiler</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2001</pubdate>
				<volume>17</volume>
				<fpage>847</fpage>
				<lpage>848</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/17.9.847</pubid>
						<pubid idtype="pmpid" link="fulltext">11590104</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Progress in protein structural class prediction and its impact to bioinformatics and proteomics</p>
				</title>
				<aug>
					<au>
						<snm>Chou</snm>
						<fnm>KC</fnm>
					</au>
				</aug>
				<source>Curr Protein Pept Sci</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>423</fpage>
				<lpage>436</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.2174/138920305774329368</pubid>
						<pubid idtype="pmpid" link="fulltext">16248794</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Integrative approach for computationally inferring protein domain interactions</p>
				</title>
				<aug>
					<au>
						<snm>Ng</snm>
						<fnm>SK</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Tan</snm>
						<fnm>SH</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>19</volume>
				<fpage>923</fpage>
				<lpage>929</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/btg118</pubid>
						<pubid idtype="pmpid" link="fulltext">12761053</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Recent developments in structural proteomics for protein structure determination</p>
				</title>
				<aug>
					<au>
						<snm>Liu</snm>
						<fnm>HL</fnm>
					</au>
					<au>
						<snm>Hsu</snm>
						<fnm>JP</fnm>
					</au>
				</aug>
				<source>Proteomics</source>
				<pubdate>2005</pubdate>
				<volume>5</volume>
				<fpage>2056</fpage>
				<lpage>2068</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/pmic.200401104</pubid>
						<pubid idtype="pmpid" link="fulltext">15846841</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Availability of short amino acid sequences in proteins</p>
				</title>
				<aug>
					<au>
						<snm>Otaki</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Ienaka</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Gotoh</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Yamamoto</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Protein Sci</source>
				<pubdate>2005</pubdate>
				<volume>14</volume>
				<fpage>617</fpage>
				<lpage>625</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1110/ps.041092605</pubid>
						<pubid idtype="pmpid" link="fulltext">15689510</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes</p>
				</title>
				<aug>
					<au>
						<snm>Karlin</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci U S A</source>
				<pubdate>1990</pubdate>
				<volume>87</volume>
				<fpage>2264</fpage>
				<lpage>2268</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">53667</pubid>
						<pubid idtype="pmpid" link="fulltext">2315319</pubid>
						<pubid idtype="doi">10.1073/pnas.87.6.2264</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>PlantCARE, a plant cis-acting regulatory element database</p>
				</title>
				<aug>
					<au>
						<snm>Rombauts</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Dehais</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Van Montagu</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rouze</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1999</pubdate>
				<volume>27</volume>
				<fpage>295</fpage>
				<lpage>296</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">148162</pubid>
						<pubid idtype="pmpid" link="fulltext">9847207</pubid>
						<pubid idtype="doi">10.1093/nar/27.1.295</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Computational approaches to identify promoters and cis-regulatory elements in plant genomes</p>
				</title>
				<aug>
					<au>
						<snm>Rombauts</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Florquin</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Lescot</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Marchal</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Rouze</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>van de Peer</snm>
						<fnm>Y</fnm>
					</au>
				</aug>
				<source>Plant Physiol</source>
				<pubdate>2003</pubdate>
				<volume>132</volume>
				<fpage>1162</fpage>
				<lpage>1176</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">167057</pubid>
						<pubid idtype="pmpid" link="fulltext">12857799</pubid>
						<pubid idtype="doi">10.1104/pp.102.017715</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Detecting DNA regulatory motifs by incorporating positional trends in information content</p>
				</title>
				<aug>
					<au>
						<snm>Kechris</snm>
						<fnm>KJ</fnm>
					</au>
					<au>
						<snm>van Zwet</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Bickel</snm>
						<fnm>PJ</fnm>
					</au>
					<au>
						<snm>Eisen</snm>
						<fnm>MB</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>R50</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">463320</pubid>
						<pubid idtype="pmpid" link="fulltext">15239835</pubid>
						<pubid idtype="doi">10.1186/gb-2004-5-7-r50</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Assessing computational tools for the discovery of transcription factor binding sites</p>
				</title>
				<aug>
					<au>
						<snm>Tompa</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Bailey</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Church</snm>
						<fnm>GM</fnm>
					</au>
					<au>
						<snm>De Moor</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Eskin</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Favorov</snm>
						<fnm>AV</fnm>
					</au>
					<au>
						<snm>Frith</snm>
						<fnm>MC</fnm>
					</au>
					<au>
						<snm>Fu</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Kent</snm>
						<fnm>WJ</fnm>
					</au>
					<au>
						<snm>Makeev</snm>
						<fnm>VJ</fnm>
					</au>
					<au>
						<snm>Mironov</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Noble</snm>
						<fnm>WS</fnm>
					</au>
					<au>
						<snm>Pavesi</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Pesole</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Regnier</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Simonis</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Sinha</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Thijs</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>van Helden</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Vandenbogaert</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Weng</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Workman</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Ye</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Zhu</snm>
						<fnm>Z</fnm>
					</au>
				</aug>
				<source>Nat Biotechnol</source>
				<pubdate>2005</pubdate>
				<volume>23</volume>
				<fpage>137</fpage>
				<lpage>144</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nbt1053</pubid>
						<pubid idtype="pmpid" link="fulltext">15637633</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Genomic strategies to identify mammalian regulatory sequences</p>
				</title>
				<aug>
					<au>
						<snm>Pennacchio</snm>
						<fnm>LA</fnm>
					</au>
					<au>
						<snm>Rubin</snm>
						<fnm>EM</fnm>
					</au>
				</aug>
				<source>Nat Rev Genet</source>
				<pubdate>2001</pubdate>
				<volume>2</volume>
				<fpage>100</fpage>
				<lpage>109</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35052548</pubid>
						<pubid idtype="pmpid" link="fulltext">11253049</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>COOH-terminal decamers in proteins are non-random</p>
				</title>
				<aug>
					<au>
						<snm>Berezovsky</snm>
						<fnm>IN</fnm>
					</au>
					<au>
						<snm>Kilosanidze</snm>
						<fnm>GT</fnm>
					</au>
					<au>
						<snm>Tumanyan</snm>
						<fnm>VG</fnm>
					</au>
					<au>
						<snm>Kisselev</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>FEBS Lett</source>
				<pubdate>1997</pubdate>
				<volume>404</volume>
				<fpage>140</fpage>
				<lpage>142</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0014-5793(97)00112-9</pubid>
						<pubid idtype="pmpid" link="fulltext">9119051</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Nonrandom tripeptide sequence distributions at protein carboxyl termini</p>
				</title>
				<aug>
					<au>
						<snm>Gatto</snm>
						<fnm>GJ</fnm>
						<suf>Jr.</suf>
					</au>
					<au>
						<snm>Berg</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>617</fpage>
				<lpage>623</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">430173</pubid>
						<pubid idtype="pmpid" link="fulltext">12671002</pubid>
						<pubid idtype="doi">10.1101/gr.667603</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Amino acid composition of protein termini are biased in different manners</p>
				</title>
				<aug>
					<au>
						<snm>Berezovsky</snm>
						<fnm>IN</fnm>
					</au>
					<au>
						<snm>Kilosanidze</snm>
						<fnm>GT</fnm>
					</au>
					<au>
						<snm>Tumanyan</snm>
						<fnm>VG</fnm>
					</au>
					<au>
						<snm>Kisselev</snm>
						<fnm>LL</fnm>
					</au>
				</aug>
				<source>Protein Eng</source>
				<pubdate>1999</pubdate>
				<volume>12</volume>
				<fpage>23</fpage>
				<lpage>30</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/protein/12.1.23</pubid>
						<pubid idtype="pmpid" link="fulltext">10065707</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Comparative analysis of the base biases at the gene terminal portions in seven eukaryote genomes</p>
				</title>
				<aug>
					<au>
						<snm>Niimura</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Terabe</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Gojobori</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Miura</snm>
						<fnm>K</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>5195</fpage>
				<lpage>5201</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">212801</pubid>
						<pubid idtype="pmpid" link="fulltext">12930971</pubid>
						<pubid idtype="doi">10.1093/nar/gkg701</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Genome wide analysis of Arabidopsis core promoters</p>
				</title>
				<aug>
					<au>
						<snm>Molina</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Grotewold</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>BMC Genomics</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>25</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">554773</pubid>
						<pubid idtype="pmpid" link="fulltext">15733318</pubid>
						<pubid idtype="doi">10.1186/1471-2164-6-25</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Genome-wide Analyses of Carboxyl-terminal Sequences</p>
				</title>
				<aug>
					<au>
						<snm>Chung</snm>
						<fnm>JJ</fnm>
					</au>
					<au>
						<snm>Yang</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Mol Cell Proteomics</source>
				<pubdate>2003</pubdate>
				<volume>2</volume>
				<fpage>173</fpage>
				<lpage>181</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/mcp.M300008-MCP200</pubid>
						<pubid idtype="pmpid" link="fulltext">12682279</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Functional grouping based on signatures in protein termini</p>
				</title>
				<aug>
					<au>
						<snm>Bahir</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Linial</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Proteins: Structure, Function, and Bioinformatics</source>
				<pubdate>2006</pubdate>
				<volume>63</volume>
				<issue>4</issue>
				<fpage>996</fpage>
				<lpage>1004</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/prot.20903</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Terminal residues in protein chains: residue preference, conformation, and interaction</p>
				</title>
				<aug>
					<au>
						<snm>Pal</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Chakrabarti</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Biopolymers</source>
				<pubdate>2000</pubdate>
				<volume>53</volume>
				<fpage>467</fpage>
				<lpage>475</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/(SICI)1097-0282(200005)53:6&lt;467::AID-BIP3&gt;3.0.CO;2-9</pubid>
						<pubid idtype="pmpid" link="fulltext">10775062</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Analysis of compositionally biased regions in sequence databases</p>
				</title>
				<aug>
					<au>
						<snm>Wootton</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Federhen</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Methods Enzymol</source>
				<pubdate>1996</pubdate>
				<volume>266</volume>
				<fpage>554</fpage>
				<lpage>571</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8743706</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
				</title>
				<aug>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Madden</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Schaffer</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1997</pubdate>
				<volume>25</volume>
				<fpage>3389</fpage>
				<lpage>3402</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">146917</pubid>
						<pubid idtype="pmpid" link="fulltext">9254694</pubid>
						<pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Graph clustering by flow simulation.</p>
				</title>
				<aug>
					<au>
						<snm>van Dongen</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Mathematics and Computer Science</source>
				<publisher>Utrecht, University of Utrecht, The Netherlands</publisher>
				<pubdate>2000</pubdate>
			</bibl>
			<bibl id="B28">
				<title>
					<p>An efficient algorithm for large-scale detection of protein families</p>
				</title>
				<aug>
					<au>
						<snm>Enright</snm>
						<fnm>AJ</fnm>
					</au>
					<au>
						<snm>Van Dongen</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>1575</fpage>
				<lpage>1584</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">101833</pubid>
						<pubid idtype="pmpid" link="fulltext">11917018</pubid>
						<pubid idtype="doi">10.1093/nar/30.7.1575</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>OrthoMCL: identification of ortholog groups for eukaryotic genomes</p>
				</title>
				<aug>
					<au>
						<snm>Li</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Stoeckert</snm>
						<fnm>CJ</fnm>
						<suf>Jr.</suf>
					</au>
					<au>
						<snm>Roos</snm>
						<fnm>DS</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>2178</fpage>
				<lpage>2189</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">403725</pubid>
						<pubid idtype="pmpid" link="fulltext">12952885</pubid>
						<pubid idtype="doi">10.1101/gr.1224503</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Refinement and prediction of protein prenylation motifs</p>
				</title>
				<aug>
					<au>
						<snm>Maurer-Stroh</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Eisenhaber</snm>
						<fnm>F</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>R55</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1175975</pubid>
						<pubid idtype="pmpid" link="fulltext">15960807</pubid>
						<pubid idtype="doi">10.1186/gb-2005-6-6-r55</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>p53 linear diffusion along DNA requires its C terminus</p>
				</title>
				<aug>
					<au>
						<snm>McKinney</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Mattia</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Gottifredi</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Prives</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Mol Cell</source>
				<pubdate>2004</pubdate>
				<volume>16</volume>
				<fpage>413</fpage>
				<lpage>424</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/j.molcel.2004.09.032</pubid>
						<pubid idtype="pmpid" link="fulltext">15525514</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Constitutive phosphorylation of the acidic tails of the high mobility group 1 proteins by casein kinase II alters their conformation, stability, and DNA binding specificity</p>
				</title>
				<aug>
					<au>
						<snm>Wisniewski</snm>
						<fnm>JR</fnm>
					</au>
					<au>
						<snm>Szewczuk</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Petry</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Schwanbeck</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Renner</snm>
						<fnm>U</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1999</pubdate>
				<volume>274</volume>
				<fpage>20116</fpage>
				<lpage>20122</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.274.29.20116</pubid>
						<pubid idtype="pmpid" link="fulltext">10400623</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>The last CTD repeat of the mammalian RNA polymerase II large subunit is important for its stability</p>
				</title>
				<aug>
					<au>
						<snm>Chapman</snm>
						<fnm>RD</fnm>
					</au>
					<au>
						<snm>Palancade</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Lang</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Bensaude</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Eick</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<volume>32</volume>
				<fpage>35</fpage>
				<lpage>44</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">373282</pubid>
						<pubid idtype="pmpid" link="fulltext">14704341</pubid>
						<pubid idtype="doi">10.1093/nar/gkh172</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>Casein kinase II stabilizes the activity of human topoisomerase IIalpha in a phosphorylation-independent manner</p>
				</title>
				<aug>
					<au>
						<snm>Redwood</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Davies</snm>
						<fnm>SL</fnm>
					</au>
					<au>
						<snm>Wells</snm>
						<fnm>NJ</fnm>
					</au>
					<au>
						<snm>Fry</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Hickson</snm>
						<fnm>ID</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1998</pubdate>
				<volume>273</volume>
				<fpage>3635</fpage>
				<lpage>3642</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.273.6.3635</pubid>
						<pubid idtype="pmpid" link="fulltext">9452492</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>Role of metals in the reaction catalyzed by protein farnesyltransferase</p>
				</title>
				<aug>
					<au>
						<snm>Saderholm</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Hightower</snm>
						<fnm>KE</fnm>
					</au>
					<au>
						<snm>Fierke</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Biochemistry</source>
				<pubdate>2000</pubdate>
				<volume>39</volume>
				<fpage>12398</fpage>
				<lpage>12405</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1021/bi0011781</pubid>
						<pubid idtype="pmpid" link="fulltext">11015220</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>A C-terminal determinant of GluR6 kainate receptor trafficking</p>
				</title>
				<aug>
					<au>
						<snm>Yan</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Sanders</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Xu</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Zhu</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Contractor</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Swanson</snm>
						<fnm>GT</fnm>
					</au>
				</aug>
				<source>J Neurosci</source>
				<pubdate>2004</pubdate>
				<volume>24</volume>
				<fpage>679</fpage>
				<lpage>691</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1523/JNEUROSCI.4985-03.2004</pubid>
						<pubid idtype="pmpid" link="fulltext">14736854</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<title>
					<p>Mutation of a dibasic amino acid motif within the C terminus of the P2X7 nucleotide receptor results in trafficking defects and impaired function</p>
				</title>
				<aug>
					<au>
						<snm>Denlinger</snm>
						<fnm>LC</fnm>
					</au>
					<au>
						<snm>Sommer</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Parker</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Gudipaty</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Fisette</snm>
						<fnm>PL</fnm>
					</au>
					<au>
						<snm>Watters</snm>
						<fnm>JW</fnm>
					</au>
					<au>
						<snm>Proctor</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Dubyak</snm>
						<fnm>GR</fnm>
					</au>
					<au>
						<snm>Bertics</snm>
						<fnm>PJ</fnm>
					</au>
				</aug>
				<source>J Immunol</source>
				<pubdate>2003</pubdate>
				<volume>171</volume>
				<fpage>1304</fpage>
				<lpage>1311</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12874219</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p>The N-end rule pathway of protein degradation</p>
				</title>
				<aug>
					<au>
						<snm>Varshavsky</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Genes Cells</source>
				<pubdate>1997</pubdate>
				<volume>2</volume>
				<fpage>13</fpage>
				<lpage>28</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1046/j.1365-2443.1997.1020301.x</pubid>
						<pubid idtype="pmpid">9112437</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B39">
				<title>
					<p>Sequence architecture downstream of the initiator codon enhances gene expression and protein stability in plants</p>
				</title>
				<aug>
					<au>
						<snm>Sawant</snm>
						<fnm>SV</fnm>
					</au>
					<au>
						<snm>Kiran</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Singh</snm>
						<fnm>PK</fnm>
					</au>
					<au>
						<snm>Tuli</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Plant Physiol</source>
				<pubdate>2001</pubdate>
				<volume>126</volume>
				<fpage>1630</fpage>
				<lpage>1636</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">117162</pubid>
						<pubid idtype="pmpid" link="fulltext">11500561</pubid>
						<pubid idtype="doi">10.1104/pp.126.4.1630</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B40">
				<title>
					<p>SOCT Website</p>
				</title>
				<url>http://bbc.botany.utoronto.ca/~raustin/soct</url>
			</bibl>
			<bibl id="B41">
				<title>
					<p>A. thaliana FTP site</p>
				</title>
				<url>ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/</url>
			</bibl>
			<bibl id="B42">
				<title>
					<p>C. elegans FTP site</p>
				</title>
				<url>ftp://ftp.ncbi.nih.gov/genomes/Caenorhabditis_elegans/</url>
			</bibl>
			<bibl id="B43">
				<title>
					<p>D. melanogaster FTP site</p>
				</title>
				<url>ftp://ftp.ncbi.nih.gov/genomes/Drosophila_melanogaster/</url>
			</bibl>
			<bibl id="B44">
				<title>
					<p>H. sapiens FTP site</p>
				</title>
				<url>ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/protein/</url>
			</bibl>
			<bibl id="B45">
				<title>
					<p>M. musculus FTP site</p>
				</title>
				<url>ftp://ftp.ncbi.nih.gov/genomes/M_musculus/protein/</url>
			</bibl>
			<bibl id="B46">
				<title>
					<p>S. cerevisiae FTP site</p>
				</title>
				<url>ftp://ftp.ncbi.nih.gov/genomes/Fungi/Saccharomyces_cerevisiae/</url>
			</bibl>
			<bibl id="B47">
				<title>
					<p>O. sativa FTP site</p>
				</title>
				<url>ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_3.0/all_chrs.tar.gz</url>
			</bibl>
			<bibl id="B48">
				<title>
					<p>O. sativa TE elements list</p>
				</title>
				<url>ftp://ftp.tigr.org/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/pseudomolecules/version_3.0/archive/all_chrs/all.TE-related.gz</url>
			</bibl>
			<bibl id="B49">
				<title>
					<p>fastarand</p>
				</title>
				<url>http://bbc.botany.utoronto.ca/~raustin/soct/fastarand.tgz</url>
			</bibl>
			<bibl id="B50">
				<title>
					<p>NCBI Stand Alone BLAST</p>
				</title>
				<url>ftp://ftp.ncbi.nih.gov/blast/executables/LATEST</url>
			</bibl>
			<bibl id="B51">
				<title>
					<p>MCL</p>
				</title>
				<url>http://micans.org/mcl</url>
			</bibl>
			<bibl id="B52">
				<title>
					<p>famMCL</p>
				</title>
				<url>http://bbc.botany.utoronto.ca/~raustin/soct/famMCL.tgz</url>
			</bibl>
			<bibl id="B53">
				<title>
					<p>SOCT</p>
				</title>
				<url>http://bbc.botany.utoronto.ca/~raustin/soct/soct.tgz</url>
			</bibl>
			<bibl id="B54">
				<title>
					<p>Gnuplot</p>
				</title>
				<url>http://www.gnuplot.org</url>
			</bibl>
			<bibl id="B55">
				<title>
					<p>Heatmapper</p>
				</title>
				<url>http://bbc.botany.utoronto.ca/ntools/heatmapper.cgi</url>
			</bibl>
		</refgrp>
	</bm>
</art>
