<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-3-21</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>The SGS3 protein involved in PTGS finds a family</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Bateman</snm>
               <fnm>Alex</fnm>
               <insr iid="I1"/>
               <email>agb@sanger.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2002</pubdate>
         <volume>3</volume>
         <issue>1</issue>
         <fpage>21</fpage>
         <url>http://www.biomedcentral.com/1471-2105/3/21</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/1471-2105-3-21</pubid>
               <pubid idtype="pmpid">12162795</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>11</day>
               <month>7</month>
               <year>2002</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>5</day>
               <month>8</month>
               <year>2002</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>5</day>
               <month>8</month>
               <year>2002</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2002</year>
         <collab>Bateman; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Post-transcriptional gene silencing (PTGS) is a recently discovered phenomenon and an area of intense research interest. Components of the PTGS machinery are being discovered by genetic and bioinformatics approaches, but the picture is not yet complete.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The gene for the PTGS impaired Arabidopsis mutant <it>sgs3</it> was recently cloned and was not found to have similarity to any other known protein. By a detailed analysis of the sequence of SGS3 we have defined three new protein domains: the XH domain, the XS domain and the zf-XS domain, that are shared with a large family of uncharacterised plant proteins. This work implicates these plant proteins in PTGS.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The enigmatic SGS3 protein has been found to contain two predicted domains in common with a family of plant proteins. The other members of this family have been predicted to be transcription factors; however, this function seems unlikely based on this analysis. A bioinformatics approach has implicated a new family of plant proteins related to SGS3 as potential candidates for PTGS-related functions.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Post-transcriptional gene silencing (PTGS) is a recently discovered phenomenon <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The components of PTGS are being cloned, and experiments combined with sequence analysis are helping to elucidate its mechanisms. Study of PTGS is providing links between diverse biological processes such as defence against viruses, RNA metabolism <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp> and development <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. The gene for the PTGS impaired Arabidopsis mutant <it>sgs3</it> was recently cloned <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. An initial analysis of the protein did not reveal any motifs, domains or similarity to any other protein. To help shed light on the function of SGS3, a more detailed analysis of the protein has been carried out.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>After initial PSI-BLAST searches with the sequence of SGS3, weak matches were found to a number of plant proteins. Reciprocal matches can often verify the significance of weak matches. Using residues 85 to 225 of a weakly matching <it>Sorghum bicolor</it> protein (SWISSPROT accession O48878) as a PSI-BLAST <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> query at the NCBI site, using an inclusion E-value of 0.002, SGS3 was found in the second round with an E-value of 0.001. This search also found a number of other plant proteins including the rice gene X product (also known as gene X1) <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
         <p>I have termed the main region of similarity the XS domain after 'rice gene <ul>X</ul> and <ul>S</ul>GS3'. This presumed domain is around 140 amino acid residues in length (see figure <figr fid="F1">1</figr>). The XS domain contains a completely conserved aspartate that could suggest an enzymatic active site. Prediction of the secondary structure using the Jnet server <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> suggests a mixed alpha and beta structure.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Multiple sequence alignments of the XS domain.</p>
            </caption>
            <text>
               <p><b>Multiple sequence alignments of the XS domain.</b> Figures have been generated using the Jalview program written by Michele Clamp. The protein identifiers are given as name_accession number_species/start-end. The five letter species designations are those used in SWISS-PROT. Alignments are colored using the ClustalX scheme in Jalview (orange: glycine (G); gold: proline (P); blue: small and hydrophobic amino-acids (A, V, L, I, M, F, W); green: hydroxyl and amine amino-acids (S, T, N, Q); magenta: negative-charged amino-acids (D, E); red: positive-charged amino-acids (R, K); dark-blue: histidine (H) and tyrosine (Y)).</p>
            </text>
            <graphic file="1471-2105-3-21-1"/>
         </fig>
         <p>After initial alignment of the protein sequences the DNA sequence for each was inspected for possible frameshift errors and incorrect splicing boundaries using tblastn and genewise.</p>
         <p>The XS domain containing proteins are predicted by ncoils <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> to contain coiled-coils, which suggests that they will oligomerise. Most coiled-coil proteins form either a dimeric or a trimeric structure. It is possible that different members of the XS domain family could oligomerise via their coiled-coils forming a variety of complexes.</p>
         <p>Analysis of the C-terminal region after the central coiled-coil region of rice gene X identifies a second region of conservation termed the XH domain, for 'rice gene <ul>X</ul> Homology'. The XH domain is between 124 and 145 residues in length. All members of the family can be found using any XH domain sequence as a PSI-BLAST query. XH domains exist in some proteins that do not contain an XS domain, for example AT2G16490 from <it>Arabidopsis thaliana</it>. Figure <figr fid="F2">2</figr> shows an alignment of this presumed domain and figure <figr fid="F4">4</figr> shows the complete domain organisation of these proteins. The XH domain was not found in the SGS3 protein. As the XS and XH domains are fused in most of these proteins, these two domains may interact. The XS domain of SGS3 may also interact with XH domains of other proteins. The XH domain contains one completely conserved glutamate that could potentially be part of an active site or other functionally important region.</p>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Multiple sequence alignments of the XH domain.</p>
            </caption>
            <text>
               <p><b>Multiple sequence alignments of the XH domain.</b> See caption of figure <figr fid="F1">1</figr> for details.</p>
            </text>
            <graphic file="1471-2105-3-21-2"/>
         </fig>
         <p>A global alignment of the full length of the sequences with an XS domain, using the T-Coffee alignment program <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, suggested that many of the proteins contain an N-terminal cysteine/histidine cluster. An alignment of the N-terminal cluster is shown in figure <figr fid="F3">3</figr>. This pattern of conservation suggests a zinc binding domain. Although SGS3 is included in the alignment and conserves the putative zinc ligands, there is no statistical support for its inclusion with standard methods. However, given the conservation pattern and the presence of the shared XS domain and coiled-coil, it seems likely that this is an evolutionarily conserved domain. The rice gene X homologues conserve several lysines within the putative zinc binding domain, which suggests that it may be a nucleic acid binding domain. An RNA binding function would seem plausible if this larger family of proteins were involved in PTGS, as the SGS3 protein appears to be.</p>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Multiple sequence alignments of zf-XS domain.</p>
            </caption>
            <text>
               <p><b>Multiple sequence alignments of zf-XS domain.</b> An alignment of the SGS3-type C2H2 zinc binding domain. See caption of figure <figr fid="F1">1</figr> for details.</p>
            </text>
            <graphic file="1471-2105-3-21-3"/>
         </fig>
         <fig id="F4">
            <title>
               <p>Figure 4</p>
            </title>
            <caption>
               <p>A schematic figure showing the architectures of proteins containing XS domains.</p>
            </caption>
            <text>
               <p><b>A schematic figure showing the architectures of proteins containing XS domains.</b> More information about the ring finger domain, the CBS domain and the GADD45/L7/L30 domain can be found in the Pfam database <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> at <url>http://www.sanger.ac.uk/Software/Pfam</url> with accession numbers PF00097, PF00571 and PF01248 respectively. The asterisk denotes that the domain architecture shown for rice gene X, including the ring finger, is not that of the sequence deposited in the protein database (Q9SBW2) but that found in the manuscript by Chen <it>et al</it>. <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. The XS domain, the XH domain and the zinc finger found in SGS3 have been submitted to Pfam and given accession numbers PF03468, PF03469 and PF03470 respectively.</p>
            </text>
            <graphic file="1471-2105-3-21-4"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Can we infer the function of SGS3 based on its similarity to other XS domain proteins? Unfortunately, the members of this family are functionally uncharacterised. However, rice gene X was predicted by Chen <it>et al</it>. to be a transcription factor based on two pieces of evidence: (i) the presence of a coiled-coil, as found in other transcription factors such as GCN4 <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, and (ii) the presence of a ring zinc finger in rice gene X <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. However, many coiled-coils are found in proteins that are not transcription factors, which weakens the first argument. Ring fingers are now thought to mediate the protein interactions of ubiquitin ligases rather than interact with DNA <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, and rice gene X is the only member of this family that contains a ring finger. This second piece of evidence therefore no longer points to a transcription factor function, but potentially to a role in ubiquitination. The evidence used to infer that these proteins are transcription factors is thus weak, and the inference is unlikely to be correct.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>In summary, this analysis suggests that SGS3 may have a nucleic acid binding function and that a large family of plant proteins containing the novel XS and XH domains may be uncharacterised components of PTGS.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>AB is supported by the Wellcome Trust. I would like to thank William Mifsud and Richard Durbin for comments on the manuscript.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Gene silencing: fleshing out the bones.</p>
            </title>
            <aug>
               <au>
                  <snm>Finnegan</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Waterhouse</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>R99</fpage>
            <lpage>R102</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(01)00039-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">11231168</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Hutvagner</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>McLachlan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pasquinelli</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Balint</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Tuschl</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Zamore</snm>
                  <fnm>PD</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>293</volume>
            <fpage>834</fpage>
            <lpage>838</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1062961</pubid>
                  <pubid idtype="pmpid" link="fulltext">11452083</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing.</p>
            </title>
            <aug>
               <au>
                  <snm>Grishok</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pasquinelli</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Conte</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Parrish</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ha</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Baillie</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Fire</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ruvkun</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mello</snm>
                  <fnm>CC</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2001</pubdate>
            <volume>106</volume>
            <fpage>23</fpage>
            <lpage>34</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11461699</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>A Role for the RNase III Enzyme DCR-1 in RNA Interference and Germ Line Development in C. elegans.</p>
            </title>
            <aug>
               <au>
                  <snm>Knight</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Bass</snm>
                  <fnm>BL</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>2</fpage>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Arabidopsis SGS2 and SGS3 genes are required for posttranscriptional gene silencing and natural virus resistance.</p>
            </title>
            <aug>
               <au>
                  <snm>Mourrain</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Beclin</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Elmayan</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Feuerbach</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Godon</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Morel</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Jouette</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lacombe</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Nikic</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Picault</snm>
                  <fnm>N</fnm>
               </au>
               <etal/>
            </aug>
            <source>Cell</source>
            <pubdate>2000</pubdate>
            <volume>101</volume>
            <fpage>533</fpage>
            <lpage>542</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10850495</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucl Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Sequence composition and organization in the Sh2/A1-homologous region of rice.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bennetzen</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Plant Mol Biol</source>
            <pubdate>1996</pubdate>
            <volume>32</volume>
            <fpage>999</fpage>
            <lpage>1001</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9002598</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Application of multiple sequence alignment profiles to improve protein secondary structure prediction.</p>
            </title>
            <aug>
               <au>
                  <snm>Cuff</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Barton</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2000</pubdate>
            <volume>40</volume>
            <fpage>502</fpage>
            <lpage>511</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/1097-0134(20000815)40:3&lt;502::AID-PROT170>3.0.CO;2-Q</pubid>
                  <pubid idtype="pmpid" link="fulltext">10861942</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Predicting coiled coils from protein sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Lupas</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Van Dyke</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stock</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1991</pubdate>
            <volume>252</volume>
            <fpage>1162</fpage>
            <lpage>1164</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2031185</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>T-Coffee: A novel method for fast and accurate multiple sequence alignment.</p>
            </title>
            <aug>
               <au>
                  <snm>Notredame</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Heringa</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>302</volume>
            <fpage>205</fpage>
            <lpage>217</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4042</pubid>
                  <pubid idtype="pmpid" link="fulltext">10964570</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil.</p>
            </title>
            <aug>
               <au>
                  <snm>O'Shea</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Klemm</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Alber</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1991</pubdate>
            <volume>254</volume>
            <fpage>539</fpage>
            <lpage>544</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1948029</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>RING finger proteins: mediators of ubiquitin ligase activity.</p>
            </title>
            <aug>
               <au>
                  <snm>Joazeiro</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Weissman</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2000</pubdate>
            <volume>102</volume>
            <fpage>549</fpage>
            <lpage>552</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11007473</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The Pfam protein families database.</p>
            </title>
            <aug>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Howe</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>ELL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>263</fpage>
            <lpage>266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102420</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592242</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.263</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
