<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2180-3-3</ui>
   <ji>1471-2180</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>New Knowledge from Old: <it>In silico </it>discovery of novel protein domains in <it>Streptomyces coelicolor</it></p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Yeats</snm>
               <fnm>Corin</fnm>
               <insr iid="I1"/>
               <email>cay@sanger.ac.uk</email>
            </au>
            <au id="A2">
               <snm>Bentley</snm>
               <fnm>Stephen</fnm>
               <insr iid="I1"/>
               <email>sdb@sanger.ac.uk</email>
            </au>
            <au id="A3">
               <snm>Bateman</snm>
               <fnm>Alex</fnm>
               <insr iid="I1"/>
               <email>agb@sanger.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK</p>
            </ins>
         </insg>
         <source>BMC Microbiology</source>
         <issn>1471-2180</issn>
         <pubdate>2003</pubdate>
         <volume>3</volume>
         <issue>1</issue>
         <fpage>3</fpage>
         <url>http://www.biomedcentral.com/1471-2180/3/3</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/1471-2180-3-3</pubid>
               <pubid idtype="pmpid">12625841</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>19</day>
               <month>11</month>
               <year>2002</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>6</day>
               <month>2</month>
               <year>2003</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>6</day>
               <month>2</month>
               <year>2003</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2003</year>
         <collab>Yeats et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p><it>Streptomyces coelicolor </it>has long been considered a remarkable bacterium with a complex life-cycle, ubiquitous environmental distribution, linear chromosomes and plasmids, and a huge range of pharmaceutically useful secondary metabolites. Completion of the genome sequence demonstrated that this diversity carried through to the genetic level, with over 7000 genes identified. We sought to expand our understanding of this organism at the molecular level through identification and annotation of novel protein domains. Protein domains are the evolutionarily conserved units from which proteins are formed.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Two automated methods were employed to rapidly generate an optimised set of targets, which were subsequently analysed manually. A final set of 37 domains or structural repeats, represented 204 times in the genome, was developed. Using these families enabled us to correlate items of information from many different resources. Several of the families immediately enhance our understanding of both <it>S. coelicolor </it>and general bacterial molecular mechanisms, including cell wall biosynthesis regulation and streptomycete telomere maintenance.</p>
            </sec>
            <sec>
               <st>
                  <p>Discussion</p>
               </st>
               <p>Delineation of protein domain families enables detailed analysis of protein function, as well as identification of likely regions or residues of particular interest. Hence this kind of prior analysis can increase the rate of discovery in the laboratory. Furthermore, we demonstrate that this type of <it>in silico </it>method can generate new biological information fairly rapidly from previously uncorrelated data.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <sec>
            <st>
               <p><it>Streptomyces coelicolor </it>&#8211; a complex prokaryote</p>
            </st>
            <p><it>Streptomyces coelicolor </it>is a representative of a group of high G+C Gram positive bacteria whose successful adaptation to their niche is demonstrated by their almost ubiquitous presence in soil. This is largely accounted for by their broad metabolic capacity allowing them to cope with the many variables in their environment. They are able to utilise a wide range of food sources including the debris from plants, insects and fungi. Streptomycetes are also famed for their production of a range of secondary metabolites including antibiotics and other chemotherapeutic compounds.</p>
            <p>Unusually for bacteria, streptomycetes exhibit complex multicellular development, with branching, filamentous mycelia giving rise to aerial hyphae which in turn bear long chains of reproductive spores. These three developmental stages also display differential 'tissue-specific' gene expression.</p>
            <p>Also unusual are the size and structure of streptomycete chromosomes. <it>Streptomyces coelicolor </it>has a linear chromosome which, at 8,667,507 base pairs, is the largest complete bacterial genome sequence currently available <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. It is predicted to encode a remarkable 7825 proteins, around twice as many as most sequenced bacterial genomes and more than the eukaryote <it>Saccharomyces cerevisiae</it>. This plethora of proteins reflects both a multiplicity of novel protein families and an expansion within known families when compared to other bacteria, and thus is a good resource in the search for novel protein domains.</p>
         </sec>
         <sec>
            <st>
               <p>Protein Domains</p>
            </st>
            <p>The direct functional and structural determination of all the proteins in an organism is prohibitively expensive and time-consuming. The sequencing of a genome is a powerful aid to understanding the molecular biology of an organism even in the absence of direct experimental work on the organism. Given a complete genome sequence one can begin to ask global questions about the organism's metabolic potential as well as what molecular systems it contains. The transfer of information between related proteins is of fundamental importance to studies of the proteome. While comparison of whole protein sequences is a useful tool for finding close and direct relationships, it misses the subtler relationships between proteins. A more sophisticated method of analysing proteins is through the determination of their domain content <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
            <p>Protein domains are discrete, stable amino acid structures, typically globular and formed from between 40 and 400 amino acids. Homologous domains exhibit highly similar tertiary structure, with the overall structure of the protein being a composite of its domains and connecting sections. To a varying extent biochemical and physiological functions can also be transferred between homologous domains. Some domain families exhibit a wide range of activities, specificities or interactions, whereas others show far less variation. Of note, and analogous to domains, are structural repeats, such as the WD40 repeat. Typically such repeats are between 5 and 60 amino acid residues in length, and occur in a tandem array in a protein. These fold together to form stable, and often very regular, 3-dimensional structures. A common example is the &#946;-propeller (covered in detail in <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>). It is important to realize that repeats are different from repeated domains. Repeated domains would be expected to be stable in isolation, whereas repeats would not be.</p>
         </sec>
         <sec>
            <st>
               <p>Inference of Domain Function</p>
            </st>
            <p>In annotation of bacterial genomes a key step is to infer the function of a protein by similarity to other known proteins. This step usually takes each protein in the genome and searches a large non-redundant database using a sequence search method such as BLAST or FastA <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. The list of matches is then examined to find whether any similar protein has a function that can reliably be transferred. Care must be exercised in this process, as this approach can lead to misannotation. In multidomain proteins the similarity to another protein may be confined to a single shared domain. For example, in the original annotation of the <it>Methanococcus jannaschii </it>genome <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> several proteins were annotated as inosine-monophosphate dehydrogenase (IMPDH) enzymes. The similarity to IMPDH lay not in the enzymatic domain but in a regulatory domain <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Hence analysis of protein domain content is an important component of the annotation process.</p>
            <p>In this paper we attempt to identify novel protein domains in <it>Streptomyces coelicolor</it>. For these novel domains to be useful in understanding the biology of <it>Streptomyces coelicolor </it>and other organisms, we wish to infer their function. There are two complementary approaches to this problem. Firstly, similarity to other protein domains can be used. By examining the function of each protein containing the domain we try to infer what the common function might be between the proteins and hence the function of the domain. This process is often hampered by a lack of information about any of the proteins. Secondly, and more recently, methods using genomic context have been developed that allow increased confidence in functional prediction. These approaches include using gene order (such as the appearance of proteins in operons), the appearance of fusion proteins, and phylogenetic profiles <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. We can also use the knowledge of the biology of <it>Streptomyces coelicolor </it>to provide a species context. This allows interpretation of domains and proteins in the context of the whole organism's biology.</p>
            <p>We use this principle to help elucidate putative biological mechanisms and deepen our understanding of described systems within the soil-dwelling prokaryote <it>Streptomyces coelicolor</it>. Firstly, a set of novel domains was predicted using the recently completed genome sequence. Homologues in other organisms were then searched for, and descriptive information was obtained through literature searching and other analytical tools. This information was then viewed within the context of the <it>Streptomyces coelicolor </it>organism. These results provide functions for many proteins, leading to a number of testable hypotheses.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>The Domain Hunt Methodology</p>
            </st>
            <p>The simplest way to accurately identify novel domains is through examination of high resolution protein structures, usually derived from crystallographic studies; however, only a small proportion of sequences have representative structures. To get maximum value from the large amounts of sequence data being produced, a variety of detailed sequence comparison methods are employed to predict domain families. Such predicted domains are actually representative of evolutionarily conserved sequences rather than discrete protein structures; however, experience shows that they mostly represent such structures. This finding has led to the consideration of domains as the building blocks of protein evolution (reviewed by <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>).</p>
            <p>Predictions of novel domains are normally derived from one of two general methods. At one extreme a researcher will take a single protein sequence and search for partial matches against other sequences. They can then use these short matches as starting points for building new families. The success and ease of such manual building is often dependent on the experience of the researcher. At the other extreme are the fully automated methods that work on large protein sets. An example is the ProDom database <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, from which Pfam-B is derived. We used two methods to investigate the <it>S. coelicolor </it>genome, each combining rapid automatic identification of potential novel domains with detailed manual analyses. All derived families were deposited in the Pfam database <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>.</p>
            <sec>
               <st>
                  <p>Method 1</p>
               </st>
               <p>A significant mechanism in the evolution of novel proteins is internal duplication. It has been suggested that some types of domain &#8211; especially ligand binding domains &#8211; often occur tandemly within a protein. Examples of this are PDZ (PF00595), ubiquitin (PF00240) and cadherin domains (PF00028). Self-self comparisons of proteins are a powerful way of taking advantage of this occurrence of internal duplications, providing greater sensitivity than all-against-all searching <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Reducing the number of sequences being compared increases the likelihood that an apparent match is genuine, and hence increases sensitivity. An additional advantage is that duplications allow easier recognition of domain boundaries &#8211; often a difficult task. The approach described below for domain discovery has in essence been used previously, with noted success (for example see <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>). The following steps describe the procedure that we have implemented to identify novel domains by detecting internal protein duplications. These steps are also summarised in the flow diagram (Figure <figr fid="F1">1</figr>).</p>
               <fig id="F1">
                  <title>
                     <p>Figure 1</p>
                  </title>
                  <caption>
                     <p>Flowchart of the domain hunt process</p>
                  </caption>
                  <text>
                     <p>Flowchart of the domain hunt process. Note: results that end up in the 'Revise Pfam-A' category are not discussed.</p>
                  </text>
                  <graphic file="1471-2180-3-3-1"/>
               </fig>
               <sec>
                  <st>
                     <p>Step 1</p>
                  </st>
                  <p>A set of 7846 potential and known coding sequences from <it>Streptomyces coelicolor </it>was used as the starting point. Low complexity regions were masked using 'seg' <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. A comparison of each protein against itself was carried out using Prospero <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Prospero returns the highest scoring self-self matches with an E-value score measuring the significance of each alignment.</p>
               </sec>
               <sec>
                  <st>
                     <p>Step 2</p>
                  </st>
                  <p>Highest scoring matches were retained for each sequence and a series of filters was applied to remove matches that were unlikely to be novel domains. Firstly, all matches that had an E-value greater than 0.001 were discarded. Given the size of the <it>Streptomyces coelicolor </it>genome we would expect very few false alignments to be detected at this threshold. Secondly, alignments with a length of less than 30 residues were removed. Thirdly, alignments where the start points of each subsequence were separated by less than 45 residues ('shift') were discarded. Such short duplications are unlikely to be genuine domains and are more likely to be structural repeats that are not stable in isolation. From this set any that overlapped a Pfam-A family were also discarded unless both subsequences occurred within the boundaries of a single Pfam-A family. Such an occurrence indicates that the family contains more than one domain or repeat and needs refining. An overlap is defined as one or more residues occurring in both the test alignment and the Pfam-A family alignment.</p>
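                  <p>The filters of this step can be illustrated with a minimal sketch. It is not the code used in this study: the <it>SelfMatch</it> record, the <it>classify</it> function and the representation of Pfam-A families as (start, end) pairs are hypothetical, coordinates are assumed to be 1-based and inclusive, and alignment length is approximated by the shorter of the two subsequences. Only the three thresholds and the overlap rule are taken from the text.</p>

```python
from dataclasses import dataclass

# Thresholds quoted in the text of this step.
MAX_EVALUE = 0.001
MIN_LENGTH = 30
MIN_SHIFT = 45


@dataclass(frozen=True)
class SelfMatch:
    """One self-self alignment (hypothetical record, 1-based inclusive)."""
    evalue: float
    start_a: int
    end_a: int
    start_b: int
    end_b: int


def overlaps(start, end, region):
    """True if [start, end] shares at least one residue with region."""
    return min(end, region[1]) >= max(start, region[0])


def inside(start, end, region):
    """True if [start, end] lies wholly within region."""
    return start >= region[0] and region[1] >= end


def classify(match, pfam_regions):
    """Return 'discard', 'revise_pfam' or 'candidate' for one match."""
    if match.evalue > MAX_EVALUE:
        return "discard"
    # Assumption: alignment length is taken as the shorter subsequence.
    length = min(match.end_a - match.start_a, match.end_b - match.start_b) + 1
    if MIN_LENGTH > length:
        return "discard"
    # 'Shift' is the separation between the two start points.
    if MIN_SHIFT > abs(match.start_b - match.start_a):
        return "discard"
    # Both copies inside one Pfam-A family: the family needs refining.
    for region in pfam_regions:
        if inside(match.start_a, match.end_a, region) and inside(
            match.start_b, match.end_b, region
        ):
            return "revise_pfam"
    # Any other overlap with a Pfam-A family removes the match.
    for region in pfam_regions:
        if overlaps(match.start_a, match.end_a, region) or overlaps(
            match.start_b, match.end_b, region
        ):
            return "discard"
    return "candidate"
```

                  <p>Under these assumptions, a match with both copies inside one Pfam-A region is routed to the 'Revise Pfam-A' category of Figure 1 rather than discarded, while a match that only partly overlaps a Pfam-A region is dropped.</p>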
               </sec>
               <sec>
                  <st>
                     <p>Step 3</p>
                  </st>
                  <p>The alignments generated by Prospero were used as an initial alignment to make profile-HMMs using the HMMER 2.2 software <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. If the pair of sequences in the Prospero alignment overlapped then these overlap regions were removed from the alignment. Profile HMMs were built in local (fs) and global (ls) mode. The resulting profile-HMMs were scanned against the SWISS-PROT and TrEMBL databases <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. An inclusion threshold of 0.01 was chosen and an alignment of all homologues detected was constructed using the hmmalign program from the HMMER package. This alignment was then compared again to the Pfam-A database to see if the profile-HMM searches had detected any similarities to known families. This step removed distant homologues of previously described families. In some cases the missing members were subsequently added to the Pfam SEED alignments.</p>
               </sec>
               <sec>
                  <st>
                     <p>Step 4</p>
                  </st>
                  <p>The previous three steps help to narrow down the number of potential domains to analyze. The final step is a careful manual inspection of the family to extend its membership, improve the multiple sequence alignment and, where possible, determine the domain's function. This analysis uses a wide variety of tools and methods (see below).</p>
               </sec>
            </sec>
            <sec>
               <st>
                  <p>Method 2</p>
               </st>
               <p>A complementary method was also used to try to identify novel domains that may be of significance to the biology of <it>S. coelicolor</it>. The initial assumption of this process is that short proteins are likely to consist of a single domain. Furthermore, it seems likely that if a short protein family is represented multiple times in the genome, it should be of some importance. Using these principles we developed a second four-step process:</p>
               <sec>
                  <st>
                     <p>Step 1</p>
                  </st>
                  <p>A set of 597 short proteins (&#8804; 100 residues) was assembled. An all-against-all BLAST was carried out and the proteins clustered using single-linkage clustering with a cut-off threshold of 50 bits, which we determined was sufficiently high to prevent clustering of unrelated proteins.</p>
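                  <p>Single-linkage clustering on bit scores can be sketched as follows. This is an illustrative implementation, not the one used here: the function name <it>single_linkage</it> and the (query, subject, bit score) tuple layout are hypothetical, and we assume that a pair is linked when its score is at or above the 50-bit cut-off.</p>

```python
def single_linkage(names, hits, cutoff=50.0):
    """Group names so that any pair scoring at or above cutoff is linked.

    hits is an iterable of (query, subject, bit_score) tuples
    (hypothetical layout). Returns a sorted list of sorted clusters,
    singletons included.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster root, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, bits in hits:
        if bits >= cutoff and query != subject:
            # One qualifying hit is enough to merge two clusters.
            parent[find(query)] = find(subject)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

                  <p>Because a single qualifying hit merges two clusters, proteins can be grouped through a chain of intermediate matches, which is why the cut-off must be high enough to avoid linking unrelated proteins.</p>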
               </sec>
               <sec>
                  <st>
                     <p>Step 2</p>
                  </st>
                  <p>All clusters that corresponded to Pfam-A families and single proteins that did not cluster were then removed from the set. This step also provides a useful check on the stringency of the clustering cut-off score. The clustered sequences were then aligned using T-Coffee <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
               </sec>
               <sec>
                  <st>
                     <p>Step 3</p>
                  </st>
                  <p>The aligned clusters were then used as seeds for an iterative search process using HMMER 2.2, similar to that described for Method 1. The families were iterated until convergence. They were then realigned with T-Coffee and a single round of searching was carried out. If any new family members were identified then the iterative search process was repeated.</p>
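                  <p>The control flow of this step can be sketched as two nested fixed-point loops. The sketch is only illustrative: <it>search</it> and <it>realigned_search</it> are hypothetical callables standing in for a profile-HMM search round and for a search round on the realigned family, and we assume that members are never dropped once found.</p>

```python
def iterate_to_convergence(members, search):
    """Repeat search until a round adds no new members."""
    members = frozenset(members)
    while True:
        found = members | frozenset(search(members))
        if found == members:
            return members
        members = found


def build_family(seed, search, realigned_search):
    """Alternate converged searching with one round on the realigned family."""
    members = iterate_to_convergence(seed, search)
    while True:
        # Single round of searching after realignment.
        extra = frozenset(realigned_search(members)) - members
        if not extra:
            return members
        # New members were found, so the iterative search is repeated.
        members = iterate_to_convergence(members | extra, search)
```

                  <p>Both callables take the current set of member identifiers and return the identifiers they detect; in practice each would wrap a profile-HMM build and database scan.</p>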
               </sec>
               <sec>
                  <st>
                     <p>Step 4</p>
                  </st>
                  <p>Manual analysis was carried out as in Step 4 of Method 1 (also see below).</p>
               </sec>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Software/Servers Used in Manual Analysis</p>
            </st>
            <p>All sequences provided in the alignments were obtained from SWISS-PROT/TrEMBL. Known domains were identified in these sequences using SMART <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, ProSite <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, Pfam and InterPro <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
            <p>To improve the accuracy of the sequence alignments, the automatic alignment software ClustalW <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and T-Coffee were employed. These alignments were viewed using Belvu (Sonnhammer ELL) and manually edited with Jalview (Clamp M).</p>
            <p>Although our primary interest is in detecting novel domains, other features are also of interest. For each sequence in an alignment the following set of programs was run: SignalP <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, for secretory signal peptide prediction; TMHMM <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, to determine likely transmembrane regions; and NCOILS <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, to predict coiled-coil regions.</p>
            <p>The final domain alignments were submitted to the PredictProtein server and a secondary structure prediction made using PROF <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. The results are shown in the sequence alignment figures for each domain provided.</p>
            <p>In order to determine genomic context the position of the domains in the <it>S. coelicolor </it>genome was viewed using the Artemis <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> genome viewer.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <sec>
            <st>
               <p>Overview of the Novel Domains</p>
            </st>
            <sec>
               <st>
                  <p>Method 1 Results</p>
               </st>
               <p>From an initial set of 124 possible domain targets, 31 novel domains were identified, giving a 25% success rate. Sixteen targets were removed by Step 3 of the process. Of the targets that lay within Pfam families, most related to the same set of overlapping families &#8211; Patched (PF02460), SecD_SecF (PF02355), and MMPL (PF03176). These targets probably identify a highly divergent transmembrane domain that occurs in pairs, and is found within these families. Table <tblr tid="T1">1</tblr> lists and briefly describes all novel domains identified in the domain hunt processes. There were also significant extensions to two Pfam-A families &#8211; the SCP domain and FG-GAP repeats. SCP has not been previously reported in bacteria.</p>
               <tbl id="T1">
                  <title>
                     <p>Table 1</p>
                  </title>
                  <caption>
                     <p>List of all domains identified by the described methods, with their likely function and number of copies in <it>S. coelicolor</it>.</p>
                  </caption>
                  <tblbdy cols="10">
                     <r>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>Pfam Accession No</it>
                              </b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>Family Name</it>
                              </b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>Pfam Type</it>
                              </b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>Basic Function</it>
                              </b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>No of copies in S. coelicolor</it>
                              </b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>Antibiotic biosynthesis</it>
                              </b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>Cell Wall Biosynth</it>
                              </b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>Cell Wall/Periplasm</it>
                              </b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>Replication</it>
                              </b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>
                                 <it>Secreted</it>
                              </b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="10">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c cspan="10" ca="left">
                           <p>
                              <b>A) Novel Families</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="10">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03457</p>
                        </c>
                        <c ca="center">
                           <p>HA</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Putative RNA binding domain</p>
                        </c>
                        <c ca="center">
                           <p>21</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03621</p>
                        </c>
                        <c ca="center">
                           <p>MbtH</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Possibly involved in antibiotic biosynthesis</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03625</p>
                        </c>
                        <c ca="center">
                           <p>DUF302</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>3</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03640</p>
                        </c>
                        <c ca="center">
                           <p>Lipoprotein_15</p>
                        </c>
                        <c ca="center">
                           <p>Repeat</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>6</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03703</p>
                        </c>
                        <c ca="center">
                           <p>DUF304</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03704</p>
                        </c>
                        <c ca="center">
                           <p>BTAD</p>
                        </c>
                        <c ca="center">
                           <p>Family</p>
                        </c>
                        <c ca="left">
                           <p>Bacterial transcriptional activator domain</p>
                        </c>
                        <c ca="center">
                           <p>12</p>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03710</p>
                        </c>
                        <c ca="center">
                           <p>GlnE</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Glutamate-ammonia ligase adenylyltransferase</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03713</p>
                        </c>
                        <c ca="center">
                           <p>DUF305</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>6</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03714</p>
                        </c>
                        <c ca="center">
                           <p>PUD</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Putative carbohydrate binding domain</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03724</p>
                        </c>
                        <c ca="center">
                           <p>DUF306</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03729</p>
                        </c>
                        <c ca="center">
                           <p>DUF308</p>
                        </c>
                        <c ca="center">
                           <p>Repeat</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>6</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03733</p>
                        </c>
                        <c ca="center">
                           <p>DUF307</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03752</p>
                        </c>
                        <c ca="center">
                           <p>ALF</p>
                        </c>
                        <c ca="center">
                           <p>Repeat</p>
                        </c>
                        <c ca="left">
                           <p>Putative signal transduction domains</p>
                        </c>
                        <c ca="center">
                           <p>16</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03756</p>
                        </c>
                        <c ca="center">
                           <p>AfsA_repeat</p>
                        </c>
                        <c ca="center">
                           <p>Repeat</p>
                        </c>
                        <c ca="left">
                           <p>A-factor biosynthesis</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03771</p>
                        </c>
                        <c ca="center">
                           <p>SPDB</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>(Probably) mobile element replication</p>
                        </c>
                        <c ca="center">
                           <p>16</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03777</p>
                        </c>
                        <c ca="center">
                           <p>DUF320</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>11</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03779</p>
                        </c>
                        <c ca="center">
                           <p>SPW</p>
                        </c>
                        <c ca="center">
                           <p>Repeat</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03793</p>
                        </c>
                        <c ca="center">
                           <p>PASTA</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Cell wall peptidoglycan sensor domain</p>
                        </c>
                        <c ca="center">
                           <p>9</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03794</p>
                        </c>
                        <c ca="center">
                           <p>HHE</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>7</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03795</p>
                        </c>
                        <c ca="center">
                           <p>YCII</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Probable enzymatic domain</p>
                        </c>
                        <c ca="center">
                           <p>3</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03860</p>
                        </c>
                        <c ca="center">
                           <p>DUF326</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>6</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03984</p>
                        </c>
                        <c ca="center">
                           <p>DUF346</p>
                        </c>
                        <c ca="center">
                           <p>Repeat</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function (&#946;-propeller)</p>
                        </c>
                        <c ca="center">
                           <p>7</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03988</p>
                        </c>
                        <c ca="center">
                           <p>DUF347</p>
                        </c>
                        <c ca="center">
                           <p>Repeat</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03990</p>
                        </c>
                        <c ca="center">
                           <p>DUF348</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>3</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03992</p>
                        </c>
                        <c ca="center">
                           <p>ABM</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Antibiotic biosynthesis monooxygenase</p>
                        </c>
                        <c ca="center">
                           <p>3</p>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03993</p>
                        </c>
                        <c ca="center">
                           <p>DUF349</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>3</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03994</p>
                        </c>
                        <c ca="center">
                           <p>DUF350</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03995</p>
                        </c>
                        <c ca="center">
                           <p>DUF351</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF04151</p>
                        </c>
                        <c ca="center">
                           <p>PPC</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>PKD-like peptidase C-terminal domain</p>
                        </c>
                        <c ca="center">
                           <p>3</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF04205</p>
                        </c>
                        <c ca="center">
                           <p>FMN_bind</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>FMN-binding domain</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF05120</p>
                        </c>
                        <c ca="center">
                           <p>GvpG</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Gas vesicle protein G</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF05121</p>
                        </c>
                        <c ca="center">
                           <p>GvpK</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Gas vesicle protein K</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF05122</p>
                        </c>
                        <c ca="center">
                           <p>SpdB</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Mobile element transfer proteins</p>
                        </c>
                        <c ca="center">
                           <p>2</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="10">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c cspan="10" ca="left">
                           <p>
                              <b>B) Previously Described New Pfam Families</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="10">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03458</p>
                        </c>
                        <c ca="center">
                           <p>UPF0126<sup>1</sup></p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03459</p>
                        </c>
                        <c ca="center">
                           <p>TOBE<sup>2</sup></p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Transport-associated OB fold domain</p>
                        </c>
                        <c ca="center">
                           <p>9</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03707</p>
                        </c>
                        <c ca="center">
                           <p>MHYT<sup>3</sup></p>
                        </c>
                        <c ca="center">
                           <p>Repeat</p>
                        </c>
                        <c ca="left">
                           <p>Putative ligand receptor</p>
                        </c>
                        <c ca="center">
                           <p>6</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF03989</p>
                        </c>
                        <c ca="center">
                           <p>DNA_gyraseA_C4<sup>4</sup></p>
                        </c>
                        <c ca="center">
                           <p>Repeat</p>
                        </c>
                        <c ca="left">
                           <p>DNA-binding &#946;-propeller</p>
                        </c>
                        <c ca="center">
                           <p>8</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c cspan="10">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c cspan="10" ca="left">
                           <p>
                              <b>C) Significantly Extended Families</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="10">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF00188</p>
                        </c>
                        <c ca="center">
                           <p>SCP</p>
                        </c>
                        <c ca="center">
                           <p>Domain</p>
                        </c>
                        <c ca="left">
                           <p>Unknown function</p>
                        </c>
                        <c ca="center">
                           <p>4</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>PF01839</p>
                        </c>
                        <c ca="center">
                           <p>FG-GAP</p>
                        </c>
                        <c ca="center">
                           <p>Repeat</p>
                        </c>
                        <c ca="left">
                           <p>Putative &#946;-propeller</p>
                        </c>
                        <c ca="center">
                           <p>57</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>X</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>This table shows all new Pfam families added during this investigation. Part A shows entirely novel families. Part B shows families that are new to Pfam but have been previously described in the literature: (1) SWISS-PROT; (2) <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>; (3) <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>; (4) <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. Part C shows families that have had significant extensions to them &#8211; for instance SCP was previously thought to be present only in eukaryotes. Domains highlighted in blue are discussed in further detail in section 3.2.</p>
                  </tblfn>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Method 2 Results</p>
               </st>
               <p>From an initial set of 597 short proteins, 35 clusters were derived, accounting for a total of 102 proteins. There were 26 clusters of size two (two proteins), 4 of size three, 2 of size five, and one each of size six, size seven and size 15. All the clusters above size three were part of Pfam-A families &#8211; DUF397 (PF04149), CSD (PF00313), Whib (PF02467) and DUF320 (PF03777). DUF397 accounted for both the size 15 and the size six clusters. DUF320 was identified by both hunt processes. As a positive control, the iterative search steps were carried out on the annotated clusters. All produced larger alignments that were simple to develop further into good approximations of the Pfam-A families. When the search was run on the test set of clusters, only one family was significantly extended &#8211; the MbtH family (see below). Three small families (&lt;10 sequences) &#8211; GvpG (PF05120), GvpK (PF05121) and SpdB (PF05122) &#8211; were also produced.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Domains of Significant Interest</p>
            </st>
            <sec>
               <st>
                  <p>Novel Families</p>
               </st>
               <sec>
                  <st>
                     <p>HA (Helicase Associated domain; PF03457)</p>
                  </st>
                  <p>See Figure <figr fid="F2">2</figr> for an example alignment. The domain is typically seventy residues in length and is predicted to have an &#945;-helix-only fold. It appears to be found mostly in the streptomycetes, though an HA-containing helicase is found in <it>Chlamydia muridarum</it>, and a protein consisting of three copies of the domain (Swiss:Q98RX4) is found in the lower eukaryote <it>Guillardia theta</it>. The gene in <it>C. muridarum </it>is likely to be the result of a lateral transfer event <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Examination of the position of the HA domain-containing proteins on the <it>Streptomyces coelicolor </it>genome, using Artemis, revealed a surprising result. From each end of the linear <it>S. coelicolor </it>chromosome the second and third ORFs contain HA domains. The second gene from each end is identical to the other (SCO0002 and SCO7845), as are the HA-containing genes third from each end (SCO0003 and SCO7844). SCO0002 and SCO7845 have an N-terminal DEAH/D helicase domain and 4 C-terminal HA repeats; SCO0003 and SCO7844 have 6 C-terminal HA repeats and an N-terminal region of unknown function, though it may contain a helix-turn-helix DNA-binding motif (score = 3.12, ~50% probability as predicted at <url>http://npsa-pbil.ibcp.fr/cgi-bin/primanal_hth.pl</url>). One more gene encoding a single HA domain is found more centrally on the chromosome (SCO0034).</p>
                  <fig id="F2">
                     <title>
                        <p>Figure 2</p>
                     </title>
                     <caption>
                        <p>HA domain alignment</p>
                     </caption>
                     <text>
                        <p>HA domain alignment. All proteins are from <it>S. coelicolor </it>except Q9PK68 (<it>Chlamydia muridarum</it>), Q9L8V8 (<it>S. lividans</it>), Q98RX4 (<it>Guillardia theta</it>). The line marked HA_SS is the predicted secondary structure of the HA domain. The line marked HtrF is the secondary structure of the HtrF protein (PDB: 1BA5).</p>
                     </text>
                     <graphic file="1471-2180-3-3-2"/>
                  </fig>
                  <p>Specific complexes are required for maintaining the ends of the linear streptomycete chromosomes, and the appearance of the genes encoding these domains specifically at the ends suggests that the proteins may be involved in forming these complexes. This is further supported by the observation <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> that similar helicases appeared at the ends of several of the streptomycete chromosomes investigated, as well as of the linear plasmids. A knockout experiment carried out by the same authors was inconclusive; chromosome linearity was maintained, but the region of protein substituted lay between the helicase domain and the HA domains, so it is possible that the helicases still retained functionality. The identification of an HA-containing helicase (SCP1.136) in the SCP1 plasmid, which is also linear and has the same type of telomere, lends further support to this hypothesis.</p>
                  <p>There are no clear conserved catalytic residues in the alignment, suggesting that these domains have a binding function. The secondary structure prediction of the HA domain as a three-helical bundle is also suggestive of the Myb-like domain &#8211; a general DNA-binding domain. Aligning the sequence of the DNA-binding domain of hTRF1 (a human telomeric protein) against the Pfam SEED alignment with T-Coffee showed interesting similarities between them. Two of the three key tryptophan residues in the Myb-like DNA-binding domain align to tryptophan residues in HA; in the place of the third is a leucine, which is a structurally conservative replacement. The first helix appears to align well; however, the second is longer in HA whereas the third is shorter. As to whether there is a true evolutionary or functional relationship between the HA domain and the Myb-like domain, the evidence is not conclusive, but the number of similarities is at least striking. Eukaryotic and streptomycete telomeres are significantly different in structure, but the Myb-like domain may provide a plausible structural model for determining if and how the HA domains interact with DNA.</p>
               </sec>
               <sec>
                  <st>
                     <p>BTAD (Bacterial transcriptional activator domain; PF03704)</p>
                  </st>
                  <p>The following family was an interesting case, and has been previously mentioned as an uncharacterized domain <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Although a repeat was detected with an E-value of 4.73 &#215; 10<sup>-4 </sup>using Prospero on the masked sequence, the validity of the repeat could not be verified by other means. However, the amino-terminal region was related to a number of other bacterial proteins and was investigated further; see Figure <figr fid="F3">3</figr> for the alignment. The BTAD domain is found in a small set of bacterial regulatory proteins that occur in the streptomycetes and the closely related Mycobacteria, though one is also found in <it>Rhizobium loti </it>(MLR2443/Q98IE9). One of the proteins it is found in &#8211; AfsR &#8211; is a global secondary metabolite regulator of <it>S. coelicolor </it><abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. This protein has two basic functions &#8211; binding DNA and recruiting RNA polymerase. The first of these is carried out by the OmpR-like DNA-binding domain (PF00486), whereas the second is carried out by the region C-terminal to the BTAD domain. This region includes the ATP-binding NB-ARC domain (PF00931) and three TPR repeats (PF00515). AfsR's DNA-binding activity is modulated by serine/threonine phosphorylation <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>; however, there are no conserved serines or threonines in the BTAD domain, so the phosphorylated residues are likely to occur in the DNA-binding domain. A mutation analysis of DnrI <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> suggests that the BTAD domain is essential to its function. A possible explanation is that it mediates oligomerisation with other transcription complex proteins, or even mediates interactions between DnrI monomers that are binding tandem repeats in a promoter region. There are eleven pathway-specific regulatory proteins in <it>S. coelicolor </it>that contain this domain, including DnrI and RedD, five of which are found in antibiotic synthesis clusters. It is possible that the BTAD domain mediates interactions between the global regulator AfsR and the downstream pathway-specific regulators.</p>
                  <fig id="F3">
                     <title>
                        <p>Figure 3</p>
                     </title>
                     <caption>
                        <p>BTAD domain alignment</p>
                     </caption>
                     <text>
                        <p>BTAD domain alignment. The predicted secondary structure is shown on the line BTAD_SS.</p>
                     </text>
                     <graphic file="1471-2180-3-3-3"/>
                  </fig>
               </sec>
               <sec>
                  <st>
                     <p>ALF (Alanine-Leucine-rich conserved (F)phenylalanine; PF03752)</p>
                  </st>
                  <p>This family occurs as two sets of four forty-five residue tandem repeats in three <it>S. coelicolor </it>proteins. The repeats have a predicted secondary structure of three &#945;-helices (see Figures <figr fid="F4">4</figr> &amp; <figr fid="F5">5</figr>). The unusual architecture of these proteins is of note. C-terminal to each set of repeats is a low-complexity or coiled-coil region. For all three proteins InterProScan <url>http://www.ebi.ac.uk/interpro/scan.html</url> finds a chemotaxis sensory transducer region (IPR:004089; PS50111) between the two ALF-repeat regions. However, searching these regions with HMMER 2.2 against SWISS-PROT and TrEMBL found no significant homology to other chemotaxis proteins; similarly, PSI-BLAST at the NCBI found several false positives (data not shown), but no chemotaxis signal transduction proteins. The sequence in this stretch is very alanine rich, and so could lead to high-scoring matches on the basis of the apparent conservation of the alanines despite a lack of conservation in other positions. It therefore seems likely that the apparent homology is spurious. One of the proteins, SCP1.201 (Swiss: Q9ACV2), also contains an intein (N-terminus: SM00306, IPR003587; C-terminus: PS50818, IPR002203) at its C-terminus, which is the first identified in <it>S. coelicolor</it>.</p>
                  <fig id="F4">
                     <title>
                        <p>Figure 4</p>
                     </title>
                     <caption>
                        <p>ALF repeat alignment</p>
                     </caption>
                     <text>
                        <p>ALF repeat alignment. Predicted secondary structure is shown on the line ALF_SS.</p>
                     </text>
                     <graphic file="1471-2180-3-3-4"/>
                  </fig>
                  <fig id="F5">
                     <title>
                        <p>Figure 5</p>
                     </title>
                     <caption>
                        <p>Domain architectures of the ALF-containing proteins</p>
                     </caption>
                     <text>
                        <p>Domain architectures of the ALF-containing proteins. ALF repeats are represented by the blue ovals; the coiled-coil/low complexity regions are signified by the green boxes; Intein N and C-terminal domains are indicated by the yellow ovals.</p>
                     </text>
                     <graphic file="1471-2180-3-3-5"/>
                  </fig>
                  <p>Two of the proteins, SCO6198 (Swiss: Q9Z5A4) and SCO6593 (Swiss: O87848), are located on the chromosome adjacent or close to secreted esterases (SCO6199 and SCO6590) and several other probable secreted proteins of unknown function (SCO6197; SCO6592, SCO6591, SCO6594). SCP1.201 is located on the SCP1 plasmid. Again this gene is located near a secreted esterase (SCP1.199) and a secreted protein of unknown function (SCP1.200). Homology searches showed that SCO6197, SCO6591 and SCP1.200 are all homologues, though no other homologues were found. No relationships were found for SCO6592, while SCO6594 was found to be homologous to the C-terminal portion of SCO0545. SCO0545 does not have a known function but there are several catabolic enzymes in the same region.</p>
                  <p>Given the conservation of the associated genes, it seems likely that they represent a conserved pathway and that the ALF regions act as a substrate- or product-recognition domain that passes a signal to or from the secreted esterases. The intein does not contain the homing endonuclease, and so is probably no longer an active mobile genetic element; this concurs with the apparent lack of other inteins in the <it>S. coelicolor </it>genome. This implies that the plasmid has passed through another species that has mobile intein elements.</p>
               </sec>
               <sec>
                  <st>
                     <p>SPDY (Serine-Proline-Aspartate-Tyrosine motif; PF03771)</p>
                  </st>
                  <p>This domain typically occurs in pairs, is approximately 90 residues in length and has two conserved tryptophans and a proline (see Figure <figr fid="F6">6</figr>). It is only found in a region of the <it>S. coelicolor </it>genome that is believed to be an integrated genetic element, e.g. a plasmid or transposon <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The region appears to consist of two sections: a 'core' mobile element region with the essential replication genes, and a flanking region containing a polyketide synthase and arsenic resistance genes. This element may therefore be important in mobilising these loci between strains. All of the SPDY domains occur in the core region, indicating that they are important in the replication of the element &#8211; though it is not possible to assign them a precise role. The lack of occurrences of this domain in any other known proteins indicates that this region of the genome represents a previously undescribed type of mobile genetic element.</p>
                  <fig id="F6">
                     <title>
                        <p>Figure 6</p>
                     </title>
                     <caption>
                        <p>SPDY domain alignment</p>
                     </caption>
                     <text>
                        <p>SPDY domain alignment. Predicted secondary structure is shown on the line SPDY_SS.</p>
                     </text>
                     <graphic file="1471-2180-3-3-6"/>
                  </fig>
               </sec>
               <sec>
                  <st>
                     <p>PASTA (Pbp And Serine/Threonine kinase Associated; PF03793)</p>
                  </st>
                  <p>The PASTA domain is discussed in greater detail in <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. It is a small (~70 residues) globular domain that binds cell wall peptidoglycan. With regard to <it>S. coelicolor</it>'s genome it shows an unusual distribution. Typically, organisms that have PASTA domains have one PASTA-containing serine/threonine protein kinase (pPSTK), which is, putatively, the master regulator of cell wall peptidoglycan cross-linking and essential to growth and development, and one PASTA-containing penicillin-binding protein (pPBP), which is the primary cross-linking enzyme. For a type example see <it>Streptococcus pneumoniae</it>. However, uniquely amongst the sequenced microbial genomes, <it>S. coelicolor </it>has three pPSTKs and no pPBP. The PASTA domains within each pPSTK show very little identity to each other. The simplest explanation is that each pPSTK regulates a different stage of growth and division, each of which uses different peptidoglycans. This is also consistent with the absence of a pPBP, as such a protein would be specific to a single peptidoglycan structure; so we propose that <it>S. coelicolor </it>uses an alternative localisation system, perhaps similar to that used by <it>Deinococcus radiodurans </it>or Gram-negative bacteria. Intriguingly, <it>S. coelicolor </it>has three principal cell morphologies, and it may be that each pPSTK regulates the development of one type.</p>
               </sec>
               <sec>
                  <st>
                     <p>HHE (Histidine-Histidine-Glutamate motif; PF03794)</p>
                  </st>
                  <p>This domain normally occurs as tandem repeats, is approximately 70 residues in length, and is predicted to be composed of two &#945;-helices (see Figure <figr fid="F7">7</figr>). It is mostly found in prokaryotes, though four <it>Arabidopsis </it>proteins with multiple HHE repeats and a <it>Schizosaccharomyces pombe </it>protein were also identified. Typically an HHE-containing protein consists of two HHE domains only, though there are exceptions such as the <it>Arabidopsis </it>proteins (e.g. Q9LJQ1). There are two conserved histidines, both in the middle of predicted helices, and a conserved glutamate. It shows a slightly disparate phylogenetic distribution, but is found in eubacteria, archaea, fungi and plants. In several cases it appears to be involved in the NO response &#8211; for instance DnrN from <it>Pseudomonas stutzeri </it><abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Deletion of <it>dnrN </it>leads to a slower response of the <it>nirSTB </it>operon to nitrite, so it may be involved in regulation or signal recognition. However, in <it>Ralstonia eutropha </it>deletion of the HHE-containing genes <it>norA1 </it>and <it>norA2</it>, although they are co-transcribed with the NO reductase-encoding <it>norB1 </it>and <it>norB2</it>, does not appear to affect growth or the ability to cope with NO stress <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. It is also found in the ScdA protein of <it>Staphylococcus aureus</it>, which has been implicated in growth, development, and peptidoglycan cross-linking <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. The two conserved histidines and the glutamate are suggestive of a cation-binding site, such as the Zn<sup>2+ </sup>binding site in carboxypeptidase A. This hypothesis is supported by its occurrence in the putative cation-transporting ATPase SCO0164 (Swiss:Q9RJ01), where it might sequester cations for transport.</p>
                  <fig id="F7">
                     <title>
                        <p>Figure 7</p>
                     </title>
                     <caption>
                        <p>HHE domain alignment</p>
                     </caption>
                     <text>
                        <p>HHE domain alignment. The predicted secondary structure is shown in the line marked HHE_SS. The conserved histidines and glutamate are indicated with purple arrows.</p>
                     </text>
                     <graphic file="1471-2180-3-3-7"/>
                  </fig>
               </sec>
               <sec>
                  <st>
                     <p>PPC (Bacterial Pre-peptidase C-terminal domain; PF04151)</p>
                  </st>
                  <p>These domains are typically ninety residues in length and found at the C-termini of secreted peptidases (see Figure <figr fid="F8">8</figr>). Surprisingly, these domains are found in at least four different classes of peptidases. The PPC domain is found in some members of metallopeptidase families M4, M9 and M28 as well as the serine peptidase family S8 <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. The PPC domains are cleaved off subsequent to secretion, but prior to activation of the peptidase. Their actual function is not clear, but they may aid secretion/localisation or inhibit the peptidase until needed. Visual inspection of the alignment, as well as predicted similarities in the secondary structure, suggests that the PPC domain may be related to the PKD domain (PF00801), but no significant homology was detected using computational methods. The two domains are often found in the same protein and in very similar contexts, and it is tempting to suggest that they are functionally interchangeable (see Figure <figr fid="F9">9</figr> for example domain architectures). PKD domains are thought to be involved in protein-protein interactions. Unlike the PKD domain, the PPC domain is only found in bacteria and archaea, and not in eukaryotes.</p>
                  <fig id="F8">
                     <title>
                        <p>Figure 8</p>
                     </title>
                     <caption>
                        <p>PPC domain alignment</p>
                     </caption>
                     <text>
                        <p>PPC domain alignment. The predicted secondary structure is shown in the line marked PPC_SS.</p>
                     </text>
                     <graphic file="1471-2180-3-3-8"/>
                  </fig>
                  <fig id="F9">
                     <title>
                        <p>Figure 9</p>
                     </title>
                     <caption>
                        <p>Example domain architectures of PPC-containing proteins</p>
                     </caption>
                     <text>
                        <p>Example domain architectures of PPC-containing proteins. Example collagenase precursors Q9X4F8 (<it>Vibrio cholerae</it>), Q9X721 (<it>Clostridium histolyticum</it>), Q46085 (<it>Clostridium histolyticum</it>) and O54108 (<it>S. coelicolor</it>; SCO5912) demonstrate the apparent interchangeability of PPC and PKD domains. Q9LCJ5 (Protease precursor; <it>Aeromonas punctata</it>) represents a common protease architecture. Q59208 (esterase; <it>Bacillus licheniformis</it>) is an example of the PPC domain occurring at the N-terminus rather than the C-terminus. Domain names shown are Pfam identifiers.</p>
                     </text>
                     <graphic file="1471-2180-3-3-9"/>
                  </fig>
               </sec>
               <sec>
                  <st>
                     <p>FMN_bind (Flavin MonoNucleotide-binding; PF04205)</p>
                  </st>
                  <p>This domain represents a sixty residue region that includes an FMN-binding site (indicated in alignment, Figure <figr fid="F10">10</figr>), as determined in the NqrC proteins of <it>Vibrio cholerae </it><abbrgrp><abbr bid="B38">38</abbr></abbrgrp> and <it>Vibrio alginolyticus </it><abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. Interestingly the NqrB proteins, which also bind FMN through a threonine residue and are part of the same complex, do not show any homology. The region is found in several electron transport chain proteins; for example the RnfG electron transport protein, part of a chain that supplies electrons to both nitrogen fixation and DNP reduction in <it>Rhodobacter capsulatus </it><abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. Other examples include the NosR/NirI nitrous oxide reduction regulatory proteins. FMN_bind-containing proteins appear to split into two groups, which relate length to function. The shorter proteins, typically 200&#8211;350 residues, are components of electron transport chains whereas the longer proteins, typically 680&#8211;800 residues, have a regulatory function. The regulatory proteins typically have five transmembrane helices in the C-terminal half of the protein. Members of both groups often have 4Fe-4S domains present, suggesting that the regulatory mechanisms also involve charge movement.</p>
                  <fig id="F10">
                     <title>
                        <p>Figure 10</p>
                     </title>
                     <caption>
                        <p>FMN_bind domain alignment</p>
                     </caption>
                     <text>
                        <p>FMN_bind domain alignment. The FMN-binding residue is indicated by the green arrow. The predicted secondary structure is shown in the line marked FMN_bind_SS.</p>
                     </text>
                     <graphic file="1471-2180-3-3-10"/>
                  </fig>
               </sec>
               <sec>
                  <st>
                     <p>MbtH (MbtH-like proteins; PF03621)</p>
                  </st>
                  <p>This domain is named after the MbtH protein from <it>Mycobacterium tuberculosis </it>(Swiss:O05821). The domain is typically 70 residues in length and covers the full length of the protein, though NikP1 from <it>Streptomyces tendae </it>(Swiss:Q9F2E7) also contains two domains common to antibiotic synthesis proteins: an AMP-binding domain (PF00501) and a phosphopantetheine attachment site domain (PF00550). It is found in the Actinomycetes, the gamma subdivision of the Proteobacteria and <it>Rhizobium leguminosarum</it>. Several of these proteins have been implicated in antibiotic biosynthesis in streptomycetes (for instance nikkomycins: <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>; simocyclinone: <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>; coumermycin A1: <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>), and in the formation of siderophores such as <it>E. coli</it>'s enterobactin or <it>M. tuberculosis</it>'s mycobactin (reviewed in <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>). They do not seem to have a direct role in siderophore biosynthesis, as a complete synthetic pathway for mycobactin can be built up without assigning a role to MbtH (and similarly for enterobactin and the MbtH-like YbdZ); it is therefore likely that MbtH is involved in either regulation of expression or transport of the siderophores out of the cell, with a similar role in antibiotic synthesis. There are several conserved residues, including three tryptophans that may have functional importance (see alignment in Figure <figr fid="F11">11</figr>).</p>
                  <fig id="F11">
                     <title>
                        <p>Figure 11</p>
                     </title>
                     <caption>
                        <p>MbtH domain alignment</p>
                     </caption>
                     <text>
                        <p>MbtH domain alignment. Conserved tryptophans are marked with purple arrows. Predicted secondary structure is shown on the line MbtH_SS.</p>
                     </text>
                     <graphic file="1471-2180-3-3-11"/>
                  </fig>
               </sec>
            </sec>
            <sec>
               <st>
                  <p>Extended Families</p>
               </st>
               <sec>
                  <st>
                     <p>SCP (PF00188)</p>
                  </st>
                  <p>This domain family has previously only been reported in eukaryotes, but in fact it contains a diverged sub-group that occurs in eubacteria as well. An alignment of the eukaryotic and prokaryotic versions shows that the principal difference is the absence in bacteria of the conserved cysteine residues, which form disulphide bridges, whereas the proposed active site <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> (see Figure <figr fid="F12">12</figr>) is mostly conserved. To try to determine its function in bacteria, a review of the information available for the eukaryotic domains was carried out.</p>
                  <fig id="F12">
                     <title>
                        <p>Figure 12</p>
                     </title>
                     <caption>
                        <p>SCP domain alignment</p>
                     </caption>
                     <text>
                        <p>SCP domain alignment. All sequences shown are prokaryotic except GLIP_HUMAN and VA5_VESVU. The predicted active site residues, based on analysis of the eukaryotic domain <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, are marked by green or purple arrows. These residues are also almost fully conserved in the prokaryotic sequences except one which falls into an insert region not in the prokaryotic domains, which is marked by the green arrow. The secondary structure of the eukaryotic domain is shown on the line VA5_VESVU_SS.</p>
                     </text>
                     <graphic file="1471-2180-3-3-12"/>
                  </fig>
                  <p>So far all SCP-containing proteins appear to be secreted, and this is backed up by the consistent prediction of signal peptides at the N-terminus. There is currently very little direct evidence of their general function; however, many examples have been found to be involved in signaling. For instance they are involved in several mammalian developmental processes, most notably sperm maturation <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> and sperm-egg fusion <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, and are up-regulated in several tumors (<abbrgrp><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr></abbrgrp>). Clear evidence has been found, in <it>Xenopus</it>, of sperm following the concentration gradient of 'Allurin' &#8211; an SCP-containing protein <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. They are also commonly used by insects and reptiles as mammalian toxins.</p>
                  <p>However, these proteins are very large for direct signaling molecules, typically being 200 or 400 residues long (1 or 2 SCP domains). It has been suggested that there is an active site, based on analysis of the 3D NMR structure of plant PR14a and comparison with human GliPR <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, and three of the four residues predicted to make up the site are conserved between the eukaryotic and prokaryotic subfamilies (see Figure <figr fid="F12">12</figr>). This would imply that the domain generates a smaller signaling molecule. However, no evidence has been found of such a molecule and several pieces of evidence conflict with this hypothesis. Firstly, the nematode SCP-containing Neutrophil Inhibitory Factor (NIF) binds directly to integrins CD11b/CD18 on the neutrophil cell surface <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. Secondly, pseudechetoxin (from the King brown snake) appears to bind the extracellular portion of cyclic-nucleotide gated ion channels (CNG channels), blocking their function <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. In the second case there does appear to be a time-lag between association or dissociation and blocking or release of the gate, which suggests that its mode of action is not simply as a steric block.</p>
                  <p>SCP-containing proteins are involved in a tremendously wide range of processes, and have been found to be essential in plants (PR1-like proteins), mammals, lizards, insects (venom allergens) and nematodes. It appears likely that they are similarly important and similarly multifunctional in bacteria, and hence are an important target for further analysis.</p>
               </sec>
               <sec>
                  <st>
                     <p>FG-GAP (PF01839)</p>
                  </st>
                  <p>Several <it>S. coelicolor </it>proteins were identified as being related to FG-GAP repeats. The Pfam family from version 7.4 contained only five bacterial members; the updated family in Pfam 7.5 is found in thirty-nine bacterial proteins &#8211; including fourteen in <it>S. coelicolor </it>(see Figure <figr fid="F13">13</figr> for distribution of FG-GAP repeats in bacteria). A further thirty-four eukaryotic family members are also identified, as well as an archaeal protein (Swiss:O28333). The FG-GAP repeats have been predicted to assume a &#946;-propeller conformation. The occurrence of this repeat as sets of four or five tandem copies (e.g. Swiss:ITA2_DROME) casts doubt on this, as &#946;-propellers are normally six- or seven-bladed <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. However, the hemopexin repeat (PF00045) has been seen as a four-bladed propeller, e.g. in mammalian blood serum hemopexin glycosylated-native protein (PDB:1qjs), so FG-GAP repeats may be more structurally similar to these repeats.</p>
                  <fig id="F13">
                     <title>
                        <p>Figure 13</p>
                     </title>
                     <caption>
                        <p>Species tree showing the distribution of FG-GAP proteins in bacteria according to Pfam 7.5</p>
                     </caption>
                     <text>
                        <p>Species tree showing the distribution of FG-GAP proteins in bacteria according to Pfam 7.5. The broad distribution indicates that more thorough searching may find them to be ubiquitous.</p>
                     </text>
                     <graphic file="1471-2180-3-3-13"/>
                  </fig>
               </sec>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>The primary purpose of this research was to identify novel protein domains for which information could be easily derived, and that were of biological significance to <it>Streptomyces coelicolor</it>. To manually investigate every single protein is an immensely time-consuming enterprise, and it would not be possible to add significant annotation to many of the families built. However fully automatic methods of family building lack precision, and the automated production of detailed annotation is currently not feasible. Hence we employed a combination in an LHF ("low-hanging fruit") process in order to concentrate on potentially the most interesting observations.</p>
         <p>To underline the speed of this approach, there are 204 copies of the novel domains listed in Table <tblr tid="T1">1</tblr> in <it>S. coelicolor</it>, not including the SCP and FG-GAP families. In order to discover this many domains in <it>S. coelicolor </it>it was only necessary to investigate 145 potential families, most of which could be discarded quickly. The primary reason for discarding them was that no matches were found to other proteins. This suggests that, once a sufficient number of genomes have been sequenced, comparative scans like this one will be very useful. The BTAD domain is the only domain that was not derived directly from a target; rather, the region was highlighted during the investigation.</p>
         <p>Examples such as the PASTA domain also demonstrate that reasonably large gains in biological knowledge can be made through the delineation of the domain structures of these proteins and the taxonomic distribution of the domains. Similarly, a strong functional link can be made between SCO0002 and SCO0003 due to the occurrence of HA domains in the C-termini of both. We hypothesise that the HA domains bind DNA, most likely telomere-specific structures, based on secondary structural similarities to the Myb-like DNA-binding (PF00249) domain. Previously such a hypothesis could be based solely on their close proximity within the telomeres of the chromosome.</p>
         <p>Not all the predictions made led to the identification of novel domains; some led instead to the expansion of known domain families. Most of these are not reported as they do not particularly enhance our understanding of the domains or <it>S. coelicolor</it>; however, the extension of the SCP domain into prokaryotes does appear to be significant. The substantial differences in sequence conservation suggest that the prokaryotic versions are not simply the product of lateral transfers, but are of ancient origin. The lack of conservation of the cysteines, after which the domain was originally named, suggests that they are not functionally important but are involved in stabilizing the protein over the greater distances involved in eukaryotic signaling. In contrast, the conservation of three of the four proposed active site residues confirms that these are the functionally significant residues. The apparent importance of SCPs in eukaryotes suggests that these domains will prove to be similarly important in bacteria.</p>
         <p>It is important to recognize when basing future work on bioinformatic studies such as this one, that the results are sets of hypotheses rather than true descriptions. This does not detract from the success of such approaches. Previously a researcher investigating an HHE-containing protein would have known little about it apart from the sequence; now three strong candidates for the functional or active site residues are clear and a putative function (cation-binding) assigned that can be tested. Also once one member of a family is described information can be transferred to its relations. This is enhanced by the deposition of the families into Pfam; any further investigations into the streptomycetes using Pfam will automatically annotate these domains, increasing the knowledge and understanding of these remarkable organisms.</p>
         <sec>
            <st>
               <p>Supplementary Information</p>
            </st>
            <p><ul>S1</ul>: Architecture diagrams for all HA, BTAD, SPDY, PASTA and HHE domain-containing proteins in <it>S. coelicolor</it>, as well as other proteins referred to in the text. Architectures are based on data from Pfam and SMART. Domain names are as given in Pfam and SMART. Small orange boxes at the N-termini indicate signal peptide sequences. TM indicates transmembrane regions.</p>
            <p><ul>S2</ul>: Architecture diagrams for all PPC, FMN_bind and MbtH domain-containing proteins in <it>S. coelicolor</it>, as well as other proteins referred to in the text. Architectures are based on data from Pfam and SMART. Small orange boxes at the N-termini indicate signal peptide sequences. TM indicates transmembrane regions.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' Contributions</p>
         </st>
         <p>CY was involved in all aspects of the work. SB was involved in annotation of the families, providing <it>S. coelicolor</it>-specific information and supervising writing of the manuscript. AB designed the search methodologies and provided advice on family annotation as well as supervising writing of the manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Correction</p>
         </st>
         <p>Subsequent to submission of this manuscript it came to the authors' attention that SCP domains have previously been described in prokaryotes <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>; so this section should be considered as a formal report of the prokaryotic version rather than an initial observation.</p>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>Pfam accession numbers are indicated by '(PF#####)', where # represents a numeral.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgments</p>
            </st>
            <p>We thank the Pfam group for help with software use and interaction with the Pfam database, and Robert Finn for his previous work on the PASTA domain.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2).</p>
            </title>
            <aug>
               <au>
                  <snm>Bentley</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Chater</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Cerdeno-Tarraga</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Challis</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Thomson</snm>
                  <fnm>NR</fnm>
               </au>
               <au>
                  <snm>James</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Quail</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Kieser</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Harper</snm>
                  <fnm>D</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>417</volume>
            <fpage>141</fpage>
            <lpage>147</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/417141a</pubid>
                  <pubid idtype="pmpid" link="fulltext">12000953</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Searching databases to find protein domain organization.</p>
            </title>
            <aug>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Advances in Protein Chemistry, Vol 54</source>
            <pubdate>2000</pubdate>
            <volume>54</volume>
            <fpage>137</fpage>
            <lpage>157</lpage>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Structural Principles for the Propeller Assembly of Beta-Sheets &#8211; the Preference for 7-Fold Symmetry.</p>
            </title>
            <aug>
               <au>
                  <snm>Murzin</snm>
                  <fnm>AG</fnm>
               </au>
            </aug>
            <source>Proteins-Structure Function and Genetics</source>
            <pubdate>1992</pubdate>
            <volume>14</volume>
            <fpage>191</fpage>
            <lpage>201</lpage>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Rapid and Sensitive Sequence Comparison with Fastp and Fasta.</p>
            </title>
            <aug>
               <au>
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
            </aug>
            <source>Methods in Enzymology</source>
            <pubdate>1990</pubdate>
            <volume>183</volume>
            <fpage>63</fpage>
            <lpage>98</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2156132</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii.</p>
            </title>
            <aug>
               <au>
                  <snm>Bult</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Olsen</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>LX</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>FitzGerald</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>1996</pubdate>
            <volume>273</volume>
            <fpage>1058</fpage>
            <lpage>1073</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8688087</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption.</p>
            </title>
            <aug>
               <au>
                  <snm>Galperin</snm>
                  <fnm>MY</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>In Silico Biology</source>
            <pubdate>1998</pubdate>
            <volume>1</volume>
            <fpage>55</fpage>
            <lpage>67</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11471243</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Comparative assessment of large-scale data sets of protein-protein interactions.</p>
            </title>
            <aug>
               <au>
                  <snm>von Mering</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Krause</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Cornell</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Oliver</snm>
                  <fnm>SG</fnm>
               </au>
               <au>
                  <snm>Fields</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>417</volume>
            <fpage>399</fpage>
            <lpage>403</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature750</pubid>
                  <pubid idtype="pmpid" link="fulltext">12000970</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The natural history of protein domains.</p>
            </title>
            <aug>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Russell</snm>
                  <fnm>RR</fnm>
               </au>
            </aug>
            <source>Annual Review of Biophysics and Biomolecular Structure</source>
            <pubdate>2002</pubdate>
            <volume>31</volume>
            <fpage>45</fpage>
            <lpage>71</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.biophys.31.082901.134314</pubid>
                  <pubid idtype="pmpid" link="fulltext">11988462</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Modular Arrangement of Proteins as Inferred from Analysis of Homology.</p>
            </title>
            <aug>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>ELL</fnm>
               </au>
               <au>
                  <snm>Kahn</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Protein Science</source>
            <pubdate>1994</pubdate>
            <volume>3</volume>
            <fpage>482</fpage>
            <lpage>492</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8019419</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>The Pfam Protein Families Database.</p>
            </title>
            <aug>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cerruti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Etwiller</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Howe</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>ELL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>276</fpage>
            <lpage>280</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99071</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752314</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.276</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Novel protein domains and repeats in Drosophila melanogaster: Insights into structure, function, and evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Mott</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Copley</snm>
                  <fnm>RR</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>1996</fpage>
            <lpage>2008</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.198701</pubid>
                  <pubid idtype="pmpid" link="fulltext">11731489</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Statistics of Local Complexity in Amino-Acid-Sequences and Sequence Databases.</p>
            </title>
            <aug>
               <au>
                  <snm>Wootton</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Federhen</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Computers &amp; Chemistry</source>
            <pubdate>1993</pubdate>
            <volume>17</volume>
            <fpage>149</fpage>
            <lpage>163</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0097-8485(93)85006-X</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <aug>
               <au>
                  <snm>Mott</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <pubdate>2000</pubdate>
            <url>http://www.well.ox.ac.uk/rmott/ARIADNE/prospero.shtml</url>
         </bibl>
         <bibl id="B15">
            <title>
               <p>HMMER: Profile hidden Markov models for biological sequence analysis.</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B16">
            <title>
               <p>The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000.</p>
            </title>
            <aug>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>45</fpage>
            <lpage>48</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102476</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592178</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.45</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>T-Coffee: A novel method for fast and accurate multiple sequence alignment.</p>
            </title>
            <aug>
               <au>
                  <snm>Notredame</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Heringa</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Journal of Molecular Biology</source>
            <pubdate>2000</pubdate>
            <volume>302</volume>
            <fpage>205</fpage>
            <lpage>217</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4042</pubid>
                  <pubid idtype="pmpid" link="fulltext">10964570</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Recent improvements to the SMART domain-based sequence annotation resource.</p>
            </title>
            <aug>
               <au>
                  <snm>Letunic</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Goodstadt</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Dickens</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Doerks</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mott</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ciccarelli</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Copley</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>242</fpage>
            <lpage>244</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99073</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752305</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.242</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>The PROSITE database, its status in 2002.</p>
            </title>
            <aug>
               <au>
                  <snm>Falquet</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Pagni</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hulo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sigrist</snm>
                  <fnm>CJA</fnm>
               </au>
               <au>
                  <snm>Hofmann</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>235</fpage>
            <lpage>238</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99105</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752303</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.235</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The InterPro database, an integrated documentation resource for protein families, domains and functional sites.</p>
            </title>
            <aug>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Attwood</snm>
                  <fnm>TK</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Biswas</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Cerutti</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Corpet</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Croning</snm>
                  <fnm>MDR</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>37</fpage>
            <lpage>40</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29841</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125043</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.37</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Clustal-W &#8211; Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice.</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7984417</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.</p>
            </title>
            <aug>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Engelbrecht</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Protein Engineering</source>
            <pubdate>1997</pubdate>
            <volume>10</volume>
            <fpage>1</fpage>
            <lpage>6</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/protein/10.1.1</pubid>
                  <pubid idtype="pmpid" link="fulltext">9051728</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Larsson</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>ELL</fnm>
               </au>
            </aug>
            <source>Journal of Molecular Biology</source>
            <pubdate>2001</pubdate>
            <volume>305</volume>
            <fpage>567</fpage>
            <lpage>580</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4315</pubid>
                  <pubid idtype="pmpid" link="fulltext">11152613</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Predicting Coiled Coils from Protein Sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Lupas</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Van Dyke</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stock</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1991</pubdate>
            <volume>252</volume>
            <fpage>1162</fpage>
            <lpage>1164</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2031185</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>PHD: Predicting one-dimensional protein structure by profile-based neural networks.</p>
            </title>
            <aug>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>In: Computer Methods for Macromolecular Sequence Analysis</source>
            <pubdate>1996</pubdate>
            <volume>266</volume>
            <fpage>525</fpage>
            <lpage>539</lpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Artemis: sequence visualization and annotation.</p>
            </title>
            <aug>
               <au>
                  <snm>Rutherford</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Crook</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Horsnell</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rajandream</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>944</fpage>
            <lpage>945</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.10.944</pubid>
                  <pubid idtype="pmpid" link="fulltext">11120685</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39.</p>
            </title>
            <aug>
               <au>
                  <snm>Read</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Brunham</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gill</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Hickey</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Utterback</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Berry</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>1397</fpage>
            <lpage>1406</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">111046</pubid>
                  <pubid idtype="pmpid" link="fulltext">10684935</pubid>
                  <pubid idtype="doi">10.1093/nar/28.6.1397</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The homologous terminal sequence of the Streptomyces lividans chromosome and SLP2 plasmid.</p>
            </title>
            <aug>
               <au>
                  <snm>Bey</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Tsou</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>Microbiology</source>
            <pubdate>2000</pubdate>
            <volume>146</volume>
            <fpage>911</fpage>
            <lpage>922</lpage>
         </bibl>
         <bibl id="B29">
            <title>
               <p>The domains of death: evolution of the apoptosis machinery.</p>
            </title>
            <aug>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Dixit</snm>
                  <fnm>VM</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Trends in Biochemical Sciences</source>
            <pubdate>1999</pubdate>
            <volume>24</volume>
            <fpage>47</fpage>
            <lpage>53</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(98)01341-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">10098397</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>afsR is a pleiotropic but conditionally required regulatory gene for antibiotic production in Streptomyces coelicolor A3(2).</p>
            </title>
            <aug>
               <au>
                  <snm>Floriano</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bibb</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Molecular Microbiology</source>
            <pubdate>1996</pubdate>
            <volume>21</volume>
            <fpage>385</fpage>
            <lpage>396</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.1996.6491364.x</pubid>
                  <pubid idtype="pmpid">8858592</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Protein serine/threonine kinases in signal transduction for secondary metabolism and morphogenesis in Streptomyces.</p>
            </title>
            <aug>
               <au>
                  <snm>Umeyama</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Horinouchi</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Applied Microbiology and Biotechnology</source>
            <pubdate>2002</pubdate>
            <volume>59</volume>
            <fpage>419</fpage>
            <lpage>425</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00253-002-1045-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">12172604</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Mapping the DNA-binding domain and target sequences of the <it>Streptomyces peucetius </it>daunorubicin biosynthesis regulatory protein DnrI.</p>
            </title>
            <aug>
               <au>
                  <snm>Sheldon</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Busarow</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Hutchinson</snm>
                  <fnm>CR</fnm>
               </au>
            </aug>
            <source>Molecular Microbiology</source>
            <pubdate>2002</pubdate>
            <volume>44</volume>
            <fpage>449</fpage>
            <lpage>460</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.2002.02886.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">11972782</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The PASTA domain: a beta-lactam-binding domain.</p>
            </title>
            <aug>
               <au>
                  <snm>Yeats</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Finn</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Trends in Biochemical Sciences</source>
            <pubdate>2002</pubdate>
            <volume>27</volume>
            <fpage>438</fpage>
            <lpage>440</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0968-0004(02)02164-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">12217513</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Nitric oxide signalling and transcriptional control of denitrification genes in <it>Pseudomonas stutzeri</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Vollack</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Zumft</snm>
                  <fnm>WG</fnm>
               </au>
            </aug>
            <source>Journal of Bacteriology</source>
            <pubdate>2001</pubdate>
            <volume>183</volume>
            <fpage>2516</fpage>
            <lpage>2526</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">95168</pubid>
                  <pubid idtype="pmpid" link="fulltext">11274111</pubid>
                  <pubid idtype="doi">10.1128/JB.183.8.2516-2526.2001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>A novel NO-responding regulator controls the reduction of nitric oxide in <it>Ralstonia eutropha</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Pohlmann</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cramm</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schmelz</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Friedrich</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Molecular Microbiology</source>
            <pubdate>2000</pubdate>
            <volume>38</volume>
            <fpage>626</fpage>
            <lpage>638</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.2000.02157.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">11069685</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>The <it>Staphylococcus aureus </it>scdA gene: a novel locus that affects cell division and morphogenesis.</p>
            </title>
            <aug>
               <au>
                  <snm>Brunskill</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>de Jonge</snm>
                  <fnm>BLM</fnm>
               </au>
               <au>
                  <snm>Bayles</snm>
                  <fnm>KW</fnm>
               </au>
            </aug>
            <source>Microbiology</source>
            <pubdate>1997</pubdate>
            <volume>143</volume>
            <fpage>2877</fpage>
            <lpage>2882</lpage>
         </bibl>
         <bibl id="B37">
            <title>
               <p>MEROPS: the protease database.</p>
            </title>
            <aug>
               <au>
                  <snm>Rawlings</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>O'Brien</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Barrett</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>343</fpage>
            <lpage>346</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">99100</pubid>
                  <pubid idtype="pmpid" link="fulltext">11752332</pubid>
                  <pubid idtype="doi">10.1093/nar/30.1.343</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Expression and mutagenesis of the NqrC subunit of the NQR respiratory Na+ pump from Vibrio cholerae with covalently attached FMN.</p>
            </title>
            <aug>
               <au>
                  <snm>Barquera</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hase</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Gennis</snm>
                  <fnm>RB</fnm>
               </au>
            </aug>
            <source>FEBS Letters</source>
            <pubdate>2001</pubdate>
            <volume>492</volume>
            <fpage>45</fpage>
            <lpage>49</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0014-5793(01)02224-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">11248234</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>FMN is covalently attached to a threonine residue in the NqrB and NqrC subunits of Na(+)-translocating NADH-quinone reductase from <it>Vibrio alginolyticus</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Hayashi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nakayama</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yasui</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Maeda</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Furuishi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Unemoto</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>FEBS Letters</source>
            <pubdate>2001</pubdate>
            <volume>488</volume>
            <fpage>5</fpage>
            <lpage>8</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0014-5793(00)02404-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">11163785</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Overexpression in <it>Escherichia coli </it>of the rnf genes from <it>Rhodobacter capsulatus </it>&#8211; characterisation of two membrane-bound iron-sulfur proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Jouanneau</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Jeong</snm>
                  <fnm>H-S</fnm>
               </au>
               <au>
                  <snm>Hugo</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Willison</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>European Journal of Biochemistry</source>
            <pubdate>1998</pubdate>
            <volume>251</volume>
            <fpage>54</fpage>
            <lpage>64</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1432-1327.1998.2510054.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">9492268</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Molecular characterisation of co-transcribed genes from <it>Streptomyces tendae </it>Tu901 involved in the biosynthesis of the peptidyl moiety and assembly of the peptidyl nucleoside antibiotic nikkomycin.</p>
            </title>
            <aug>
               <au>
                  <snm>Lauer</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Russwurm</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schwarz</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Kalmanczhelyi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bruntner</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rosemeier</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bormann</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Molecular and General Genetics</source>
            <pubdate>2001</pubdate>
            <volume>264</volume>
            <fpage>662</fpage>
            <lpage>673</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s004380000352</pubid>
                  <pubid idtype="pmpid">11212921</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Cloning and analysis of the simocyclinone biosynthetic gene cluster of <it>Streptomyces antibioticus </it>Tu 6040.</p>
            </title>
            <aug>
               <au>
                  <snm>Galm</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Schimana</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fiedler</snm>
                  <fnm>HP</fnm>
               </au>
               <au>
                  <snm>Schmidt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Heide</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Archives of Microbiology</source>
            <pubdate>2002</pubdate>
            <volume>178</volume>
            <fpage>102</fpage>
            <lpage>114</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00203-002-0429-z</pubid>
                  <pubid idtype="pmpid" link="fulltext">12115055</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Identification of the coumermycin A1 biosynthetic gene cluster of <it>Streptomyces rishiriensis </it>DSM 40489.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>ZX</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Heide</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Antimicrobial Agents and Chemotherapy</source>
            <pubdate>2000</pubdate>
            <volume>44</volume>
            <fpage>3040</fpage>
            <lpage>3048</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">101600</pubid>
                  <pubid idtype="pmpid" link="fulltext">11036020</pubid>
                  <pubid idtype="doi">10.1128/AAC.44.11.3040-3048.2000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Genetics and assembly line enzymology of siderophore biosynthesis in bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Crosa</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Walsh</snm>
                  <fnm>CT</fnm>
               </au>
            </aug>
            <source>Microbiology and Molecular Biology Reviews</source>
            <pubdate>2002</pubdate>
            <volume>66</volume>
            <fpage>223</fpage>
            <lpage>249</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">120789</pubid>
                  <pubid idtype="pmpid" link="fulltext">12040125</pubid>
                  <pubid idtype="doi">10.1128/MMBR.66.2.223-249.2002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Structure comparison of a human glioma pathogenesis-related protein GliPR and the plant pathogenesis-related protein P14a indicates a functional link between the human immune system and a plant defense system.</p>
            </title>
            <aug>
               <au>
                  <snm>Szyperski</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fernandez</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mumenthaler</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wuthrich</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>2262</fpage>
            <lpage>2266</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.95.5.2262</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Expression pattern, subcellular localization and structure-function relationship of rat Tpx-1, a spermatogenic cell adhesion molecule responsible for association with Sertoli cells.</p>
            </title>
            <aug>
               <au>
                  <snm>Maeda</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nishida</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nakanishi</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Development Growth &amp; Differentiation</source>
            <pubdate>1999</pubdate>
            <volume>41</volume>
            <fpage>715</fpage>
            <lpage>722</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1046/j.1440-169x.1999.00470.x</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>A comparative analysis of expression and processing of the rat epididymal fluid and sperm-bound forms of proteins D and E.</p>
            </title>
            <aug>
               <au>
                  <snm>Roberts</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>Ensrud</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Hamilton</snm>
                  <fnm>DW</fnm>
               </au>
            </aug>
            <source>Biology of Reproduction</source>
            <pubdate>2002</pubdate>
            <volume>67</volume>
            <fpage>525</fpage>
            <lpage>533</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12135891</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>cDNA cloning of a novel trypsin inhibitor with similarity to pathogenesis-related proteins and its frequent expression in human brain cancer cells.</p>
            </title>
            <aug>
               <au>
                  <snm>Yamakawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Miyata</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ogawa</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Koshikawa</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Yasumitsu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kanamori</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Miyazaki</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Biochimica et Biophysica Acta</source>
            <pubdate>1998</pubdate>
            <volume>1395</volume>
            <fpage>202</fpage>
            <lpage>208</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0167-4781(97)00149-8</pubid>
                  <pubid idtype="pmpid">9473672</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Identification of differentially expressed genes in normal and malignant prostate by electronic profiling of expressed sequence tags.</p>
            </title>
            <aug>
               <au>
                  <snm>Asmann</snm>
                  <fnm>YW</fnm>
               </au>
               <au>
                  <snm>Kosari</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Cheville</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Vasmatzis</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Cancer Research</source>
            <pubdate>2002</pubdate>
            <volume>62</volume>
            <fpage>3308</fpage>
            <lpage>3314</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12036949</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Allurin, a 21-kDa sperm chemoattractant from <it>Xenopus </it>egg jelly, is related to mammalian sperm-binding proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Olson</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Xiang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Ziegert</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kittelson</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rawls</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bieber</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Chandler</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>11205</fpage>
            <lpage>11210</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.211316798</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>A hookworm glycoprotein that inhibits neutrophil function is a ligand of the integrin CD11b/CD18.</p>
            </title>
            <aug>
               <au>
                  <snm>Moyle</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Foster</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>McGrath</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Laroche</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>De Meutter</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stanssens</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bogowitz</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Fried</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Ely</snm>
                  <fnm>JA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Journal of Biological Chemistry</source>
            <pubdate>1994</pubdate>
            <volume>269</volume>
            <fpage>10008</fpage>
            <lpage>10015</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7908286</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Pseudechetoxin: A peptide blocker of cyclic nucleotide-gated channels.</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Haley</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>West</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Crabb</snm>
                  <fnm>JW</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>754</fpage>
            <lpage>759</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1073/pnas.96.2.754</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Comparative genome analysis of the pathogenic spirochetes <it>Borrelia burgdorferi </it>and <it>Treponema pallidum</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Infection and Immunity</source>
            <pubdate>2000</pubdate>
            <volume>68</volume>
            <fpage>1633</fpage>
            <lpage>1648</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">97324</pubid>
                  <pubid idtype="pmpid" link="fulltext">10678983</pubid>
                  <pubid idtype="doi">10.1128/IAI.68.3.1633-1648.2000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Protein fold recognition using sequence profiles and its application in structural genomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>In: Advances in Protein Chemistry</source>
            <pubdate>2000</pubdate>
            <volume>54</volume>
            <fpage>245</fpage>
            <lpage>275</lpage>
         </bibl>
         <bibl id="B55">
            <title>
               <p>MHYT, a new integral membrane sensor domain.</p>
            </title>
            <aug>
               <au>
                  <snm>Galperin</snm>
                  <fnm>MY</fnm>
               </au>
               <au>
                  <snm>Gaidenko</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Mulkidjanian</snm>
                  <fnm>AY</fnm>
               </au>
               <au>
                  <snm>Nakano</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Price</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>FEMS Microbiology Letters</source>
            <pubdate>2001</pubdate>
            <volume>205</volume>
            <fpage>17</fpage>
            <lpage>23</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1097(01)00424-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">11728710</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>C-terminal domain of gyrase A is predicted to have a beta-propeller structure.</p>
            </title>
            <aug>
               <au>
                  <snm>Qi</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Pei</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Grishin</snm>
                  <fnm>NV</fnm>
               </au>
            </aug>
            <source>Proteins: Structure, Function, and Genetics</source>
            <pubdate>2002</pubdate>
            <volume>47</volume>
            <fpage>258</fpage>
            <lpage>264</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/prot.10090</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
