<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2229-10-14</ui><ji>1471-2229</ji><fm>
<dochead>Database</dochead>
<bibl>
<title>
<p>SoyDB: a knowledge database of soybean transcription factors</p>
</title>
<aug>
<au id="A1"><snm>Wang</snm><fnm>Zheng</fnm><insr iid="I1"/><email>zwyw6@mail.missouri.edu</email></au>
<au id="A2"><snm>Libault</snm><fnm>Marc</fnm><insr iid="I2"/><insr iid="I3"/><email>libaultm@missouri.edu</email></au>
<au id="A3"><snm>Joshi</snm><fnm>Trupti</fnm><insr iid="I1"/><insr iid="I2"/><email>joshitr@missouri.edu</email></au>
<au id="A4"><snm>Valliyodan</snm><fnm>Babu</fnm><insr iid="I2"/><insr iid="I3"/><email>valliyodanb@missouri.edu</email></au>
<au id="A5"><snm>Nguyen</snm><mi>T</mi><fnm>Henry</fnm><insr iid="I2"/><insr iid="I3"/><email>nguyenhenry@missouri.edu</email></au>
<au id="A6"><snm>Xu</snm><fnm>Dong</fnm><insr iid="I1"/><insr iid="I2"/><insr iid="I4"/><email>xudong@missouri.edu</email></au>
<au id="A7"><snm>Stacey</snm><fnm>Gary</fnm><insr iid="I2"/><insr iid="I3"/><email>staceyg@missouri.edu</email></au>
<au ca="yes" id="A8"><snm>Cheng</snm><fnm>Jianlin</fnm><insr iid="I1"/><insr iid="I2"/><insr iid="I4"/><email>chengji@missouri.edu</email></au>
</aug>
<insg>
<ins id="I1"><p>Computer Science Department, University of Missouri, Columbia, MO 65211, USA</p></ins>
<ins id="I2"><p>Christopher S Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA</p></ins>
<ins id="I3"><p>Division of Plant Sciences, National Center for Soybean Biotechnology, Christopher S Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA</p></ins>
<ins id="I4"><p>Informatics Institute, University of Missouri, Columbia, MO 65211, USA</p></ins>
</insg>
<source>BMC Plant Biology</source>
<issn>1471-2229</issn>
<pubdate>2010</pubdate>
<volume>10</volume>
<issue>1</issue>
<fpage>14</fpage>
<url>http://www.biomedcentral.com/1471-2229/10/14</url>
<xrefbib><pubidlist><pubid idtype="pmpid">20082720</pubid><pubid idtype="doi">10.1186/1471-2229-10-14</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>11</day><month>8</month><year>2009</year></date></rec><acc><date><day>18</day><month>1</month><year>2010</year></date></acc><pub><date><day>18</day><month>1</month><year>2010</year></date></pub></history>
<cpyrt><year>2010</year><collab>Wang et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>Transcription factors play the crucial role of regulating gene expression and influence almost all biological processes. Systematically identifying and annotating transcription factors can greatly aid the understanding of their functions and mechanisms. In this article, we present SoyDB, a user-friendly database containing comprehensive knowledge of soybean transcription factors.</p>
</sec>
<sec>
<st>
<p>Description</p>
</st>
<p>The soybean genome was recently sequenced by the Department of Energy-Joint Genome Institute (DOE-JGI) and is publicly available. Mining of this sequence identified 5,671 soybean genes as putative transcription factors. These genes were comprehensively annotated as an aid to the soybean research community. We developed SoyDB - a knowledge database for all the transcription factors in the soybean genome. The database contains protein sequences, predicted tertiary structures, putative DNA binding sites, domains, homologous templates in the Protein Data Bank (PDB), protein family classifications, multiple sequence alignments, consensus protein sequence motifs, web logo of each family, and web links to the soybean transcription factor database PlantTFDB, known EST sequences, and other general protein databases including Swiss-Prot, Gene Ontology, KEGG, EMBL, TAIR, InterPro, SMART, PROSITE, NCBI, and Pfam. The database can be accessed via an interactive and convenient web server, which supports full-text search, PSI-BLAST sequence search, database browsing by protein family, and automatic classification of a new protein sequence into one of 64 annotated transcription factor families by hidden Markov models.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>A comprehensive soybean transcription factor database was constructed and made publicly accessible at <url>http://casp.rnet.missouri.edu/soydb/</url>.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>Soybean is a great source of protein, as it contains significant amounts of all the essential amino acids, which cannot be synthesized by the human body <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>. Soybean has been used as a food and a drug component in China for thousands of years <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp> and over the past 60 years has become a leading crop in many nations around the world <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. Because of its high value in the agricultural and food industries, soybean has received increasing research attention, both to improve its agronomic performance and as a model for basic biological studies. In early 2008, the Department of Energy-Joint Genome Institute (DOE-JGI) finished sequencing the soybean genome using a whole-genome shotgun approach <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>, which makes soybean the most complex plant sequenced so far <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. The homology-based gene prediction and annotation produced putative protein sequences <abbrgrp>
<abbr bid="B4">4</abbr>
<abbr bid="B5">5</abbr>
</abbrgrp>, which makes it feasible to identify and annotate soybean transcription factors.</p>
<p>Transcription factors (TFs) are proteins that bind to DNA sequences (<it>i.e.</it>, promoters) and regulate gene expression through one or more DNA-binding domains. Virtually all biological processes are directly regulated or influenced by transcription factors <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>. For example, the transcription process in eukaryotes would not occur in the absence of a specific class of transcription factors named "general transcription factors" <abbrgrp>
<abbr bid="B7">7</abbr>
<abbr bid="B8">8</abbr>
</abbrgrp>. Studies have shown that transcription factors are closely involved in the process of cell development, such as cellular division, migration, and differentiation <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>. Transcription factors of <it>Arabidopsis thaliana </it>have been well studied since the genome of this model species was fully sequenced <abbrgrp>
<abbr bid="B6">6</abbr>
<abbr bid="B10">10</abbr>
<abbr bid="B11">11</abbr>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
<abbr bid="B14">14</abbr>
</abbrgrp>. This makes it possible to identify and study transcription factors of other newly sequenced species, such as soybean, by homology searching and comparative analysis.</p>
<p>Several databases for soybean genome analysis have been built and made publicly available, such as SoyGD <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>, SoyBase <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>, and SoyXpress <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>. These databases contain a variety of information, such as soybean genome sequences, bacterial artificial chromosomes (BACs), and expressed sequence tags (ESTs), as well as useful tools including genome browsers, BLAST searching, and pathway searching. However, these databases only contain general annotations for the soybean genome rather than knowledge specifically targeting transcription factors. For example, none of them systematically organizes transcription factors into families or clearly identifies the DNA-binding domains. PlantTFDB <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp> and DBD <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp> are two existing transcription factor databases that contain knowledge about transcription factors from multiple species. For each soybean transcription factor, PlantTFDB contains information including protein sequence, Gene Ontology annotation <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>, putative binding domains found by InterProScan <abbrgrp>
<abbr bid="B21">21</abbr>
</abbrgrp>, and cross-links to external databases, including EMBL <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp>, UniProt <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp>, RefSeq <abbrgrp>
<abbr bid="B24">24</abbr>
</abbrgrp>, and TRANSFAC <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. DBD contains the amino acid sequence of each transcription factor and external links to Ensembl <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp>, Pfam <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>, and SUPERFAMILY <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>. Compared to PlantTFDB, DBD has fewer external database links, but DBD claims to contain the transcription factors of 927 completely sequenced genomes whereas PlantTFDB covers 22 species. The knowledge in these two databases is very useful; however, they were built on a relatively old version of the soybean sequence data and their annotations are still incomplete. The most important component they lack is the three-dimensional structure of each transcription factor: visualizing a transcription factor, especially its binding sites, can help further understand its mechanism and functions, which is indispensable to structural genomics <abbrgrp>
<abbr bid="B29">29</abbr>
<abbr bid="B30">30</abbr>
</abbrgrp>. Furthermore, with the complete genome sequences of more and more species available, a computer system is needed that can automatically generate a knowledge database and publish it with a user-friendly interface.</p>
<p>To fill the gap, we developed SoyDB - a comprehensive and integrated database for soybean transcription factors. This database not only contains most of the content and features already present in PlantTFDB and DBD, but also extends them with more comprehensive knowledge and links to a wider range of external datasets. The annotations in SoyDB include predicted tertiary structures, protein domains, multiple sequence alignments, DNA binding sites, and web logos and consensus sequences for each family. The SoyDB database also contains links to the homologous EST sequences, and the same or homologous proteins in other databases including PlantTFDB, PDB <abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp>, Swiss-Prot <abbrgrp>
<abbr bid="B32">32</abbr>
</abbrgrp>, TAIR <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>, RefSeq <abbrgrp>
<abbr bid="B24">24</abbr>
</abbrgrp>, SMART <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp>, Pfam <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>, KEGG <abbrgrp>
<abbr bid="B35">35</abbr>
</abbrgrp>, SPRINTS <abbrgrp>
<abbr bid="B36">36</abbr>
</abbrgrp>, EMBL <abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp>, InterPro <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>, PROSITE <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>, and Gene Ontology <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>.</p>
<p>Moreover, our system can automatically execute bioinformatics tools and generate annotations, link to other well-known protein databases, construct MySQL databases, and generate PHP scripts to build its website. This fully automated approach can be used to create a protein annotation database and website for any sequenced organism in the future.</p>
</sec>
<sec>
<st>
<p>Construction and Content</p>
</st>
<sec>
<st>
<p>Database Overview</p>
</st>
<p>SoyDB contains the annotations of 5,671 putative transcription factors. These proteins were classified into 64 families (for details see the section "Transcription Factor Family Prediction Using SAM Hidden Markov Models"). Figure <figr fid="F1">1</figr> illustrates the architecture of the SoyDB website. Users can access the main components from the home page: full-text search, PSI-BLAST sequence search, family classification by hidden Markov model, family browsing, protein browsing, family information, protein information, and FTP site.</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Architecture of SoyDB website</p></caption><text>
   <p><b>Architecture of SoyDB website</b>. Main annotation components include the family information page and the sequence information page. Main functional components include full-text search, HMM family classification, PSI-BLAST sequence search, and FTP downloading.</p>
</text><graphic file="1471-2229-10-14-1"/></fig>
</sec>
<sec>
<st>
<p>Data Source</p>
</st>
<p>The soybean genome sequences and gene model predictions used in this study were acquired from the publicly available database Phytozome <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. These sequences were generated by the preliminary GenomeScan <abbrgrp>
<abbr bid="B40">40</abbr>
</abbrgrp>, FgenesH <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp>, and PASA <abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp> gene annotations based on the Gm1.01 version of soybean assembly data <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Transcription Factor Identification</p>
</st>
<p>We used the standalone version of InterProScan <abbrgrp>
<abbr bid="B21">21</abbr>
</abbrgrp> to search all the soybean protein sequences against 11 databases integrated in InterPro <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>. These databases and their corresponding scanning methods include: PROSITE (<it>pfscan</it>) <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>, PRINTS (<it>FingerPRINTScan</it>) <abbrgrp>
<abbr bid="B43">43</abbr>
</abbrgrp>, Pfam (<it>HMMPfam</it>) <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>, ProDom (<it>ProDomBlast3i</it>) <abbrgrp>
<abbr bid="B44">44</abbr>
</abbrgrp>, SMART (<it>HMMSmart</it>) <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp>, TIGRFAMs (<it>HMMTigr</it>) <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp>, PIR SuperFamily (<it>HMMPIR</it>) <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp>, SUPERFAMILY (<it>superfamily</it>) <abbrgrp>
<abbr bid="B47">47</abbr>
</abbrgrp>, Gene3D (<it>gene3d</it>) <abbrgrp>
<abbr bid="B48">48</abbr>
</abbrgrp>, PANTHER (<it>HMMPanther</it>) <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>, and HAMAP (<it>pfscan</it>) <abbrgrp>
<abbr bid="B50">50</abbr>
</abbrgrp>. InterProScan systematically searches each of these databases using its corresponding scanning method to find domains. Proteins predicted to contain TF-related domain(s) were considered putative transcription factors. Using the Plant Transcription Factor Database (PlnTFDB) <abbrgrp>
<abbr bid="B51">51</abbr>
</abbrgrp> and the classification of <it>Medicago truncatula </it>TF genes <abbrgrp>
<abbr bid="B52">52</abbr>
</abbrgrp> as references, we manually curated the list of putative transcription factors and eliminated any mistakenly identified sequences. In this way, we identified 5,671 putative TF sequences.</p>
</sec>
<sec>
<st>
<p>Transcription Factor Family Prediction Using SAM Hidden Markov Models</p>
</st>
<p>The transcription factors of <it>Arabidopsis thaliana </it>have been well studied and classified into 64 families <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>. This provides a model for us to classify soybean transcription factors. We used MUSCLE <abbrgrp>
<abbr bid="B53">53</abbr>
</abbrgrp> to generate a multiple sequence alignment for each <it>Arabidopsis thaliana </it>TF family. The multiple sequence alignment was then input into SAM 3.5 <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp> to build a hidden Markov model (HMM) for each family. Every soybean TF sequence was aligned with each of the 64 HMMs, and each alignment yielded an e-value. This e-value can be considered a fitness score between a TF sequence and a hidden Markov model: a lower e-value indicates a better fit. Finally, each transcription factor was classified into the family whose HMM yielded the lowest e-value.</p>
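<p>The final assignment step can be sketched in a few lines of Python. This is only an illustration of the lowest-e-value rule described above, not the SoyDB implementation (which was written in Perl and uses SAM); the function name, the family labels, and the e-values are hypothetical.</p>

```python
from typing import Dict, Tuple


def classify_by_lowest_evalue(evalues: Dict[str, float]) -> Tuple[str, float]:
    """Return the (family, e-value) pair with the smallest e-value."""
    if not evalues:
        raise ValueError("no HMM e-values supplied")
    # A lower e-value means a better fit between sequence and family HMM.
    family = min(evalues, key=evalues.get)
    return family, evalues[family]


# Hypothetical e-values for one sequence scored against three family HMMs.
example_scores = {"MYB": 3.2e-41, "bZIP": 0.17, "WRKY": 5.0e-3}
best_family, best_evalue = classify_by_lowest_evalue(example_scores)
```

<p>In practice the dictionary would hold one entry per family HMM (64 in SoyDB), filled from the scoring tool's output.</p>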
</sec>
<sec>
<st>
<p>Annotations Using Bioinformatics Tools</p>
</st>
<p>The standalone versions of several bioinformatics tools were locally installed and executed to generate annotations for soybean transcription factors. MULTICOM <abbrgrp>
<abbr bid="B54">54</abbr>
</abbrgrp>, an accurate protein structure prediction tool, was used to predict the tertiary structure of each transcription factor for which homologous template proteins could be found in PDB. According to the official evaluations of the 8<sup>th </sup>community-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP8) <abbrgrp>
<abbr bid="B55">55</abbr>
</abbrgrp>, MULTICOM was able to predict high-accuracy tertiary structures with an average GDT-TS score <abbrgrp>
<abbr bid="B56">56</abbr>
</abbrgrp> of 0.87 when suitable templates could be found. The GDT-TS score ranges from 0 to 1 and measures the similarity between the predicted and experimental structures, where 1 indicates identical structures and 0 indicates completely different ones. Figure <figr fid="F2">2</figr> illustrates the predicted tertiary structure of a transcription factor in SoyDB with ID GM00002, and the electrostatic polarization of the predicted structure. The blue area in the electrostatic polarization (Figure <figr fid="F2">2(a)</figr>) shows positively charged residues and closely matches the green area in Figure <figr fid="F2">2(b)</figr>, which marks the putative DNA-binding sites identified by a pair-wise alignment between GM00002 and its template protein <ext-link ext-link-id="1WID" ext-link-type="pdb">1WID</ext-link> (Figure <figr fid="F2">2(d)</figr>). Since DNA-binding areas have been found to be positively charged when analyzed by electrostatic potentials <abbrgrp>
<abbr bid="B57">57</abbr>
</abbrgrp>, the close match between the areas in Figure <figr fid="F2">2(a)</figr> and <figr fid="F2">2(b)</figr> supports the conclusion that the predicted structure has the electrostatic properties of a transcription factor. This further supports the quality of MULTICOM predictions and the correctness of the predicted binding sites derived from the homology alignment. In SoyDB, a predicted tertiary structure is visualized by Jmol <abbrgrp>
<abbr bid="B58">58</abbr>
</abbrgrp>. In order to clearly visualize the tertiary structure of the DNA-binding region, only the segments containing homologous DNA-binding domains are visualized by Jmol. Users can view a TF structure in three dimensions from various perspectives and perform many operations, including selecting and highlighting regions of interest, changing view styles and colors, and measuring atom distances and angles, by right-clicking on the Jmol console and selecting the corresponding menus. Detailed instructions about Jmol menus can be found on the Jmol website <abbrgrp>
<abbr bid="B58">58</abbr>
</abbrgrp>. During the structure prediction process, MULTICOM generates the sequence alignments between the transcription factor and its homologous templates using PSI-BLAST. These sequence alignments can be used to predict the binding sites of a transcription factor based on the experimentally determined binding sites of its template as shown in Figure <figr fid="F2">2</figr>.</p>
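<p>The alignment-based transfer of binding sites can be illustrated with a minimal Python sketch. It assumes a gapped pairwise alignment given as two equal-length strings with "-" for gaps and a set of 1-based template positions known to bind DNA; the function name and the toy sequences are hypothetical and are not the actual 1WID/GM00002 alignment.</p>

```python
from typing import List, Set


def transfer_binding_sites(template_aln: str, query_aln: str,
                           template_sites: Set[int]) -> List[int]:
    """Map 1-based template binding positions onto 1-based query positions."""
    if len(template_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    t_pos = 0  # residues consumed so far in the template
    q_pos = 0  # residues consumed so far in the query
    predicted = []
    for t_char, q_char in zip(template_aln, query_aln):
        if t_char != "-":
            t_pos += 1
        if q_char != "-":
            q_pos += 1
        # Transfer a site only when both sequences have a residue in this column.
        if t_char != "-" and q_char != "-" and t_pos in template_sites:
            predicted.append(q_pos)
    return predicted


# Toy alignment (hypothetical): template residues 2, 3 and 4 bind DNA.
template = "MKR-WTAK"
query = "M-RSWSAK"
sites = transfer_binding_sites(template, query, {2, 3, 4})
```

<p>Template position 2 falls opposite a gap in the query and is therefore not transferred, which is why gapped columns are skipped rather than mapped to a neighboring residue.</p>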
<fig id="F2"><title><p>Figure 2</p></title><caption><p>The predicted structure of a transcription factor in SoyDB</p></caption><text>
   <p><b>The predicted structure of a transcription factor in SoyDB</b>. The electrostatic polarization (a) (blue, positive; red, negative), sphere (b), and ribbon (c) visualizations of the MULTICOM-predicted structure for GM00002. (d) A segment of the pair-wise alignment between <ext-link ext-link-type="pdb" ext-link-id="1WID">1WID</ext-link> (PDB template of GM00002) and GM00002, and, below, the DNA-binding site predictions from three independent tools: DNABindR <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>, BindN <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>, and DP-Bind <abbrgrp><abbr bid="B72">72</abbr></abbrgrp> ("+" indicates predicted DNA-binding positions, "-" indicates gap or no prediction). The green regions in the sequence of <ext-link ext-link-type="pdb" ext-link-id="1WID">1WID</ext-link> are the DNA-binding regions identified by experimental methods <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>. The green regions in the GM00002 sequence are the two DNA-binding regions derived from the alignment with <ext-link ext-link-type="pdb" ext-link-id="1WID">1WID</ext-link>. The predicted DNA-binding regions in GM00002 are illustrated in green in (b) and (c); (c) also shows the side chains of the predicted binding regions. (a), (b), and (c) are in the same orientation. The electrostatic polarization (a) was computed and mapped to the protein surface by Swiss-PdbViewer (DeepView) <abbrgrp><abbr bid="B74">74</abbr></abbrgrp>, and the structures in sphere (b) and ribbon (c) styles were made with PyMOL <abbrgrp><abbr bid="B75">75</abbr></abbrgrp>.</p>
</text><graphic file="1471-2229-10-14-2"/></fig>
<p>A predicted structure was parsed into domains by Protein Domain Parser (PDP) <abbrgrp>
<abbr bid="B59">59</abbr>
</abbrgrp>. Since some transcription factors did not have homologous templates found in PDB, DOMAC <abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp>, an accurate <it>ab initio </it>domain prediction tool, was also used to predict domains for each transcription factor.</p>
<p>The protein sequences in the same family were aligned by MUSCLE <abbrgrp>
<abbr bid="B53">53</abbr>
</abbrgrp> and visualized by WebLogo <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp>. A consensus sequence was derived from the multiple sequence alignment by selecting the most frequently occurring amino acid at each position. The multiple sequence alignments were also used to identify the conserved signatures (likely the DNA-binding domains) for each family.</p>
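<p>The consensus rule above (most frequent residue per alignment column) can be sketched as follows. How SoyDB treats gap characters is not stated, so ignoring gaps whenever a column contains at least one residue is an assumption of this sketch, exposed through the hypothetical skip_gaps parameter; the example rows are made up.</p>

```python
from collections import Counter
from typing import List


def consensus_sequence(alignment: List[str], skip_gaps: bool = True) -> str:
    """Pick the most frequent symbol in each column of an aligned block."""
    if not alignment:
        raise ValueError("alignment is empty")
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    consensus = []
    for column in zip(*alignment):
        counts = Counter(column)
        # Assumption: gaps do not vote unless the column is all gaps.
        if skip_gaps and len(counts) > 1:
            counts.pop("-", None)
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Hypothetical three-sequence alignment.
rows = ["MKT-A", "MRT-A", "MKSGA"]
with_gaps_ignored = consensus_sequence(rows)
with_gaps_counted = consensus_sequence(rows, skip_gaps=False)
```

<p>The two calls differ only in the fourth column, where two of the three rows carry a gap.</p>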
<p>All of the bioinformatics tools incorporated to construct SoyDB can be used to automatically annotate other species in the future.</p>
</sec>
<sec>
<st>
<p>Links to External Databases and Datasets</p>
</st>
<p>In order to annotate the functions of soybean transcription factors, each TF protein sequence was searched against the soybean TF database PlantTFDB, NCBI known EST sequences, and other general protein databases by PSI-BLAST or TBLASTN. The external protein databases include Swiss-Prot <abbrgrp>
<abbr bid="B32">32</abbr>
</abbrgrp>, TAIR <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>, RefSeq <abbrgrp>
<abbr bid="B24">24</abbr>
</abbrgrp>, SMART <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp>, Pfam <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>, KEGG <abbrgrp>
<abbr bid="B35">35</abbr>
</abbrgrp>, SPRINTS <abbrgrp>
<abbr bid="B36">36</abbr>
</abbrgrp>, EMBL <abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp>, InterPro <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>, PROSITE <abbrgrp>
<abbr bid="B39">39</abbr>
</abbrgrp>, and Gene Ontology <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. Web links to these databases were created when the same transcription factor or its homologous proteins were found in them; for each database or EST dataset, only the PSI-BLAST or TBLASTN hit with the smallest e-value was listed in SoyDB. To search the known EST sequences, PSI-BLAST was first used to build a position-specific score matrix for each transcription factor. TBLASTN was then used to search each protein sequence against three known EST datasets: EST human, EST mouse, and EST others. The GenBank <abbrgrp>
<abbr bid="B62">62</abbr>
</abbrgrp> web page of each EST hit is linked from the SoyDB website. The gene expression of a subset of TF genes (about 1,000 TF genes) was recently published <abbrgrp>
<abbr bid="B63">63</abbr>
</abbrgrp>. The transcription profiles of all soybean TF genes under various conditions are under investigation.</p>
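<p>The rule of keeping only the hit with the smallest e-value per database can be sketched in Python as below. The Hit record, the function name, and the accessions are hypothetical placeholders; parsing of real PSI-BLAST or TBLASTN output is deliberately left out.</p>

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    database: str   # external database or EST dataset name
    accession: str  # identifier of the matched entry
    evalue: float   # e-value reported by the search tool


def best_hit_per_database(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep, for each database, the hit with the smallest e-value."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.database)
        if current is None or current.evalue > hit.evalue:
            best[hit.database] = hit
    return best


# Hypothetical hits for one transcription factor; accessions are placeholders.
hits = [
    Hit("Swiss-Prot", "ACC_A", 2e-30),
    Hit("Swiss-Prot", "ACC_B", 4e-55),
    Hit("TAIR", "ACC_C", 1e-12),
]
best = best_hit_per_database(hits)
```

<p>Each surviving entry would then become one external link on the protein information page.</p>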
<p>These external links greatly expand the annotation scope of SoyDB, providing related knowledge from various perspectives. SoyDB provides a systematic view of a transcription factor, from the features of the protein itself to the biological pathways in which it participates. The links to the external databases and datasets can be updated by re-running PSI-BLAST and TBLASTN. Currently, these links are scheduled to be updated once every six months. This time interval can be changed if necessary.</p>
</sec>
<sec>
<st>
<p>Database and Website Implementation</p>
</st>
<p>The programs used to automatically annotate proteins were written in Perl. The relational database was built on MySQL, with database schemas automatically generated by Perl programs. The website was implemented in PHP. The database and website were automatically constructed by computer programs with little human intervention.</p>
</sec>
</sec>
<sec>
<st>
<p>Utility and Discussion</p>
</st>
<sec>
<st>
<p>Protein Information</p>
</st>
<p>This component contains the complete annotations for each transcription factor, including protein ID, protein name and description, tools used for TF identification, family ID, family name and description, amino acid sequence, homology domain prediction, <it>ab initio </it>domain prediction, PDB homologous templates, and predicted tertiary structure. This component can be reached by clicking the sequence ID, such as GM00001, or the Phytozome protein name, such as Glyma01g11670.1, at the "Protein Browsing" webpage (for details see the following "Protein Browsing" section). Figure <figr fid="F3">3</figr> illustrates the protein information page. The sequence ID and family ID, such as GM00001 and GMF0001, are internal indices used by the SoyDB, and the sequence name is the standard soybean TF name used by the soybean genome database Phytozome <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. We noticed a trend toward unifying annotation formats within the soybean community. Therefore, the commonly used TF ID format, such as PTGm00009.1, is also supported in SoyDB. Details are described in the "Full Text Search" section below.</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Information page for a transcription factor</p></caption><text>
   <p><b>Information page for a transcription factor</b>. This example web page shows the knowledge for each transcription factor, which includes amino acid sequence, predicted tertiary structure, domain(s) found by homologous search and <it>ab initio </it>prediction, PDB template and alignment, and links to other protein databases and EST datasets.</p>
</text><graphic file="1471-2229-10-14-3"/></fig>
</sec>
<sec>
<st>
<p>Family Information</p>
</st>
<p>This component contains the complete annotations for each TF family, including family ID, family name and description, number of sequences within the family, consensus sequence, consensus signatures (likely the DNA binding regions), web logo of the signature profile, and multiple sequence alignment of the protein sequences within the family. Figure <figr fid="F4">4</figr> demonstrates a family information web page. This component can be reached from the "Family Browsing" web page.</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Family information page</p></caption><text>
   <p><b>Family information page</b>. This example web page shows the knowledge for each TF family, which includes number of sequences in the family, consensus sequence of the family, signature of sequences in the family, web logo, and multiple sequence alignment of the sequences in the family.</p>
</text><graphic file="1471-2229-10-14-4"/></fig>
</sec>
<sec>
<st>
<p>Protein Browsing</p>
</st>
<p>The transcription factors within a family are listed in the order of sequence IDs. The list contains the thumbnail of the tertiary structure, protein ID and name, family ID, and family name of each transcription factor (Figure <figr fid="F5">5</figr>). Users can view the complete annotations of a transcription factor by clicking its sequence ID or its Phytozome protein name. This component can be reached by clicking the number of sequences on the "Family Browsing" or the "Family Information" web page.</p>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Transcription factor browsing page</p></caption><text>
   <p><b>Transcription factor browsing page</b>. This page lists the transcription factors in a TF family. The tertiary structure of each sequence is displayed in an interactive way, <it>i.e.</it>, users can zoom in/out and rotate the structure. Users can further view sequence annotations by clicking the TF IDs, and view family annotations by clicking family names.</p>
</text><graphic file="1471-2229-10-14-5"/></fig>
</sec>
<sec>
<st>
<p>Family Browsing</p>
</st>
<p>A user can browse SoyDB from a TF family perspective. The 64 TF families are listed in the order of family IDs. The family ID, family name, and the number of transcription factors within each family are listed. By clicking the family ID or name, users can view the complete annotations for a family, or further browse the sequences within a family by clicking the number of sequences. This component can be reached by clicking "Browse Database" in either the top or bottom menu bar of any SoyDB web page. Additional file <supplr sid="S1">1</supplr> (<b>Figure S1</b>) illustrates the web page showing a TF family list.</p>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Figure S1 The SoyDB web page showing a list of transcription factor families</b>. TF families are shown with their family ID, family name, and number of sequences within the family. Clicking the family ID opens the detailed information about the family, as shown in Figure <figr fid="F4">4</figr>, and clicking the number of sequences opens the web page showing all the transcription factors within the family, as shown in Figure <figr fid="F5">5</figr>.</p>
</text>
<file name="1471-2229-10-14-S1.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Full Text Search</p>
</st>
<p>This component allows users to search the entire SoyDB database by a query text, such as protein name or family name. Given input keywords, SoyDB searches all the fields in the database and returns matched transcription factors with links to their annotations. Users can also search SoyDB by the TF IDs used in PlantTFDB. The search component will return the homologous soybean TFs found in SoyDB.</p>
</sec>
<sec>
<st>
<p>PSI-BLAST Sequence Search</p>
</st>
<p>This component allows users to search a query sequence against every TF sequence stored in SoyDB. Users can submit a query sequence and adjust PSI-BLAST parameters from a web page as shown in Additional file <supplr sid="S2">2</supplr> (<b>Figure S2</b>). After a PSI-BLAST search is performed, the significant hits, with links to their annotation web pages, are ranked based on the e-values generated by PSI-BLAST. Additional file <supplr sid="S3">3</supplr> (<b>Figure S3</b>) illustrates a PSI-BLAST result web page.</p>
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p>
<b>Figure S2 The PSI-BLAST search web page</b>. Users can paste or type in a query amino acid sequence and specify PSI-BLAST parameters on the web page. Clicking the "Run" button executes PSI-BLAST.</p>
</text>
<file name="1471-2229-10-14-S2.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S3">
<title>
<p>Additional file 3</p>
</title>
<text>
<p>
<b>Figure S3 The result web page of PSI-BLAST search</b>. The PSI-BLAST result page shows the ID of each hit TF sequence together with its PSI-BLAST score and e-value. The hits are listed in decreasing order of PSI-BLAST score. Clicking a sequence ID opens the web page showing detailed TF information, as shown in Figure <figr fid="F3">3</figr>.</p>
</text>
<file name="1471-2229-10-14-S3.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Family Classification by Hidden Markov Model</p>
</st>
<p>This component classifies a query protein sequence into one of the 64 TF families. Additional file <supplr sid="S4">4</supplr> (<b>Figure S4</b>) illustrates the web page for family classification. A submitted query sequence is aligned against each of the 64 hidden Markov models built from the 64 <it>Arabidopsis thaliana </it>TF families. The query sequence is assigned to the family whose hidden Markov model yields the lowest e-value (correspondingly, the highest alignment or fitness score). More details about family classification can be found in the "Transcription Factor Family Prediction Using SAM Hidden Markov Models" section under "Construction and Content".</p>
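<p>The decision rule in this paragraph, picking the family whose model reports the lowest e-value, can be sketched as below. The sketch assumes the per-family e-values have already been produced by scoring the query against each model; it does not call SAM, and the function name classify_family and the example values are hypothetical.</p>

```python
def classify_family(evalues_by_family):
    """Return the (family, E-value) pair with the lowest E-value."""
    if not evalues_by_family:
        raise ValueError("no family scores supplied")
    # min() over the dict keys, compared by their E-values.
    family = min(evalues_by_family, key=evalues_by_family.get)
    return family, evalues_by_family[family]


# Hypothetical E-values for one query against a few family models.
example_scores = {"MYB": 3e-42, "WRKY": 0.8, "NAC": 1.5e-3, "bZIP": 12.0}
best_family, best_evalue = classify_family(example_scores)
print(best_family, best_evalue)
```

<p>A real deployment would hold one entry per family model (64 in SoyDB) and might also report when even the best e-value is too weak to support a confident assignment; that refinement is not described in the text and is left out here.</p>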
<suppl id="S4">
<title>
<p>Additional file 4</p>
</title>
<text>
<p>
<b>Figure S4 The HMM family classification web page</b>. Users can paste or type in a query amino acid sequence. Clicking the "Predict" button executes family classification by HMM.</p>
</text>
<file name="1471-2229-10-14-S4.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>FTP Site</p>
</st>
<p>All of the information in SoyDB is available for download from an FTP site. For example, users can download all of the TF protein sequences in FASTA format and the multiple sequence alignments for each family in plain text. This makes it possible for other websites to link to SoyDB by performing PSI-BLAST searches against SoyDB sequences, just as SoyDB links to external databases.</p>
</sec>
<sec>
<st>
<p>Comparison and Overlap between SoyDB and PlantTFDB</p>
</st>
<p>In total, SoyDB contains 5,671 transcription factors, of which 4,306 (75.9%) have hits in PlantTFDB, as identified by PSI-BLAST with an e-value threshold of 10<sup>-3</sup>. PlantTFDB contains 1,891 soybean transcription factors (based on the FASTA file downloaded from the PlantTFDB FTP site), of which 1,805 (95.5%) have hits in SoyDB based on a PSI-BLAST search with the same e-value threshold of 10<sup>-3</sup>.</p>
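<p>The two overlap percentages follow directly from the counts quoted in this paragraph. The short check below reproduces them; the helper name percent and the variable names are illustrative only.</p>

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Counts quoted in the text.
soydb_total, soydb_with_hits = 5671, 4306
planttfdb_total, planttfdb_with_hits = 1891, 1805

soydb_overlap = percent(soydb_with_hits, soydb_total)
planttfdb_overlap = percent(planttfdb_with_hits, planttfdb_total)
print(soydb_overlap, planttfdb_overlap)
```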
</sec>
<sec>
<st>
<p>Comparisons of Soybean Transcription Factor Family Distributions with Other Plants</p>
</st>
<p>The collection and analyses in SoyDB allow us to compare TF family distributions across the plant kingdom. The large number of soybean TF genes (5,671) described in this study is likely due to the two soybean whole genome duplication events: one estimated to have occurred 40-50 million years ago (mya) and a more recent one approximately 10-15 mya <abbrgrp>
<abbr bid="B64">64</abbr>
<abbr bid="B65">65</abbr>
</abbrgrp>. Comparisons of total gene number across organisms indicate that increases in plant gene number are related to multicellularity and ploidy. For example, compared with the unicellular eukaryote <it>Chlamydomonas reinhardtii</it>, for which 15,143 genes were predicted <abbrgrp>
<abbr bid="B66">66</abbr>
</abbrgrp>, larger numbers of protein-encoding genes were reported in multicellular plant organisms, <it>e.g.</it>, <it>Physcomitrella patens </it>(35,938; <abbrgrp>
<abbr bid="B67">67</abbr>
</abbrgrp>), <it>Arabidopsis thaliana </it>(32,944; TAIR <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>), and the tetraploid <it>Glycine max </it>(66,153; Phytozome). We hypothesize that TF gene number follows the same trend, with land plants having more TF genes than algae. To compare plant TF genes and their distributions across TF gene families, we mined the most recently updated version of the DBD database <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp> for 11 plant species (<it>C. reinhardtii, P. patens, Oryza sativa, Zea mays, Sorghum bicolor, Lotus japonicus, Medicago truncatula, A. thaliana, Vitis vinifera, Ricinus communis</it>, and <it>Populus trichocarpa</it>). The TF genes of these species were then compared with the soybean TF genes stored in SoyDB.</p>
<p>Our analysis showed that the unicellular <it>C. reinhardtii </it>has fewer TF genes than the multicellular land plants (the exceptions are <it>L. japonicus </it>and <it>M. truncatula</it>, for which only a partial genome sequence is available). This trend mirrors the differences in total gene number among the organisms shown in Figure <figr fid="F6">6</figr>. Notably, homeobox, MYB, NAC, and WRKY TF genes are absent or poorly represented in <it>C. reinhardtii </it>compared with the 11 other plant species (Table <tblr tid="T1">1</tblr>). Previous studies defined a role for homeobox <abbrgrp>
<abbr bid="B68">68</abbr>
</abbrgrp> and WRKY genes <abbrgrp>
<abbr bid="B13">13</abbr>
</abbrgrp> in plant development. Therefore, the occurrence of these genes only in multicellular plants may reflect their special roles in development. In addition, a close relationship between TF gene number and total gene number <abbrgrp>
<abbr bid="B69">69</abbr>
</abbrgrp> is observed when comparing the TF gene numbers of <it>G. max </it>and <it>A. thaliana </it>with their total gene numbers (<it>i.e.</it>, <it>G. max </it>encodes 66,153 protein-coding genes including 5,683 TF genes; <it>A. thaliana </it>encodes 32,944 protein-coding genes and 1,738 TF genes). Thus, the family distribution of soybean TF genes is similar to other land plant species, except for <it>P. patens </it>(<it>e.g.</it>, AP2 represents 7% of total TF genes in soybean vs. 8-12% for other land plants; bZIP: 3% vs. 3-7%; bHLH: 7% vs. 8-11%; homeobox: 6% vs. 4-7%; MYB: 14% vs. 7-14%; NAC: 4% vs. 4-9%; WRKY: 3% vs. 4-7%; ZF-C2H2: 7% vs. 5-9%) (Figure <figr fid="F6">6</figr> and Table <tblr tid="T1">1</tblr>).</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Distributions of transcription factor families across major plant species</p></caption><tblbdy cols="13">
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b>At</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Zm</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Os</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Gm</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Lj</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Mt</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Sb</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Pt</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Pp</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Cr</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Vv</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Rc</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="13">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>AP2</p>
         </c>
         <c ca="center">
            <p>162</p>
         </c>
         <c ca="center">
            <p>251</p>
         </c>
         <c ca="center">
            <p>186</p>
         </c>
         <c ca="center">
            <p>381</p>
         </c>
         <c ca="center">
            <p>16</p>
         </c>
         <c ca="center">
            <p>63</p>
         </c>
         <c ca="center">
            <p>153</p>
         </c>
         <c ca="center">
            <p>207</p>
         </c>
         <c ca="center">
            <p>153</p>
         </c>
         <c ca="center">
            <p>16</p>
         </c>
         <c ca="center">
            <p>124</p>
         </c>
         <c ca="center">
            <p>111</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>bZIP</p>
         </c>
         <c ca="center">
            <p>116</p>
         </c>
         <c ca="center">
            <p>119</p>
         </c>
         <c ca="center">
            <p>130</p>
         </c>
         <c ca="center">
            <p>176</p>
         </c>
         <c ca="center">
            <p>6</p>
         </c>
         <c ca="center">
            <p>27</p>
         </c>
         <c ca="center">
            <p>89</p>
         </c>
         <c ca="center">
            <p>90</p>
         </c>
         <c ca="center">
            <p>38</p>
         </c>
         <c ca="center">
            <p>14</p>
         </c>
         <c ca="center">
            <p>48</p>
         </c>
         <c ca="center">
            <p>50</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>bHLH</p>
         </c>
         <c ca="center">
            <p>183</p>
         </c>
         <c ca="center">
            <p>207</p>
         </c>
         <c ca="center">
            <p>203</p>
         </c>
         <c ca="center">
            <p>393</p>
         </c>
         <c ca="center">
            <p>16</p>
         </c>
         <c ca="center">
            <p>47</p>
         </c>
         <c ca="center">
            <p>148</p>
         </c>
         <c ca="center">
            <p>172</p>
         </c>
         <c ca="center">
            <p>102</p>
         </c>
         <c ca="center">
            <p>9</p>
         </c>
         <c ca="center">
            <p>110</p>
         </c>
         <c ca="center">
            <p>112</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>homeobox</p>
         </c>
         <c ca="center">
            <p>105</p>
         </c>
         <c ca="center">
            <p>121</p>
         </c>
         <c ca="center">
            <p>132</p>
         </c>
         <c ca="center">
            <p>319</p>
         </c>
         <c ca="center">
            <p>15</p>
         </c>
         <c ca="center">
            <p>35</p>
         </c>
         <c ca="center">
            <p>81</p>
         </c>
         <c ca="center">
            <p>129</p>
         </c>
         <c ca="center">
            <p>44</p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>74</p>
         </c>
         <c ca="center">
            <p>66</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>MYB</p>
         </c>
         <c ca="center">
            <p>212</p>
         </c>
         <c ca="center">
            <p>192</p>
         </c>
         <c ca="center">
            <p>193</p>
         </c>
         <c ca="center">
            <p>791</p>
         </c>
         <c ca="center">
            <p>14</p>
         </c>
         <c ca="center">
            <p>56</p>
         </c>
         <c ca="center">
            <p>165</p>
         </c>
         <c ca="center">
            <p>210</p>
         </c>
         <c ca="center">
            <p>89</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>151</p>
         </c>
         <c ca="center">
            <p>105</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>NAC/NAM</p>
         </c>
         <c ca="center">
            <p>132</p>
         </c>
         <c ca="center">
            <p>149</p>
         </c>
         <c ca="center">
            <p>146</p>
         </c>
         <c ca="center">
            <p>208</p>
         </c>
         <c ca="center">
            <p>15</p>
         </c>
         <c ca="center">
            <p>31</p>
         </c>
         <c ca="center">
            <p>111</p>
         </c>
         <c ca="center">
            <p>174</p>
         </c>
         <c ca="center">
            <p>32</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>81</p>
         </c>
         <c ca="center">
            <p>92</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>WRKY</p>
         </c>
         <c ca="center">
            <p>89</p>
         </c>
         <c ca="center">
            <p>141</p>
         </c>
         <c ca="center">
            <p>123</p>
         </c>
         <c ca="center">
            <p>197</p>
         </c>
         <c ca="center">
            <p>8</p>
         </c>
         <c ca="center">
            <p>39</p>
         </c>
         <c ca="center">
            <p>92</p>
         </c>
         <c ca="center">
            <p>103</p>
         </c>
         <c ca="center">
            <p>37</p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>59</p>
         </c>
         <c ca="center">
            <p>58</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ZF-C2H2</p>
         </c>
         <c ca="center">
            <p>98</p>
         </c>
         <c ca="center">
            <p>114</p>
         </c>
         <c ca="center">
            <p>117</p>
         </c>
         <c ca="center">
            <p>395</p>
         </c>
         <c ca="center">
            <p>19</p>
         </c>
         <c ca="center">
            <p>49</p>
         </c>
         <c ca="center">
            <p>88</p>
         </c>
         <c ca="center">
            <p>101</p>
         </c>
         <c ca="center">
            <p>51</p>
         </c>
         <c ca="center">
            <p>5</p>
         </c>
         <c ca="center">
            <p>67</p>
         </c>
         <c ca="center">
            <p>78</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Other TF</p>
         </c>
         <c ca="center">
            <p>641</p>
         </c>
         <c ca="center">
            <p>957</p>
         </c>
         <c ca="center">
            <p>883</p>
         </c>
         <c ca="center">
            <p>2823</p>
         </c>
         <c ca="center">
            <p>99</p>
         </c>
         <c ca="center">
            <p>244</p>
         </c>
         <c ca="center">
            <p>524</p>
         </c>
         <c ca="center">
            <p>771</p>
         </c>
         <c ca="center">
            <p>277</p>
         </c>
         <c ca="center">
            <p>167</p>
         </c>
         <c ca="center">
            <p>352</p>
         </c>
         <c ca="center">
            <p>797</p>
         </c>
      </r>
      <r>
         <c cspan="13">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total</p>
         </c>
         <c ca="center">
            <p>1738</p>
         </c>
         <c ca="center">
            <p>2251</p>
         </c>
         <c ca="center">
            <p>2113</p>
         </c>
         <c ca="center">
            <p>5671</p>
         </c>
         <c ca="center">
            <p>208</p>
         </c>
         <c ca="center">
            <p>591</p>
         </c>
         <c ca="center">
            <p>1451</p>
         </c>
         <c ca="center">
            <p>1957</p>
         </c>
         <c ca="center">
            <p>823</p>
         </c>
         <c ca="center">
            <p>213</p>
         </c>
         <c ca="center">
            <p>1066</p>
         </c>
         <c ca="center">
            <p>1469</p>
         </c>
      </r>
      <r>
         <c cspan="13">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="13">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <b>%</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>At</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Zm</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Os</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Gm</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Lj</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Mt</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Sb</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Pt</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Pp</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Cr</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Vv</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Rc</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="13">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>AP2</p>
         </c>
         <c ca="center">
            <p>9%</p>
         </c>
         <c ca="center">
            <p>11%</p>
         </c>
         <c ca="center">
            <p>9%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>8%</p>
         </c>
         <c ca="center">
            <p>11%</p>
         </c>
         <c ca="center">
            <p>11%</p>
         </c>
         <c ca="center">
            <p>11%</p>
         </c>
         <c ca="center">
            <p>19%</p>
         </c>
         <c ca="center">
            <p>8%</p>
         </c>
         <c ca="center">
            <p>12%</p>
         </c>
         <c ca="center">
            <p>8%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>bZIP</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>3%</p>
         </c>
         <c ca="center">
            <p>3%</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>3%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>bHLH</p>
         </c>
         <c ca="center">
            <p>11%</p>
         </c>
         <c ca="center">
            <p>9%</p>
         </c>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>8%</p>
         </c>
         <c ca="center">
            <p>8%</p>
         </c>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="center">
            <p>9%</p>
         </c>
         <c ca="center">
            <p>12%</p>
         </c>
         <c ca="center">
            <p>4%</p>
         </c>
         <c ca="center">
            <p>10%</p>
         </c>
         <c ca="center">
            <p>8%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>homeobox</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>0%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>4%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>MYB</p>
         </c>
         <c ca="center">
            <p>12%</p>
         </c>
         <c ca="center">
            <p>9%</p>
         </c>
         <c ca="center">
            <p>9%</p>
         </c>
         <c ca="center">
            <p>14%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>9%</p>
         </c>
         <c ca="center">
            <p>11%</p>
         </c>
         <c ca="center">
            <p>11%</p>
         </c>
         <c ca="center">
            <p>11%</p>
         </c>
         <c ca="center">
            <p>0%</p>
         </c>
         <c ca="center">
            <p>14%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>NAC/NAM</p>
         </c>
         <c ca="center">
            <p>8%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>4%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>8%</p>
         </c>
         <c ca="center">
            <p>9%</p>
         </c>
         <c ca="center">
            <p>4%</p>
         </c>
         <c ca="center">
            <p>0%</p>
         </c>
         <c ca="center">
            <p>8%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>WRKY</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>3%</p>
         </c>
         <c ca="center">
            <p>4%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>4%</p>
         </c>
         <c ca="center">
            <p>0%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>4%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ZF-C2H2</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>7%</p>
         </c>
         <c ca="center">
            <p>9%</p>
         </c>
         <c ca="center">
            <p>8%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>2%</p>
         </c>
         <c ca="center">
            <p>6%</p>
         </c>
         <c ca="center">
            <p>5%</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Other TF</p>
         </c>
         <c ca="center">
            <p>37%</p>
         </c>
         <c ca="center">
            <p>43%</p>
         </c>
         <c ca="center">
            <p>42%</p>
         </c>
         <c ca="center">
            <p>50%</p>
         </c>
         <c ca="center">
            <p>48%</p>
         </c>
         <c ca="center">
            <p>41%</p>
         </c>
         <c ca="center">
            <p>36%</p>
         </c>
         <c ca="center">
            <p>39%</p>
         </c>
         <c ca="center">
            <p>34%</p>
         </c>
         <c ca="center">
            <p>78%</p>
         </c>
         <c ca="center">
            <p>33%</p>
         </c>
         <c ca="center">
            <p>54%</p>
         </c>
      </r>
      <r>
         <c cspan="13">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total</p>
         </c>
         <c ca="center">
            <p>100%</p>
         </c>
         <c ca="center">
            <p>100%</p>
         </c>
         <c ca="center">
            <p>100%</p>
         </c>
         <c ca="center">
            <p>100%</p>
         </c>
         <c ca="center">
            <p>100%</p>
         </c>
         <c ca="center">
            <p>100%</p>
         </c>
         <c ca="center">
            <p>100%</p>
         </c>
         <c ca="center">
            <p>100%</p>
         </c>
         <c ca="center">
            <p>100%</p>
         </c>
         <c ca="center">
            <p>100%</p>
         </c>
         <c ca="center">
            <p>100%</p>
         </c>
         <c ca="center">
            <p>100%</p>
         </c>
      </r>
   </tblbdy></tbl>
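<p>The percentage rows in Table 1 can be reproduced from the count rows. The sketch below does this for the soybean (Gm) column, dividing each family count by the total reported in the "Total" row and rounding to the nearest whole percent; the names gm_counts, gm_total, and family_shares are illustrative and not part of SoyDB.</p>

```python
# Soybean (Gm) family counts as listed in Table 1.
gm_counts = {
    "AP2": 381, "bZIP": 176, "bHLH": 393, "homeobox": 319, "MYB": 791,
    "NAC/NAM": 208, "WRKY": 197, "ZF-C2H2": 395, "Other TF": 2823,
}
gm_total = 5671  # value reported in the "Total" row of Table 1


def family_shares(counts, total):
    """Map each family to its share of `total`, as a whole-number percent."""
    return {family: round(100 * n / total) for family, n in counts.items()}


gm_shares = family_shares(gm_counts, gm_total)
for family, share in gm_shares.items():
    print(family, str(share) + "%")
```

<p>The same function can be applied to any other species column by swapping in that column's counts and total.</p>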
<fig id="F6"><title><p>Figure 6</p></title><caption><p>Distributions of transcription factor families across major plant species</p></caption><text>
   <p><b>Distributions of transcription factor families across major plant species</b>. Phytozome and DBD databases were mined to identify transcription factor genes in soybean (Gm: Glycine max) and in the 11 remaining plant species, respectively (Cr: Chlamydomonas reinhardtii; Pp: Physcomitrella patens; Sb: Sorghum bicolor; Os: Oryza sativa; Zm: Zea mays; Lj: Lotus japonicus; Mt: Medicago truncatula; At: Arabidopsis thaliana; Vv: Vitis vinifera; Rc: Ricinus communis; Pt: Populus trichocarpa). After being classified based on their family membership, nine major TF families are represented for each plant species. Numbers next to the plant name abbreviation are the total number of TF genes available in DBD. Details are available in Table 1.</p>
</text><graphic file="1471-2229-10-14-6"/></fig>
<p>Collectively, these results suggest that soybean TF genes were not lost following soybean genome duplication, and may have evolved for specialized functions in plant development or response to the environment.</p>
</sec>
<sec>
<st>
<p>Future Development Plan</p>
</st>
<p>In the future, we plan to link to more soybean databases, such as SoyBase, and to add an expert discussion section for each transcription factor, where biologists can register, log in, and comment on any annotation item. We also plan to link each protein name (such as Glyma01g11670.1) listed on a protein information page to its entry in Phytozome, so that SoyDB is connected with other soybean genome annotations. Furthermore, we may identify the binding regions on soybean DNA sequences, which would help biologists target the regulated regions of the soybean genome.</p>
</sec>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>SoyDB is a comprehensive database of soybean transcription factors. It integrates bioinformatics tools and various external databases to provide rich annotations, which can be browsed and retrieved through convenient web interfaces. The automated process generates the annotations and builds the database and website, and it can be reused to annotate other sequenced species.</p>
</sec>
<sec>
<st>
<p>Availability and Requirements</p>
</st>
<p>SoyDB is freely available at <url>http://casp.rnet.missouri.edu/soydb/</url> for academic use. In our tests, SoyDB was fully functional with three web browsers: Mozilla Firefox, Internet Explorer, and Safari, and four operating systems: Windows XP, Windows Vista, Linux (Red Hat), and Mac OS. The only system requirement is that the Java Runtime Environment (JRE) be installed and working, because Jmol depends on it.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>JC conceived the idea of automating the process of protein annotation and database construction, and coordinated the project. JC and ZW designed the database and formulated the content of the annotations. JC developed or executed the tools that generate structures, templates, domains, DNA binding sites, and multiple sequence alignments. ZW implemented the computer system that automatically gathers data, constructs the database, builds the web site, and links to external databases and data sources. ZW, TJ, ML, and JC performed the soybean TF family classification. ZW, JC, ML, and TJ wrote the first draft of the manuscript. ML analyzed and compared the distributions of transcription factors in multiple species. TJ and ML identified the list of transcription factors. GS, DX, HN, and BV contributed some annotation ideas. JC, GS, DX, and HN provided financial support for the project. All authors edited and approved the manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>This work was partially funded by a University of Missouri (MU) start-up grant, an MU research board grant, and an MU research council grant to JC. The work conducted by ML and GS was funded by a grant from the National Science Foundation (Plant Genome Program, #DBI-0421620). Other funding includes grants from the Missouri Soybean Merchandising Council to HN, DX, and GS.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Soy: health claims for soy protein, questions about other components</p></title><aug><au><snm>Henkel</snm><fnm>J</fnm></au></aug><source>FDA consumer</source><pubdate>2000</pubdate><volume>34</volume></bibl><bibl id="B2"><title><p>A Chinese fermented soybean food</p></title><aug><au><snm>Han</snm><fnm>B-Z</fnm></au><au><snm>Rombouts</snm><fnm>FM</fnm></au><au><snm>Nout</snm><fnm>MJR</fnm></au></aug><source>International Journal of Food Microbiology</source><pubdate>2001</pubdate><volume>65</volume><issue>1-2</issue><fpage>1</fpage><lpage>10</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0168-1605(00)00523-7</pubid><pubid idtype="pmpid">11322691</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Agricultural biotechnology: updated benefit estimates</p></title><aug><au><snm>Carpenter</snm><fnm>J</fnm></au><au><snm>Gianessi</snm><fnm>L</fnm></au></aug><publisher>Washington, USA National Centre for Food and Agricultural Policy (NCFAP)</publisher><pubdate>2001</pubdate></bibl><bibl id="B4"><title><p>Genome sequence of the paleopolyploid soybean (Glycine max (L.) 
Merr.)</p></title><aug><au><snm>Schmutz</snm><fnm>J</fnm></au><au><snm>Cannon</snm><fnm>S</fnm></au><au><snm>Schlueter</snm><fnm>J</fnm></au><au><snm>Ma</snm><fnm>J</fnm></au><au><snm>Hyten</snm><fnm>D</fnm></au><au><snm>Song</snm><fnm>Q</fnm></au><au><snm>Mitros</snm><fnm>T</fnm></au><au><snm>Nelson</snm><fnm>W</fnm></au><au><snm>May</snm><fnm>G</fnm></au><au><snm>Gill</snm><fnm>N</fnm></au><etal/></aug><source>Nature</source><inpress/></bibl><bibl id="B5"><title><p>Phytozome</p></title><url>http://www.phytozome.net/soybean</url></bibl><bibl id="B6"><title><p>bZIP transcription factors in Arabidopsis</p></title><aug><au><snm>Jakoby</snm><fnm>M</fnm></au><au><snm>Weisshaar</snm><fnm>B</fnm></au><au><snm>Droge-Laser</snm><fnm>W</fnm></au><au><snm>Vicente-Carbajosa</snm><fnm>J</fnm></au><au><snm>Tiedemann</snm><fnm>J</fnm></au><au><snm>Kroj</snm><fnm>T</fnm></au><au><snm>Parcy</snm><fnm>F</fnm></au></aug><source>Trends in Plant Science</source><pubdate>2002</pubdate><volume>7</volume><issue>3</issue><fpage>106</fpage><lpage>111</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1360-1385(01)02223-3</pubid><pubid idtype="pmpid" link="fulltext">11906833</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Basal transcription factors</p></title><aug><au><snm>Reese</snm><fnm>J</fnm></au></aug><source>Current opinion in genetics &amp; development</source><pubdate>2003</pubdate><volume>13</volume><issue>2</issue><fpage>114</fpage><lpage>118</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0959-437X(03)00013-3</pubid><pubid idtype="pmpid" link="fulltext">12672487</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Mechanisms of gene expression: structure, function and evolution of the basal transcriptional machinery</p></title><aug><au><snm>Weinzierl</snm><fnm>R</fnm></au></aug><publisher>London, UK Imperial College Press</publisher><pubdate>1999</pubdate></bibl><bibl id="B9"><title><p>Transcription factors and mammalian 
development</p></title><aug><au><snm>Corrinne</snm><fnm>C</fnm></au></aug><source>Current Topics in Developmental Biology</source><publisher>Academic Press, Inc</publisher><editor>Pedersen RA</editor><pubdate>1992</pubdate><volume>27</volume><fpage>351</fpage><xrefbib><pubidlist><pubid idtype="doi">full_text</pubid><pubid idtype="pmpid">1424766</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Two transcription factors, DREB1 and DREB2, with an EREBP/AP2 DNA binding domain separate two cellular signal transduction pathways in drought-and low-temperature-responsive gene expression, respectively, in Arabidopsis</p></title><aug><au><snm>Liu</snm><fnm>Q</fnm></au><au><snm>Kasuga</snm><fnm>M</fnm></au><au><snm>Sakuma</snm><fnm>Y</fnm></au><au><snm>Abe</snm><fnm>H</fnm></au><au><snm>Miura</snm><fnm>S</fnm></au><au><snm>Yamaguchi-Shinozaki</snm><fnm>K</fnm></au><au><snm>Shinozaki</snm><fnm>K</fnm></au></aug><source>The Plant Cell Online</source><pubdate>1998</pubdate><volume>10</volume><issue>8</issue><fpage>1391</fpage><lpage>1406</lpage><xrefbib><pubid idtype="doi">10.1105/tpc.10.8.1391</pubid></xrefbib></bibl><bibl id="B11"><title><p>Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes</p></title><aug><au><snm>Riechmann</snm><fnm>J</fnm></au><au><snm>Heard</snm><fnm>J</fnm></au><au><snm>Martin</snm><fnm>G</fnm></au><au><snm>Reuber</snm><fnm>L</fnm></au><au><snm>Z</snm><fnm>C</fnm></au><au><snm>Keddie</snm><fnm>J</fnm></au><au><snm>Adam</snm><fnm>L</fnm></au><au><snm>Pineda</snm><fnm>O</fnm></au><au><snm>Ratcliffe</snm><fnm>O</fnm></au><au><snm>Samaha</snm><fnm>R</fnm></au></aug><source>Science</source><pubdate>2000</pubdate><volume>290</volume><issue>5499</issue><fpage>2105</fpage><lpage>2110</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.290.5499.2105</pubid><pubid idtype="pmpid" link="fulltext">11118137</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Functional analysis of an Arabidopsis 
transcription factor, DREB2A, involved in drought-responsive gene expression</p></title><aug><au><snm>Sakuma</snm><fnm>Y</fnm></au><au><snm>Maruyama</snm><fnm>K</fnm></au><au><snm>Osakabe</snm><fnm>Y</fnm></au><au><snm>Qin</snm><fnm>F</fnm></au><au><snm>Seki</snm><fnm>M</fnm></au><au><snm>Shinozaki</snm><fnm>K</fnm></au><au><snm>Yamaguchi-Shinozaki</snm><fnm>K</fnm></au></aug><source>The Plant Cell Online</source><pubdate>2006</pubdate><volume>18</volume><issue>5</issue><fpage>1292</fpage><lpage>1309</lpage><xrefbib><pubid idtype="doi">10.1105/tpc.105.035881</pubid></xrefbib></bibl><bibl id="B13"><title><p>TRANSPARENT TESTA GLABRA2, a trichome and seed coat development gene of Arabidopsis, encodes a WRKY transcription factor</p></title><aug><au><snm>Johnson</snm><fnm>C</fnm></au><au><snm>Kolevski</snm><fnm>B</fnm></au><au><snm>Smyth</snm><fnm>D</fnm></au></aug><source>The Plant Cell</source><pubdate>2002</pubdate><volume>14</volume><issue>6</issue><fpage>1359</fpage><lpage>1375</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1105/tpc.001404</pubid><pubid idtype="pmcid">150785</pubid><pubid idtype="pmpid" link="fulltext">12084832</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>WRKY transcription factors: from DNA binding towards biological function</p></title><aug><au><snm>Ulker</snm><fnm>B</fnm></au><au><snm>Somssich</snm><fnm>I</fnm></au></aug><source>Current Opinion in Plant Biology</source><pubdate>2004</pubdate><volume>7</volume><issue>5</issue><fpage>491</fpage><lpage>498</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.pbi.2004.07.012</pubid><pubid idtype="pmpid" link="fulltext">15337090</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>The Soybean Genome Database (SoyGD): a browser for display of duplicated, polyploid, regions and sequence tagged sites on the integrated physical and genetic maps of Glycine 
max</p></title><aug><au><snm>Shultz</snm><fnm>J</fnm></au><au><snm>Kurunam</snm><fnm>D</fnm></au><au><snm>Shopinski</snm><fnm>K</fnm></au><au><snm>Iqbal</snm><fnm>M</fnm></au><au><snm>Kazi</snm><fnm>S</fnm></au><au><snm>Zobrist</snm><fnm>K</fnm></au><au><snm>Bashir</snm><fnm>R</fnm></au><au><snm>Yaegashi</snm><fnm>S</fnm></au><au><snm>Lavu</snm><fnm>N</fnm></au><au><snm>Afzai</snm><fnm>A</fnm></au><etal/></aug><source>Nucleic Acids Research</source><pubdate>2006</pubdate><issue>34 Database</issue><fpage>D758</fpage><lpage>D765</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkj050</pubid><pubid idtype="pmcid">1347413</pubid><pubid idtype="pmpid" link="fulltext">16381975</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>SoyBase and the soybean breeder's toolbox</p></title><url>http://soybase.org/</url></bibl><bibl id="B17"><title><p>SoyXpress: a database for exploring the soybean transcriptome</p></title><aug><au><snm>Cheng</snm><fnm>K</fnm></au><au><snm>Stromvik</snm><fnm>M</fnm></au></aug><source>BMC genomics</source><pubdate>2008</pubdate><volume>9</volume><issue>1</issue><fpage>368</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-9-368</pubid><pubid idtype="pmcid">2536680</pubid><pubid idtype="pmpid" link="fulltext">18671881</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>PlantTFDB: a comprehensive plant transcription factor database</p></title><aug><au><snm>Guo</snm><fnm>A</fnm></au><au><snm>Chen</snm><fnm>X</fnm></au><au><snm>Gao</snm><fnm>G</fnm></au><au><snm>Zhang</snm><fnm>H</fnm></au><au><snm>Zhu</snm><fnm>Q</fnm></au><au><snm>Liu</snm><fnm>X</fnm></au><au><snm>Zhong</snm><fnm>Y</fnm></au><au><snm>Gu</snm><fnm>X</fnm></au><au><snm>He</snm><fnm>K</fnm></au><au><snm>Luo</snm><fnm>J</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2008</pubdate><issue>36 Database</issue><fpage>D966</fpage><lpage>D969</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2238823</pubid><pubid idtype="pmpid" 
link="fulltext">17933783</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>DBD taxonomically broad transcription factor predictions: new content and functionality</p></title><aug><au><snm>Wilson</snm><fnm>D</fnm></au><au><snm>Charoensawan</snm><fnm>V</fnm></au><au><snm>Kummerfeld</snm><fnm>S</fnm></au><au><snm>Teichmann</snm><fnm>S</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2007</pubdate><issue>36 Database</issue><fpage>D88</fpage><lpage>D92</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkm964</pubid><pubid idtype="pmcid">2238844</pubid><pubid idtype="pmpid" link="fulltext">18073188</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Gene ontology: tool for the unification of biology</p></title><aug><au><snm>Ashburner</snm><fnm>M</fnm></au><au><snm>Ball</snm><fnm>C</fnm></au><au><snm>Blake</snm><fnm>J</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Butler</snm><fnm>H</fnm></au><au><snm>Cherry</snm><fnm>J</fnm></au><au><snm>Davis</snm><fnm>A</fnm></au><au><snm>Dolinski</snm><fnm>K</fnm></au><au><snm>Dwight</snm><fnm>S</fnm></au><au><snm>Eppig</snm><fnm>J</fnm></au></aug><source>Nature Genetics</source><pubdate>2000</pubdate><volume>25</volume><issue>1</issue><fpage>25</fpage><lpage>29</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/75556</pubid><pubid idtype="pmpid" link="fulltext">10802651</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>InterProScan-an integration platform for the signature-recognition methods in InterPro</p></title><aug><au><snm>Zdobnov</snm><fnm>E</fnm></au><au><snm>Apweiler</snm><fnm>R</fnm></au></aug><source>Bioinformatics</source><pubdate>2001</pubdate><volume>17</volume><issue>9</issue><fpage>847</fpage><lpage>848</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/17.9.847</pubid><pubid idtype="pmpid" link="fulltext">11590104</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>The EMBL nucleotide sequence 
database</p></title><aug><au><snm>Stoesser</snm><fnm>G</fnm></au><au><snm>Tuli</snm><fnm>M</fnm></au><au><snm>Lopez</snm><fnm>R</fnm></au><au><snm>Sterk</snm><fnm>P</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>1999</pubdate><volume>27</volume><issue>1</issue><fpage>18</fpage><lpage>24</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/27.1.18</pubid><pubid idtype="pmcid">148088</pubid><pubid idtype="pmpid" link="fulltext">9847133</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>UniProt: the universal protein knowledgebase</p></title><aug><au><snm>Apweiler</snm><fnm>R</fnm></au><au><snm>Bairoch</snm><fnm>A</fnm></au><au><snm>Wu</snm><fnm>C</fnm></au><au><snm>Barker</snm><fnm>W</fnm></au><au><snm>Boeckmann</snm><fnm>B</fnm></au><au><snm>Ferro</snm><fnm>S</fnm></au><au><snm>Gasteiger</snm><fnm>E</fnm></au><au><snm>Huang</snm><fnm>H</fnm></au><au><snm>Lopez</snm><fnm>R</fnm></au><au><snm>Magrane</snm><fnm>M</fnm></au><etal/></aug><source>Nucleic Acids Research</source><pubdate>2004</pubdate><issue>32 Database</issue><fpage>D115</fpage><lpage>D119</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkh131</pubid><pubid idtype="pmcid">308865</pubid><pubid idtype="pmpid" link="fulltext">14681372</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins</p></title><aug><au><snm>Pruitt</snm><fnm>K</fnm></au><au><snm>Tatusova</snm><fnm>T</fnm></au><au><snm>Maglott</snm><fnm>D</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2006</pubdate><issue>00 Database</issue><fpage>D1</fpage><lpage>D5</lpage></bibl><bibl id="B25"><title><p>TRANSFAC: an integrated system for gene expression 
regulation</p></title><aug><au><snm>Wingender</snm><fnm>E</fnm></au><au><snm>Chen</snm><fnm>X</fnm></au><au><snm>Hehl</snm><fnm>R</fnm></au><au><snm>Karas</snm><fnm>H</fnm></au><au><snm>Liebich</snm><fnm>I</fnm></au><au><snm>Matys</snm><fnm>V</fnm></au><au><snm>Meinhardt</snm><fnm>T</fnm></au><au><snm>Prub</snm><fnm>M</fnm></au><au><snm>Reuter</snm><fnm>I</fnm></au><au><snm>Schacherer</snm><fnm>F</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2000</pubdate><volume>28</volume><issue>1</issue><fpage>316</fpage><lpage>319</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/28.1.316</pubid><pubid idtype="pmcid">102445</pubid><pubid idtype="pmpid" link="fulltext">10592259</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>The Ensembl genome database project</p></title><aug><au><snm>Hubbard</snm><fnm>T</fnm></au><au><snm>Barker</snm><fnm>D</fnm></au><au><snm>Birney</snm><fnm>E</fnm></au><au><snm>Cameron</snm><fnm>G</fnm></au><au><snm>Chen</snm><fnm>Y</fnm></au><au><snm>Clark</snm><fnm>L</fnm></au><au><snm>Cox</snm><fnm>T</fnm></au><au><snm>Cuff</snm><fnm>J</fnm></au><au><snm>Curwen</snm><fnm>V</fnm></au><au><snm>Down</snm><fnm>T</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2002</pubdate><volume>30</volume><issue>1</issue><fpage>38</fpage><lpage>41</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/30.1.38</pubid><pubid idtype="pmcid">99161</pubid><pubid idtype="pmpid" link="fulltext">11752248</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>The Pfam protein families database</p></title><aug><au><snm>Bateman</snm><fnm>A</fnm></au><au><snm>Coin</snm><fnm>L</fnm></au><au><snm>Durbin</snm><fnm>R</fnm></au><au><snm>Finn</snm><fnm>R</fnm></au><au><snm>Hollich</snm><fnm>V</fnm></au><au><snm>Griffiths-Jones</snm><fnm>S</fnm></au><au><snm>Khanna</snm><fnm>A</fnm></au><au><snm>Marshall</snm><fnm>M</fnm></au><au><snm>Moxon</snm><fnm>S</fnm></au><au><snm>Sonnhammer</snm><fnm>E</fnm></au></aug><source>Nucleic 
Acids Research</source><pubdate>2004</pubdate><volume>32</volume><issue>1</issue><fpage>276</fpage><lpage>280</lpage><xrefbib><pubid idtype="doi">10.1093/nar/gkh121</pubid></xrefbib></bibl><bibl id="B28"><title><p>The SUPERFAMILY database in 2004: additions and improvements</p></title><aug><au><snm>Madera</snm><fnm>M</fnm></au><au><snm>Vogel</snm><fnm>C</fnm></au><au><snm>Kummerfeld</snm><fnm>S</fnm></au><au><snm>Chothia</snm><fnm>C</fnm></au><au><snm>Gough</snm><fnm>J</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2004</pubdate><issue>32 Database</issue><fpage>D235</fpage><lpage>D239</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkh117</pubid><pubid idtype="pmcid">308851</pubid><pubid idtype="pmpid" link="fulltext">14681402</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Structural genomics: beyond the human genome project</p></title><aug><au><snm>Burley</snm><fnm>S</fnm></au><au><snm>Almo</snm><fnm>S</fnm></au><au><snm>Bonanno</snm><fnm>J</fnm></au><au><snm>Capel</snm><fnm>M</fnm></au><au><snm>Chance</snm><fnm>M</fnm></au><au><snm>Gaasterland</snm><fnm>T</fnm></au><au><snm>Lin</snm><fnm>D</fnm></au><au><snm>Sali</snm><fnm>A</fnm></au><au><snm>Studier</snm><fnm>F</fnm></au><au><snm>Swaminathan</snm><fnm>S</fnm></au></aug><source>Nature Genetics</source><pubdate>1999</pubdate><volume>23</volume><fpage>151</fpage><lpage>158</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/13783</pubid><pubid idtype="pmpid" link="fulltext">10508510</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>An overview of structural genomics</p></title><aug><au><snm>Burley</snm><fnm>S</fnm></au></aug><source>Nature Structural &amp; Molecular Biology</source><pubdate>2000</pubdate><volume>7</volume><fpage>932</fpage><lpage>934</lpage><xrefbib><pubid idtype="doi">10.1038/80697</pubid></xrefbib></bibl><bibl id="B31"><title><p>The protein data 
bank</p></title><aug><au><snm>Berman</snm><fnm>H</fnm></au><au><snm>Westbrook</snm><fnm>J</fnm></au><au><snm>Feng</snm><fnm>Z</fnm></au><au><snm>Gilliland</snm><fnm>G</fnm></au><au><snm>Bhat</snm><fnm>T</fnm></au><au><snm>Weissig</snm><fnm>H</fnm></au><au><snm>Shindyalov</snm><fnm>I</fnm></au><au><snm>Bourne</snm><fnm>P</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2000</pubdate><volume>28</volume><issue>1</issue><fpage>235</fpage><lpage>242</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/28.1.235</pubid><pubid idtype="pmcid">102472</pubid><pubid idtype="pmpid" link="fulltext">10592235</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003</p></title><aug><au><snm>Boeckmann</snm><fnm>B</fnm></au><au><snm>Bairoch</snm><fnm>A</fnm></au><au><snm>Apweiler</snm><fnm>R</fnm></au><au><snm>Blatter</snm><fnm>M</fnm></au><au><snm>Estreicher</snm><fnm>A</fnm></au><au><snm>Gasteiger</snm><fnm>E</fnm></au><au><snm>Martin</snm><fnm>M</fnm></au><au><snm>Michoud</snm><fnm>K</fnm></au><au><snm>O'Donovan</snm><fnm>C</fnm></au><au><snm>Phan</snm><fnm>I</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2003</pubdate><volume>31</volume><issue>1</issue><fpage>365</fpage><lpage>370</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg095</pubid><pubid idtype="pmcid">165542</pubid><pubid idtype="pmpid" link="fulltext">12520024</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and 
community</p></title><aug><au><snm>Rhee</snm><fnm>S</fnm></au><au><snm>Beavis</snm><fnm>W</fnm></au><au><snm>Berardini</snm><fnm>T</fnm></au><au><snm>Chen</snm><fnm>G</fnm></au><au><snm>Dixon</snm><fnm>D</fnm></au><au><snm>Doyle</snm><fnm>A</fnm></au><au><snm>Garcia-Hernandez</snm><fnm>M</fnm></au><au><snm>Huala</snm><fnm>E</fnm></au><au><snm>Lander</snm><fnm>G</fnm></au><au><snm>Montoya</snm><fnm>M</fnm></au><etal/></aug><source>Nucleic Acids Research</source><pubdate>2003</pubdate><issue>31</issue><fpage>224</fpage><lpage>228</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg076</pubid><pubid idtype="pmcid">165523</pubid><pubid idtype="pmpid" link="fulltext">12519987</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>SMART 5: domains in the context of genomes and networks</p></title><aug><au><snm>Letunic</snm><fnm>I</fnm></au><au><snm>Copley</snm><fnm>R</fnm></au><au><snm>Pils</snm><fnm>B</fnm></au><au><snm>Pinkert</snm><fnm>S</fnm></au><au><snm>Schultz</snm><fnm>J</fnm></au><au><snm>Bork</snm><fnm>P</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2006</pubdate><issue>34 Database</issue><fpage>D257</fpage><lpage>D260</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkj079</pubid><pubid idtype="pmcid">1347442</pubid><pubid idtype="pmpid" link="fulltext">16381859</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>KEGG for linking genomes to life and the environment</p></title><aug><au><snm>Kanehisa</snm><fnm>M</fnm></au><au><snm>Araki</snm><fnm>M</fnm></au><au><snm>Goto</snm><fnm>S</fnm></au><au><snm>Hattori</snm><fnm>M</fnm></au><au><snm>Hirakawa</snm><fnm>M</fnm></au><au><snm>Itoh</snm><fnm>M</fnm></au><au><snm>Katayama</snm><fnm>T</fnm></au><au><snm>Kawashima</snm><fnm>S</fnm></au><au><snm>Okuda</snm><fnm>S</fnm></au><au><snm>Tokimatsu</snm><fnm>T</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2008</pubdate><issue>36 
Database</issue><fpage>D480</fpage><lpage>D484</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2238879</pubid><pubid idtype="pmpid" link="fulltext">18077471</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>PRINTS and PRINTS-S shed light on protein ancestry</p></title><aug><au><snm>Attwood</snm><fnm>T</fnm></au><au><snm>Blythe</snm><fnm>M</fnm></au><au><snm>Flower</snm><fnm>D</fnm></au><au><snm>Gaulton</snm><fnm>A</fnm></au><au><snm>Mabey</snm><fnm>J</fnm></au><au><snm>Maudling</snm><fnm>N</fnm></au><au><snm>McGregor</snm><fnm>L</fnm></au><au><snm>Mitchell</snm><fnm>A</fnm></au><au><snm>Moulton</snm><fnm>G</fnm></au><au><snm>Paine</snm><fnm>K</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2002</pubdate><volume>30</volume><issue>1</issue><fpage>239</fpage><lpage>241</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/30.1.239</pubid><pubid idtype="pmcid">99143</pubid><pubid idtype="pmpid" link="fulltext">11752304</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Toward an online repository of standard operating procedures (SOPs) for (Meta) genomic annotation</p></title><aug><au><snm>Angiuoli</snm><fnm>S</fnm></au><au><snm>Gussman</snm><fnm>A</fnm></au><au><snm>Klimke</snm><fnm>W</fnm></au><au><snm>Cochrane</snm><fnm>G</fnm></au><au><snm>Field</snm><fnm>D</fnm></au><au><snm>Garrity</snm><fnm>G</fnm></au><au><snm>Kodira</snm><fnm>C</fnm></au><au><snm>Kyrpides</snm><fnm>N</fnm></au><au><snm>Madupu</snm><fnm>R</fnm></au><au><snm>Markowitz</snm><fnm>V</fnm></au></aug><source>OMICS: A Journal of Integrative Biology</source><pubdate>2008</pubdate><volume>12</volume><issue>2</issue><fpage>137</fpage><lpage>141</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1089/omi.2008.0017</pubid><pubid idtype="pmpid" link="fulltext">18416670</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>InterPro: An integrated documentation resource for protein families, domains and functional 
sites</p></title><aug><au><snm>Mulder</snm><fnm>N</fnm></au><au><snm>Apweiler</snm><fnm>R</fnm></au><au><snm>Attwood</snm><fnm>T</fnm></au><au><snm>Bairoch</snm><fnm>A</fnm></au><au><snm>Bateman</snm><fnm>A</fnm></au><au><snm>Binns</snm><fnm>D</fnm></au><au><snm>Biswas</snm><fnm>M</fnm></au><au><snm>Bradley</snm><fnm>P</fnm></au><au><snm>Bork</snm><fnm>P</fnm></au><au><snm>Bucher</snm><fnm>P</fnm></au></aug><source>Briefings in Bioinformatics</source><pubdate>2002</pubdate><volume>3</volume><issue>3</issue><fpage>225</fpage><lpage>235</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bib/3.3.225</pubid><pubid idtype="pmpid" link="fulltext">12230031</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>The PROSITE database</p></title><aug><au><snm>Hulo</snm><fnm>N</fnm></au><au><snm>Bairoch</snm><fnm>A</fnm></au><au><snm>Bulliard</snm><fnm>V</fnm></au><au><snm>Cerutti</snm><fnm>L</fnm></au><au><snm>De Castro</snm><fnm>E</fnm></au><au><snm>Langendijk-Genevaux</snm><fnm>P</fnm></au><au><snm>Pagni</snm><fnm>M</fnm></au><au><snm>Sigrist</snm><fnm>C</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2006</pubdate><issue>34 Database</issue><fpage>D227</fpage><lpage>D230</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkj063</pubid><pubid idtype="pmcid">1347426</pubid><pubid idtype="pmpid" link="fulltext">16381852</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>Computational inference of homologous gene structures in the human genome</p></title><aug><au><snm>Yeh</snm><fnm>R</fnm></au><au><snm>Lim</snm><fnm>L</fnm></au><au><snm>Burge</snm><fnm>C</fnm></au></aug><source>Genome research</source><pubdate>2001</pubdate><volume>11</volume><issue>5</issue><fpage>803</fpage><lpage>816</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.175701</pubid><pubid idtype="pmcid">311055</pubid><pubid idtype="pmpid" link="fulltext">11337476</pubid></pubidlist></xrefbib></bibl><bibl 
id="B41"><title><p>FgenesH</p></title><url>http://linux1.softberry.com/berry.phtml</url></bibl><bibl id="B42"><title><p>Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies</p></title><aug><au><snm>Haas</snm><fnm>B</fnm></au><au><snm>Delcher</snm><fnm>A</fnm></au><au><snm>Mount</snm><fnm>S</fnm></au><au><snm>Wortman</snm><fnm>J</fnm></au><au><snm>Smith</snm><fnm>R</fnm><suf>Jr</suf></au><au><snm>Hannick</snm><fnm>L</fnm></au><au><snm>Maiti</snm><fnm>R</fnm></au><au><snm>Ronning</snm><fnm>C</fnm></au><au><snm>Rusch</snm><fnm>D</fnm></au><au><snm>Town</snm><fnm>C</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2003</pubdate><volume>31</volume><issue>19</issue><fpage>5654</fpage><lpage>5666</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg770</pubid><pubid idtype="pmcid">206470</pubid><pubid idtype="pmpid" link="fulltext">14500829</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>PRINTS-S: the database formerly known as PRINTS</p></title><aug><au><snm>Attwood</snm><fnm>T</fnm></au><au><snm>Croning</snm><fnm>M</fnm></au><au><snm>Flower</snm><fnm>D</fnm></au><au><snm>Lewis</snm><fnm>A</fnm></au><au><snm>Mabey</snm><fnm>J</fnm></au><au><snm>Scordis</snm><fnm>P</fnm></au><au><snm>Selley</snm><fnm>J</fnm></au><au><snm>Wright</snm><fnm>W</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2000</pubdate><volume>28</volume><issue>1</issue><fpage>225</fpage><lpage>227</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/28.1.225</pubid><pubid idtype="pmcid">102408</pubid><pubid idtype="pmpid" link="fulltext">10592232</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>Recent improvements of the ProDom database of protein domain families</p></title><aug><au><snm>Corpet</snm><fnm>F</fnm></au><au><snm>Gouzy</snm><fnm>J</fnm></au><au><snm>Kahn</snm><fnm>D</fnm></au></aug><source>Nucleic Acids 
Research</source><pubdate>1999</pubdate><volume>27</volume><issue>1</issue><fpage>263</fpage><lpage>267</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/27.1.263</pubid><pubid idtype="pmcid">148152</pubid><pubid idtype="pmpid" link="fulltext">9847197</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>TIGRFAMs: a protein family resource for the functional identification of proteins</p></title><aug><au><snm>Haft</snm><fnm>D</fnm></au><au><snm>Loftus</snm><fnm>B</fnm></au><au><snm>Richardson</snm><fnm>D</fnm></au><au><snm>Yang</snm><fnm>F</fnm></au><au><snm>Eisen</snm><fnm>J</fnm></au><au><snm>Paulsen</snm><fnm>I</fnm></au><au><snm>White</snm><fnm>O</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2001</pubdate><volume>29</volume><issue>1</issue><fpage>41</fpage><lpage>43</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/29.1.41</pubid><pubid idtype="pmcid">29844</pubid><pubid idtype="pmpid" link="fulltext">11125044</pubid></pubidlist></xrefbib></bibl><bibl id="B46"><title><p>Protein family classification and functional annotation</p></title><aug><au><snm>Wu</snm><fnm>C</fnm></au><au><snm>Huang</snm><fnm>H</fnm></au><au><snm>Yeh</snm><fnm>L</fnm></au><au><snm>Barker</snm><fnm>W</fnm></au></aug><source>Computational Biology and Chemistry</source><pubdate>2003</pubdate><volume>27</volume><issue>1</issue><fpage>37</fpage><lpage>47</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1476-9271(02)00098-1</pubid><pubid idtype="pmpid" link="fulltext">12798038</pubid></pubidlist></xrefbib></bibl><bibl id="B47"><title><p>Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure</p></title><aug><au><snm>Gough</snm><fnm>J</fnm></au><au><snm>Karplus</snm><fnm>K</fnm></au><au><snm>Hughey</snm><fnm>R</fnm></au><au><snm>Chothia</snm><fnm>C</fnm></au></aug><source>Journal of Molecular 
Biology</source><pubdate>2001</pubdate><volume>313</volume><issue>4</issue><fpage>903</fpage><lpage>919</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/jmbi.2001.5080</pubid><pubid idtype="pmpid" link="fulltext">11697912</pubid></pubidlist></xrefbib></bibl><bibl id="B48"><title><p>Gene3D: structural assignment for whole genes and genomes using the CATH domain structure database</p></title><aug><au><snm>Buchan</snm><fnm>D</fnm></au><au><snm>Shepherd</snm><fnm>A</fnm></au><au><snm>Lee</snm><fnm>D</fnm></au><au><snm>Pearl</snm><fnm>F</fnm></au><au><snm>Rison</snm><fnm>S</fnm></au><au><snm>Thornton</snm><fnm>J</fnm></au><au><snm>Orengo</snm><fnm>C</fnm></au></aug><source>Genome Research</source><pubdate>2002</pubdate><volume>12</volume><issue>3</issue><fpage>503</fpage><lpage>514</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.213802</pubid><pubid idtype="pmcid">155287</pubid><pubid idtype="pmpid" link="fulltext">11875040</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>The PANTHER database of protein families, subfamilies, functions and pathways</p></title><aug><au><snm>Mi</snm><fnm>H</fnm></au><au><snm>Lazareva-Ulitsky</snm><fnm>B</fnm></au><au><snm>Loo</snm><fnm>R</fnm></au><au><snm>Kejariwal</snm><fnm>A</fnm></au><au><snm>Vandergriff</snm><fnm>J</fnm></au><au><snm>Rabkin</snm><fnm>S</fnm></au><au><snm>Guo</snm><fnm>N</fnm></au><au><snm>Muruganujan</snm><fnm>A</fnm></au><au><snm>Doremieux</snm><fnm>O</fnm></au><au><snm>Campbell</snm><fnm>M</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2005</pubdate><issue>33 Database</issue><fpage>D284</fpage><lpage>D288</lpage><xrefbib><pubidlist><pubid idtype="pmcid">540032</pubid><pubid idtype="pmpid" link="fulltext">15608197</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in 
UniProtKB/Swiss-Prot</p></title><aug><au><snm>Lima</snm><fnm>T</fnm></au><au><snm>Auchincloss</snm><fnm>A</fnm></au><au><snm>Coudert</snm><fnm>E</fnm></au><au><snm>Keller</snm><fnm>G</fnm></au><au><snm>Michoud</snm><fnm>K</fnm></au><au><snm>Rivoire</snm><fnm>C</fnm></au><au><snm>Bulliard</snm><fnm>V</fnm></au><au><snm>de Castro</snm><fnm>E</fnm></au><au><snm>Lachaize</snm><fnm>C</fnm></au><au><snm>Baratin</snm><fnm>D</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2009</pubdate><issue>37 Database</issue><fpage>D471</fpage><lpage>D478</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn661</pubid><pubid idtype="pmcid">2686602</pubid><pubid idtype="pmpid" link="fulltext">18849571</pubid></pubidlist></xrefbib></bibl><bibl id="B51"><title><p>PlnTFDB: an integrative plant transcription factor database</p></title><aug><au><snm>Riano-Pachon</snm><fnm>D</fnm></au><au><snm>Ruzicic</snm><fnm>S</fnm></au><au><snm>Dreyer</snm><fnm>I</fnm></au><au><snm>Mueller-Roeber</snm><fnm>B</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2007</pubdate><volume>8</volume><issue>1</issue><fpage>42</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-8-42</pubid><pubid idtype="pmcid">1802092</pubid><pubid idtype="pmpid" link="fulltext">17286856</pubid></pubidlist></xrefbib></bibl><bibl id="B52"><title><p>A community resource for high-throughput quantitative RT-PCR analysis of transcription factor gene expression in Medicago truncatula</p></title><aug><au><snm>Kakar</snm><fnm>K</fnm></au><au><snm>Wandrey</snm><fnm>M</fnm></au><au><snm>Czechowski</snm><fnm>T</fnm></au><au><snm>Gaertner</snm><fnm>T</fnm></au><au><snm>Scheible</snm><fnm>W</fnm></au><au><snm>Stitt</snm><fnm>M</fnm></au><au><snm>Torres-Jerez</snm><fnm>I</fnm></au><au><snm>Xiao</snm><fnm>Y</fnm></au><au><snm>Redman</snm><fnm>J</fnm></au><au><snm>Wu</snm><fnm>H</fnm></au></aug><source>Plant 
Methods</source><pubdate>2008</pubdate><volume>4</volume><issue>1</issue><fpage>18</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1746-4811-4-18</pubid><pubid idtype="pmcid">2490690</pubid><pubid idtype="pmpid" link="fulltext">18611268</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>MUSCLE: multiple sequence alignment with high accuracy and high throughput</p></title><aug><au><snm>Edgar</snm><fnm>R</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2004</pubdate><volume>32</volume><issue>5</issue><fpage>1792</fpage><lpage>1797</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkh340</pubid><pubid idtype="pmcid">390337</pubid><pubid idtype="pmpid" link="fulltext">15034147</pubid></pubidlist></xrefbib></bibl><bibl id="B54"><title><p>A multi-template combination algorithm for protein comparative modeling</p></title><aug><au><snm>Cheng</snm><fnm>J</fnm></au></aug><source>BMC Structural Biology</source><pubdate>2008</pubdate><volume>8</volume><issue>1</issue><fpage>18</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1472-6807-8-18</pubid><pubid idtype="pmcid">2311309</pubid><pubid idtype="pmpid" link="fulltext">18366648</pubid></pubidlist></xrefbib></bibl><bibl id="B55"><title><p>CASP8 Group Performance</p></title><url>http://www.predictioncenter.org/casp8/results.cgi</url></bibl><bibl id="B56"><title><p>LGA: a method for finding 3D similarities in protein structures</p></title><aug><au><snm>Zemla</snm><fnm>A</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2003</pubdate><volume>31</volume><issue>13</issue><fpage>3370</fpage><lpage>3374</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg571</pubid><pubid idtype="pmcid">168977</pubid><pubid idtype="pmpid" link="fulltext">12824330</pubid></pubidlist></xrefbib></bibl><bibl id="B57"><title><p>Using electrostatic potentials to predict DNA-binding sites on DNA-binding 
proteins</p></title><aug><au><snm>Jones</snm><fnm>S</fnm></au><au><snm>Shanahan</snm><fnm>H</fnm></au><au><snm>Berman</snm><fnm>H</fnm></au><au><snm>Thornton</snm><fnm>J</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2003</pubdate><volume>31</volume><issue>24</issue><fpage>7189</fpage><lpage>7198</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg922</pubid><pubid idtype="pmcid">291864</pubid><pubid idtype="pmpid" link="fulltext">14654694</pubid></pubidlist></xrefbib></bibl><bibl id="B58"><title><p>Jmol: an open-source Java viewer for chemical structures in 3D</p></title><url>http://jmol.sourceforge.net/</url></bibl><bibl id="B59"><title><p>PDP: protein domain parser</p></title><aug><au><snm>Alexandrov</snm><fnm>N</fnm></au><au><snm>Shindyalov</snm><fnm>I</fnm></au></aug><publisher>Oxford Univ Press</publisher><pubdate>2003</pubdate><volume>19</volume><fpage>429</fpage><lpage>430</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">12584135</pubid></xrefbib></bibl><bibl id="B60"><title><p>DOMAC: an accurate, hybrid protein domain prediction server</p></title><aug><au><snm>Cheng</snm><fnm>J</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2007</pubdate><issue>35 Web Server</issue><fpage>W354</fpage><lpage>W356</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkm390</pubid><pubid idtype="pmcid">1933197</pubid><pubid idtype="pmpid" link="fulltext">17553833</pubid></pubidlist></xrefbib></bibl><bibl id="B61"><title><p>WebLogo: a sequence logo generator</p></title><aug><au><snm>Crooks</snm><fnm>G</fnm></au><au><snm>Hon</snm><fnm>G</fnm></au><au><snm>Chandonia</snm><fnm>J</fnm></au><au><snm>Brenner</snm><fnm>S</fnm></au></aug><source>Genome Research</source><pubdate>2004</pubdate><volume>14</volume><issue>6</issue><fpage>1188</fpage><lpage>1190</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.849004</pubid><pubid idtype="pmcid">419797</pubid><pubid idtype="pmpid" 
link="fulltext">15173120</pubid></pubidlist></xrefbib></bibl><bibl id="B62"><title><p>GenBank</p></title><aug><au><snm>Benson</snm><fnm>D</fnm></au><au><snm>Boguski</snm><fnm>M</fnm></au><au><snm>Lipman</snm><fnm>D</fnm></au><au><snm>Ostell</snm><fnm>J</fnm></au><au><snm>Ouellette</snm><fnm>B</fnm></au><au><snm>Rapp</snm><fnm>B</fnm></au><au><snm>Wheeler</snm><fnm>D</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>1999</pubdate><volume>27</volume><issue>1</issue><fpage>12</fpage><lpage>17</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/27.1.12</pubid><pubid idtype="pmcid">148087</pubid><pubid idtype="pmpid" link="fulltext">9847132</pubid></pubidlist></xrefbib></bibl><bibl id="B63"><title><p>Large-scale analysis of putative soybean regulatory gene expression identifies a Myb gene involved in soybean nodule development</p></title><aug><au><snm>Libault</snm><fnm>M</fnm></au><au><snm>Joshi</snm><fnm>T</fnm></au><au><snm>Takahashi</snm><fnm>K</fnm></au><au><snm>Hurley-Sommer</snm><fnm>A</fnm></au><au><snm>Puricelli</snm><fnm>K</fnm></au><au><snm>Blake</snm><fnm>S</fnm></au><au><snm>Xu</snm><fnm>D</fnm></au><au><snm>Nguyen</snm><fnm>H</fnm></au><au><snm>Stacey</snm><fnm>G</fnm></au></aug><source>Plant Physiology</source><pubdate>2009</pubdate><volume>151</volume><fpage>1207</fpage><lpage>1220</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1104/pp.109.144030</pubid><pubid idtype="pmcid">2773063</pubid><pubid idtype="pmpid" link="fulltext">19755542</pubid></pubidlist></xrefbib></bibl><bibl id="B64"><title><p>Gene duplication and paleopolyploidy in soybean and the implications for whole genome 
sequencing</p></title><aug><au><snm>Schlueter</snm><fnm>J</fnm></au><au><snm>Lin</snm><fnm>J</fnm></au><au><snm>Schlueter</snm><fnm>S</fnm></au><au><snm>Vasylenko-Sanders</snm><fnm>I</fnm></au><au><snm>Deshpande</snm><fnm>S</fnm></au><au><snm>Yi</snm><fnm>J</fnm></au><au><snm>O'Bleness</snm><fnm>M</fnm></au><au><snm>Roe</snm><fnm>B</fnm></au><au><snm>Nelson</snm><fnm>R</fnm></au><au><snm>Scheffler</snm><fnm>B</fnm></au></aug><source>BMC Genomics</source><pubdate>2007</pubdate><volume>8</volume><issue>1</issue><fpage>330</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-8-330</pubid><pubid idtype="pmcid">2077340</pubid><pubid idtype="pmpid" link="fulltext">17880721</pubid></pubidlist></xrefbib></bibl><bibl id="B65"><title><p>Mining EST databases to resolve evolutionary events in major crop species</p></title><aug><au><snm>Schlueter</snm><fnm>J</fnm></au><au><snm>Dixon</snm><fnm>P</fnm></au><au><snm>Granger</snm><fnm>C</fnm></au><au><snm>Grant</snm><fnm>D</fnm></au><au><snm>Clark</snm><fnm>L</fnm></au><au><snm>Doyle</snm><fnm>J</fnm></au><au><snm>Shoemaker</snm><fnm>R</fnm></au></aug><source>Genome</source><pubdate>2004</pubdate><volume>47</volume><issue>5</issue><fpage>868</fpage><lpage>876</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1139/g04-047</pubid><pubid idtype="pmpid" link="fulltext">15499401</pubid></pubidlist></xrefbib></bibl><bibl id="B66"><title><p>The Chlamydomonas genome reveals the evolution of key animal and plant 
functions</p></title><aug><au><snm>Merchant</snm><fnm>S</fnm></au><au><snm>Prochnik</snm><fnm>S</fnm></au><au><snm>Vallon</snm><fnm>O</fnm></au><au><snm>Harris</snm><fnm>E</fnm></au><au><snm>Karpowicz</snm><fnm>S</fnm></au><au><snm>Witman</snm><fnm>G</fnm></au><au><snm>Terry</snm><fnm>A</fnm></au><au><snm>Salamov</snm><fnm>A</fnm></au><au><snm>Fritz-Laylin</snm><fnm>L</fnm></au><au><snm>Marechal-Drouard</snm><fnm>L</fnm></au></aug><source>Science</source><pubdate>2007</pubdate><volume>318</volume><issue>5848</issue><fpage>245</fpage><lpage>250</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1143609</pubid><pubid idtype="pmpid" link="fulltext">17932292</pubid></pubidlist></xrefbib></bibl><bibl id="B67"><title><p>The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants</p></title><aug><au><snm>Rensing</snm><fnm>S</fnm></au><au><snm>Lang</snm><fnm>D</fnm></au><au><snm>Zimmer</snm><fnm>A</fnm></au><au><snm>Terry</snm><fnm>A</fnm></au><au><snm>Salamov</snm><fnm>A</fnm></au><au><snm>Shapiro</snm><fnm>H</fnm></au><au><snm>Nishiyama</snm><fnm>T</fnm></au><au><snm>Perroud</snm><fnm>P</fnm></au><au><snm>Lindquist</snm><fnm>E</fnm></au><au><snm>Kamisugi</snm><fnm>Y</fnm></au></aug><source>Science</source><pubdate>2008</pubdate><volume>319</volume><issue>5859</issue><fpage>64</fpage><lpage>69</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1150646</pubid><pubid idtype="pmpid" link="fulltext">18079367</pubid></pubidlist></xrefbib></bibl><bibl id="B68"><title><p>KNOX gene function in plant stem cell niches</p></title><aug><au><snm>Scofield</snm><fnm>S</fnm></au><au><snm>Murray</snm><fnm>J</fnm></au></aug><source>Plant Molecular Biology</source><pubdate>2006</pubdate><volume>60</volume><issue>6</issue><fpage>929</fpage><lpage>946</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s11103-005-4478-y</pubid><pubid idtype="pmpid" link="fulltext">16724262</pubid></pubidlist></xrefbib></bibl><bibl 
id="B69"><title><p>Legume transcription factor genes; what makes legumes so special?</p></title><aug><au><snm>Libault</snm><fnm>M</fnm></au><au><snm>Joshi</snm><fnm>T</fnm></au><au><snm>Benedito</snm><fnm>V</fnm></au><au><snm>Xu</snm><fnm>D</fnm></au><au><snm>Udvardi</snm><fnm>M</fnm></au><au><snm>Stacey</snm><fnm>G</fnm></au></aug><source>Plant Physiology</source><pubdate>2009</pubdate><inpress/></bibl><bibl id="B70"><title><p>Predicting DNA-binding sites of proteins from amino acid sequence</p></title><aug><au><snm>Yan</snm><fnm>C</fnm></au><au><snm>Terribilini</snm><fnm>M</fnm></au><au><snm>Wu</snm><fnm>F</fnm></au><au><snm>Jernigan</snm><fnm>R</fnm></au><au><snm>Dobbs</snm><fnm>D</fnm></au><au><snm>Honavar</snm><fnm>V</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2006</pubdate><volume>7</volume><issue>1</issue><fpage>262</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-7-262</pubid><pubid idtype="pmcid">1534068</pubid><pubid idtype="pmpid" link="fulltext">16712732</pubid></pubidlist></xrefbib></bibl><bibl id="B71"><title><p>BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences</p></title><aug><au><snm>Wang</snm><fnm>L</fnm></au><au><snm>Brown</snm><fnm>S</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2006</pubdate><issue>34 Web Server</issue><fpage>W243</fpage><lpage>W248</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkl298</pubid><pubid idtype="pmcid">1538853</pubid><pubid idtype="pmpid" link="fulltext">16845003</pubid></pubidlist></xrefbib></bibl><bibl id="B72"><title><p>DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins</p></title><aug><au><snm>Hwang</snm><fnm>S</fnm></au><au><snm>Gou</snm><fnm>Z</fnm></au><au><snm>Kuznetsov</snm><fnm>I</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><issue>5</issue><fpage>634</fpage><lpage>636</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1093/bioinformatics/btl672</pubid><pubid idtype="pmpid" link="fulltext">17237068</pubid></pubidlist></xrefbib></bibl><bibl id="B73"><title><p>Solution structure of the B3 DNA binding domain of the Arabidopsis cold-responsive transcription factor RAV1</p></title><aug><au><snm>Yamasaki</snm><fnm>K</fnm></au><au><snm>Kigawa</snm><fnm>T</fnm></au><au><snm>Inoue</snm><fnm>M</fnm></au><au><snm>Tateno</snm><fnm>M</fnm></au><au><snm>Yamasaki</snm><fnm>T</fnm></au><au><snm>Yabuki</snm><fnm>T</fnm></au><au><snm>Aoki</snm><fnm>M</fnm></au><au><snm>Seki</snm><fnm>E</fnm></au><au><snm>Matsuda</snm><fnm>T</fnm></au><au><snm>Tomo</snm><fnm>Y</fnm></au></aug><source>The Plant Cell Online</source><pubdate>2004</pubdate><volume>16</volume><issue>12</issue><fpage>3448</fpage><lpage>3459</lpage><xrefbib><pubid idtype="doi">10.1105/tpc.104.026112</pubid></xrefbib></bibl><bibl id="B74"><title><p>Swiss-PDB viewer (deep view)</p></title><aug><au><snm>Kaplan</snm><fnm>W</fnm></au><au><snm>Littlejohn</snm><fnm>T</fnm></au></aug><source>Briefings in Bioinformatics</source><pubdate>2001</pubdate><volume>2</volume><issue>2</issue><fpage>195</fpage><lpage>197</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bib/2.2.195</pubid><pubid idtype="pmpid" link="fulltext">11465736</pubid></pubidlist></xrefbib></bibl><bibl id="B75"><title><p>The PyMOL molecular graphics system</p></title><url>http://pymol.sourceforge.net/</url></bibl></refgrp>
</bm></art>
