<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-7-116</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Database</dochead>
      <bibl>
         <title>
            <p>angaGEDUCI: <it>Anopheles gambiae </it>gene expression database with integrated comparative algorithms for identifying conserved DNA motifs in promoter sequences</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Dissanayake</snm>
               <mi>N</mi>
               <fnm>Sumudu</fnm>
               <insr iid="I1"/>
               <email>sdissana@uci.edu</email>
            </au>
            <au id="A2">
               <snm>Marinotti</snm>
               <fnm>Osvaldo</fnm>
               <insr iid="I1"/>
               <email>omarinot@uci.edu</email>
            </au>
            <au id="A3">
               <snm>Ribeiro</snm>
               <mi>C</mi>
               <fnm>Jose Marcos</fnm>
               <insr iid="I2"/>
               <email>jribeiro@niaid.nih.gov</email>
            </au>
            <au id="A4" ca="yes">
               <snm>James</snm>
               <mi>A</mi>
               <fnm>Anthony</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>aajames@uci.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Molecular Biology and Biochemistry, University of California, Irvine, CA 92697, USA</p>
            </ins>
            <ins id="I2">
               <p>Laboratory of Malaria and Vector Research, National Institutes of Health (NIH/NIAID), Rockville, MD 20852, USA</p>
            </ins>
            <ins id="I3">
               <p>Department of Microbiology and Molecular Genetics, University of California, Irvine, CA 92697, USA</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>116</fpage>
         <url>http://www.biomedcentral.com/1471-2164/7/116</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16707020</pubid>
               <pubid idtype="doi">10.1186/1471-2164-7-116</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>12</day>
               <month>1</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>17</day>
               <month>5</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>17</day>
               <month>5</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Dissanayake et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The completed sequence of the <it>Anopheles gambiae </it>genome has enabled genome-wide analyses of gene expression and regulation in this principal vector of human malaria. These investigations have created a demand for efficient methods of cataloguing and analyzing the large quantities of data that have been produced. The organization of genome-wide data into one unified database makes possible the efficient identification of spatial and temporal patterns of gene expression, and by pairing these findings with comparative algorithms, may offer a tool to gain insight into the molecular mechanisms that regulate these expression patterns.</p>
            </sec>
            <sec>
               <st>
                  <p>Description</p>
               </st>
               <p>We provide a publicly accessible database and integrated data-mining tool, angaGEDUCI, that unifies 1) stage- and tissue-specific microarray analyses of gene expression in <it>An. gambiae </it>at different developmental stages and time points following a bloodmeal, 2) functional gene annotation, 3) genomic sequence data, and 4) promoter sequence comparison algorithms. The database can be used to study genes expressed in particular stages, tissues, and patterns of interest, and to identify conserved promoter sequence motifs that may play a role in the regulation of such expression. The database is accessible from the address <url>http://www.angaged.bio.uci.edu</url>.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>By combining gene expression, function, and sequence data with integrated sequence comparison algorithms, angaGEDUCI streamlines spatial and temporal pattern-finding and produces a straightforward means of developing predictions and designing experiments to assess how gene expression may be controlled at the molecular level.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The sequenced genome of the principal vector of human malaria parasites in sub-Saharan Africa, <it>Anopheles gambiae </it><abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, has raised expectations for the development of new and unexpected ways to manage or manipulate vector populations to control disease transmission <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. As part of efforts to meet these expectations, we generated and organized large data sets using gene expression microarrays to quantify genome-wide transcription in different developmental stages and tissues of this mosquito <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Arrangement of these data into a searchable format has streamlined the elucidation of genes expressed with stage-, tissue-, and sex-specificity. In addition, by juxtaposing these microarray findings with DNA comparative algorithms, the regulation of genes co-ordinately expressed in specific spatial and temporal patterns can be studied at a mechanistic level. We provide here a public database and web-based data-mining tool that combine stage and tissue expression microarray data, functional annotation, and regulatory DNA sequence comparison algorithms to provide insight into gene expression and regulation in <it>An. gambiae</it>.</p>
      </sec>
      <sec>
         <st>
            <p>Construction and content</p>
         </st>
         <sec>
            <st>
               <p>Data collection</p>
            </st>
            <p>Stage-specific transcriptional signal values were imported from genome-wide microarray analyses of <it>An. gambiae </it>larvae, male sugar-fed adults, female sugar-fed adults, and female blood-fed adults 3, 24, 48, 72, and 96 hours and 15 days after a bloodmeal using Affymetrix GCOS software. Values from tissue-specific microarray analyses also were imported using GCOS to quantify genome-wide transcription in fat bodies, midgut, and ovaries at 24 hours after bloodfeeding <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Functional gene annotation was imported from the Ano-Xcel database <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> to populate angaGEDUCI with keywords and annotation from the ENSEMBL, NCBI non-redundant, GO, PFAM, and SMART databases. Promoter sequences were selected as regions 1.5 kilobases (kb) in length adjacent to the 5'-ends of transcription start sites of genes using genomic data from ENSEMBL (Assembly: AgamP3, Feb 2006; Genebuild: VectorBase, Feb 2006; Database version: 37.3). Transcription factor binding sites from several classes of organisms were imported from the Transcription Factors Database (TFD) available publicly at <url>ftp://ftp.ncbi.nih.gov/repository/TFD/datasets/</url>. Of the 7,066 sites listed in TFD, 6,639 (94.0%) are eight nucleotides or longer and 623 (8.82%) contain degenerate notation. Five hundred and eleven sites in the database were identified in insects (7.23%), of which 499 (97.7%) are eight nucleotides or longer, and 34 (6.65%) contain degeneracy.</p>
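            <p>The promoter selection described above can be illustrated with a minimal Python sketch. It is not the code used to build angaGEDUCI: the function names, the 0-based coordinate convention, and the strand handling (reverse-complementing the bases that follow the start site on the forward strand for minus-strand genes) are assumptions made for illustration only.</p>
            <p>
```python
PROMOTER_LENGTH = 1500  # 1.5 kb, the region length stated in the text

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_region(chromosome, tss, strand, length=PROMOTER_LENGTH):
    """Return up to `length` bases immediately upstream of a start site.

    `chromosome` is the forward-strand sequence and `tss` is the 0-based
    index of the first transcribed base (an assumed convention).
    The region is truncated at the ends of the sequence.
    """
    if strand == "+":
        start = max(0, tss - length)
        return chromosome[start:tss]
    if strand == "-":
        end = min(len(chromosome), tss + 1 + length)
        # Upstream of a minus-strand gene lies to the right on the forward
        # strand, so the slice is reverse-complemented to read 5' to 3'.
        return reverse_complement(chromosome[tss + 1:end])
    raise ValueError("strand must be '+' or '-'")


# Toy sequence, not real genomic data.
toy = "AAAACCCCGGGGTTTT"
print(promoter_region(toy, 8, "+", length=4))  # prints CCCC
print(promoter_region(toy, 7, "-", length=4))  # prints CCCC
```
            </p>
            <p>Returning the minus-strand region in 5' to 3' orientation keeps all promoters directly comparable, which matters when motifs are later matched between genes on opposite strands.</p>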
         </sec>
         <sec>
            <st>
               <p>Implementation</p>
            </st>
            <p>The data have been stored as a MySQL relational database that is accessible directly through an Apache web server. A web-based data mining interface is used to manage queries to identify genes that meet specific expression, keyword, and sequence criteria (Figure <figr fid="F1">1</figr>). A sequence comparison program based on the Boyer-Moore algorithm <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> is built into the data-mining interface for comparison of promoter regions of genes within a selected gene set.</p>
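            <p>To illustrate the family of algorithms cited above, the following Python sketch implements the Horspool simplification of Boyer-Moore search, which keeps only the bad-character shift table and omits the good-suffix rule. It is an illustration under those stated assumptions, not the program built into angaGEDUCI; the function name and the example sequence are hypothetical.</p>
            <p>
```python
def horspool_search(text, pattern):
    """Return the start index of every exact occurrence of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character table: for each character in the pattern (except the
    # final position), the distance from its last occurrence to the end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while n - pos >= m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Slide the pattern according to the text character currently
        # aligned with the last pattern position; unseen characters
        # allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits


# Toy sequence, not real promoter data.
print(horspool_search("GATTACAGATTACA", "TTAC"))  # prints [2, 9]
```
            </p>
            <p>The shift table lets the search skip over stretches of sequence that cannot contain the motif, which is the property that makes this family of algorithms attractive for scanning many promoter regions.</p>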
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Data-mining interface</p>
               </caption>
               <text>
                  <p><b>Data-mining interface</b>. The "Filter database" data-mining interface allows users to select a gene set that meets specific expression, keyword, and sequence criteria. Input fields include a) differential expression quantified from stage- and tissue-specific expression microarray analyses, b) keywords included in functional annotation gathered by Ano-Xcel [5] from the ENSEMBL, NCBI non-redundant, PFAM, GO, and SMART databases, and c) presence of transcription factor binding sites and other conserved DNA sequences contained within promoter, 3' UTR, or coding regions of the <it>An. gambiae </it>genome. Each filter is imposed on the current gene set being examined, beginning with the entire <it>An. gambiae </it>genome, thus selecting and reducing the gene set in a stepwise fashion as genes matching previous filter criteria are eliminated by subsequent filters. The parameters specified here are those that are used in the prophenoloxidase case study described in the text.</p>
               </text>
               <graphic file="1471-2164-7-116-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Data retrieval</p>
            </st>
            <p>The main page of the database provides hyperlinks to: Filter Database, Import Gene Set, Download Data, View Database, Submit Study, Documentation, and Contact. Selection of the Filter Database link opens the data-mining interface and allows users to focus on specific genes that satisfy input criteria based on: 1) stage- and tissue-specific expression, 2) annotated keywords, 3) DNA sequences present in promoter, 3' untranslated regions (UTR), or coding regions, or 4) presence of specific transcription factor binding sites (Figure <figr fid="F1">1</figr>). Queries are conducted by stepwise entry of input criteria with each query imposed on the previous so that all genes currently displayed meet all preceding query criteria as well as the criterion that was last entered. Once a gene set of interest has been selected, users then can use the analysis menu in the interface to search for conserved DNA motifs within the promoters of the gene set, view expression profiles, build a distribution of annotated keywords, or export the set for future retrieval (Figure <figr fid="F2">2</figr>). Detailed annotation and expression data for each gene also can be viewed at any time by selecting the gene identifier link to invoke the description of a gene entry.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Gene set with analysis menu</p>
               </caption>
               <text>
                  <p><b>Gene set with analysis menu</b>. The six transcripts comprising the prophenoloxidase case study gene set, listed by ENSANGT identifiers, are shown in the background. The link to each transcript invokes a gene entry page, an example of which is represented in Figure 3. The analysis drop-down menu allows users to execute a search for conserved DNA sequence motifs in the promoter regions of the six genes in this gene set, build a keyword distribution from the functional annotation of these genes, display expression profiles of genes in the set, export promoter, 3' UTR, or cDNA sequences of the genes in FASTA format, or export the gene set.</p>
               </text>
               <graphic file="1471-2164-7-116-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Description of a gene entry</p>
            </st>
            <p>Each gene has a corresponding data page that can be accessed by selecting the gene identifier link during data retrieval. Gene entry pages display data from microarray expression analyses for stage- and tissue-specific expression and functional annotation as gathered by Ano-Xcel from ENSEMBL, NCBI non-redundant, GO, PFAM, and SMART databases (Figure <figr fid="F3">3</figr>). A link to the VectorBase database that contains additional, centralized gene data also is provided on each entry page. User-contributed notes and a form for sharing notes for a gene entry are found below the annotation of each gene. To encourage data sharing, note submission does not require user pre-registration.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Gene entry for one transcript</p>
               </caption>
               <text>
                  <p><b>Gene entry for one transcript</b>. Complete gene description for one transcript, ENSANGT00000011456. Each entry displays the developmental expression profile built for the transcript from stage- and tissue-specific microarray analyses, followed by a link to VectorBase and functional annotation gathered by Ano-Xcel [5] from the ENSEMBL, NCBI non-redundant, PFAM, GO, and SMART databases. The bottom of each entry includes user-contributed notes if they are available, as well as a form for users to submit their own notes for immediate listing.</p>
               </text>
               <graphic file="1471-2164-7-116-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Comparing promoters to identify conserved DNA sequence motifs</p>
            </st>
            <p>After clustering genes into gene sets that show similar patterns of expression, the data-mining interface analysis menu can be used to search for common DNA motifs that may act as regulatory sequences in coordinating these expression patterns. Two parameters must be selected to begin the analysis: 1) motif match length, the length of the conserved sequence motif to search for, and 2) mismatches, the number of base mismatches allowed between two nearly conserved sequence motifs without disqualification.</p>
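            <p>The effect of the two parameters can be sketched in Python as a brute-force pair-wise comparison of two promoter sequences. This sketch does not reproduce the Boyer-Moore-based program used by angaGEDUCI; the function names are hypothetical, and counting mismatches as the Hamming distance between equal-length windows is an assumption about how "mismatches" is interpreted.</p>
            <p>
```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def conserved_motifs(promoter_a, promoter_b, length, mismatches=0):
    """Return (pos_a, pos_b) pairs of windows that count as conserved.

    A pair is reported when the two windows of `length` bases differ
    at no more than `mismatches` positions.
    """
    windows_b = [
        (j, promoter_b[j:j + length])
        for j in range(len(promoter_b) - length + 1)
    ]
    pairs = []
    for i in range(len(promoter_a) - length + 1):
        window_a = promoter_a[i:i + length]
        for j, window_b in windows_b:
            if mismatches >= hamming(window_a, window_b):
                pairs.append((i, j))
    return pairs


# Toy sequences, not real promoter data.
a = "AAACGTGGG"
b = "TTACGTCCC"
print(conserved_motifs(a, b, length=4))                # prints [(2, 2)]
print(conserved_motifs(a, b, length=4, mismatches=1))  # prints [(1, 1), (2, 2), (3, 3)]
```
            </p>
            <p>In this toy example, allowing one mismatch turns a single exact match into three reported window pairs, which shows why relaxing either parameter tends to increase the number of motifs returned.</p>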
            <p>The resulting output from the analysis contains three parts. First, a comparison matrix is displayed indicating the number of conserved motifs found in each pair-wise comparison among every gene in the gene set (Figure <figr fid="F4">4</figr>). Each link in the matrix invokes a new page that prints the promoter sequences of the two genes being compared with areas of sequence conservation and transcription factor binding sites highlighted (Figure <figr fid="F5">5</figr>). Second, a table of the conserved motifs is displayed that compares the frequency of occurrence of each conserved motif within the gene set against the frequency of each motif in all 1) exons, 2) exons and introns, and 3) promoters within the <it>An. gambiae </it>genome (Figure <figr fid="F6">6</figr>). Each motif that matches or contains a transcription factor binding site is indicated in the same output. The third item displayed is a table indicating the frequency of occurrence of each transcription factor binding site of any size found within the gene set (Figure <figr fid="F7">7</figr>). Due to the degeneracy and varied size of transcription factor binding sites in TFD, the frequencies reported in this table are noticeably higher than those in the conserved motif table that precedes it.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Promoter comparison matrix</p>
               </caption>
               <text>
                  <p><b>Promoter comparison matrix</b>. Each transcript in the current gene set is displayed in a matrix indicating the number of conserved motifs found between each transcript when compared pair-wise with every other transcript within the gene set. The matrix shown corresponds to the prophenoloxidase case study gene set, with the promoter regions of the six transcripts being compared to search for conserved DNA sequence motifs that are 12 nucleotides in length, with no mismatched bases allowed. Each link in the matrix invokes the sequence comparison output shown in Figure 5.</p>
               </text>
               <graphic file="1471-2164-7-116-4"/>
            </fig>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Promoter sequence comparison between the genes encoding two transcripts</p>
               </caption>
               <text>
                  <p><b>Promoter sequence comparison between the genes encoding two transcripts</b>. Abbreviated promoter region for the gene corresponding to one transcript, ENSANGP00000011456, as printed when compared to a second, ENSANGP00000020273. Nucleotides that are part of a conserved DNA sequence motif (of length greater than or equal to the specified motif search length: 12 bp in this example) that is found in both transcripts are indicated in blue. Numbered positions where known transcription factor binding sites occur are highlighted in green.</p>
               </text>
               <graphic file="1471-2164-7-116-5"/>
            </fig>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Conserved DNA sequence motifs in putative promoter regions</p>
               </caption>
               <text>
                  <p><b>Conserved DNA sequence motifs in putative promoter regions</b>. Analysis output from comparing putative promoter regions of the six prophenoloxidase transcripts identified in the case study, searching for conserved DNA sequence motifs that are 12 nucleotides in length with no mismatches allowed. Each conserved DNA sequence (<b>motif</b>) is followed by the number of genes (<b>#genes</b>) within the gene set where this motif was found, the total occurrences of the motif (<b>count</b>), taking into account that some genes may contain multiple instances of a motif, the corresponding frequency (<b>%set</b>) of occurrence of this motif within the current gene set, the frequency of occurrence of the motif within: all cDNAs (<b>%cdna</b>), all genes [including introns] (<b>%gene</b>), and all promoters (<b>%prom</b>), in the <it>An. gambiae </it>genome, and the fold difference between the frequency of occurrence of the motif in this gene set as compared to its frequency in all cDNAs (<b>cdna-fold</b>), all genes (<b>gene-fold</b>), and all promoter regions (<b>prom-fold</b>), in the <it>An. gambiae </it>genome. Each transcription factor binding site that matches or occurs within a conserved motif is indicated (<b>factors</b>), along with the class of organism in which the binding site was described originally. Motifs that do not match or contain a known transcription factor binding site are highlighted in orange. The gene identifiers containing each sequence motif are shown in the last column (<b>genes</b>).</p>
               </text>
               <graphic file="1471-2164-7-116-6"/>
            </fig>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Transcription factor binding sites contained in a gene set</p>
               </caption>
               <text>
                  <p><b>Transcription factor binding sites contained in a gene set</b>. Tabular account of known transcription factor binding sites of any length found within the putative promoter regions of the prophenoloxidase case study gene set. Each factor is indicated (<b>factor name</b>), along with the number of genes in which it is found (<b>#genes</b>), its frequency (<b>%set</b>) within the current gene set as compared to its frequency (<b>%prom</b>) within all promoter regions in the <it>An. gambiae </it>genome, and the difference between the latter two (<b>fold+/-</b>). The transcript identifiers containing each transcription factor binding site are indicated last (<b>genes</b>). Fifteen of the 287 binding sites found in the case study comparison are shown in this abbreviated figure.</p>
               </text>
               <graphic file="1471-2164-7-116-7"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Visualization of transcription profiles</p>
            </st>
            <p>The transcription profiles for a gene set can be viewed in batch by using the analysis menu from the data-mining interface after a gene set has been selected. The resulting graphs print transcriptional expression according to developmental stage: larvae, male sugar-fed adults, female sugar-fed adults, and female blood-fed adults 3, 24, 48, 72, 96 hours and 15 days after a bloodmeal (Figure <figr fid="F8">8</figr>).</p>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Developmental expression profiles</p>
               </caption>
               <text>
                  <p><b>Developmental expression profiles</b>. Gene expression profiles measuring transcriptional signal values from stage-specific microarray analyses of the six prophenoloxidase case study transcripts. The stages shown are larvae (L), male (M), sugar-fed adult female (NBF), and blood-fed adult female 3, 24, 48, 72, 96 hours, and 15 days after bloodmeal (BF3h-BF96h, BF15d).</p>
               </text>
               <graphic file="1471-2164-7-116-8"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Keyword distribution</p>
            </st>
            <p>A keyword distribution listing all keywords found in a gene set, as gathered by Ano-Xcel <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, and their respective frequency of occurrence, can be constructed by using the analysis menu from the data-mining interface (Figure <figr fid="F9">9</figr>).</p>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>Keyword distribution</p>
               </caption>
               <text>
                  <p><b>Keyword distribution</b>. A distribution of keywords gathered by Ano-Xcel [5] from the ENSEMBL, NCBI non-redundant, PFAM, GO, and SMART databases for genes in the prophenoloxidase case study gene set. The number of occurrences corresponds to the number of genes in the gene set that contain the keyword.</p>
               </text>
               <graphic file="1471-2164-7-116-9"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Import gene set</p>
            </st>
            <p>A gene set can be imported by entering a list of gene identifiers in ENSANGG, ENSANGP, ENSANGT, Probeset ID, or Celera form, or by choosing from a list of pre-defined gene sets. Pre-defined gene sets consist of groups of genes that have been linked to similar function or regulation in existing literature (Figure <figr fid="F10">10</figr>). Users can submit gene sets for automatic and immediate listing as a pre-defined gene set from the same page. Gene sets can be exported from the data-mining interface by using the analysis menu.</p>
            <fig id="F10">
               <title>
                  <p>Figure 10</p>
               </title>
               <caption>
                  <p>Pre-defined gene sets</p>
               </caption>
               <text>
                  <p><b>Pre-defined gene sets</b>. The "Import Gene Set" page contains a sample list of pre-defined gene sets as grouped in existing literature. Investigators can use the same page to load a pre-defined set into the data-mining interface for study, or to submit additional sets for immediate listing. A general name is provided for each set (<b>Gene set</b>) along with the name or e-mail address of the user who submitted the set (<b>Author</b>), the number of genes contained in the set (<b>Number of genes</b>), and any details about the set or the literature it was derived from (<b>Details</b>).</p>
               </text>
               <graphic file="1471-2164-7-116-10"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Submit a microarray study</p>
            </st>
            <p>The angaGEDUCI database has the capacity to store and integrate additional Affymetrix microarray studies that examine gene expression in <it>An. gambiae</it>. The Submit Study link provides a short form for uploading microarray data and specifications.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Utility and Discussion</p>
         </st>
         <p>The angaGEDUCI database identifies genes that meet stage- and tissue-specific expression criteria, and incorporates keyword searching and promoter sequence analysis into one unified data-mining tool. A case study best illustrates the utility of this integration. In this example, we will identify genes linked to the complex regulation of phenoloxidase, an enzyme involved in the melanization of invading parasites and micro-organisms as part of invertebrate innate immunity <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. Specifically, we will search for prophenoloxidase genes that are preferentially found in fat bodies and expressed highly three hours after bloodfeeding. Three filters will be used to complete this inquiry (Figure <figr fid="F1">1</figr>). First, a filter selects genes that contain the keyword "prophenoloxidase" in their functional annotation. Eighty-eight of the 13,639 transcripts in the <it>An. gambiae </it>genome contain this keyword. Second, a stage-specific filter identifies 14 of these 88 transcripts that show 5-fold up-regulated expression three hours after bloodfeeding (BF3h) as compared to sugar-fed mosquitoes (NBF). Third, a tissue-specific filter isolates six of these 14 transcripts that are expressed 5-fold higher in fat bodies as compared to their corresponding expression in the midgut and ovaries (Figure <figr fid="F2">2</figr>).</p>
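         <p>The stepwise filtering in this case study can be sketched in Python as follows. The transcript identifiers and signal values are invented, the data structures and function names are hypothetical, and reading "5-fold" as a signal ratio of at least five relative to each comparison sample is an assumption; the database itself is described only as a MySQL database queried through a web interface.</p>
         <p>
```python
from dataclasses import dataclass


@dataclass
class Transcript:
    ident: str
    keywords: set
    stage: dict   # stage label -> signal value
    tissue: dict  # tissue label -> signal value


def has_keyword(word):
    return lambda t: word in t.keywords


def stage_fold(numerator, denominator, fold):
    # Keep transcripts whose signal in one stage is at least `fold`
    # times the signal in the reference stage.
    return lambda t: t.stage[numerator] >= fold * t.stage[denominator]


def tissue_enriched(target, others, fold):
    # Keep transcripts enriched in `target` relative to every other tissue.
    return lambda t: all(t.tissue[target] >= fold * t.tissue[o] for o in others)


def apply_filters(transcripts, filters):
    """Apply each filter to the survivors of the previous one."""
    current = list(transcripts)
    for keep in filters:
        current = [t for t in current if keep(t)]
    return current


# Invented records for illustration only.
records = [
    Transcript("T1", {"prophenoloxidase"}, {"NBF": 100, "BF3h": 600},
               {"fat_body": 900, "midgut": 100, "ovary": 150}),
    Transcript("T2", {"trypsin"}, {"NBF": 50, "BF3h": 900},
               {"fat_body": 800, "midgut": 10, "ovary": 10}),
    Transcript("T3", {"prophenoloxidase"}, {"NBF": 200, "BF3h": 400},
               {"fat_body": 900, "midgut": 100, "ovary": 100}),
    Transcript("T4", {"prophenoloxidase"}, {"NBF": 100, "BF3h": 700},
               {"fat_body": 300, "midgut": 100, "ovary": 50}),
]

selected = apply_filters(records, [
    has_keyword("prophenoloxidase"),
    stage_fold("BF3h", "NBF", 5),
    tissue_enriched("fat_body", ["midgut", "ovary"], 5),
])
print([t.ident for t in selected])  # prints ['T1']
```
         </p>
         <p>Because each filter acts only on the survivors of the previous one, the final set satisfies all three criteria at once, mirroring the reduction from 88 to 14 to six transcripts described above.</p>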
         <p>The analysis menu can be used with this gene set of interest to search for common DNA sequence motifs that occur within the promoter regions of the genes corresponding to these transcripts. Analysis of the promoter regions of the six prophenoloxidase-related genes shows the occurrence of 14 conserved 12-basepair DNA sequence motifs (Figure <figr fid="F6">6</figr>). Of these 14 motifs, 10 match known transcription factor binding sites while the other four do not. Additional motifs of interest can be found by executing the promoter analysis as a search for conserved motifs shorter than 12 nucleotides or by specifying a number of mismatches that may be allowed within a nearly conserved but imperfectly matching motif. Depending on how these parameters are adjusted, the output from the promoter analysis of a gene set may contain more or fewer conserved motifs, as well as a different number of motifs that are or are not matched to known transcription factor binding sites. A survey of the data produced with different specifications of these parameters in the analysis of the prophenoloxidase gene set is included in Figure <figr fid="F11">11</figr> to aid users in choosing parameters that are most appropriate for their particular investigation.</p>
         <fig id="F11">
            <title>
               <p>Figure 11</p>
            </title>
            <caption>
               <p>Promoter analysis results with different parameter specifications</p>
            </caption>
            <text>
               <p><b>Promoter analysis results with different parameter specifications</b>. Different numbers of conserved DNA sequence motifs found by the promoter analysis algorithm when different parameters were specified (x-axis: length in basepairs [<b>bp</b>]; number of mismatches allowed [<b>mm</b>]). Numbers of conserved motifs (y-axis) that match known transcription factor binding sites are shown in green, with motifs that do not match known sites shown in orange.</p>
            </text>
            <graphic file="1471-2164-7-116-11"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>While existing databases may allow individualized searching by expression, keyword, or sequence criteria, it is the unification of these fields that makes angaGEDUCI a unique facilitator of experimental design. The database may be used in many different ways, but perhaps most useful is the ability to use the stage- and tissue-specific expression microarray data to identify genes that are expressed in spatial and temporal patterns of interest and then compare the promoter regions of such genes to investigate putative means of facilitating such expression. The experimentally validated utility of such applications may pave the way for similar investigations into the regulatory role of conserved DNA sequence motifs in other control regions within the genome, such as putative microRNA target sites that may be found in 3' UTRs.</p>
         <p>In addition to its current microarray data based on genome-wide tissue- and stage-specific gene expression, angaGEDUCI has been built with the goal of expanding its scope to house, integrate, and display additional microarray studies of <it>An. gambiae</it>. For example, Affymetrix microarray data from a study investigating gene expression in <it>An. gambiae </it>following infection with <it>Plasmodium falciparum </it>can be integrated with the existing data in the database to produce a clearer picture of how the mosquito responds to parasite challenge at the transcriptional level. This flexibility ensures that angaGEDUCI is capable of growing alongside the increasing quantity of data being produced from other studies. By working closely with VectorBase and other laboratories in this way, it is hoped that angaGEDUCI will act as a catalyst in accelerating the study and understanding of gene expression and regulation in this important and devastating vector of disease.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>The <it>Anopheles gambiae </it>Gene Expression Database at UCI is publicly accessible at <url>http://www.angaged.bio.uci.edu</url>. Questions and comments are welcome and can be submitted through the site.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>SND designed and implemented the website, database, and promoter analysis algorithms and wrote the principal draft of the manuscript. OM assisted in designing the analysis and in editing the manuscript. JMCR captured putative promoter sequences and constructed the AnoXcel database. AAJ assisted in editing the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors thank Dr. Norman Jacobson for his advice and Lynn Olson for help in preparing the manuscript. This work was supported by a grant from the National Institutes of Health (AI29746 to AAJ).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The genome sequence of the malaria mosquito <it>Anopheles gambiae</it></p>
            </title>
            <aug>
               <au>
                  <snm>Holt</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Halpern</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Charlab</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Nusskern</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Wincker</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Ribeiro</snm>
                  <fnm>JMC</fnm>
               </au>
               <au>
                  <snm>Wides</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Loftus</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Yandell</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Majoros</snm>
                  <fnm>WH</fnm>
               </au>
               <au>
                  <snm>Rusch</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Lai</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Kraft</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Anthouard</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Arensburger</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Atkinson</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Baden</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>de Berardinis</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Baldwin</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Benes</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Biedler</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Blass</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bolanos</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Boscus</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Barnstead</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cai</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Center</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Chaturverdi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Christophides</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Chrystal</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Clamp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cravchik</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Curwen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Dana</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Delcher</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Dew</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Evans</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Flanigan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Grundschober-Freimoser</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Friedli</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Gu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Guan</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hillenmeyer</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Hladun</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Hogan</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Hong</snm>
                  <fnm>YS</fnm>
               </au>
               <au>
                  <snm>Hoover</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Jaillon</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Ke</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Kodira</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kokoza</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Koutsos</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Letunic</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Levitsky</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Liang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Lobo</snm>
                  <fnm>NF</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Malek</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>McIntosh</snm>
                  <fnm>TC</fnm>
               </au>
               <au>
                  <snm>Meister</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mobarry</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mongin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>O'Brochta</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Pfannkoch</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Qi</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Regier</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Remington</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shao</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Sharakhova</snm>
                  <fnm>MV</fnm>
               </au>
               <au>
                  <snm>Sitter</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Shetty</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Strong</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Thomasova</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ton</snm>
                  <fnm>LQ</fnm>
               </au>
               <au>
                  <snm>Topalis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Unger</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Walenz</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Woodford</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Wortman</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yao</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zdobnov</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Zhimulev</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Coluzzi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>della Torre</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Louis</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kalush</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Mural</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Broder</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gardner</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Brey</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Weissenbach</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kafatos</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>FH</fnm>
               </au>
               <au>
                  <snm>Hoffman</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>298</volume>
            <fpage>129</fpage>
            <lpage>149</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1076181</pubid>
                  <pubid idtype="pmpid" link="fulltext">12364791</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Arthropod-borne diseases: vector control in the genomics era</p>
            </title>
            <aug>
               <au>
                  <snm>Hill</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Kafatos</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Stansfield</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>FH</fnm>
               </au>
            </aug>
            <source>Nat Rev Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>262</fpage>
            <lpage>268</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrmicro1101</pubid>
                  <pubid idtype="pmpid" link="fulltext">15703759</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Microarray analysis of genes showing variable expression following a bloodmeal in <it>Anopheles gambiae</it></p>
            </title>
            <aug>
               <au>
                  <snm>Marinotti</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>QK</fnm>
               </au>
               <au>
                  <snm>Calvo</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>James</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Ribeiro</snm>
                  <fnm>JMC</fnm>
               </au>
            </aug>
            <source>Insect Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>14</volume>
            <fpage>365</fpage>
            <lpage>373</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2583.2005.00567.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">16033430</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Genome-wide analysis of gene expression in adult <it>Anopheles gambiae</it></p>
            </title>
            <aug>
               <au>
                  <snm>Marinotti</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Calvo</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>QK</fnm>
               </au>
               <au>
                  <snm>Dissanayake</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ribeiro</snm>
                  <fnm>JMC</fnm>
               </au>
               <au>
                  <snm>James</snm>
                  <fnm>AA</fnm>
               </au>
            </aug>
            <source>Insect Mol Biol</source>
            <pubdate>2006</pubdate>
            <volume>15</volume>
            <fpage>1</fpage>
            <lpage>12</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2583.2006.00610.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">16469063</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>AnoXcel: an <it>Anopheles gambiae </it>protein database</p>
            </title>
            <aug>
               <au>
                  <snm>Ribeiro</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Topalis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Louis</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Insect Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>13</volume>
            <fpage>449</fpage>
            <lpage>457</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.0962-1075.2004.00503.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">15373803</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>A fast string searching algorithm</p>
            </title>
            <aug>
               <au>
                  <snm>Boyer</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Communications of the ACM</source>
            <pubdate>1977</pubdate>
            <volume>20</volume>
            <fpage>762</fpage>
            <lpage>772</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1145/359842.359859</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>The prophenoloxidase-activating system in invertebrates</p>
            </title>
            <aug>
               <au>
                  <snm>Cerenius</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>S&#246;derh&#228;ll</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Immunol Rev</source>
            <pubdate>2004</pubdate>
            <volume>198</volume>
            <fpage>116</fpage>
            <lpage>126</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.0105-2896.2004.00116.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">15199959</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Insect immunity and its implication in mosquito-malaria interactions</p>
            </title>
            <aug>
               <au>
                  <snm>Dimopoulos</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Cell Microbiol</source>
            <pubdate>2003</pubdate>
            <volume>5</volume>
            <fpage>3</fpage>
            <lpage>14</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1462-5822.2003.00252.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">12542466</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p><it>Anopheles gambiae </it>genome reannotation through synthesis of <it>ab initio </it>and comparative gene prediction algorithms</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Riehle</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Oduol</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gomez</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Eiglmeier</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ueberheide</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Shabanowitz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hunt</snm>
                  <fnm>DF</fnm>
               </au>
               <au>
                  <snm>Ribeiro</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Vernick</snm>
                  <fnm>KD</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>R24</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2006-7-3-r24</pubid>
                  <pubid idtype="pmpid" link="fulltext">16569258</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
