<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2005-6-13-r113</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Method</dochead>
		<bibl>
			<title>
				<p>A novel approach to identifying regulatory motifs in distantly related genomes</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Van Hellemont</snm>
					<fnm>Ruth</fnm>
					<insr iid="I1"/>
					<email>Ruth.vanhellemont@esat.kuleuven.be</email>
				</au>
				<au id="A2">
					<snm>Monsieurs</snm>
					<fnm>Pieter</fnm>
					<insr iid="I1"/>
					<email>Pieter.monsieurs@esat.kuleuven.be</email>
				</au>
				<au id="A3">
					<snm>Thijs</snm>
					<fnm>Gert</fnm>
					<insr iid="I1"/>
					<email>Gert.thijs@esat.kuleuven.be</email>
				</au>
				<au id="A4">
					<snm>De Moor</snm>
					<fnm>Bart</fnm>
					<insr iid="I1"/>
					<email>Bart.demoor@esat.kuleuven.be</email>
				</au>
				<au id="A5">
					<snm>Van de Peer</snm>
					<fnm>Yves</fnm>
					<insr iid="I2"/>
					<email>Yves.vandepeer@psb.ugent.be</email>
				</au>
				<au id="A6" ca="yes">
					<snm>Marchal</snm>
					<fnm>Kathleen</fnm>
					<insr iid="I1"/>
					<insr iid="I3"/>
					<email>Kathleen.Marchal@biw.kuleuven.be</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>ESAT-SCD, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium</p>
				</ins>
				<ins id="I2">
					<p>Plant Systems Biology, Bioinformatics and Evolutionary Genomics, VIB/Ghent University, Technologiepark 927, 9052 Gent, Belgium</p>
				</ins>
				<ins id="I3">
					<p>Department of Microbial and Molecular Systems, KU Leuven, Kasteelpark Arenberg 20, 3001 Leuven-Heverlee, Belgium</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2005</pubdate>
			<volume>6</volume>
			<issue>13</issue>
			<fpage>R113</fpage>
			<url>http://genomebiology.com/2005/6/13/R113</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">16420672</pubid><pubid idtype="doi">10.1186/gb-2005-6-13-r113</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>31</day>
					<month>5</month>
					<year>2005</year>
				</date>
			</rec>
			<revrec>
				<date>
					<day>22</day>
					<month>8</month>
					<year>2005</year>
				</date>
			</revrec>
			<acc>
				<date>
					<day>1</day>
					<month>12</month>
					<year>2005</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>30</day>
					<month>12</month>
					<year>2005</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2005</year>
			<collab>Van Hellemont et al.; licensee BioMed Central Ltd.</collab>
			<note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<shorttitle>
			<p>Identifying regulatory motifs</p>
		</shorttitle>
		<shortabs>
			<p>A two-step procedure for identifying regulatory motifs in distantly related organisms is described that combines the advantages of sequence alignment and motif detection approaches.</p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<p>Although proven successful in the identification of regulatory motifs, phylogenetic footprinting methods still show some shortcomings. To address these difficulties, which are most apparent when applying phylogenetic footprinting to distantly related organisms, we developed a two-step procedure that combines the advantages of sequence alignment and motif detection approaches. The results on well-studied benchmark datasets indicate that the presented method outperforms other methods when the sequences become either too long or too heterogeneous in size.</p>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Phylogenetic footprinting is a comparative method that uses cross-species sequence conservation to identify new regulatory motifs <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Based on the observation that functional regulatory motifs evolve more slowly than non-functional sequences, the method identifies potential regulatory motifs by detecting conserved regions in orthologous intergenic sequences <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. The comparison of orthologous sequences from multiple genomes is often based on multiple sequence alignment <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp> and several alignment algorithms, such as CLUSTALW <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, DIALIGN <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>, MAVID <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp> and MLAGAN <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, have proven very useful for identifying conserved motifs in closely related higher vertebrate sequences <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. Although the comparison of closely related organisms has proven successful, inclusion of more distantly related species can greatly improve the detection of conserved regulatory motifs. By adding more distantly related sequences, the conserved functional motifs can be more easily distinguished from the often highly variable 'background' sequence. Moreover, this leads to the detection of motifs that have a function in a wider variety of organisms, for example, all vertebrates <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Both Sandelin <it>et al. </it><abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and Woolfe <it>et al. </it><abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, for instance, performed a whole genome comparison of human and pufferfish, which diverged approximately 450 million years ago (mya), to discover non-coding elements conserved in both organisms. They showed that most of these conserved non-coding elements are located in regions of low gene density (implying long intergenic regions) <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Moreover, many of the conserved non-coding elements are located at large distances from the nearest gene <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. These findings suggest that it is worthwhile to analyze whole intergenic regions of vertebrate genes, rather than limiting the comparative analyses to the promoter region located near the transcription start site.</p>
			<p>However, vertebrate intergenic regions may differ considerably in size, for example, when comparing the intergenic regions of mammals with those of <it>Fugu </it><abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. Since multiple sequence alignments are often based on global alignment procedures, they will likely fail to correctly align such sequences of heterogeneous length <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
			<p>An alternative to alignment methods is the use of <it>de novo </it>motif detection procedures for phylogenetic footprinting. These are based on either probabilistic or combinatorial algorithms. One such method, FootPrinter <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>, uses a string-based motif representation with dynamic programming to search a phylogenetic tree for motifs that show a minimal number of mismatches. Probabilistic algorithms, such as MEME <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, Consensus <abbrgrp><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp> and Gibbs sampling <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>, use a matrix representation of the motif (position specific weight matrix). Currently, several implementations of Gibbs sampling are available, such as AlignACE <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>, ANN-spec <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, BioProspector <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> and MotifSampler <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp>. However, these algorithms are sensitive to low signal-to-noise ratios, that is, the presence of small motifs (five to eight base pairs (bp)) in long intergenic sequences. This often results in the detection of many false positive motifs. On the other hand, an advantage of these procedures is that, because motif detection comes down to locally aligning the orthologous sequences, non-collinear motifs can still be detected.</p>
			<p>Neither motif detection nor multiple alignment methods are optimally suited to correctly align long intergenic sequences of heterogeneous length. Here, we present a simple two-step procedure that identifies conserved regions by combining the advantages of both alignment and motif detection methods. Such highly conserved regions most likely contain transcription factor binding sites or other functional intergenic sequences <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. To show its efficiency, we applied our two-step approach to well-described benchmark datasets. Since regions of strong conservation among divergent vertebrates are often associated with developmental regulators <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>, we chose mainly these types of genes to test our methodology. The presented approach, however, is applicable to any set of organisms and genes for which one wants to compare the intergenic sequences.</p>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<sec>
				<st>
					<p>A two-step procedure for phylogenetic footprinting</p>
				</st>
				<p>In this study, we aimed to detect regulatory motifs that have been retained over long periods in evolution; in our test case, this meant from mammals to ray-finned fishes such as <it>Fugu</it>. The <it>Fugu </it>genome, however, is very compact and approximately eight or nine times smaller than the human one, although both genomes are assumed to contain a similar repertoire of genes. The compactness of the genome of <it>Fugu </it>is the result of shorter intergenic regions and introns <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B42">42</abbr></abbrgrp>. On the other hand, the preliminary and still often erroneous annotation of the <it>Fugu </it>genome sometimes results in the selection of very long intergenic regions. Such heterogeneous sizes of the intergenic regions that need to be compared complicate identification of regulatory motifs. Widely used alignment algorithms, such as AVID, LAGAN and others, will usually fail when the sequences that need to be aligned differ too drastically in length. This problem is exacerbated when the sequences have a low overall percent identity. To cope with this, motif detection procedures could offer a solution. However, because regulatory motifs are typically only 6 to 30 bp long, whereas intergenic sequences of vertebrate genes range up to tens of kilobases <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, this results in a low signal-to-noise ratio that complicates the immediate use of <it>de novo </it>motif detection procedures. Therefore, we developed a two-step procedure to combine the advantages of the alignment and motif detection procedures.</p>
				<p>We included a first data reduction step based on an alignment method prior to the second motif detection step (see Materials and methods and Figure <figr fid="F1">1</figr>). This data reduction step increases the signal-to-noise ratio in the input set used for motif detection. Data reduction is based on the assumption that longer regions conserved in the orthologs of closely related species are more likely to contain biologically relevant motifs than non-conserved regions <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Therefore, in our benchmark study, regions conserved among closely related orthologous intergenic sequences of comparable size were preselected as input for motif detection. The mammalian intergenic sequences showed a relatively high overall percent identity and were comparable in length. Subsequently, these selected conserved mammalian subsequences were subjected to motif detection, together with the full-length <it>Fugu </it>intergenic region.</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Schematic representation of the two-step procedure for phylogenetic footprinting</p>
					</caption>
					<text>
						<p>Schematic representation of the two-step procedure for phylogenetic footprinting. In the data reduction step, regions conserved among closely related (mammalian) orthologs are selected. Subsequently, these strongly conserved sequences are combined with a more distant ortholog (for example, <it>Fugu</it>); this set of genes is then subjected to motif detection. Finally, significantly conserved blocks are identified using a threshold defined by a random analysis.</p>
					</text>
					<graphic file="gb-2005-6-13-r113-1" hint_layout="double"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Data reduction</p>
				</st>
				<p>The data reduction procedure preselects subsequences conserved in closely related (mammalian) sequences. It requires a multiple alignment procedure that combines a pairwise alignment (AVID) and a clustering algorithm (Tribe-MCL). Details on this procedure can be found in the Materials and methods section. A resulting cluster consists of unique, non-overlapping subsequences, corresponding to a specific region conserved among the different related orthologs (human, chimp, mouse and rat).</p>
				<p>In our benchmark study, we were primarily interested in finding DNA motifs conserved among all input sequences (orthologs). Therefore, only clusters containing conserved subsequences of all mammalian orthologs included in this study (human, chimp, rat and mouse) were retained for further analysis (supplementary website <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>).</p>
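				<p>The cluster selection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster layout (a dictionary mapping a cluster identifier to a list of species, start and end tuples), the function name <it>retain_complete_clusters </it>and the toy coordinates are all assumptions made for illustration only.</p>
				<p>
```python
# Hypothetical sketch: keep only clusters that contain a conserved
# subsequence from every mammalian ortholog. The data layout and all
# names below are illustrative assumptions, not part of the published tool.
REQUIRED_SPECIES = frozenset({"human", "chimp", "mouse", "rat"})


def retain_complete_clusters(clusters, required=REQUIRED_SPECIES):
    """Return the clusters that have at least one member per required species."""
    kept = {}
    for cluster_id, members in clusters.items():
        # Each member is a (species, start, end) tuple.
        species_present = {species for species, _start, _end in members}
        if required.issubset(species_present):
            kept[cluster_id] = members
    return kept


# Toy input: cluster_2 lacks mouse and rat subsequences, so it is dropped.
clusters = {
    "cluster_1": [("human", 100, 180), ("chimp", 98, 178),
                  ("mouse", 210, 290), ("rat", 205, 285)],
    "cluster_2": [("human", 900, 950), ("chimp", 905, 955)],
}
complete = retain_complete_clusters(clusters)
```
				</p>
				<p>Only the clusters returned by such a filter would be passed on, together with the full-length <it>Fugu </it>sequence, to the motif detection step.</p>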
			</sec>
			<sec>
				<st>
					<p>Motif detection</p>
				</st>
				<p>The motif detection step aims at identifying motifs that are statistically over-represented in the reduced set of orthologous intergenic sequences. To this end, we extended a previously developed Gibbs sampling based motif detection approach, MotifSampler <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr></abbrgrp> (see Materials and methods). The adapted implementation allows the user to choose a core sequence. A potential motif is only retained when it occurs in this core sequence. Indeed, the input data for motif detection consists of a set of (mammalian) subsequences and a complete <it>Fugu </it>intergenic sequence. This <it>Fugu </it>sequence shows a relatively low overall percent identity with the other sequences. Due to the high sequence conservation (strong data dependence) between the mammalian subsequences, the original implementation of MotifSampler is not appropriate for detecting motifs in the most divergent sequence: the cost function (log likelihood score) that is optimized in the original MotifSampler offers a trade-off between the degree of conservation of the motif and the number of occurrences of the motif <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. This results in the detection of motifs that are highly conserved between the highly similar (mammalian) sequences but that show little or no conservation with the <it>Fugu </it>intergenic sequence. Therefore, to ensure the detection of motifs conserved among all sequences, we introduced the concept of a core sequence. By selecting the most divergent ortholog (the <it>Fugu </it>sequence) as the core sequence, the algorithm is forced to only detect motifs that are also present in the most distantly related organism.</p>
				<p>The adapted implementation was also redesigned to search for long conserved blocks instead of searching for short conserved motifs only. In datasets consisting of orthologs, not only the motif itself is conserved but also the local context of the motif <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B45">45</abbr></abbrgrp>. For this reason, we designed BlockSampler to extend motifs and search for the longest conserved blocks. A motif is thus used as a seed to generate ungapped multiple local alignments. Looking for longer motifs/blocks also increases the specificity of motif detection (fewer false positives). Finally, since it was previously shown that choosing a background model increases the performance of motif detection <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, we adapted the algorithm such that it uses an organism-specific background model for each ortholog in the dataset.</p>
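				<p>The effect of the core sequence constraint can be sketched as a simple filter over candidate motifs. This is an illustration under stated assumptions rather than the BlockSampler code: in the actual algorithm the constraint presumably acts during sampling, whereas here it is applied afterwards, and the candidate representation, the function name <it>retain_core_motifs </it>and the example positions are hypothetical.</p>
				<p>
```python
# Hypothetical sketch of the core-sequence rule: a candidate motif is
# retained only if it has at least one site in the chosen core sequence.
# The candidate layout and all names are illustrative assumptions.
def retain_core_motifs(candidates, core_id):
    """Keep candidates that occur at least once in the core sequence."""
    return [motif for motif in candidates if motif["sites"].get(core_id)]


# Toy candidates: "sites" maps a sequence identifier to start positions.
candidates = [
    # Present in mammals and in Fugu, so it is kept.
    {"name": "motif_a", "sites": {"human": [12], "mouse": [15], "fugu": [340]}},
    # Strongly conserved among mammals only, so it is discarded.
    {"name": "motif_b", "sites": {"human": [88], "chimp": [88],
                                  "mouse": [91], "rat": [90]}},
]
kept = retain_core_motifs(candidates, core_id="fugu")
kept_names = [motif["name"] for motif in kept]
```
				</p>
				<p>Choosing the most divergent ortholog as the core identifier removes motifs that score well only because the mammalian subsequences are highly similar to each other.</p>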
			</sec>
			<sec>
				<st>
					<p>Results of developed methodology on benchmark datasets</p>
				</st>
				<p>To evaluate its performance, we applied our two-step motif detection procedure to several benchmark datasets. Since we were primarily interested in detecting regulatory motifs over large evolutionary distances, that is, conserved between <it>Fugu </it>and mammalian genomes, we compiled sets of evolutionarily divergent vertebrate orthologs that had been described to contain conserved motifs.</p>
				<p>In vertebrate organisms, large conserved regions tend to be associated with genes encoding regulators of development <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. Since our strategy aims at detecting such conserved blocks, we tested the methodology on three sets of orthologous genes that function in the regulation of development, containing motifs described in the literature: <it>hoxb2 </it><abbrgrp><abbr bid="B46">46</abbr></abbrgrp>, <it>pax6 </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp> and <it>scl </it><abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. We also included in the analysis one gene, <it>cfos</it>, not related to developmental processes <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
				<p>All the benchmark sets consisted of orthologous genes that contain evolutionarily retained motifs described in the literature that have, to a large extent, been experimentally verified. These known motifs were used to evaluate the performance of our approach and to compare it to other algorithms. Additionally, we monitored whether our procedure was capable of detecting as yet unknown motifs.</p>
				<p>Using the two-step procedure we detected 8 significant blocks for <it>hoxb2</it>, 13 for <it>pax6</it>, 1 for <it>scl </it>and none for the <it>cfos </it>dataset (Table <tblr tid="T1">1</tblr>). The consensus scores of each of these 22 blocks are given in Tables <tblr tid="T2">2</tblr>, <tblr tid="T3">3</tblr> and <tblr tid="T4">4</tblr> for each benchmark dataset, respectively. The location of these blocks on the complete intergenic region of the respective <it>Fugu </it>orthologs is shown in Figure <figr fid="F2">2</figr>; alignments can be found in <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Localization of clusters and conserved blocks in the <b>(a) </b><it>hoxb2</it>, <b>(b) </b><it>pax6 </it>and <b>(c) </b><it>scl </it>datasets</p>
					</caption>
					<text>
						<p>Localization of clusters and conserved blocks in the <b>(a) </b><it>hoxb2</it>, <b>(b) </b><it>pax6 </it>and <b>(c) </b><it>scl </it>datasets. For each dataset, the different orthologous intergenic sequences are shown: <it>Rn</it>, <it>Rattus norvegicus</it>; <it>Mm</it>, <it>Mus musculus</it>; <it>Pt</it>, <it>Pan troglodytes</it>; <it>Hs</it>, <it>Homo sapiens</it>; <it>Fr</it>, <it>Fugu rubripes</it>. Clusters of conserved mammalian subsequences that were subjected to motif detection (that is, clusters containing at least one subsequence per mammalian organism) are represented on the respective mammalian sequences (cluster 1 in red, cluster 2 in blue and cluster 3 in green). The conserved blocks identified using BlockSampler are represented on the <it>Fugu </it>intergenic sequence (in the color of the mammalian cluster in which each block is located). For each block, the localization relative to the start of the <it>Fugu </it>gene is given. The transcription start sites are marked with an inverted triangle.</p>
					</text>
					<graphic file="gb-2005-6-13-r113-2" hint_layout="double"/>
				</fig>
				<tbl id="T1" hint_layout="single">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Conserved blocks detected in benchmark datasets</p>
					</caption>
					<tblbdy cols="4">
						<r>
							<c ca="center">
								<p>Gene</p>
							</c>
							<c cspan="3" ca="center">
								<p>Number of blocks</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c cspan="3">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>Two-step</p>
							</c>
							<c ca="center">
								<p>UCSC</p>
							</c>
							<c ca="center">
								<p>UCR</p>
							</c>
						</r>
						<r>
							<c cspan="4">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>
									<it>cfos</it>
								</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>
									<it>hoxb2</it>
								</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>
									<it>pax6</it>
								</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="center">
								<p>11</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>
									<it>scl</it>
								</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>Total</p>
							</c>
							<c ca="center">
								<p>22</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Number of blocks two-step: number of conserved blocks identified using the two-step procedure. For more details on the blocks see Tables 2 (<it>hoxb2</it>), 3 (<it>pax6</it>) and 4 (<it>scl</it>). Number of blocks UCSC: the number of blocks detected by the two-step procedure that were recovered in the UCSC Genome Browser (aligned between mammals and <it>Fugu</it>) [51]. Number of blocks UCR: the number of blocks detected by the two-step procedure that correspond to an ultra-conserved region [20].</p>
					</tblfn>
				</tbl>
				<tbl id="T2" hint_layout="double">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>List of the significant blocks detected in the <it>hoxb2 </it>dataset</p>
					</caption>
					<tblbdy cols="2">
						<r>
							<c ca="left">
								<p>Block</p>
							</c>
							<c ca="left">
								<p>Consensus sequence and possible binding sites</p>
							</c>
						</r>
						<r>
							<c cspan="2">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Hoxb2 1.1 (-)</p>
							</c>
							<c ca="left">
								<p>
									<b>AATTCTTTGATGCAATCGGAGGGAGCTGTCAGGGGGCTAAGATTGATCGCCTCATsTCCT</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*Meis (CTGTCA)</b>, CTGTCA: 26-31 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*Hox/Pbx</b>, AGATTGATCG: 40-49 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 39-46 - (0.937); 22-29 - (0.918)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CDP CR1</b>, M00104, NATCGATCGS: 41-50 + (0.964)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CDP CR3+HD</b>, M00106, NATYGATSSS: 41-50 + (0.992)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 1-7 + (0.919); 6-12 + (0.903)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>HSF2</b>, M00147, NGAANNWTCK: 40-49 + (0.925)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>MEIS1</b>, M00419, NNNTGACAGNNN: 23-34 - (0.951)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>TGIF</b>, M00418, AGCTGTCANNA: 24-34 + (0.966)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Pbx-1</b>, M00096, ANCAATCAW: 39-47 - (0.909)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Hoxb2 2.1 (-)</p>
							</c>
							<c ca="left">
								<p>
									<b>TTGCACTTrGAGTTTACATTTTAATG</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*Octamer-motif (ATTTgCAT)</b>, GTTTACAT: 12-19 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*Adhf-2a (TGCACTgAGA)</b>, TGCACTTrGA: 2-11 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 20-26 + (0.978); 19-25 - (0.905); 17-23 - (0.927)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>SRY</b>, M00148, AAACWAM: 14-20 - (0.905)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Hoxb2 2.2 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>AAAAnTGTACTTTTTTAGTATTTACyT</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*HoxA5 (TTTAaTAaTTA)</b>, TTTAGTATTTA: 14-24 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 16-22 - (0.979)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>SRY</b>, M00148, AAACWAM: 7-13 - (0.928)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Hoxb2 2.3 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>GTGTGTTCTAGTGAACATTTTCATATATATTTATTGGTTAT</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*Glucocorticoid receptor</b>, AGTGAACA: 10-17 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*CCAAT BOX</b>, ATTGGTT: 27-33 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 15-22 + (0.919); 21-28 + (0.906); 7-14 - (0.919)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 23-29 + (0.958); 29-35 + (0.940); 28-34 - (0.956); 26-32 - (0.951); 24-30 - (0.958); 22-28 - (0.960)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>FOXJ2</b>, M00422, NNNWAAAYAAAYANNNNN: 23-40 - (0.932)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>HFH-3</b>, M00289, KNNTRTTTRTTTA: 25-37 + (0.908)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>NF-Y</b>, M00185, TRRCCAATSRN: 30-40 - (0.914)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Oct-1</b>, M00162, CWNAWTKWSATRYN: 14-27 + (0.913)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Pbx-1</b>, M00096, ANCAATCAW: 30-38 - (0.948)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Hoxb2 2.4 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>GTGAACATTTTCATATATATTTATTGGTTATAGCCTGTTAAAATATTTTCTTTT</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*GATA 1</b>, TTATAGCC: 28-35 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*CCAAT BOX</b>, ATTGGTT: 23-29 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 5-12 + (0.919); 11-18 + (0.906)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CCAAT box</b>, M00254, NNNRRCCAATSA: 21-32 - (0.940)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 13-19 + (0.958); 19-25 + (0.940); 39-45 + (0.925); 46-52 + (0.901); 36-42 - (0.930); 18-24 - (0.957); 16-22 - (0.951); 14-20 - (0.958); 12-18 - (0.960)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>FOXD3</b>, M00130, NAWTGTTTRTTT: 41-52 + (0.924)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>FOXJ2</b>, M00422, NNNWAAAYAAAYANNNNN: 13-30 - (0.932)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>HFH-3</b>, M00289, KNNTRTTTRTTTA: 15-27 + (0.908)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>HNF-3beta</b>, M00131, KGNANTRTTTRYTTW: 39-53 + (0.920)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>NF-Y</b>, M00185, TRRCCAATSRN: 20-30 - (0.914)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Oct-1</b>, M00162, CWNAWTKWSATRYN: 4-17 + (0.913)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Pbx-1</b>, M00096, ANCAATCAW: 20-28 - (0.948)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>SRY</b>, M00148, AAACWAM: 47-53 - (0.961)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Hoxb2 2.5 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>AATTCyCTCTTGGAACTTTCTTTGTTCTTCmGTAG</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>HSF1</b>, M00146, AGAANRTTCN: 12-21 + (0.915); 12-21 - (0.930)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>HSF2</b>, M00147, NGAANNWTCK: 12-21 + (0.948); 12-21 - (0.930)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>SRY</b>, M00148, AAACWAM: 17-23 - (0.961)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Hoxb2 3.1 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>GGCCnAGACnAGCGATTGGCGGAGrCCGGTCCCGTGACCAnGAATTCCCTGyAATTT</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>NF-Y</b>, M00185, TRRCCAATSRN: 12-22 - (0.915)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>USF</b>, M00187, CYCACGTGNC: 29-38 - (0.957)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>USF</b>, M00217, NCACGTGN: 30-37 + (0.902)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Hoxb2 3.2 (-)</p>
							</c>
							<c ca="left">
								<p>
									<b>TCCCGTGACCAnGAATTCCCTGyAATTTCGnyGGAGTCC</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>USF</b>, M00217, NCACGTGN: 1-8 + (0.902)</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>For each block, the consensus sequence is given followed by the possible binding sites situated in this block: motifs previously described in the literature [46] are marked with an asterisk. The motifs are summarized by their motif name (in bold), by their consensus sequence, if known, as described in the original article, by the sequence of the motif instance in our search, by the positions of the motif instance relative to the consensus sequence of the entire block and by the strand (indicated by a '+' or a '-') on which the motif occurred. Motif hits derived by Transfac are indicated by their matrix accession number, the consensus of this binding site and the instances of this motif in our search. These are further characterized by their positions relative to the consensus sequence of the entire block, by the strand on which the motif occurred and by the corresponding MotifLocator score (in parentheses). The blocks identified by the UCSC genome browser as conserved between mammals and <it>Fugu </it>are marked with 'UCSC', while the blocks detected by our two-step methodology but not present in the UCSC genome browser are indicated with a '-'.</p>
					</tblfn>
				</tbl>
				<tbl id="T3" hint_layout="double">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>List of the significant blocks detected in the <it>pax6 </it>dataset</p>
					</caption>
					<tblbdy cols="2">
						<r>
							<c ca="left">
								<p>Block</p>
							</c>
							<c ca="left">
								<p>Consensus sequence and possible binding sites</p>
							</c>
						</r>
						<r>
							<c cspan="2">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 1.1 (UCSC)</p>
							</c>
							<c ca="left">
								<p>CTTAATGATGAGAGATCTTTCCGCTCATTGCCCATTCAAATACAATTGTAGATCGAAGCCGGCCTTGTCAsGTTGAGAAAAAGTGAATTTCTAACATCCAGGACGTGCCTGTCTACT</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*Minimal fragment for expression in lens and cornea as described in [47]</b>: 11-117 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 25-32 + (0.940); 79-86 - (0.964); 4-11 - (0.946); 1-8 - (0.903)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CCAAT box</b>, M00254, NNNRRCCAATSA: 27-38 + (0.901)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*CdxA</b>, M00100, 'MTTTATR': 1-7 + (0.921)<b>*</b>; 87-93 + (0.913)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*CdxA</b>, M00101, AWTWMTR: 1-7 + (0.934); 4-10 + (0.921); 38-44 + (0.905); 87-93 + (0.988)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>c-Ets-1(p54)</b>, M00032, NCMGGAWGYN: 98-107 + (0.906)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>c-Ets-1(p54)</b>, M00074, NNACMGGAWRTNN: 92-104 - (0.901)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>En-1</b>, M00396, GTANTNN: 37-43 - (0.967)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>GATA-3</b>, M00351, ANAGATMWWA: 11-20 + (0.920)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>HSF2</b>, M00147, NGAANNWTCK: 13-22 - (0.933)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>p53</b>, M00272, NGRCWTGYCY: 101-110 + (0.949)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 1.2 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>CATTATTGTTGCCAGCACGAAGCATCACAATCAATCATAAGGAAGTCCAGTTGGCAGGTGTCAATCTTG</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 1-7 - (0.995)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 25-32 + (0.934); 31-38 + (0.903); 35-42 + (0.903); 47-54 + (0.908); 61-68 + (0.937)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CDP CR3+HD</b>, M00106, NATYGATSSS: 27-36 - (0.907)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>c-Ets-1(p54)</b>, M00074, NNACMGGAWRTNN: 36-48 + (0.902)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*HOXA3</b>, M00395, CNTANNNKN: 1-9 + (0.905)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>MyoD</b>, M00184, NNCACCTGNY: 53-62 - (0.956)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*Pbx-1</b>, M00096, ANCAATCAW: 30-38 + (0.986); 2-10 - (0.923)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Sox-5</b>, M00042, NNAACAATNN: 3-12 - (0.932)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>SRY</b>, M00148, AAACWAM: 33-39 + (0.910)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>USF</b>, M00122, NNRNCACGTGNYNN: 51-64 + (0.913); 51-64 - (0.908)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 1.3 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>GAAAAAGTGAATTTCTAACATCCAGGACGTGCCTGTCTACTTTCAGwGAATTGCATCCAATCACCCC</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 3-10 - (0.964)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CCAAT box</b>, M00254, NNNRRCCAATSA: 52-63 + (0.949)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00100, 'MTTTATR': 11-17 + (0.913)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 11-17 + (0.988)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>c-Ets-1(p54)</b>, M00032, NCMGGAWGYN: 22-31 + (0.906)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>c-Ets-1(p54)</b>, M00074, NNACMGGAWRTNN: 16-28 - (0.901)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>En-1</b>, M00396, GTANTNN: 58-64 - (0.948)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>GATA-1</b>, M00075, SNNGATNNNN: 56-65 - (0.930)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>GATA-3</b>, M00077, NNGATARNG: 56-64 - (0.917)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>NF-Y</b>, M00185, TRRCCAATSRN: 54-64 + (0.910)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>p53</b>, M00272, NGRCWTGYCY: 25-34 + (0.949)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>SRY</b>, M00148, AAACWAM: 59-65 + (0.917)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 1.4 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>GTCTATATTTAATCCAATTATAAGGGTCACGGAGTAAGTGC</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*Motif containing homeoboxes described in [47]</b>, TTTAATCCAATTATAA: 8-23 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 34-41 - (0.904)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00100, 'MTTTATR': 16-22 + (0.907)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 16-22 + (0.995); 16-22 - (0.906); 6-12 - (0.931); 4-10 - (0.951)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>En-1</b>, M00396, GTANTNN: 15-21 - (0.948)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Nkx2-5</b>, M00240, TYAAGTG: 34-40 + (0.927)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>RORalpha1</b>, M00156, NWAWNNAGGTCAN: 18-30 + (0.919)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>TCF11</b>, M00285, GTCATNNWNNNNN: 26-38 + (0.906)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 1.5 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>GCATCCAATCACCCCCAGGG</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 9-16 + (0.965)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>En-1</b>, M00396, GTANTNN: 6-12 - (0.948)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>GATA-3</b>, M00077, NNGATARNG: 4-12 - (0.917)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>SRY</b>, M00148, AAACWAM: 7-13 + (0.917)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 1.6 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>CAsGTTGAGAAAAAGTGAATTTCTAACATCCAGGACGTGCCTGTCTACTTTCAGwGAATTGCATCCAATCACCCCCAGGGAATTCnGCTAATGTCTCC</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*Homeobox-binding site described in [47]</b>, GCTAATGTCTC: 87-97 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 69-76 + (0.965); 87-94 - (0.903); 11-18 - (0.964)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CCAAT box</b>, M00254, NNNRRCCAATSA: 60-71 + (0.949)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00100, 'MTTTATR': 19-25 + (0.913)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 19-25 + (0.988)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>c-Ets-1(p54)</b>, M00032, NCMGGAWGYN: 30-39 + (0.906)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>c-Ets-1(p54)</b>, M00074, NNACMGGAWRTNN: 24-36 - (0.901)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>En-1</b>, M00396, GTANTNN: 66-72 - (0.948)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>GATA-1</b>, M00075, SNNGATNNNN: 64-73 - (0.930)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>GATA-3</b>, M00077, NNGATARNG: 64-72 - (0.917)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>NF-Y</b>, M00185, TRRCCAATSRN: 62-72 + (0.910)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>p53</b>, M00272, NGRCWTGYCY: 33-42 + (0.949)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>SRY</b>, M00148, AAACWAM: 67-73 + (0.917)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 2.1 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>TGGGTCCATTTTCCAGAyGGTTTGTTACTCTTGCTGCmTGATTTrG</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 6-13 + (0.921)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 9-15 + (0.918)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>SRY</b>, M00148, AAACWAM: 21-27 - (0.942)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 2.2 (-)</p>
							</c>
							<c ca="left">
								<p>ATTTTGGTTGCTTTCAGGTwTAATTAACTTT</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Nkx2-5</b>, M00241, CWTAATTG: 21-28 - (0.902)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 2.3 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>ATTGTAATCATTTCAATTATCTTCA</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 8-15 + (0.927)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>En-1</b>, M00396, GTANTNN: 14-20 - (0.948)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Nkx2-5</b>, M00241, CWTAATTG: 14-21 - (0.930)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 2.4 (-)</p>
							</c>
							<c ca="left">
								<p>GGTTGCTTTCAGGTwTAATTAACTTTGAACAACAAATA</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Nkx2-5</b>, M00241, CWTAATTG: 16-23 - (0.902)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 3.1 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>TTGTAATTACTGCCCTTCATGTGGTCCGGTGCCTTGAACCATCTTTAATTAAAAGCATAATTAAGG</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>AML-1a</b>, M00271, TGTGGT: 20-25 + (1.000)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 39-46 + (0.910); 55-62 + (0.909); 6-13 - (0.916)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00100, MTTTATR: 56-62 - (0.934)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 6-12 + (0.988); 44-50 + (0.913); 47-53 + (0.900); 48-54 + (0.905); 59-65 + (0.903); 60-66 + (0.926); 56-62 - (0.998); 47-53 - (0.913); 44-50 - (0.901); 43-49 - (0.907); 2-8 - (0.949)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>En-1</b>, M00396, GTANTNN: 3-9 + (0.912); 4-10 - (0.912)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>HSF2</b>, M00147, NGAANNWTCK: 35-44 + (0.908)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Nkx2-5</b>, M00241, CWTAATTG: 56-63 + (0.935); 58-65 - (0.954)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>USF</b>, M00217, NCACGTGN: 17-24 - (0.921)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 3.2 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>AAGGCTTGCAGCTGCCTCCAAATCAATAGAyGTCAAAGAAATATGAAAACArTC</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 39-45 + (0.953); 36-42 - (0.925)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>SRY</b>, M00148, AAACWAM: 35-41 + (0.961)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 8-15 + (0.931); 39-46 - (0.940); 8-15 - (0.931)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>AP-4</b>, M00175, VDCAGCTGNN: 7-16 - (0.902)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>MyoD</b>, M00184, NNCACCTGNY: 7-16 + (0.957)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>SRY</b>, M00160, NWWAACAAWANN: 19-30 + (0.928)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>pax6 3.3 (UCSC)</p>
							</c>
							<c ca="left">
								<p>
									<b>GCATAATTAAGGGAAGATCTAAAGAAAGACAATTACCAGATGGTCT</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 1-8 + (0.909)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00100, MTTTATR: 2-8 - (0.934)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 5-11 + (0.903); 6-12 + (0.926); 32-38 + (0.939); 2-8 - (0.998)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>En-1</b>, M00396, GTANTNN: 30-36 - (1.000)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>GATA-1</b>, M00075, SNNGATNNNN: 36-45 + (0.936)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>GATA-2</b>, M00076, NNNGATRNNN: 36-45 + (0.922)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>GATA-3</b>, M00351, ANAGATMWWA: 13-22 + (0.949)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>HOXA3</b>, M00395, CNTANNNKN: 29-37 - (0.939)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Msx-1</b>, M00394, CNGTAWNTG: 30-38 - (0.915)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>MyoD</b>, M00184, NNCACCTGNY: 35-44 - (0.919)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Nkx2-5</b>, M00241, CWTAATTG: 2-9 + (0.935); 4-11 - (0.954)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>SRY</b>, M00148, AAACWAM: 21-27 + (0.961); 25-31 + (0.927)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>USF</b>, M00122, NNRNCACGTGNYNN: 33-46 + (0.907); 33-46 - (0.904)</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>For each block, the consensus sequence is given followed by the possible binding sites situated in this block: motifs previously described in the literature [47] are marked with an asterisk. The motifs are summarized by their motif name (in bold), by their consensus sequence, if known, as described in the original article, by the sequence of the motif instance in our search, by the positions of the motif instance relative to the consensus sequence of the entire block and by the strand (indicated by a '+' or a '-') on which the motif occurred. Motif hits derived by Transfac are indicated by their matrix accession number, the consensus of this binding site and the instances of this motif in our search. These are further characterized by their positions relative to the consensus sequence of the entire block, by the strand on which the motif occurred and by the corresponding MotifLocator score (in parentheses). The blocks identified by the UCSC genome browser as conserved between mammals and <it>Fugu </it>are marked with 'UCSC', while the blocks detected by our two-step methodology but not present in the UCSC genome browser are indicated with a '-'.</p>
					</tblfn>
				</tbl>
				<tbl id="T4" hint_layout="double">
					<title>
						<p>Table 4</p>
					</title>
					<caption>
						<p>List of the significant blocks detected in the <it>scl </it>dataset</p>
					</caption>
					<tblbdy cols="2">
						<r>
							<c ca="left">
								<p>Block</p>
							</c>
							<c ca="left">
								<p>Consensus sequence and possible binding sites</p>
							</c>
						</r>
						<r>
							<c cspan="2">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>scl 1.1 (-)</p>
							</c>
							<c ca="left">
								<p>
									<b>TTGCCAAATTAAAATGAATCATTTGGCCCATAATGGCCGAGGCGCT</b>
								</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*Conserved sequence identified in [48]</b>, GCCAAAT: 3-9 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*Putative SKN1 site reported in [48]</b>, AATGAATCATTT: 13-24 +</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00100, 'MTTTATR': 29-35 - (0.917)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>CdxA</b>, M00101, AWTWMTR: 7-13 + (0.901); 8-14 + (0.905); 10-16 + (0.927); 29-35 + (0.927); 29-35 - (0.929); 7-13 - (0.913)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*En-1</b>, M00396, GTANTNN: 30-36 + (0.936)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Cap</b>, M00253, NCANHNNN: 19-26 + (0.932); 10-17 - (0.933)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Pbx-1</b>, M00096, ANCAATCAW: 14-22 + (0.941)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>AP-1</b>, M00199, NTGASTCAG: 14-22 + (0.913)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>*HOXA3</b>, M00395, CNTANNNKN: 29-37 + (0.927)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><b>Tst-1</b>, M00133, NNKGAATTAVAVTDN: 3-17 + (0.901)</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>For each block, the consensus sequence is given followed by the possible binding sites situated in this block: motifs previously described in the literature [48] are marked with an asterisk. The motifs are summarized by their motif name (in bold), by their consensus sequence, if known, as described in the original article, by the sequence of the motif instance in our search, by the positions of the motif instance relative to the consensus sequence of the entire block and by the strand (indicated by a '+' or a '-') on which the motif occurred. Motif hits derived by Transfac are indicated by their matrix accession number, the consensus of this binding site and the instances of this motif in our search. These are further characterized by their positions relative to the consensus sequence of the entire block, by the strand on which the motif occurred and by the corresponding MotifLocator score (in parentheses). The blocks identified by the UCSC genome browser as conserved between mammals and <it>Fugu </it>are marked with 'UCSC', while the blocks detected by our two-step methodology but not present in the UCSC genome browser are indicated with a '-'.</p>
					</tblfn>
				</tbl>
				<p>As a first validation step, we compared our results with the alignments and conserved regions identified by well-established genome browsers, namely the UCSC genome browser <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> and the UCR browser <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> (Table <tblr tid="T1">1</tblr>).</p>
				<p>The UCSC genome browser <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> provides access to current genome assemblies; it offers visualizations of several genomic features, such as cross-species homologies <abbrgrp><abbr bid="B49">49</abbr><abbr bid="B51">51</abbr></abbrgrp>. The latter can be viewed as multiple alignments over several species, ranging from closely related mammals to more distantly related species, such as chicken, zebrafish and pufferfish. The multiple alignments were generated with MULTIZ <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. Of the 22 conserved blocks we identified by aligning intergenic regions of mammals and <it>Fugu</it>, 16 could also be retrieved from the UCSC genome browser (Table <tblr tid="T1">1</tblr>); these are indicated in Tables <tblr tid="T2">2</tblr>, <tblr tid="T3">3</tblr>, <tblr tid="T4">4</tblr>. The remaining six blocks could only be identified using our two-step approach.</p>
				<p>The setup of the UCR browser <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> differs slightly from that of the UCSC browser in that it focuses only on the detection of ultra-conserved regions (UCRs), that is, regions conserved between human, mouse and <it>Fugu</it>. These regions were identified using sequence alignment strategies (BLAT) applied to complete genome sequences without prior data reduction <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B54">54</abbr></abbrgrp>. Although our strategy also identifies regions highly conserved among the species under study, no overlap was detected between our conserved blocks and the UCRs (Table <tblr tid="T1">1</tblr>); that is, in the regions we studied (up to 40 kb intergenic plus 5' untranslated region), no UCRs were located according to the analysis of Sandelin <it>et al. </it><abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The regions the UCR browser identified as ultra-conserved were located much further upstream of the gene than the regions we used for our analysis.</p>
				<p>To further validate the detected blocks, we tested whether they contain the motifs that were originally reported by Scemama <it>et al. </it><abbrgrp><abbr bid="B46">46</abbr></abbrgrp>, Kammandel <it>et al. </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp> and G&#246;ttgens <it>et al. </it><abbrgrp><abbr bid="B48">48</abbr></abbrgrp> for <it>hoxb2</it>, <it>pax6 </it>and <it>scl</it>, respectively (no significant blocks were detected for <it>cfos</it>). The previously described motifs present in the respective blocks are listed in Tables <tblr tid="T2">2</tblr>, <tblr tid="T3">3</tblr>, <tblr tid="T4">4</tblr> (marked with an asterisk). Of the 17 motifs reported by Scemama <it>et al. </it><abbrgrp><abbr bid="B46">46</abbr></abbrgrp>, 8 were present in the significant <it>hoxb2</it>-blocks (Table <tblr tid="T2">2</tblr>). Five other motifs were present in non-significant blocks. The latter are blocks with scores that fell below the threshold we chose based on the random analysis (see Materials and methods). The four remaining motifs could not be recovered. All motifs described by Kammandel <it>et al. </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp> as conserved among mammalian and <it>Fugu pax6 </it>intergenic regions were recovered by our methodology (Table <tblr tid="T3">3</tblr>). The conserved block detected in the <it>scl </it>dataset contains three of the five motifs previously identified by G&#246;ttgens <it>et al. </it><abbrgrp><abbr bid="B48">48</abbr></abbrgrp> (Table <tblr tid="T4">4</tblr>); a fourth motif was picked up in a non-significant block. One motif was not detected in any of the blocks.</p>
				<p>Besides these blocks containing known motifs, we identified several blocks (three for <it>hoxb2 </it>and eight for <it>pax6</it>) that correspond to conserved regions not previously described in the literature. To validate these blocks, we checked whether they were enriched for yet undescribed regulatory motifs. Hence, we screened all blocks with the Transfac database of vertebrate transcription factor binding sites <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. The result of this screening is summarized in Tables <tblr tid="T2">2</tblr>, <tblr tid="T3">3</tblr>, <tblr tid="T4">4</tblr>. As expected <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B56">56</abbr></abbrgrp>, the conserved blocks we identified contain many potential binding sites; remarkably they tend to be specifically enriched for homeodomain binding sites (in blocks hoxb2 1.1, hoxb2 2.1, hoxb2 2.3, hoxb2 2.4, pax6 1.1, pax6 1.4, pax6 3.1, pax6 3.3 and scl 1.1, homeodomain binding sites were significantly over-represented, with a <it>p</it> value &lt; 10<sup>-8</sup>). For a more detailed description of both the previously described and the new potential regulatory motifs present in the detected blocks, please refer to the Supplementary website <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
				<p>Besides these well-described benchmark datasets, we applied our method to six additional datasets, differing in composition from the benchmark datasets. They all contained a combination of four mammalian sequences (rat, mouse, human, chimp or dog) to be used in the data reduction step and an additional set of sequences originating from more distantly related orthologs (chicken, <it>Fugu</it>, <it>Tetraodon nigroviridis </it>and zebrafish in different combinations) added in the motif detection step. Four of the six additional datasets were derived from genes functioning in developmental regulation, including three homeobox genes (<it>GSH1</it>, <it>Meis2</it>, <it>HOXB5</it>) and one encoding the zinc finger protein EGR3. Besides these regulators involved in development, two genes, <it>PCDH8 </it>and <it>HIV-EP1</it>, were included, which are, to our knowledge, unrelated to development. PCDH8 is believed to function as a calcium-dependent cell-adhesion protein and HIV-EP1 binds to enhancer elements present in several viral promoters and in a number of cellular promoters such as those of the class I MHC, interleukin-2 receptor, and interferon-beta genes. In the additional datasets involved in development, we detected several strongly conserved blocks: <it>GSH1 </it>contained four blocks that are conserved among human, chimp, mouse, rat and pufferfish (<it>Fugu </it>and <it>Tetraodon</it>); in <it>Meis2</it>, two blocks were recovered that are retained in all organisms under study except for <it>Fugu</it>; and in <it>HOXB5</it>, six strongly conserved blocks were detected in mammals and pufferfish, while the motif seems to have been lost in chicken. In <it>EGR3</it>, two blocks were found conserved in mammals and fish. In the two datasets unrelated to development, one large block was detected, in <it>PCDH8 </it>only; it is conserved in human, chimp, mouse, rat, chicken, <it>Tetraodon </it>and <it>Fugu</it>, but not in zebrafish. 
This shows that conserved regions might also exist in genes not involved in development, although a possible involvement of this additional gene in developmental processes cannot be ruled out. Detailed results of these analyses can be found in Additional data file 1 and in <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. Because the motifs in these additional datasets have not been studied as extensively as those of the benchmark datasets, we cannot guarantee all detected blocks are biologically functional.</p>
			</sec>
			<sec>
				<st>
					<p>Evaluation of the developed procedure</p>
				</st>
				<p>To compare the performance of our newly developed two-step strategy to that of other frequently used algorithms, we evaluated to what extent MotifSampler <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, MAVID <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> and 'Threaded Blockset Aligner' (TBA) <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> could recover known motifs in our benchmark sets.</p>
				<p>First, we studied the performance of the alignment algorithms MAVID and TBA in detecting conserved regions within our four benchmark datasets. Since MAVID and TBA were originally developed to perform multiple alignments on long sequences, we applied these algorithms to the initial full-length benchmark datasets, that is, the complete mammalian and <it>Fugu </it>intergenics. We evaluated to what extent motifs or conserved regions described in original articles were correctly aligned using either MAVID or TBA. The results are summarized in Table <tblr tid="T5">5</tblr> (MAVID and TBA columns) and in <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
				<tbl id="T5" hint_layout="double">
					<title>
						<p>Table 5</p>
					</title>
					<caption>
						<p>Comparison of two-step procedure with other methodologies</p>
					</caption>
					<tblbdy cols="8">
						<r>
							<c ca="left">
								<p>Gene</p>
							</c>
							<c ca="center">
								<p>Number of motifs</p>
							</c>
							<c ca="center">
								<p>Two-step BS</p>
							</c>
							<c ca="center">
								<p>BS</p>
							</c>
							<c ca="center">
								<p>Two-step MS</p>
							</c>
							<c ca="center">
								<p>MS</p>
							</c>
							<c ca="center">
								<p>MAVID</p>
							</c>
							<c ca="center">
								<p>TBA</p>
							</c>
						</r>
						<r>
							<c cspan="8">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>cfos</it>
								</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>hoxb2</it>
								</p>
							</c>
							<c ca="center">
								<p>17</p>
							</c>
							<c ca="center">
								<p>8 (+5)</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>pax6</it>
								</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>1*</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>scl</it>
								</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>3 (+1)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Total</p>
							</c>
							<c ca="center">
								<p>30</p>
							</c>
							<c ca="center">
								<p>17 (+6)</p>
							</c>
							<c ca="center">
								<p>15</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Number of motifs: the number of motifs reported by Blanchette and Tompa [26] in <it>cfos</it>, Scemama <it>et al. </it>[46] in <it>hoxb2</it>, Kammandel <it>et al. </it>[47] in <it>pax6 </it>and G&#246;ttgens <it>et al. </it>[48] in <it>scl</it>. Two-step BS: the number of previously described motifs detected by the two-step procedure, combining data reduction and motif detection using BlockSampler. The numbers in parentheses are the number of motifs present in non-significant blocks. BS: the number of previously described motifs detected by BlockSampler in initial full-length datasets. Two-step MS: the number of previously described motifs detected by combining data reduction and motif detection using MotifSampler. MS: the number of previously described motifs detected by MotifSampler in initial full-length datasets. MAVID: the number of previously described motifs detected (correctly aligned) by MAVID. TBA: the number of previously described motifs detected by TBA. *Only part of a motif was detected.</p>
					</tblfn>
				</tbl>
				<p>MAVID alignment of all three <it>cfos </it>datasets (mammalian orthologs combined with each of the three <it>Fugu </it>paralogs) could not recover either of the two motifs previously described by Blanchette and Tompa <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> (Table <tblr tid="T5">5</tblr>). This is in line with our results showing the overall low homology between the <it>cfos </it>mammalian and <it>Fugu </it>orthologs. The MAVID alignment of most of the <it>hoxb2 </it>blocks containing previously described motifs shows that a conserved region in the mammalian intergenic sequences is broken up into small conserved parts interrupted by gaps when aligned to the longer <it>Fugu </it>sequence, resulting in an incorrect alignment of the regulatory motifs: previously reported motifs were not recovered in the MAVID alignment (Table <tblr tid="T5">5</tblr>). Our method performs better because the most heterogeneous sequence is only aligned in a second step, using a highly flexible local alignment procedure (BlockSampler). Regarding <it>pax6</it>, most of the blocks containing previously described motifs were correctly aligned by MAVID and all the motifs described by Kammandel <it>et al. </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp> could be correctly retrieved over all the orthologs under study (Table <tblr tid="T5">5</tblr>). This dataset is probably relatively well suited for MAVID because the mammalian sequences are only twice as large as the pufferfish <it>pax6 </it>intergenic region (Table <tblr tid="T6">6</tblr>). Although the lengths of the intergenic regions in the <it>scl </it>dataset (Table <tblr tid="T6">6</tblr>) are in the same order of magnitude (ranging from 16.5 to 40 kb), MAVID did not succeed in identifying any of the motifs previously described by G&#246;ttgens <it>et al. </it><abbrgrp><abbr bid="B48">48</abbr></abbrgrp> (Figure <figr fid="F3">3</figr>, Table <tblr tid="T5">5</tblr>).</p>
				<tbl id="T6" hint_layout="double">
					<title>
						<p>Table 6</p>
					</title>
					<caption>
						<p>Base pair lengths of the intergenic sequences for each benchmark dataset</p>
					</caption>
					<tblbdy cols="6">
						<r>
							<c ca="center">
								<p>Gene</p>
							</c>
							<c ca="center">
								<p>
									<it>Hs</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>Mm</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>Rn</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>Pt</it>
								</p>
							</c>
							<c ca="center">
								<p>
									<it>Fr</it>
								</p>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>
									<it>cfos</it>
								</p>
							</c>
							<c ca="center">
								<p>40,154</p>
							</c>
							<c ca="center">
								<p>33,157</p>
							</c>
							<c ca="center">
								<p>40,132</p>
							</c>
							<c ca="center">
								<p>40,154</p>
							</c>
							<c ca="center">
								<p>3,606*</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>3,606<sup>&#8224;</sup></p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>1,244<sup>&#8225;</sup></p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>
									<it>hoxb2</it>
								</p>
							</c>
							<c ca="center">
								<p>4,973</p>
							</c>
							<c ca="center">
								<p>6,744</p>
							</c>
							<c ca="center">
								<p>7,640</p>
							</c>
							<c ca="center">
								<p>4,878</p>
							</c>
							<c ca="center">
								<p>39,219</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>
									<it>pax6</it>
								</p>
							</c>
							<c ca="center">
								<p>40,102</p>
							</c>
							<c ca="center">
								<p>40,000</p>
							</c>
							<c ca="center">
								<p>40,000</p>
							</c>
							<c ca="center">
								<p>40,000</p>
							</c>
							<c ca="center">
								<p>21,204</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>
									<it>scl</it>
								</p>
							</c>
							<c ca="center">
								<p>20,981</p>
							</c>
							<c ca="center">
								<p>16,471</p>
							</c>
							<c ca="center">
								<p>20,343</p>
							</c>
							<c ca="center">
								<p>39,999</p>
							</c>
							<c ca="center">
								<p>20,155</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>The <it>Fugu cfos </it>intergenic sequences are derived from *SINFRUG00000132418, <sup>&#8224;</sup>SINFRUG00000132419 and <sup>&#8225;</sup>SINFRUG00000143787. The Ensembl IDs (plus one GenBank accession number) are given in [56]. <it>Fr</it>, <it>Fugu rubripes</it>; <it>Hs</it>, <it>Homo sapiens</it>; <it>Mm</it>, <it>Mus musculus</it>; <it>Pt</it>, <it>Pan troglodytes</it>; <it>Rn</it>, <it>Rattus norvegicus</it>.</p>
					</tblfn>
				</tbl>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Comparison of two-step strategy with MAVID for the <it>scl </it>data set <b>(a) </b>Conserved block: alignment of the different <it>scl </it>orthologs</p>
					</caption>
					<text>
						<p>Comparison of two-step strategy with MAVID for the <it>scl </it>data set. <b>(a) </b>Conserved block: alignment of the different <it>scl </it>orthologs. The conserved block as identified by BlockSampler is marked with a boxed area. <b>(b) </b>Visualization of the MAVID alignment of the corresponding region. The dashed line denotes a gap in the alignment. <it>Rn</it>, <it>Rattus norvegicus</it>; <it>Mm</it>, <it>Mus musculus</it>; <it>Pt</it>, <it>Pan troglodytes</it>; <it>Hs</it>, <it>Homo sapiens</it>; <it>Fr</it>, <it>Fugu rubripes</it>.</p>
					</text>
					<graphic file="gb-2005-6-13-r113-3" hint_layout="double"/>
				</fig>
				<p>Although TBA has been shown to outperform MAVID in aligning more divergent sequences <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>, applying this alignment tool to the benchmark datasets generated results similar to those of MAVID: all known <it>pax6</it>-regulating motifs were detected, while motifs present in the other benchmark datasets were not recovered (Table <tblr tid="T5">5</tblr>, TBA column).</p>
				<p>Besides detecting the blocks with previously described motifs, our two-step methodology also discovered blocks (block pax6 2.4, for instance) that could not be recovered when aligning the intergenic sequences with MAVID or TBA <abbrgrp><abbr bid="B44">44</abbr><abbr bid="B57">57</abbr></abbrgrp>.</p>
				<p>Overall, based on our benchmark analysis, the two-step method performs better than MAVID or TBA in identifying conserved blocks in distantly related orthologs: the proposed method is able to recover in our benchmark sets all the known motifs identified by MAVID and TBA but, in addition, finds several previously described motifs ignored by these algorithms (Table <tblr tid="T5">5</tblr>, two-step BS, MAVID and TBA columns). Using the two-step procedure, first selecting strongly conserved orthologous sequences, clearly facilitates alignment with the more divergent (lower overall similarity) sequence.</p>
				<p>We also tested the performance of MotifSampler, as an example of a probabilistic motif detection procedure, on the unreduced datasets. In this case, only one previously described motif was detected (Table <tblr tid="T5">5</tblr>, MS column). This was to be expected, as in unreduced datasets the signal-to-noise ratio is too low for standard motif detection procedures to give reliable and interpretable results.</p>
				<p>Our two-step procedure includes two adaptations over existing methods: first, it allows for a data reduction step; and second, we developed a motif detection procedure specifically adapted to the purpose of detecting large conserved blocks (BlockSampler). To assess the relative contribution of each of these adaptations to the overall result, we set up the following experiment: to study the specific influence of the data reduction step, we compared the results of applying BlockSampler to both the unreduced benchmark datasets and the datasets obtained after data reduction. Table <tblr tid="T5">5</tblr> (BS and two-step BS columns) shows the results of this comparison. Overall, the results seem comparable: application of BlockSampler to the complete intergenic sequences results in recovery of 15 of the 30 previously reported motifs (in all four datasets), while the two-step method identified 17. Thus, at first sight, there does not seem to be a major contribution from the data reduction step. A closer look at Table <tblr tid="T5">5</tblr>, however, shows that the positive contribution of the data reduction (increasing the signal-to-noise ratio) is strongly dependent on the lengths of the intergenic sequences to be aligned. A major positive effect is observed for the large <it>pax6 </it>and <it>scl </it>datasets, whereas for the <it>hoxb2 </it>set, in which the sequences under study are rather short, the data reduction does not offer a clear advantage. To assess the specific improvements of using BlockSampler instead of standard motif detection approaches, we compared the results of BlockSampler to those of MotifSampler when both were applied to the reduced datasets. A reduced dataset thus consists of a subcluster of mammalian sequences (Figure <figr fid="F4">4</figr>) and a complete <it>Fugu </it>ortholog. 
The performance of MotifSampler was far below that of BlockSampler: MotifSampler only detected two previously described motifs (Table <tblr tid="T5">5</tblr>, two-step MS column), both in the <it>hoxb2 </it>set, while BlockSampler recovered 17 previously described motifs (Table <tblr tid="T5">5</tblr>, two-step BS column). Moreover, because MotifSampler searches for short motifs (default eight nucleotides (nt)), it detects many false positive hits. These results show that independent of the data reduction step, BlockSampler is clearly more suited for detecting large conserved blocks than MotifSampler.</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Schematic representation of subclusters, that is, clusters of conserved orthologous sequences that contain one region in each ortholog</p>
					</caption>
					<text>
						<p>Schematic representation of subclusters, that is, clusters of conserved orthologous sequences that contain one region in each ortholog. See text for details. <it>Rn</it>, <it>Rattus norvegicus</it>; <it>Mm</it>, <it>Mus musculus</it>; <it>Pt</it>, <it>Pan troglodytes</it>; <it>Hs</it>, <it>Homo sapiens</it>.</p>
					</text>
					<graphic file="gb-2005-6-13-r113-4" hint_layout="double"/>
				</fig>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<p>We developed a two-step methodology to search for regions (motifs) conserved over different phylogenetic lineages in long intergenic sequences of heterogeneous size. In a first step, an alignment method is used to select conserved subsequences in intergenic orthologous sequences of comparable size of closely related vertebrate genomes, since these are expected to be enriched for regulatory motifs <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B41">41</abbr></abbrgrp>. The combination of this preselected dataset of conserved sequences and the full-length intergenic sequence of a more distant ortholog, which is more likely to differ in size and overall homology, is subjected to probabilistic motif detection. The preselection step facilitates motif detection by enhancing the signal-to-noise ratio in the dataset. For the second motif detection step we used an extension of a Gibbs sampling based algorithm <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> with a higher performance in detecting large conserved blocks within a set of orthologous sequences. Using the strategy mentioned above, we could combine the advantages of alignment methods, which have been shown to be very suitable for aligning long, highly conserved intergenic sequences, and the probabilistic algorithms for motif detection that usually are more appropriate when looking for smaller regions of conservation (lower degree of similarity).</p>
			<p>We applied this two-step methodology to four well-studied datasets for which functional phylogenetically conserved motifs had been extensively described. Our approach identified most of the previously described motifs. In addition, we detected several blocks not previously described in the literature or not present in either of the two genome browsers (UCSC and UCR) with which we compared our results. Because highly conserved blocks most probably consist of consecutive transcription factor binding sites <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B41">41</abbr><abbr bid="B56">56</abbr></abbrgrp>, we screened the conserved blocks with the Transfac motif database <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. These blocks contained abundant copies of homeodomain binding sites. This is not unexpected since most of the genes we were studying function in the regulation of development <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B58">58</abbr></abbrgrp>. These blocks most probably contain, besides the motifs obtained with the Transfac screening, many more motifs not yet annotated in Transfac. Alternatively, they might have other, not yet characterized biological functions, for example, transcripts of unknown function <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>.</p>
			<p>Some previously described motifs were missed, however, because of the strong selection criteria we used: since regulatory elements tend to be grouped <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B41">41</abbr><abbr bid="B56">56</abbr><abbr bid="B60">60</abbr></abbrgrp>, we assumed that the sequences surrounding a regulatory motif are also conserved (due to the presence of other binding sites). Motifs located in a variable context will probably go undetected.</p>
			<p>By applying our method to additional datasets with configurations different from the benchmark dataset we could demonstrate that our methodology is more generally applicable.</p>
			<p>Comparing the performance of the two-step procedure with that of MAVID and TBA, as representatives of multiple alignment methods, and MotifSampler, as an example of a motif detection method, showed that our approach outperformed these alternative methods when the intergenic sequences became either too long or too heterogeneous in size.</p>
			<p>Additionally, we studied the marginal contribution of the data reduction step and the improved method for motif detection on the final performance of the two-step procedure: overall, BlockSampler performed better than the related algorithm MotifSampler, both on long sequences and on intergenic regions reduced in size. The data reduction step seemed essential when the length of the intergenic sequences to be compared becomes excessive.</p>
			<p>Although our two-step procedure has proven successful, there is still room for improvement, for instance by taking into account the phylogenetic relationships between the sequences under study in the second motif detection step. The contribution of finding a motif in an ortholog to the global motif score could be weighted according to its phylogenetic distance from the other sequences in which the motif is also present. In this way we would account for the specific composition of a dataset, because closely related orthologs are less informative than more distantly related ones. If one wanted to relax the assumption of conserved order of motifs in the first data reduction step, it would suffice to replace AVID in this step with a more local aligner such as BLAT <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. Also, our motif detection algorithm could be extended with more advanced background models <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>.</p>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>We developed a two-step approach that combines the advantages of both motif detection and multiple alignment algorithms. It has been shown to be well suited for identifying conserved regions in intergenic sequences from distantly related orthologs that show a low overall homology and that are heterogeneous in size. The strength of our approach lies in the combination of data reduction and improved motif detection: the first data reduction step is essential for long intergenic sequences. BlockSampler, the algorithm used in the second motif detection step, has been shown to be optimally suited to identify large conserved regions among orthologous sequences. Applying our method to benchmark sets showed that, although it recovered most of the motifs/blocks previously described in these datasets, some were missed due to the assumptions underlying our analysis and the stringent selection criteria applied. These results indicate that, given the chosen criteria, our method offers a fully automated analysis flow that is highly specific for detecting motifs conserved over different vertebrate lineages in complete intergenic sequences.</p>
		</sec>
		<sec>
			<st>
				<p>Materials and methods</p>
			</st>
			<sec>
				<st>
					<p>Benchmark datasets</p>
				</st>
				<p>The benchmark datasets were generated as follows. First, a set of orthologous genes was defined using the Ensembl genome browser version 23 <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>. In this study, the benchmark datasets included genes from human (<it>Homo sapiens</it>), mouse (<it>Mus musculus</it>), rat (<it>Rattus norvegicus</it>), chimp (<it>Pan troglodytes</it>) and pufferfish (<it>Fugu rubripes</it>). Regarding the <it>cfos </it>dataset, Ensembl identified three <it>Fugu </it>paralogs - SINFRUG00000132418, SINFRUG00000132419 and SINFRUG00000143787 - that were all included in the analysis. The additional datasets <it>EGR3, GSH1, HIV-EP1, HOXB5, Meis2, PCDH8 </it>contain multiple distantly related orthologs (see <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>).</p>
				<p>Subsequently, the intergenic regions of these orthologs were selected using the Ensembl mart database release 21.1. The region upstream of the transcription start (as defined by Ensembl) was limited to 40 kb. Additionally, the 5' untranslated region was included. Lengths of the respective intergenics are given in Table <tblr tid="T6">6</tblr>; the benchmark datasets, <it>cfos</it>, <it>hoxb2</it>, <it>pax6 </it>and <it>scl </it>can be found in <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. The rat <it>cfos </it>ortholog ENSRNOG00000008015, <it>Fugu hoxb2 </it>ortholog SINFRUG00000136637, chimp <it>pax6 </it>ortholog ENSPTRG00000003474, and <it>scl </it>chimp ENSPTRG00000003474 contain long N-stretches, probably as a result of incomplete preliminary annotation.</p>
				<p>Remarkably, although <it>Fugu </it>is known to have a very compact genome <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, the <it>Fugu hoxb2 </it>intergenic sequence mentioned above is very long compared to the mammalian <it>hoxb2 </it>intergenic sequences (Table <tblr tid="T6">6</tblr>). This is probably due to the presence of a pseudogene (SINFRUG00000157209) in the intergenic region of SINFRUG00000136637 at circa 5.9 kb from the transcription start site of <it>hoxb2</it>, which was not yet annotated in release version 23 of Ensembl.</p>
				<p>All intergenic sequences were selected as described above, except the intergenic sequence of the <it>Fugu scl </it>ortholog. Because the putative <it>scl </it>ortholog annotated by Ensembl (SINFRUG00000145588) did not contain motifs shown to be present in the <it>Fugu scl </it>ortholog by G&#246;ttgens <it>et al. </it><abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, we used the GenBank <it>Fugu scl </it>sequence [Genbank: <ext-link ext-link-type="gen" ext-link-id="AJ131019">AJ131019</ext-link>]. This sequence (referring to a cosmid sequence of circa 33 kb) was also used in the original study of Barton <it>et al. </it><abbrgrp><abbr bid="B63">63</abbr></abbrgrp>. To delineate the intergenic region of <it>scl</it>, we aligned the coding sequence from the <it>scl </it>homolog SINFRUG00000145588 with the AJ131019 sequence using 'blast 2 sequences' <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>. The coding region was located from positions 20,156 to 22,165; we then selected the upstream region (from positions 1 to 20,155).</p>
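				<p>The coordinate arithmetic used to delineate the upstream region can be sketched as follows. This is only an illustration under stated assumptions: the function name upstream_region and its parameters are hypothetical, positions are taken to be 1-based and inclusive as in the text, and the 40 kb cap mirrors the limit applied to the Ensembl-derived sequences (it is not binding for the <it>Fugu scl </it>cosmid, where the coding region starts at position 20,156).</p>
				<p><![CDATA[
```python
def upstream_region(sequence, cds_start, max_length=40_000):
    """Return (start, end, subsequence) for the region upstream of a coding start.

    cds_start is the 1-based position of the first coding base.
    start and end are 1-based and inclusive; at most max_length bases are kept.
    """
    if cds_start not in range(1, len(sequence) + 1):
        raise ValueError("cds_start must lie inside the sequence")
    end = cds_start - 1                    # last base before the coding region
    start = max(1, end - max_length + 1)   # cap the upstream region at max_length
    # Convert the 1-based inclusive interval to a Python slice.
    return start, end, sequence[start - 1:end]


# Hypothetical stand-in for a cosmid of about 33 kb (the real bases are not shown).
cosmid = "N" * 33_000
start, end, region = upstream_region(cosmid, cds_start=20_156)
print(start, end, len(region))
```
]]></p>
				<p>For a coding start at position 20,156 the sketch returns the interval from 1 to 20,155, matching the region selected above.</p>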
			</sec>
			<sec>
				<st>
					<p>A two-step procedure for phylogenetic footprinting</p>
				</st>
				<p>A schematic representation of the developed two-step procedure is given in Figure <figr fid="F1">1</figr>.</p>
				<sec>
					<st>
						<p>Step 1: data reduction</p>
					</st>
					<p>In this step, a dataset consisting of the complete intergenic sequences of comparable size originating from orthologs of closely related organisms is reduced to a dataset of preselected sequences conserved among all/most compared orthologs. First, related vertebrate intergenic regions of comparable size (in this study, the mammalian sequences from human, chimp, rat, mouse and dog) are aligned using the pairwise alignment algorithm AVID (using default parameters) <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>. For each ortholog, sequences corresponding to the significantly conserved regions of the pairwise alignment are selected using VISTA <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. Significance of the alignment is defined by two parameters (VISTA parameters): the window length (L), the region for which the percent identity is calculated; and the conservation level (C) in the selected window, the minimal percent identity of the aligned region to be considered as significantly conserved. The parameter settings were adapted to the evolutionary distance of the compared organisms: the more closely related the organisms, the more stringent the conservation criterion chosen (a higher percent identity or a longer window). The conservation parameters used were: for human-mouse comparison, 85% over 200 nt; human-rat, 85% over 200 nt; human-chimp, 85% over 350 nt; human-dog, 80% over 200 nt; mouse-rat, 85% over 350 nt; mouse-chimp, 85% over 200 nt; mouse-dog, 80% over 200 nt; rat-chimp, 85% over 200 nt; rat-dog, 80% over 200 nt; and chimp-dog, 80% over 200 nt.</p>
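					<p>The window-based selection criterion (window length L, conservation level C) can be illustrated with a minimal sketch. It is not the VISTA implementation: the function name conserved_windows, the treatment of gap columns as mismatches, and the merging of overlapping windows are assumptions made for illustration, and the input is assumed to be two equally long, upper-case, gapped rows of a pairwise alignment.</p>
					<p><![CDATA[
```python
def conserved_windows(aln_a, aln_b, window, min_identity):
    """Find alignment regions covered by windows that meet an identity threshold.

    aln_a, aln_b: gapped rows of a pairwise alignment (same length, '-' for gaps).
    window: window length L in alignment columns.
    min_identity: conservation level C as a percentage (for example 85).
    Returns merged (start, end) intervals, 0-based and half-open, in alignment columns.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aln_a)
    regions = []
    if window > n:
        return regions
    # A column counts as identical only if both rows carry the same base (gaps never match).
    matches = [1 if a == b and a != "-" else 0 for a, b in zip(aln_a, aln_b)]
    count = sum(matches[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window by one column: add the new column, drop the old one.
            count += matches[start + window - 1] - matches[start - 1]
        if count * 100 >= min_identity * window:
            end = start + window
            if regions and regions[-1][1] >= start:
                regions[-1] = (regions[-1][0], end)  # merge with the previous region
            else:
                regions.append((start, end))
    return regions


print(conserved_windows("ACGTACGTAC", "ACGTTCGTAC", window=4, min_identity=100))
```
]]></p>
					<p>With the human-mouse settings listed above, the corresponding call would use window=200 and min_identity=85 on the two rows of the AVID alignment.</p>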
					<p>To identify orthologous regions conserved in multiple related vertebrate sequences of comparable size (that is, multiple alignment), homologies between all preselected sequences were determined (using AVID with default parameters). Subsequently, multiple conserved regions were identified using the graph-based clustering algorithm TribeMCL <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>. We chose TribeMCL because it is a well-known graph-based clustering algorithm that was originally designed to recover transitivity relations between biological sequences (that is, orthologous proteins). Each resulting cluster corresponds to a region conserved in multiple sequences and consists of a set of preselected sequences originating from the different related orthologs of comparable size that mutually show a minimal degree of conservation. Several runs of TribeMCL were performed for each dataset, using different values of clustering parameters I and P (see <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>). The parameter I did not seem to have a major influence on the size of the clusters and, therefore, was set at 4. For the <it>P </it>value, three different values were tested per dataset and the value that resulted in small, tightly linked clusters was chosen, as these clusters correspond to strongly conserved regions. The parameters of choice for the benchmark datasets were: for <it>cfos</it>, I = 4 and <it>P </it>= 0; for <it>hoxb2</it>, I = 4 and <it>P </it>= -10; for <it>pax6</it>, I = 4 and <it>P </it>= 0; and for <it>scl</it>, I = 4 and <it>P </it>= -10. Concerning the additional datasets, the parameter setting of choice was I = 4 and <it>P </it>= 0 for <it>EGR3, HIV-EP1, HOXB5, Meis2 </it>and <it>PCHD8 </it>and I = 4 and <it>P </it>= -10 for <it>GSH1</it>.</p>
					<p>Some clusters contain different subsequences derived from the intergenic sequence of a single organism that match one larger sequence of another organism; for example, two subsequences in rat that match one larger sequence in human. To minimize the noise in the datasets used for motif detection, such clusters are split into subclusters. Subclusters contain only a single subsequence of each ortholog (paralog; Figure <figr fid="F4">4</figr>). A subcluster is tagged by a profile containing the IDs of the different subsequences composing this subcluster. The input dataset for motif detection (Figure <figr fid="F1">1</figr>) thus consists of the mammalian subsequences in a subcluster together with the intergenic region of the corresponding <it>Fugu </it>ortholog.</p>
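					<p>The splitting of a cluster into subclusters can be sketched as follows. The text does not specify how subsequences are combined, so this sketch simply assumes that every combination of one subsequence per organism forms a subcluster; the function name, the dictionary layout and the '|' separator used in the profile are hypothetical.</p>

```python
from itertools import product


def split_into_subclusters(cluster):
    """Split a cluster into subclusters with one subsequence per organism.

    cluster maps an organism name to the list of its subsequence IDs.
    Assumption: all one-per-organism combinations are enumerated.
    """
    species = sorted(cluster)
    subclusters = []
    for combo in product(*(cluster[s] for s in species)):
        subclusters.append({
            # The profile tags the subcluster with its member IDs.
            "profile": "|".join(combo),
            "members": dict(zip(species, combo)),
        })
    return subclusters
```

					<p>A cluster with one human subsequence and two rat subsequences thus yields two subclusters, each of which would be combined with the <it>Fugu </it>intergenic region before motif detection.</p>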
				</sec>
				<sec>
					<st>
						<p>Step 2: motif detection</p>
					</st>
					<p>To find motifs conserved in the preselected intergenic sequences of orthologous genes, we developed BlockSampler as an extension of MotifSampler <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. In contrast to the previous version of MotifSampler, which could only handle a single background model, in BlockSampler each orthologous intergenic sequence in the input dataset is scored with its appropriate species-specific background model. Previous studies have shown that using the correct species-specific higher order background model improves the reliability of the results <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B68">68</abbr></abbrgrp>. In this study we used species-specific third-order background models.</p>
					<p>The current implementation also allows selecting a user-defined core ortholog. This is the sequence of interest in which the motif should be present (in our case the sequence of heterogeneous length - the <it>Fugu </it>sequence). The idea behind this is that we are interested in motifs present in this core sequence that are supported by their presence in the preselected conserved orthologous regions. In this study, the most divergent <it>Fugu </it>orthologs were chosen as core sequences. The Gibbs sampling procedure searches for a common motif that has exactly one occurrence in the core sequence and no or one occurrence in the remainder of the sequences. After short motif seeds are identified, these are extended using a simple protocol to find larger conserved blocks: if the consensus score over a 5 nt region adjacent to the current motif exceeds a given threshold, the motif is extended with one nucleotide (in that direction). The larger a conserved block, the higher the confidence in the motif.</p>
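					<p>The extension protocol can be sketched for the rightward direction as follows. This is an illustration under stated assumptions, not the BlockSampler code: the consensus score is assumed here to be two minus the Shannon entropy (in bits) averaged over the alignment columns, giving a range of 0 to 2, no pseudocounts are used, and the function names are hypothetical. The 5 nt flank and the threshold of 1.2 are the values given in the text.</p>

```python
import math


def consensus_score(columns):
    """Mean of (2 - Shannon entropy in bits) over alignment columns.

    Assumed definition; ranges from 0 (uniform) to 2 (fully conserved).
    """
    total = 0.0
    for col in columns:
        entropy = 0.0
        for base in "ACGT":
            freq = col.count(base) / len(col)
            if freq > 0:
                entropy -= freq * math.log2(freq)
        total += 2.0 - entropy
    return total / len(columns)


def extend_right(seqs, starts, length, threshold=1.2, flank=5):
    """Grow a motif rightwards one nucleotide at a time.

    seqs holds one sequence per motif instance and starts the instance
    start positions; the motif grows while the flank adjacent to its
    right edge scores above the threshold.
    """
    while True:
        right = [s + length for s in starts]  # first position after the motif
        if any(p + flank > len(seq) for seq, p in zip(seqs, right)):
            return length  # not enough sequence left to score a full flank
        columns = [[seq[p + k] for seq, p in zip(seqs, right)]
                   for k in range(flank)]
        if consensus_score(columns) > threshold:
            length += 1
        else:
            return length
```

					<p>Extension to the left would follow the same logic with the flank taken upstream of the motif start.</p>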
					<p>BlockSampler was run 100 times for each input set (subcluster plus <it>Fugu </it>ortholog) and for the corresponding random sets using default parameters: searching the plus strand only (s = 0), prior set to 0.2, and an initial motif length of 8 nt. Only the threshold of the consensus score (default 1.0) was raised to 1.2, thereby selecting more strongly conserved blocks. This generated 100 conserved blocks for each input set. To avoid redundancy, blocks overlapping by more than 80% were merged. For the benchmark datasets, which contained only one distantly related ortholog, namely <it>Fugu</it>, we then selected those blocks that were conserved among all vertebrates under study. When studying more diverse datasets containing multiple distantly related species (with regard to mammals), we relaxed this requirement by allowing a block to be absent from one of the orthologs under study.</p>
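					<p>The merging of redundant blocks can be sketched as follows. The text does not state how the 80% overlap is measured; this sketch assumes that blocks are half-open (start, end) intervals on the core sequence and that the overlap is taken relative to the shorter of the two blocks. The function name is hypothetical.</p>

```python
def merge_overlapping(blocks, min_overlap=0.8):
    """Merge blocks whose overlap exceeds min_overlap of the shorter block.

    blocks is a list of half-open (start, end) positions on the core
    sequence; both the interval convention and the overlap definition
    are assumptions made for this illustration.
    """
    merged = []
    for start, end in sorted(blocks):
        if merged:
            prev_start, prev_end = merged[-1]
            shared = min(end, prev_end) - max(start, prev_start)
            shorter = min(end - start, prev_end - prev_start)
            if shared > min_overlap * shorter:
                # Keep the union of the two redundant blocks.
                merged[-1] = (prev_start, max(end, prev_end))
                continue
        merged.append((start, end))
    return merged
```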
					<p>To account for the fact that short blocks are more likely to have a higher degree of conservation than long blocks, consensus scores <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> were compensated for their length. Blocks were then ranked according to this normalized consensus score (Cs<sub>ad</sub>), calculated using the formula Cs<sub>ad </sub>= (L/(L + E))Cs, where L is the length of the conserved block, E is an empirical factor (set to 5) and Cs the consensus score.</p>
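					<p>As a worked illustration of the length compensation, the normalized score multiplies the consensus score by L divided by the sum of L and E, with E = 5 as stated in the text. The function name and the example scores are hypothetical.</p>

```python
def adjusted_consensus_score(cs, length, e=5.0):
    """Length-normalized consensus score: cs scaled by length / (length + e)."""
    return length / (length + e) * cs


# Hypothetical blocks given as (length in nt, consensus score).
blocks = [(5, 1.8), (10, 1.5), (20, 1.4)]
ranked = sorted(blocks,
                key=lambda b: adjusted_consensus_score(b[1], b[0]),
                reverse=True)
```

					<p>With these example values the 20 nt block ranks first (adjusted score 1.12), ahead of the 10 nt block (1.0) and the 5 nt block (0.9), although the shortest block has the highest raw consensus score.</p>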
					<p>To assess the relative individual contributions of the data reduction and motif detection steps to the final result, we applied BlockSampler on the full-length benchmark datasets. We used the same parameter setting as described above but, because of the longer sequence length in the full datasets, we increased the number of runs (1,000 runs for each benchmark dataset). Blocks were selected as described above. The best scoring 10% of the remaining blocks were searched for known motifs.</p>
				</sec>
			</sec>
			<sec>
				<st>
					<p>Randomization</p>
				</st>
				<p>To set a threshold on the adapted consensus score of the blocks (blocks with a score above the threshold are considered relevant), we compared block scores of the genuine set with those of corresponding random sets. For each genuine dataset, 100 random sets were generated. A corresponding random set contains, besides the different homologous regions of the genuine subcluster under study, a random <it>Fugu </it>intergenic sequence. This additional random sequence was not orthologous with the mammalian sequences and thus is unlikely to contain the same motifs. In each random set, motifs were identified using the same procedure as described for the genuine set. For each random set the best scoring motif was selected, that is, the block with the highest normalized consensus score. This resulted in a group of the 100 best scoring false positive motifs. These scores were approximately normally distributed. As a threshold, we chose the 90th percentile of the best scoring random motifs.</p>
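				<p>The threshold selection can be sketched as follows, assuming that the best normalized consensus score of each random set has already been collected in a list. The nearest-rank definition of the percentile is an assumption, as the text does not state which definition was used, and the function name is hypothetical.</p>

```python
def percentile_threshold(best_random_scores, percentile=90):
    """Nearest-rank percentile of the best score from each random set."""
    if not best_random_scores:
        raise ValueError("at least one random score is required")
    ordered = sorted(best_random_scores)
    # Ceiling division with integers avoids floating-point rounding.
    rank = -(-percentile * len(ordered) // 100)
    return ordered[max(rank, 1) - 1]


def relevant_blocks(scored_blocks, threshold):
    """Keep (block, score) pairs whose score lies above the threshold."""
    return [(block, score) for block, score in scored_blocks
            if score > threshold]
```

				<p>With 100 random sets, the threshold is the 90th value of the sorted best scores, so that roughly one in ten random sets would produce a block scoring at or above it.</p>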
			</sec>
			<sec>
				<st>
					<p>Motif validation</p>
				</st>
				<p>For each block we detected, a BLAT search against the human genome (May 2004 assembly) was performed <abbrgrp><abbr bid="B54">54</abbr><abbr bid="B69">69</abbr></abbrgrp>. This linked to the UCSC genome browser <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, where alignments between multiple vertebrate organisms were generated using MULTIZ <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. Subsequently, we checked in the UCR browser <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> whether UCRs were identified in the intergenic regions under study.</p>
				<p>To assess whether known transcription factor binding sites are located in the detected blocks, we compared the consensus sequence of each block with motifs described in the literature. In addition, we scanned the block consensus sequence with the Transfac 6.0 public database of vertebrate transcription factor binding site profiles <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. This scanning was performed using MotifLocator <abbrgrp><abbr bid="B70">70</abbr><abbr bid="B71">71</abbr><abbr bid="B72">72</abbr></abbrgrp> with a 0<sup>th </sup>order vertebrate background model. Hits with a score &gt;0.9 were regarded as potential binding sites. The binding sites are indicated by the Transfac factor name <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>.</p>
				<p>To calculate the statistical over-representation of homeodomain binding sites, 100 sequences were selected randomly from the <it>Fugu </it>genome and screened to make sure they differed from the genes under study. These random sequences were screened with matrix models from homeodomain binding sites (obtained from TRANSFAC 8.2) using MotifLocator, as described above. We calculated the chi-square statistic with Yates correction of the 2 &#215; 2 contingency table test for the set of homeodomain binding sites <abbrgrp><abbr bid="B73">73</abbr></abbrgrp>. Homeobox binding sites were significantly over-represented in a certain block at a <it>p</it> value of 10<sup>-8</sup>.</p>
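				<p>The 2 &#215; 2 contingency test with Yates correction can be sketched as follows. The cell layout (hits and non-hits in the block versus in the random sequences) is an assumed arrangement, since the text does not give the table itself; the <it>p</it> value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals the complementary error function of the square root of half the statistic.</p>

```python
import math


def yates_chi_square(a, b, c, d):
    """Chi-square statistic with Yates correction for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be positive")
    # The continuity correction is clipped at zero.
    corrected = max(abs(a * d - b * c) - n / 2.0, 0.0)
    statistic = n * corrected ** 2 / denominator
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

				<p>In the assumed layout, a and b would hold the numbers of homeodomain hits and non-hits in the block, and c and d the corresponding counts in the random <it>Fugu </it>sequences.</p>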
			</sec>
			<sec>
				<st>
					<p>Performance evaluation</p>
				</st>
				<p>To evaluate our newly developed procedure, we compared its performance to that of two types of algorithm often used for phylogenetic footprinting, namely the motif detection algorithm MotifSampler <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> and the multiple alignment procedures MAVID <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B65">65</abbr></abbrgrp> and TBA <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. These three algorithms were applied to the benchmark datasets and the resulting motifs (conserved in all organisms under study) were compared to those detected by the two-step procedure. We aligned the full-length initial datasets (Table <tblr tid="T6">6</tblr>) <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> using the online MAVID version at <abbrgrp><abbr bid="B74">74</abbr></abbrgrp> with the default parameter setting <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
				<p>Besides MAVID, we used TBA as it has been shown to outperform MAVID <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. All the necessary tools were obtained from the Miller Lab website <abbrgrp><abbr bid="B75">75</abbr></abbrgrp>. To generate a multiple alignment using TBA, we first pairwise aligned the initial datasets using blastz. We used the evolutionary tree ((human chimp)(rat mouse) <it>Fugu</it>); the additional blastz parameter file (latest version) was obtained from the E Margulies ftp site <abbrgrp><abbr bid="B76">76</abbr></abbrgrp>. The final multiple alignment was obtained by running the TBA executable.</p>
				<p>We applied MotifSampler both on the reduced datasets (subcluster + complete <it>Fugu </it>intergenic sequence <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>) and on the complete intergenic sequences (initial datasets). For the reduced sets we performed 100 MotifSampler runs, while for the complete datasets MotifSampler was run 1,000 times, each time using the standard parameter settings of the algorithm: the algorithm searches for only one motif (n = 1) of 8 nt (w = 8) on both strands (s = 1) and the prior probability of 1 motif copy (p) is 0.5. A third order vertebrate background model was used.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Additional data files</p>
			</st>
			<p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> contains the list of significant blocks detected in the six additional datasets and, for each block, the results of the Transfac screening. Additional data file <supplr sid="S2">2</supplr> contains the stand-alone version of BlockSampler. Additional data file <supplr sid="S3">3</supplr> contains the corresponding BlockSampler help file.</p>
			<suppl id="S1">
				<title>
					<p>Additional data file 1</p>
				</title>
				<caption>
					<p>The list of significant blocks detected in the six additional datasets and, for each block, the results of the Transfac screening</p>
				</caption>
				<text>
					<p>The list of significant blocks detected in the six additional datasets and, for each block, the results of the Transfac screening</p>
				</text>
				<file name="gb-2005-6-13-r113-S1.doc">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S2">
				<title>
					<p>Additional data file 2</p>
				</title>
				<caption>
					<p>The stand-alone version of BlockSampler</p>
				</caption>
				<text>
					<p>The stand-alone version of BlockSampler</p>
				</text>
				<file name="gb-2005-6-13-r113-S2.af2">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S3">
				<title>
					<p>Additional data file 3</p>
				</title>
				<caption>
					<p>The corresponding BlockSampler help file</p>
				</caption>
				<text>
					<p>The corresponding BlockSampler help file</p>
				</text>
				<file name="gb-2005-6-13-r113-S3.doc">
					<p>Click here for file</p>
				</file>
			</suppl>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>R Van Hellemont is a fellow of the IWT. This work is partially supported by: IWT project GBOU-SQUAD-20160; Research Council KULeuven GOA Mefisto-666, GOA-Ambiorics, EF/05/007 SymBioSys, IDO genetic networks; FWO projects G.0115.01, G.0413.03 and G.0318.05; IUAP V-22 (2002-2006). The authors would like to thank Tine Blomme for help with the identification of orthologs.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Embryonic epsilon and gamma globin genes of a prosimian primate (<it>Galago crassicaudatus</it>). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints.</p>
				</title>
				<aug>
					<au>
						<snm>Tagle</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Koop</snm>
						<fnm>BF</fnm>
					</au>
					<au>
						<snm>Goodman</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Slightom</snm>
						<fnm>JL</fnm>
					</au>
					<au>
						<snm>Hess</snm>
						<fnm>DL</fnm>
					</au>
					<au>
						<snm>Jones</snm>
						<fnm>RT</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1988</pubdate>
				<volume>203</volume>
				<fpage>439</fpage>
				<lpage>455</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0022-2836(88)90011-3</pubid>
						<pubid idtype="pmpid" link="fulltext">3199442</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Discovery and modeling of transcriptional regulatory regions.</p>
				</title>
				<aug>
					<au>
						<snm>Fickett</snm>
						<fnm>JW</fnm>
					</au>
					<au>
						<snm>Wasserman</snm>
						<fnm>WW</fnm>
					</au>
				</aug>
				<source>Curr Opin Biotechnol</source>
				<pubdate>2000</pubdate>
				<volume>11</volume>
				<fpage>19</fpage>
				<lpage>24</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0958-1669(99)00049-X</pubid>
						<pubid idtype="pmpid" link="fulltext">10679343</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Enrichment of regulatory signals in conserved non-coding genomic sequence.</p>
				</title>
				<aug>
					<au>
						<snm>Levy</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Hannenhalli</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Workman</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2001</pubdate>
				<volume>17</volume>
				<fpage>871</fpage>
				<lpage>877</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/17.10.871</pubid>
						<pubid idtype="pmpid" link="fulltext">11673231</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Phylogenetic shadowing of primate sequences to find functional regions of the human genome.</p>
				</title>
				<aug>
					<au>
						<snm>Boffelli</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>McAuliffe</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Ovcharenko</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Lewis</snm>
						<fnm>KD</fnm>
					</au>
					<au>
						<snm>Ovcharenko</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Pachter</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Rubin</snm>
						<fnm>EM</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2003</pubdate>
				<volume>299</volume>
				<fpage>1391</fpage>
				<lpage>1394</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1081331</pubid>
						<pubid idtype="pmpid" link="fulltext">12610304</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Analysis of multiple genomic sequence alignments: a web resource, online tools, and lessons learned from analysis of mammalian SCL loci.</p>
				</title>
				<aug>
					<au>
						<snm>Chapman</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Donaldson</snm>
						<fnm>IJ</fnm>
					</au>
					<au>
						<snm>Gilbert</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Grafham</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Rogers</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Green</snm>
						<fnm>AR</fnm>
					</au>
					<au>
						<snm>Gottgens</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2004</pubdate>
				<volume>14</volume>
				<fpage>313</fpage>
				<lpage>318</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">327107</pubid>
						<pubid idtype="pmpid" link="fulltext">14718377</pubid>
						<pubid idtype="doi">10.1101/gr.1759004</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.</p>
				</title>
				<aug>
					<au>
						<snm>Thompson</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Higgins</snm>
						<fnm>DG</fnm>
					</au>
					<au>
						<snm>Gibson</snm>
						<fnm>TJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1994</pubdate>
				<volume>22</volume>
				<fpage>4673</fpage>
				<lpage>4680</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">308517</pubid>
						<pubid idtype="pmpid">7984417</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>DIALIGN: finding local similarities by multiple sequence alignment.</p>
				</title>
				<aug>
					<au>
						<snm>Morgenstern</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Frech</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Dress</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Werner</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>1998</pubdate>
				<volume>14</volume>
				<fpage>290</fpage>
				<lpage>294</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/14.3.290</pubid>
						<pubid idtype="pmpid" link="fulltext">9614273</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment.</p>
				</title>
				<aug>
					<au>
						<snm>Morgenstern</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>1999</pubdate>
				<volume>15</volume>
				<fpage>211</fpage>
				<lpage>218</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/15.3.211</pubid>
						<pubid idtype="pmpid" link="fulltext">10222408</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>MAVID multiple alignment server.</p>
				</title>
				<aug>
					<au>
						<snm>Bray</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Pachter</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>3525</fpage>
				<lpage>3526</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">169029</pubid>
						<pubid idtype="pmpid" link="fulltext">12824358</pubid>
						<pubid idtype="doi">10.1093/nar/gkg623</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>MAVID: constrained ancestral alignment of multiple sequences.</p>
				</title>
				<aug>
					<au>
						<snm>Bray</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Pachter</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2004</pubdate>
				<volume>14</volume>
				<fpage>693</fpage>
				<lpage>699</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">383315</pubid>
						<pubid idtype="pmpid" link="fulltext">15060012</pubid>
						<pubid idtype="doi">10.1101/gr.1960404</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA.</p>
				</title>
				<aug>
					<au>
						<snm>Brudno</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Do</snm>
						<fnm>CB</fnm>
					</au>
					<au>
						<snm>Cooper</snm>
						<fnm>GM</fnm>
					</au>
					<au>
						<snm>Kim</snm>
						<fnm>MF</fnm>
					</au>
					<au>
						<snm>Davydov</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Green</snm>
						<fnm>ED</fnm>
					</au>
					<au>
						<snm>Sidow</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Batzoglou</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>721</fpage>
				<lpage>731</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">430158</pubid>
						<pubid idtype="pmpid" link="fulltext">12654723</pubid>
						<pubid idtype="doi">10.1101/gr.926603</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Identification of a gadd45beta 3' enhancer that mediates SMAD3- and SMAD4-dependent transcriptional induction by transforming growth factor beta.</p>
				</title>
				<aug>
					<au>
						<snm>Major</snm>
						<fnm>MB</fnm>
					</au>
					<au>
						<snm>Jones</snm>
						<fnm>DA</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>2004</pubdate>
				<volume>279</volume>
				<fpage>5278</fpage>
				<lpage>5287</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.M311517200</pubid>
						<pubid idtype="pmpid" link="fulltext">14630914</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Identification of regulatory regions which confer muscle-specific gene expression.</p>
				</title>
				<aug>
					<au>
						<snm>Wasserman</snm>
						<fnm>WW</fnm>
					</au>
					<au>
						<snm>Fickett</snm>
						<fnm>JW</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1998</pubdate>
				<volume>278</volume>
				<fpage>167</fpage>
				<lpage>181</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.1998.1700</pubid>
						<pubid idtype="pmpid" link="fulltext">9571041</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Novel vertebrate genes and putative regulatory elements identified at kidney disease and NR2E1/fierce loci.</p>
				</title>
				<aug>
					<au>
						<snm>Abrahams</snm>
						<fnm>BS</fnm>
					</au>
					<au>
						<snm>Mak</snm>
						<fnm>GM</fnm>
					</au>
					<au>
						<snm>Berry</snm>
						<fnm>ML</fnm>
					</au>
					<au>
						<snm>Palmquist</snm>
						<fnm>DL</fnm>
					</au>
					<au>
						<snm>Saionz</snm>
						<fnm>JR</fnm>
					</au>
					<au>
						<snm>Tay</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Tan</snm>
						<fnm>YH</fnm>
					</au>
					<au>
						<snm>Brenner</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Simpson</snm>
						<fnm>EM</fnm>
					</au>
					<au>
						<snm>Venkatesh</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Genomics</source>
				<pubdate>2002</pubdate>
				<volume>80</volume>
				<fpage>45</fpage>
				<lpage>53</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/geno.2002.6795</pubid>
						<pubid idtype="pmpid" link="fulltext">12079282</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Detecting conserved regulatory elements with the model genome of the Japanese puffer fish, <it>Fugu rubripes</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Aparicio</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Morrison</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Gould</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Gilthorpe</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Chaudhuri</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Rigby</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Krumlauf</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Brenner</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1995</pubdate>
				<volume>92</volume>
				<fpage>1684</fpage>
				<lpage>1688</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">42584</pubid>
						<pubid idtype="pmpid" link="fulltext">7878040</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Comparative genomics of the SOX9 region in human and <it>Fugu rubripes</it>: conservation of short regulatory sequence elements within large intergenic regions.</p>
				</title>
				<aug>
					<au>
						<snm>Bagheri-Fam</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ferraz</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Demaille</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Scherer</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Pfeifer</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Genomics</source>
				<pubdate>2001</pubdate>
				<volume>78</volume>
				<fpage>73</fpage>
				<lpage>82</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/geno.2001.6648</pubid>
						<pubid idtype="pmpid" link="fulltext">11707075</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Comparative analysis of the ETV6 gene in vertebrate genomes from pufferfish to human.</p>
				</title>
				<aug>
					<au>
						<snm>Montpetit</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Sinnett</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Oncogene</source>
				<pubdate>2001</pubdate>
				<volume>20</volume>
				<fpage>3437</fpage>
				<lpage>3442</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/sj.onc.1204444</pubid>
						<pubid idtype="pmpid" link="fulltext">11423994</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Scanning human gene deserts for long-range enhancers.</p>
				</title>
				<aug>
					<au>
						<snm>Nobrega</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Ovcharenko</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Afzal</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Rubin</snm>
						<fnm>EM</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2003</pubdate>
				<volume>302</volume>
				<fpage>413</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1088328</pubid>
						<pubid idtype="pmpid" link="fulltext">14563999</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Evolutionary conservation of regulatory elements in vertebrate Hox gene clusters.</p>
				</title>
				<aug>
					<au>
						<snm>Santini</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Boore</snm>
						<fnm>JL</fnm>
					</au>
					<au>
						<snm>Meyer</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>1111</fpage>
				<lpage>1122</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">403639</pubid>
						<pubid idtype="pmpid" link="fulltext">12799348</pubid>
						<pubid idtype="doi">10.1101/gr.700503</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Sandelin</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Bailey</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Bruce</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Engstrom</snm>
						<fnm>PG</fnm>
					</au>
					<au>
						<snm>Klos</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Wasserman</snm>
						<fnm>WW</fnm>
					</au>
					<au>
						<snm>Ericson</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Lenhard</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>BMC Genomics</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>99</fpage>
				<lpage>107</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">544600</pubid>
						<pubid idtype="pmpid" link="fulltext">15613238</pubid>
						<pubid idtype="doi">10.1186/1471-2164-5-99</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Highly conserved non-coding sequences are associated with vertebrate development.</p>
				</title>
				<aug>
					<au>
						<snm>Woolfe</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Goodson</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Goode</snm>
						<fnm>DK</fnm>
					</au>
					<au>
						<snm>Snell</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>McEwen</snm>
						<fnm>GK</fnm>
					</au>
					<au>
						<snm>Vavouri</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Smith</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>North</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Callaway</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Kelly</snm>
						<fnm>K</fnm>
					</au>
					<etal/>
				</aug>
				<source>PLoS Biol</source>
				<pubdate>2005</pubdate>
				<volume>3</volume>
				<fpage>e7.0116</fpage>
				<lpage>e7.0130</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1371/journal.pbio.0030007</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Whole-genome shotgun assembly and analysis of the genome of <it>Fugu rubripes</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Aparicio</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Chapman</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Stupka</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Putnam</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Chia</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Dehal</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Christoffels</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Rash</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Hoon</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Smit</snm>
						<fnm>A</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>2002</pubdate>
				<volume>297</volume>
				<fpage>1301</fpage>
				<lpage>1310</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1072104</pubid>
						<pubid idtype="pmpid" link="fulltext">12142439</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Characterization of the pufferfish (<it>Fugu</it>) genome as a compact model vertebrate genome.</p>
				</title>
				<aug>
					<au>
						<snm>Brenner</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Elgar</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Sandford</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Macrae</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Venkatesh</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Aparicio</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1993</pubdate>
				<volume>366</volume>
				<fpage>265</fpage>
				<lpage>268</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/366265a0</pubid>
						<pubid idtype="pmpid" link="fulltext">8232585</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p><it>Fugu</it>: a compact vertebrate reference genome.</p>
				</title>
				<aug>
					<au>
						<snm>Venkatesh</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Gilligan</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Brenner</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>FEBS Lett</source>
				<pubdate>2000</pubdate>
				<volume>476</volume>
				<fpage>3</fpage>
				<lpage>7</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0014-5793(00)01659-8</pubid>
						<pubid idtype="pmpid" link="fulltext">10878239</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Fast and systematic genome-wide discovery of conserved regulatory elements using a non-alignment based approach.</p>
				</title>
				<aug>
					<au>
						<snm>Elemento</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Tavazoie</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>R18</fpage>
				<xrefbib>
					<pubid idtype="doi">10.1186/gb-2005-6-2-r18</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Discovery of regulatory elements by a computational method for phylogenetic footprinting.</p>
				</title>
				<aug>
					<au>
						<snm>Blanchette</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Tompa</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>739</fpage>
				<lpage>748</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">186562</pubid>
						<pubid idtype="pmpid" link="fulltext">11997340</pubid>
						<pubid idtype="doi">10.1101/gr.6902</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>FootPrinter: A program designed for phylogenetic footprinting.</p>
				</title>
				<aug>
					<au>
						<snm>Blanchette</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Tompa</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>3840</fpage>
				<lpage>3842</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">169012</pubid>
						<pubid idtype="pmpid" link="fulltext">12824433</pubid>
						<pubid idtype="doi">10.1093/nar/gkg606</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>The value of prior knowledge in discovering motifs with MEME.</p>
				</title>
				<aug>
					<au>
						<snm>Bailey</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Elkan</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Proc Int Conf Intell Syst Mol Biol</source>
				<pubdate>1995</pubdate>
				<volume>3</volume>
				<fpage>21</fpage>
				<lpage>29</lpage>
				<xrefbib>
					<pubid idtype="pmpid">7584439</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Identification of consensus patterns in unaligned DNA sequences known to be functionally related.</p>
				</title>
				<aug>
					<au>
						<snm>Hertz</snm>
						<fnm>GZ</fnm>
					</au>
					<au>
						<snm>Hartzell</snm>
						<fnm>GW</fnm>
						<suf>III</suf>
					</au>
					<au>
						<snm>Stormo</snm>
						<fnm>GD</fnm>
					</au>
				</aug>
				<source>Comput Appl Biosci</source>
				<pubdate>1990</pubdate>
				<volume>6</volume>
				<fpage>81</fpage>
				<lpage>92</lpage>
				<xrefbib>
					<pubid idtype="pmpid">2193692</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Identifying DNA and protein patterns with statistically significant alignments of multiple sequences.</p>
				</title>
				<aug>
					<au>
						<snm>Hertz</snm>
						<fnm>GZ</fnm>
					</au>
					<au>
						<snm>Stormo</snm>
						<fnm>GD</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>1999</pubdate>
				<volume>15</volume>
				<fpage>563</fpage>
				<lpage>577</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/15.7.563</pubid>
						<pubid idtype="pmpid" link="fulltext">10487864</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment.</p>
				</title>
				<aug>
					<au>
						<snm>Lawrence</snm>
						<fnm>CE</fnm>
					</au>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Boguski</snm>
						<fnm>MS</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>JS</fnm>
					</au>
					<au>
						<snm>Neuwald</snm>
						<fnm>AF</fnm>
					</au>
					<au>
						<snm>Wootton</snm>
						<fnm>JC</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1993</pubdate>
				<volume>262</volume>
				<fpage>208</fpage>
				<lpage>214</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8211139</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes.</p>
				</title>
				<aug>
					<au>
						<snm>McCue</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Thompson</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Carmack</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Ryan</snm>
						<fnm>MP</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>JS</fnm>
					</au>
					<au>
						<snm>Derbyshire</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Lawrence</snm>
						<fnm>CE</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2001</pubdate>
				<volume>29</volume>
				<fpage>774</fpage>
				<lpage>782</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">30389</pubid>
						<pubid idtype="pmpid" link="fulltext">11160901</pubid>
						<pubid idtype="doi">10.1093/nar/29.3.774</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Surveying <it>Saccharomyces</it> genomes to identify functional elements by comparative DNA sequence analysis.</p>
				</title>
				<aug>
					<au>
						<snm>Cliften</snm>
						<fnm>PF</fnm>
					</au>
					<au>
						<snm>Hillier</snm>
						<fnm>LW</fnm>
					</au>
					<au>
						<snm>Fulton</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Graves</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Miner</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Gish</snm>
						<fnm>WR</fnm>
					</au>
					<au>
						<snm>Waterston</snm>
						<fnm>RH</fnm>
					</au>
					<au>
						<snm>Johnston</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2001</pubdate>
				<volume>11</volume>
				<fpage>1175</fpage>
				<lpage>1186</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.182901</pubid>
						<pubid idtype="pmpid" link="fulltext">11435399</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>Computational identification of cis-regulatory elements associated with groups of functionally related genes in <it>Saccharomyces cerevisiae</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Hughes</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Estep</snm>
						<fnm>PW</fnm>
					</au>
					<au>
						<snm>Tavazoie</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Church</snm>
						<fnm>GM</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2000</pubdate>
				<volume>296</volume>
				<fpage>1205</fpage>
				<lpage>1214</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.2000.3519</pubid>
						<pubid idtype="pmpid" link="fulltext">10698627</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>ANN-Spec: a method for discovering transcription factor binding sites with improved specificity.</p>
				</title>
				<aug>
					<au>
						<snm>Workman</snm>
						<fnm>CT</fnm>
					</au>
					<au>
						<snm>Stormo</snm>
						<fnm>GD</fnm>
					</au>
				</aug>
				<source>Pac Symp Biocomput</source>
				<pubdate>2000</pubdate>
				<fpage>467</fpage>
				<lpage>478</lpage>
				<xrefbib>
					<pubid idtype="pmpid">10902194</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes.</p>
				</title>
				<aug>
					<au>
						<snm>Liu</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Brutlag</snm>
						<fnm>DL</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>JS</fnm>
					</au>
				</aug>
				<source>Pac Symp Biocomput</source>
				<pubdate>2001</pubdate>
				<fpage>127</fpage>
				<lpage>138</lpage>
				<xrefbib>
					<pubid idtype="pmpid">11262934</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<title>
					<p>A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling.</p>
				</title>
				<aug>
					<au>
						<snm>Thijs</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Lescot</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Marchal</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Rombauts</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>De Moor</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Rouze</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Moreau</snm>
						<fnm>Y</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2001</pubdate>
				<volume>17</volume>
				<fpage>1113</fpage>
				<lpage>1122</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/17.12.1113</pubid>
						<pubid idtype="pmpid" link="fulltext">11751219</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p>INCLUSive: integrated clustering, upstream sequence retrieval and motif sampling.</p>
				</title>
				<aug>
					<au>
						<snm>Thijs</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Moreau</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>De Smet</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Mathys</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Lescot</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rombauts</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Rouze</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>De Moor</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Marchal</snm>
						<fnm>K</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<fpage>331</fpage>
				<lpage>332</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/18.2.331</pubid>
						<pubid idtype="pmpid" link="fulltext">11847086</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B39">
				<title>
					<p>A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes.</p>
				</title>
				<aug>
					<au>
						<snm>Thijs</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Marchal</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Lescot</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rombauts</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>De Moor</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Rouze</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Moreau</snm>
						<fnm>Y</fnm>
					</au>
				</aug>
				<source>J Comput Biol</source>
				<pubdate>2002</pubdate>
				<volume>9</volume>
				<fpage>447</fpage>
				<lpage>464</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1089/10665270252935566</pubid>
						<pubid idtype="pmpid" link="fulltext">12015892</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B40">
				<title>
					<p>Assessing computational tools for the discovery of transcription factor binding sites.</p>
				</title>
				<aug>
					<au>
						<snm>Tompa</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Bailey</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Church</snm>
						<fnm>GM</fnm>
					</au>
					<au>
						<snm>De Moor</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Eskin</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Favorov</snm>
						<fnm>AV</fnm>
					</au>
					<au>
						<snm>Frith</snm>
						<fnm>MC</fnm>
					</au>
					<au>
						<snm>Fu</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Kent</snm>
						<fnm>WJ</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nat Biotechnol</source>
				<pubdate>2005</pubdate>
				<volume>23</volume>
				<fpage>137</fpage>
				<lpage>144</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nbt1053</pubid>
						<pubid idtype="pmpid" link="fulltext">15637633</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B41">
				<title>
					<p>Insights from human/mouse genome comparisons.</p>
				</title>
				<aug>
					<au>
						<snm>Pennacchio</snm>
						<fnm>LA</fnm>
					</au>
				</aug>
				<source>Mamm Genome</source>
				<pubdate>2003</pubdate>
				<volume>14</volume>
				<fpage>429</fpage>
				<lpage>436</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1007/s00335-002-4001-1</pubid>
						<pubid idtype="pmpid" link="fulltext">12925891</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B42">
				<title>
					<p>Small is beautiful: comparative genomics with the pufferfish (<it>Fugu rubripes</it>).</p>
				</title>
				<aug>
					<au>
						<snm>Elgar</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Sandford</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Aparicio</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Macrae</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Venkatesh</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Brenner</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>1996</pubdate>
				<volume>12</volume>
				<fpage>145</fpage>
				<lpage>150</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0168-9525(96)10018-4</pubid>
						<pubid idtype="pmpid" link="fulltext">8901419</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B43">
				<title>
					<p><it>In silico</it> identification of metazoan transcriptional regulatory regions.</p>
				</title>
				<aug>
					<au>
						<snm>Wasserman</snm>
						<fnm>WW</fnm>
					</au>
					<au>
						<snm>Krivan</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>Naturwissenschaften</source>
				<pubdate>2003</pubdate>
				<volume>90</volume>
				<fpage>156</fpage>
				<lpage>166</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12712249</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B44">
				<title>
					<p>Supplementary Website</p>
				</title>
				<url>http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Information_VanHel_2005/SuppWebsite.html</url>
			</bibl>
			<bibl id="B45">
				<title>
					<p><it>In silico</it> identification and experimental validation of PmrAB targets in <it>Salmonella typhimurium</it> by regulatory motif detection.</p>
				</title>
				<aug>
					<au>
						<snm>Marchal</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>De Keersmaecker</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Monsieurs</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>van Boxel</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Lemmens</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Thijs</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Vanderleyden</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>De Moor</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>R9.1</fpage>
				<lpage>R9.20</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1186/gb-2004-5-2-r9</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B46">
				<title>
					<p>Evolutionary divergence of vertebrate Hoxb2 expression patterns and transcriptional regulatory loci.</p>
				</title>
				<aug>
					<au>
						<snm>Scemama</snm>
						<fnm>JL</fnm>
					</au>
					<au>
						<snm>Hunter</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>McCallum</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Prince</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Stellwag</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>J Exp Zool</source>
				<pubdate>2002</pubdate>
				<volume>294</volume>
				<fpage>285</fpage>
				<lpage>299</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/jez.90009</pubid>
						<pubid idtype="pmpid" link="fulltext">12362434</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B47">
				<title>
					<p>Distinct cis-essential modules direct the time-space pattern of the Pax6 gene activity.</p>
				</title>
				<aug>
					<au>
						<snm>Kammandel</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Chowdhury</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Stoykova</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Aparicio</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Brenner</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Gruss</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Dev Biol</source>
				<pubdate>1999</pubdate>
				<volume>205</volume>
				<fpage>79</fpage>
				<lpage>97</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/dbio.1998.9128</pubid>
						<pubid idtype="pmpid" link="fulltext">9882499</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B48">
				<title>
					<p>Transcriptional regulation of the stem cell leukemia gene (SCL) - comparative analysis of five vertebrate SCL loci.</p>
				</title>
				<aug>
					<au>
						<snm>G&#246;ttgens</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Barton</snm>
						<fnm>LM</fnm>
					</au>
					<au>
						<snm>Chapman</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Sinclair</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Knudsen</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Grafham</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Gilbert</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>Rogers</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Bentley</snm>
						<fnm>DR</fnm>
					</au>
					<au>
						<snm>Green</snm>
						<fnm>AR</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>749</fpage>
				<lpage>759</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">186570</pubid>
						<pubid idtype="pmpid" link="fulltext">11997341</pubid>
						<pubid idtype="doi">10.1101/gr.45502</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B49">
				<title>
					<p>The human genome browser at UCSC.</p>
				</title>
				<aug>
					<au>
						<snm>Kent</snm>
						<fnm>WJ</fnm>
					</au>
					<au>
						<snm>Sugnet</snm>
						<fnm>CW</fnm>
					</au>
					<au>
						<snm>Furey</snm>
						<fnm>TS</fnm>
					</au>
					<au>
						<snm>Roskin</snm>
						<fnm>KM</fnm>
					</au>
					<au>
						<snm>Pringle</snm>
						<fnm>TH</fnm>
					</au>
					<au>
						<snm>Zahler</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Haussler</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>996</fpage>
				<lpage>1006</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">186604</pubid>
						<pubid idtype="pmpid" link="fulltext">12045153</pubid>
						<pubid idtype="doi">10.1101/gr.229102</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B50">
				<title>
					<p>UCSC Genome Browser</p>
				</title>
				<url>http://genome.ucsc.edu/</url>
			</bibl>
			<bibl id="B51">
				<title>
					<p>The UCSC Genome Browser Database.</p>
				</title>
				<aug>
					<au>
						<snm>Karolchik</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Baertsch</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Diekhans</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Furey</snm>
						<fnm>TS</fnm>
					</au>
					<au>
						<snm>Hinrichs</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Lu</snm>
						<fnm>YT</fnm>
					</au>
					<au>
						<snm>Roskin</snm>
						<fnm>KM</fnm>
					</au>
					<au>
						<snm>Schwartz</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Sugnet</snm>
						<fnm>CW</fnm>
					</au>
					<au>
						<snm>Thomas</snm>
						<fnm>DJ</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>51</fpage>
				<lpage>54</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">165576</pubid>
						<pubid idtype="pmpid" link="fulltext">12519945</pubid>
						<pubid idtype="doi">10.1093/nar/gkg129</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B52">
				<title>
					<p>Aligning multiple genomic sequences with the threaded blockset aligner.</p>
				</title>
				<aug>
					<au>
						<snm>Blanchette</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kent</snm>
						<fnm>WJ</fnm>
					</au>
					<au>
						<snm>Riemer</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Elnitski</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Smit</snm>
						<fnm>AF</fnm>
					</au>
					<au>
						<snm>Roskin</snm>
						<fnm>KM</fnm>
					</au>
					<au>
						<snm>Baertsch</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Rosenbloom</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Clawson</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Green</snm>
						<fnm>ED</fnm>
					</au>
					<etal/>
				</aug>
				<source>Genome Res</source>
				<pubdate>2004</pubdate>
				<volume>14</volume>
				<fpage>708</fpage>
				<lpage>715</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">383317</pubid>
						<pubid idtype="pmpid" link="fulltext">15060014</pubid>
						<pubid idtype="doi">10.1101/gr.1933104</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B53">
				<title>
					<p>UCR Browser</p>
				</title>
				<url>http://mordor.cgb.ki.se/UCRbrowse/</url>
			</bibl>
			<bibl id="B54">
				<title>
					<p>BLAT - the BLAST-like alignment tool.</p>
				</title>
				<aug>
					<au>
						<snm>Kent</snm>
						<fnm>WJ</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>656</fpage>
				<lpage>664</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">187518</pubid>
						<pubid idtype="pmpid" link="fulltext">11932250</pubid>
						<pubid idtype="doi">10.1101/gr.229202</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B55">
				<title>
					<p>The TRANSFAC system on gene expression regulation.</p>
				</title>
				<aug>
					<au>
						<snm>Wingender</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Fricke</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Geffers</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Hehl</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Liebich</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Krull</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Matys</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Michael</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Ohnhauser</snm>
						<fnm>R</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2001</pubdate>
				<volume>29</volume>
				<fpage>281</fpage>
				<lpage>283</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">29801</pubid>
						<pubid idtype="pmpid" link="fulltext">11125113</pubid>
						<pubid idtype="doi">10.1093/nar/29.1.281</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B56">
				<title>
					<p>Identification and characterization of multi-species conserved sequences.</p>
				</title>
				<aug>
					<au>
						<snm>Margulies</snm>
						<fnm>EH</fnm>
					</au>
					<au>
						<snm>Blanchette</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Haussler</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Green</snm>
						<fnm>ED</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>2507</fpage>
				<lpage>2518</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">403793</pubid>
						<pubid idtype="pmpid" link="fulltext">14656959</pubid>
						<pubid idtype="doi">10.1101/gr.1602203</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B57">
				<title>
					<p>Comparative genomics: genome-wide analysis in metazoan eukaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Ureta-Vidal</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Ettwiller</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Birney</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>Nat Rev Genet</source>
				<pubdate>2003</pubdate>
				<volume>4</volume>
				<fpage>251</fpage>
				<lpage>262</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nrg1043</pubid>
						<pubid idtype="pmpid" link="fulltext">12671656</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B58">
				<title>
					<p>Comparative genomics at the vertebrate extremes.</p>
				</title>
				<aug>
					<au>
						<snm>Boffelli</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Nobrega</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Rubin</snm>
						<fnm>EM</fnm>
					</au>
				</aug>
				<source>Nat Rev Genet</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>456</fpage>
				<lpage>465</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nrg1350</pubid>
						<pubid idtype="pmpid" link="fulltext">15153998</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B59">
				<title>
					<p>The ENCODE (ENCyclopedia Of DNA Elements) Project.</p>
				</title>
				<aug>
					<au>
						<cnm>ENCODE Project Consortium</cnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2004</pubdate>
				<volume>306</volume>
				<fpage>636</fpage>
				<lpage>640</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1105136</pubid>
						<pubid idtype="pmpid" link="fulltext">15499007</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B60">
				<title>
					<p>Comparative analyses of multi-species sequences from targeted genomic regions.</p>
				</title>
				<aug>
					<au>
						<snm>Thomas</snm>
						<fnm>JW</fnm>
					</au>
					<au>
						<snm>Touchman</snm>
						<fnm>JW</fnm>
					</au>
					<au>
						<snm>Blakesley</snm>
						<fnm>RW</fnm>
					</au>
					<au>
						<snm>Bouffard</snm>
						<fnm>GG</fnm>
					</au>
					<au>
						<snm>Beckstrom-Sternberg</snm>
						<fnm>SM</fnm>
					</au>
					<au>
						<snm>Margulies</snm>
						<fnm>EH</fnm>
					</au>
					<au>
						<snm>Blanchette</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Siepel</snm>
						<fnm>AC</fnm>
					</au>
					<au>
						<snm>Thomas</snm>
						<fnm>PJ</fnm>
					</au>
					<au>
						<snm>McDowell</snm>
						<fnm>JC</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2003</pubdate>
				<volume>424</volume>
				<fpage>788</fpage>
				<lpage>793</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature01858</pubid>
						<pubid idtype="pmpid" link="fulltext">12917688</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B61">
				<title>
					<p>NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence.</p>
				</title>
				<aug>
					<au>
						<snm>Down</snm>
						<fnm>TA</fnm>
					</au>
					<au>
						<snm>Hubbard</snm>
						<fnm>TJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2005</pubdate>
				<volume>33</volume>
				<fpage>1445</fpage>
				<lpage>1453</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1064142</pubid>
						<pubid idtype="pmpid" link="fulltext">15760844</pubid>
						<pubid idtype="doi">10.1093/nar/gki282</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B62">
				<title>
					<p>Ensembl Genome Browser</p>
				</title>
				<url>http://www.ensembl.org</url>
			</bibl>
			<bibl id="B63">
				<title>
					<p>Regulation of the stem cell leukemia (SCL) gene: a tale of two fishes.</p>
				</title>
				<aug>
					<au>
						<snm>Barton</snm>
						<fnm>LM</fnm>
					</au>
					<au>
						<snm>G&#246;ttgens</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Gering</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Gilbert</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>Grafham</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Rogers</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Bentley</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Patient</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Green</snm>
						<fnm>AR</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2001</pubdate>
				<volume>98</volume>
				<fpage>6747</fpage>
				<lpage>6752</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">34424</pubid>
						<pubid idtype="pmpid" link="fulltext">11381108</pubid>
						<pubid idtype="doi">10.1073/pnas.101532998</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B64">
				<title>
					<p>BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences.</p>
				</title>
				<aug>
					<au>
						<snm>Tatusova</snm>
						<fnm>TA</fnm>
					</au>
					<au>
						<snm>Madden</snm>
						<fnm>TL</fnm>
					</au>
				</aug>
				<source>FEMS Microbiol Lett</source>
				<pubdate>1999</pubdate>
				<volume>174</volume>
				<fpage>247</fpage>
				<lpage>250</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1097(99)00149-4</pubid>
						<pubid idtype="pmpid" link="fulltext">10339815</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B65">
				<title>
					<p>AVID: A global alignment program.</p>
				</title>
				<aug>
					<au>
						<snm>Bray</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Dubchak</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Pachter</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>97</fpage>
				<lpage>102</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">430967</pubid>
						<pubid idtype="pmpid" link="fulltext">12529311</pubid>
						<pubid idtype="doi">10.1101/gr.789803</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B66">
				<title>
					<p>VISTA: computational tools for comparative genomics.</p>
				</title>
				<aug>
					<au>
						<snm>Frazer</snm>
						<fnm>KA</fnm>
					</au>
					<au>
						<snm>Pachter</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Poliakov</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Rubin</snm>
						<fnm>EM</fnm>
					</au>
					<au>
						<snm>Dubchak</snm>
						<fnm>I</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<volume>32</volume>
				<fpage>W273</fpage>
				<lpage>W279</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">441596</pubid>
						<pubid idtype="pmpid" link="fulltext">15215394</pubid>
						<pubid idtype="doi">10.1093/nar/gkh458</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B67">
				<title>
					<p>An efficient algorithm for large-scale detection of protein families.</p>
				</title>
				<aug>
					<au>
						<snm>Enright</snm>
						<fnm>AJ</fnm>
					</au>
					<au>
						<snm>Van Dongen</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>1575</fpage>
				<lpage>1584</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">101833</pubid>
						<pubid idtype="pmpid" link="fulltext">11917018</pubid>
						<pubid idtype="doi">10.1093/nar/30.7.1575</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B68">
				<title>
					<p>Genome-specific higher-order background models to improve motif detection.</p>
				</title>
				<aug>
					<au>
						<snm>Marchal</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Thijs</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>De Keersmaecker</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Monsieurs</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>De Moor</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Vanderleyden</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Trends Microbiol</source>
				<pubdate>2003</pubdate>
				<volume>11</volume>
				<fpage>61</fpage>
				<lpage>66</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0966-842X(02)00030-6</pubid>
						<pubid idtype="pmpid" link="fulltext">12598125</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B69">
				<title>
					<p>BLAT Search Genome</p>
				</title>
				<url>http://genome.ucsc.edu/cgi-bin/hgBlat</url>
			</bibl>
			<bibl id="B70">
				<title>
					<p>Toucan: deciphering the cis-regulatory logic of coregulated genes.</p>
				</title>
				<aug>
					<au>
						<snm>Aerts</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Thijs</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Coessens</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Staes</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Moreau</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>De Moor</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>1753</fpage>
				<lpage>1764</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">152870</pubid>
						<pubid idtype="pmpid" link="fulltext">12626717</pubid>
						<pubid idtype="doi">10.1093/nar/gkg268</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B71">
				<title>
					<p>INCLUSive: A web portal and service registry for microarray and regulatory sequence analysis.</p>
				</title>
				<aug>
					<au>
						<snm>Coessens</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Thijs</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Aerts</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Marchal</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>De Smet</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Engelen</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Glenisson</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Moreau</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Mathys</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>De Moor</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>3468</fpage>
				<lpage>3470</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">169021</pubid>
						<pubid idtype="pmpid" link="fulltext">12824346</pubid>
						<pubid idtype="doi">10.1093/nar/gkg615</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B72">
				<title>
					<p>BioI@SCD Software</p>
				</title>
				<url>http://homes.esat.kuleuven.be/~dna/Bioi/Software.html</url>
			</bibl>
			<bibl id="B73">
				<title>
					<p>Identifying combinatorial regulation of transcription factors and binding motifs.</p>
				</title>
				<aug>
					<au>
						<snm>Kato</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Hata</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Banerjee</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Futcher</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>MQ</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>R56.1</fpage>
				<lpage>R56.13</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1186/gb-2004-5-8-r56</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B74">
				<title>
					<p>MAVID Multiple Alignment Server</p>
				</title>
				<url>http://baboon.math.berkeley.edu/mavid/</url>
			</bibl>
			<bibl id="B75">
				<title>
					<p>Miller Lab</p>
				</title>
				<url>http://bio.cse.psu.edu/</url>
			</bibl>
			<bibl id="B76">
				<title>
					<p>E Margulies FTP Site</p>
				</title>
				<url>ftp://kronos.nhgri.nih.gov/pub/outgoing/elliott/tba/</url>
			</bibl>
		</refgrp>
	</bm>
</art>
