<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2105-7-76</ui>
	<ji>1471-2105</ji>
	<fm>
		<dochead>Software</dochead>
		<bibl>
			<title>
				<p>BIPAD: A web server for modeling bipartite sequence elements</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Bi</snm>
					<fnm>Chengpeng</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<email>cbi@cmh.edu</email>
				</au>
				<au ca="yes" id="A2">
					<snm>Rogan</snm>
					<mi>K</mi>
					<fnm>Peter</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<email>progan@cmh.edu</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Laboratory of Human Molecular Genetics, Children's Mercy Hospital &amp; Clinics, 2401 Gillham Road, Kansas City, MO 64108, USA</p>
				</ins>
				<ins id="I2">
					<p>School of Computer Science and Engineering, University of Missouri-Kansas City, 5115 Oak St., MO 64110, USA</p>
				</ins>
			</insg>
			<source>BMC Bioinformatics</source>
			<issn>1471-2105</issn>
			<pubdate>2006</pubdate>
			<volume>7</volume>
			<issue>1</issue>
			<fpage>76</fpage>
			<url>http://www.biomedcentral.com/1471-2105/7/76</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">16503993</pubid><pubid idtype="doi">10.1186/1471-2105-7-76</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>08</day>
					<month>9</month>
					<year>2005</year>
				</date>
			</rec>
			<acc>
				<date>
					<day>17</day>
					<month>2</month>
					<year>2006</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>17</day>
					<month>2</month>
					<year>2006</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2006</year>
			<collab>Bi and Rogan; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Many dimeric protein complexes bind cooperatively to families of bipartite nucleic acid sequence elements, which consist of pairs of conserved half-site sequences separated by intervening distances that vary among individual sites.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>We introduce the Bipad Server <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, a web interface to predict sequence elements embedded within unaligned sequences. The server produces either a bipartite model, consisting of a pair of one-block position weight matrices (PWMs) with a gap distribution, or a single PWM for contiguous single-block motifs. The Bipad program performs multiple local alignment by entropy minimization and cyclic refinement using a stochastic greedy search strategy. The best models are refined by maximizing incremental information contents among a set of potential models with varying half-site and gap lengths.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>The web service generates information weight matrices, identifies binding site motifs, graphically represents the set of discovered elements as a sequence logo, and depicts the gap distribution as a histogram. Server performance was evaluated by generating a collection of bipartite models for distinct DNA binding proteins.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Dimeric transcription factors often bind to bipartite genomic sequence elements (TFBS) in promoters, which are composed of two adjacent degenerate motifs with four possible orientations, separated by a flexible nucleotide spacer of unspecified sequence <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. For example, nuclear receptor transcription factors, which form homo- or heterodimeric complexes, can potentiate transcription of downstream target genes by binding to degenerate bipartite sites that display partial internal sequence symmetry <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Characterizing these motifs, locating the sites, determining their orientations and estimating their binding affinities are crucial to understanding transcriptional responses to developmental and environmental cues.</p>
			<p>Bipartite sequence patterns can be discovered by <it>de novo </it>enumerative methods, such as spaced dyad <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and structured motif <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp> algorithms, and with position weight matrices (PWMs), such as those used by BioProspector <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> and Bipad <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B9">9</abbr></abbrgrp>. Given a set of unaligned DNA sequences sharing a common bipartite or single-block pattern, the Bipad algorithm finds the pattern that maximizes total information content <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, which is computed from the information contents of the left and right motifs and a gap penalty based on the surprisal function. The site information contents are related to their binding strengths <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>, which can then be verified in the laboratory <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Bipad simultaneously searches each of the four possible types of orientations (see below) for a given bipartite pattern. A single-block motif can be treated as a bipartite pattern with a zero-length nucleotide gap between half-sites. Bipad outputs two PWMs for the half-site models and the associated gap distribution for a bipartite pattern search, or one PWM for a single-block motif. Using a stochastic greedy search strategy driven by a set of random seeds (bipartite coordinates), Bipad performs multiple local alignment and cyclic refinement of the search operation for a specified number of cycles. Additional cycles lead the search toward the preferred solution and reduce the likelihood of producing inferior alignments that may arise during a single cycle.</p>
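			<p>As an illustration of the quantities named above, the following minimal Python sketch computes the information content of an aligned block under a uniform background, together with the surprisal of each observed gap length. It is not the Bipad implementation: the function names, the example half-sites and the omission of any small-sample correction are assumptions made for illustration, and the way Bipad combines the half-site information with the gap penalty is deliberately not reproduced here.</p>
			<p><![CDATA[
```python
import math
from collections import Counter

BASES = "ACGT"

def column_information(column):
    """Information (bits) at one aligned position, assuming a uniform background."""
    counts = Counter(column)
    total = len(column)
    entropy = 0.0
    for base in BASES:
        freq = counts[base] / total
        if freq > 0:
            entropy -= freq * math.log2(freq)
    # With four equiprobable bases the maximum uncertainty is 2 bits.
    return 2.0 - entropy

def block_information(sites):
    """Sum of per-position information over equal-length aligned sites."""
    return sum(column_information(col) for col in zip(*sites))

def gap_surprisal(gap_lengths):
    """Surprisal (-log2 of the observed frequency) of each gap length."""
    counts = Counter(gap_lengths)
    total = len(gap_lengths)
    return {gap: -math.log2(n / total) for gap, n in counts.items()}

# Hypothetical aligned left half-sites and gap lengths, for illustration only.
left_sites = ["AGGTCA", "AGGTCA", "AGTTCA", "GGGTCA"]
print(round(block_information(left_sites), 3))
print(gap_surprisal([1, 1, 2, 1]))
```
]]></p>
			<p>Minimizing the summed column entropies is equivalent to maximizing this information measure when the background is uniform, which is the simplification described in the text.</p>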
			<p>Bipad has performed as well as or better than comparable programs in both <it>de novo </it>single-block and bipartite motif discovery, particularly for sites with conserved binding sequences that are present on both strands <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. We benchmarked Bipad against other popular <it>de novo </it>local alignment software, including GLAM, Gibbs Sampler, and CONSENSUS, using experimentally verified <it>E. coli </it>CRP binding sites. Bipad exhibited better sensitivity and specificity than Gibbs and CONSENSUS, and results equivalent to those obtained with GLAM <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. For bipartite motif discovery, we compared Bipad with BioProspector <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Bipad is uniquely designed to recognize binding sites on either strand (in all four potential orientations), increasing its sensitivity for detection of reverse direct and inverted half sites <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Unlike BioProspector, Bipad assumes a uniform underlying genomic composition. The objective function is thus reduced to one that minimizes Shannon entropy, which simplifies the mathematical model and accelerates numerical computation. Convergence in each cycle is therefore very rapid; this property supports implementation of Bipad as a distributed computational process that will be especially useful in aligning large datasets. Bipad also generalizes the bipartite motif discovery problem: any range of gap lengths may be specified, and the two half motifs may be either homogeneous (perfect or imperfect repetitive half sites) or heterogeneous (i.e., half sites with different patterns or motif widths).</p>
			<p>However, multiple local alignment algorithms such as Bipad are susceptible to producing sub-optimal alignments that result from detection of local minima of the objective function, rather than producing a globally optimal alignment. Bipad reduces this risk by running a specified number of cycles, with each cycle starting from a different set of binding site coordinates. This stochastic cycling strategy has proven to be efficient, but it does not guarantee a global optimum <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
			<p>The Bipad web server performs sequence pattern discovery on functionally related DNA sequences, typically containing binding sites recognized by one or more cognate proteins, embedded within a heterogeneous sequence background. The public web interface, which is written in Perl, executes the Bipad program. The web program displays the bipartite (or single-block) information model as a graphical sequence logo that reveals conserved sequence patterns, together with the corresponding histogram of spacer lengths. A table is also produced that indicates the individual information contents of each of the binding sites and other important model characteristics.</p>
		</sec>
		<sec>
			<st>
				<p>Implementation</p>
			</st>
			<sec>
				<st>
					<p>Web input</p>
				</st>
				<p>The program is run once all required parameters are specified and sequences are either entered directly or uploaded from a file. Results are either sent via email or are generated on-the-fly. A detailed description, a web snapshot and sample datasets are available on-line <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
				<sec>
					<st>
						<p>Registration</p>
					</st>
					<p>The Bipad server can be accessed in either guest or registration mode. Guests are required to enter a valid email address in order to receive results, since individual information theory is patented. Graphical output (sequence logo and gap histogram) is produced only for registered users. A project title is required, which is also used to label the sequence logo of the binding site alignment. Unless explicitly specified, text output is sent by electronic mail.</p>
				</sec>
				<sec>
					<st>
						<p>Search pattern and alignment mode</p>
					</st>
					<p>A search pattern is defined in the most general sense. The search pattern can be either a single-block or bipartite motif which is specified by the lengths of the conserved sequence elements and lengths of the sequences, if any, which separate them. The sequence models may be constrained by specifying that the search identify either one site per input sequence (OOPS) or zero or one sites per sequence (ZOOPS). The respective half-site motifs may be homogeneous (consisting of perfect or imperfect repeats of the same sequence) or heterogeneous (containing different motif widths and/or sequence patterns). The configuration of a bipartite pattern may have four potential half-site orientations: direct repeat (DR), reversed DR (RDR), inverted repeat (IR), and everted repeat (ER).</p>
					<p>The Bipad program can search for all possible orientations on either a single strand (DR) or on both strands (DR, RDR, IR, ER). The alignment mode should be specified based on biological evidence or hypothesis, for example, structural or experimental evidence that a protein contacts sequences on a single or both strands. Searches for the best alignments of sequences on both strands are slower than those involving a single strand.</p>
					<p>The motif width for a single-block search is specified in the first box of the site width field. It is recommended that a variety of motif widths be explored based either on functional binding site laboratory data or empirical methods (see discussion below; <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>). If a bipartite search pattern is requested, the widths of the left- and right-half motifs are specified in the first and second fill-in boxes, respectively. The minimum and maximum gap lengths are likewise entered in the adjacent fill-in boxes.</p>
					<p>Typical searches for bipartite patterns take about twice as long as searches for single-block motifs having the same total width. Note that a single-block motif is equivalent to a zero-gap bipartite pattern, in which the two motifs are merged together. Lengthy gaps or broad gap length ranges can increase the time required to determine bipartite motifs, but in most instances the program completes within a minute (Figure <figr fid="F1">1</figr>). The default parameters for half-site widths and gap lengths have been restricted in order to ensure reasonable multiuser server performance. Half-site widths are permitted to range from 4 to 500 nucleotides. Gap lengths between half sites can range from 0 to 200 nucleotides.</p>
					<fig id="F1">
						<title>
							<p>Figure 1</p>
						</title>
						<caption>
							<p>Performance of Bipad server for alignment of Scaffold/Matrix attachment region sequences</p>
						</caption>
						<text>
							<p><b>Performance of Bipad server for alignment of Scaffold/Matrix attachment region sequences</b>. The graph, based on the alignment of Scaffold/Matrix attachment region (S-MAR) sites, indicates the linear relationship between the number of cycles and the time required for convergence on the optimal model (filled squares), and shows that the total information content reaches its asymptote at two cycles (filled triangles).</p>
						</text>
						<graphic file="1471-2105-7-76-1"/>
					</fig>
				</sec>
				<sec>
					<st>
						<p>Number of cycles</p>
					</st>
					<p>In a single Monte Carlo cycle, it is possible for the bipartite local alignment to converge to a local optimum. However, large datasets, searches for subtle motifs characterized by low average information, and bipartite alignment on both strands each require additional cycles to ensure that the preferred alignment will be produced. Increasing the number of cycles increases the confidence that results obtained are the best alignment achievable with this algorithm, at the expense of only a modest drop in server performance (Figure <figr fid="F1">1</figr>). A 500 cycle limit is imposed on model building due to finite server capacity.</p>
					<p>The relationships between cycle number and run time, and between cycle number and total information content, are illustrated by the bipartite alignment of chromatin scaffold/matrix attachment region binding sites (Figure <figr fid="F1">1</figr>). Run time increases linearly with the number of cycles. In this instance, two cycles were sufficient to determine the best bipartite alignment, and further cycling was unnecessary. The default is 10 cycles; however, it is recommended that a variety of cycle numbers be tested to ensure that a stable solution is obtained.</p>
				</sec>
				<sec>
					<st>
						<p>DNA sequences</p>
					</st>
					<p>FASTA-formatted DNA sequences are entered in the sequence field text box or files can be uploaded. Only unambiguous lower or upper case nucleotide symbols are permissible {A, C, G, and T}. Each sequence may be up to 5 kb in length and the number of input sequences is limited to a maximum of 2,000.</p>
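					<p>The input constraints above can be checked before submission. The following Python sketch is a hypothetical client-side check, not part of the server: the function names are invented for illustration, and "5 kb" is assumed to mean 5,000 nucleotides.</p>
					<p><![CDATA[
```python
MAX_SEQUENCES = 2000          # maximum number of input sequences stated in the text
MAX_LENGTH = 5000             # assumption: "5 kb" taken as 5,000 nucleotides
ALLOWED = set("ACGTacgt")     # only unambiguous symbols, either case

def parse_fasta(text):
    """Split FASTA-formatted text into (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first FASTA header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def validate(records):
    """Return a list of human-readable problems; an empty list means the input passes."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"too many sequences: {len(records)}")
    for name, seq in records:
        if len(seq) > MAX_LENGTH:
            problems.append(f"{name}: longer than {MAX_LENGTH} nt")
        bad = sorted(set(seq) - ALLOWED)
        if bad:
            problems.append(f"{name}: disallowed symbols {''.join(bad)}")
    return problems

example = ">s1\nACGT\nacgt\n>s2\nACNT\n"
print(validate(parse_fasta(example)))
```
]]></p>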
				</sec>
				<sec>
					<st>
						<p>Refinement</p>
					</st>
					<p>Model refinement is not performed unless this option is specifically requested. The refinement procedure executes a batch of runs over the specified range of site widths and gap lengths. The procedure outputs a unit incremental information (UII) plot, which facilitates comparison of the information gained among a series of potential binding site models. The best model achievable with the Bipad algorithm is the one exhibiting the maximum UII <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
					<p>The specified range of core motif widths should be guided by experimental binding site evidence, if available. Generally speaking, a range of half-site widths centered on about 5 nucleotides is a reasonable starting point for multiple trials with many DNA-protein interactions. Input site lengths shorter than 4 nucleotides are more likely to generate false positive motifs and generally do not represent biologically meaningful binding sites <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Very high sequence conservation across a putative binding site warrants exploration of a wider range of binding site widths, in order to guard against the possibility that the model is truncated at either end. False positive motifs with high information contents may be reduced by removing or pre-filtering recurrent, low-complexity tandemly repeated sequences from the input sequences, if possible <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. This is particularly important for sets of eukaryotic intergenic non-coding sequences, which frequently contain runs of (often imperfect) homopolymeric sequences that produce minimum entropies that may not be relevant to protein binding.</p>
				</sec>
				<sec>
					<st>
						<p>Other options</p>
					</st>
					<p>The lengths of sequences flanking a motif, and the output delivery method may be specified. Flanking nucleotides are displayed in lower case and motifs are shown as upper case. If the specified flanking sequences extend beyond the boundaries of the input sequence, a dash is indicated at those positions. By default, flanking nucleotides are not displayed.</p>
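					<p>The display convention just described can be sketched in a few lines of Python. The function name, its arguments and the use of a 0-based start coordinate are assumptions made for illustration; the server's own formatting code is not shown in this article.</p>
					<p><![CDATA[
```python
def show_site(sequence, start, width, flank):
    """Return the motif in upper case with `flank` lower-case nucleotides on
    each side; positions outside the input sequence are shown as dashes.
    `start` is the 0-based position of the first motif nucleotide (assumed)."""
    out = []
    for pos in range(start - flank, start + width + flank):
        if pos < 0 or pos >= len(sequence):
            out.append("-")
        elif start <= pos < start + width:
            out.append(sequence[pos].upper())
        else:
            out.append(sequence[pos].lower())
    return "".join(out)

# A 6-nt motif at position 2 of a 10-nt sequence, shown with 3 flanking positions.
print(show_site("ttAGGTCAgc", 2, 6, 3))  # prints -ttAGGTCAgc-
```
]]></p>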
					<p>Constraints on the number of aligned DNA sequences, the maximum number of Monte Carlo cycles, maximum half-site widths and gap lengths can be relaxed upon request.</p>
				</sec>
			</sec>
			<sec>
				<st>
					<p>Web output</p>
				</st>
				<p>Results can either be displayed on-line or sent to the user by electronic mail. If results are output via electronic mail (default), the <it>bipad-mailer.pl </it>program sends the bipad output text file to the destination specified in the username field. This file contains: (1) the search parameters and the minimum entropy after the search; (2) the information weight matrix or matrices and a separate frequency matrix formatted for the bipartite logo plotter (see below); (3) the gap length distribution; (4) a list of nucleotide motifs, their sequence coordinates and information contents for each potential binding site; and (5) the parameters used to generate the sequence logo with <it>bipad_logo.pl</it>. The on-line display dynamically produces a single- or two-block sequence logo <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> (generated with the program <it>seqlogo.pl </it>from the WebLogo site; <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>), drawn in PDF, PNG and EPS formats. A bipartite logo is produced by inserting a central gap between two half-site motif logos at the zero coordinate. If the central gap length exceeds the maximum permissible gap (currently 10 nucleotides, due to limitations on logo image size), a 10-nucleotide gap is displayed. In addition, we provide an auxiliary sequence logo plotting function to display motifs from pre-aligned or user-defined matrices (see below).</p>
				<p>For bipartite sequence models <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, a histogram of the gap lengths between half-sites is dynamically generated in both graphic and text formats. In addition, a motif table is generated as part of the bipad text output; it displays the names of the sequences taken from the FASTA input and the corresponding half-site and total individual information contents for each site.</p>
				<sec>
					<st>
						<p>Auxiliary function: Bipartite sequence logo plotter</p>
					</st>
					<p>Given either a single frequency matrix or two half-site matrices (termed "first" and "second"), the plotter will draw the corresponding sequence logos. The plotter can apply reverse complementation to the "first" matrix, to the "second" matrix, to "both" matrices or to "none" of them. Only the untransformed ("none") and "first" matrix operations can be carried out on single-block matrices. The central gap length of the sequence logo may be specified for bipartite matrices; the default size is 4 bp. Other options, i.e., logo name and size, can also be defined by the user. Detailed input instructions and a working example can be found at <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. The bipartite output file also includes a separate frequency matrix specifically formatted for use with the bipartite logo plotter.</p>
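					<p>Reverse complementation of a frequency matrix reverses the order of the positions and exchanges the counts of complementary bases. The Python sketch below illustrates the four plotter modes under stated assumptions: the matrix representation (a list of per-position dictionaries), the function names and the decision to leave the order of the two half-site matrices unchanged are all hypothetical and do not describe the plotter's actual code.</p>
					<p><![CDATA[
```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement_matrix(matrix):
    """matrix: list of positions, each a dict of counts keyed by A, C, G, T."""
    return [
        {COMPLEMENT[base]: count for base, count in row.items()}
        for row in reversed(matrix)
    ]

def transform(first, second, mode):
    """Apply the "none", "first", "second" or "both" operation.
    For a single-block matrix pass second=None; only "none" and "first" are allowed."""
    if mode not in ("none", "first", "second", "both"):
        raise ValueError(f"unknown mode: {mode}")
    if second is None and mode in ("second", "both"):
        raise ValueError("single-block matrices accept only 'none' or 'first'")
    if mode in ("first", "both"):
        first = reverse_complement_matrix(first)
    if mode in ("second", "both"):
        second = reverse_complement_matrix(second)
    return first, second

# Hypothetical two-position frequency matrix.
m = [{"A": 3, "C": 0, "G": 1, "T": 0}, {"A": 0, "C": 4, "G": 0, "T": 0}]
print(reverse_complement_matrix(m))
```
]]></p>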
				</sec>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st>
			<sec>
				<st>
					<p>Case studies</p>
				</st>
				<p>To illustrate results produced by the server, we analyzed single-block and bipartite sequences recognized by several DNA-binding proteins (Figure <figr fid="F2">2</figr>). Hormone responsive elements (HREs) may be recognized by nuclear hormone receptors that bind as monomers (FTZ-F1&#945;), homodimers (HNF4&#945;) or heterodimers (CAR/RXR&#945;). The server was also used to compute models of chromatin scaffold/matrix attachment regions (S/MAR), which are composed of heterogeneous bipartite binding elements. Bipartite models for the same datasets generated with either the OOPS or the ZOOPS parameter produced identical alignments. The unaligned sequences used in the preparation of these models are available on the Bipad website.</p>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>A gallery of sequence logos</p>
					</caption>
					<text>
						<p><b>A gallery of sequence logos</b>. For bipartite logos, the companion gap histogram is shown on the right. (A) FTZ-F1&#945; monomer binding site; (B) CAR/RXR&#945; PBREM sites; the right-half motif starts at position 4 and positions 0&#8211;3 correspond to the central gap between the half-sites; (C) HNF4&#945; homodimer binding sites; the right-half motif starts at position 1, with the variable length gap placed at position 0; (D) MRS bipartite binding sites; the second half-site motif begins at position 1; the variable length gap denoted by the distribution corresponds to position 0 of the logo. Corresponding Bipad text file output for these models can be viewed at [1].</p>
					</text>
					<graphic file="1471-2105-7-76-2"/>
				</fig>
				<p>FTZ-F1&#945; (Figure <figr fid="F2">2A</figr>) is an orphan nuclear receptor known to bind as a monomer to HREs containing the consensus sequence TCAAGGTCA. FTZ-F1&#945; is expressed in precursors of adrenal steroidogenic tissue and in gonadal steroid-producing cells <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Experimentally verified monomeric binding sequences <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> were aligned with Bipad. The motif width was set to 8 bp on the forward strand and one cycle was run to find the motif, as this is a small dataset in which the motif length is equal to the sequence length. The average information content is 8.78 bits per site. The single-block sequence logo is shown in Figure <figr fid="F2">2A</figr>.</p>
				<p>Figure <figr fid="F2">2B</figr> shows a bipartite model based on recognition sites bound by the nuclear receptor constitutive androstane receptor (CAR), which forms a heterodimeric complex with the retinoid X receptor (RXR&#945;) that binds to phenobarbital-responsive elements (PBREM) of target genes to regulate their expression <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. CAR/RXR&#945; recognizes a degenerate PBREM sequence consisting of a bipartite pattern of two half-sites separated by a flexible nucleotide spacer. Our bipartite algorithm is well suited for modeling CAR/RXR&#945; sites, as the heterodimer has been shown to recognize DR, RDR, IR and ER patterns <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. The sequences of 30 human CAR/RXR&#945; binding sites were extracted from the <it>CYP3A4, CYP3A7, CYP2C9, CYP2C19, CYP2B6, UGT1A1, MRP2 </it>and <it>iNOS </it>genes and aligned. Alignment of half sites on both strands was permitted, consistent with published binding studies indicating that all possible orientations should be considered. The half-site and gap range lengths were set to 6&lt;[0, 8]&gt;6 (see below for a refinement procedure). A single cycle was needed to find the best alignment. The average information content per bipartite site is 13.87 bits, and the degenerate patterns discovered are consistent with the experimentally verified sites (RKKTCA&lt;0&#8211;8&gt;RKKTCA) <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Analysis of the same set of binding sites with BioProspector <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> produced a similar alignment; however, the logo contained only 12.9 bits because conserved half sites present on both strands were not detected.</p>
				<p>HNF4&#945; (Figure <figr fid="F2">2C</figr>) binds as a homodimer to DR HREs separated by one or two nucleotides (DR1, DR2). HNF4&#945; was initially identified as a transcription factor required for liver-specific gene expression, and was later shown to be expressed at high levels in liver, kidney, intestine, and pancreas and at low levels in the testis <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. The average information model is based on 63 validated binding sequences and flanking sequences that have been collated from multiple genes and species <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>; (F. Sladek, personal communication). Because of the limited flanking sequence available and the experimental observations, the bipartite search pattern was constrained to 6&lt;[1, 2]&gt;6, and the optimal bipartite alignment was found in a single Monte Carlo cycle. Although the search assumed that all orientations could be bound, nearly all of the sites identified by Bipad were DR. The average information content per bipartite site is 11.23 bits. The discovered patterns are consistent with the experimentally verified sites <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and with those produced by BioProspector.</p>
				<p>In Figure <figr fid="F2">2D</figr>, genomic elements of Scaffold/Matrix attachment regions (S/MAR), which delineate structural and functional organization in eukaryotic genomes <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>, are modelled. A bipartite sequence element associated with S/MARs has been reported based on sequences of 23 bipartite elements from different species (chicken, Chinese hamster, <it>Drosophila</it>, rabbit, yeast, SV40, human and mouse). The resultant model is expected to detect a highly conserved subset of potential S/MAR elements that are common to these and related species. The sequences containing potential sites were embedded on the same strand in an average human genomic background (G and C contents of 21% each, and A and T contents of 29% each) to form sequences 250 nucleotides in length, allowing for a large range of potential inter-half site distances. Five overlapping sites separated by 1-bp gaps were embedded, since Bipad is not configured to handle overlapping binding sites. The proposed matrix attachment region recognition signature (MRS) is represented by a pair of degenerate asymmetric half-sites (often containing the sequences AATAAYAA and AWWRTAANNWWGNNNC), separated by a nucleotide spacer of up to 200 bp in length. The bipartite search pattern was set to 16&lt;[0, 200]&gt;8 on the forward strand, and 2 cycles were sufficient to locate all the embedded MRS sites, except for two left half-sites. In the first case, the half-site is shifted 6 nucleotides away from its original location and has higher information content (13.615 bits) than the embedded sequence (9.819 bits), whereas the second left half-site is 81 bp downstream of the implanted motif and its information content is very similar to that of the original site (11.5 bits).
For this reason, the aligned left half-site motif has, on average, slightly more information (12.2 bits) than the experimentally determined sequence (12.0 bits). The total average information content per MRS site is 25.8 bits, with the right half-site being more conserved (13.6 bits) than the left half-site (12.2 bits) (Figure <figr fid="F2">2D</figr>). The sequence logo reveals the model to be more heterogeneous than the published MRS consensus sequence, and the half-site sequence patterns are somewhat different.</p>
				<sec>
					<st>
						<p>Refinement</p>
					</st>
					<p>Figure <figr fid="F3">3</figr> shows the progressive refinement of the bipartite CAR/RXR&#945; binding motif models. To enable this program function, the 'refine' option is selected and the initial or basis search pattern is defined. The program generates and evaluates models with left and right motifs of increasing lengths. Treating 5&lt;[0,8]&gt;5 as the basis motif, we calculate the UII value for each motif. The UII plot orders these models along the X-axis, with Model 1 corresponding to the motif pattern 5&lt;[0,8]&gt;5, Model 2 to 5&lt;[0,8]&gt;6, through Model 30, which has the pattern 10&lt;[0,8]&gt;9. The 6&lt;[0,8]&gt;6 motif (Figure <figr fid="F3">3</figr>) displays the highest information increment (see bipartite logo in Figure <figr fid="F2">2B</figr>), i.e., the highest level of information density (bits per unit length) over all of the motifs analyzed, making it arguably the optimal binding site model.</p>
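					<p>The model ordering described above can be reproduced with a short Python sketch. The width ranges (left motif 5 to 10, right motif 5 to 9, with the right width varying fastest) are inferred from the three models named in the text and are an assumption; the function name is hypothetical, and the UII calculation itself is not reproduced here.</p>
					<p><![CDATA[
```python
def refinement_models(left_widths, right_widths, gap_range):
    """List search patterns in the order they would appear on the X-axis,
    with the right-hand width varying fastest (assumed ordering)."""
    gap_min, gap_max = gap_range
    models = []
    for left in left_widths:
        for right in right_widths:
            models.append(f"{left}<[{gap_min},{gap_max}]>{right}")
    return models

# Inferred ranges: left motif 5..10, right motif 5..9, gap 0..8.
models = refinement_models(range(5, 11), range(5, 10), (0, 8))
print(len(models))              # 30
print(models[0], models[1])     # Model 1 and Model 2
print(models[-1])               # Model 30
```
]]></p>
					<p>Under these assumed ranges the enumeration yields exactly 30 models, beginning with 5&lt;[0,8]&gt;5 and 5&lt;[0,8]&gt;6 and ending with 10&lt;[0,8]&gt;9, which matches the indexing given in the text.</p>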
					<fig id="F3">
						<title>
							<p>Figure 3</p>
						</title>
						<caption>
							<p>Refinement of CAR/RXR bipartite binding motif models</p>
						</caption>
						<text>
							<p><b>Refinement of CAR/RXR bipartite binding motif models</b>. The x-axis represents an index of binding site models of increasing site widths, beginning with the initial input parameters, which defined the site widths and gap range as 5&lt;[0,8]&gt;5. For example, Model 1 corresponds to the motif pattern 5&lt;[0,8]&gt;5 and Model 2 to 5&lt;[0,8]&gt;6, whereas the final model, number 30, corresponds to the pattern 10&lt;[0,8]&gt;9. The unit incremental information (UII) value is computed for each motif and displayed on the Y-axis. The model with the maximum UII usually has the highest information density and is indicative of the optimal model.</p>
						</text>
						<graphic file="1471-2105-7-76-3"/>
					</fig>
				</sec>
				<sec>
					<st>
						<p>Performance vs. sequence length</p>
					</st>
					<p>To examine the performance of Bipad for detection of true binding sites in sequences of varying lengths, we embedded the MRS binding sites in background sequences having either a uniform equiprobable nucleotide distribution or an average human genomic composition (described above). We embedded exactly one MRS site in each background sequence (23 sequences in total) and varied the lengths of the background sequences from 250 bp to 2000 bp (each simulation was repeated three times). The average performance (based on detection of the embedded sequence) in each group is shown in Figure <figr fid="F4">4</figr>. For sequences less than 1 kb in length, binding sites were detected with an accuracy of over 80% regardless of background composition; however, as the sequence length increases, the performance decreases monotonically. It is interesting to note, however, that MRS sites embedded in a background having a composition similar to that of the human genome were more easily detected in longer sequences (e.g. 2 kb) than sites embedded in a uniformly distributed background.</p>
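					<p>A simulation of this kind can be set up along the following lines. This Python sketch is a simplified illustration, not the procedure used to produce Figure 4: it embeds one contiguous site (rather than a bipartite site with a spacer) at a random position, the example site is hypothetical, and printing the background in lower case is only a display choice. The two compositions follow the values given in the text.</p>
					<p><![CDATA[
```python
import random

UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
HUMAN_LIKE = {"A": 0.29, "C": 0.21, "G": 0.21, "T": 0.29}  # composition stated in the text

def embed_site(site, total_length, composition, rng):
    """Embed `site` at a random offset in a random background so that the
    result is `total_length` nucleotides long. Returns (sequence, offset)."""
    if len(site) > total_length:
        raise ValueError("site is longer than the requested sequence")
    bases = list(composition)
    weights = [composition[b] for b in bases]
    background = rng.choices(bases, weights=weights, k=total_length - len(site))
    offset = rng.randint(0, len(background))   # insertion point, inclusive bounds
    left = "".join(background[:offset]).lower()
    right = "".join(background[offset:]).lower()
    return left + site.upper() + right, offset

rng = random.Random(0)                 # fixed seed so the run is repeatable
site = "AATAACAA"                      # hypothetical instance of a half-site
for length in (250, 500, 1000, 2000):
    seq, offset = embed_site(site, length, HUMAN_LIKE, rng)
    print(length, offset, seq[offset:offset + len(site)])
```
]]></p>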
					<fig id="F4">
						<title>
							<p>Figure 4</p>
						</title>
						<caption>
							<p>Bipad performance for various input sequence lengths</p>
						</caption>
						<text>
							<p><b>Bipad performance for various input sequence lengths</b>. The graph shows the performance of Bipad (Y-axis) for recognition of S-MAR binding sites embedded in background sequences of varying lengths. Each S-MAR site was embedded in a background either with a uniform composition [black line], or having the average human genomic composition [blue line]. The background sequence was varied from 250 to 2000 bp in length (X-axis). The performance calculation is given in Reference [2]; each data point has been averaged over three replicates.</p>
						</text>
						<graphic file="1471-2105-7-76-4"/>
					</fig>
				</sec>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>The web service presented here can be used for either <it>a priori </it>detection or <it>ab initio </it>discovery of single-block or bipartite binding sites. The examples provided demonstrate that Bipad can be broadly applied to many different types of motifs, regardless of their level of sequence conservation <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. We evaluated server performance by constructing models of published binding sites for several transcription factors and chromatin binding proteins. The motif sites found by the Bipad server are consistent with sequences that have been experimentally identified as binding sites. However, a domain-specific understanding of the protein-nucleic acid interactions for a particular protein is essential in selecting realistic parameters (site lengths and orientations) that take advantage of Bipad's capabilities. Site information contents predicted by Bipad are related to their corresponding binding affinities and can be experimentally validated. By allowing interactive exploration of various pattern lengths and orientations, the web server efficiently provides reasonable computational models for experimentally validated binding site data.</p>
			<p>The Bipad algorithm assumes zero or one bipartite site to be present in each training sequence. Bipad does not utilize multiple degenerate TFBS recognized by the same factor in a single sequence; to include all experimentally validated sites in the same promoter in a bipartite model, intervals containing individual TFBS should be separated into different input sequences.</p>
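			<p>One way to prepare such input is sketched below in Python. The function name split_sites, the flank size and the toy promoter are hypothetical, and the sketch assumes that the annotated sites do not overlap; it is not part of the Bipad software.</p>

```python
def split_sites(sequence, site_intervals, flank=20):
    """Cut one window per annotated site so that no window spans two sites.

    site_intervals: list of (start, end) positions, 0-based, end-exclusive,
    assumed not to overlap. flank: bases kept on each side where available
    (a hypothetical default). Flanks are clipped at neighbouring sites.
    """
    ordered = sorted(site_intervals)
    windows = []
    for i, (start, end) in enumerate(ordered):
        prev_end = ordered[i - 1][1] if i else 0
        is_last = i + 1 == len(ordered)
        next_start = len(sequence) if is_last else ordered[i + 1][0]
        left = max(prev_end, start - flank)
        right = min(next_start, end + flank)
        windows.append(sequence[left:right])
    return windows


# Toy promoter with two made-up sites at positions 10-15 and 21-26.
promoter = "A" * 10 + "GGTCA" + "C" * 6 + "TGACC" + "T" * 10
sites = [(10, 15), (21, 26)]
windows = split_sites(promoter, sites, flank=4)
```

			<p>Each resulting window can then be submitted as a separate training sequence, so that every validated site contributes to the model.</p>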
			<p>The software was originally designed and implemented for localizing nuclear receptor binding sites, which often form bipartite patterns, some containing half-sites in all possible orientations. However, the program can also be used to efficiently identify single-block motifs <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, as these are a special case of the bipartite motif definition.</p>
			<p>We plan to extend Bipad for large-scale genomic sequence analysis; however, this task will be challenging. Although many tools for discovery of TFBS elements have been developed, a comprehensive solution that accurately defines binding sites in genomic sequences has been elusive for a variety of reasons <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. The known limitations in computational methods cannot be overcome until several significant laboratory-derived problems are addressed. Collections of binding sites recognized by the same protein are known to exhibit pervasive systematic bias <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. As we have shown, inadequate localization of binding sites in sequence data from chromatin immunoprecipitation assays can compromise accurate detection of subtle binding site signals. Finally, false positive binding sites can be introduced through microarray-derived artifacts in ChIP-chip hybridizations <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. A complete and accurate biological understanding of DNA-protein interactions is a prerequisite to the accurate identification of binding sites in long genomic sequences.</p>
		</sec>
		<sec>
			<st>
				<p>List of abbreviations</p>
			</st>
			<p>Bipad &#8211; Bipartite pattern discovery</p>
		</sec>
		<sec>
			<st>
				<p>Availability and requirements</p>
			</st>
			<p>&#8226; <b>Project name: </b>Modeling Bipartite cis-elements, Bipad</p>
			<p>&#8226; <b>Project home page: </b><url>http://bipad.cmh.edu</url></p>
			<p>&#8226; <b>Operating system(s): </b>Platform independent</p>
			<p>&#8226; <b>Programming language: </b>C++ and Perl</p>
			<p>&#8226; <b>Other requirements: </b>None</p>
			<p>&#8226; <b>License: </b>see <url>http://www.lecb.ncifcrf.gov/~toms/contacts.html</url></p>
			<p>&#8226; <b>Use restrictions for non-academics: </b>see <url>http://www.lecb.ncifcrf.gov/~toms/contacts.html</url></p>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>CB and PKR conceived of the project, and CB designed and implemented the algorithms. Both authors wrote the manuscript.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>Support from the Katharine B. Richardson Trust (Grant #4185) and the National Institute of Environmental Health Sciences (ES10855-02) is gratefully acknowledged. We thank Dr. Sladek for providing updated HNF4&#945; binding site sequence data. Funding to pay the Open Access publication charges for this article was provided by the Katharine B. Richardson Trust.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Bipad</p>
				</title>
				<aug>
					<au>
						<snm>Bi</snm>
						<fnm>CP</fnm>
					</au>
					<au>
						<snm>Rogan</snm>
						<fnm>PK</fnm>
					</au>
				</aug>
				<pubdate>2004</pubdate>
				<url>http://bipad.cmh.edu</url>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Bipartite pattern discovery by entropy minimization-based multiple local alignment</p>
				</title>
				<aug>
					<au>
						<snm>Bi</snm>
						<fnm>CP</fnm>
					</au>
					<au>
						<snm>Rogan</snm>
						<fnm>PK</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Research</source>
				<pubdate>2004</pubdate>
				<volume>32</volume>
				<fpage>4979</fpage>
				<lpage>4991</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">521645</pubid>
						<pubid idtype="pmpid" link="fulltext">15388800</pubid>
						<pubid idtype="doi">10.1093/nar/gkh825</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>DNA recognition by nuclear receptors</p>
				</title>
				<aug>
					<au>
						<snm>Claessens</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Gewirth</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Essays in Biochemistry</source>
				<pubdate>2004</pubdate>
				<volume>40</volume>
				<fpage>59</fpage>
				<lpage>72</lpage>
				<xrefbib>
					<pubid idtype="pmpid">15242339</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Nuclear hormone receptors and gene expression</p>
				</title>
				<aug>
					<au>
						<snm>Aranda</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Pascual</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Physiological Reviews</source>
				<pubdate>2001</pubdate>
				<volume>81</volume>
				<fpage>1269</fpage>
				<lpage>1304</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11427696</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Discovering regulatory elements in non-coding sequences by analysis of spaced dyads</p>
				</title>
				<aug>
					<au>
						<snm>van Helden</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Rios</snm>
						<fnm>AF</fnm>
					</au>
					<au>
						<snm>Collado-Vides</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Research</source>
				<pubdate>2000</pubdate>
				<volume>28</volume>
				<fpage>1808</fpage>
				<lpage>1818</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">102821</pubid>
						<pubid idtype="pmpid" link="fulltext">10734201</pubid>
						<pubid idtype="doi">10.1093/nar/28.8.1808</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification</p>
				</title>
				<aug>
					<au>
						<snm>Marsan</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Sagot</snm>
						<fnm>MF</fnm>
					</au>
				</aug>
				<source>Journal of Computational Biology</source>
				<pubdate>2000</pubdate>
				<volume>7</volume>
				<fpage>345</fpage>
				<lpage>365</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1089/106652700750050826</pubid>
						<pubid idtype="pmpid" link="fulltext">11108467</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Occurrence probability of structured motifs in random sequences</p>
				</title>
				<aug>
					<au>
						<snm>Robin</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Daudin</snm>
						<fnm>JJ</fnm>
					</au>
					<au>
						<snm>Richard</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Sagot</snm>
						<fnm>MF</fnm>
					</au>
					<au>
						<snm>Schbath</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Journal of Computational Biology</source>
				<pubdate>2002</pubdate>
				<volume>9</volume>
				<fpage>761</fpage>
				<lpage>773</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1089/10665270260518254</pubid>
						<pubid idtype="pmpid" link="fulltext">12614545</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes</p>
				</title>
				<aug>
					<au>
						<snm>Liu</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Brutlag</snm>
						<fnm>DL</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>JS</fnm>
					</au>
				</aug>
				<pubdate>2001</pubdate>
				<volume>6</volume>
				<fpage>127</fpage>
				<lpage>138</lpage>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Determining thresholds for binding site sequence models using information theory: Proceedings of the 8th Joint Conference on Information Sciences/6th International Symposium on Computational Biology and Genome Informatics; 23-26 July; Salt Lake City, UT</p>
				</title>
				<aug>
					<au>
						<snm>Bi</snm>
						<fnm>CP</fnm>
					</au>
					<au>
						<snm>Rogan</snm>
						<fnm>PK</fnm>
					</au>
				</aug>
				<publisher/>
				<pubdate>2005</pubdate>
				<fpage>1286</fpage>
				<lpage>1290</lpage>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Information content of individual genetic sequences</p>
				</title>
				<aug>
					<au>
						<snm>Schneider</snm>
						<fnm>TD</fnm>
					</au>
				</aug>
				<source>Journal of Theoretical Biology</source>
				<pubdate>1997</pubdate>
				<volume>189</volume>
				<fpage>427</fpage>
				<lpage>441</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jtbi.1997.0540</pubid>
						<pubid idtype="pmpid" link="fulltext">9446751</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Information theory as a model of genomic sequences</p>
				</title>
				<aug>
					<au>
						<snm>Bi</snm>
						<fnm>CP</fnm>
					</au>
					<au>
						<snm>Rogan</snm>
						<fnm>PK</fnm>
					</au>
				</aug>
				<source>Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics</source>
				<publisher>London, John Wiley &amp; Sons</publisher>
				<editor>Subramaniam S</editor>
				<pubdate>2005</pubdate>
				<fpage>DOI:10.1002/047001153X.g402204</fpage>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Development and refinement of pregnane X receptor DNA binding site model using information theory: Insights into PXR mediated gene regulation</p>
				</title>
				<aug>
					<au>
						<snm>Vyhlidal</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Rogan</snm>
						<fnm>PK</fnm>
					</au>
					<au>
						<snm>Leeder</snm>
						<fnm>JS</fnm>
					</au>
				</aug>
				<source>Journal of Biological Chemistry</source>
				<pubdate>2004</pubdate>
				<volume>279</volume>
				<fpage>46779</fpage>
				<lpage>46786</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.M408395200</pubid>
						<pubid idtype="pmpid" link="fulltext">15316010</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome</p>
				</title>
				<aug>
					<au>
						<snm>Robinson</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>McGuire</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Church</snm>
						<fnm>GM</fnm>
					</au>
				</aug>
				<source>Journal of Molecular Biology</source>
				<pubdate>1998</pubdate>
				<volume>284</volume>
				<fpage>241</fpage>
				<lpage>254</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.1998.2160</pubid>
						<pubid idtype="pmpid" link="fulltext">9813115</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Sequence logos: a new way to display consensus sequences</p>
				</title>
				<aug>
					<au>
						<snm>Schneider</snm>
						<fnm>TD</fnm>
					</au>
					<au>
						<snm>Stephens</snm>
						<fnm>RM</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Research</source>
				<pubdate>1990</pubdate>
				<volume>18</volume>
				<fpage>6097</fpage>
				<lpage>6100</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">332411</pubid>
						<pubid idtype="pmpid">2172928</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>WebLogo: a sequence logo generator</p>
				</title>
				<aug>
					<au>
						<snm>Crooks</snm>
						<fnm>GE</fnm>
					</au>
					<au>
						<snm>Hon</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Chandonia</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Brenner</snm>
						<fnm>SE</fnm>
					</au>
				</aug>
				<source>Genome Research</source>
				<pubdate>2004</pubdate>
				<volume>14</volume>
				<fpage>1188</fpage>
				<lpage>1190</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">419797</pubid>
						<pubid idtype="pmpid" link="fulltext">15173120</pubid>
						<pubid idtype="doi">10.1101/gr.849004</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Bilogo Plotter</p>
				</title>
				<aug>
					<au>
						<snm>Bi</snm>
						<fnm>CP</fnm>
					</au>
					<au>
						<snm>Rogan</snm>
						<fnm>PK</fnm>
					</au>
				</aug>
				<pubdate>2005</pubdate>
				<url>http://bipad.cmh.edu/bilogo.html</url>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Orphan nuclear receptors: From gene to function</p>
				</title>
				<aug>
					<au>
						<snm>Giguere</snm>
						<fnm>V</fnm>
					</au>
				</aug>
				<source>Endocrine Reviews</source>
				<pubdate>1999</pubdate>
				<volume>20</volume>
				<fpage>689</fpage>
				<lpage>725</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1210/er.20.5.689</pubid>
						<pubid idtype="pmpid" link="fulltext">10529899</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>The orphan receptors NGFI-B and steroidogenic factor 1 establish monomer binding as a third paradigm of nuclear receptor-DNA interaction</p>
				</title>
				<aug>
					<au>
						<snm>Wilson</snm>
						<fnm>TE</fnm>
					</au>
					<au>
						<snm>Fahrner</snm>
						<fnm>TJ</fnm>
					</au>
					<au>
						<snm>Milbrandt</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Molecular and Cellular Biology</source>
				<pubdate>1993</pubdate>
				<volume>13</volume>
				<fpage>5794</fpage>
				<lpage>5804</lpage>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Induction of drug metabolism: The role of nuclear receptors</p>
				</title>
				<aug>
					<au>
						<snm>Handschin</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Meyer</snm>
						<fnm>UA</fnm>
					</au>
				</aug>
				<source>Pharmacological Reviews</source>
				<pubdate>2003</pubdate>
				<volume>55</volume>
				<fpage>649</fpage>
				<lpage>673</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1124/pr.55.4.2</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Hepatocyte nuclear factor 4&#945;</p>
				</title>
				<aug>
					<au>
						<snm>Sladek</snm>
						<fnm>FM</fnm>
					</au>
					<au>
						<snm>Seidel</snm>
						<fnm>SD</fnm>
					</au>
				</aug>
				<source>Nuclear Receptors and Genetic Disease</source>
				<publisher>Academic Press</publisher>
				<pubdate>2001</pubdate>
			</bibl>
			<bibl id="B21">
				<title>
					<p>A bipartite sequence element associated with matrix/scaffold attachment regions</p>
				</title>
				<aug>
					<au>
						<snm>van Drunen</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Sewalt</snm>
						<fnm>RG</fnm>
					</au>
					<au>
						<snm>Oosterling</snm>
						<fnm>RW</fnm>
					</au>
					<au>
						<snm>Weisbeek</snm>
						<fnm>PJ</fnm>
					</au>
					<au>
						<snm>Smeekens</snm>
						<fnm>SC</fnm>
					</au>
					<au>
						<snm>Van Driel</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Research</source>
				<pubdate>1999</pubdate>
				<volume>27</volume>
				<fpage>2924</fpage>
				<lpage>2930</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">148508</pubid>
						<pubid idtype="pmpid" link="fulltext">10390535</pubid>
						<pubid idtype="doi">10.1093/nar/27.14.2924</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Correlation between Scaffold/Matrix Attachment Region (S/MAR) binding activity and DNA duplex destabilization energy</p>
				</title>
				<aug>
					<au>
						<snm>Bode</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Winkelmann</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Gotze</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Spiker</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Tsutsui</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Bi</snm>
						<fnm>CP</fnm>
					</au>
					<au>
						<snm>Ak</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Benham</snm>
						<fnm>CJ</fnm>
					</au>
				</aug>
				<source>Journal of Molecular Biology</source>
				<note>doi:10.1016/j.jmb.2005.11.073</note>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Assessing computational tools for the discovery of transcription factor binding sites</p>
				</title>
				<aug>
					<au>
						<snm>Tompa</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Bailey</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Church</snm>
						<fnm>GM</fnm>
					</au>
					<au>
						<snm>De Moor</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Eskin</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Favorov</snm>
						<fnm>AV</fnm>
					</au>
					<au>
						<snm>Frith</snm>
						<fnm>MC</fnm>
					</au>
					<au>
						<snm>Fu</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Kent</snm>
						<fnm>WJ</fnm>
					</au>
					<au>
						<snm>Makeev</snm>
						<fnm>VJ</fnm>
					</au>
					<au>
						<snm>Mironov</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Noble</snm>
						<fnm>WS</fnm>
					</au>
					<au>
						<snm>Pavesi</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Pesole</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Regnier</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Simonis</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Sinha</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Thijs</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Van Helden</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Vandenbogaert</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Weng</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Workman</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Ye</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Zhu</snm>
						<fnm>Z</fnm>
					</au>
				</aug>
				<source>Nature Biotechnology</source>
				<pubdate>2005</pubdate>
				<volume>23</volume>
				<fpage>137</fpage>
				<lpage>144</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nbt1053</pubid>
						<pubid idtype="pmpid" link="fulltext">15637633</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Distortion of quantitative genomic and expression hybridization by Cot-1 DNA: mitigation of this effect</p>
				</title>
				<aug>
					<au>
						<snm>Newkirk</snm>
						<fnm>HL</fnm>
					</au>
					<au>
						<snm>Knoll</snm>
						<fnm>JHM</fnm>
					</au>
					<au>
						<snm>Rogan</snm>
						<fnm>PK</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Research</source>
				<pubdate>2005</pubdate>
				<volume>33</volume>
				<issue>22</issue>
				<fpage>e191</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1316118</pubid>
						<pubid idtype="pmpid" link="fulltext">16356923</pubid>
						<pubid idtype="doi">10.1093/nar/gni190</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
