<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2148-7-S1-S3</ui>
	<ji>1471-2148</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>Inferring angiosperm phylogeny from EST data with widespread gene duplication</p>
			</title>
			<aug>
				<au id="A1" ca="yes" ce="yes">
					<snm>Sanderson</snm>
					<mi>J</mi>
					<fnm>Michael</fnm>
					<insr iid="I1"/>
					<email>sanderm@email.arizona.edu</email>
				</au>
				<au id="A2" ce="yes">
					<snm>McMahon</snm>
					<mi>M</mi>
					<fnm>Michelle</fnm>
					<insr iid="I2"/>
					<email>mcmahonm@email.arizona.edu</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA</p>
				</ins>
				<ins id="I2">
					<p>Department of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA</p>
				</ins>
			</insg>
			<source>BMC Evolutionary Biology</source>
			<supplement>
				<title>
					<p>First International Conference on Phylogenomics</p>
				</title>
				<editor>Herv&#233; Philippe, Mathieu Blanchette</editor>
				<note>Proceedings</note>
			</supplement>
			<conference>
				<title>
					<p>First International Conference on Phylogenomics</p>
				</title>
				<location>Sainte-Ad&#232;le, Qu&#233;bec, Canada</location>
				<date-range>15&#8211;19 March, 2006</date-range>
				<url>http://www.bioinfo.umontreal.ca/evenements/phylogenomics.html</url>
			</conference>
			<issn>1471-2148</issn>
			<pubdate>2007</pubdate>
			<volume>7</volume>
			<issue>Suppl 1</issue>
			<fpage>S3</fpage>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">17288576</pubid><pubid idtype="doi">10.1186/1471-2148-7-S1-S3</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<pub>
				<date>
					<day>8</day>
					<month>2</month>
					<year>2007</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2007</year>
			<collab>Sanderson and McMahon; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Most studies inferring species phylogenies use sequences from single copy genes or sets of orthologs culled from gene families. For taxa such as plants, with very high levels of gene duplication in their nuclear genomes, this has limited the exploitation of nuclear sequences for phylogenetic studies, such as those available in large EST libraries. One rarely used method of inference, gene tree parsimony, can infer species trees from gene families undergoing duplication and loss, but its performance has not been evaluated at a phylogenomic scale for EST data in plants.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>A gene tree parsimony analysis based on EST data was undertaken for six angiosperm model species and <it>Pinus</it>, an outgroup. Although a large fraction of the tentative consensus sequences obtained from the TIGR database of ESTs was assembled into homologous clusters too small to be phylogenetically informative, some 557 clusters contained promising levels of information. Based on maximum likelihood estimates of the gene trees obtained from these clusters, gene tree parsimony correctly inferred the accepted species tree with strong statistical support. A slight variant of this species tree was obtained when maximum parsimony was used to infer the individual gene trees instead.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>Despite the complexity of the EST data and the relatively small fraction eventually used in inferring a species tree, the gene tree parsimony method performed well in the face of very high apparent rates of duplication.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Since the advent of efficient nucleotide sequencing technology in the 1980s, sampling of plant genomes to build species phylogenies has emphasized organellar markers, especially in the chloroplast genome, and a few nuclear loci such as ribosomal RNA genes. Though not universal (see e.g., <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>), phylogeneticists' avoidance of the nuclear genome of plants is in no small part due to its relative complexity &#8211; mainly the frequent occurrence of paralogous copies of genes derived from gene duplications <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Not only is polyploidy widespread in plants, but recent evidence derived from whole genome sequencing projects suggests a cryptic history of whole genome duplication and diploidization not predicted by cytogenetic evidence, including for example the prospect that <it>Arabidopsis </it>has undergone three complete genome doublings since the origin of seed plants, legumes two, and cereals two or more <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. This contributes to already complex dynamics of gene family expansion and contraction driven by functional divergence <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. In <it>Arabidopsis</it>, 65% of genes are members of gene families <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, and because of silencing of alternative paralogs in other taxa, in addition to sporadic background rates of gene duplication, phylogenetic studies will undoubtedly sample even more duplications as they increase in taxonomic scope.</p>
			<p>Phylogenetic methods are relatively poorly adapted to inferring species trees from gene trees that contain duplications, despite steady work since Goodman et al.'s <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> pioneering paper <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. Complicating matters further, homologous recombination in gene families (e.g. gene conversion) can add reticulate patterns to gene family histories. Most efforts to use nuclear markers have therefore focused on finding true single copy loci or on extracting subsets of orthologs from gene families <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. However, the problem of detecting and extracting orthologs is itself quite challenging: a diversity of techniques have been proposed, ranging from reciprocal best BLAST searches to more phylogenetically driven approaches <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>.</p>
			<p>Thus plant biologists are now in the curious position of having increasingly rich and deep phylogenomic data sets while lacking a full spectrum of tools to build species phylogenies from them. In addition to whole genome projects, large EST libraries have been assembled for dozens of crops and model plants. At the moment these data provide the most taxonomically broad source of potential phylogenomic data in plants, but they are characterized by numerous gene families and to date only orthologous subsets have been exploited to build species phylogenies <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. The data themselves also present numerous challenges because of the laboratory methods by which they are extracted <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, the complex informatics procedures by which they are assembled <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B30">30</abbr></abbrgrp>, and the diversity of molecular variation at the level of expression that they reflect (e.g. alternative splice variants; <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>). This paper examines both the phylogenetic informativeness of the EST data and current methodologies for building species phylogenies from duplication-rich gene families to address the potential utility of such data for constructing the phylogeny of plants.</p>
			<p>That the signature of species phylogeny can be found in complex gene trees displaying a mosaic of orthologous and paralogous relationships has been recognized for decades <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. The first element of a possible strategy for inferring such relationships was provided by Goodman et al. <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, who developed an algorithm for fitting a given species tree and gene tree together to determine the minimum number of duplications necessary to explain the data. This problem came to be known as "tree reconciliation", and several algorithms were developed to solve it efficiently <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B14">14</abbr><abbr bid="B32">32</abbr></abbrgrp>. Figure <figr fid="F1">1</figr> illustrates some of the complexities involved. For example, a simple re-rooting of the gene tree can have dramatic effects on inferences about the history of gene duplication. The second element of the strategy is a search among candidate species trees, determining the minimum duplication score for each species tree relative to one or more gene trees that are assumed to be known <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B12">12</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>. This is an optimization problem entirely analogous to maximum parsimony or likelihood, but in which the optimality criterion is the summed duplication score (or perhaps the summed duplication plus loss scores) across all gene trees for a given species tree. The rationale for this "gene tree parsimony" (GTP) approach is that we should seek the species tree that requires the fewest duplications to explain the collection of gene trees available. Though the approach is rarely used <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>, Cotton and Page <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> showed in an extensive analysis of vertebrate gene families that it was possible to reconstruct a very credible species tree of vertebrates with it. One reason it has not been explored much in real data may be the lack of available software tools to implement the tree search part of GTP. Though several tools are available to do tree reconciliation <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp>, Page's program GeneTree <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> is the only widely available software that implements tree search heuristics, and these are relatively simple, comprising only tree rearrangement heuristics with no sequential addition steps.</p>
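			<p>The two elements of the strategy described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study or in GeneTree. It assumes rooted, binary gene trees written as nested tuples whose leaves are already labeled with species names, counts duplications only (no losses), and does not search over alternative rootings. The function names <tt>count_duplications</tt> and <tt>gtp_score</tt> and the three-species example are hypothetical.</p>
			<p><![CDATA[
```python
def leaf_set(tree):
    """Set of leaf labels below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def species_clades(species_tree):
    """Every clade of the species tree as a leaf set, smallest first."""
    found = []

    def walk(node):
        found.append(leaf_set(node))
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(species_tree)
    return sorted(found, key=len)


def count_duplications(gene_tree, species_tree):
    """Reconcile one rooted gene tree with a species tree.

    Each gene-tree node is mapped to the smallest species-tree clade
    that contains all of its species. A node is scored as a
    duplication when it maps to the same clade as one of its children.
    """
    clades = species_clades(species_tree)
    duplications = 0

    def smallest_clade(species):
        return next(c for c in clades if species.issubset(c))

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return smallest_clade(frozenset([node]))
        child_maps = [walk(child) for child in node]
        here = smallest_clade(frozenset().union(*child_maps))
        if here in child_maps:
            duplications += 1
        return here

    walk(gene_tree)
    return duplications


def gtp_score(gene_trees, species_tree):
    """Summed duplication score of a species tree over all gene trees."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)


# Hypothetical example: two gene trees, three candidate species trees.
gene_trees = [
    (("A", "B"), "C"),
    ((("A", "B"), "C"), (("A", "B"), "C")),
]
candidates = [
    (("A", "B"), "C"),
    (("A", "C"), "B"),
    (("B", "C"), "A"),
]
best = min(candidates, key=lambda s: gtp_score(gene_trees, s))
```
]]></p>
			<p>With only three species the candidate set can be enumerated exhaustively, as in the sketch; the species tree with the lowest summed score is the one preferred under the duplication-only criterion.</p>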
			<fig id="F1">
				<title>
					<p>Figure 1</p>
				</title>
				<caption>
					<p>Tree reconciliation example</p>
				</caption>
				<text>
					<p><b>Tree reconciliation example</b>. Two alternative rootings of the same unrooted gene tree (thin black lines) embedded in a species tree (thick grey lines) visualized with the tool PrIMETV [76]. The gene tree is the maximum likelihood tree for a data set with 12 tentative consensus (TC) sequences assembled from ESTs from seven taxa (our cluster 13024). Bars indicate duplications within species (in-duplications) and black circles indicate out-duplications (those followed by a speciation event). A. The gene tree rooted to minimize the number of duplications required to reconcile the trees (two out-duplications required). B. The gene tree rooted using midpoint rooting, which places the root along the branch to the <it>Arabidopsis </it>sequences. This rooting is suboptimal, requiring five out-duplications.</p>
				</text>
				<graphic file="1471-2148-7-S1-S3-1"/>
			</fig>
			<p>One way to assess the utility of a phylogenetic method is to compare its output to a "known phylogeny". In this paper we examine the efficacy of GTP for reconstructing species relationships across angiosperms using an "accepted" angiosperm tree for six taxa, together with pine as an outgroup (Fig. <figr fid="F2">2</figr>). Limiting the problem to this size accomplishes two things: first, it permits exhaustive searches of the species tree space, avoiding the problem of developing efficient heuristics for searching tree space; second, it provides an immediate test of the quality of the results. The six angiosperms chosen span deep and relatively shallow phylogenetic relationships, ranging from the monocot-eudicot split (~120 Ma) to splits within one clade of eudicots, the legumes, which is a relatively recent radiation (~60 Ma). Phylogenetic relationships of these six angiosperms are strongly supported by numerous studies from multiple single copy (or effectively single copy in the case of 18S rDNA) loci <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>, and in some cases from nuclear gene family data in which ortholog groups have been extracted <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. In the case of legumes, fewer loci are available, but both the monophyly of legumes and the indicated three-taxon statement within legumes are supported by multiple loci <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp>.</p>
			<fig id="F2">
				<title>
					<p>Figure 2</p>
				</title>
				<caption>
					<p>Accepted species tree for seven plant model species</p>
				</caption>
				<text>
					<p><b>Accepted species tree for seven plant model species</b>. Names of clades are indicated at internal nodes. See text for discussion of strength of evidence for this phylogeny.</p>
				</text>
				<graphic file="1471-2148-7-S1-S3-2"/>
			</fig>
			<p>A final ingredient in any assessment of utility of methods and data is a statistical estimate of reliability of results. Agreement or disagreement of results with the accepted phylogeny takes on added meaning if the estimated tree is strongly supported. Little work has addressed confidence limits in gene tree parsimony studies. However, bootstrap procedures may provide some useful indication of strength of evidence <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. In addition to the "usual" error expected in phylogenetics &#8211; incorrect gene trees owing to noise in the sequence data or bias in the inference &#8211; there is another important source of error stemming from incorrect rooting of the gene tree. Gene tree parsimony methods require that both the species tree and the gene tree be rooted. Whereas rooting is generally accomplished in species-level phylogenetics by outgroup analysis (often after an unrooted analysis is completed), this is usually more problematic in gene families, because of the difficulty of identifying the correct ortholog for the entire ingroup. As suggested by several authors <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B32">32</abbr></abbrgrp>, one way to sidestep this source of error is to implement GTP across all rootings of each gene tree; in other words, to calculate the GTP score by finding the rooting that minimizes it for each gene tree. This conservative approach is adopted here.</p>
			<p>Some terminology associated with gene family data warrants definition. We refer to gene duplication events as <it>in-duplications </it>(i.e. producing <it>inparalogs</it>, <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>) or <it>out-duplications </it>(producing <it>outparalogs</it>, Fig. <figr fid="F1">1</figr>). In-duplications result in descendants within a single species and are therefore inferred to have occurred since the most recent common ancestor of the species and its sister group. This can include within-species duplications (or species-specific alleles), or duplications that appear to be within-species because of incomplete species sampling. Because the descendants of an in-duplication remain in a single species, they cannot prefer one species tree to another. Out-duplications, in contrast, occur earlier than the most recent speciation event and produce descendants in two (or more) species. An out-duplication therefore can (indeed, must) disagree with the species tree and can contribute to the preference of one species tree over another.</p>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<sec>
				<st>
					<p>Sequence data and gene trees</p>
				</st>
				<p>The TIGR Gene Indices Database provided 172,900 Tentative Consensus (TC) sequences for the seven focal taxa (Table <tblr tid="T1">1</tblr>). After discarding sequences for which there was no open reading frame (ORF) at least 500 nucleotides (nt) in length, 105,453 TCs remained. These were trimmed to their longest ORF, producing sequences with an average length of 1094 nt (336 nt shorter, on average, than the original TCs).</p>
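				<p>The ORF screening step can be sketched as follows. This is an illustration only and rests on assumptions not stated in the text: an ORF is taken to run from an ATG codon to the first in-frame stop codon, only the three sense-direction frames are scanned, and reading frames that run off the end of the sequence without a stop codon are ignored. The function name <tt>longest_orf</tt> is hypothetical.</p>
				<p><![CDATA[
```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq, min_len=500):
    """Return the longest sense-direction ORF, or None if it is too short.

    Assumption: an ORF starts at ATG and ends at the first in-frame
    stop codon (stop codon included in the returned sequence).
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new reading frame
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # close it and look for the next ATG
    return best if len(best) >= min_len else None


# Hypothetical sequence: 2 nt of leader, a 606-nt ORF, 2 nt of trailer.
example = "CC" + "ATG" + "GCT" * 200 + "TAA" + "GG"
trimmed = longest_orf(example)
```
]]></p>
				<p>Sequences for which the function returns <tt>None</tt> correspond to those discarded by the 500-nt threshold.</p>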
				<tbl id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Sequence and cluster data for each taxon</p>
					</caption>
					<tblbdy cols="6">
						<r>
							<c ca="left">
								<p>Taxon<sup>a</sup></p>
							</c>
							<c ca="center">
								<p>Release<sup>b</sup></p>
							</c>
							<c ca="center">
								<p>Original TCs<sup>c</sup></p>
							</c>
							<c ca="center">
								<p>MaxORFs<sup>d</sup></p>
							</c>
							<c ca="center">
								<p>Clusters<sup>e</sup></p>
							</c>
							<c ca="center">
								<p>Final TCs<sup>f</sup></p>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>Arabidopsis thaliana</it>
								</p>
							</c>
							<c ca="center">
								<p>12.1</p>
							</c>
							<c ca="center">
								<p>28900</p>
							</c>
							<c ca="center">
								<p>23737</p>
							</c>
							<c ca="center">
								<p>343</p>
							</c>
							<c ca="center">
								<p>729</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>Glycine max</it>
								</p>
							</c>
							<c ca="center">
								<p>12.0</p>
							</c>
							<c ca="center">
								<p>31928</p>
							</c>
							<c ca="center">
								<p>13930</p>
							</c>
							<c ca="center">
								<p>538</p>
							</c>
							<c ca="center">
								<p>1065</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>Lotus japonicus</it>
								</p>
							</c>
							<c ca="center">
								<p>3.0</p>
							</c>
							<c ca="center">
								<p>12485</p>
							</c>
							<c ca="center">
								<p>3116</p>
							</c>
							<c ca="center">
								<p>365</p>
							</c>
							<c ca="center">
								<p>452</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>Medicago truncatula</it>
								</p>
							</c>
							<c ca="center">
								<p>8.0</p>
							</c>
							<c ca="center">
								<p>18612</p>
							</c>
							<c ca="center">
								<p>12254</p>
							</c>
							<c ca="center">
								<p>528</p>
							</c>
							<c ca="center">
								<p>852</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>Oryza sativa</it>
								</p>
							</c>
							<c ca="center">
								<p>16.0</p>
							</c>
							<c ca="center">
								<p>36381</p>
							</c>
							<c ca="center">
								<p>25842</p>
							</c>
							<c ca="center">
								<p>199</p>
							</c>
							<c ca="center">
								<p>418</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>Pinus</it>
									<sup>g</sup>
								</p>
							</c>
							<c ca="center">
								<p>6.0</p>
							</c>
							<c ca="center">
								<p>23531</p>
							</c>
							<c ca="center">
								<p>13949</p>
							</c>
							<c ca="center">
								<p>159</p>
							</c>
							<c ca="center">
								<p>315</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>
									<it>Solanum tuberosum</it>
								</p>
							</c>
							<c ca="center">
								<p>10.0</p>
							</c>
							<c ca="center">
								<p>21063</p>
							</c>
							<c ca="center">
								<p>12625</p>
							</c>
							<c ca="center">
								<p>378</p>
							</c>
							<c ca="center">
								<p>705</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Total</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>172900</p>
							</c>
							<c ca="center">
								<p>105453</p>
							</c>
							<c ca="center">
								<p>577</p>
							</c>
							<c ca="center">
								<p>4536</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p><sup>a </sup>Taxon as given by TIGR for the EST collection assembled in the Gene Index Database.</p>
						<p><sup>b </sup>Versions used in this paper, current as of 18 February 2006.</p>
						<p><sup>c </sup>The 363,971 sequences in the database for these taxa were screened to include only those sequences assembled by TIGR into Tentative Consensus (TC) sequences.</p>
						<p><sup>d </sup>TCs were trimmed to the largest sense-direction ORF that was at least 500 nt in length; shorter sequences were discarded.</p>
						<p><sup>e </sup>Number of clusters in which the taxon is represented, after screening for phylogenetic informativeness (at least three taxa and at least four sequences).</p>
						<p><sup>f </sup>Total number of sequences from each taxon in the final set of clusters.</p>
						<p><sup>g </sup>TIGR assembled this library from several species of <it>Pinus</it>.</p>
					</tblfn>
				</tbl>
				<p>Clustering of sequences implemented with BLAST and single-linkage clustering produced a wide diversity of cluster sets (Table <tblr tid="T2">2</tblr>) depending on how we set the minimum hit fraction, the minimum proportion of a sequence that must be covered by the set union of the locally aligned sites (hits) reported by BLAST for two sequences to be linked. With this minimum value set to zero, nearly 40,000 clusters were assembled, some 4423 of them phylogenetically informative. However, the largest contained 6565 sequences, and the sequences in it were extremely heterogeneous in length, sequence, and annotation, and were not homologous to any level that would be useful in phylogenetic inference. Clearly the stringency of overlap set by the minimum hit fraction was too low. When we increased the minimum hit fraction, the size of the largest cluster was reduced and the data became more fragmented, as reflected in the increasing number of clusters (Table <tblr tid="T2">2</tblr>), but also more homogeneous within clusters. Ultimately we selected a hit fraction of 0.7 in an effort to maximize the amount of information retained while attempting to minimize within-cluster heterogeneity (see also e.g. Schlueter et al. <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, who impose analogous requirements, although on fractional overlap of an entire hit rather than the set union of hits, as we do).</p>
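				<p>The effect of the threshold on single-linkage clustering can be illustrated with a small sketch. It is not the pipeline used here. It assumes that hits are supplied as half-open coordinate intervals on the query sequence, that the hit fraction is the length of their set union divided by the query length, and that a pair is linked when this fraction meets the threshold. The names <tt>covered_length</tt>, <tt>single_linkage</tt> and <tt>is_informative</tt>, and all input data, are hypothetical.</p>
				<p><![CDATA[
```python
def covered_length(intervals):
    """Length of the set union of half-open (start, end) intervals."""
    total, reach = 0, 0
    for start, end in sorted(intervals):
        start = max(start, reach)  # skip the part already counted
        if end > start:
            total += end - start
            reach = end
    return total


def single_linkage(lengths, hits, min_hit_fraction):
    """Cluster sequences; any linked pair ends up in the same cluster.

    lengths: {sequence name: length in nt}
    hits:    {(query, subject): [(start, end), ...] on the query}
    """
    parent = {name: name for name in lengths}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for (query, subject), intervals in hits.items():
        fraction = covered_length(intervals) / lengths[query]
        if fraction >= min_hit_fraction:
            parent[find(query)] = find(subject)

    clusters = {}
    for name in lengths:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=len, reverse=True)


def is_informative(cluster, taxon_of):
    """At least four sequences drawn from at least three taxa."""
    taxa = {taxon_of[name] for name in cluster}
    return len(cluster) >= 4 and len(taxa) >= 3


# Hypothetical input: four 100-nt sequences and three pairwise hits.
lengths = {"a": 100, "b": 100, "c": 100, "d": 100}
hits = {
    ("a", "b"): [(0, 50), (40, 80)],  # union covers 80 nt
    ("b", "c"): [(0, 30)],            # covers 30 nt
    ("c", "d"): [(0, 100)],           # covers 100 nt
}
loose = single_linkage(lengths, hits, 0.0)
chosen = single_linkage(lengths, hits, 0.7)
strict = single_linkage(lengths, hits, 0.9)
```
]]></p>
				<p>In this toy input a threshold of zero chains all four sequences into one cluster through the weak b-c link, whereas higher thresholds split the data into more, smaller clusters, which mirrors the trend in Table <tblr tid="T2">2</tblr>.</p>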
				<tbl id="T2">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>Effects of hit fraction threshold on cluster assembly. Bold indicates the threshold chosen for the current study.</p>
					</caption>
					<tblbdy cols="6">
						<r>
							<c ca="center">
								<p>Hit fraction<sup>a</sup></p>
							</c>
							<c ca="center">
								<p>Clusters<sup>b</sup></p>
							</c>
							<c ca="center">
								<p>Singletons<sup>c</sup></p>
							</c>
							<c ca="center">
								<p>Phylogenetically informative clusters<sup>d</sup></p>
							</c>
							<c ca="center">
								<p>Max size<sup>e</sup></p>
							</c>
							<c ca="center">
								<p>TCs in phylogenetically informative clusters<sup>f</sup></p>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>0.0</p>
							</c>
							<c ca="center">
								<p>39924</p>
							</c>
							<c ca="center">
								<p>26782</p>
							</c>
							<c ca="center">
								<p>4423</p>
							</c>
							<c ca="center">
								<p>6565</p>
							</c>
							<c ca="center">
								<p>54051</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>0.1</p>
							</c>
							<c ca="center">
								<p>47798</p>
							</c>
							<c ca="center">
								<p>32824</p>
							</c>
							<c ca="center">
								<p>4079</p>
							</c>
							<c ca="center">
								<p>1947</p>
							</c>
							<c ca="center">
								<p>42406</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>0.2</p>
							</c>
							<c ca="center">
								<p>57229</p>
							</c>
							<c ca="center">
								<p>41327</p>
							</c>
							<c ca="center">
								<p>3324</p>
							</c>
							<c ca="center">
								<p>1362</p>
							</c>
							<c ca="center">
								<p>29403</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>0.3</p>
							</c>
							<c ca="center">
								<p>64691</p>
							</c>
							<c ca="center">
								<p>48864</p>
							</c>
							<c ca="center">
								<p>2561</p>
							</c>
							<c ca="center">
								<p>330</p>
							</c>
							<c ca="center">
								<p>21504</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>0.4</p>
							</c>
							<c ca="center">
								<p>71333</p>
							</c>
							<c ca="center">
								<p>56383</p>
							</c>
							<c ca="center">
								<p>1876</p>
							</c>
							<c ca="center">
								<p>117</p>
							</c>
							<c ca="center">
								<p>15457</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>0.5</p>
							</c>
							<c ca="center">
								<p>77564</p>
							</c>
							<c ca="center">
								<p>63890</p>
							</c>
							<c ca="center">
								<p>1340</p>
							</c>
							<c ca="center">
								<p>98</p>
							</c>
							<c ca="center">
								<p>10721</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>0.6</p>
							</c>
							<c ca="center">
								<p>83435</p>
							</c>
							<c ca="center">
								<p>71539</p>
							</c>
							<c ca="center">
								<p>897</p>
							</c>
							<c ca="center">
								<p>95</p>
							</c>
							<c ca="center">
								<p>7105</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>
									<b>0.7</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>88864</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>79122</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>577</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>94</b>
								</p>
							</c>
							<c ca="center">
								<p>
									<b>4536</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>0.8</p>
							</c>
							<c ca="center">
								<p>94296</p>
							</c>
							<c ca="center">
								<p>87186</p>
							</c>
							<c ca="center">
								<p>324</p>
							</c>
							<c ca="center">
								<p>92</p>
							</c>
							<c ca="center">
								<p>2529</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>0.9</p>
							</c>
							<c ca="center">
								<p>99843</p>
							</c>
							<c ca="center">
								<p>95975</p>
							</c>
							<c ca="center">
								<p>103</p>
							</c>
							<c ca="center">
								<p>89</p>
							</c>
							<c ca="center">
								<p>872</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>1.0</p>
							</c>
							<c ca="center">
								<p>105144</p>
							</c>
							<c ca="center">
								<p>104860</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p><sup>a </sup>Minimum proportion of sequence similarity based on BLAST's pairwise comparisons. The hit fraction determines whether a sequence is linked to another (if a pair is linked, they will be placed in the same cluster) and thus affects the level of heterogeneity within clusters and the number of assembled clusters. Original number of sequences is 105,453 TCs.</p>
						<p><sup>b </sup>Total number of assembled clusters.</p>
						<p><sup>c </sup>Number of single-sequence clusters.</p>
						<p><sup>d </sup>Phylogenetically informative clusters for this study are those that include at least three species and at least four sequences.</p>
						<p><sup>e </sup>Number of tentative consensus sequences (TCs) in the largest phylogenetically informative cluster.</p>
						<p><sup>f </sup>Total TCs in all phylogenetically informative clusters.</p>
					</tblfn>
				</tbl>
				<p>The chosen cluster set contained 88,864 clusters, only 577 of which were potentially phylogenetically informative; that is, they had at least four sequences and at least three taxa (most contained just a single sequence or a single taxon: Table <tblr tid="T3">3</tblr> and Table <tblr tid="T4">4</tblr>). Fifty-nine clusters contained sequences from all seven taxa. The largest informative cluster contained 94 sequences, including several from each of the seven taxa. On the other hand, an extraordinarily large number of clusters, 79,122, were singletons (only one sequence). The contributions of each taxon to the final data sets ranged widely: from 315 to 1065 TCs and membership in 159 to 538 clusters (Table <tblr tid="T1">1</tblr>). The number of sequences excluded due to insufficiently long ORFs also varied tremendously across taxa (Table <tblr tid="T1">1</tblr>). In the end only 4536 TC sequences of the original 105,453 found their way into phylogenetic analysis (4.3%). This was about one tenth of the sequences produced by the least stringent clustering requiring 0% hit fraction overlap, but those clusters were largely unusable because of their heterogeneity as described above.</p>
				<tbl id="T3">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>Distributions of cluster sizes by number of taxa</p>
					</caption>
					<tblbdy cols="2">
						<r>
							<c ca="center">
								<p>Number of taxa in cluster</p>
							</c>
							<c ca="center">
								<p>Number of clusters</p>
							</c>
						</r>
						<r>
							<c cspan="2">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>86022</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>1986</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>478</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>162</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>90</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>67</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>59</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<tbl id="T4">
					<title>
						<p>Table 4</p>
					</title>
					<caption>
						<p>Distributions of cluster sizes by number of tentative consensus sequences (TCs)</p>
					</caption>
					<tblbdy cols="2">
						<r>
							<c ca="center">
								<p>Number of TCs in cluster</p>
							</c>
							<c ca="center">
								<p>Number of clusters</p>
							</c>
						</r>
						<r>
							<c cspan="2">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>79122</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>2&#8211;3</p>
							</c>
							<c ca="center">
								<p>8645</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>4&#8211;9</p>
							</c>
							<c ca="center">
								<p>930</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>10&#8211;94</p>
							</c>
							<c ca="center">
								<p>167</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<p>The collection of gene trees reconstructed using parsimony (henceforth "parsimony gene trees") was quite similar to that reconstructed under likelihood ("likelihood gene trees"). In fact, 354 of the clusters produced the same tree topology or same set of equally optimal tree topologies. In 34 other clusters the set of ML trees was a proper subset of the set of MP trees, and in one cluster the reverse was true. Finally, in 187 clusters, the set of MP trees and the set of ML trees were disjoint. Not surprisingly, these tended to be the clusters with more sequences (mean 12.4 sequences, whereas the mean across all 577 clusters was 7.9 sequences).</p>
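				<p>The categories reported above amount to set comparisons between the optimal MP topologies and the optimal ML topologies of a cluster. A minimal Python sketch follows, assuming each topology has already been reduced to a canonical string so that identical trees compare equal. The function name and the toy tree labels are hypothetical and are not part of the analysis pipeline described here. A fifth outcome, partial overlap, is logically possible and is included only for completeness.</p>

```python
def compare_tree_sets(mp_trees, ml_trees):
    """Classify how two sets of optimal topologies relate to each other."""
    mp, ml = set(mp_trees), set(ml_trees)
    if mp == ml:
        return "identical"
    if ml.issubset(mp):
        return "ML proper subset of MP"
    if mp.issubset(ml):
        return "MP proper subset of ML"
    if mp.isdisjoint(ml):
        return "disjoint"
    return "partial overlap"


# Toy labels standing in for canonical tree strings (hypothetical).
print(compare_tree_sets({"t1", "t2"}, {"t1", "t2"}))
print(compare_tree_sets({"t1", "t2", "t3"}, {"t2"}))
print(compare_tree_sets({"t1"}, {"t4", "t5"}))
```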
			</sec>
			<sec>
				<st>
					<p>Tree reconciliation: duplication scores on the accepted species tree</p>
				</st>
				<p>The distribution of the number of clusters inferred to have a given number of duplications is highly skewed for both parsimony and likelihood gene trees with many clusters having zero duplications but the maximum number of duplications in any cluster still being quite large (Table <tblr tid="T5">5</tblr>).</p>
				<tbl id="T5">
					<title>
						<p>Table 5</p>
					</title>
					<caption>
						<p>Distribution of duplication scores among clusters</p>
					</caption>
					<tblbdy cols="3">
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>MP gene trees</p>
							</c>
							<c ca="left">
								<p>ML gene trees</p>
							</c>
						</r>
						<r>
							<c cspan="3">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Number of clusters with zero duplications</p>
							</c>
							<c ca="left">
								<p>40</p>
							</c>
							<c ca="left">
								<p>42</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Number of clusters with zero <it>out</it>-duplications</p>
							</c>
							<c ca="left">
								<p>211</p>
							</c>
							<c ca="left">
								<p>226</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Maximum duplications in any cluster</p>
							</c>
							<c ca="left">
								<p>83</p>
							</c>
							<c ca="left">
								<p>81</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Maximum <it>out</it>-duplications in any cluster</p>
							</c>
							<c ca="left">
								<p>20</p>
							</c>
							<c ca="left">
								<p>17</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<p>On the accepted species tree, the inference method affected the number of duplications inferred for 154 clusters but had no effect on the other 423 clusters. Of the clusters for which it made a difference, the likelihood gene trees fit the accepted species tree better (i.e., with fewer duplications) in 87 clusters, and the parsimony gene trees fit better in 67 clusters.</p>
			</sec>
			<sec>
				<st>
					<p>Gene tree parsimony: finding the optimal species tree</p>
				</st>
				<p>The optimal species trees differed slightly depending on whether the parsimony or likelihood gene trees were used. Based on the likelihood gene trees, the optimal tree was exactly the accepted species tree, with an (out-) duplication score of 779.0 (Fig. <figr fid="F3">3A</figr>). Based on the parsimony gene trees, the optimal species tree was very similar to the accepted tree, except for a rearrangement within the legumes (Fig. <figr fid="F3">3B</figr>). Its score was 771.9 (out-) duplications. The duplication score of the accepted species tree based on the parsimony gene trees was 796.3 and it was ranked fourth among all species trees. Fractional scores reflect weighting of multiple equally parsimonious or equally likely gene trees within a cluster. Note also that the rankings and relative scores are the same when counting all duplications as when counting out-duplications only. As they are restricted to only a single species, in-duplications are akin to autapomorphies in being phylogenetically uninformative.</p>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Species tree inferred by gene tree parsimony</p>
					</caption>
					<text>
						<p><b>Species tree inferred by gene tree parsimony</b>. A. The best species tree obtained using gene tree parsimony based on the maximum likelihood gene tree collection. It is identical to the accepted tree in Figure 2. B. The best species tree obtained using GTP based on the maximum parsimony gene tree collection. It differs from the accepted tree only within the legumes. Bootstrap II support values (resampling the gene trees: see text) are shown in plain text for each bipartition in the tree. Bootstrap I values (resampling the data within the original clusters) are shown in italics for tree B.</p>
					</text>
					<graphic file="1471-2148-7-S1-S3-3"/>
				</fig>
				<p>Because the species trees were enumerated exhaustively, we could obtain the entire distribution of duplication scores for the two analyses (Fig. <figr fid="F4">4</figr>). The duplication scores for the likelihood trees ranged from 779.0 to 1152.0, whereas those for the parsimony trees ranged from 771.9 to 1165.9. Both distributions were highly skewed, with a long tail of low-scoring trees, suggesting the presence of phylogenetic signal, at least by analogy to skewness indices that have been used to study parsimony score distributions <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>.</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Distribution of duplication scores across all species trees</p>
					</caption>
					<text>
						<p><b>Distribution of duplication scores across all species trees</b>. Distributions of out-duplication scores across all 945 binary angiosperm species trees (all rooted with <it>Pinus</it>). An out-duplication score is the sum of all out-duplications required to reconcile all 577 gene trees (or sets of trees) to that species tree. The upper panel shows the distribution of scores when the gene trees were estimated using maximum parsimony; the lower panel gives the same for the maximum likelihood gene trees. Arrows indicate the bins in which the accepted species tree occurs. For the MP gene trees, the accepted species tree was fourth from the best and had a score of 796.3 (the optimal species tree had a score of 771.9). For the ML gene trees, the optimal tree was the same as the accepted tree and had a score of 779.0.</p>
					</text>
					<graphic file="1471-2148-7-S1-S3-4"/>
				</fig>
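				<p>The figure of 945 species trees follows from the number of rooted binary topologies on the six angiosperm taxa, (2n - 3)!! with n = 6, each rooted with <it>Pinus</it>. The Python sketch below illustrates one way to enumerate them by stepwise insertion of taxa; it is not the enumeration code used in this study, and the nested-tuple tree representation and function names are assumptions made for the example.</p>

```python
from math import prod


def insert_everywhere(tree, leaf):
    """Yield every rooted binary tree made by attaching leaf to one branch of tree."""
    # Attach on the branch leading to this (sub)tree.
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in insert_everywhere(right, leaf):
            yield (left, new_right)


def all_rooted_trees(taxa):
    """Build all rooted binary trees on taxa by adding one taxon at a time."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [new for old in trees for new in insert_everywhere(old, leaf)]
    return trees


def rooted_tree_count(n):
    """Number of rooted binary trees on n labelled leaves: (2n - 3)!!."""
    return prod(range(1, 2 * n - 2, 2))


ingroup = ["Oryza", "Solanum", "Arabidopsis", "Glycine", "Lotus", "Medicago"]
# Rooting with Pinus: the outgroup is sister to each ingroup topology.
species_trees = [("Pinus", tree) for tree in all_rooted_trees(ingroup)]
print(len(species_trees), rooted_tree_count(len(ingroup)))  # prints: 945 945
```

				<p>Exhaustive enumeration of this kind is practical only for small species trees, because the count grows as a double factorial in the number of taxa.</p>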
			</sec>
			<sec>
				<st>
					<p>Support levels and hypothesis testing</p>
				</st>
				<p>Bootstrap I values could be calculated only for the parsimony gene tree collection because of computational limits (100 maximum likelihood searches on each of 577 data sets were prohibitive). Support was &gt;95% for all nodes in the species tree derived from the parsimony gene trees except for the rosid clade, which was supported at only 48% (Fig. <figr fid="F3">3</figr>). Bootstrap II support values were also moderate for the rosid clade in both parsimony and likelihood gene tree analyses (71% and 68%, respectively). In addition, the relationship within the legumes, which conflicts between the two optimal species trees (<it>Glycine </it>+ <it>Lotus </it>versus <it>Medicago </it>+ <it>Lotus</it>), is weakly supported (66%) in the likelihood analysis, but strongly supported (99%) in the parsimony analysis.</p>
				<p>Because our analyses supported two different trees depending on which collection of gene trees was used, we examined whether these two trees were statistically distinguishable on the basis of the data at hand. Let <it>T</it><sub><it>L </it></sub>be the optimal species tree found based on the likelihood gene trees (identical to the accepted tree) and <it>T</it><sub><it>P </it></sub>be the optimal tree found with the parsimony gene trees. We examined the difference in support for these two trees based on either the likelihood gene tree collection or the parsimony gene tree collection using the analog of the paired-sites test described in the Methods. Based on the parsimony gene trees, there was weak but significant support (<it>P </it>= 0.04) for a difference between <it>T</it><sub><it>L </it></sub>and <it>T</it><sub><it>P</it></sub>. Of the 577 gene trees, 403 showed no difference in unrooted duplication scores between the two trees; 101 had better (lower) scores for <it>T</it><sub><it>P </it></sub>compared to <it>T</it><sub><it>L</it></sub>; 77 had better scores for <it>T</it><sub><it>L</it></sub>. On the other hand, based on the likelihood gene tree collection, the difference in support between <it>T</it><sub><it>L </it></sub>and <it>T</it><sub><it>P </it></sub>was not significant (<it>P </it>= 0.36). Of the 577 gene trees, 408 showed no difference in unrooted duplication scores between the two trees; 85 had better (lower) scores for <it>T</it><sub><it>P </it></sub>compared to <it>T</it><sub><it>L</it></sub>; 84 had better scores for <it>T</it><sub><it>L</it></sub>. These results are congruent with the bootstrap II comparisons in that they suggest the parsimony gene tree collection makes a more decisive claim about the difference in the species trees than does the likelihood gene tree collection.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<sec>
				<st>
					<p>Phylogenetic sparseness of the EST data</p>
				</st>
				<p>Phylogenomic data sets, whether derived from whole genome sequencing <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, database mining <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, or EST assemblies <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B27">27</abbr></abbrgrp> have yet to combine into one analysis more than a few hundred clusters of sequence homologs ("loci"). The reasons for this are many, but a primary one is the tradeoff between completeness of a data set and lack of homology that eventually limits cluster construction. Sanderson and Driskell <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> and Driskell et al. <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> illustrated this graphically by showing the low density of concatenated data matrices assembled from GenBank data mining approaches. <it>Density </it>can be defined as the fraction of sequences present in a "data availability matrix" consisting of all taxa in an analysis by all clusters. The reason why phylogenetic data matrices derived from whole genome analyses do not include <it>all </it>the genes in the genome is partly because lack of homology between sequences in these taxa limits how many taxa actually share the gene in common (either due to gene loss or excessive divergence). Clusters used in phylogenetic analysis are sometimes explicitly constructed to have all taxa or a minimum fraction of such taxa <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, thus keeping data density above a threshold, but also greatly limiting the eventual size of the data matrix.</p>
				<p>EST-based studies also seem to fit into this same paradigm. For example, using small EST libraries to identify orthologous clusters of ESTs, Hughes et al. <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> constructed supermatrices with 71% missing data, and this was <it>after </it>exclusion of most of the data because of extensive paralogy. In our data, for the cluster set used for most analyses, we identified 88,864 clusters for the seven taxa. However, 79,122 of these were singleton clusters, meaning that a whopping 75% of the original 105,453 TCs did not pass our minimal homology threshold. Moreover, as only 577 of the remaining clusters were actually potentially <it>phylogenetically </it>informative, the final density in the phylogenetic data availability matrix could not possibly exceed 577/88,864 or 0.6%. Higher densities are possible if the cluster assembly stringency is relaxed, but as we have seen, this leads to very heterogeneous clusters with few regions of homology &#8211; presumably engendering downstream problems in subsequent phylogenetic analysis.</p>
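				<p>The density of a data availability matrix, and the percentages quoted above, can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the toy presence/absence matrix and the function name are hypothetical, whereas the four counts are those given in the text.</p>

```python
def density(availability):
    """Fraction of filled cells in a taxa-by-clusters presence/absence matrix."""
    cells = [cell for row in availability for cell in row]
    return sum(cells) / len(cells)


# Toy matrix (hypothetical): 3 taxa (rows) by 4 clusters (columns).
toy = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
print(density(toy))  # 6 of 12 cells are filled, so this prints 0.5

# Counts quoted in the text.
total_tcs, clusters, singletons, informative = 105453, 88864, 79122, 577
print(round(100 * singletons / total_tcs, 1))  # percent of TCs left as singletons: 75.0
print(round(100 * informative / clusters, 2))  # percent of clusters that are informative: 0.65
```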
				<p>This very small fraction of the EST data that appear to be potentially useful for phylogenetic studies raises questions about the relative costs and benefits of obtaining EST data for phylogenetic work <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. However, several other factors are important to consider. First, using available EST libraries as tools to screen for loci useful for phylogenetic inference may justify their expense in a small number of pilot taxa. Primers can be developed for later use in extensive taxon surveys (e.g. <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>). Second, as local alignment tools (e.g. <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>) and phylogenetic inference algorithms improve, it should be possible to assemble clusters with more heterogeneity and distant homologies, and hence exploit more of the original data. Finally, it may be necessary to view the problem as one that will eventually be overcome by improvements in technology and reductions in expense. After all, for many single loci sequenced in conventional phylogenetic analysis, most sites are conserved and uninformative. The only factor that makes this palatable is the (now) relative inexpensiveness of sequencing technology.</p>
			</sec>
			<sec>
				<st>
					<p>Extent of duplication and implications for species tree inference</p>
				</st>
				<p>Among the 577 phylogenetically informative clusters, most showed evidence of gene duplication by conflicting with the accepted species tree. Even if we conservatively regard inparalogs as multiple alleles or multiple accessions of the same locus, there are 351 clusters that show at least one out-duplication when reconciled against the species tree using the likelihood gene trees. If we take a more liberal view, counting all duplications, then 535 of the clusters show evidence of duplication. Similar numbers obtain if the parsimony gene trees are used. Since duplications are minimized across all rootings of the gene trees, our estimated number of duplicated loci is probably somewhat lower than the true value. On the other hand, our methods do not account for error in the gene trees themselves, and failing to take this uncertainty into account may inflate the inferred number of duplications <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Regardless of these considerations, the fraction of the phylogenetically useful data in clusters that are locked up in gene families in plants, as opposed to single-copy genes, seems to be extremely high. To exploit the nuclear genome in plants to build species trees therefore seems to require methods that can handle extensive duplication (and gene loss or failure to sample), such as GTP or alternative frameworks <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B54">54</abbr></abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Performance of species tree inference</p>
				</st>
				<p>Despite the extensive heterogeneity in the data themselves, and the complex informatics pipeline that ultimately filtered out most of the original data, remarkably strong signal for the accepted species phylogeny was evident in the GTP analysis. The GTP analysis of the likelihood gene trees yielded the correct "accepted" species tree. The analysis of the parsimony gene trees yielded a species tree close to the accepted tree (which was ranked fourth out of 945). We find these results both surprising and promising for three reasons. First, gene families are subject to a variety of processes that can destroy the hierarchical signature of phylogenetic history, such as gene conversion between paralogs. Methods are available to detect such events <abbrgrp><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp> and to incorporate them into phylogenetic inference <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>, but the latter are still in their infancy.</p>
				<p>Second, the EST data themselves were "messy" compared to other data sets we have examined <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. EST tentative consensus sequences, which formed the start of our analysis, are themselves assembled from individual short EST sequences using complex and assumption-laden informatics protocols <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Among the factors that these assemblies contend with are filtering contaminants, correct assembly of pieces of the same paralog in gene families, and handling of alternative splicing, all in the presence of the usual issues raised in local and global homology algorithms. Though EST data have been widely used in evolutionary studies (e.g. of whole genome duplications: <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>), they have rarely been used en masse in phylogenetic analysis of any taxon <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B51">51</abbr></abbrgrp>, and it was reasonable to think that one cause for this was that these complexities overcame any underlying signal. Apparently this is not the case.</p>
				<p>Finally, GTP has only been used to build species trees in a few studies. Although a few issues have been raised in criticism of GTP (see below), one cannot help but think that GTP has not been used more either because of lack of software tools, or lack of data. Although many implementations of gene tree reconciliation are available <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B40">40</abbr><abbr bid="B54">54</abbr></abbrgrp>, few tools for GTP itself have been available except for Page's COMPONENT <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> and later GeneTree programs <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. Neither of these is set up to handle large numbers of loci easily or is scriptable, a necessity for much high throughput informatics work. Moreover, the search strategies rely only on branch swapping from random starting trees. If GTP is as difficult an optimization problem as maximum parsimony, with as messy data, experience suggests that this heuristic is not likely to perform very well. However, we have not solved that particular problem either. Instead, we <it>avoided </it>it by implementing an exact exhaustive enumeration possible only because of the small species tree in our problem.</p>
				<p>Another reason for the lack of GTP studies may be the lack of available gene family data for many taxa. Phylogeneticists have done their best to filter out gene families in the search for single-copy "magic bullets" that are easily sequenced by direct PCR, and to avoid cloning, Southern blots, or other labor-intensive techniques that are often necessary to initially identify paralogous copies of loci <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>. However, large databases of protein families exist and have been relatively underexploited for species tree inference (except see Cotton and Page's <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> analysis of the HOVERGEN database). For many taxa that phylogeneticists find interesting, however, such data are simply not available. The taxa are not model species by and large, and there has been no compelling reason to seek a diversity of loci in relatively obscure taxa. This will probably change as more and more sequencing projects and EST libraries build bridges to nonmodel taxa.</p>
				<p>Criticisms of GTP are numerous <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>, and many reflect the same concerns as have been raised about supertree analysis <abbrgrp><abbr bid="B60">60</abbr></abbrgrp> &#8211; in particular, that by taking a set of trees as the input, information about the uncertainty in those trees is lost (and hidden information within each data set cannot synergistically emerge). Certainly some number of duplications are inferred incorrectly simply because the gene tree is wrong <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. To address this, relative clade support scores can be incorporated when reconciling gene trees with species trees <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. However, the sheer volume of gene trees used here apparently overcame the errors associated with any one incorrect gene tree, implying a lack of systematic bias in the gene tree estimates, at least when likelihood was used to infer the gene trees. A more niggling issue is that the standard GTP algorithms all still require binary input trees. Chang <abbrgrp><abbr bid="B61">61</abbr></abbrgrp> has developed an algorithm that solves this problem, but it is not yet implemented. Both of these problems can be addressed at least partly through bootstrap procedures <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, which, when constrained to generate binary gene trees, sample across much of the diversity that is entailed by multifurcations arising from either lack of data for that node or conflicting signals.</p>
			</sec>
			<sec>
				<st>
					<p>Future work</p>
				</st>
				<p>Currently the main factor limiting the application of GTP to species tree inference seems to be a paucity of implemented tree search heuristics. Three related algorithmic challenges remain in this arena. First, gene tree uncertainty has to be integrated more directly into the tree reconciliation calculations. Durand et al. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> developed algorithms to calculate improved duplication scores in the presence of gene tree uncertainty and demonstrated the dramatic reduction in estimated score that can ensue. These or similar approaches must ultimately be embedded in GTP algorithms. Second, multifurcations in the gene tree and species tree have to be accommodated <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>. Finally, to address the growing size of data sets, it will be necessary to integrate these aspects of the GTP problem with whatever tree search heuristics are developed, so that redundant re-calculation of scores for subtrees is avoided.</p>
				<p>On the data analysis side, the size and taxonomic diversity of EST libraries will continue to grow, and our results suggest that these will be useful sources of data in the future for inferences about phylogeny. However, much work remains on the pre-processing side of the analysis, prior to gene tree construction and GTP analysis. The assembly of ESTs is a computationally and biologically challenging problem, especially in light of the high frequency of duplication in plant genomes, and the not infrequent occurrence of alternative splicing <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Perhaps the greatest challenge will be to develop methods that properly account for ascertainment bias: the failure to sample all paralogs in a gene family for some or all taxa. Although model-based approaches (e.g. <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>) to gene tree reconciliation offer a direct route to incorporate models of sample bias into the problem, these are computationally expensive methods, and it may be possible to use faster weighting schemes in some modification of the GTP framework.</p>
				<p>Finally, the ubiquity of whole genome duplications (e.g. <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B62">62</abbr></abbrgrp>) has important implications for inferring species trees from gene families. Page and Cotton <abbrgrp><abbr bid="B63">63</abbr></abbrgrp> looked for clustering of episodes of duplication in vertebrate gene families but found little evidence for it based solely on the phylogenetic position of the duplications. Subsequently <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> they added to their phylogenetic approach inferred duplication times and then found support for an ancient round of accelerated duplication rates in vertebrates, though not the recent episode that has been reported elsewhere <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>. Their approach complements a much more widely used approach of examining the distribution of duplication ages for peaks at different points in time (e.g. <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>). In addition to providing insights into genome evolution, these approaches suggest that supplementing the GTP inference problem with divergence time information to constrain its structure may be profitable, if only the accuracy of such information can be assured.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>Sequence data and gene trees</p>
				</st>
				<p>We downloaded EST data for seven plant taxa, including six angiosperms (<it>Oryza sativa</it>, <it>Solanum tuberosum</it>, <it>Arabidopsis thaliana</it>, <it>Glycine max</it>, <it>Lotus japonicus</it>, <it>Medicago truncatula</it>) and one conifer, <it>Pinus </it>(Fig. <figr fid="F2">2</figr>) to serve as an outgroup. Data were obtained from the TIGR Gene Indices Database <abbrgrp><abbr bid="B67">67</abbr><abbr bid="B68">68</abbr></abbrgrp> (Table <tblr tid="T1">1</tblr>). Initial data analysis protocols were similar to those reported in <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. We extracted all TCs (tentative consensus sequences) for each taxon and used the EMBOSS program getorf <abbrgrp><abbr bid="B69">69</abbr></abbrgrp> to find open reading frames of at least 500 nt in length in the sense direction. Default settings were used (ORF defined as a region between stop codons according to the standard genetic code). These filtered TCs were then used in subsequent analyses.</p>
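				<p>The ORF filter can be pictured as follows. This Python sketch only approximates the rule stated above (a stop-free region of at least 500 nt in one of the three forward frames, standard genetic code). It is not the EMBOSS getorf program, and its treatment of sequence ends as ORF boundaries is an assumption made for illustration. The function names and the toy sequence are hypothetical.</p>

```python
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def forward_orfs(seq, min_len=500):
    """Yield (frame, start, end) for stop-free stretches of at least min_len nt."""
    seq = seq.upper()
    for frame in range(3):
        start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOPS:
                if pos - start >= min_len:
                    yield frame, start, pos
                start = pos + 3  # next candidate region begins after the stop
        # Assumption: the sequence end also closes an open region.
        end = len(seq) - (len(seq) - frame) % 3
        if end - start >= min_len:
            yield frame, start, end


def passes_filter(seq, min_len=500):
    """True if the sequence has at least one sufficiently long forward ORF."""
    return next(forward_orfs(seq, min_len), None) is not None


# Toy sequence (hypothetical): 33 nt of coding sequence, a stop, and a short tail.
toy = "ATG" + "GCT" * 10 + "TAA" + "GC"
print(passes_filter(toy, min_len=30), passes_filter(toy, min_len=500))
```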
				<p>Clusters of homologous TCs were obtained using all-by-all BLAST nucleotide similarity searches <abbrgrp><abbr bid="B70">70</abbr></abbrgrp> on the filtered and trimmed TCs (low-complexity filter DUST turned on; maximum Expect (E) value of 1.0e-10). BLAST was undertaken on nucleotide sequences despite the high level of divergence at third codon positions because of the possibility of mistaken amino acid translations based on incorrect ORF identifications in these data in which alternative splicing was not uncommon. Single-linkage clustering was used to assemble clusters based on BLAST output (program <it>blink </it>available at MJS's web site <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>; additional utility scripts available from authors). High levels of within-cluster heterogeneity among sequences can lead to severe alignment problems <abbrgrp><abbr bid="B72">72</abbr></abbrgrp>. Therefore, a pair of sequences was considered as a hit if it was reported as a BLAST hit and it surpassed a minimum "hit fraction" of 0.70 for each sequence, i.e., at least 70% of each sequence must align to the other sequence with E values lower than the threshold &#8211; though not necessarily in a single contiguous hit. The threshold was imposed symmetrically for both query and target sequence. The value 0.70 was experimentally determined by simultaneously attempting to maximize the number of sequences assigned to clusters and minimizing the heterogeneity, both in terms of sequence divergence and length differences, of the resulting clusters.</p>
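				<p>Single-linkage clustering under a symmetric hit-fraction threshold can be sketched with a union-find structure. The Python code below is an illustration under stated assumptions, not the <it>blink</it> program: it assumes the hit fraction of each sequence in a pair has already been computed from the BLAST output, and the tuple layout, function name, and toy identifiers are hypothetical.</p>

```python
def single_linkage(ids, hits, min_fraction=0.70):
    """Cluster ids by single linkage over pairs that pass the symmetric filter.

    hits: iterable of (a, b, frac_a, frac_b), where frac_x is the fraction of
    sequence x covered by significant alignments to the other sequence.
    """
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b, frac_a, frac_b in hits:
        # Symmetric threshold: both sequences must be covered well enough.
        if min(frac_a, frac_b) >= min_fraction:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return sorted(clusters.values(), key=sorted)


# Toy input (hypothetical): tc3-tc4 fails because tc4 is only 50% covered.
toy_hits = [
    ("tc1", "tc2", 0.90, 0.80),
    ("tc2", "tc3", 0.75, 0.71),
    ("tc3", "tc4", 0.90, 0.50),
]
print(single_linkage(["tc1", "tc2", "tc3", "tc4"], toy_hits))
```

				<p>Because linkage is single, tc1 and tc3 end up in the same cluster through tc2 even though they share no direct hit, which is one way that heterogeneity can accumulate when the threshold is relaxed.</p>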
				<p>Resulting clusters were screened for potential phylogenetic informativeness. To provide potential information in a GTP analysis, which fundamentally requires a rooted species tree and one or more rooted gene trees, the gene trees and the clusters used to construct them must consist of three or more sequences from three or more species. However, because we are not using external evidence to root the gene trees but rather are examining all duplication scores across all possible rootings, our gene trees must have at least four sequences. If a gene tree with only three sequences is rerooted, it will be congruent with every rooted species tree for <it>some </it>gene tree rooting, and therefore it will not provide any information to discriminate among species relationships in the GTP analysis. If, on the other hand, the gene trees were rooted using a molecular clock or midpoint rooting, for example, then clusters with only three sequences could potentially incur duplication scores that differed from species tree to species tree.</p>
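				<p>The screen described above reduces to two counts per cluster. A minimal Python sketch follows, assuming each cluster is available as a list of (sequence identifier, taxon) pairs; that layout, the function name, and the toy identifiers are hypothetical.</p>

```python
def is_informative(cluster, min_sequences=4, min_taxa=3):
    """cluster: list of (sequence_id, taxon) pairs for one cluster."""
    taxa = {taxon for _, taxon in cluster}
    return len(cluster) >= min_sequences and len(taxa) >= min_taxa


# Toy clusters (hypothetical identifiers).
four_seqs_three_taxa = [("s1", "Oryza"), ("s2", "Oryza"), ("s3", "Lotus"), ("s4", "Pinus")]
three_seqs_three_taxa = [("s1", "Oryza"), ("s2", "Lotus"), ("s3", "Pinus")]
four_seqs_two_taxa = [("s1", "Oryza"), ("s2", "Oryza"), ("s3", "Lotus"), ("s4", "Lotus")]

for cluster in (four_seqs_three_taxa, three_seqs_three_taxa, four_seqs_two_taxa):
    print(is_informative(cluster))
```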
				<p>Once the clusters were screened for informativeness (with specific regard to gene tree parsimony), we used the global alignment program Clustal W <abbrgrp><abbr bid="B73">73</abbr></abbrgrp> to align nucleotide sequences within the clusters. A sample of alignments was checked manually for obvious alignment mistakes; none were found, and consequently the alignments were not edited further. Gene trees were reconstructed for each cluster using heuristic maximum parsimony and maximum likelihood implemented in PAUP* 4.0b10 <abbrgrp><abbr bid="B74">74</abbr></abbrgrp>. Because only binary trees may be used in available algorithms for gene tree parsimony, zero-length branches were not collapsed in either method. Heuristic parsimony searches consisted of simple-addition sequences with tree-bisection-reconnection branch swapping, keeping a maximum of 10000 equally parsimonious trees (which was never exceeded). Heuristic maximum likelihood searches used a neighbor-joining starting tree followed by TBR branch swapping, time-limited to 6 hours, using an HKY85 + &#915; model of evolution in which all parameters were estimated from the data. All phylogenetic analyses were conducted on a dual Xeon 2.80 GHz CPU with 3 GB of RAM or on a 35-node Linux cluster, in which the head node is a dual Xeon 2.66 GHz CPU with 3 GB RAM and each node is a dual AMD 1.4 GHz CPU with 1 GB RAM.</p>
				<p>To construct a confidence set of trees for each cluster in parsimony analyses, we bootstrapped the sequence data (100 pseudoreplicates, saving each gene tree, or set of trees, each weighted by the inverse of the number of trees found for that particular replicate). Searches were conducted with the same settings as for searches on the original clusters. The computational overhead was too high to do the same for maximum likelihood (worst case running time: six hours &#215; 100 replications &#215; 577 data sets).</p>
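				<p>The replicate weighting can be written out explicitly: every pseudoreplicate contributes a total weight of one, split equally among the trees it returns. The Python sketch below assumes each tree is represented by a hashable label; the function name and the toy labels are hypothetical.</p>

```python
from collections import defaultdict
from fractions import Fraction


def weighted_tree_counts(replicates):
    """Sum per-tree weights, where each replicate's trees share a weight of 1."""
    weights = defaultdict(Fraction)  # Fraction() starts at zero
    for trees in replicates:
        share = Fraction(1, len(trees))  # inverse of the number of trees found
        for tree in trees:
            weights[tree] += share
    return dict(weights)


# Toy replicates (hypothetical): one, two, and four equally parsimonious trees.
toy_replicates = [["t1"], ["t1", "t2"], ["t2", "t3", "t4", "t5"]]
weights = weighted_tree_counts(toy_replicates)
for tree in sorted(weights):
    print(tree, weights[tree])
```

				<p>With this scheme the weights always sum to the number of pseudoreplicates, so a replicate that returns many equally parsimonious trees does not dominate the confidence set.</p>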
			</sec>
			<sec>
				<st>
					<p>Gene tree reconciliation: duplication scores on the accepted species tree</p>
				</st>
				<p>To reconcile the gene trees to species trees by minimizing the number of duplication events, we implemented the algorithm of Zmasek and Eddy <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> in a C program available from MJS at his web site <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>. This algorithm runs, under the rarely expected worst case, in <it>O</it>(<it>n</it><sup>2</sup>) time <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, but its average behavior is much better, as shown both by Zmasek and Eddy's experimental results <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> and our experience with the present data set. We implemented their algorithm in C to run quickly for the large numbers of gene trees and species trees analyzed in this paper: each gene tree parsimony analysis had to reconcile 577 gene trees under all possible rootings for each of 945 species trees. Although Durand et al.'s <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> recently released NOTUNG 2.1 program is quite full-featured and would have been an appropriate tool for this task, its Java implementation and requirement that it be re-executed for each species tree/gene tree pair made it too slow for this problem.</p>
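				<p>The duplication criterion itself can be illustrated with a naive least-common-ancestor mapping, in which a gene tree node counts as a duplication when it maps to the same species tree node as one of its children. The Python sketch below is an illustration under simplifying assumptions. It is neither the C program mentioned above nor the Zmasek and Eddy procedure itself: trees are nested tuples, gene tree leaves are labelled directly with species names, and the three-species tree with labels A, B, and C is hypothetical. It finds each mapping by scanning all clades, so it is slower than a tuned implementation.</p>

```python
def collect_clades(tree, out):
    """Append the leaf set of every node in tree to out; return tree's own leaf set."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        left, right = tree
        leaves = collect_clades(left, out) | collect_clades(right, out)
    out.append(leaves)
    return leaves


def lca(clades, species):
    """Smallest species-tree clade that contains every species in the given set."""
    return min((c for c in clades if species.issubset(c)), key=len)


def reconcile(gene_tree, clades):
    """Return (species clade the gene node maps to, duplications at or below it)."""
    if isinstance(gene_tree, str):
        return lca(clades, frozenset([gene_tree])), 0
    left, right = gene_tree
    map_left, dup_left = reconcile(left, clades)
    map_right, dup_right = reconcile(right, clades)
    here = lca(clades, map_left | map_right)
    # A node is a duplication when it maps to the same species node as a child.
    is_dup = here in (map_left, map_right)
    return here, dup_left + dup_right + int(is_dup)


# Hypothetical three-species tree; gene-tree leaves carry species labels.
species_tree = (("A", "B"), "C")
clades = []
collect_clades(species_tree, clades)

print(reconcile((("A", "B"), "C"), clades)[1])  # congruent gene tree: 0
print(reconcile((("A", "C"), "B"), clades)[1])  # conflicting gene tree: 1
print(reconcile((("A", "A"), "B"), clades)[1])  # two copies within species A: 1
```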
				<p>Other criteria can be used to reconcile gene trees to species trees, such as the sum of duplications plus losses or, for recently diverged lineages, coalescent depth <abbrgrp><abbr bid="B75">75</abbr></abbrgrp>. We chose not to incorporate the number of losses into the optimality criterion because the data sets for the current study, largely derived from ESTs, are exceptionally prone to incomplete sampling, and a true evolutionary loss is therefore difficult to distinguish from mere ascertainment bias. Moreover, adding losses to the optimality criterion introduces the difficult problem of weighting the relative importance of duplications, losses due to evolutionary deletion, and "losses" due to sampling omissions. The duplication score alone is expected to be a more robust indicator of gene family diversity in these circumstances <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
				<p>Reconciliation of a gene tree to a species tree requires that both trees be rooted <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. The species tree of angiosperms is rooted with an outgroup to angiosperms among seed plants, the conifer <it>Pinus</it>. However, outgroup rooting is not possible for the gene trees because, for example, a gene tree might have two paralogs from <it>Pinus </it>in different parts of the tree, leaving the position of the root uncertain. Occasionally, the root may be inferred in simple scenarios in which a single duplication has occurred prior to all taxa in the analysis and the root clearly lies between the two paralog trees, but in general this will not be the case. Therefore, as suggested previously <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B32">32</abbr></abbrgrp>, we reconciled each gene tree to the species tree by evaluating the duplication score for all possible roots of the gene tree and selecting the root(s) that minimize the number of duplications inferred. Some EST clusters produced multiple equally parsimonious trees. In these cases, the duplication score was averaged across the set of equally parsimonious trees.</p>
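				<p>The search over rootings can be sketched in Python as below. This is an illustration under assumptions, not the authors' implementation: gene trees are nested 2-tuples with species names at the leaves, every branch of the unrooted tree is tried as a root position, and the scoring function (for example, a duplication counter for a fixed species tree) is passed in by the caller. All function names are hypothetical, and the tree must have at least two leaves.</p>

```python
import itertools


def to_unrooted(tree):
    """Turn a rooted nested-tuple tree into an adjacency map with no root node.

    Nodes get integer ids so that several leaves may carry the same species.
    """
    adjacency, label = {}, {}
    ids = itertools.count()

    def add(node):
        me = next(ids)
        adjacency[me] = []
        if isinstance(node, str):
            label[me] = node
        else:
            for child in node:
                kid = add(child)
                adjacency[me].append(kid)
                adjacency[kid].append(me)
        return me

    root = add(tree)
    if len(adjacency[root]) == 2:
        # Suppress the degree-2 root by joining its two neighbours directly.
        a, b = adjacency.pop(root)
        adjacency[a].remove(root)
        adjacency[b].remove(root)
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency, label


def subtree(adjacency, label, node, parent):
    """Nested-tuple subtree below `node` when it is approached from `parent`."""
    if node in label:
        return label[node]
    return tuple(subtree(adjacency, label, nxt, node)
                 for nxt in adjacency[node] if nxt != parent)


def rootings(tree):
    """Yield one rooted tree for every branch of the unrooted form of `tree`."""
    adjacency, label = to_unrooted(tree)
    done = set()
    for u in adjacency:
        for v in adjacency[u]:
            if (v, u) not in done:
                done.add((u, v))
                yield (subtree(adjacency, label, u, v),
                       subtree(adjacency, label, v, u))


def best_rootings(tree, score):
    """Return the minimum score and every rooting of `tree` that attains it."""
    scored = [(score(rooted), rooted) for rooted in rootings(tree)]
    best = min(value for value, _ in scored)
    return best, [rooted for value, rooted in scored if value == best]
```

				<p>A binary gene tree with <it>n </it>leaves (<it>n </it>of at least three) has 2<it>n </it>- 3 branches, so this sketch evaluates 2<it>n </it>- 3 rootings per gene tree and returns all of those that tie for the minimum score.</p>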
				<p>Because clusters lacking duplications are of special significance to species-level phylogenetics (for example, they can potentially be concatenated in "supermatrix" analyses), we estimated their occurrence in the data. A cluster was scored as lacking duplications if <it>all </it>equally parsimonious trees for that cluster had an unrooted duplication score of zero. These values are reported both for all duplications and for only out-duplications.</p>
			</sec>
			<sec>
				<st>
					<p>Gene tree parsimony: finding the optimal species tree</p>
				</st>
				<p>Because of the relatively small size of the species tree, gene tree parsimony searches for the optimal species tree were implemented by exhaustively enumerating all 945 species trees (rooted with <it>Pinus</it>), and calculating the summed gene duplication scores across all gene trees for each of these species trees. This procedure was repeated for both parsimony and likelihood collections of gene trees. This strategy obviously would not be feasible for species trees much larger than this. A benefit of exhaustive enumeration is that it provides the exact distribution of GTP scores across all the species trees. This allowed, among other things, a ranking of all species trees according to GTP score and a comparison of the relative position of the optimal GTP tree and the true tree.</p>
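				<p>The figure of 945 is consistent with six ingroup taxa, since the number of rooted binary topologies on <it>n </it>taxa is (2<it>n </it>- 3)!!, which is 1 &#215; 3 &#215; 5 &#215; 7 &#215; 9 = 945 for <it>n </it>= 6. The following Python sketch illustrates exhaustive enumeration by stepwise addition and the ranking of species trees by summed score. The taxon labels, function names, and the caller-supplied scoring function are assumptions for illustration and do not reproduce the program used here.</p>

```python
def insert_everywhere(tree, leaf):
    """Yield every tree made by attaching `leaf` to one branch of `tree`,
    including the branch above its root."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insert_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in insert_everywhere(right, leaf):
            yield (left, new_right)


def all_rooted_trees(taxa):
    """List every rooted binary topology on `taxa` as nested 2-tuples."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [new for tree in trees for new in insert_everywhere(tree, leaf)]
    return trees


def rank_species_trees(taxa, gene_trees, score):
    """Sum score(gene_tree, species_tree) over all gene trees for every
    candidate species tree and return (total, species_tree) pairs, best first."""
    totals = [(sum(score(gene, species) for gene in gene_trees), species)
              for species in all_rooted_trees(taxa)]
    return sorted(totals, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Six placeholder ingroup labels; the real taxon names are not used here.
    print(len(all_rooted_trees(list("ABCDEF"))))
```

				<p>Because every candidate is scored, the sorted list is the exact distribution of scores over all species trees, which is what makes it possible to rank any particular tree against the optimum.</p>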
			</sec>
			<sec>
				<st>
					<p>Support levels and hypothesis tests</p>
				</st>
				<p>Little work has been done to develop confidence assessments in GTP analyses, per se, although several authors have taken a bootstrap approach to identification of orthologs with gene tree reconciliation <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Cotton and Page <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> suggested a bootstrap analysis to account for gene tree uncertainty, in which each of <it>k </it>data sets used to generate the <it>k </it>gene trees is bootstrapped <it>N </it>times, generating a set of <it>k </it>bootstrap profiles. Then a higher-level GTP bootstrap analysis is done by taking the <it>i</it>th tree from each of the <it>k </it>profiles and performing a complete GTP search for the species tree, generating species tree <it>i</it>, and repeating this for <it>i </it>= 1, ..., <it>N</it>. The collection of <it>N </it>species trees then forms a confidence set of species trees, and majority rule consensus is used to summarize support, as in conventional bootstrapping <abbrgrp><abbr bid="B76">76</abbr></abbrgrp>. We refer to this as <it>Bootstrap I</it>.</p>
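				<p>The indexing in this procedure can be made concrete with a short Python sketch. It assumes that the <it>k </it>bootstrap profiles are stored as <it>k </it>lists of <it>N </it>gene trees each and that a GTP search is available as a caller-supplied function; both the data layout and the names are hypothetical.</p>

```python
def bootstrap_one(profiles, search):
    """Run one species tree search per bootstrap index.

    `profiles` holds k lists, each with the N bootstrap gene trees of one
    data set. Replicate i uses the i-th tree of every profile, so the
    result is a list of N species trees.
    """
    return [search(list(column)) for column in zip(*profiles)]


if __name__ == "__main__":
    # Two data sets (k = 2), three pseudoreplicates each (N = 3); the
    # stand-in "search" simply echoes the trees it was given.
    profiles = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
    print(bootstrap_one(profiles, tuple))
```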
				<p>An alternative bootstrap procedure uses the gene trees themselves as the sampling unit. In a single bootstrap replicate, a set of <it>k </it>gene trees is assembled by sampling from the original set of <it>k </it>gene trees randomly with replacement. Then a species tree is built by GTP, and the process is repeated <it>N </it>times. Again a majority rule tree can be constructed. We refer to this as <it>Bootstrap II</it>.</p>
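				<p>A minimal Python sketch of this resampling scheme follows. It assumes that the GTP search is supplied by the caller and returns a species tree as nested tuples with taxon names at the leaves; clade frequencies across replicates are then tallied so that a majority rule summary can be read off. The function names, the default of 100 replicates, and the fixed random seed are choices made for this example only.</p>

```python
import random
from collections import Counter


def collect_clades(tree, out):
    """Add the leaf set of every internal node of `tree` to the set `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    here = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(here)
    return here


def bootstrap_two(gene_trees, search, replicates=100, seed=0):
    """Resample gene trees with replacement and tally clade support.

    Each replicate draws k trees from the k originals, passes them to
    `search`, and records the clades of the species tree that comes back.
    Returns a dict mapping each clade to its frequency across replicates.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        sample = rng.choices(gene_trees, k=len(gene_trees))
        found = set()
        collect_clades(search(sample), found)
        counts.update(found)
    return {clade: n / replicates for clade, n in counts.items()}
```

				<p>Clades with a frequency above 0.5 in the returned dictionary are those that would appear in the majority rule tree.</p>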
				<p>Finally, because the optimal species tree may be different from the accepted species tree in Figure <figr fid="F2">2</figr>, it is useful to test whether there is a significant difference in support from the gene duplication data. For this, we propose a simple analog to paired sites tests used extensively for parsimony and likelihood tree inference (reviewed in <abbrgrp><abbr bid="B77">77</abbr></abbrgrp>), in which gene trees take the place of sites. For each gene tree, we calculate the duplication score on tree 1 and tree 2. Under the null hypothesis of equal support for the two trees, the mean difference in these scores across gene trees should be zero. A paired <it>t</it>-test provides a test of significance taking the variance into account. As is now well known, however, if one of the two trees is the optimal tree (as it will be here), the test is one-sided and the <it>P</it>-value must be the appropriate one-sided version <abbrgrp><abbr bid="B77">77</abbr></abbrgrp>. Additional analogous tests presumably could be constructed to account for multiple-testing issues that might arise if we examined many trees <abbrgrp><abbr bid="B78">78</abbr></abbrgrp>.</p>
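				<p>The paired test can be sketched in Python as follows. The sketch takes one duplication score per gene tree on each of the two species trees, forms the paired differences, and computes the usual paired <it>t </it>statistic. As a simplifying assumption, the one-sided <it>P</it>-value is taken from a standard normal approximation to the <it>t </it>distribution, which is reasonable only when the number of gene trees is large; the function name and the example scores are invented for illustration.</p>

```python
import math
from statistics import NormalDist, mean, stdev


def paired_duplication_test(scores_alternative, scores_optimal):
    """Paired test on per-gene-tree duplication scores.

    Returns the t statistic and a one-sided P-value for the hypothesis
    that the alternative tree requires more duplications than the
    optimal tree. The P-value uses a normal approximation (an assumption
    of this sketch), and the differences must not all be identical.
    """
    diffs = [a - b for a, b in zip(scores_alternative, scores_optimal)]
    k = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(k))
    p_one_sided = 1.0 - NormalDist().cdf(t)
    return t, p_one_sided


if __name__ == "__main__":
    # Invented scores for eight gene trees, for illustration only.
    alternative = [3, 2, 4, 3, 5, 2, 4, 3]
    optimal = [2, 2, 3, 3, 4, 1, 3, 3]
    print(paired_duplication_test(alternative, optimal))
```

				<p>With only a handful of gene trees, the exact <it>t </it>distribution with <it>k </it>- 1 degrees of freedom should be used in place of the normal approximation.</p>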
			</sec>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>MJS wrote the code to implement GTP. MMM performed a majority of the informatics data analyses. The authors participated equally in writing the manuscript.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>We thank Mathieu Blanchette and Herv&#233; Philippe for organizing the conference on phylogenomics, which spurred the present work. We also thank Rod Page and Oliver Eulenstein for stimulating discussion over many years on this topic. Comments of Mark Simmons and four anonymous reviewers were greatly appreciated. This work was supported by an AToL grant from the US NSF.</p>
				<p>This article has been published as part of <it>BMC Evolutionary Biology </it>Volume 7, Supplement 1, 2007: First International Conference on Phylogenomics. The full contents of the supplement are available online at <url>http://www.biomedcentral.com/bmcevolbiol/7?issue=S1</url>.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Reassessment of phylogenetic relationships in <it>Clarkia </it>sect. <it>Sympherica</it></p>
				</title>
				<aug>
					<au>
						<snm>Ford</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Gottlieb</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Amer J Bot</source>
				<pubdate>2003</pubdate>
				<volume>90</volume>
				<fpage>284</fpage>
				<lpage>292</lpage>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Phylogeny of the New World diploid cottons (<it>Gossypium </it>L., Malvaceae) based on sequences of three low-copy nuclear genes</p>
				</title>
				<aug>
					<au>
						<snm>Alvarez</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Cronn</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Wendel</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Plant Syst Evol</source>
				<pubdate>2005</pubdate>
				<volume>252</volume>
				<fpage>199</fpage>
				<lpage>214</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1007/s00606-004-0294-0</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>ESTimating plant phylogeny: lessons from partitioning</p>
				</title>
				<aug>
					<au>
						<snm>de la Torre</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Egan</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Katari</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Brenner</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Stevenson</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Coruzzi</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>DeSalle</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>BMC Evol Biol</source>
				<pubdate>2006</pubdate>
				<volume>6</volume>
				<fpage>48</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1564041</pubid>
						<pubid idtype="pmpid" link="fulltext">16776834</pubid>
						<pubid idtype="doi">10.1186/1471-2148-6-48</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>The evolutionary dynamics of plant duplicate genes</p>
				</title>
				<aug>
					<au>
						<snm>Moore</snm>
						<fnm>RC</fnm>
					</au>
					<au>
						<snm>Purugganan</snm>
						<fnm>MD</fnm>
					</au>
				</aug>
				<source>Curr Opin Plant Biol</source>
				<pubdate>2005</pubdate>
				<volume>8</volume>
				<fpage>122</fpage>
				<lpage>128</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/j.pbi.2004.12.001</pubid>
						<pubid idtype="pmpid" link="fulltext">15752990</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>The hidden duplication past of <it>Arabidopsis thaliana</it></p>
				</title>
				<aug>
					<au>
						<snm>Simillion</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Vandepoele</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Van Montagu</snm>
						<fnm>MCE</fnm>
					</au>
					<au>
						<snm>Zabeau</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Van de Peer</snm>
						<fnm>Y</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2002</pubdate>
				<volume>99</volume>
				<fpage>13627</fpage>
				<lpage>13632</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">129725</pubid>
						<pubid idtype="pmpid" link="fulltext">12374856</pubid>
						<pubid idtype="doi">10.1073/pnas.212522399</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics</p>
				</title>
				<aug>
					<au>
						<snm>Paterson</snm>
						<fnm>AH</fnm>
					</au>
					<au>
						<snm>Bowers</snm>
						<fnm>JE</fnm>
					</au>
					<au>
						<snm>Chapman</snm>
						<fnm>BA</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2004</pubdate>
				<volume>101</volume>
				<fpage>9903</fpage>
				<lpage>9908</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">470771</pubid>
						<pubid idtype="pmpid" link="fulltext">15161969</pubid>
						<pubid idtype="doi">10.1073/pnas.0307901101</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Analysis of the genome sequence of the flowering plant <it>Arabidopsis thaliana</it></p>
				</title>
				<aug>
					<au>
						<cnm>Arabidopsis Genome Initiative</cnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2000</pubdate>
				<volume>408</volume>
				<fpage>796</fpage>
				<lpage>815</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35048692</pubid>
						<pubid idtype="pmpid" link="fulltext">11130711</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences</p>
				</title>
				<aug>
					<au>
						<snm>Goodman</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Czelusniak</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Moore</snm>
						<fnm>GW</fnm>
					</au>
					<au>
						<snm>Romero-Herrera</snm>
						<fnm>AE</fnm>
					</au>
					<au>
						<snm>Matsuda</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Syst Zool</source>
				<pubdate>1979</pubdate>
				<volume>28</volume>
				<fpage>132</fpage>
				<lpage>163</lpage>
				<xrefbib>
					<pubid idtype="doi">10.2307/2412519</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Maps between trees and cladistic analysis of historical associations among genes, organisms and areas</p>
				</title>
				<aug>
					<au>
						<snm>Page</snm>
						<fnm>RDM</fnm>
					</au>
				</aug>
				<source>Syst Biol</source>
				<pubdate>1994</pubdate>
				<volume>43</volume>
				<fpage>58</fpage>
				<lpage>77</lpage>
				<xrefbib>
					<pubid idtype="doi">10.2307/2413581</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>How should species phylogenies be inferred from sequence data?</p>
				</title>
				<aug>
					<au>
						<snm>Slowinski</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Page</snm>
						<fnm>RDM</fnm>
					</au>
				</aug>
				<source>Syst Biol</source>
				<pubdate>1999</pubdate>
				<volume>48</volume>
				<fpage>814</fpage>
				<lpage>825</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1080/106351599260030</pubid>
						<pubid idtype="pmpid">12066300</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Phylogeny reconstruction using duplicate genes</p>
				</title>
				<aug>
					<au>
						<snm>Simmons</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Bailey</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Nixon</snm>
						<fnm>K</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2000</pubdate>
				<volume>17</volume>
				<fpage>469</fpage>
				<lpage>473</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10742039</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Going nuclear: gene family evolution and vertebrate phylogeny reconciled</p>
				</title>
				<aug>
					<au>
						<snm>Cotton</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Page</snm>
						<fnm>RDM</fnm>
					</au>
				</aug>
				<source>Proc Biol Sci</source>
				<pubdate>2002</pubdate>
				<volume>269</volume>
				<fpage>1555</fpage>
				<lpage>1561</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1098/rspb.2002.2074</pubid>
						<pubid idtype="pmpid" link="fulltext">12184825</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>A hybrid micro-macroevolutionary approach to gene tree reconstruction</p>
				</title>
				<aug>
					<au>
						<snm>Durand</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Halldorsson</snm>
						<fnm>BV</fnm>
					</au>
					<au>
						<snm>Vernot</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Research in Computational Molecular Biology, Proceedings</source>
				<pubdate>2005</pubdate>
				<volume>3500</volume>
				<fpage>250</fpage>
				<lpage>264</lpage>
			</bibl>
			<bibl id="B14">
				<title>
					<p>A linear time algorithm for tree mapping</p>
				</title>
				<aug>
					<au>
						<snm>Eulenstein</snm>
						<fnm>O</fnm>
					</au>
				</aug>
				<source>Arbeitspapiere der GMD</source>
				<pubdate>1997</pubdate>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Genome-scale approaches to resolving incongruence in molecular phylogenies</p>
				</title>
				<aug>
					<au>
						<snm>Rokas</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Williams</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>King</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Carroll</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2003</pubdate>
				<volume>425</volume>
				<fpage>798</fpage>
				<lpage>804</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature02053</pubid>
						<pubid idtype="pmpid" link="fulltext">14574403</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Evolutionary sequence analysis of complete eukaryote genomes</p>
				</title>
				<aug>
					<au>
						<snm>Blair</snm>
						<fnm>JE</fnm>
					</au>
					<au>
						<snm>Shah</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Hedges</snm>
						<fnm>SB</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>53</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1274250</pubid>
						<pubid idtype="pmpid" link="fulltext">15762985</pubid>
						<pubid idtype="doi">10.1186/1471-2105-6-53</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Tunicates and not cephalochordates are the closest living relatives of vertebrates</p>
				</title>
				<aug>
					<au>
						<snm>Delsuc</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Brinkmann</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Chourrout</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Philippe</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2006</pubdate>
				<volume>439</volume>
				<fpage>965</fpage>
				<lpage>968</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature04336</pubid>
						<pubid idtype="pmpid" link="fulltext">16495997</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Prospects for building the tree of life from large sequence databases</p>
				</title>
				<aug>
					<au>
						<snm>Driskell</snm>
						<fnm>AC</fnm>
					</au>
					<au>
						<snm>An&#233;</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Burleigh</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>McMahon</snm>
						<fnm>MM</fnm>
					</au>
					<au>
						<snm>O'Meara</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Sanderson</snm>
						<fnm>MJ</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2004</pubdate>
				<volume>306</volume>
				<fpage>1172</fpage>
				<lpage>1174</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1102036</pubid>
						<pubid idtype="pmpid" link="fulltext">15539599</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Toward automatic reconstruction of a highly resolved tree of life</p>
				</title>
				<aug>
					<au>
						<snm>Ciccarelli</snm>
						<fnm>FD</fnm>
					</au>
					<au>
						<snm>Doerks</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>von Mering</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Creevey</snm>
						<fnm>CJ</fnm>
					</au>
					<au>
						<snm>Snel</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2006</pubdate>
				<volume>311</volume>
				<fpage>1283</fpage>
				<lpage>1287</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1123061</pubid>
						<pubid idtype="pmpid" link="fulltext">16513982</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Dense taxonomic EST sampling and its applications for molecular systematics of the Coleoptera (beetles)</p>
				</title>
				<aug>
					<au>
						<snm>Hughes</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Longhorn</snm>
						<fnm>SJ</fnm>
					</au>
					<au>
						<snm>Papadopoulou</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Theodorides</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>de Riva</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Mejia-Chang</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Foster</snm>
						<fnm>PG</fnm>
					</au>
					<au>
						<snm>Vogler</snm>
						<fnm>AP</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2006</pubdate>
				<volume>23</volume>
				<fpage>268</fpage>
				<lpage>278</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/molbev/msj041</pubid>
						<pubid idtype="pmpid" link="fulltext">16237206</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Towards detection of orthologues in sequence databases</p>
				</title>
				<aug>
					<au>
						<snm>Yuan</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Eulenstein</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Vingron</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>1998</pubdate>
				<volume>14</volume>
				<fpage>285</fpage>
				<lpage>289</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/14.3.285</pubid>
						<pubid idtype="pmpid" link="fulltext">9614272</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Automated ortholog inference from phylogenetic trees and calculation of orthology reliability</p>
				</title>
				<aug>
					<au>
						<snm>Storm</snm>
						<fnm>CEV</fnm>
					</au>
					<au>
						<snm>Sonnhammer</snm>
						<fnm>ELL</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<fpage>92</fpage>
				<lpage>99</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/18.1.92</pubid>
						<pubid idtype="pmpid" link="fulltext">11836216</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>OrthoMCL: identification of ortholog groups for eukaryotic genomes</p>
				</title>
				<aug>
					<au>
						<snm>Li</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Stoeckert</snm>
						<fnm>CJ</fnm>
					</au>
					<au>
						<snm>Roos</snm>
						<fnm>DS</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>2178</fpage>
				<lpage>2189</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">403725</pubid>
						<pubid idtype="pmpid" link="fulltext">12952885</pubid>
						<pubid idtype="doi">10.1101/gr.1224503</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Obtaining maximal concatenated phylogenetic data sets from large sequence databases</p>
				</title>
				<aug>
					<au>
						<snm>Sanderson</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Driskell</snm>
						<fnm>AC</fnm>
					</au>
					<au>
						<snm>Ree</snm>
						<fnm>RH</fnm>
					</au>
					<au>
						<snm>Eulenstein</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Langley</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2003</pubdate>
				<volume>20</volume>
				<fpage>1036</fpage>
				<lpage>1042</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/molbev/msg115</pubid>
						<pubid idtype="pmpid" link="fulltext">12777519</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Analytical methods for detecting paralogy in molecular datasets</p>
				</title>
				<aug>
					<au>
						<snm>Cotton</snm>
						<fnm>JA</fnm>
					</au>
				</aug>
				<source>Meth Enzymol</source>
				<pubdate>2005</pubdate>
				<volume>395</volume>
				<fpage>700</fpage>
				<lpage>724</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0076-6879(05)95036-2</pubid>
						<pubid idtype="pmpid" link="fulltext">15865991</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Orthologs, paralogs, and evolutionary genomics</p>
				</title>
				<aug>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Annu Rev Genet</source>
				<pubdate>2005</pubdate>
				<volume>39</volume>
				<fpage>309</fpage>
				<lpage>338</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.genet.39.073003.114725</pubid>
						<pubid idtype="pmpid" link="fulltext">16285863</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Cross-referencing eukaryotic genomes: TIGR orthologous gene alignments (TOGA)</p>
				</title>
				<aug>
					<au>
						<snm>Lee</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Sultana</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Pertea</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Cho</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Karamycheva</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Tsai</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Parvizi</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Cheung</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Antonescu</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>White</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Holt</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Liang</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Quackenbush</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>493</fpage>
				<lpage>502</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">155294</pubid>
						<pubid idtype="pmpid" link="fulltext">11875039</pubid>
						<pubid idtype="doi">10.1101/gr.212002</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Comparative analyses of six solanaceous transcriptomes reveal a high degree of sequence conservation and species-specific transcripts</p>
				</title>
				<aug>
					<au>
						<snm>Rensink</snm>
						<fnm>WA</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Iobst</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ouyang</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Buell</snm>
						<fnm>CR</fnm>
					</au>
				</aug>
				<source>BMC Genomics</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1249569</pubid>
						<pubid idtype="pmpid" link="fulltext">16162286</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Comparative EST analyses in plant systems</p>
				</title>
				<aug>
					<au>
						<snm>Dong</snm>
						<fnm>QF</fnm>
					</au>
					<au>
						<snm>Kroiss</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Oakley</snm>
						<fnm>FD</fnm>
					</au>
					<au>
						<snm>Wang</snm>
						<fnm>BB</fnm>
					</au>
					<au>
						<snm>Brendel</snm>
						<fnm>V</fnm>
					</au>
				</aug>
				<source>Meth Enzymol</source>
				<pubdate>2005</pubdate>
				<volume>395</volume>
				<fpage>400</fpage>
				<lpage>418</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0076-6879(05)95022-2</pubid>
						<pubid idtype="pmpid">15984049</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Expressed sequence tags: clustering and applications</p>
				</title>
				<aug>
					<au>
						<snm>Kalyanaraman</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Aluru</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Handbook of computational molecular biology</source>
				<publisher>Boca Raton: Chapman and Hall/CRC</publisher>
				<editor>Aluru S</editor>
				<pubdate>2006</pubdate>
				<note>12&#8211;11 through 12&#8211;22</note>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Distinguishing homologous and analogous proteins</p>
				</title>
				<aug>
					<au>
						<snm>Fitch</snm>
						<fnm>WM</fnm>
					</au>
				</aug>
				<source>Syst Zool</source>
				<pubdate>1970</pubdate>
				<volume>19</volume>
				<fpage>99</fpage>
				<lpage>113</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.2307/2412448</pubid>
						<pubid idtype="pmpid">5449325</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>A simple algorithm to infer gene duplication and speciation events on a gene tree</p>
				</title>
				<aug>
					<au>
						<snm>Zmasek</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Eddy</snm>
						<fnm>SR</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2001</pubdate>
				<volume>17</volume>
				<fpage>821</fpage>
				<lpage>828</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/17.9.821</pubid>
						<pubid idtype="pmpid" link="fulltext">11590098</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Reconstruction of ancient molecular phylogeny</p>
				</title>
				<aug>
					<au>
						<snm>Guigo</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Muchnik</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Smith</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Mol Phylogenet Evol</source>
				<pubdate>1996</pubdate>
				<volume>6</volume>
				<fpage>189</fpage>
				<lpage>213</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/mpev.1996.0071</pubid>
						<pubid idtype="pmpid" link="fulltext">8899723</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>From gene to organismal phylogeny: Reconciled trees and the gene tree/species tree problem</p>
				</title>
				<aug>
					<au>
						<snm>Page</snm>
						<fnm>RDM</fnm>
					</au>
					<au>
						<snm>Charleston</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>Mol Phylogenet Evol</source>
				<pubdate>1997</pubdate>
				<volume>7</volume>
				<fpage>231</fpage>
				<lpage>240</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/mpev.1996.0390</pubid>
						<pubid idtype="pmpid" link="fulltext">9126565</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>Gene trees in species trees</p>
				</title>
				<aug>
					<au>
						<snm>Maddison</snm>
						<fnm>WP</fnm>
					</au>
				</aug>
				<source>Syst Biol</source>
				<pubdate>1997</pubdate>
				<volume>46</volume>
				<fpage>523</fpage>
				<lpage>536</lpage>
				<xrefbib>
					<pubid idtype="doi">10.2307/2413694</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>Inferring species trees from gene trees: A phylogenetic analysis of the Elapidae</p>
				</title>
				<aug>
					<au>
						<snm>Slowinski</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Knight</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Rooney</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Mol Phylogenet Evol</source>
				<pubdate>1997</pubdate>
				<volume>8</volume>
				<fpage>349</fpage>
				<lpage>362</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/mpev.1997.0434</pubid>
						<pubid idtype="pmpid" link="fulltext">9417893</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<title>
					<p>Tree reconciliation: reconstruction of species phylogeny by phylogenetic gene trees</p>
				</title>
				<aug>
					<au>
						<snm>V'Yugin</snm>
						<fnm>VV</fnm>
					</au>
					<au>
						<snm>Gelfand</snm>
						<fnm>MS</fnm>
					</au>
					<au>
						<snm>Lyubetsky</snm>
						<fnm>VA</fnm>
					</au>
				</aug>
				<source>Mol Biol</source>
				<pubdate>2002</pubdate>
				<volume>36</volume>
				<fpage>650</fpage>
				<lpage>658</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1023/A:1020667228952</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p>COMPONENT user's manual (version 2.0)</p>
				</title>
				<aug>
					<au>
						<snm>Page</snm>
						<fnm>RDM</fnm>
					</au>
				</aug>
				<publisher>London: Trustees of The Natural History Museum</publisher>
				<pubdate>1993</pubdate>
			</bibl>
			<bibl id="B39">
				<title>
					<p>GeneTree: comparing gene and species phylogenies using reconciled trees</p>
				</title>
				<aug>
					<au>
						<snm>Page</snm>
						<fnm>RDM</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>1998</pubdate>
				<volume>14</volume>
				<fpage>819</fpage>
				<lpage>820</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/14.9.819</pubid>
						<pubid idtype="pmpid" link="fulltext">9918954</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B40">
				<title>
					<p>NOTUNG: a program for dating gene duplications and optimizing gene family trees</p>
				</title>
				<aug>
					<au>
						<snm>Chen</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Durand</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Farach-Colton</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J Comput Biol</source>
				<pubdate>2000</pubdate>
				<volume>7</volume>
				<fpage>429</fpage>
				<lpage>447</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1089/106652700750050871</pubid>
						<pubid idtype="pmpid" link="fulltext">11108472</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B41">
				<title>
					<p>Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology</p>
				</title>
				<aug>
					<au>
						<snm>Soltis</snm>
						<fnm>PS</fnm>
					</au>
					<au>
						<snm>Soltis</snm>
						<fnm>DE</fnm>
					</au>
					<au>
						<snm>Chase</snm>
						<fnm>MW</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1999</pubdate>
				<volume>402</volume>
				<fpage>402</fpage>
				<lpage>404</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/46528</pubid>
						<pubid idtype="pmpid" link="fulltext">10586878</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B42">
				<title>
					<p>Angiosperm phylogeny based on <it>matK </it>sequence information</p>
				</title>
				<aug>
					<au>
						<snm>Hilu</snm>
						<fnm>KW</fnm>
					</au>
					<au>
						<snm>Borsch</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Muller</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Soltis</snm>
						<fnm>DE</fnm>
					</au>
					<au>
						<snm>Soltis</snm>
						<fnm>PS</fnm>
					</au>
					<au>
						<snm>Savolainen</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Chase</snm>
						<fnm>MW</fnm>
					</au>
					<au>
						<snm>Powell</snm>
						<fnm>MP</fnm>
					</au>
					<au>
						<snm>Alice</snm>
						<fnm>LA</fnm>
					</au>
					<au>
						<snm>Evans</snm>
						<fnm>R</fnm>
					</au>
					<etal/>
				</aug>
				<source>Am J Bot</source>
				<pubdate>2003</pubdate>
				<volume>90</volume>
				<fpage>1758</fpage>
				<lpage>1776</lpage>
			</bibl>
			<bibl id="B43">
				<title>
					<p>Phylogenetic analyses of basal angiosperms based on nine plastid, mitochondrial, and nuclear genes</p>
				</title>
				<aug>
					<au>
						<snm>Qiu</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Dombrovska</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Whitlock</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Bernasconi-Quadroni</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Rest</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Davis</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Borsch</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Hilu</snm>
						<fnm>K</fnm>
					</au>
					<etal/>
				</aug>
				<source>Int J Pl Sci</source>
				<pubdate>2005</pubdate>
				<volume>166</volume>
				<fpage>815</fpage>
				<lpage>842</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1086/431800</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B44">
				<title>
					<p>The root of angiosperm phylogeny inferred from duplicate phytochrome genes</p>
				</title>
				<aug>
					<au>
						<snm>Mathews</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Donoghue</snm>
						<fnm>MJ</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1999</pubdate>
				<volume>286</volume>
				<fpage>947</fpage>
				<lpage>950</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.286.5441.947</pubid>
						<pubid idtype="pmpid" link="fulltext">10542147</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B45">
				<title>
					<p>Phylogenetic relationships of the tribe Millettieae and allies &#8211; the current status</p>
				</title>
				<aug>
					<au>
						<snm>Hu</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Advances in legume systematics</source>
				<publisher>Kew, UK: Royal Botanic Gardens</publisher>
				<editor>Herendeen PS, Bruneau A</editor>
				<pubdate>2000</pubdate>
				<volume>9</volume>
				<fpage>299</fpage>
				<lpage>310</lpage>
			</bibl>
			<bibl id="B46">
				<title>
					<p><it>rbcL </it>and legume phylogeny, with particular reference to Phaseoleae, Millettieae, and allies</p>
				</title>
				<aug>
					<au>
						<snm>Kajita</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Ohashi</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Tateishi</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Bailey</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Doyle</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Syst Bot</source>
				<pubdate>2001</pubdate>
				<volume>26</volume>
				<fpage>515</fpage>
				<lpage>536</lpage>
			</bibl>
			<bibl id="B47">
				<title>
					<p>A phylogeny of legumes (Leguminosae) based on analyses of the plastid <it>matK </it>gene resolves many well-supported subclades within the family</p>
				</title>
				<aug>
					<au>
						<snm>Wojciechowski</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Lavin</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Sanderson</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Am J Bot</source>
				<pubdate>2004</pubdate>
				<volume>91</volume>
				<fpage>1846</fpage>
				<lpage>1862</lpage>
			</bibl>
			<bibl id="B48">
				<title>
					<p>Mining EST databases to resolve evolutionary events in major crop species</p>
				</title>
				<aug>
					<au>
						<snm>Schlueter</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Dixon</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Granger</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Grant</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Clark</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Doyle</snm>
						<fnm>JJ</fnm>
					</au>
					<au>
						<snm>Shoemaker</snm>
						<fnm>RC</fnm>
					</au>
				</aug>
				<source>Genome</source>
				<pubdate>2004</pubdate>
				<volume>47</volume>
				<fpage>868</fpage>
				<lpage>876</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1139/g04-047</pubid>
						<pubid idtype="pmpid" link="fulltext">15499401</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B49">
				<title>
					<p>Signal, noise, and reliability in molecular phylogenetic analyses</p>
				</title>
				<aug>
					<au>
						<snm>Hillis</snm>
						<fnm>DM</fnm>
					</au>
					<au>
						<snm>Huelsenbeck</snm>
						<fnm>JP</fnm>
					</au>
				</aug>
				<source>J Hered</source>
				<pubdate>1992</pubdate>
				<volume>83</volume>
				<fpage>189</fpage>
				<lpage>195</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">1624764</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B50">
				<title>
					<p>The challenge of constructing large phylogenetic trees</p>
				</title>
				<aug>
					<au>
						<snm>Sanderson</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Driskell</snm>
						<fnm>AC</fnm>
					</au>
				</aug>
				<source>Trends Plant Sci</source>
				<pubdate>2003</pubdate>
				<volume>8</volume>
				<fpage>374</fpage>
				<lpage>379</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S1360-1385(03)00165-1</pubid>
						<pubid idtype="pmpid" link="fulltext">12927970</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B51">
				<title>
					<p>The analysis of 100 genes supports the grouping of three highly divergent amoebae: <it>Dictyostelium</it>, <it>Entamoeba</it>, and <it>Mastigamoeba</it></p>
				</title>
				<aug>
					<au>
						<snm>Bapteste</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Brinkmann</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Moore</snm>
						<fnm>DV</fnm>
					</au>
					<au>
						<snm>Sensen</snm>
						<fnm>CW</fnm>
					</au>
					<au>
						<snm>Gordon</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Durufle</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Gaasterland</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Lopez</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Muller</snm>
						<fnm>M</fnm>
					</au>
					<etal/>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2002</pubdate>
				<volume>99</volume>
				<fpage>1414</fpage>
				<lpage>1419</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">122205</pubid>
						<pubid idtype="pmpid" link="fulltext">11830664</pubid>
						<pubid idtype="doi">10.1073/pnas.032662799</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B52">
				<title>
					<p>A sequence-based genetic map of <it>Medicago truncatula </it>and comparison of marker colinearity with <it>M. sativa</it></p>
				</title>
				<aug>
					<au>
						<snm>Choi</snm>
						<fnm>HK</fnm>
					</au>
					<au>
						<snm>Kim</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Uhm</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Limpens</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Lim</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Mun</snm>
						<fnm>JH</fnm>
					</au>
					<au>
						<snm>Kalo</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Penmetsa</snm>
						<fnm>RV</fnm>
					</au>
					<au>
						<snm>Seres</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Kulikova</snm>
						<fnm>O</fnm>
					</au>
					<etal/>
				</aug>
				<source>Genetics</source>
				<pubdate>2004</pubdate>
				<volume>166</volume>
				<fpage>1463</fpage>
				<lpage>1502</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1470769</pubid>
						<pubid idtype="pmpid" link="fulltext">15082563</pubid>
						<pubid idtype="doi">10.1534/genetics.166.3.1463</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B53">
				<title>
					<p>DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment</p>
				</title>
				<aug>
					<au>
						<snm>Subramanian</snm>
						<fnm>AR</fnm>
					</au>
					<au>
						<snm>Weyer-Menkhoff</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Kaufmann</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Morgenstern</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>66</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1087830</pubid>
						<pubid idtype="pmpid" link="fulltext">15784139</pubid>
						<pubid idtype="doi">10.1186/1471-2105-6-66</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B54">
				<title>
					<p>Bayesian gene/species tree reconciliation and orthology analysis using MCMC</p>
				</title>
				<aug>
					<au>
						<snm>Arvestad</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Berglund</snm>
						<fnm>A-C</fnm>
					</au>
					<au>
						<snm>Lagergren</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Sennblad</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>19</volume>
				<issue>suppl 1</issue>
				<fpage>i7</fpage>
				<lpage>i15</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/btg1000</pubid>
						<pubid idtype="pmpid" link="fulltext">12855432</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B55">
				<title>
					<p>Gene conversion and the evolution of euryarchaeal chaperonins: A maximum likelihood-based method for detecting conflicting phylogenetic signals</p>
				</title>
				<aug>
					<au>
						<snm>Archibald</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Roger</snm>
						<fnm>AJ</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>2002</pubdate>
				<volume>55</volume>
				<issue>2</issue>
				<fpage>232</fpage>
				<lpage>245</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1007/s00239-002-2321-5</pubid>
						<pubid idtype="pmpid" link="fulltext">12107599</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B56">
				<title>
					<p>A simple and robust statistical test for detecting the presence of recombination</p>
				</title>
				<aug>
					<au>
						<snm>Bruen</snm>
						<fnm>TC</fnm>
					</au>
					<au>
						<snm>Philippe</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Bryant</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>2006</pubdate>
				<volume>172</volume>
				<issue>4</issue>
				<fpage>2665</fpage>
				<lpage>2681</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1456386</pubid>
						<pubid idtype="pmpid" link="fulltext">16489234</pubid>
						<pubid idtype="doi">10.1534/genetics.105.048975</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B57">
				<title>
					<p>Optimal, efficient reconstruction of root-unknown phylogenetic networks with constrained and structured recombination</p>
				</title>
				<aug>
					<au>
						<snm>Gusfield</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>J Computer System Sci</source>
				<pubdate>2005</pubdate>
				<volume>70</volume>
				<issue>3</issue>
				<fpage>381</fpage>
				<lpage>398</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1016/j.jcss.2004.12.009</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B58">
				<title>
					<p>Phylogenetics of New World <it>Astragalus</it>: Screening of novel nuclear loci for the reconstruction of phylogenies at low taxonomic levels</p>
				</title>
				<aug>
					<au>
						<snm>Scherson</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Choi</snm>
						<fnm>HK</fnm>
					</au>
					<au>
						<snm>Cook</snm>
						<fnm>DR</fnm>
					</au>
					<au>
						<snm>Sanderson</snm>
						<fnm>MJ</fnm>
					</au>
				</aug>
				<source>Brittonia</source>
				<pubdate>2005</pubdate>
				<volume>57</volume>
				<fpage>354</fpage>
				<lpage>366</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1663/0007-196X(2005)057[0354:PONWAS]2.0.CO;2</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B59">
				<title>
					<p>Uninode coding vs gene tree parsimony for phylogenetic reconstruction using duplicate genes</p>
				</title>
				<aug>
					<au>
						<snm>Simmons</snm>
						<fnm>MP</fnm>
					</au>
					<au>
						<snm>Freudenstein</snm>
						<fnm>JV</fnm>
					</au>
				</aug>
				<source>Mol Phylogenet Evol</source>
				<pubdate>2002</pubdate>
				<volume>23</volume>
				<issue>3</issue>
				<fpage>481</fpage>
				<lpage>498</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S1055-7903(02)00033-7</pubid>
						<pubid idtype="pmpid" link="fulltext">12099800</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B60">
				<title>
					<p>The evolution of supertrees</p>
				</title>
				<aug>
					<au>
						<snm>Bininda-Emonds</snm>
						<fnm>ORP</fnm>
					</au>
				</aug>
				<source>Trends Ecol Evol</source>
				<pubdate>2004</pubdate>
				<volume>19</volume>
				<fpage>315</fpage>
				<lpage>322</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/j.tree.2004.03.015</pubid>
						<pubid idtype="pmpid" link="fulltext">16701277</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B61">
				<title>
					<p>Gene tree reconciliation with soft multifurcations</p>
				</title>
				<aug>
					<au>
						<snm>Chang</snm>
						<fnm>W-C</fnm>
					</au>
				</aug>
				<source>Master's Thesis</source>
				<publisher>Ames, IA: Iowa State University</publisher>
				<pubdate>2005</pubdate>
			</bibl>
			<bibl id="B62">
				<title>
					<p>Comparative genomics provides evidence for an ancient genome duplication event in fish</p>
				</title>
				<aug>
					<au>
						<snm>Taylor</snm>
						<fnm>JS</fnm>
					</au>
					<au>
						<snm>Van de Peer</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Braasch</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Meyer</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Philos Trans R Soc Lond B Biol Sci</source>
				<pubdate>2001</pubdate>
				<volume>356</volume>
				<fpage>1661</fpage>
				<lpage>1679</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1098/rstb.2001.0975</pubid>
						<pubid idtype="pmpid" link="fulltext">11604130</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B63">
				<title>
					<p>Vertebrate phylogenomics: reconciled trees and gene duplications</p>
				</title>
				<aug>
					<au>
						<snm>Page</snm>
						<fnm>RDM</fnm>
					</au>
					<au>
						<snm>Cotton</snm>
						<fnm>JA</fnm>
					</au>
				</aug>
				<source>Pacific Symposium on Biocomputing</source>
				<pubdate>2002</pubdate>
				<volume>2002</volume>
				<fpage>536</fpage>
				<lpage>547</lpage>
			</bibl>
			<bibl id="B64">
				<title>
					<p>Rates and patterns of gene duplication and loss in the human genome</p>
				</title>
				<aug>
					<au>
						<snm>Cotton</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Page</snm>
						<fnm>RDM</fnm>
					</au>
				</aug>
				<source>Proc Biol Sci</source>
				<pubdate>2005</pubdate>
				<volume>272</volume>
				<issue>1560</issue>
				<fpage>277</fpage>
				<lpage>283</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1098/rspb.2004.2969</pubid>
						<pubid idtype="pmpid" link="fulltext">15705552</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B65">
				<title>
					<p>Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution</p>
				</title>
				<aug>
					<au>
						<snm>Gu</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Wang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Gu</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>2002</pubdate>
				<volume>31</volume>
				<fpage>205</fpage>
				<lpage>209</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/ng902</pubid>
						<pubid idtype="pmpid" link="fulltext">12032571</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B66">
				<title>
					<p>Placing paleopolyploidy in relation to taxon divergence: a phylogenetic analysis in legumes using 39 gene families</p>
				</title>
				<aug>
					<au>
						<snm>Pfeil</snm>
						<fnm>BE</fnm>
					</au>
					<au>
						<snm>Schlueter</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Shoemaker</snm>
						<fnm>RC</fnm>
					</au>
					<au>
						<snm>Doyle</snm>
						<fnm>JJ</fnm>
					</au>
				</aug>
				<source>Syst Biol</source>
				<pubdate>2005</pubdate>
				<volume>54</volume>
				<fpage>441</fpage>
				<lpage>454</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1080/10635150590945359</pubid>
						<pubid idtype="pmpid" link="fulltext">16012110</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B67">
				<title>
					<p>The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species</p>
				</title>
				<aug>
					<au>
						<snm>Quackenbush</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Cho</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Liang</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Holt</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Karamycheva</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Parvizi</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Pertea</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Sultana</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>White</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2001</pubdate>
				<volume>29</volume>
				<fpage>159</fpage>
				<lpage>164</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">29813</pubid>
						<pubid idtype="pmpid" link="fulltext">11125077</pubid>
						<pubid idtype="doi">10.1093/nar/29.1.159</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B68">
				<title>
					<p>TIGR Gene Indices Database</p>
				</title>
				<url>http://www.tigr.org/tdb/tgi/index.shtml</url>
			</bibl>
			<bibl id="B69">
				<title>
					<p>EMBOSS: the European molecular biology open software suite</p>
				</title>
				<aug>
					<au>
						<snm>Rice</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Longden</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Bleasby</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2000</pubdate>
				<volume>16</volume>
				<fpage>276</fpage>
				<lpage>277</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(00)02024-2</pubid>
						<pubid idtype="pmpid" link="fulltext">10827456</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B70">
				<title>
					<p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
				</title>
				<aug>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Madden</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Schaffer</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1997</pubdate>
				<volume>25</volume>
				<fpage>3389</fpage>
				<lpage>3402</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">146917</pubid>
						<pubid idtype="pmpid" link="fulltext">9254694</pubid>
						<pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B71">
				<title>
					<p>Sanderson Lab Web Site</p>
				</title>
				<url>http://ginger.ucdavis.edu</url>
			</bibl>
			<bibl id="B72">
				<title>
					<p>Quality assessment of multiple alignment programs</p>
				</title>
				<aug>
					<au>
						<snm>Lassmann</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Sonnhammer</snm>
						<fnm>ELL</fnm>
					</au>
				</aug>
				<source>FEBS Letters</source>
				<pubdate>2002</pubdate>
				<volume>529</volume>
				<fpage>126</fpage>
				<lpage>130</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0014-5793(02)03189-7</pubid>
						<pubid idtype="pmpid" link="fulltext">12354624</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B73">
				<title>
					<p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</p>
				</title>
				<aug>
					<au>
						<snm>Thompson</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Higgins</snm>
						<fnm>DG</fnm>
					</au>
					<au>
						<snm>Gibson</snm>
						<fnm>TJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1994</pubdate>
				<volume>22</volume>
				<fpage>4673</fpage>
				<lpage>4680</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">308517</pubid>
						<pubid idtype="pmpid" link="fulltext">7984417</pubid>
						<pubid idtype="doi">10.1093/nar/22.22.4673</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B74">
				<title>
					<p>PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods)</p>
				</title>
				<aug>
					<au>
						<snm>Swofford</snm>
						<fnm>DL</fnm>
					</au>
				</aug>
				<publisher>Sunderland, MA: Sinauer</publisher>
				<edition>4.0</edition>
				<pubdate>2002</pubdate>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12504223</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B75">
				<title>
					<p>Inferring phylogeny despite incomplete lineage sorting</p>
				</title>
				<aug>
					<au>
						<snm>Maddison</snm>
						<fnm>WP</fnm>
					</au>
					<au>
						<snm>Knowles</snm>
						<fnm>LL</fnm>
					</au>
				</aug>
				<source>Syst Biol</source>
				<pubdate>2006</pubdate>
				<volume>55</volume>
				<fpage>21</fpage>
				<lpage>30</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1080/10635150500354928</pubid>
						<pubid idtype="pmpid" link="fulltext">16507521</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B76">
				<title>
					<p>Confidence limits on phylogenies: An approach using the bootstrap</p>
				</title>
				<aug>
					<au>
						<snm>Felsenstein</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Evolution</source>
				<pubdate>1985</pubdate>
				<volume>39</volume>
				<fpage>783</fpage>
				<lpage>791</lpage>
				<xrefbib>
					<pubid idtype="doi">10.2307/2408678</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B77">
				<title>
					<p>Inferring Phylogenies</p>
				</title>
				<aug>
					<au>
						<snm>Felsenstein</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<publisher>Sunderland, MA: Sinauer Associates</publisher>
				<pubdate>2004</pubdate>
			</bibl>
			<bibl id="B78">
				<title>
					<p>An approximately unbiased test of phylogenetic tree selection</p>
				</title>
				<aug>
					<au>
						<snm>Shimodaira</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Syst Biol</source>
				<pubdate>2002</pubdate>
				<volume>51</volume>
				<fpage>492</fpage>
				<lpage>508</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1080/10635150290069913</pubid>
						<pubid idtype="pmpid" link="fulltext">12079646</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B79">
				<title>
					<p>PrIMETV: a PrIME Tree Viewer</p>
				</title>
				<aug>
					<au>
						<snm>Arvestad</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<url>http://prime.sbc.su.se/primetv/</url>
			</bibl>
		</refgrp>
	</bm>
</art>
