<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2148-8-200</ui>
	<ji>1471-2148</ji>
	<fm>
		<dochead>Research article</dochead>
		<bibl>
			<title>
				<p>Protein evolution of ANTP and PRD homeobox genes</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Fonseca</snm>
					<mi>A</mi>
					<fnm>Nuno</fnm>
					<insr iid="I1"/>
					<email>nf@ibmc.up.pt</email>
				</au>
				<au id="A2">
					<snm>Vieira</snm>
					<mi>P</mi>
					<fnm>Cristina</fnm>
					<insr iid="I1"/>
					<email>cgvieira@ibmc.up.pt</email>
				</au>
				<au id="A3">
					<snm>Holland</snm>
					<mi>WH</mi>
					<fnm>Peter</fnm>
					<insr iid="I2"/>
					<email>peter.holland@zoo.ox.ac.uk</email>
				</au>
				<au id="A4" ca="yes">
					<snm>Vieira</snm>
					<fnm>Jorge</fnm>
					<insr iid="I1"/>
					<email>jbvieira@ibmc.up.pt</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Instituto de Biologia Molecular e Celular (IBMC); University of Porto, Rua do Campo Alegre 823, 4150-180 Porto, Portugal</p>
				</ins>
				<ins id="I2">
					<p>Department of Zoology, University of Oxford, South Parks Road, Oxford, OX1 3PS, UK</p>
				</ins>
			</insg>
			<source>BMC Evolutionary Biology</source>
			<issn>1471-2148</issn>
			<pubdate>2008</pubdate>
			<volume>8</volume>
			<issue>1</issue>
			<fpage>200</fpage>
			<url>http://www.biomedcentral.com/1471-2148/8/200</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">18620554</pubid><pubid idtype="doi">10.1186/1471-2148-8-200</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>14</day>
					<month>1</month>
					<year>2008</year>
				</date>
			</rec>
			<acc>
				<date>
					<day>11</day>
					<month>7</month>
					<year>2008</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>11</day>
					<month>7</month>
					<year>2008</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2008</year>
			<collab>Fonseca et al; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Although homeobox genes have been the subject of many studies, little is known about the main amino acid changes that occurred early in the evolution of genes belonging to different classes.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>In this study, we report a method for the fast and efficient retrieval of sequences belonging to the ANTP (HOXL and NKL) and PRD classes. Furthermore, we look for diagnostic amino acid residues that can be used to distinguish HOXL, NKL and PRD genes.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>The reported protein features will facilitate the robust classification of homeobox genes from newly sequenced bilaterian genomes. Nevertheless, in non-bilaterian genomes our findings must be cautiously applied. In principle, as long as a good manually curated data set is available, the approach described here can be applied to non-bilaterian organisms as well. Our results help focus experimental studies on investigating the biochemical functions of key homeodomain residues in different gene classes.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Genes that belong to the homeobox family are characterized by the ability to code for a protein that contains a recognizable, although very variable, 'homeodomain', usually 60 amino acids in length <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Many of these genes are transcription factors that play important roles in the embryonic development of bilaterian and non-bilaterian animals. Changes in homeobox gene content and deployment during evolution may have contributed to the evolution of body plan differences in animals <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. Therefore, comparison of homeobox gene sets from different animals may shed light on the evolutionary events that gave rise to animal body plan diversity. Nevertheless, gene orthology is not always easy to establish when comparing divergent animals <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. For this purpose, phylogenetic analysis, conservation of synteny, paralogy within the human genome, insertions within the homeodomain, key amino acid residues, and several motifs outside of the homeodomain can all be used. These features can also be used to classify homeobox genes into classes, subclasses and families. In the latest revision, Holland et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> classified all 235 human homeobox genes into 11 classes (ANTP, PRD, LIM, POU, HNF, SINE, TALE, CUT, PROS, ZF and CERS) and 102 gene families. The ANTP class is further divided into HOXL and NKL subclasses. It should be noted that the only protein region that can be aligned in all 102 gene families or 235 genes is the homeodomain <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
			<p>Here, we report amino acid patterns typical of bilaterian HOXL, NKL, and PRD genes that can be used to quickly and efficiently retrieve amino acid sequences belonging to these classes and subclasses, among hundreds of other homeodomain sequences. Retrieving a given class or subclass of sequences from many animal genomes may thus be an easier task than previously thought. However, we show that these typical amino acid patterns should be cautiously used when sequences come from non-bilaterian animals.</p>
			<p>Since phylogenetic analysis was one of the primary sources of evidence used to establish the different classes and sub-classes, it is likely that most homeobox gene classes and subclasses represent monophyletic lineages. Hence, it is conceivable that amino acid changes important for protein function in a given lineage may be revealed as fixed differences between classes and subclasses. Previous attempts have been made to classify genes within HOX families <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp> but not at the level of whole classes or subclasses. Here, we show that, because of their chemical properties, amino acid usage is different in ANTP and PRD classes at five positions. Furthermore, at nine positions, amino acid usage is different between HOXL and NKL subclasses. Our findings support the notion that many chemically important changes happened early in the evolution of homeobox genes, and that these changes can be used as additional evidence to establish gene orthology. These results can also be helpful for experimental studies aimed at investigating the biochemical functions of key homeodomain residues in different gene classes.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>Data</p>
				</st>
				<p>We have used the hand-curated data set of Holland et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, the PROSITE homeodomain data set <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> and the NCBI database <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Identification of amino acid patterns typical of HOXL, NKL and PRD genes</p>
				</st>
				<p>In order to find amino acid patterns that distinguish genes from the ANTP subclasses, as well as genes from the PRD class, from genes belonging to other classes, a fast word discovery program (<abbrgrp><abbr bid="B12">12</abbr></abbrgrp>; available at <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>) was first used to find statistically interesting words. The minimum word length used was two (-m 2) and the minimum number of sequences where the word should occur (-e) was set to 10% of the number of sequences in the data set being considered. The words found were then filtered to discard those that occurred more than three times in more than one data set. The software Bioredx (<abbrgrp><abbr bid="B14">14</abbr></abbrgrp>; available at <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>) was then used to find the largest and most general amino acid patterns that distinguish two sets of sequences. This program accepts as input two sequence files and a word seed. Based on the seed given, it tries to find patterns that occur in one file (referred to as the positive file) and not in the other (referred to as the negative file). This software was used to find patterns that occurred fewer than four times in the negative file, and it considered patterns up to 20 amino acid residues longer than the initial word seed.</p>
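				<p>The word-filtering step can be illustrated with a minimal sketch. This is not the word discovery program or Bioredx; the function names, the toy sequences, and the reading of "occurred more than three times" as the total number of (possibly overlapping) occurrences within a data set are all assumptions made for illustration.</p>

```python
def count_occurrences(word, sequences):
    """Total number of (possibly overlapping) occurrences of word in sequences."""
    total = 0
    for seq in sequences:
        start = seq.find(word)
        while start != -1:
            total += 1
            start = seq.find(word, start + 1)
    return total


def keep_specific_words(words, datasets, max_hits=3):
    """Discard words seen more than max_hits times in more than one data set.

    Assumption: a word is "frequent" in a data set when its total count
    there exceeds max_hits; words frequent in at most one data set are kept.
    """
    kept = []
    for word in words:
        frequent_in = [name for name, seqs in datasets.items()
                       if count_occurrences(word, seqs) > max_hits]
        if len(frequent_in) in (0, 1):
            kept.append(word)
    return kept


# Toy data sets (hypothetical fragments, not real homeodomain sequences).
toy = {"positive": ["WFQN"] * 5, "negative": ["WFKN"] * 5}
print(keep_specific_words(["WF", "QN"], toy))
```

				<p>In this toy example the word WF is frequent in both data sets and is discarded, whereas QN is frequent in only one and is kept as a candidate seed.</p>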
				<p>In the reported amino acid patterns, amino acids listed within brackets are those allowed at a given position. For compactness of representation it is also possible to negate the class. The negation is denoted by "^". In this case, the amino acid residues listed are the ones that are not allowed at that particular position. The approach used here to find amino acid patterns does not use a set of aligned sequences. Therefore, in order to make sure that the derived amino acid patterns are found in the same region of the homeodomain sequence, all sequences belonging to a given class and sub-classes were aligned using the PAM 250 scoring matrix <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>.</p>
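				<p>Because the bracket and "^" notation coincides with regular-expression character classes, a reported pattern can be searched for directly once the spaces separating positions are removed. The sketch below assumes this reading of the notation; the helper name and the two 13-residue fragments are hypothetical and serve only as an illustration.</p>

```python
import re


def pattern_to_regex(pattern):
    """Turn a pattern written in the article's notation into a compiled regex."""
    # Spaces only separate positions for readability, so they are dropped.
    # Bracketed classes, including negated ones such as [^DG], already
    # follow regular-expression syntax and need no further translation.
    return re.compile(pattern.replace(" ", ""))


# HOXL 1 pattern as quoted in the Results (positions 46-58).
HOXL1 = pattern_to_regex("[KT] [IV]WFQNRR [AMV]K [DEHKLMQWY] [KR] [KR]")

# Hypothetical 13-residue fragments, made up for illustration only.
matching_fragment = "KIWFQNRRMKWKK"
other_fragment = "RVWFKNRRAKWRK"

print(bool(HOXL1.search(matching_fragment)))
print(bool(HOXL1.search(other_fragment)))
```

				<p>Using search rather than a full-string match mirrors the fact that the patterns were derived from unaligned sequences; whether a hit falls in the expected region of the homeodomain still has to be checked against an alignment.</p>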
			</sec>
			<sec>
				<st>
					<p>Identification of key amino acid residues</p>
				</st>
				<p>There are many schemes for comparing and grouping amino acids. Nevertheless, none of them can possibly capture the vast number of contexts in which amino acids are found within proteins. The scheme used here is that of Livingstone and Barton <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, based on the amino acid properties size, polarity, hydrophobicity, charge, aliphaticity and aromaticity. For each position and property we calculated the probability of gaining or losing a constraint using the distribution of the amino acid properties observed at that position in all sequences considered. Only changes in property values with a probability of occurrence lower than 5% were considered.</p>
			</sec>
			<sec>
				<st>
					<p>Phylogenetic analyses</p>
				</st>
				<p>A neighbour-joining tree, using pair-wise deletion, as implemented in the MEGA software <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, was constructed in order to classify the set of 918 homeodomain sequences from a variety of animal species (available at PROSITE <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>) into the 11 classes and two sub-classes scheme proposed by Holland et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<p>In order to establish a method for fast retrieval of sequences belonging to a given class or subclass, amino acid patterns characteristic of different homeobox classes and subclasses were sought. As a starting point we used the hand-curated human PRD, NKL and HOXL data sets compiled by Holland et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The homeodomains of human EN1, EN2, DLX1, DLX2, DLX3, DLX4, DLX5, DLX6, NOTO, and HOPX are difficult to classify and were not used in these initial analyses. Therefore, the PRD, NKL and HOXL data sets contain 49, 39 and 52 sequences, respectively. Within each of the three data sets, the amino acid sequences show considerable variation, suggesting that the human sequences include the majority of amino acid variability allowed at a given position along the sequence. This assumption was later tested (see section 2).</p>
			<sec>
				<st>
					<p>1) Amino acid patterns characteristic of the different human gene classes and sub-classes</p>
				</st>
				<p>Many characteristic amino acid patterns were detected using the approach described in Methods. Here, we describe the seven patterns that are most pertinent, taking into account their coverage across a group of genes and absence in the other groups.</p>
				<p>The pattern [KT] [IV]WFQNRR [AMV]K [DEHKLMQWY] [KR] [KR] (positions 46&#8211;58; Fig. <figr fid="F1">1</figr>), named HOXL 1 pattern, is present in all human HOXL homeodomains, and in none of the human NKL and PRD genes.</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Majority rule consensus sequence of HOXL, NKL and PRD genes</p>
					</caption>
					<text>
						<p><b>Majority rule consensus sequence of HOXL, NKL and PRD genes</b>. The relative location of the described amino acid patterns is shown (see text for details). Red &#8211; HOXL1 and HOXL2 patterns. Green &#8211; NKL pattern; Blue &#8211; PRD pattern; Boxed &#8211; ANTP1 and ANTP2 patterns; Grey shadow &#8211; ANTP-PRD pattern.</p>
					</text>
					<graphic file="1471-2148-8-200-1"/>
				</fig>
				<p>The pattern LE [AGKNR]E (positions 16&#8211;19; Fig. <figr fid="F1">1</figr>), named HOXL 2 pattern, is present in all human HOXL homeodomains except <it>Mnx1</it>. Only two human PRD genes (<it>Pax4 </it>and <it>Pax6</it>) encode this amino acid pattern. None of the human NKL genes encode this amino acid pattern.</p>
				<p>The pattern [AKST] [DENPS] [LAST] [Q] [V] (positions 41&#8211;45; Fig. <figr fid="F1">1</figr>), named NKL pattern, is present in 36 out of 39 human NKL sequences, in one human HOXL sequence (Mnx1), and in none of the human PRD sequences. The human NKL sequences that do not present this pattern are: HHEX, NANOG, and VENTX.</p>
				<p>The pattern L [EINQRV] [^DGHMPTVWY] [^CDGKMNPQR] [FL] [^CFILPTWY] [AEFHKQRV] [ADEGKNSTW] [CHKMPQR] [FHY]P (positions 16&#8211;26; Fig. <figr fid="F1">1</figr>), named PRD pattern, is present in all human PRD homeodomains and never present in human HOXL and NKL sequences.</p>
				<p>The patterns [HQ] [IV] [AKLT] (positions 44&#8211;46; Fig. <figr fid="F1">1</figr>), named ANTP 1 pattern, and [IKTV] [ITV]W [FY]QN [HQR]R [AMNTVY]K, named ANTP 2 pattern, (positions 46&#8211;55; Fig. <figr fid="F1">1</figr>) are present in all human HOXL and NKL genes and never present in human PRD homeodomains.</p>
				<p>The pattern W [FY] [KQESR] [NK] [HQRKY] [RW] (positions 48&#8211;53; Fig. <figr fid="F1">1</figr>), named ANTP-PRD pattern, is found in all human PRD, NKL and HOXL homeodomains.</p>
			</sec>
			<sec>
				<st>
					<p>2) Generality of the amino acid patterns found</p>
				</st>
				<p>In order to test the generality of the amino acid patterns derived in the previous section, we used the 356 homeodomain sequences classified by Holland et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, which include the 140 human sequences used above. These sequences include genes classified as HOXL, NKL and PRD, plus other homeobox gene classes (LIM, POU, HNF, SINE, TALE, CUT, PROS, ZF, CERS). It should be noted that, in these analyses, the difficult-to-classify genes that were excluded above were included, and these were classified as tentatively suggested by Holland et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Using amino acid sequences, a neighbour-joining tree (using pair-wise deletion, as implemented in the MEGA software; <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>) was built with the 356 sequences, plus 918 homeodomain sequences from a variety of animal species. The latter 918 sequences were obtained from the file PS50071 available at PROSITE <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> after removing all non-animal sequences from the file. Sequences from clusters that are supported by a bootstrap value of 80% or higher, and that include at least one sequence that has been classified by Holland et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> as belonging to a given class or subclass, were classified as belonging to that class or subclass. Using this phylogenetic argument, 202 sequences could be classified as HOXL, 204 as NKL, 200 as PRD, 66 as LIM, 83 as POU, 15 as HNF, 23 as SINE, 66 as TALE, 20 as CUT, 3 as PROS, 94 as ZF, and 10 as CERS. Nevertheless, 288 sequences remained unclassified.</p>
				<p>Table <tblr tid="T1">1</tblr> shows that HOXL 1 pattern is only found in genes classified as HOXL and thus is highly specific. It is also highly representative, because 96% of the HOXL sequences (194/202) used could be classified as such using this pattern. Therefore, the requirement to use one of the amino acid combinations implied by this pattern is a derived feature that appeared in the HOXL lineage early in animal evolution. The HOXL sequences that do not show the expected pattern are: <it>Drosophila melanogaster </it>AbdB, <it>Drosophila melanogaster </it>btn, <it>Strigamia maritima </it>Hox3b, <it>Gallus gallus </it>HMD2 (PROSITE annotation), <it>Danio rerio </it>HXABA (PROSITE annotation), and <it>Salmo salar </it>HXB2 (PROSITE annotation). These sequences do not form a closely related subgroup of sequences. For instance, the <it>Drosophila melanogaster Abd-B </it>and <it>btn </it>genes belong to two different HOXL gene families. Moreover, other genes belonging to these families do show the HOXL 1 amino acid pattern. The <it>Fugu rubripes </it>HXDBB (PROSITE annotation) and the <it>Gallus gallus </it>HXB8 (PROSITE annotation) sequences are incomplete, and thus it is not possible to determine whether they show this amino acid pattern. In addition to the 194 known HOXL genes showing the HOXL 1 pattern, 178/288 'unclassified' sequences also had this pattern (Table <tblr tid="T1">1</tblr>). Since the HOXL 1 pattern is very specific to the HOXL subclass, it is very likely that the 178 unclassified sequences that show this pattern are HOXL sequences.</p>
				<tbl id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Number of sequences from each homeobox gene class or subclass showing a given amino acid pattern.</p>
					</caption>
					<tblbdy cols="14">
						<r>
							<c ca="left">
								<p>Pattern</p>
							</c>
							<c ca="left">
								<p>HoxL (202)</p>
							</c>
							<c ca="left">
								<p>NKL (204)</p>
							</c>
							<c ca="left">
								<p>PRD (200)</p>
							</c>
							<c ca="left">
								<p>LIM (66)</p>
							</c>
							<c ca="left">
								<p>POU (83)</p>
							</c>
							<c ca="left">
								<p>HNF (15)</p>
							</c>
							<c ca="left">
								<p>SINE (23)</p>
							</c>
							<c ca="left">
								<p>TALE (66)</p>
							</c>
							<c ca="left">
								<p>CUT (20)</p>
							</c>
							<c ca="left">
								<p>PROS (3)</p>
							</c>
							<c ca="left">
								<p>ZF (94)</p>
							</c>
							<c ca="left">
								<p>CERS (10)</p>
							</c>
							<c ca="left">
								<p>Uncl. (288)</p>
							</c>
						</r>
						<r>
							<c cspan="14">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>HoxL 1</p>
							</c>
							<c ca="left">
								<p>194</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>178</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>HoxL 2</p>
							</c>
							<c ca="left">
								<p>199</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>16</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>38</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>190</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>NKL</p>
							</c>
							<c ca="left">
								<p>3</p>
							</c>
							<c ca="left">
								<p>116</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>7</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>52</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>PRD</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>172</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>14</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ANTP 1</p>
							</c>
							<c ca="left">
								<p>197</p>
							</c>
							<c ca="left">
								<p>199</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>7</p>
							</c>
							<c ca="left">
								<p>12</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>11</p>
							</c>
							<c ca="left">
								<p>9</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>25</p>
							</c>
							<c ca="left">
								<p>2</p>
							</c>
							<c ca="left">
								<p>251</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ANTP 2</p>
							</c>
							<c ca="left">
								<p>197</p>
							</c>
							<c ca="left">
								<p>139</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>242</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ANTP-PRD</p>
							</c>
							<c ca="left">
								<p>200</p>
							</c>
							<c ca="left">
								<p>200</p>
							</c>
							<c ca="left">
								<p>190</p>
							</c>
							<c ca="left">
								<p>52</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>23</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>275</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Uncl. &#8211; unclassified sequences. The total number of sequences in each class is shown in parentheses.</p>
					</tblfn>
				</tbl>
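				<p>The percentages quoted in the text follow directly from the counts in Table <tblr tid="T1">1</tblr>. As a minimal sketch (the dictionary layout and function name are illustrative assumptions, and only a subset of the table cells is included), the share of sequences in a group that show a pattern can be recomputed as follows.</p>

```python
# Counts copied from Table 1: (sequences with the pattern, sequences in the group).
TABLE1 = {
    ("HOXL 1", "HOXL"): (194, 202),
    ("HOXL 2", "POU"): (38, 83),
    ("NKL", "NKL"): (116, 204),
    ("PRD", "PRD"): (172, 200),
    ("ANTP 2", "NKL"): (139, 204),
    ("ANTP-PRD", "LIM"): (52, 66),
}


def percent_with_pattern(hits, total):
    """Share of a group's sequences that show a pattern, in percent."""
    return 100.0 * hits / total


for (pattern, group), (hits, total) in TABLE1.items():
    share = percent_with_pattern(hits, total)
    print(f"{pattern} in {group}: {hits}/{total} = {share:.0f}%")
```

				<p>A pattern is called representative when this share is high within its own group, and specific when the corresponding counts in all other groups are at or near zero.</p>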
				<p>The HOXL 2 pattern is not as specific as the HOXL 1 pattern, since it is observed in about 46% of the genes belonging to the POU class and in a few NKL (0.5%) and PRD (8%) genes. All 16 PRD genes showing the HOXL 2 pattern belong to the Pax4/6 gene family; thus, it is likely that this is a case of convergent evolution. The human POU homeodomains showing the HOXL 2 pattern are POU1F1, POU3F1, POU3F2, POU3F3, and POU3F4. The latter four genes are closely related, although the POU1F1 gene is distantly related to those genes <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Nevertheless, this is also likely a case of convergent evolution, since the alternative hypothesis (that the need to use one of the amino acid combinations implied by the HOXL 2 pattern is an ancestral feature) implies many independent losses. Therefore we argue that the necessity of using the amino acid combinations implied by the HOXL 2 pattern is a derived feature that appeared early in the HOXL lineage. Only three sequences (1.5%) classified as HOXL using a phylogenetic argument do not show the HOXL 2 pattern, namely: <it>Drosophila melanogaster </it>exex (the <it>Drosophila Mnx1 </it>orthologue; <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>), human MNX1 and mouse Mnx1 (PROSITE annotation). <it>Mnx1 </it>genes are peculiar in other ways. For instance, they show both the NKL and HOXL 1 patterns (see below).</p>
				<p>The NKL pattern is highly specific but not widely representative of all NKL genes. The pattern is found in 57% of the sequences classified as belonging to the NKL class (116/204; Table <tblr tid="T1">1</tblr>), but in only ten other classified genes (three HOXL and seven SINE). In addition, 52 of the 288 phylogenetically 'unclassified' genes show the NKL pattern, and of these 36 have already been classified as belonging to the NKL class by PROSITE (data not shown). The three HOXL genes with the NKL pattern are the three <it>Mnx1 </it>genes (from <it>Drosophila melanogaster</it>, <it>Mus musculus</it>, and <it>Homo sapiens</it>). <it>Mnx1 </it>genes also show the HOXL 1 pattern that is highly specific for HOXL genes. Thus, it is unlikely that <it>Mnx1 </it>genes have been misclassified. Therefore, the presence of the NKL pattern in <it>Mnx1 </it>sequences may be the result of convergent evolution.</p>
				<p>The seven SINE sequences where the NKL pattern is found are SIX3 from <it>Oryzias latipes</it>, <it>Gallus gallus</it>, <it>Mus musculus </it>and <it>Homo sapiens </it>and SIX6 from <it>Gallus gallus</it>, <it>Mus musculus </it>and <it>Homo sapiens</it>. All <it>SIX </it>human genes cluster together with a high bootstrap value (97%; <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>). Nevertheless, the NKL pattern is only observed in the <it>SIX3 </it>and <it>SIX6 </it>genes, two closely related genes. It is thus likely a case of convergent evolution. The requirement to use one of the amino acid combinations implied by the NKL pattern is a derived feature that likely appeared early in the NKL lineage.</p>
				<p>Most of the classified sequences where the NKL pattern is not found have been annotated by Holland et al. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and PROSITE as belonging to the NANOG, NOTO, VENTX, EN, DLX, and BARX families. The EN, DLX and NOTO genes are difficult to classify. Thus, information on these genes was not used to derive the NKL amino acid pattern.</p>
				<p>The PRD pattern is highly specific, being found only in sequences classified as PRD, and is also highly representative, being found in 86% of the sequences classified as PRD. The PRD genes that do not show this pattern are <it>D. melanogaster OdsH</it>, <it>otp</it>, <it>PHDP</it>, <it>Ptx1</it>, <it>IP09201</it>, HOPX genes from <it>Danio rerio</it>, <it>Homo sapiens</it>, <it>Bos taurus</it>, <it>Rattus norvegicus</it>, <it>Mus musculus</it>, <it>Sus scrofa</it>, and <it>Gallus gallus </it>(PROSITE annotation), human <it>PAX2</it>, <it>PAX5 </it>and <it>PAX8 </it>(which have partial homeobox sequences), <it>Hydra vulgaris Dmbx</it>, OTP genes from <it>Heliocidaris erythrogramma</it>, <it>Heliocidaris tuberculata</it>, <it>Lytechinus variegatus</it>, <it>Paracentrotus lividus</it>, and <it>Saccoglossus kowalevskii </it>(PROSITE annotation), <it>OTX </it>from <it>Strongylocentrotus purpuratus </it>(PROSITE annotation), <it>ALX </it>from <it>Strongylocentrotus purpuratus </it>(PROSITE annotation), <it>ANF </it>genes (PROSITE annotation) from <it>Gallus gallus</it>, and <it>Xenopus laevis </it>(two genes), and the <it>PROP </it>gene from <it>Canis familiaris </it>(PROSITE annotation). The PRD sequences that do not show this pattern are not related in any particular way. Therefore, the absence of this pattern in these sequences is likely the result of several independent losses. It thus seems likely that the requirement to use one of the amino acid combinations implied by the PRD pattern is a derived feature that appeared early in the evolution of PRD genes.</p>
				<p>Since the PRD pattern is highly specific, it is likely that the 14 unclassified sequences that show this pattern are also PRD sequences. According to the PROSITE classification, these genes are <it>Gsc </it>from <it>Danio rerio</it>, <it>Xenopus laevis </it>(two genes), <it>Gallus gallus</it>, <it>Mus musculus</it>, <it>Saguinus labiatus</it>, <it>Gorilla gorilla</it>, <it>Pongo pygmaeus</it>, <it>Pan paniscus</it>, and <it>Pan troglodytes</it>, <it>ALX4 </it>from <it>Mus musculus </it>and <it>Bos taurus</it>, and <it>UNC-4 </it>and <it>ceh-36 </it>genes from <it>Caenorhabditis elegans</it>. According to PROSITE, 13 of these genes belong to the PRD class; the status of the <it>ceh-36 </it>gene is unknown.</p>
				<p>The ANTP 1 pattern is found in most HOXL (98%) and NKL (98%) gene sequences and is almost absent in PRD (0.5%) gene sequences. Nevertheless, such a pattern is also found in sequences from genes belonging to all other classes except the SINE and PROS classes (Table <tblr tid="T1">1</tblr>). It should, however, be noted that the sample size of PROS genes is very small (only three sequences). The ANTP sequences that do not show this pattern are <it>Drosophila melanogaster eve</it>, <it>lbe</it>, and <it>lbl</it>, <it>Homo sapiens NOTO</it>, <it>Heterodontus francisci Evx2 </it>(PROSITE annotation), <it>Gallus gallus Hme1 </it>(PROSITE annotation), <it>Fugu rubripes hxdbb</it>, and <it>Caenorhabditis elegans vab7 </it>and <it>hm31 </it>(PROSITE annotation). The <it>HXB8 </it>gene from <it>Gallus gallus </it>(PROSITE annotation) is incomplete, and thus it is not possible to determine whether it shows this pattern. The HOPX sequence (PROSITE annotation) from <it>Danio rerio </it>is the only PRD sequence that shows the ANTP 1 pattern.</p>
				<p>Although the ANTP 1 pattern is short (only three amino acid positions long), the broad distribution indicates that, very likely, it is not the result of convergent evolution. It is more likely that the ability to use the amino acid combinations implied by this pattern is an ancestral feature of homeobox containing genes. For some likely functional reason, ANTP genes have retained such a pattern, in contrast with PRD genes (the outgroup to ANTP genes) where only 0.5% of all sequences show this pattern.</p>
				<p>The ANTP 2 pattern is found in most HOXL sequences (98%) but in only 68% of the NKL sequences. This pattern is not found in sequences from other classes. It is thus highly specific, although less representative across the NKL subclass of ANTP class genes. The HOXL genes that do not show this pattern are (PROSITE annotations) <it>Hmd2 </it>from <it>Gallus gallus</it>, <it>hxaba </it>from <it>Danio rerio</it>, <it>hxb2 </it>from <it>Salmo salar </it>and <it>hxbb </it>from <it>Fugu rubripes</it>. These sequences are not related in any particular way. It is not possible to determine whether the <it>Hxb8 </it>gene from <it>Gallus gallus </it>(PROSITE annotation) shows this pattern, since this is a partial sequence. NKL sequences not showing the ANTP 2 pattern belong to the Dlx (distalless), En (engrailed) and Noto gene families, which have previously been found to be difficult to classify. Hence, information on these genes was not used to derive the ANTP 2 pattern. Once again, genes from these families stand out as oddities (it should be noted that they also do not show the NKL pattern; see above).</p>
				<p>The ANTP-PRD pattern identifies most HOXL (99%), NKL (98%) and PRD (95%) sequences used. Nevertheless, it is also present in all SINE sequences and in 79% of the LIM sequences. The 16 ANTP plus PRD sequences that do not show the expected pattern include the <it>HOPX </it>genes from <it>Homo sapiens</it>, <it>Mus musculus</it>, <it>Rattus norvegicus</it>, <it>Sus scrofa</it>, <it>Bos taurus</it>, <it>Gallus gallus</it>, and <it>Danio rerio</it>, the <it>HM31 </it>gene from <it>Caenorhabditis elegans</it>, and the <it>Artemia sanfranciscana HMEN </it>gene. It should be noted that <it>HOPX </it>genes are difficult to classify; they were classified as PRD genes in the latest revision <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, and they were not used to derive the amino acid patterns being tested. The <it>PAX2</it>, <it>PAX5 </it>and <it>PAX8 </it>sequences from <it>Homo sapiens</it>, as well as the <it>HXB8 </it>gene from <it>Gallus gallus</it>, the <it>HXDBB </it>gene from <it>Fugu rubripes</it>, and the <it>DLX2 </it>and <it>DLX4 </it>genes from <it>Eleutherodactylus coqui</it>, are partial; thus it is not possible to determine whether they show the ANTP-PRD amino acid pattern.</p>
			</sec>
			<sec>
				<st>
					<p>3) Further characterization of NKL genes</p>
				</st>
				<p>Given the failure to identify about 43% of the classified NKL sequences using the NKL pattern, we performed additional analyses to see whether the NKL pattern could be refined to accommodate the members of the NANOG, NOTO, VENTX, EN, DLX, and BARX families. The results are shown in Table <tblr tid="T2">2</tblr>. When the NKL and refined patterns are considered together, 98% of all NKL sequences can be classified as such. It should be noted that these refined NKL patterns are almost completely absent in the HOXL, PRD, LIM, POU, HNF, SINE, TALE, CUT, PROS, ZF and CERS classes. The 11 HOXL sequences that contain the EN pattern but not the NKL pattern all belong to the Hox1 gene family. This happens because the EN pattern allows for an asparagine at position 41. Five of the six genes that show the NANOG pattern but not the NKL pattern belong to the HOXL <it>Gsx </it>gene family. These sequences are found because the pattern allows for a lysine at position 43. The 8 HOXL sequences that show a hit when using the VENTX pattern but not the NKL pattern all belong to the <it>Gbx </it>family. This happens because a valine is now allowed at position 43. The three CUT genes that show the EN and VENTX patterns are the <it>Homo sapiens ONECUT1 </it>gene and its orthologues in <it>Mus musculus </it>and <it>Rattus norvegicus</it>. These sequences show a hit because both patterns allow for an isoleucine at position 45.</p>
				<tbl id="T2">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>Number of sequences showing only the refined NKL amino acid patterns.</p>
					</caption>
					<tblbdy cols="14">
						<r>
							<c ca="left">
								<p>Pattern</p>
							</c>
							<c ca="left">
								<p>HoxL (202)</p>
							</c>
							<c ca="left">
								<p>NKL (88)</p>
							</c>
							<c ca="left">
								<p>PRD (200)</p>
							</c>
							<c ca="left">
								<p>LIM (66)</p>
							</c>
							<c ca="left">
								<p>POU (83)</p>
							</c>
							<c ca="left">
								<p>HNF (15)</p>
							</c>
							<c ca="left">
								<p>SINE (23)</p>
							</c>
							<c ca="left">
								<p>TALE (66)</p>
							</c>
							<c ca="left">
								<p>CUT (20)</p>
							</c>
							<c ca="left">
								<p>PROS (3)</p>
							</c>
							<c ca="left">
								<p>ZF (94)</p>
							</c>
							<c ca="left">
								<p>CERS (10)</p>
							</c>
							<c ca="left">
								<p>Uncl. (288)</p>
							</c>
						</r>
						<r>
							<c cspan="14">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>DLX-BARX</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>40</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>EN</p>
							</c>
							<c ca="left">
								<p>11</p>
							</c>
							<c ca="left">
								<p>28</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>3</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>21</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>NOTO</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>NANOG</p>
							</c>
							<c ca="left">
								<p>6</p>
							</c>
							<c ca="left">
								<p>8</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>VENTX</p>
							</c>
							<c ca="left">
								<p>8</p>
							</c>
							<c ca="left">
								<p>6</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>3</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>0</p>
							</c>
							<c ca="left">
								<p>6</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Uncl. &#8211; unclassified sequences. The total number of sequences in each class is shown in parentheses.</p>
						<p>DLX-BARX &#8211; [AKST] [QDENPS] [LAST] [Q] [V]; EN &#8211; [AKSTN] [DENPS] [LAST] [Q] [VI];</p>
						<p>NOTO &#8211; [AKST] [DENPS] [LASTN] [Q] [V]; NANOG &#8211; [AKST] [DENPSY] [LASTK] [Q] [V];</p>
						<p>VENTX &#8211; [AKST] [DENPS] [LASTV] [Q] [VI]</p>
					</tblfn>
				</tbl>
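				<p>The refined patterns in the Table 2 footnote can be read as short regular expressions over positions 41 to 45 of the homeodomain. The following is a minimal sketch under stated assumptions, not the software used in this study: the function names and the example five-residue segments are hypothetical and were chosen only to illustrate the substitutions discussed in the text (asparagine at position 41, lysine or valine at position 43).</p>
				<p>
```python
import re

# Refined NKL patterns transcribed from the Table 2 footnote (positions 41 to 45).
REFINED_NKL = {
    "DLX-BARX": "[AKST] [QDENPS] [LAST] [Q] [V]",
    "EN": "[AKSTN] [DENPS] [LAST] [Q] [VI]",
    "NOTO": "[AKST] [DENPS] [LASTN] [Q] [V]",
    "NANOG": "[AKST] [DENPSY] [LASTK] [Q] [V]",
    "VENTX": "[AKST] [DENPS] [LASTV] [Q] [VI]",
}


def to_regex(pattern):
    """Compile a space-separated bracket pattern into a regular expression."""
    return re.compile("".join(pattern.split()))


def segment_41_45(homeodomain):
    """Return positions 41 to 45 (1-based) of a 60-residue homeodomain."""
    return homeodomain[40:45]


def matching_patterns(segment):
    """Names of the refined patterns that match the whole 5-residue segment."""
    return [name for name, pattern in REFINED_NKL.items()
            if to_regex(pattern).fullmatch(segment)]


# Hypothetical segments (not sequences from the data set).
print(matching_patterns("NELQI"))  # asparagine at position 41
print(matching_patterns("TEKQV"))  # lysine at position 43
print(matching_patterns("TEVQI"))  # valine at position 43
```
				</p>
				<p>Keeping one expression per family, rather than merging them into a single permissive pattern, mirrors the choice made here to report several NKL-related patterns.</p>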
			</sec>
			<sec>
				<st>
					<p>4) Non-bilaterian homeobox sequences</p>
				</st>
				<p>Only a few non-bilaterian homeobox sequences (those listed in <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>) are contained in the data sets used. Therefore, we collected from the NCBI database a set of 251 non-redundant non-bilaterian homeobox sequences that encompass the regions where the amino acid patterns reported here are located, and that show the ANTP-PRD pattern derived above (see Appendix). Most bilaterian ANTP and PRD sequences show this ANTP-PRD pattern (Table <tblr tid="T1">1</tblr>); furthermore, this pattern is only found in ANTP, PRD, LIM and SINE sequences (Table <tblr tid="T1">1</tblr>). Therefore, by imposing the presence of the ANTP-PRD pattern we hoped to enrich the data set for non-bilaterian ANTP and PRD sequences. Table <tblr tid="T3">3</tblr> summarizes the results. As expected, only one of the retrieved gene sequences seems to belong to a gene class other than ANTP, PRD, LIM and SINE.</p>
				<tbl id="T3">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>Amino acid pattern presence in non-bilaterian sequences.</p>
					</caption>
					<tblbdy cols="10">
						<r>
							<c ca="left">
								<p>Patterns other than ANTP-PRD</p>
							</c>
							<c ca="left">
								<p>Expected pattern for</p>
							</c>
							<c ca="left">
								<p>HoxL</p>
							</c>
							<c ca="left">
								<p>NKL</p>
							</c>
							<c ca="left">
								<p>PRD</p>
							</c>
							<c ca="left">
								<p>Demox</p>
							</c>
							<c ca="left">
								<p>LIM</p>
							</c>
							<c ca="left">
								<p>SINE</p>
							</c>
							<c ca="left">
								<p>TALE</p>
							</c>
							<c ca="left">
								<p>Uncl.</p>
							</c>
						</r>
						<r>
							<c cspan="10">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ANTP1; ANTP2;</p>
								<p>HOXL1; HOXL2</p>
							</c>
							<c ca="left">
								<p>
									<ul>HOXL</ul>
								</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>13</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ANTP1; ANTP2;</p>
								<p>HOXL1; NKL</p>
							</c>
							<c ca="left">
								<p>
									<ul>Mnx1</ul>
								</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ANTP1; ANTP2;</p>
								<p>NKL</p>
							</c>
							<c ca="left">
								<p>NKL</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>22</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>12</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ANTP1; NKL</p>
							</c>
							<c ca="left">
								<p>NKL*</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>4</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>7</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ANTP1; ANTP2;</p>
								<p>EN</p>
							</c>
							<c ca="left">
								<p>EN (NKL)</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>5</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ANTP1; ANTP2;</p>
								<p>NOTO</p>
							</c>
							<c ca="left">
								<p>NOTO (NKL)</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>4</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ANTP1; ANTP2;</p>
								<p>VENTX</p>
							</c>
							<c ca="left">
								<p>VENTX (NKL)</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>ANTP1; ANTP2</p>
							</c>
							<c ca="left">
								<p>Demox</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>3</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>14</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>PRD</p>
							</c>
							<c ca="left">
								<p>PRD</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>24</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Other pattern combinations</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>20</p>
							</c>
							<c ca="left">
								<p>33</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>5</p>
							</c>
							<c ca="left">
								<p>14</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
							<c ca="left">
								<p>59</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>Only those non-bilaterian sequences that encompass the regions where the amino acid patterns here described are located, and that showed the ANTP-PRD pattern were used. In total 251 non-redundant sequences were used (gi numbers are listed in the appendix). Cases where non-bilaterian genes are clearly misidentified when using amino acid patterns are shown underlined (see text for details). Uncl. &#8211; unclassified sequences.</p>
						<p>* this signature can also be considered characteristic of NKL genes since only 68% of bilaterian NKL sequences show the ANTP2 pattern</p>
					</tblfn>
				</tbl>
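				<p>The signature combinations in Table 3 amount to a lookup from the set of patterns a sequence shows (in addition to ANTP-PRD) to the class expected for that signature. The sketch below is illustrative only: the dictionary and function names are hypothetical, the row totals are summed from the cells of Table 3, and the final lines simply recompute the share of sequences without a recognizable signature.</p>
				<p>
```python
# Signature-to-class lookup transcribed from Table 3.
# Every sequence in the table also shows the ANTP-PRD pattern.
EXPECTED_CLASS = {
    frozenset({"ANTP1", "ANTP2", "HOXL1", "HOXL2"}): "HOXL",
    frozenset({"ANTP1", "ANTP2", "HOXL1", "NKL"}): "Mnx1",
    frozenset({"ANTP1", "ANTP2", "NKL"}): "NKL",
    frozenset({"ANTP1", "NKL"}): "NKL*",
    frozenset({"ANTP1", "ANTP2", "EN"}): "EN (NKL)",
    frozenset({"ANTP1", "ANTP2", "NOTO"}): "NOTO (NKL)",
    frozenset({"ANTP1", "ANTP2", "VENTX"}): "VENTX (NKL)",
    frozenset({"ANTP1", "ANTP2"}): "Demox",
    frozenset({"PRD"}): "PRD",
}


def expected_class(patterns):
    """Expected class for a set of pattern names; '?' if the signature is not listed."""
    return EXPECTED_CLASS.get(frozenset(patterns), "?")


# Number of sequences per Table 3 row (sum of the cells in each row).
ROW_TOTALS = {
    "HOXL": 13, "Mnx1": 1, "NKL": 34, "NKL*": 13, "EN (NKL)": 5,
    "NOTO (NKL)": 7, "VENTX (NKL)": 1, "Demox": 18, "PRD": 26, "?": 133,
}

total = sum(ROW_TOTALS.values())
unrecognized_percent = round(100 * ROW_TOTALS["?"] / total)
print(total, unrecognized_percent)
```
				</p>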
				<p>Most non-bilaterian genes showing the full NKL or PRD signature have already been classified as such. Furthermore, the ANTP Demox class, found in Demospongiae and apparently absent in all other animals <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, is characterized by the presence of the ANTP-PRD, ANTP 1 and ANTP 2 patterns, as expected for ANTP genes that do not belong to the HOXL or NKL lineage (Table <tblr tid="T3">3</tblr>). Although there are no true Hox genes in sponges <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, 13 sequences show the full HOXL signature. Interestingly, none of these sequences is annotated, which suggests that they are hard to classify. One <it>Ephydatia fluviatilis </it>gene has been labeled as an <it>Msx </it>(NKL) gene. Our pattern analyses show that the corresponding sequence has a pattern exclusively found in the <it>Mnx1 </it>(HOXL) gene family, which also suggests that non-bilaterian sequences with full signatures may be wrongly identified. About 53% of all non-bilaterian sequences used did not show an easily recognizable signature. We conclude that the reported amino acid patterns must be used cautiously when applied to sequences from non-bilaterian animals.</p>
			</sec>
			<sec>
				<st>
					<p>5) Identification of single key amino acid changes in bilaterian PRD, NKL and HOXL genes</p>
				</st>
				<p>Key amino acid changes that occurred during the evolution of homeobox-containing genes may be revealed as changes affecting a single amino acid position. Thus, we compared, at each amino acid position, the large HOXL (202 sequences), NKL (204 sequences) and PRD (200 sequences) data sets described above for the following chemical properties: size, charge, polarity, hydrophobicity, aromaticity, and aliphaticity. For completeness, the 178, 52 and 14 sequences that could not be classified using a phylogenetic argument but that show amino acid patterns typical of HOXL, NKL and PRD genes, respectively, were also used.</p>
				<p>Chemical properties observed in more than 99% of the sequences belonging to one class, and observed in less than three-quarters of sequences belonging to another class are shown in Tables <tblr tid="T4">4</tblr>, <tblr tid="T5">5</tblr>, and <tblr tid="T6">6</tblr>. Rules are described relative to the inferred common ancestor situation of the two categories being compared. For instance, when comparing PRD and ANTP class genes the rule "not negatively charged at position 27" means that it was inferred (by comparison with LIM genes) that the common ancestor to these two gene classes could use a negatively charged amino acid at position 27; in this example, ANTP genes subsequently almost completely lost the ability to use a negatively charged amino acid, but 94% of PRD genes retained the use of such an amino acid at this position (Table <tblr tid="T4">4</tblr>). An almost complete change in amino acid usage regarding polarity and hydrophobicity is also observed for amino acid position 30. Furthermore, about 30% of the PRD genes show a charged amino acid at position 50. HOXL and NKL genes never use charged amino acids at this position, and this is also the case for LIM genes, here used as an outgroup to ANTP and PRD genes. Therefore, the possibility of using a charged amino acid at position 50 seems to be a derived feature that appeared in the PRD lineage. This amino acid position has been previously used to sub-classify PRD genes into three categories <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. A charged amino acid is found in sequences from <it>Caenorhabditis elegans</it>, <it>Strongylocentrotus purpuratus</it>, and <it>Hydra vulgaris</it>, among others, thus this switch is an event that happened early in the evolution of PRD genes.</p>
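				<p>The selection criterion described above (a property observed in more than 99% of the sequences of one class and in less than three-quarters of the sequences of another) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually implemented: the function names are hypothetical, the toy sequences are invented, and skipping sequences that do not cover the position is an assumption about how partial sequences were handled.</p>
				<p>
```python
NEGATIVE = set("DE")  # negatively charged residues: aspartate and glutamate


def fraction_following(sequences, position, forbidden):
    """Fraction of sequences whose residue at a 1-based position is not in `forbidden`.

    Sequences too short to cover the position are skipped (an assumption).
    """
    usable = [s for s in sequences if len(s) >= position]
    if not usable:
        raise ValueError("no sequence covers the requested position")
    hits = sum(1 for s in usable if s[position - 1] not in forbidden)
    return hits / len(usable)


def is_class_specific(freq_class, freq_other, high=0.99, low=0.75):
    """True if the rule holds in more than `high` of one class and less than `low` of the other."""
    return freq_class > high and low > freq_other


# Invented toy sequences; position 3 carries D, E, K and T.
toy = ["AAD", "AAE", "AAK", "AAT"]
print(fraction_following(toy, 3, NEGATIVE))

# Frequencies reported in Table 4 for the rule "Not Ne" at position 27
# (phylogenetically classified sets): HOXL and NKL 0.998, PRD 0.060.
print(is_class_specific(0.998, 0.060))
```
				</p>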
				<tbl id="T4">
					<title>
						<p>Table 4</p>
					</title>
					<caption>
						<p>Frequency of genes following amino acid rules specific for the PRD or ANTP classes.</p>
					</caption>
					<tblbdy cols="11">
						<r>
							<c ca="left">
								<p>Position</p>
							</c>
							<c cspan="2" ca="center">
								<p>27</p>
							</c>
							<c cspan="2" ca="center">
								<p>29</p>
							</c>
							<c cspan="2" ca="center">
								<p>30</p>
							</c>
							<c cspan="2" ca="center">
								<p>30</p>
							</c>
							<c cspan="2" ca="center">
								<p>50</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Rules</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not Ne</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not A</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not (LP, NP)</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not (LH, VH)</p>
							</c>
							<c cspan="2" ca="center">
								<p>(P,Ne)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Dataset</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
						</r>
						<r>
							<c cspan="11">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>PRD</p>
							</c>
							<c ca="left">
								<p>0.060 (200)</p>
							</c>
							<c ca="left">
								<p>0.000 (14)</p>
							</c>
							<c ca="left">
								<p>0.305 (200)</p>
							</c>
							<c ca="left">
								<p>0.714 (14)</p>
							</c>
							<c ca="left">
								<p>0.020 (200)</p>
							</c>
							<c ca="left">
								<p>0.071 (14)</p>
							</c>
							<c ca="left">
								<p>0.020 (200)</p>
							</c>
							<c ca="left">
								<p>0.071 (14)</p>
							</c>
							<c ca="left">
								<p>0.315 (197)</p>
							</c>
							<c ca="left">
								<p>0.786 (14)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>(HOXL and NKL)</p>
							</c>
							<c ca="left">
								<p>0.998 (406)</p>
							</c>
							<c ca="left">
								<p>1.000 (230)</p>
							</c>
							<c ca="left">
								<p>1.000 (406)</p>
							</c>
							<c ca="left">
								<p>1.000 (230)</p>
							</c>
							<c ca="left">
								<p>0.998 (406)</p>
							</c>
							<c ca="left">
								<p>1.000 (230)</p>
							</c>
							<c ca="left">
								<p>0.998 (406)</p>
							</c>
							<c ca="left">
								<p>1.000 (230)</p>
							</c>
							<c ca="left">
								<p>0.000 (404)</p>
							</c>
							<c ca="left">
								<p>0.000 (230)</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>The total number of sequences analysed is given in parentheses.</p>
						<p>Phy &#8211; set of sequences classified using a phylogenetic approach</p>
						<p>Unc &#8211; set of sequences not classified using a phylogenetic approach. These sequences, nevertheless, show amino acid patterns typical of PRD, HOXL and NKL genes. Ne &#8211; negatively charged amino acids; P &#8211; positively charged amino acids; A &#8211; aromatic amino acids; LP &#8211; less polar amino acids; NP &#8211; non-polar amino acids; LH &#8211; less hydrophobic amino acids; VH &#8211; very hydrophobic amino acids; S &#8211; small amino acids; T &#8211; tiny amino acids; Al &#8211; aliphatic amino acids.</p>
					</tblfn>
				</tbl>
				<tbl id="T5">
					<title>
						<p>Table 5</p>
					</title>
					<caption>
						<p>Frequency of genes following amino acid rules specific for the HOXL or NKL sub-classes (positions 1 to 30).</p>
					</caption>
					<tblbdy cols="11">
						<r>
							<c ca="left">
								<p>Position</p>
							</c>
							<c cspan="2" ca="center">
								<p>14</p>
							</c>
							<c cspan="2" ca="center">
								<p>15</p>
							</c>
							<c cspan="2" ca="center">
								<p>15</p>
							</c>
							<c cspan="2" ca="center">
								<p>15</p>
							</c>
							<c cspan="2" ca="center">
								<p>28</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Rules</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not A</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not (LH, VH)</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not (LP, NP)</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not (S, T)</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not (S, T)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Dataset</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
						</r>
						<r>
							<c cspan="11">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>HoxL</p>
							</c>
							<c ca="left">
								<p>0.995 (202)</p>
							</c>
							<c ca="left">
								<p>1.000 (177)</p>
							</c>
							<c ca="left">
								<p>1.000 (202)</p>
							</c>
							<c ca="left">
								<p>1.000 (177)</p>
							</c>
							<c ca="left">
								<p>1.000 (202)</p>
							</c>
							<c ca="left">
								<p>1.000 (177)</p>
							</c>
							<c ca="left">
								<p>1.000 (202)</p>
							</c>
							<c ca="left">
								<p>1.000 (177)</p>
							</c>
							<c ca="left">
								<p>0.990 (202)</p>
							</c>
							<c ca="left">
								<p>1.000 (178)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>NKL</p>
							</c>
							<c ca="left">
								<p>0.709 (203)</p>
							</c>
							<c ca="left">
								<p>0.808 (52)</p>
							</c>
							<c ca="left">
								<p>0.588 (204)</p>
							</c>
							<c ca="left">
								<p>0.346 (52)</p>
							</c>
							<c ca="left">
								<p>0.598 (204)</p>
							</c>
							<c ca="left">
								<p>0.346 (52)</p>
							</c>
							<c ca="left">
								<p>0.637 (204)</p>
							</c>
							<c ca="left">
								<p>0.346 (52)</p>
							</c>
							<c ca="left">
								<p>0.500 (204)</p>
							</c>
							<c ca="left">
								<p>0.673 (52)</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>The total number of sequences analysed is given in parentheses.</p>
						<p>Legend as in Table 4</p>
					</tblfn>
				</tbl>
				<tbl id="T6">
					<title>
						<p>Table 6</p>
					</title>
					<caption>
						<p>Frequency of genes following amino acid rules specific for the HOXL or NKL sub-classes (positions 31 to 60).</p>
					</caption>
					<tblbdy cols="9">
						<r>
							<c ca="left">
								<p>Position</p>
							</c>
							<c cspan="2" ca="center">
								<p>33</p>
							</c>
							<c cspan="2" ca="center">
								<p>33</p>
							</c>
							<c cspan="2" ca="center">
								<p>47</p>
							</c>
							<c cspan="2" ca="center">
								<p>54</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Rules</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not (LH, VH)</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not (NP, LP)</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not (Non-Al)</p>
							</c>
							<c cspan="2" ca="center">
								<p>Not LP</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Dataset</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
							<c ca="left">
								<p>Phy</p>
							</c>
							<c ca="left">
								<p>Unc</p>
							</c>
						</r>
						<r>
							<c cspan="9">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>HoxL</p>
							</c>
							<c ca="left">
								<p>1.000 (202)</p>
							</c>
							<c ca="left">
								<p>1.000 (178)</p>
							</c>
							<c ca="left">
								<p>1.000 (202)</p>
							</c>
							<c ca="left">
								<p>1.000 (178)</p>
							</c>
							<c ca="left">
								<p>1.000 (201)</p>
							</c>
							<c ca="left">
								<p>1.000 (178)</p>
							</c>
							<c ca="left">
								<p>1.000 (200)</p>
							</c>
							<c ca="left">
								<p>1.000 (178)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>NKL</p>
							</c>
							<c ca="left">
								<p>0.716 (204)</p>
							</c>
							<c ca="left">
								<p>0.865 (52)</p>
							</c>
							<c ca="left">
								<p>0.701 (204)</p>
							</c>
							<c ca="left">
								<p>0.865 (52)</p>
							</c>
							<c ca="left">
								<p>0.730 (204)</p>
							</c>
							<c ca="left">
								<p>0.981 (52)</p>
							</c>
							<c ca="left">
								<p>0.431 (202)</p>
							</c>
							<c ca="left">
								<p>0.731 (52)</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>The total number of sequences analysed is given in parentheses.</p>
						<p>Legend as in Table 4</p>
					</tblfn>
				</tbl>
				<p>Genes belonging to the HOXL lineage show derived constraints at amino acid positions 14, 15, 28, 33, 47 and 54, relative to the inferred ancestral state for ANTP class genes (the PRD data set was used as an outgroup; Tables <tblr tid="T5">5</tblr> and <tblr tid="T6">6</tblr>). Only three sequences classified as HOXL do not follow the general pattern for these genes, namely the <it>HMA2 </it>gene (PROSITE annotation) from <it>Helobdella triserialis</it>, which shows an aromatic amino acid at position 14, and the <it>HXB8 </it>and <it>PAL1 </it>genes (PROSITE annotation) from <it>Sus scrofa </it>and <it>Caenorhabditis elegans</it>, respectively, which show small amino acids at position 28. No derived single amino acid constraints were found for the NKL lineage.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<p>The amino acid patterns and single key amino acid changes here identified shed light on some of the major likely functional changes that occurred during the evolution of the HOXL, NKL and PRD genes. Our results show that important amino acid changes happened very early in the evolution of these genes, and thus it is possible to identify an archetype for bilaterian PRD, ANTP, HOXL and NKL genes. As with every generalization, however, some genes do not fit the archetype. Experimental studies are now needed in order to understand why the archetypes possess such chemical properties. We propose the following archetypes for homeodomain amino acid sequences showing the ANTP-PRD pattern (this pattern is present in 97% of all bilaterian ANTP and PRD sequences).</p>
			<p>PRD genes show the PRD pattern L [EINQRV] [^DGHMPTVWY] [^CDGKMNPQR] [FL] [^CFILPTWY] [AEFHKQRV] [ADEGKNSTW] [CHKMPQR] [FHY]P (positions 16&#8211;26), use a negatively charged amino acid (DE) at position 27 and a less polar or non-polar amino acid (^RKDENQ) at position 30. About 80% of all bilaterian PRD genes follow this description.</p>
			<p>ANTP (Demox, HOXL and NKL) genes show the ANTP 1 pattern [HQ] [IV] [AKLT] (positions 44&#8211;46). In contrast to PRD genes, ANTP genes do not use a negatively charged amino acid (DE) at position 27. Furthermore, at position 30, 99.8% of all bilaterian ANTP genes use a polar amino acid (RKDENQ). Overall, 97% of all bilaterian ANTP genes follow this description (data not shown).</p>
			<p>HOXL genes are characterized by the HOXL 1 pattern [KT] [IV]WFQNRR [AMV]K [DEHKLMQWY] [KR] [KR] (positions 46&#8211;58), which is shown by 96% of all bilaterian HOXL genes.</p>
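			<p>As an illustration only (this is not the software used in this study), the PRD, ANTP 1 and HOXL 1 patterns given above can be read as regular expressions anchored at their stated positions. The minimal Python sketch below assumes a 60-residue homeodomain numbered from position 1. The function names and the example input (an Antennapedia-type homeodomain sequence, included only as test input) are assumptions introduced for this sketch.</p>

```python
import re

# Patterns as written in the Discussion: (1-based start position, regex).
PATTERNS = {
    "PRD": (16, "L[EINQRV][^DGHMPTVWY][^CDGKMNPQR][FL][^CFILPTWY]"
                "[AEFHKQRV][ADEGKNSTW][CHKMPQR][FHY]P"),
    "ANTP1": (44, "[HQ][IV][AKLT]"),
    "HOXL1": (46, "[KT][IV]WFQNRR[AMV]K[DEHKLMQWY][KR][KR]"),
}


def pattern_length(regex):
    """Count the residues a pattern spans (one per class or literal)."""
    return len(re.findall(r"\[[^\]]*\]|[A-Z]", regex))


def shows_pattern(homeodomain, name):
    """Return True if the named pattern matches at its stated positions."""
    start, regex = PATTERNS[name]
    end = start - 1 + pattern_length(regex)
    return re.fullmatch(regex, homeodomain[start - 1:end]) is not None


def describe(homeodomain):
    """Report pattern hits and the position 27 and 30 properties."""
    if len(homeodomain) != 60:
        raise ValueError("expected a 60-residue homeodomain")
    report = {name: shows_pattern(homeodomain, name) for name in PATTERNS}
    report["negative_at_27"] = homeodomain[26] in "DE"
    report["polar_at_30"] = homeodomain[29] in "RKDENQ"
    return report


# Hypothetical test input: an Antennapedia-type homeodomain (assumed sequence).
ANTP_HD = "RKRGRQTYTRYQTLELEKEFHFNRYLTRRRRIEIAHALCLTERQIKIWFQNRRMKWKKEN"
```

			<p>For this example input, <it>describe(ANTP_HD)</it> reports the ANTP 1 and HOXL 1 patterns but not the PRD pattern, no negatively charged residue at position 27 and a polar residue at position 30. A real analysis would first need the homeodomain to be aligned so that the position numbering holds.</p>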
			<p>NKL genes show the NKL pattern (positions 41&#8211;45) or an NKL-derived pattern. About 98% of the bilaterian NKL genes follow this pattern. If the NKL pattern is generalized to include the derived NKL patterns, many hits are observed in other homeodomain classes (data not shown). This is why several NKL-related patterns are reported.</p>
			<p>The above definitions suggest that the regions spanning amino acid positions 16&#8211;30 and 44&#8211;58 are the most important for the functional specificity of genes belonging to different classes and sub-classes. The former region corresponds to the end of helix 1, the inter-helix 1&#8211;2 region and the first positions of helix 2 <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. The region spanning amino acid positions 44&#8211;58 corresponds mainly to helix 3. This helix, also called the recognition helix, is essential for successful and specific DNA binding <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
			<p>The situation observed for NKL genes suggests that specific amino acid patterns may exist for groups of genes within major classes, as previously suggested <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. It also indicates that not all motifs implied by an amino acid pattern occur in a given homeodomain class. On the other hand, the situation observed for non-bilaterian versus bilaterian genes suggests that we may have failed to identify all relevant chemical changes. For instance, it is conceivable that important amino acid changes would be observed if the protostome and deuterostome lineages were compared; such distinctions were not addressed in this work because we grouped genes from a given homeodomain class together, irrespective of the organism.</p>
			<p>It is tempting to use these features to classify homeobox-containing genes. Nevertheless, given the great age of the gene families considered, the possibility of convergent evolution must be considered when analyzing a given amino acid sequence. Therefore, for the purpose of gene classification, the features described here should not be decisive but rather used as additional evidence. It should be noted that the reconstruction of gene genealogies is a hard problem, particularly when proteins belonging to the same family share some of the same protein interaction partners and thus face a similar selective environment (Campos et al. 2004). Any additional piece of evidence that may shed light on the correct classification of genes should therefore be used.</p>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>In this work we report a method for the fast retrieval of bilaterian ANTP and PRD sequences. Given the availability of a sufficiently large curated data set, this method can be applied to any group of proteins. Furthermore, we report some of the main amino acid changes that occurred early in the HOXL, NKL and PRD lineages. These features can be used for the classification of gene sequences, although, as shown, convergent evolution must be considered as an explanation for the presence of a given pattern in a sequence. The possibility that the region of the protein that distinguishes the different classes also distinguishes different families within classes has important practical and evolutionary consequences and must be explored in more detail.</p>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>NAF, JV and CPV conceived the design of the study. NAF developed the software needed for the large-scale analyses, while JV and CPV collected and aligned the sequence data and performed the phylogenetic analyses. JV drafted the manuscript. All authors participated in the discussion of the results and helped to write the final version of the manuscript. All authors read and approved the final manuscript.</p>
		</sec>
		<sec>
			<st>
				<p>Appendix</p>
			</st>
			<p>GI numbers of the 251 non-redundant non-bilaterian sequences used for the analysis presented in Table <tblr tid="T3">3</tblr>. All sequences show the ANTP-PRD amino acid pattern.</p>
			<sec>
				<st>
					<p>Sequences showing patterns ANTP-PRD, ANTP1, ANTP2, HOXL1 and HOXL2</p>
				</st>
				<p><it>Nematostella vectensis</it>: 32263856; 32263873; 74039490; 74039492; 82621559; 82621587; 82621611; 82621663; 83356315; 156363224; 156363226; 156387518; 156397205</p>
			</sec>
			<sec>
				<st>
					<p>Sequences showing patterns ANTP-PRD, ANTP1, ANTP2, HOXL1 and NKL</p>
				</st>
				<p><it>Ephydatia fluviatilis</it>: 438584</p>
			</sec>
			<sec>
				<st>
					<p>Sequences showing patterns ANTP-PRD, ANTP1, ANTP2 and NKL</p>
				</st>
				<p><it>Acropora millepora</it>: 117581722; 117581724; 117581726; 117581728; <it>Anemonia erythraea</it>: 158936936; <it>Ephydatia fluviatilis</it>: 438585; 510584; <it>Halichondria bowerbanki</it>: 33641771; <it>Nematostella vectensis</it>: 32816231; 78190373; 82621509; 82621525; 82621533; 82621555; 82621557; 82621585; 82621591; 82621609; 82621625; 82621657; 82621665; 82621667; 110339021; 110339029; 110339121; 156353243; 156363798; 156367335; 156372762; 156397943; 168693291; <it>Suberites domuncula</it>: 34786940; 49659003; <it>Sycon raphanus</it>: 11066243</p>
			</sec>
			<sec>
				<st>
					<p>Sequences showing patterns ANTP-PRD, ANTP1 and NKL</p>
				</st>
				<p><it>Anthopleura japonica</it>: 144369330; 144369363; <it>Hydra vulgaris</it>: 2331219; 7635735; <it>Nematostella vectensis</it>: 82570553; 82621543; 82621677; 110339077; 110339133; 156376845; 156402818; 156406963; 156407174</p>
			</sec>
			<sec>
				<st>
					<p>Sequences showing patterns ANTP-PRD, ANTP1, ANTP2 and EN</p>
				</st>
				<p><it>Nematostella vectensis</it>: 82621563; 82621631; 82621655; 156366927; 156402173</p>
			</sec>
			<sec>
				<st>
					<p>Sequences showing patterns ANTP-PRD, ANTP1, ANTP2 and NOTO</p>
				</st>
				<p><it>Nematostella vectensis</it>: 82621579; 82621647; 82621651; 110339063; 156388033; 156400908; 156407394</p>
			</sec>
			<sec>
				<st>
					<p>Sequences showing patterns ANTP-PRD, ANTP1, ANTP2 and VENTX</p>
				</st>
				<p><it>Nematostella vectensis</it>: 110339027</p>
			</sec>
			<sec>
				<st>
					<p>Sequences showing patterns ANTP-PRD, ANTP1 and ANTP2</p>
				</st>
				<p><it>Baikalospongia intermedia</it>: 62238212; <it>Ephydatia fluviatilis</it>: 1438870; <it>Ephydatia muelleri</it>: 3184520; <it>Hydra vulgaris</it>: 7635740; <it>Nematostella vectensis</it>: 82621517; 82621615; 82621619; 82621623; 82621635; 82621653; 156375827; 156375891; 156376334; 156401296; 156402692; <it>Podocoryne carnea</it>: 28188799; <it>Potamolepis sp</it>.: 62238208; <it>Suberites domuncula</it>: 49659001</p>
			</sec>
			<sec>
				<st>
					<p>Sequences showing patterns ANTP-PRD and PRD</p>
				</st>
				<p><it>Anthopleura japonica</it>: 144369334; <it>Cladonema californicum</it>: 9652040; <it>Hydra vulgaris</it>: 3021450; 3021452; 6503072; <it>Nematostella vectensis</it>: 38569883; 78190377; 82395396; 82395402; 82395404; 82395406; 82395408; 82395412; 82395414; 82570541; 82570557; 82570559; 82570563; 110339183; 110339213; 110339215; 110339225; 113120207; 156377162; 156615306; <it>Tripedalia cystophora</it>: 33391193</p>
			</sec>
			<sec>
				<st>
					<p>Sequences showing other pattern combinations</p>
				</st>
				<p><it>Acropora formosa</it>: 228960; <it>Acropora millepora</it>: 7335704; 7595811; 7595813; 13506878; 117581730; 117581732; <it>Anthopleura japonica</it>: 144369323; 144369326; 144369366; <it>Aurelia aurita</it>: 50841484; <it>Cassiopea xamachana</it>: 4894653; 4894655; 4894659; <it>Cladonema radiatum</it>: 47155918; 47155920; 47155922; <it>Eleutheria dichotoma</it>: 1147626; 91982983; 91982989; <it>Hydra littoralis</it>: 2102728; <it>Hydra magnipapillata</it>: 630481; 630482; 89242120; 144369369; 144369375; <it>Hydra viridis</it>: 7120; 7124; 7130; 83763566; <it>Hydra vulgaris</it>: 4433647; 4838455; 7635731; 7635733; 7635737; 7635742; 9945022; <it>Hydractinia symbiolongicarpus</it>: 2980868; 83272159; <it>Nematostella vectensis</it>: 5081328; 32816237; 74039494; 78190375; 82395398; 82395410; 82570519; 82570521; 82570527; 82570529; 82570537; 82570539; 82570545; 82570549; 82570555; 82570561; 82621507; 82621511; 82621513; 82621515; 82621519; 82621523; 82621527; 82621531; 82621535; 82621539; 82621541; 82621547; 82621549; 82621551; 82621565; 82621567; 82621571; 82621573; 82621575; 82621589; 82621595; 82621601; 82621603; 82621605; 82621613; 82621621; 82621627; 82621629; 82621633; 82621637; 82621643; 82621645; 82621671; 82621675; 83356317; 99030986; 110338989; 110339023; 110339061; 110339099; 110339101; 110339115; 110339171; 110339191; 110339211; 110339247; 113120203; 156356964; 156358580; 156361301; 156361303; 156364678; 156368221; 156371265; 156371439; 156372678; 156374167; 156377158; 156377160; 156377164; 156389076; 156390886; 156393340; 156395806; 156396978; 156398319; 156407912; <it>Podocoryne carnea</it>: 976094; 6118056; 6465862; 7649930; 9964019; 47155914; 62002543; <it>Sarsia sp</it>.: 9988771; <it>Scolionema suvaense</it>: 158936914; <it>Suberites domuncula</it>: 49659005; <it>Trichoplax adhaerens</it>: 38565482</p>
			</sec>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>NAF is the recipient of a Post-Doctoral grant SFRH/BPD/26737/2006 from FCT. This work has been partially funded by POCI 2010, co-funded by FEDER funds.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>A comprehensive classification of homeobox genes</p>
				</title>
				<aug>
					<au>
						<snm>B&#252;rglin</snm>
						<fnm>TR</fnm>
					</au>
				</aug>
				<source>Guidebook to the Homeobox Genes</source>
				<publisher>Oxford, Oxford University Press</publisher>
				<editor>Duboule D</editor>
				<pubdate>1994</pubdate>
				<fpage>25</fpage>
				<lpage>71</lpage>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Homeodomain proteins</p>
				</title>
				<aug>
					<au>
						<snm>B&#252;rglin</snm>
						<fnm>TR</fnm>
					</au>
				</aug>
				<source>Encyclopedia of Molecular Cell Biology and Molecular Medicine</source>
				<publisher>Weinheim, Wiley-VCH Verlag GmbH &amp; Co.</publisher>
				<editor>Meyers RA</editor>
				<edition>2</edition>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>179</fpage>
				<lpage>222</lpage>
			</bibl>
			<bibl id="B3">
				<title>
					<p>The zootype and the phylotypic stage</p>
				</title>
				<aug>
					<au>
						<snm>Slack</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Holland</snm>
						<fnm>PW</fnm>
					</au>
					<au>
						<snm>Graham</snm>
						<fnm>CF</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1993</pubdate>
				<volume>361</volume>
				<issue>6412</issue>
				<fpage>490</fpage>
				<lpage>492</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">8094230</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Homeotic genes and the evolution of arthropods and chordates</p>
				</title>
				<aug>
					<au>
						<snm>Carroll</snm>
						<fnm>SB</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1995</pubdate>
				<volume>376</volume>
				<issue>6540</issue>
				<fpage>479</fpage>
				<lpage>485</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">7637779</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Gene duplication: past, present and future</p>
				</title>
				<aug>
					<au>
						<snm>Holland</snm>
						<fnm>PW</fnm>
					</au>
				</aug>
				<source>Semin Cell Dev Biol</source>
				<pubdate>1999</pubdate>
				<volume>10</volume>
				<issue>5</issue>
				<fpage>541</fpage>
				<lpage>547</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10597638</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Classification and nomenclature of all human homeobox genes</p>
				</title>
				<aug>
					<au>
						<snm>Holland</snm>
						<fnm>PW</fnm>
					</au>
					<au>
						<snm>Booth</snm>
						<fnm>HA</fnm>
					</au>
					<au>
						<snm>Bruford</snm>
						<fnm>EA</fnm>
					</au>
				</aug>
				<source>BMC Biol</source>
				<pubdate>2007</pubdate>
				<volume>5</volume>
				<issue>1</issue>
				<fpage>47</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">2211742</pubid>
						<pubid idtype="pmpid" link="fulltext">17963489</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>HoxPred: automated classification of Hox proteins using combinations of generalised profiles</p>
				</title>
				<aug>
					<au>
						<snm>Thomas-Chollier</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Leyns</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Ledent</snm>
						<fnm>V</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2007</pubdate>
				<volume>8</volume>
				<fpage>247</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1965487</pubid>
						<pubid idtype="pmpid" link="fulltext">17626621</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>An automated phylogenetic key for classifying homeoboxes</p>
				</title>
				<aug>
					<au>
						<snm>Sarkar</snm>
						<fnm>IN</fnm>
					</au>
					<au>
						<snm>Thornton</snm>
						<fnm>JW</fnm>
					</au>
					<au>
						<snm>Planet</snm>
						<fnm>PJ</fnm>
					</au>
					<au>
						<snm>Figurski</snm>
						<fnm>DH</fnm>
					</au>
					<au>
						<snm>Schierwater</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>DeSalle</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Mol Phylogenet Evol</source>
				<pubdate>2002</pubdate>
				<volume>24</volume>
				<issue>3</issue>
				<fpage>388</fpage>
				<lpage>399</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12220982</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Hox genes in evolution: protein surfaces and paralog groups</p>
				</title>
				<aug>
					<au>
						<snm>Sharkey</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Graba</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Scott</snm>
						<fnm>MP</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>1997</pubdate>
				<volume>13</volume>
				<issue>4</issue>
				<fpage>145</fpage>
				<lpage>151</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9097725</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>PROSITE: Database of protein domains, families and functional sites</p>
				</title>
				<url>http://www.expasy.org/prosite/</url>
			</bibl>
			<bibl id="B11">
				<title>
					<p>NCBI: National Center for Biotechnology Information</p>
				</title>
				<url>http://www.ncbi.nlm.nih.gov/index.html</url>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Fast discovery of statistically interesting words</p>
				</title>
				<aug>
					<au>
						<snm>Pereira</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Fonseca</snm>
						<fnm>NA</fnm>
					</au>
					<au>
						<snm>Silva</snm>
						<fnm>F</fnm>
					</au>
				</aug>
				<source>Technical Report DCC-2007-01, DCC-FC &amp; LIACC, Universidade do Porto</source>
				<pubdate>2006</pubdate>
			</bibl>
			<bibl id="B13">
				<url>http://www.dcc.fc.up.pt/dcc/Pubs/TReports/index.html?&amp;item=305#</url>
			</bibl>
			<bibl id="B14">
				<title>
					<p>A high performance distributed tool for mining patterns in biological sequences</p>
				</title>
				<aug>
					<au>
						<snm>Pereira</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Fonseca</snm>
						<fnm>NA</fnm>
					</au>
					<au>
						<snm>Silva</snm>
						<fnm>F</fnm>
					</au>
				</aug>
				<source>Technical Report DCC-2006-08, DCC-FC &amp; LIACC, Universidade do Porto</source>
				<pubdate>2006</pubdate>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Rapid and sensitive sequence comparison with FASTP and FASTA</p>
				</title>
				<aug>
					<au>
						<snm>Pearson</snm>
						<fnm>WR</fnm>
					</au>
				</aug>
				<source>Methods Enzymol</source>
				<pubdate>1990</pubdate>
				<volume>183</volume>
				<fpage>63</fpage>
				<lpage>98</lpage>
				<xrefbib>
					<pubid idtype="pmpid">2156132</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation</p>
				</title>
				<aug>
					<au>
						<snm>Livingstone</snm>
						<fnm>CD</fnm>
					</au>
					<au>
						<snm>Barton</snm>
						<fnm>GJ</fnm>
					</au>
				</aug>
				<source>Comput Appl Biosci</source>
				<pubdate>1993</pubdate>
				<volume>9</volume>
				<issue>6</issue>
				<fpage>745</fpage>
				<lpage>756</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8143162</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0</p>
				</title>
				<aug>
					<au>
						<snm>Tamura</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Dudley</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Nei</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kumar</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2007</pubdate>
				<volume>24</volume>
				<issue>8</issue>
				<fpage>1596</fpage>
				<lpage>1599</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">17488738</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>The Mnx homeobox gene class defined by HB9, MNR2 and amphioxus AmphiMnx</p>
				</title>
				<aug>
					<au>
						<snm>Ferrier</snm>
						<fnm>DE</fnm>
					</au>
					<au>
						<snm>Brooke</snm>
						<fnm>NM</fnm>
					</au>
					<au>
						<snm>Panopoulou</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Holland</snm>
						<fnm>PW</fnm>
					</au>
				</aug>
				<source>Dev Genes Evol</source>
				<pubdate>2001</pubdate>
				<volume>211</volume>
				<issue>2</issue>
				<fpage>103</fpage>
				<lpage>107</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11455421</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Conservation and phylogeny of a novel family of non-Hox genes of the Antp class in Demospongiae (porifera)</p>
				</title>
				<aug>
					<au>
						<snm>Richelle-Maurer</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Boury-Esnault</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Itskovich</snm>
						<fnm>VB</fnm>
					</au>
					<au>
						<snm>Manuel</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Pomponi</snm>
						<fnm>SA</fnm>
					</au>
					<au>
						<snm>Van de Vyver</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Borchiellini</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>2006</pubdate>
				<volume>63</volume>
				<issue>2</issue>
				<fpage>222</fpage>
				<lpage>230</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">16786434</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>The NK homeobox gene cluster predates the origin of Hox genes</p>
				</title>
				<aug>
					<au>
						<snm>Larroux</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Fahey</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Degnan</snm>
						<fnm>SM</fnm>
					</au>
					<au>
						<snm>Adamski</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rokhsar</snm>
						<fnm>DS</fnm>
					</au>
					<au>
						<snm>Degnan</snm>
						<fnm>BM</fnm>
					</au>
				</aug>
				<source>Curr Biol</source>
				<pubdate>2007</pubdate>
				<volume>17</volume>
				<issue>8</issue>
				<fpage>706</fpage>
				<lpage>710</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">17379523</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Evolution of homeobox genes: Q50 Paired-like genes founded the Paired class</p>
				</title>
				<aug>
					<au>
						<snm>Galliot</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>de Vargas</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Dev Genes Evol</source>
				<pubdate>1999</pubdate>
				<volume>209</volume>
				<issue>3</issue>
				<fpage>186</fpage>
				<lpage>197</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10079362</pubid>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
