<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2105-7-329</ui>
	<ji>1471-2105</ji>
	<fm>
		<dochead>Software</dochead>
		<bibl>
			<title>
				<p>An interactive visualization tool to explore the biophysical properties of amino acids and their contribution to substitution matrices</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Bulka</snm>
					<fnm>Blazej</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<email>bulka1@umbc.edu</email>
				</au>
				<au id="A2">
					<snm>desJardins</snm>
					<fnm>Marie</fnm>
					<insr iid="I1"/>
					<email>mariedj@cs.umbc.edu</email>
				</au>
				<au id="A3" ca="yes">
					<snm>Freeland</snm>
					<mi>J</mi>
					<fnm>Stephen</fnm>
					<insr iid="I2"/>
					<email>freeland@umbc.edu</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA</p>
				</ins>
				<ins id="I2">
					<p>Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA</p>
				</ins>
			</insg>
			<source>BMC Bioinformatics</source>
			<issn>1471-2105</issn>
			<pubdate>2006</pubdate>
			<volume>7</volume>
			<issue>1</issue>
			<fpage>329</fpage>
			<url>http://www.biomedcentral.com/1471-2105/7/329</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">16817972</pubid><pubid idtype="doi">10.1186/1471-2105-7-329</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>12</day>
					<month>12</month>
					<year>2005</year>
				</date>
			</rec>
			<acc>
				<date>
					<day>03</day>
					<month>7</month>
					<year>2006</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>03</day>
					<month>7</month>
					<year>2006</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2006</year>
			<collab>Bulka et al; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Quantitative descriptions of amino acid similarity, expressed as probabilistic models of evolutionary interchangeability, are central to many mainstream bioinformatic procedures such as sequence alignment, homology searching, and protein structural prediction. Here we present a web-based, user-friendly analysis tool that allows any researcher to quickly and easily visualize relationships between these bioinformatic metrics and to explore their relationships to underlying indices of amino acid molecular descriptors.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>We demonstrate the three fundamental types of question that our software can address by taking as a specific example the connections between 49 measures of amino acid biophysical properties (e.g., size, charge and hydrophobicity), a generalized model of amino acid substitution (as represented by the PAM74-100 matrix), and the mutational distance that separates amino acids within the standard genetic code (i.e., the number of point mutations required for interconversion during protein evolution). We show that our software allows a user to recapture the insights from several key publications on these topics in just a few minutes.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>Our software facilitates rapid, interactive exploration of three interconnected topics: (i) the multidimensional molecular descriptors of the twenty proteinaceous amino acids, (ii) the correlation of these biophysical measurements with observed patterns of amino acid substitution, and (iii) the causal basis for differences between any two observed patterns of amino acid substitution. This software acts as an intuitive bioinformatic exploration tool that can guide more comprehensive statistical analyses relating to a diverse array of specific research questions.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Molecular biology has made great progress in observing and quantifying the patterns by which amino acids exchange for one another within protein sequences over time. A key motivation here has been to create amino acid substitution matrices (such as the PAM and BLOSUM matrix families), which lie at the heart of mainstream bioinformatics procedures, from algorithms that determine whether <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and how exactly <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> two proteins are homologous, to those that predict protein tertiary structure by comparison with known folds <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. However, these matrices represent generalized patterns of change "averaged" across all proteins: although they typically encompass the idea that patterns of substitution will vary with evolutionary distance, other systematic sources of variation are overlooked. An increasing literature supports the idea that this generalization may compromise the sensitivity of sequence comparison for various specialized subsets of proteins (e.g., for particular protein families <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>, or for genomes that have evolved under unusual mutation biases or selection regimes <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>). Thus a worthy challenge is to seek the underlying ontology that can link individually derived, specialized models of amino acid substitution into a common framework: if we can ultimately replace generalized patterns of observed change with a flexible, quantitative model of amino acid substitution, then this offers significant potential to increase the sophistication of standard bioinformatics procedures. 
Such research may in fact be viewed as a subset of current efforts to find a general, chemical ontology for bioactivity (e.g., <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>) where researchers face the same challenge of unifying diverse observations into a model that predicts molecular interactions from first principles.</p>
			<p>In this context, it has long been understood that amino acid substitution matrices reflect a combination of chemical and evolutionary factors: most intuitively the biophysical properties (known within chemical disciplines as "molecular descriptors") of the amino acids <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp> and the mutational distance of their encodings within the genetic code <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. However, establishing accurate, quantitative connections between the outcomes of molecular evolution and amino acids' molecular descriptors remains a complex issue under active research (e.g., <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>).</p>
			<p>To support such research, Nakai <it>et al</it>. created an innovative database, the AAindex <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, comprising both amino acid substitution matrices (20 &#215; 20 matrices in which each element reflects some measure of the exchangeability of a pair of amino acids) and amino acid indices (vectors of 20 elements, each element being a value that describes some physicochemical property, such as size or hydrophobicity, for one of the twenty amino acids encoded by the standard genetic code). In a later publication that expanded this database, Tomii and Kanehisa <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> suggested procedures for correlating any amino acid molecular descriptor with an observed exchange rate (e.g., a substitution matrix) and for clustering indices together by similarity.</p>
			<p>This latter technique of index clustering is especially useful when exploring the relationship between indices, given that properties of widespread interest have often been measured in many different ways by different researchers. (For example, the latest version of the AAindex database <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> contains 29 different measurements of a property whose description contains the term "hydrophobicity".) Moreover, this comparison allows easy visualization of non-intuitive correlations (e.g., hydrophobicity and volume). The authors applied similarity-based methods to their AAindex database to build a <it>minimum spanning tree</it>: a graph-theoretic structure that connects discrete elements together based on similarity, by minimizing the overall sum of the distances of the direct connections. The result is a data structure in which elements are grouped together based on similarity (a detailed description and justification is given in the work of Tomii and Kanehisa, who first applied this methodology to visualizing amino acid similarity <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>). This minimum spanning tree showed the underlying structure (clustering) for the 402 indices of their database. Since this time, numerous further indices and matrices have been developed: some have been incorporated into updates of the AAindex, while others remain isolated in the scientific literature (e.g., <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B25">25</abbr></abbrgrp>).</p>
			<p>Building on this work, we have developed free, user-friendly, publicly available web-based software that enables researchers to repeat and extend the ideas of Nakai <it>et al</it>. <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> and Tomii and Kanehisa <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> using interactive data visualization. We thus present the Amino Acid Explorer, a web tool that facilitates quantitative exploration of similarity between physicochemical properties of amino acids and their evolutionary dynamics. Our tool allows users to explore the similarity between any of the 83 matrices and any subset of the 494 indices housed by AAindex version 6.0, and to include any custom index or matrix (e.g., from recent scientific literature or from unpublished research, such as a matrix derived from an alignment of proteins in a particular functional class, or an index derived by combining several physicochemical properties). We have embedded this analysis tool within a comprehensive web context: both a moderated user forum <url>http://www.evolvingcode.net/forum/viewforum.php?f=24</url> in which to discuss problems, findings or questions and an open wiki <url>http://www.evolvingcode.net/index.php?page=Amino_Acid_Indices</url> in which the community of those researching the interface of biochemistry and protein evolution may contribute their knowledge.</p>
		</sec>
		<sec>
			<st>
				<p>Implementation</p>
			</st>
			<p>Our web tool, which may be accessed at <url>http://www.evolvingcode.net:8080/aaindex/</url>, comprises two major parts: one client side, one server side. The client side consists of the graphical interface that runs as a Java applet within a user's browser. The server side (residing on <url>http://www.evolvingcode.net</url>) is a web application that performs all computations on the data, and is part of a larger computational infrastructure created around the UMBC AAIndex database. Figure <figr fid="F1">1</figr> shows an overview of our tool's architecture. Additionally, a short description of the UMBC AAIndex database is located at the end of this section.</p>
			<fig id="F1">
				<title>
					<p>Figure 1</p>
				</title>
				<caption>
					<p>Overview of Amino Acid Explorer Architecture</p>
				</caption>
				<text>
					<p>Overview of Amino Acid Explorer Architecture.</p>
				</text>
				<graphic file="1471-2105-7-329-1"/>
			</fig>
			<sec>
				<st>
					<p>User interface and visualization</p>
				</st>
				<p>The user interface of our tool is a Java applet that runs in a user's browser. It allows the user to (i) select any subset of the AAIndex indices (or custom indices) to be clustered using the minimum spanning tree method, (ii) choose an appropriate distance calculation method (to be used during the spanning tree computation), and (iii) choose a matrix or matrices to compare with the indices of a spanning tree.</p>
				<p>Specifically, having built a spanning tree, the application can compute distances between all the indices in this tree and a user-defined matrix; it displays these distances by shading the elements of the spanning tree with a color-coded scale. Additionally, it can use a second color-coded scale to display which of two user-defined matrices each index of the spanning tree is closer to (in other words, what makes these two matrices different from one another in terms of the indices under consideration?).</p>
				<sec>
					<st>
						<p>Drawing the spanning tree</p>
					</st>
					<p>Graph drawing and visualization are currently open research topics in computer science <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Although an agreed method exists for creating the graph (calculating a spanning tree), finding an optimal spatial positioning for nodes and drawing edges in a readable way (e.g., grouping nodes that are directly connected together, while minimizing crossed edges) remain active areas of research. A large number of different software packages implement a variety of state-of-the-art graph drawing methods, which differ significantly in speed, quality of the drawing, and interactivity (i.e., allowing the user to influence the final shape of the graph being drawn). Our visualization tool uses a slightly modified form of the open-source package TouchGraph <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> to render the minimum spanning tree that was computed server-side. (Modifications to the original TouchGraph code are limited to changes that redefine the default parameters for flexibility of the edges, and minor modifications required to integrate the code into our applet.) A full description of TouchGraph can be found at its web site; in essence, it uses an iterative "force-based layout" algorithm (in which each node exerts a force that repels other nodes, while edges act like springs that can be compressed or stretched) to move, through a series of incremental improvements, from a random graph layout toward an optimal representation. The whole incremental process is visible, and the user can intervene at any point by dragging nodes to locations that seem better suited. In our application, this is most likely to be useful when users request a spanning tree for a large set of amino acid indices, under which conditions the force-based layout may become stuck at a local optimum, visible to the user as a representation in which one or a few key edges cross one another.</p>
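					<p>The idea of a force-based layout can be illustrated with a minimal Python sketch of a single iteration. This is not TouchGraph's code: the function name <it>layout_step</it>, the inverse-square repulsion, the linear (Hooke's law) spring, and all parameter values are assumptions chosen only to show the principle of repelling nodes and spring-like edges.</p>

```python
import math

def layout_step(pos, edges, repulsion=1.0, spring=0.1, rest_length=1.0, step=0.1):
    """Return new positions after one force-based iteration.

    pos   : dict mapping node name -> (x, y)
    edges : list of (u, v) pairs that act as springs
    All constants are illustrative assumptions, not TouchGraph defaults.
    """
    force = {n: [0.0, 0.0] for n in pos}
    nodes = sorted(pos)

    # Every pair of nodes repels with magnitude repulsion / d^2.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            f = repulsion / (d * d)
            force[u][0] += f * dx / d
            force[u][1] += f * dy / d
            force[v][0] -= f * dx / d
            force[v][1] -= f * dy / d

    # Every edge is a spring: stretched edges pull, compressed edges push.
    for u, v in edges:
        dx = pos[v][0] - pos[u][0]
        dy = pos[v][1] - pos[u][1]
        d = math.hypot(dx, dy) or 1e-9
        f = spring * (d - rest_length)
        force[u][0] += f * dx / d
        force[u][1] += f * dy / d
        force[v][0] -= f * dx / d
        force[v][1] -= f * dy / d

    # Move each node a small step along its net force.
    return {n: (pos[n][0] + step * force[n][0],
                pos[n][1] + step * force[n][1]) for n in pos}

start = {"a": (0.0, 0.0), "b": (10.0, 0.0)}
linked = layout_step(start, [("a", "b")])   # spring pulls the pair together
unlinked = layout_step(start, [])           # only repulsion acts
```

					<p>Repeating such a step until the positions stop changing gives the incremental behavior described above; because each step only follows local forces, the process can settle in a local optimum, which is why manual dragging of nodes remains useful.</p>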
				</sec>
				<sec>
					<st>
						<p>Visualizing distances between a matrix and a set of indices</p>
					</st>
					<p>Our application represents the distances between matrices and indices in two modes. In the first mode, each node in the spanning tree (representing a single amino acid index) is color-coded to represent its measured similarity to a single, user-defined reference matrix. The color scale runs from blue (most distant) to red (most similar). Distances are measured as described below. The second mode (<it>differential mode</it>) shows how two substitution matrices differ in terms of the amino acid indices of a spanning tree. This mode uses a color-coded scale to denote which of two matrices is closest to each node (index). In the figures shown here, the color scale is green (matrix 1) to brown (matrix 2) so as to avoid any confusion with Mode #1 described above. The degree of color saturation denotes the magnitude of the difference (i.e., strong colors indicate that the two matrices are very different in terms of this index).</p>
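					<p>The two coloring modes can be sketched in Python as simple linear color interpolation. The RGB values, the function names, the use of white as the neutral color, and the use of absolute correlation as the similarity score are all assumptions made for illustration; the tool's actual color computation is not specified here.</p>

```python
def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB triples; t is clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

# Illustrative RGB values (assumptions, not taken from the tool).
BLUE, RED = (0, 0, 255), (255, 0, 0)
GREEN, BROWN, WHITE = (0, 128, 0), (139, 69, 19), (255, 255, 255)

def similarity_color(similarity):
    """Mode 1: 0.0 (most distant) maps to blue, 1.0 (most similar) to red."""
    return lerp_color(BLUE, RED, similarity)

def differential_color(r1, r2):
    """Differential mode: hue says which matrix the index is closer to,
    saturation says by how much. r1 and r2 are correlation coefficients
    of one index with matrix 1 and matrix 2."""
    diff = abs(r1) - abs(r2)
    target = GREEN if diff > 0 else BROWN
    return lerp_color(WHITE, target, abs(diff))
```

					<p>Under these assumptions, an index that correlates equally with both matrices is drawn in the neutral color, and the color becomes more saturated as the two correlations diverge.</p>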
				</sec>
			</sec>
			<sec>
				<st>
					<p>Computations</p>
				</st>
				<p>All significant computation for this tool occurs on the server side, because it often involves most or all of the data stored in the database (thus transfer to a client-side applet could take prohibitively long for users with low-bandwidth connections).</p>
				<sec>
					<st>
						<p>Computation of a minimum spanning tree</p>
					</st>
					<p>The software calculates a minimum spanning tree using Prim's algorithm, as described by Cormen <it>et al</it>. <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Since this algorithm minimizes the total sum of distances between directly connected indices, the definition of distance here is of prime importance. Tomii and Kanehisa <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> used a statistical correlation measure between two indices (each is a vector of 20 numbers representing an amino acid property). Our software allows users to employ this metric, but also to explore another notion of distance, namely Euclidean distance (calculating the distance between two indices as the distance between two points in 20-dimensional space). This approach is often taken to compare normalized vectors in multidimensional spaces <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. More generally, whichever metric of distance is being used, our software allows users to restrict the set of amino acids that are taken into account when calculating distance (e.g., it is possible to consider only hydrophobic amino acids, or only those encoded by GC-rich codons).</p>
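					<p>The procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Java implementation: the function names, the toy four-element vectors (real indices have 20 elements), and the choice of 1 - |r| as the correlation-based distance are assumptions, since the text does not state how the correlation coefficient is turned into a distance.</p>

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (undefined for constant vectors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance(x, y):
    # Assumption: strongly (anti)correlated indices count as close.
    return 1.0 - abs(pearson(x, y))

def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def prim_mst(indices, distance, positions=None):
    """Prim's algorithm over a dict of name -> vector.

    positions optionally restricts the calculation to a subset of
    vector positions (i.e., a subset of amino acids).
    Returns a list of (parent, child, distance) edges.
    """
    names = sorted(indices)
    if positions is None:
        vecs = dict(indices)
    else:
        vecs = {k: [indices[k][i] for i in positions] for k in names}
    root = names[0]
    # best[k] = (distance from k to the tree, nearest node in the tree)
    best = {k: (distance(vecs[root], vecs[k]), root) for k in names[1:]}
    edges = []
    while best:
        child = min(best, key=lambda k: best[k][0])
        d, parent = best.pop(child)
        edges.append((parent, child, d))
        # The newly added node may offer a shorter link to remaining nodes.
        for k in best:
            dk = distance(vecs[child], vecs[k])
            if best[k][0] > dk:
                best[k] = (dk, child)
    return edges

# Made-up data for illustration only.
toy = {
    "size_a": [0.0, 0.2, 0.6, 1.0],
    "size_b": [0.1, 0.2, 0.7, 1.0],
    "charge": [1.0, 0.0, 1.0, 0.0],
}
tree = prim_mst(toy, euclidean_distance)
```

					<p>In this toy example the two size-like vectors end up directly connected, which is the kind of grouping by similarity that the spanning tree is intended to reveal. Passing <it>correlation_distance</it> or a list of positions changes only the distance calculation, not the tree-building step.</p>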
				</sec>
				<sec>
					<st>
						<p>Computation of distance between a matrix and a set of indices</p>
					</st>
					<p>In order to compute the distance between a matrix and a set of indices, our software uses the correlation method described by Tomii and Kanehisa <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. This method first converts each index (a vector of 20 values, one for each amino acid) into a matrix by calculating the simple arithmetic distance between each pair of amino acids, as defined by the index. It then calculates the correlation coefficient between this derived matrix and the substitution matrix. While the Euclidean distance method may be used to build a minimum spanning tree of indices, which have been normalized to facilitate direct comparison, this method would be inappropriate for matrix/index comparisons because matrix values have not been normalized (i.e., matrix elements may extend beyond the interval from 0 to 1, and thus the Euclidean distance between any one element of an index and elements of a matrix would be misleading). Linear normalization of matrix elements would itself be inappropriate, since many matrices, such as the PAM series, comprise values that are expressed in logarithmic units. Therefore, our software always uses the Tomii and Kanehisa method of simple correlation to compare a matrix with an index. If the user has selected only a subset of the 20 amino acids for tree building, then calculations of distance between a matrix and the indices of a spanning tree consider only the appropriate subset of matrix elements.</p>
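					<p>A minimal Python sketch of this index-to-matrix comparison is given below. Several details are assumptions made for illustration: the "simple arithmetic distance" is taken to be the absolute difference between two index values, the correlation is Pearson's r computed over each unordered pair of distinct amino acids, and the three-residue index and matrix are made-up toy data.</p>

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def index_to_matrix(index):
    """Turn an index (residue -> value) into pairwise absolute differences,
    keyed by alphabetically ordered residue pairs."""
    return {(a, b): abs(index[a] - index[b])
            for a, b in combinations(sorted(index), 2)}

def index_matrix_correlation(index, matrix, residues=None):
    """Correlate the difference matrix of an index with a substitution matrix.

    matrix   : dict keyed by alphabetically ordered residue pairs
    residues : optional subset of amino acids to consider
    """
    keep = sorted(index) if residues is None else sorted(residues)
    diff = index_to_matrix({r: index[r] for r in keep})
    pairs = sorted(diff)
    return pearson([diff[p] for p in pairs], [matrix[p] for p in pairs])

# Toy data (hypothetical values, three residues instead of twenty).
toy_index = {"A": 0.0, "D": 1.0, "V": 0.2}
toy_matrix = {("A", "D"): -2.0, ("A", "V"): 1.0, ("D", "V"): -1.0}
r = index_matrix_correlation(toy_index, toy_matrix)
```

					<p>For a matrix that scores similarity (higher values for more readily exchanged pairs), a relevant index is expected to give a negative coefficient under these assumptions, because larger property differences go with lower exchangeability; the magnitude of the coefficient is what indicates how closely the index tracks the matrix.</p>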
				</sec>
			</sec>
			<sec>
				<st>
					<p>UMBC AAIndex database</p>
				</st>
				<p>We created the UMBC version of the AAIndex database as a local version of the original AAindex data (created by GenomeNet Japan <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>) to facilitate the manipulations required by our interactive software. Specifically, our local implementation converted all data of the original AAindex to XML format, generated interfaces that enable precise local and remote access to all aspects of the database, and normalized all amino acid index data.</p>
				<p>XML is a standardized language that is designed to simplify sharing of information among independently created systems. In particular, it is easily readable by machines (there are many code libraries that allow access to XML data by programs written in almost any programming language), and thus facilitates conversions to other languages, both to formats that are intended to be read by humans (e.g., web pages or PDF files) and to other computer formats. Our UMBC AAIndex database allows direct user access via the internet either in "raw" form (plain XML data) or transformed to a web page that is designed to be easily read by a human. In the former capacity, our implementation of this database has been designed for simple access either by programs residing on our server or by simple HTTP requests from remote machines. When bandwidth for data transfer is an issue for some third-party users, our architecture also allows deployment of programs directly at the server for more direct access. Both of these latter points reflect our aim to assist other researchers who would like to expand and improve the functionality we offer for the AAindex data.</p>
				<p>The indices in the database have been normalized by linearly scaling all the values of each index from 0 (the smallest value of the original index) to 1 (the greatest value of the original index). This simplifies and makes more intuitive the comparison of values between different indices, which may originally have had values expressed using different units. (Note that this normalization does not influence the results obtained by the correlation coefficient method used by Tomii and Kanehisa <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, which may be reproduced exactly by our software in a matter of seconds.)</p>
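				<p>The normalization and its neutrality with respect to correlation can be checked with a short Python sketch. The function names and the example values are illustrative assumptions. The check relies on a general property of Pearson's correlation coefficient: it is unchanged when either variable is shifted by a constant and scaled by a positive factor, which is exactly what linear scaling to the interval from 0 to 1 does.</p>

```python
import math

def normalize(index):
    """Linearly rescale values so the smallest becomes 0 and the largest 1.
    Assumes the index is not constant (max differs from min)."""
    lo, hi = min(index), max(index)
    return [(v - lo) / (hi - lo) for v in index]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Two made-up indices expressed in different units.
a = [3.0, 10.0, 4.5, 8.0]
b = [-1.2, 0.4, 2.5, 0.9]

r_raw = pearson(a, b)
r_norm = pearson(normalize(a), normalize(b))
# r_raw and r_norm agree up to floating-point rounding.
```

				<p>Euclidean distance, by contrast, does depend on the scale of the values, which is why normalization matters when that metric is chosen for building the spanning tree.</p>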
			</sec>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<p>Here we present three simple, example analyses to illustrate the types of exploration that our software allows. Each illustrates a conceptually different question that the tool reduces to a simple "point and click" exercise. We have chosen to focus on the relationship between biophysical properties of amino acids, patterns of molecular evolution, and the structure of the standard genetic code. However, it would be trivial to find an equivalent set of example analyses that focused on protein folding or homology searching. Indeed, our visualization software can be used to investigate any area of bioinformatics that builds on understanding how amino acids' molecular descriptors influence the patterns by which amino acids substitute for one another during evolution.</p>
			<p>In Figure <figr fid="F2">2</figr>, we show an analysis (taking approximately 40 seconds to produce) in which we build a minimum spanning tree of indices relating to amino acid size, charge, and hydrophobicity. Interestingly, while measures of charge and size form coherent units (boxes A and B respectively), the more numerous measures of hydrophobicity form three major branches. Notably, index 388, Polar Requirement <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, is a measure of amino acid polarity that has been used extensively in developing evidence for the idea that the pattern by which amino acids were assigned to codons within the standard genetic code results from natural selection to minimize the change in amino acid hydrophobicity caused by point mutations <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>. Although this minimum spanning tree emphasizes the legitimacy of treating Polar Requirement as a measure of hydrophobicity (its authors originally introduced the metric as an estimate of steric affinities between nucleotides and amino acids <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>), the tripartite spanning tree for the concept of hydrophobicity illustrates the potential dangers of over-emphasizing any one measure of hydrophobicity. In this context, it is helpful to note that a second "branch" of amino acid hydrophobicity measures includes Kyte and Doolittle's <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> "hydropathy" (index 151), which is also strongly reflected by the codon assignments of the standard genetic code <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
			<fig id="F2">
				<title>
					<p>Figure 2</p>
				</title>
				<caption>
					<p>A minimum spanning tree of size, charge and hydrophobicity for the 20 amino acids of the standard genetic code</p>
				</caption>
				<text>
					<p><b>A minimum spanning tree of size, charge and hydrophobicity for the 20 amino acids of the standard genetic code</b>. Specifically, this tree is built from the 67 amino acid indices that contain the words "hydrop" and/or "polar," "size," "volume," "charge," and "electr" as part of their description. This includes most of the indices that relate to the general concepts of amino acid size, charge, and hydrophobicity. Boxes A and B represent "natural" clusters formed by the minimum spanning tree of charge and size, respectively.</p>
				</text>
				<graphic file="1471-2105-7-329-2"/>
			</fig>
			<p>In Figure <figr fid="F3">3</figr>, we show a second analysis (taking approximately 5 seconds to produce, given the tree of Figure <figr fid="F2">2</figr>) in which we measure the similarity of each index in our original minimum spanning tree to a classic amino acid substitution matrix: the PAM74-100 matrix <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Here we see that generally, measures of amino acid hydrophobicity correlate well with observed patterns of amino acid substitution, though interestingly, Polar Requirement is by no means the strongest of these (an observation pertinent to the debate over cause and effect of hydrophobicity as a dominant explanatory variable of generalized amino acid substitution patterns <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B10">10</abbr></abbrgrp>). Amino acid volume shows some correlation with substitution patterns, but charge (as measured by these indices) is by far the least related property. This provides a quick, empirical justification for the general patterns predicted, for example, by Grantham <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. It also matches analyses of which fundamental amino acid properties are reflected within the codon assignments of the standard genetic code <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B37">37</abbr></abbrgrp>.</p>
			<fig id="F3">
				<title>
					<p>Figure 3</p>
				</title>
				<caption>
					<p>The minimum spanning tree recolored to reflect distance to a PAM matrix</p>
				</caption>
				<text>
					<p><b>The minimum spanning tree recolored to reflect distance to a PAM matrix</b>. Specifically, the minimum spanning tree of size, charge, and hydrophobicity (Figure 2) is recolored to indicate the similarity of each amino acid index to the PAM74-100 amino acid substitution matrix [5].</p>
				</text>
				<graphic file="1471-2105-7-329-3"/>
			</fig>
			<p>In Figure <figr fid="F4">4</figr>, we show a further analysis (taking approximately 10 seconds in total to produce, given the tree of Figure <figr fid="F2">2</figr>) that explores how the PAM74-100 matrix differs from Fitch's matrix of "mutational distance between amino acids within the standard genetic code" <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> in terms of amino acid size, charge and hydrophobicity. We find that in general, measures of hydrophobicity and volume are closer to the PAM matrix (i.e., are more correlated with observed patterns of amino acid substitution), whereas the small cluster of amino acid indices relating to charge correlates more strongly with the genetic code-based matrix. On a simple level, this quick analysis shows that the standard genetic code does indeed contain an element of non-random codon assignments with respect to amino acid charge, as reported in an erratum by Haig and Hurst <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> that replaced their initial rejection of such a link <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. At a deeper level, these results are germane to debates over the flow of causality that links amino acid physicochemical properties to observed patterns of amino acid substitution within proteins &#8211; the mainstream view is that physicochemical properties dominate the pattern by which amino acids substitute for one another, particularly over large stretches of evolutionary time <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. However, there has been some debate as to whether (and to what extent) such patterns can be caused by neutral evolution that substituted amino acids based on their mutational proximity within the standard genetic code, given that the code is non-randomly organized with respect to key amino acid properties <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B41">41</abbr><abbr bid="B38">38</abbr></abbrgrp>. 
Our quick analysis indicates that physicochemical considerations are in fact more important to long-term protein evolution than can be explained by codon assignments (in that the physicochemical properties are more strongly correlated with observed substitution patterns than with mutational distance within the genetic code; i.e., physicochemical similarity comes to dominate patterns of substitution as evolution proceeds).</p>
			<fig id="F4">
				<title>
					<p>Figure 4</p>
				</title>
				<caption>
					<p>The minimum spanning tree recolored to show each index's similarity to one of two substitution matrices</p>
				</caption>
				<text>
					<p><b>The minimum spanning tree recolored to show each index's similarity to one of two substitution matrices</b>. Specifically, the spanning tree of size, charge, and hydrophobicity (Figure 2) is recolored to indicate whether each amino acid index is more highly correlated with the PAM74-100 amino acid substitution matrix (green) or a matrix of amino acids' proximity within the standard genetic code [8] (brown).</p>
				</text>
				<graphic file="1471-2105-7-329-4"/>
			</fig>
			<p>This same feature of the Amino Acid Explorer tool could equally well be used to quickly visualize which properties (and which amino acids) are responsible for the difference between any two substitution matrices (e.g., between a "generalized" or global model of amino acid substitution, as found in a PAM or BLOSUM matrix, and any observed pattern of interchange within a specific protein family or phyletic lineage).</p>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>In this paper, we present software that facilitates rapid, interactive exploration of data pertaining to three interconnected topics: (i) the multidimensional molecular descriptors of biochemical properties for the twenty proteinaceous amino acids, (ii) the correlation of these biophysical measurements with observed patterns of amino acid substitution (i.e., substitution matrices), and (iii) the causal, biochemical basis for differences between any two observed patterns of amino acid substitution. This software acts as an intuitive bioinformatic exploration tool that can guide more comprehensive statistical analyses relating to a diverse array of specific research questions.</p>
		</sec>
		<sec>
			<st>
				<p>Availability and requirements</p>
			</st>
			<p>Project name: Amino Acid Explorer</p>
			<p>Project home page: <url>http://www.evolvingcode.net:8080/aaindex/tools/</url></p>
			<p>Operating system(s): Platform independent</p>
			<p>Programming language: Java</p>
			<p>Other requirements:</p>
			<p>&#8226; Use via EvolvingCode's website</p>
			<p>&#9675; Web browser (tested with Internet Explorer, Netscape and Mozilla under Windows and Linux, Safari under Mac OS X 10.3.9)</p>
			<p>&#9675; Java 1.4.2 plug-in for the web browser (or higher version)</p>
			<p>&#8226; Full installation on an independent server</p>
			<p>&#9675; Java 1.4.2 plug-in for the web browser (or higher version) on the client side</p>
			<p>&#9675; JDK 1.4.2 environment on the server</p>
			<p>&#9675; XML Database compliant with XML:DB API (tested with eXist database)</p>
			<p>&#9675; Servlet Web Container matching Servlet API 2.4 specifications (tested with Tomcat 5.0.28)</p>
			<p>&#9675; Xalan XSLT processor</p>
			<p>License: Apache-style open source license</p>
			<p>Any restrictions to use by non-academics: None</p>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p><b>BB</b> created the local implementation of the AAindex database, including XML schemas, coded the spanning tree software, and wrote the computer science aspects of this paper. <b>SJF</b> conceived the software, supervised software development, and wrote the biological portions of this paper. <b>MdJ</b> supervised and provided technical expertise for the computer science involved in this project.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>The authors would like to thank the members of their research groups (Freeland Lab and MAPLE Lab) for their comments and support. This work was funded in part by NSF grant <it>DBI-0317349-001</it>. The tool described here contains software developed by TouchGraph LLC <url>http://www.touchgraph.com/</url>.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Performance evaluation of amino acid substitution matrices</p>
				</title>
				<aug>
					<au>
						<snm>Henikoff</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Henikoff</snm>
						<fnm>JG</fnm>
					</au>
				</aug>
				<source>Proteins</source>
				<pubdate>1993</pubdate>
				<volume>17</volume>
				<fpage>49</fpage>
				<lpage>61</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8234244</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Multiple sequence alignment with Clustal X</p>
				</title>
				<aug>
					<au>
						<snm>Jeanmougin</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Thompson</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Gouy</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Higgins</snm>
						<fnm>DG</fnm>
					</au>
					<au>
						<snm>Gibson</snm>
						<fnm>TJ</fnm>
					</au>
				</aug>
				<source>Trends Biochem Sci</source>
				<pubdate>1998</pubdate>
				<volume>23</volume>
				<fpage>403</fpage>
				<lpage>405</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9810230</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Assessment of predictions submitted for the CASP6 comparative modelling category</p>
				</title>
				<aug>
					<au>
						<snm>Tress</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Ezkurdia</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Grana</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Lopez</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Proteins</source>
				<pubdate>2005</pubdate>
				<inpress/>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Fold-specific substitution matrices for protein classification</p>
				</title>
				<aug>
					<au>
						<snm>Vilim</snm>
						<fnm>RB</fnm>
					</au>
					<au>
						<snm>Cunningham</snm>
						<fnm>RM</fnm>
					</au>
					<au>
						<snm>Lu</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Kheradpour</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Stevens</snm>
						<fnm>FJ</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<fpage>847</fpage>
				<lpage>853</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">14764567</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Enriching the sequence substitution matrix by structural information</p>
				</title>
				<aug>
					<au>
						<snm>Teodorescu</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Galor</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Pillardy</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Elber</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Proteins</source>
				<pubdate>2004</pubdate>
				<volume>54</volume>
				<fpage>41</fpage>
				<lpage>48</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">14705022</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Construction of non-symmetric substitution matrices derived from proteomes with biased amino acid distributions</p>
				</title>
				<aug>
					<au>
						<snm>Bastien</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Roy</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Marechal</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>C R Biol</source>
				<pubdate>2005</pubdate>
				<volume>328</volume>
				<fpage>445</fpage>
				<lpage>453</lpage>
				<xrefbib>
					<pubid idtype="pmpid">15948633</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>A mutation data matrix for transmembrane proteins</p>
				</title>
				<aug>
					<au>
						<snm>Jones</snm>
						<fnm>DT</fnm>
					</au>
					<au>
						<snm>Taylor</snm>
						<fnm>WR</fnm>
					</au>
					<au>
						<snm>Thornton</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>FEBS Letters</source>
				<pubdate>1994</pubdate>
				<volume>339</volume>
				<fpage>269</fpage>
				<lpage>275</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">8112466</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>BATMAS30: amino acid substitution matrix for alignment of bacterial transporters</p>
				</title>
				<aug>
					<au>
						<snm>Sutormin</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Rakhmaninova</snm>
						<fnm>AB</fnm>
					</au>
					<au>
						<snm>Gelfand</snm>
						<fnm>MS</fnm>
					</au>
				</aug>
				<source>Proteins</source>
				<pubdate>2003</pubdate>
				<volume>51</volume>
				<fpage>85</fpage>
				<lpage>95</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12596266</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Analysis of differences in amino acid substitution patterns, using multilevel G-tests</p>
				</title>
				<aug>
					<au>
						<snm>Pacholczyk</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kimmel</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>C R Biol</source>
				<pubdate>2005</pubdate>
				<volume>328</volume>
				<fpage>632</fpage>
				<lpage>641</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">15992746</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions</p>
				</title>
				<aug>
					<au>
						<snm>Yu</snm>
						<fnm>YK</fnm>
					</au>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>21</volume>
				<fpage>902</fpage>
				<lpage>911</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">15509610</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Model of amino acid substitution in proteins encoded by mitochondrial DNA</p>
				</title>
				<aug>
					<au>
						<snm>Adachi</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Hasegawa</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>1996</pubdate>
				<volume>42</volume>
				<fpage>459</fpage>
				<lpage>468</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">8642615</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>CO: A chemical ontology for identification of functional groups and semantic comparison of small molecules</p>
				</title>
				<aug>
					<au>
						<snm>Feldman</snm>
						<fnm>HJ</fnm>
					</au>
					<au>
						<snm>Dumontier</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Ling</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Haider</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Hogue</snm>
						<fnm>CW</fnm>
					</au>
				</aug>
				<source>FEBS Letters</source>
				<pubdate>2005</pubdate>
				<volume>579</volume>
				<fpage>4685</fpage>
				<lpage>4691</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">16098521</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Chemogenomic profiling: Identifying the functional interactions of small molecules in yeast</p>
				</title>
				<aug>
					<au>
						<snm>Giaever</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Flaherty</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Kumm</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Proctor</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Nislow</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Jaramillo</snm>
						<fnm>DF</fnm>
					</au>
					<au>
						<snm>Chu</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Jordan</snm>
						<fnm>MI</fnm>
					</au>
					<au>
						<snm>Arkin</snm>
						<fnm>AP</fnm>
					</au>
					<au>
						<snm>Davis</snm>
						<fnm>RW</fnm>
					</au>
				</aug>
				<source>PNAS</source>
				<pubdate>2004</pubdate>
				<volume>101</volume>
				<fpage>793</fpage>
				<lpage>798</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">321760</pubid>
						<pubid idtype="pmpid" link="fulltext">14718668</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks</p>
				</title>
				<aug>
					<au>
						<snm>di Bernardo</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Thompson</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Gardner</snm>
						<fnm>TS</fnm>
					</au>
					<au>
						<snm>Chobot</snm>
						<fnm>SE</fnm>
					</au>
					<au>
						<snm>Eastwood</snm>
						<fnm>EL</fnm>
					</au>
					<au>
						<snm>Wojtovich</snm>
						<fnm>AP</fnm>
					</au>
					<au>
						<snm>Elliott</snm>
						<fnm>SJ</fnm>
					</au>
					<au>
						<snm>Schaus</snm>
						<fnm>SE</fnm>
					</au>
					<au>
						<snm>Collins</snm>
						<fnm>JJ</fnm>
					</au>
				</aug>
				<source>Nature Biotechnology</source>
				<pubdate>2005</pubdate>
				<volume>23</volume>
				<fpage>377</fpage>
				<lpage>383</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">15765094</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Amino acid difference formula to help explain protein evolution</p>
				</title>
				<aug>
					<au>
						<snm>Grantham</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1974</pubdate>
				<volume>185</volume>
				<fpage>862</fpage>
				<lpage>864</lpage>
				<xrefbib>
					<pubid idtype="pmpid">4843792</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Amino acid substitution during functionally constrained divergent evolution of protein sequences</p>
				</title>
				<aug>
					<au>
						<snm>Benner</snm>
						<fnm>SA</fnm>
					</au>
					<au>
						<snm>Cohen</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Gonnet</snm>
						<fnm>GH</fnm>
					</au>
				</aug>
				<source>Protein Eng</source>
				<pubdate>1994</pubdate>
				<volume>7</volume>
				<fpage>1323</fpage>
				<lpage>1332</lpage>
			</bibl>
			<bibl id="B17">
				<title>
					<p>An improved method of testing for evolutionary homology</p>
				</title>
				<aug>
					<au>
						<snm>Fitch</snm>
						<fnm>WM</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1966</pubdate>
				<volume>16</volume>
				<fpage>9</fpage>
				<lpage>16</lpage>
				<xrefbib>
					<pubid idtype="pmpid">5917736</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Empirical codon substitution matrix</p>
				</title>
				<aug>
					<au>
						<snm>Schneider</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Cannarozzi</snm>
						<fnm>GM</fnm>
					</au>
					<au>
						<snm>Gonnet</snm>
						<fnm>GH</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>134</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1173088</pubid>
						<pubid idtype="pmpid" link="fulltext">15927081</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>SimFold energy function for de novo protein structure prediction: Consensus with Rosetta</p>
				</title>
				<aug>
					<au>
						<snm>Fujitsuka</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Chikenji</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Takada</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Proteins</source>
				<pubdate>2005</pubdate>
				<inpress/>
			</bibl>
			<bibl id="B20">
				<title>
					<p>The exchangeability of amino acids in proteins</p>
				</title>
				<aug>
					<au>
						<snm>Yampolsky</snm>
						<fnm>LY</fnm>
					</au>
					<au>
						<snm>Stoltzfus</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>2005</pubdate>
				<volume>170</volume>
				<fpage>1459</fpage>
				<lpage>1472</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">15944362</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Amino acid similarity matrices based on force fields</p>
				</title>
				<aug>
					<au>
						<snm>Dosztanyi</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Torda</snm>
						<fnm>AE</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2001</pubdate>
				<volume>17</volume>
				<fpage>686</fpage>
				<lpage>699</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11524370</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Cluster analysis of amino acid indices for prediction of protein structure and function</p>
				</title>
				<aug>
					<au>
						<snm>Nakai</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Kidera</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Kanehisa</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Protein Eng</source>
				<pubdate>1988</pubdate>
				<volume>2</volume>
				<fpage>93</fpage>
				<lpage>100</lpage>
				<xrefbib>
					<pubid idtype="pmpid">3244698</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins</p>
				</title>
				<aug>
					<au>
						<snm>Tomii</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Kanehisa</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Protein Eng</source>
				<pubdate>1996</pubdate>
				<volume>9</volume>
				<fpage>27</fpage>
				<lpage>36</lpage>
				<xrefbib>
					<pubid idtype="pmpid">9053899</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>AAindex: amino acid index database</p>
				</title>
				<aug>
					<au>
						<snm>Kawashima</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Kanehisa</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2000</pubdate>
				<volume>28</volume>
				<fpage>374</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">102411</pubid>
						<pubid idtype="pmpid" link="fulltext">10592278</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Optimality of the genetic code with respect to protein stability and amino-acid frequencies</p>
				</title>
				<aug>
					<au>
						<snm>Gilis</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Massar</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Cerf</snm>
						<fnm>NJ</fnm>
					</au>
					<au>
						<snm>Rooman</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2001</pubdate>
				<volume>2</volume>
				<fpage>RESEARCH0049</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">60310</pubid>
						<pubid idtype="pmpid" link="fulltext">11737948</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<aug>
					<au>
						<snm>Tollis</snm>
						<fnm>IG</fnm>
					</au>
					<au>
						<snm>Tamassia</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Eades</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Di Battista</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Graph Drawing: Algorithms for the Visualization of Graphs</source>
				<publisher>Pearson Education</publisher>
				<pubdate>1998</pubdate>
			</bibl>
			<bibl id="B27">
				<title>
					<p>TouchGraph Website</p>
				</title>
				<url>http://www.touchgraph.com</url>
			</bibl>
			<bibl id="B28">
				<aug>
					<au>
						<snm>Cormen</snm>
						<fnm>TH</fnm>
					</au>
					<au>
						<snm>Leiserson</snm>
						<fnm>CE</fnm>
					</au>
					<au>
						<snm>Rivest</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Stein</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Introduction to Algorithms</source>
				<publisher>Cambridge, MA, London: The MIT Press; Boston, MA, Burr Ridge, IL, Dubuque, IA, Madison, WI, New York, NY, San Francisco, CA, St. Louis, MO, Montreal, Toronto: McGraw-Hill Book Company</publisher>
				<edition>Second</edition>
				<pubdate>2001</pubdate>
			</bibl>
			<bibl id="B29">
				<aug>
					<au>
						<snm>Mitchell</snm>
						<fnm>TM</fnm>
					</au>
				</aug>
				<source>Machine Learning</source>
				<publisher>McGraw-Hill Companies</publisher>
				<pubdate>1997</pubdate>
			</bibl>
			<bibl id="B30">
				<title>
					<p>AAindex Website</p>
				</title>
				<url>http://www.genome.ad.jp/dbget/aaindex.html</url>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Evolution of the genetic code</p>
				</title>
				<aug>
					<au>
						<snm>Woese</snm>
						<fnm>CR</fnm>
					</au>
				</aug>
				<source>Naturwissenschaften</source>
				<pubdate>1973</pubdate>
				<volume>60</volume>
				<fpage>447</fpage>
				<lpage>459</lpage>
				<xrefbib>
					<pubid idtype="pmpid">4588588</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>A quantitative measure of error minimization in the genetic code</p>
				</title>
				<aug>
					<au>
						<snm>Haig</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Hurst</snm>
						<fnm>LD</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>1991</pubdate>
				<volume>33</volume>
				<fpage>412</fpage>
				<lpage>417</lpage>
				<xrefbib>
					<pubid idtype="pmpid">1960738</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>The genetic code is one in a million</p>
				</title>
				<aug>
					<au>
						<snm>Freeland</snm>
						<fnm>SJ</fnm>
					</au>
					<au>
						<snm>Hurst</snm>
						<fnm>LD</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>1998</pubdate>
				<volume>47</volume>
				<fpage>238</fpage>
				<lpage>248</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9732450</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>On the coevolution of genes and genetic code</p>
				</title>
				<aug>
					<au>
						<snm>Goodarzi</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Shateri Najafabadi</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Torabi</snm>
						<fnm>N</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>2005</pubdate>
				<volume>362</volume>
				<fpage>133</fpage>
				<lpage>140</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">16213111</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>The case for an Error Minimizing Standard Genetic Code</p>
				</title>
				<aug>
					<au>
						<snm>Freeland</snm>
						<fnm>SJ</fnm>
					</au>
					<au>
						<snm>Wu</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Keulmann</snm>
						<fnm>N</fnm>
					</au>
				</aug>
				<source>Orig Life Evol Biosph</source>
				<pubdate>2003</pubdate>
				<volume>33</volume>
				<fpage>457</fpage>
				<lpage>477</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">14604186</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>On the fundamental nature and evolution of the genetic code</p>
				</title>
				<aug>
					<au>
						<snm>Woese</snm>
						<fnm>CR</fnm>
					</au>
					<au>
						<snm>Dugre</snm>
						<fnm>DH</fnm>
					</au>
					<au>
						<snm>Saxinger</snm>
						<fnm>WC</fnm>
					</au>
					<au>
						<snm>Dugre</snm>
						<fnm>SA</fnm>
					</au>
				</aug>
				<source>Cold Spring Harb Symp Quant Biol</source>
				<pubdate>1966</pubdate>
				<volume>31</volume>
				<fpage>723</fpage>
				<lpage>736</lpage>
				<xrefbib>
					<pubid idtype="pmpid">5237212</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<title>
					<p>A simple measure for displaying the hydropathic character of a protein</p>
				</title>
				<aug>
					<au>
						<snm>Kyte</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Doolittle</snm>
						<fnm>RF</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1982</pubdate>
				<volume>157</volume>
				<fpage>105</fpage>
				<lpage>132</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">7108955</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p>The origin of the genetic code cannot be studied using measurements based on the PAM matrix because this matrix reflects the code itself, making any such analyses tautologous</p>
				</title>
				<aug>
					<au>
						<snm>Di Giulio</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J Theor Biol</source>
				<pubdate>2001</pubdate>
				<volume>208</volume>
				<fpage>141</fpage>
				<lpage>144</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11162059</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B39">
				<title>
					<p>A statistical test of hypotheses on the organization and origin of the genetic code</p>
				</title>
				<aug>
					<au>
						<snm>Szathmary</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Zintzaras</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>1992</pubdate>
				<volume>35</volume>
				<fpage>185</fpage>
				<lpage>189</lpage>
				<xrefbib>
					<pubid idtype="pmpid">1518086</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B40">
				<title>
					<p>A quantitative measure of error minimization in the genetic code</p>
				</title>
				<aug>
					<au>
						<snm>Haig</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Hurst</snm>
						<fnm>LD</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>1999</pubdate>
				<volume>49</volume>
				<fpage>708</fpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10552053</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B41">
				<title>
					<p>On error minimization in a sequential origin of the standard genetic code</p>
				</title>
				<aug>
					<au>
						<snm>Ardell</snm>
						<fnm>DH</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>1998</pubdate>
				<volume>47</volume>
				<fpage>1</fpage>
				<lpage>13</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9664691</pubid>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
