<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2105-8-S4-S3</ui>
	<ji>1471-2105</ji>
	<fm>
		<dochead>Proceedings</dochead>
		<bibl>
			<title>
				<p>CORRIE: enzyme sequence annotation with confidence estimates</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Audit</snm>
					<fnm>Benjamin</fnm>
					<insr iid="I1"/>
					<email>benjamin.audit@ens-lyon.fr</email>
				</au>
				<au id="A2">
					<snm>Levy</snm>
					<mi>D</mi>
					<fnm>Emmanuel</fnm>
					<insr iid="I2"/>
					<insr iid="I3"/>
					<email>emmanuel.levy@gmail.com</email>
				</au>
				<au id="A3">
					<snm>Gilks</snm>
					<mi>R</mi>
					<fnm>Wally</fnm>
					<insr iid="I4"/>
					<insr iid="I5"/>
					<email>wally@maths.leeds.ac.uk</email>
				</au>
				<au id="A4">
					<snm>Goldovsky</snm>
					<fnm>Leon</fnm>
					<insr iid="I2"/>
					<insr iid="I6"/>
					<email>leongo@ebi.ac.uk</email>
				</au>
				<au id="A5" ca="yes">
					<snm>Ouzounis</snm>
					<mi>A</mi>
					<fnm>Christos</fnm>
					<insr iid="I2"/>
					<insr iid="I6"/>
					<insr iid="I7"/>
					<email>ouzounis@certh.gr</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS UMR5672, Ecole Normale Sup&#233;rieure, 46 All&#233;e d'Italie, F-69364 Lyon CEDEX 07, France</p>
				</ins>
				<ins id="I2">
					<p>Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK</p>
				</ins>
				<ins id="I3">
					<p>Current address: Computational Genomics Group, MRC Laboratory of Molecular Biology, Hills Rd, Cambridge CB2 2QH, UK</p>
				</ins>
				<ins id="I4">
					<p>Medical Research Council Biostatistics Unit, Institute of Public Health, Cambridge CB2 2SR, UK</p>
				</ins>
				<ins id="I5">
					<p>Current address: Department of Statistics, School of Mathematics, University of Leeds, Leeds LS2 9JT, UK</p>
				</ins>
				<ins id="I6">
					<p>Current address: Computational Genomics Unit, Center for Research &amp; Technology Hellas, PO Box 361, GR-57001 Thessalonica, Greece</p>
				</ins>
				<ins id="I7">
					<p>Current address: Institute of Agrobiotechnology, Center for Research &amp; Technology Hellas, PO Box 361, GR-57001 Thessalonica, Greece</p>
				</ins>
			</insg>
			<source>BMC Bioinformatics</source>
			<supplement>
				<title>
					<p>The Second Automated Function Prediction Meeting</p>
				</title>
				<editor>Ana PC Rodrigues, Barry J Grant, Adam Godzik and Iddo Friedberg</editor>
				<note>Proceedings</note>
				<url>http://www.biomedcentral.com/content/pdf/1471-2105-8-S4-info.pdf</url>
			</supplement>
			<conference>
				<title>
					<p>The Second Automated Function Prediction Meeting</p>
				</title>
				<location>La Jolla, CA, USA</location>
				<date-range>30 August &#8211; 1 September 2006</date-range>
				<url>http://BioFunctionPrediction.org/AFP/afp06</url>
			</conference>
			<issn>1471-2105</issn>
			<pubdate>2007</pubdate>
			<volume>8</volume>
			<issue>Suppl 4</issue>
			<fpage>S3</fpage>
			<url>http://www.biomedcentral.com/1471-2105/8/S4/S3</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">17570146</pubid><pubid idtype="doi">10.1186/1471-2105-8-S4-S3</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<pub>
				<date>
					<day>22</day>
					<month>5</month>
					<year>2007</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2007</year>
			<collab>Audit et al; licensee BioMed Central Ltd.</collab>
			<note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<p>Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method correctly re-annotates 91% of all Enzyme Classification (EC) classes (755 out of 827) with high coverage. Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes receive no assignment. We also show cases where the re-annotation procedure results in partial overlaps for those few enzyme classes where a certain inconsistency might appear between homologous proteins, mostly due to function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web server called CORRIE (Correspondence Indicator Estimation), which allows the interactive prediction of a functional class for putative enzymes from sequence alone. Predictions are supported by probabilistic measures in the context of the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at: <url>http://www.genomes.org/services/corrie/</url>.</p>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>The explosion of genome sequencing technologies has resulted in an ever-increasing gap between the discovery of new gene sequences and their experimental characterization. The accumulation of raw sequence data has dictated the use of computational techniques for the inference of their possible functional roles, based on the evolutionary conservation of structure and function. However, this widely used empirical process has not attracted sufficient attention as a fundamental problem in computational biology, requiring rigorous analysis.</p>
			<p>The typical solution to annotation transfer involves the inference of functional properties based on sequence similarity <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. This procedure can be divided into two steps: (i) the establishment of a list of proteins of known function and significant sequence similarity to the uncharacterized sequence <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>; (ii) the selection of those characterized sequences from which the annotation might be transferred <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The procedure relies on the assumption of a strong relationship between protein structure and function. Despite the fact that this hypothesis is strongly supported by various studies <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, there is concern that a blind application of such procedures usually leads to annotation errors <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. Two major types of errors can be made: (i) the short-listed homologous protein(s) have a different function from the query sequence (erroneous assignment, despite correct reference); (ii) the transferred annotations are incorrect (erroneous reference, despite correct assignment). The latter type followed by an iterative usage of annotation transfer results in the important problem of error propagation in annotated databases <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B9">9</abbr></abbrgrp>. Modeling studies have demonstrated that dramatic consequences on the reliability of database annotations can thus arise, with detrimental effects for the quality and integrity of reference databases <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. One of the challenges for future improvements is the association of function assignments with a measure of reliability that can control annotation quality <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, by excluding spurious annotations. 
Herein, we address this issue by analysing the Enzyme Classification (EC) hierarchy within a probabilistic framework for the process of homology-based annotation, as a follow-up of a previous theoretical study <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>.</p>
		</sec>
		<sec>
			<st>
				<p>Methods and results</p>
			</st>
			<p>Our approach relies on a reference dataset such as the EC hierarchy, where protein sequences are pre-classified into (an arbitrary number of) functional classes <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. An assignment corresponds to a membership in a functional class; thus, function sharing becomes an explicit property. The possibility for a protein to belong to a functional class is assessed based on its similarity relationships with all protein sequences that do or do not belong to that class. Most existing methods map functions to proteins by clustering proteins on sequence similarity, irrespective of any function sharing, and then compiling the functional descriptions available in the (most relevant) cluster(s) to annotate the uncharacterized sequence(s) <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. An innovative feature of our strategy is that individual sequences are mapped to functional classes, instead of individual functions being mapped to sequence classes <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>.</p>
			<p>We introduced Correspondence Indicators (CIs) as a novel measure to quantify the relationship between a protein sequence and a functional class. A CI results from the combination of pairwise similarity scores between a query sequence of interest and all the members of a functional class <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. In our implementation, we use the BLAST bit-scores as a measure of pairwise similarity <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, but other measures can also be used (Figure <figr fid="F1">1</figr>). Herein, we provide an analysis of the ENZYME database <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, examine likely sources of error and announce the interactive server CORRIE.</p>
			<fig id="F1">
				<title>
					<p>Figure 1</p>
				</title>
				<caption>
					<p>Schematic view of the CORRIE annotation framework</p>
				</caption>
				<text>
					<p><b>Schematic view of the CORRIE annotation framework. </b>The only requirement for CORRIE is a classification of sequences. Here, we start with the classification of enzymes found in SwissProt. This enables us to create two tables, one for sequences and one for classes. From pairwise sequence comparisons we derive a score table, which describes all the classes hit by each sequence. BLAST scores are further integrated into correspondence indicators (CIs), which describe the relationship each sequence has with the classes it hits. Next, CIs are integrated to compute the probability that a sequence belongs to a particular class. The table "CI reference" is central to the framework as it constitutes a reference against which new proteins are compared and classified. This is illustrated in Figure 2.</p>
				</text>
				<graphic file="1471-2105-8-S4-S3-1"/>
			</fig>
			<p>The databases used in the present work were the ENZYME database (date: 2006-07-12) <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> and UniProt/SwissProt (release 50.4, date: 2006-07-25; UniProtKB 8.4) <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. In total, we obtained 77,812 proteins annotated as enzymes and partitioned into 2,216 EC classes, of which 64,895 proteins in 827 classes were used: we excluded enzymes with more than one EC number and all EC classes with ten or fewer members, as reported previously <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. For sequence searches, we used BLAST (v.2.2.8) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> with a bit-score cut-off threshold of 30. To filter low-complexity regions, we used CAST <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, with a threshold value of 25. The new interactive version of the annotation framework is implemented with MySQL (v.4.1) <url>http://www.mysql.org</url>. All the results reported herein concern assignments (re-annotations) obtained with an assignment probability of one (P = 1) using the univariate method with &#945; &#8594; &#8734;, i.e. with a CI Y<sub>&#937;j</sub> reduced to the best BLAST hit of the query protein with class &#937;<sub>j</sub> <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> (for an example, see Figure <figr fid="F2">2</figr>). As discussed previously, the univariate method has a lower coverage than the multivariate framework <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>; yet, because it treats the assignment to each class independently, it is more robust with respect to query proteins having more than one EC number assignment.</p>
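			<p>As a minimal sketch of the univariate &#945; &#8594; &#8734; limit described above, the following Python fragment reduces a list of BLAST hits to one CI per class by keeping the best bit-score at or above the cut-off of 30. The function name, the tuple layout of the hits, the inclusive treatment of the cut-off and the example values are illustrative assumptions; they are not taken from the CORRIE code base.</p>

```python
from collections import defaultdict

BIT_SCORE_CUTOFF = 30.0  # cut-off quoted in the text; inclusive use is an assumption


def best_hit_ci(hits, cutoff=BIT_SCORE_CUTOFF):
    """Return {ec_class: best bit-score} for one query sequence.

    `hits` is an iterable of (ec_class, bit_score) pairs, one per BLAST hit
    against a reference protein of known class (hypothetical layout).
    """
    ci = defaultdict(float)
    for ec_class, bit_score in hits:
        if bit_score >= cutoff:  # discard hits below the cut-off
            ci[ec_class] = max(ci[ec_class], bit_score)
    return dict(ci)


# Made-up hits for a single query: two cellulase hits, one
# cellobiosidase hit and one hit that falls below the cut-off.
example_hits = [
    ("3.2.1.4", 210.0),
    ("3.2.1.4", 95.5),
    ("3.2.1.91", 120.0),
    ("2.7.7.6", 28.0),
]
print(best_hit_ci(example_hits))
```

			<p>In this reading, each class hit by the query is summarised by a single number, which is why the univariate treatment can consider every class independently.</p>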
			<fig id="F2">
				<title>
					<p>Figure 2</p>
				</title>
				<caption>
					<p>Illustration of the probability calculation implemented in CORRIE</p>
				</caption>
				<text>
					<p><b>Illustration of the probability calculation implemented in CORRIE</b>. To annotate a new sequence <it>s</it>, <it>s </it>is first aligned against all proteins in CORRIE. Here, <it>s </it>has similarity with proteins from two distinct classes: A and B. CIs between <it>s </it>and A, and between <it>s </it>and B, are calculated [10]. The probability that <it>s </it>belongs to A (i.e. that <it>s </it>has function A) is calculated by comparing the CI between <it>s </it>and A with the CIs of proteins that do or do not belong to A. In this case, the ten proteins closest to <it>s </it>in the CI space are shown in the red dotted rectangle. Since all ten proteins truly belong to A, CORRIE estimates the probability that <it>s </it>truly belongs to A as P = 1. For class B, none of the ten proteins closest to <it>s </it>in the CI space belongs to B. Therefore, CORRIE estimates the probability that <it>s </it>truly belongs to B as P = 0. In this case, <it>s </it>would be annotated as having function A with probability 1.</p>
				</text>
				<graphic file="1471-2105-8-S4-S3-2"/>
			</fig>
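			<p>The neighbourhood idea illustrated in Figure 2 can be sketched in a few lines of Python. This is only a nearest-neighbour reading of the figure, assuming a one-dimensional CI per class and a fixed neighbourhood of ten proteins; the actual estimator is defined in the earlier study <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, and the function name and toy data below are hypothetical.</p>

```python
def neighbour_probability(query_ci, reference, k=10):
    """Fraction of the k reference proteins nearest to query_ci that are class members.

    `reference` is a list of (ci_value, is_member) pairs for one class,
    where is_member says whether the protein truly belongs to that class.
    """
    if not reference:
        raise ValueError("reference set is empty")
    # Rank reference proteins by their distance to the query in CI space.
    nearest = sorted(reference, key=lambda item: abs(item[0] - query_ci))[:k]
    members = sum(1 for _, is_member in nearest if is_member)
    return members / len(nearest)


# Toy reference set: members of the class have high CIs, non-members low ones.
reference_a = [(200.0 + i, True) for i in range(10)]
reference_a += [(40.0 + i, False) for i in range(10)]

print(neighbour_probability(205.0, reference_a))  # all ten neighbours are members
print(neighbour_probability(42.0, reference_a))   # none of the ten neighbours is a member
```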
			<p>First, for comparison purposes, we followed the exact leave-one-out re-annotation scheme described previously <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, using the updated information for proteins and EC classes. The overall (mean) performance was somewhat improved. We were able to generate (at P = 1) 59,766 assignments for 59,746 proteins (coverage 92.1%); some proteins may have more than one assignment at P = 1. The number of annotation errors was 90, implying an error rate r = 0.15% (90 cases out of 59,766 assignments). Compared to our previous report <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, where we annotated 28,088 enzymes over 589 classes, we observe an increase in coverage (92.1% compared to 90.6%) and a significant decrease in error rate (0.15% compared to 0.21%), despite a more than two-fold increase in the size of the dataset.</p>
			<p>Second, we have investigated in more depth the sources of error, by examining the local (specific) error rates. More precisely, we consider the probability that a re-annotation is an error given the annotation made by our approach, regardless of the true class, i.e. P(annotation is wrong | annotation by CORRIE). This analysis can only be performed at the P = 1 level because there is not enough information at P levels &lt; 1 (due to the very high coverage of the database at P = 1). The results here are impressive: 799 (out of 827) classes have at least one assignment at level P = 1. For 755 of these classes, we did not observe any re-annotation error (again at P = 1). This corresponds to 51,131 out of 59,766 re-annotations, i.e. 86% of all assignments, with a specific error rate equal to zero. For the remaining 44 classes, at least one error is recorded, which leads to non-zero specific error rates. These non-zero error rates range across classes from 100% (1 error for 1 assignment) down to 0.24% (4 errors for 1,673 assignments). The highest error rate among classes with more than one error is 13.6% (3 errors for 22 assignments). We report all nine cases where the number of errors is more than one (Table <tblr tid="T1">1</tblr>). This information is also available on the web site, to help users assess annotation quality for specific classes in the EC hierarchy where the annotation process can be very challenging.</p>
			<tbl id="T1">
				<title>
					<p>Table 1</p>
				</title>
				<caption>
					<p>Local error rate per EC class, for those cases where there is more than one error.</p>
				</caption>
				<tblbdy cols="5">
					<r>
						<c ca="left">
							<p>
								<b>EC</b>
							</p>
						</c>
						<c ca="left">
							<p>
								<b>Errors</b>
							</p>
						</c>
						<c ca="left">
							<p>
								<b>Assignments</b>
							</p>
						</c>
						<c ca="left">
							<p>
								<b>Error %</b>
							</p>
						</c>
						<c ca="left">
							<p>
								<b>Description</b>
							</p>
						</c>
					</r>
					<r>
						<c cspan="5">
							<hr/>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>3.2.1.4</p>
						</c>
						<c ca="left">
							<p>3</p>
						</c>
						<c ca="left">
							<p>22</p>
						</c>
						<c ca="left">
							<p>13.64</p>
						</c>
						<c ca="left">
							<p>Cellulase</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>3.2.1.8</p>
						</c>
						<c ca="left">
							<p>3</p>
						</c>
						<c ca="left">
							<p>29</p>
						</c>
						<c ca="left">
							<p>10.34</p>
						</c>
						<c ca="left">
							<p>Endo-1,4-beta-xylanase</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>2.4.1.21</p>
						</c>
						<c ca="left">
							<p>4</p>
						</c>
						<c ca="left">
							<p>99</p>
						</c>
						<c ca="left">
							<p>4.04</p>
						</c>
						<c ca="left">
							<p>Starch synthase</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>1.6.5.3</p>
						</c>
						<c ca="left">
							<p>9</p>
						</c>
						<c ca="left">
							<p>457</p>
						</c>
						<c ca="left">
							<p>1.97</p>
						</c>
						<c ca="left">
							<p>NADH dehydrogenase (ubiquinone)</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>2.7.11.1</p>
						</c>
						<c ca="left">
							<p>14</p>
						</c>
						<c ca="left">
							<p>819</p>
						</c>
						<c ca="left">
							<p>1.71</p>
						</c>
						<c ca="left">
							<p>Non-specific Ser/Thr protein kinase</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>1.1.1.37</p>
						</c>
						<c ca="left">
							<p>2</p>
						</c>
						<c ca="left">
							<p>208</p>
						</c>
						<c ca="left">
							<p>0.96</p>
						</c>
						<c ca="left">
							<p>Malate dehydrogenase</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>3.6.3.14</p>
						</c>
						<c ca="left">
							<p>14</p>
						</c>
						<c ca="left">
							<p>1904</p>
						</c>
						<c ca="left">
							<p>0.74</p>
						</c>
						<c ca="left">
							<p>H<sup>+</sup>-transporting two-sector ATPase</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>4.2.1.33</p>
						</c>
						<c ca="left">
							<p>2</p>
						</c>
						<c ca="left">
							<p>310</p>
						</c>
						<c ca="left">
							<p>0.65</p>
						</c>
						<c ca="left">
							<p>3-isopropylmalate dehydratase</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>2.7.7.6</p>
						</c>
						<c ca="left">
							<p>4</p>
						</c>
						<c ca="left">
							<p>1673</p>
						</c>
						<c ca="left">
							<p>0.24</p>
						</c>
						<c ca="left">
							<p>DNA-directed RNA polymerase</p>
						</c>
					</r>
				</tblbdy>
				<tblfn>
					<p>Column names: EC &#8211; EC number assignment by CORRIE; Errors &#8211; number of errors assigned to this class; Assignments &#8211; total number of assignments to this class; Error % &#8211; the local error rate; Description &#8211; the description of the corresponding EC reaction.</p>
				</tblfn>
			</tbl>
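			<p>The local error rates in Table 1 are simple ratios of errors to assignments per assigned class. The short Python sketch below recomputes three of the tabulated values and the overall error rate from the counts quoted in the text; the helper name and the dictionary layout are illustrative assumptions.</p>

```python
def local_error_rate(errors, assignments):
    """Percentage of the assignments made to a class that are wrong."""
    if assignments == 0:
        raise ValueError("class has no assignments")
    return 100.0 * errors / assignments


# (errors, assignments) copied from three rows of Table 1.
table1_rows = {
    "3.2.1.4": (3, 22),
    "3.6.3.14": (14, 1904),
    "2.7.7.6": (4, 1673),
}
for ec, (errors, assignments) in table1_rows.items():
    print(ec, round(local_error_rate(errors, assignments), 2))

# Overall error rate at P = 1: 90 errors out of 59,766 assignments.
print(round(local_error_rate(90, 59766), 2))
```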
			<p>Third, we defined a distance measure in the re-annotation space in order to obtain a better understanding of the structure/function relationship for enzymes. This measure, denoted as &#948; (i &#8594; j) = N<sub>ij</sub>/N<sub>i</sub>, is the rate at which proteins that truly belong to class i are re-annotated to class j; N<sub>i</sub> is the number of proteins truly in class i, and N<sub>ij</sub> is the number of those proteins assigned to class j. Note that this measure is not symmetric, i.e. in general &#948; (i &#8594; j) &#8800; &#948; (j &#8594; i). For i = j, the &#948; measure provides a measure of recall, or in other words, it indicates whether there exists a high level of sequence specificity within class i. Typical examples of low recall in two large families are EC 1.10.2.2 (ubiquinol-cytochrome c reductase) <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, where &#948; = 13/89 (15%), and EC 3.2.1.4 (cellulase) <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, where &#948; = 19/104 (18%). For i &#8800; j, high values of the &#948; measure imply that errors are specifically made from class i to j (as opposed to errors randomly distributed over all classes). Hence, high values for &#948; (i &#8594; j) and &#948; (j &#8594; i) strongly suggest that merging the two classes would result in a much improved assignment of function based on sequence, or that those specific sequences do not contain enough information to distinguish the two enzymatic functions within the EC hierarchy. We report all six cases where the number of errors is more than two (Table <tblr tid="T2">2</tblr>), a surprisingly low number which demonstrates the high quality of assignments made by CORRIE in a series of control experiments.</p>
			<tbl id="T2">
				<title>
					<p>Table 2</p>
				</title>
				<caption>
					<p>Overlapping EC classes, for those cases where there are more than two errors from a true EC class to an assigned EC class.</p>
				</caption>
				<tblbdy cols="6">
					<r>
						<c ca="center">
							<p>
								<b>True EC</b>
							</p>
						</c>
						<c ca="center">
							<p>
								<b>Name of true class</b>
							</p>
						</c>
						<c ca="center">
							<p>
								<b>Assigned EC</b>
							</p>
						</c>
						<c ca="center">
							<p>
								<b>Name of assigned class</b>
							</p>
						</c>
						<c ca="center">
							<p>
								<b>Common activity</b>
							</p>
						</c>
						<c ca="center">
							<p>
								<b>Difference</b>
							</p>
						</c>
					</r>
					<r>
						<c cspan="6">
							<hr/>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>2.7.7.7</p>
						</c>
						<c ca="left">
							<p>DNA-directed DNA polymerase</p>
						</c>
						<c ca="left">
							<p>2.7.7.6</p>
						</c>
						<c ca="left">
							<p>DNA-directed RNA polymerase</p>
						</c>
						<c ca="left">
							<p>DNA-dependent nucleotidyltransferase</p>
						</c>
						<c ca="left">
							<p>Substrate: DNA or RNA</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>1.6.99.5</p>
						</c>
						<c ca="left">
							<p>NADH dehydrogenase (quinone)</p>
						</c>
						<c ca="left">
							<p>1.6.5.3</p>
						</c>
						<c ca="left">
							<p>NADH dehydrogenase (ubiquinone)</p>
						</c>
						<c ca="left">
							<p>NADH dehydrogenase</p>
						</c>
						<c ca="left">
							<p>Electron acceptor: quinone or ubiquinone</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>3.2.1.91</p>
						</c>
						<c ca="left">
							<p>Cellulose 1,4-beta-cellobiosidase</p>
						</c>
						<c ca="left">
							<p>3.2.1.4</p>
						</c>
						<c ca="left">
							<p>Cellulase</p>
						</c>
						<c ca="left">
							<p>Hydrolysis of 1,4-beta-D-glucosidic linkages</p>
						</c>
						<c ca="left">
							<p>Exo-hydrolysis or endo-hydrolysis</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>2.7.1.137</p>
						</c>
						<c ca="left">
							<p>Phosphatidylinositol 3-kinase</p>
						</c>
						<c ca="left">
							<p>2.7.11.1</p>
						</c>
						<c ca="left">
							<p>Non-specific Ser/Thr protein kinase</p>
						</c>
						<c ca="left">
							<p>Kinase</p>
						</c>
						<c ca="left">
							<p>Substrate: PI3 or Ser/Thr</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>2.4.1.242</p>
						</c>
						<c ca="left">
							<p>NDP-glucose &#8211; starch glucosyltransferase</p>
						</c>
						<c ca="left">
							<p>2.4.1.21</p>
						</c>
						<c ca="left">
							<p>Starch synthase</p>
						</c>
						<c ca="left">
							<p>Starch glucosyltransferase</p>
						</c>
						<c ca="left">
							<p>Substrate: NDP-glucose or ADP-glucose</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>3.6.3.15</p>
						</c>
						<c ca="left">
							<p>Sodium-transporting two-sector ATPase</p>
						</c>
						<c ca="left">
							<p>3.6.3.14</p>
						</c>
						<c ca="left">
							<p>H<sup>+</sup>-transporting two-sector ATPase</p>
						</c>
						<c ca="left">
							<p>Ion transporting two sector ATPase</p>
						</c>
						<c ca="left">
							<p>Ion specificity: Na<sup>+</sup> or H<sup>+</sup></p>
						</c>
					</r>
				</tblbdy>
				<tblfn>
					<p>Column names: True EC/Name &#8211; the real EC number/name; Assigned EC/Name &#8211; the assigned properties made by CORRIE; Common activity/Difference &#8211; similarities and differences of substrate specificity and mechanisms for the corresponding reaction pairs.</p>
				</tblfn>
			</tbl>
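			<p>The &#948; measure underlying Table 2 can be computed directly from a list of (true class, assigned class) pairs. The Python sketch below is one possible reading, assuming one entry per protein and using None for proteins that receive no assignment, so that N<sub>i</sub> still counts every protein truly in class i; the function name and the toy data are hypothetical.</p>

```python
from collections import Counter


def delta(pairs):
    """Return {(i, j): N_ij / N_i} from (true_class, assigned_class) pairs.

    An assigned_class of None marks a protein with no assignment; it still
    counts towards N_i but contributes to no N_ij.
    """
    pairs = list(pairs)
    n_i = Counter(true_class for true_class, _ in pairs)
    n_ij = Counter((t, a) for t, a in pairs if a is not None)
    return {(i, j): count / n_i[i] for (i, j), count in n_ij.items()}


# Toy data: ten proteins truly in class A and five truly in class B.
toy_pairs = (
    [("A", "A")] * 6      # correctly re-annotated
    + [("A", "B")] * 2    # re-annotated to the wrong class
    + [("A", None)] * 2   # no assignment
    + [("B", "B")] * 5
)
d = delta(toy_pairs)
print(d[("A", "A")])            # recall of class A
print(d[("A", "B")])            # rate of errors from A to B
print(d.get(("B", "A"), 0.0))   # the measure is not symmetric
```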
			<p>Finally, we have implemented this strategy as a MySQL-based web server called CORRIE, and we announce its availability for wider use by the community. The software requires a reference set of protein sequences, their association with a functional classification and an all-vs-all similarity table. Then, for any unclassified query sequence, CORRIE generates a probability for its membership in a functional class. CORRIE has been made accessible at <url>http://www.genomes.org/services/corrie/</url>; a downloadable version will follow soon. The format of the results is simple: for each query sequence, the user obtains the query sequence identifier, the original description (from the FASTA file format), an internal CORRIE protein identifier for retrieval purposes, the assignment probability, the predicted EC class, the EC description, and the local error rate for the specific class (as a guide for the quality of annotations) (Figure <figr fid="F1">1</figr>). The server provides all annotations obtained by CORRIE (including those with P &lt; 1). Users may also select different &#945; values and the multivariate framework. They can also browse through various results so that they can refine their assessment of annotation quality and generally explore structure/function relationships within the entire sequence space of proteins known to be associated with enzymatic functions.</p>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>We have previously developed a framework for the probabilistic annotation of enzymes into the functional classes of the EC hierarchy <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. We have now extended this work using a larger reference database, and have reduced the error rates significantly while maintaining a coverage of &gt;90%. We have also examined the local errors made in this assignment process and identified those EC classes more prone to non-specific structure/function relationships. Finally, we have made the system available as an interactive web server for the exploration of enzyme sequence space.</p>
			<p>It is interesting to note that most errors reported (Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr>) occur between closely related EC classes. This is particularly evident in cases where the similarity and difference of the function between overlapping classes are described (Table <tblr tid="T2">2</tblr>). In all six cases, the overall function remains the same while the difference lies in substrate specificity or the reaction mechanism. Recent studies have shown that substrate specificity in four of these twelve overlapping classes can be modulated with a small number of mutations. For instance, it has been reported recently that an RNA polymerase function was obtained from a DNA polymerase using <it>in vitro </it>compartmentalization, and a mutant with a single mutation was among the best mutants at synthesizing RNA <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Also, in the case of a transporting ATPase, a switch in transport specificity from H<sup>+ </sup>to Li<sup>+ </sup>was achieved by just four mutations <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
			<p>Beyond the issue of functional specificity, there is also an aspect of biological reality in the problematic cases, in terms of overlapping enzyme properties. In other words, these classes might represent activities that co-exist in the same enzyme. In the previous example of the DNA polymerase, it has also been reported that a mutant with just five mutations maintained a DNA polymerase activity, demonstrating that both these activities co-exist <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Also, in the case of glucanases, co-existence of endo- and exo-activities has been observed in some enzymes <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Finally, with starch glucosyltransferases, CORRIE annotates ADP-glucose specific enzymes as being NDP-glucose specific, which is less accurate yet valid.</p>
			<p>These examples illustrate the intricate nature of the sequence-function relationship found among those few cases that CORRIE fails to annotate correctly, and point to the limitation of using sequence similarity as a distance measure between enzymes. Therefore, we envisage implementing other methods in CORRIE in the near future. For example, the sequences within each class could be used to create one or more sequence profiles against which a new sequence could be aligned to produce an alternative CI measure, possibly focusing on key residues <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. This would increase the sensitivity and specificity to a point where these ambiguous classes can be detected accurately.</p>
			<p>One shortcoming of CORRIE, since it is based on the ENZYME database for validation purposes, is the implicit assumption that the query sequences are enzymes. A possible future development would be the explicit detection of enzyme sequences from similarity information. Schemes that address the issue of enzyme recognition have been proposed previously <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Such detection could be achieved by an all-vs-all comparison with the entire UniProt database, followed by classification using CORRIE. In that setting, hypothetical proteins that match known enzyme classes could readily be assigned to specific EC numbers, with the proper probabilistic measures attached to them. Currently, this is possible, but the error rate is certainly under-estimated. Finally, the extension to other classification schemes (and semantically richer formats) will facilitate the assignment of protein sequences to various aspects of biological function beyond the EC hierarchy.</p>
		</sec>
		<sec>
			<st>
				<p>Competing interests</p>
			</st>
			<p>The authors declare that they have no competing interests.</p>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>BA, LG and CAO participated in the design and coordination of the study. BA, LG and EDL developed the software code and the web site. All authors drafted the manuscript, and subsequently read and approved the final version.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>The CGU at CERTH is supported by the Networks of Excellence <it>BioSapiens </it>(contract number LSHG-CT-2003-503265) and <it>ENFIN </it>(LSHG-CT-2005-518254), both funded by the European Commission.</p>
				<p>This article has been published as part of <it>BMC Bioinformatics </it>Volume 8, Supplement 4, 2007: The Second Automated Function Prediction Meeting. The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1471-2105/8?issue=S4</url>.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Bioinformatics: from genome data to biological knowledge</p>
				</title>
				<aug>
					<au>
						<snm>Andrade</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Sander</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Curr Opin Biotechnol</source>
				<pubdate>1997</pubdate>
				<volume>8</volume>
				<fpage>675</fpage>
				<lpage>83</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0958-1669(97)80118-8</pubid>
						<pubid idtype="pmpid" link="fulltext">9425655</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>The past, present and future of genome-wide re-annotation</p>
				</title>
				<aug>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Karp</snm>
						<fnm>PD</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2002</pubdate>
				<volume>3</volume>
				<fpage>COMMENT2001</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">139008</pubid>
						<pubid idtype="pmpid" link="fulltext">11864365</pubid>
						<pubid idtype="doi">10.1186/gb-2002-3-2-comment2001</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>What we do not know about sequence analysis and sequence databases</p>
				</title>
				<aug>
					<au>
						<snm>Karp</snm>
						<fnm>PD</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>1998</pubdate>
				<volume>14</volume>
				<fpage>753</fpage>
				<lpage>4</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/14.9.753</pubid>
						<pubid idtype="pmpid" link="fulltext">10366280</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores</p>
				</title>
				<aug>
					<au>
						<snm>Wilson</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Kreychman</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Gerstein</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2000</pubdate>
				<volume>297</volume>
				<fpage>233</fpage>
				<lpage>49</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.2000.3550</pubid>
						<pubid idtype="pmpid" link="fulltext">10704319</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Whole-genome sequence annotation: 'Going wrong with confidence'</p>
				</title>
				<aug>
					<au>
						<snm>Kyrpides</snm>
						<fnm>NC</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Mol Microbiol</source>
				<pubdate>1999</pubdate>
				<volume>32</volume>
				<fpage>886</fpage>
				<lpage>7</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1046/j.1365-2958.1999.01380.x</pubid>
						<pubid idtype="pmpid" link="fulltext">10361291</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Predicting functions from protein sequences &#8211; where are the bottlenecks?</p>
				</title>
				<aug>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Nat Genet</source>
				<pubdate>1998</pubdate>
				<volume>18</volume>
				<fpage>313</fpage>
				<lpage>8</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/ng0498-313</pubid>
						<pubid idtype="pmpid" link="fulltext">9537411</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Intrinsic errors in genome annotation</p>
				</title>
				<aug>
					<au>
						<snm>Devos</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2001</pubdate>
				<volume>17</volume>
				<fpage>429</fpage>
				<lpage>31</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(01)02348-4</pubid>
						<pubid idtype="pmpid" link="fulltext">11485799</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Can sequence determine function?</p>
				</title>
				<aug>
					<au>
						<snm>Gerlt</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Babbitt</snm>
						<fnm>PC</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2000</pubdate>
				<volume>1</volume>
				<fpage>REVIEWS0005</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">138884</pubid>
						<pubid idtype="pmpid" link="fulltext">11178260</pubid>
						<pubid idtype="doi">10.1186/gb-2000-1-5-reviews0005</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Modeling the percolation of annotation errors in a database of protein sequences</p>
				</title>
				<aug>
					<au>
						<snm>Gilks</snm>
						<fnm>WR</fnm>
					</au>
					<au>
						<snm>Audit</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>De Angelis</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Tsoka</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<fpage>1641</fpage>
				<lpage>9</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/18.12.1641</pubid>
						<pubid idtype="pmpid" link="fulltext">12490449</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Probabilistic annotation of protein sequences based on functional classifications</p>
				</title>
				<aug>
					<au>
						<snm>Levy</snm>
						<fnm>ED</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Gilks</snm>
						<fnm>WR</fnm>
					</au>
					<au>
						<snm>Audit</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>302</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1361783</pubid>
						<pubid idtype="pmpid" link="fulltext">16354297</pubid>
						<pubid idtype="doi">10.1186/1471-2105-6-302</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Automatic annotation of protein function based on family identification</p>
				</title>
				<aug>
					<au>
						<snm>Abascal</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Proteins</source>
				<pubdate>2003</pubdate>
				<volume>53</volume>
				<fpage>683</fpage>
				<lpage>92</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/prot.10449</pubid>
						<pubid idtype="pmpid" link="fulltext">14579359</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Statistically rigorous automated protein annotation</p>
				</title>
				<aug>
					<au>
						<snm>Krebs</snm>
						<fnm>WG</fnm>
					</au>
					<au>
						<snm>Bourne</snm>
						<fnm>PE</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<fpage>1066</fpage>
				<lpage>73</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bth039</pubid>
						<pubid idtype="pmpid" link="fulltext">14764575</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Adaptive algorithm of automated annotation</p>
				</title>
				<aug>
					<au>
						<snm>Leontovich</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Brodsky</snm>
						<fnm>LI</fnm>
					</au>
					<au>
						<snm>Drachev</snm>
						<fnm>VA</fnm>
					</au>
					<au>
						<snm>Nikolaev</snm>
						<fnm>VK</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<fpage>838</fpage>
				<lpage>44</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/18.6.838</pubid>
						<pubid idtype="pmpid" link="fulltext">12075019</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Basic local alignment search tool</p>
				</title>
				<aug>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Gish</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Myers</snm>
						<fnm>EW</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1990</pubdate>
				<volume>215</volume>
				<fpage>403</fpage>
				<lpage>10</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">2231712</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>The ENZYME database in 2000</p>
				</title>
				<aug>
					<au>
						<snm>Bairoch</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2000</pubdate>
				<volume>28</volume>
				<fpage>304</fpage>
				<lpage>5</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">102465</pubid>
						<pubid idtype="pmpid" link="fulltext">10592255</pubid>
						<pubid idtype="doi">10.1093/nar/28.1.304</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>The Universal Protein Resource (UniProt): an expanding universe of protein information</p>
				</title>
				<aug>
					<au>
						<snm>Wu</snm>
						<fnm>CH</fnm>
					</au>
					<au>
						<snm>Apweiler</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Bairoch</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Natale</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Barker</snm>
						<fnm>WC</fnm>
					</au>
					<au>
						<snm>Boeckmann</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Ferro</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Gasteiger</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Huang</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Lopez</snm>
						<fnm>R</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2006</pubdate>
				<volume>34</volume>
				<fpage>D187</fpage>
				<lpage>91</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1347523</pubid>
						<pubid idtype="pmpid" link="fulltext">16381842</pubid>
						<pubid idtype="doi">10.1093/nar/gkj161</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>CAST: an iterative algorithm for the complexity analysis of sequence tracts</p>
				</title>
				<aug>
					<au>
						<snm>Promponas</snm>
						<fnm>VJ</fnm>
					</au>
					<au>
						<snm>Enright</snm>
						<fnm>AJ</fnm>
					</au>
					<au>
						<snm>Tsoka</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Kreil</snm>
						<fnm>DP</fnm>
					</au>
					<au>
						<snm>Leroy</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Hamodrakas</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Sander</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2000</pubdate>
				<volume>16</volume>
				<fpage>915</fpage>
				<lpage>22</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/16.10.915</pubid>
						<pubid idtype="pmpid" link="fulltext">11120681</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Puzzling subunits of mitochondrial cytochrome reductase</p>
				</title>
				<aug>
					<au>
						<snm>Weiss</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Leonard</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Neupert</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>Trends Biochem Sci</source>
				<pubdate>1990</pubdate>
				<volume>15</volume>
				<fpage>178</fpage>
				<lpage>80</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0968-0004(90)90155-5</pubid>
						<pubid idtype="pmpid">2163130</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Cellulose, cellulases and cellulosomes</p>
				</title>
				<aug>
					<au>
						<snm>Bayer</snm>
						<fnm>EA</fnm>
					</au>
					<au>
						<snm>Chanzy</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Lamed</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Shoham</snm>
						<fnm>Y</fnm>
					</au>
				</aug>
				<source>Curr Opin Struct Biol</source>
				<pubdate>1998</pubdate>
				<volume>8</volume>
				<fpage>548</fpage>
				<lpage>57</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0959-440X(98)80143-7</pubid>
						<pubid idtype="pmpid" link="fulltext">9818257</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Directed evolution of DNA polymerase, RNA polymerase and reverse transcriptase activity in a single polypeptide</p>
				</title>
				<aug>
					<au>
						<snm>Ong</snm>
						<fnm>JL</fnm>
					</au>
					<au>
						<snm>Loakes</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Jaroslawski</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Too</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Holliger</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2006</pubdate>
				<volume>361</volume>
				<fpage>537</fpage>
				<lpage>50</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/j.jmb.2006.06.050</pubid>
						<pubid idtype="pmpid" link="fulltext">16859707</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Changing the ion binding specificity of the <it>Escherichia coli </it>H(+)-transporting ATP synthase by directed mutagenesis of subunit c</p>
				</title>
				<aug>
					<au>
						<snm>Zhang</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Fillingame</snm>
						<fnm>RH</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1995</pubdate>
				<volume>270</volume>
				<fpage>87</fpage>
				<lpage>93</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.270.1.87</pubid>
						<pubid idtype="pmpid" link="fulltext">7814424</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Structural basis for the exocellulase activity of the cellobiohydrolase CbhA from <it>Clostridium thermocellum </it></p>
				</title>
				<aug>
					<au>
						<snm>Schubot</snm>
						<fnm>FD</fnm>
					</au>
					<au>
						<snm>Kataeva</snm>
						<fnm>IA</fnm>
					</au>
					<au>
						<snm>Chang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Shah</snm>
						<fnm>AK</fnm>
					</au>
					<au>
						<snm>Ljungdahl</snm>
						<fnm>LG</fnm>
					</au>
					<au>
						<snm>Rose</snm>
						<fnm>JP</fnm>
					</au>
					<au>
						<snm>Wang</snm>
						<fnm>BC</fnm>
					</au>
				</aug>
				<source>Biochemistry</source>
				<pubdate>2004</pubdate>
				<volume>43</volume>
				<fpage>1163</fpage>
				<lpage>70</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1021/bi030202i</pubid>
						<pubid idtype="pmpid" link="fulltext">14756552</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>A method to predict functional residues in proteins</p>
				</title>
				<aug>
					<au>
						<snm>Casari</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Sander</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Valencia</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Nat Struct Biol</source>
				<pubdate>1995</pubdate>
				<volume>2</volume>
				<fpage>171</fpage>
				<lpage>8</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nsb0295-171</pubid>
						<pubid idtype="pmpid">7749921</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>An evolutionary trace method defines binding surfaces common to protein families</p>
				</title>
				<aug>
					<au>
						<snm>Lichtarge</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Bourne</snm>
						<fnm>HR</fnm>
					</au>
					<au>
						<snm>Cohen</snm>
						<fnm>FE</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1996</pubdate>
				<volume>257</volume>
				<fpage>342</fpage>
				<lpage>58</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.1996.0167</pubid>
						<pubid idtype="pmpid" link="fulltext">8609628</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Prediction of enzyme classification from protein sequence without the use of sequence similarity</p>
				</title>
				<aug>
					<au>
						<snm>des Jardins</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Karp</snm>
						<fnm>PD</fnm>
					</au>
					<au>
						<snm>Krummenacker</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>TJ</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Proc Int Conf Intell Syst Mol Biol</source>
				<pubdate>1997</pubdate>
				<volume>5</volume>
				<fpage>92</fpage>
				<lpage>9</lpage>
				<xrefbib>
					<pubid idtype="pmpid">9322021</pubid>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
