<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-7-484</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>A statistical score for assessing the quality of multiple sequence alignments</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Ahola</snm>
               <fnm>Virpi</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>virpi.ahola@mtt.fi</email>
            </au>
            <au id="A2">
               <snm>Aittokallio</snm>
               <fnm>Tero</fnm>
               <insr iid="I3"/>
               <insr iid="I6"/>
               <email>tero.aittokallio@utu.fi</email>
            </au>
            <au id="A3">
               <snm>Vihinen</snm>
               <fnm>Mauno</fnm>
               <insr iid="I4"/>
               <insr iid="I5"/>
               <email>mauno.vihinen@uta.fi</email>
            </au>
            <au id="A4">
               <snm>Uusipaikka</snm>
               <fnm>Esa</fnm>
               <insr iid="I2"/>
               <email>esa.uusipaikka@utu.fi</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Biotechnology and Food Research, MTT Agrifood Research Finland, Jokioinen, Finland</p>
            </ins>
            <ins id="I2">
               <p>Department of Statistics, University of Turku, Turku, Finland</p>
            </ins>
            <ins id="I3">
               <p>Department of Mathematics, University of Turku, Turku, Finland</p>
            </ins>
            <ins id="I4">
               <p>Institute of Medical Technology, University of Tampere, Tampere, Finland</p>
            </ins>
            <ins id="I5">
               <p>Research Unit, Tampere University Hospital, Tampere, Finland</p>
            </ins>
            <ins id="I6">
               <p>Systems Biology Unit, Institut Pasteur, Paris, France</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>484</fpage>
         <url>http://www.biomedcentral.com/1471-2105/7/484</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17081313</pubid>
               <pubid idtype="doi">10.1186/1471-2105-7-484</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>11</day>
               <month>4</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>03</day>
               <month>11</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>03</day>
               <month>11</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Ahola et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Multiple sequence alignment is the foundation of many important applications in bioinformatics that aim at detecting functionally important regions, predicting protein structures, building phylogenetic trees etc. Although the automatic construction of a multiple sequence alignment for a set of remotely related sequences remains a very challenging and error-prone task, many downstream analyses still rely heavily on the accuracy of the alignments.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>To address the need for an objective evaluation framework, we introduce a statistical score that assesses the quality of a given multiple sequence alignment. The quality assessment is based on counting the number of significantly conserved positions in the alignment using the importance sampling method in conjunction with a statistical profile analysis framework. We first evaluate a novel objective function used in the alignment quality score for measuring the positional conservation. The results for the Src homology 2 (SH2) domain, Ras-like proteins, peptidase M13, subtilase and <it>&#946;</it>-lactamase families demonstrate that the score can distinguish sequence patterns with different degrees of conservation. Secondly, we evaluate the quality of the alignments produced by several widely used multiple sequence alignment programs using the novel alignment quality score and the commonly used sum of pairs method. According to these results, the Mafft strategy L-INS-i outperforms the other methods, although the differences between Probcons, TCoffee and Muscle are mostly insignificant. The novel alignment quality score provides results similar to those of the sum of pairs method.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The results indicate that the proposed statistical score is useful in assessing the quality of multiple sequence alignments.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>A wealth of molecular data concerning the linear structure of proteins and nucleic acids is available in the form of DNA, RNA and protein sequences. Multiple sequence alignment has become an essential and widely used tool for understanding the structure and function of these molecules. The results of annotation of gene/protein sequences, prediction of protein structures or building of phylogenetic trees, for instance, are critically dependent on the quality of the given alignment. It has been recognized that the automatic construction of a multiple sequence alignment for a set of remotely related sequences can be a very demanding task. Therefore, there is a need for an objective approach to evaluate the alignments produced by alignment programs.</p>
         <p>Two popular measures for scoring entire multiple alignments are the sum of pairs (SP) score and the column score (CS) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. These scores can, however, only be used if a reference alignment of the same sequences is available. The SP score calculates the proportion of identically aligned residue pairs in the test and the reference alignments, whereas the CS score measures the fraction of identically aligned positions. Several modifications have been made to the SP score <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. The APDB (Analyze alignments with PDB) quality measure evaluates the quality of an alignment by using available tertiary structures of the sequences in the alignment <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. The recently introduced multiple overlap score (MOS) is a promising approach, which does not need a reference alignment <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The MOS searches for identically aligned regions in many alignments and presumes that the alignment with the highest number of such residues also has the highest quality.</p>
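         <p>As a rough illustration of the two reference-based measures described above, the following Python sketch computes an SP-like and a CS-like score for a test alignment against a reference alignment of the same sequences. The function names, the representation of an alignment as a list of equal-length gapped strings, and the use of '-' as the gap symbol are assumptions made for this example; published implementations may differ in details such as the treatment of gaps or the restriction to core regions.</p>
         <p>
```python
from itertools import combinations


def column_signatures(alignment):
    """Describe each column by the residue index it holds in every sequence.

    A gap ('-') is recorded as None. Two alignments of the same sequences
    share a column exactly when they share its signature.
    """
    counters = [0] * len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        signature = []
        for seq_idx, seq in enumerate(alignment):
            if seq[col] == "-":
                signature.append(None)
            else:
                signature.append(counters[seq_idx])
                counters[seq_idx] += 1
        columns.append(tuple(signature))
    return columns


def aligned_pairs(columns):
    """Collect every pair of residues that is placed in the same column."""
    pairs = set()
    for signature in columns:
        residues = [(s, r) for s, r in enumerate(signature) if r is not None]
        pairs.update(combinations(residues, 2))
    return pairs


def sp_score(test, reference):
    """Proportion of reference residue pairs that are also aligned in the test."""
    ref_pairs = aligned_pairs(column_signatures(reference))
    test_pairs = aligned_pairs(column_signatures(test))
    return len(ref_pairs.intersection(test_pairs)) / len(ref_pairs)


def cs_score(test, reference):
    """Fraction of reference columns reproduced identically in the test."""
    ref_cols = column_signatures(reference)
    test_cols = set(column_signatures(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)
```
         </p>
         <p>For the reference alignment ["AC-G", "ACTG"] and the test alignment ["ACG-", "ACTG"], this sketch gives an SP score of 2/3 and a CS score of 0.5.</p>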
         <p>We introduce a statistical alignment quality score which first quantifies the degree of conservation at each alignment position and then counts the number of significantly conserved positions over the alignment. For measuring the degree of conservation, we use a type of <it>Z</it>-score that is based on profile analysis <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. After deriving the maximum <it>Z</it>-score for positional conservation, the statistical significance of an observed score value is estimated using the importance sampling method <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. The full alignment quality score is defined in terms of positional significance levels, where the multiple comparison problem is addressed with false discovery rates (FDR) <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. The practical performance of the maxZ score is demonstrated using the SH2 domain, Ras-like proteins, peptidase M13, subtilase and <it>&#946;</it>-lactamase families. The alignment quality score is finally applied to evaluate the alignment programs Clustal <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, TCoffee <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, Dialign2 <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, Probcons <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, Muscle <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, and Mafft <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>.</p>
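         <p>The idea of counting significantly conserved positions under false discovery rate control can be sketched as follows. The example uses the Benjamini-Hochberg step-up rule, which is one common FDR procedure; that this exact rule, the function name and the default level of 0.05 match the procedure used in this work is an assumption made for illustration only.</p>
         <p>
```python
def count_significant_positions(p_values, fdr_level=0.05):
    """Count positions declared significant by the Benjamini-Hochberg rule.

    The p-values are sorted in increasing order; the largest rank k whose
    p-value does not exceed fdr_level * k / m determines how many positions
    are declared significant (m is the number of positions).
    """
    m = len(p_values)
    n_significant = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if fdr_level * rank / m >= p:
            n_significant = rank
    return n_significant
```
         </p>
         <p>With the hypothetical positional p-values [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216] and a level of 0.05, the sketch declares two positions significant.</p>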
         <sec>
            <st>
               <p>Related work</p>
            </st>
            <p>Several approaches have been proposed for the conservation analysis of multiple sequence alignments to quantify the degree of conservation at each aligned position using column-specific score values <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Valdar reviewed a wide range of such score types developed during the last two decades for protein sequence analysis <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. He also introduced the following three criteria that a positional conservation score should fulfill: (i) the score should be a mathematical mapping from an alignment position into a bounded interval of real values which (ii) takes into account the relative symbol frequencies in the column, and (iii) their stereo-chemical properties. Additional requirements for a good conservation score include the possibility to incorporate (iv) the effect of gaps and (v) sequence weighting into (vi) a simple scoring strategy.</p>
            <p>Existing positional scoring approaches can be roughly divided into two categories with respect to the second and third criteria. In the first category, the positional conservation is characterized based on the symbol frequencies only. Such frequency-based methods include, for instance, the information-content score that quantifies the variability among the observed symbols at a particular position by means of Shannon's entropy <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. A popular variation of the information-content (IC) score measures the Kullback-Leibler distance (relative entropy) between the observed symbol distribution and a background distribution of <it>a priori </it>symbol probabilities <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The background probability of an individual symbol may be calculated from the complete alignment, possibly supplemented with symbol-dependent pseudo-counts <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Alternatively, the <it>a priori </it>distribution can be determined using overall relative frequencies of symbols within the sequences of the organism or protein family under investigation.</p>
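            <p>The two frequency-based quantities mentioned above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: a column is given as a string of symbols, gaps are not treated specially, no pseudo-counts are added, and the background distribution is passed in as a hypothetical dictionary of <it>a priori </it>probabilities. The function names are chosen for this example only.</p>
            <p>
```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one column.

    Low entropy corresponds to a highly conserved position.
    """
    counts = Counter(column)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def relative_entropy(column, background):
    """Kullback-Leibler distance (in bits) between the observed symbol
    frequencies of a column and the given background probabilities."""
    counts = Counter(column)
    total = sum(counts.values())
    return sum(
        (n / total) * math.log2((n / total) / background[symbol])
        for symbol, n in counts.items()
    )
```
            </p>
            <p>For example, the column "AACC" has an entropy of 1 bit, and the fully conserved column "AAAA" has a relative entropy of 2 bits against a background in which A has probability 0.25.</p>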
            <p>In the second category of scoring approaches, the positional conservation is characterized based on both symbol frequencies and their similarity properties. Such similarity-based scores address the fact that some symbol combinations occur more frequently than others mainly because of the chemical and physical properties. The most straightforward strategy is to group all the symbols according to their physicochemical properties before applying a particular scoring scheme. For instance, Taylor presented a classification of amino acids based on their synthesis in the Dayhoff mutation data matrix <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Subsequently, the degree of positional conservation with respect to each overlapping group of symbols can be quantified using any frequency-based scoring approach, such as the information content <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Different conservation scores accounting for the stereochemical sensitivity can be obtained using different symbol properties <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
            <p>In general, the symbol properties can be considered by predefining an appropriate matrix whose entries represent the similarity or dissimilarity between a symbol pair. Frequently used symbol scoring matrices for amino acids include the BLOSUM and Gonnet series of substitution matrices and PAM distance matrices <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. Perhaps the most widely used scoring approach, 'sum-of-pairs', characterizes the positional conservation by calculating the sum of all pairwise similarities between the symbols in the particular column <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. It should be noted that this 'sum of pairs' score is different from the SP score mentioned earlier in the Background section. The SP score in <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> is used to measure alignment quality with respect to a reference alignment, whereas the score by Carrillo and Lipman <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> is more generally applicable. In this work, we only use the reference alignment-based SP score. A similar but more complex mean distance (MD) score is used as an objective function in the multiple alignment software Clustal <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. This normalized MD score also considers the fraction of gaps <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. A number of variations can be made by using different similarity matrices on symbols or weighting schemes on sequences <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
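            <p>The column-wise 'sum-of-pairs' conservation score described above can be sketched in a few lines of Python. The function name and the dictionary representation of the similarity matrix are assumptions of this example, and the small matrix in the usage note is a toy matrix with illustrative values, not a full substitution matrix. Gaps and sequence weights are ignored.</p>
            <p>
```python
from itertools import combinations


def sum_of_pairs_column(column, similarity):
    """Sum the similarities of all symbol pairs in one alignment column.

    `similarity` maps a pair of symbols to a score; each unordered pair
    needs to be stored only once, in either order.
    """
    total = 0
    for a, b in combinations(column, 2):
        key = (a, b) if (a, b) in similarity else (b, a)
        total += similarity[key]
    return total
```
            </p>
            <p>With the toy matrix {("A", "A"): 4, ("A", "S"): 1, ("S", "S"): 4}, the column "AAS" receives a score of 4 + 1 + 1 = 6, whereas the fully conserved column "AAA" receives 12.</p>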
            <p>The present work is a continuation of our previous work on a statistical (Dunn-Sidak) framework for detecting conserved residues in the positions of a multiple sequence alignment <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Here, we allow for the incorporation of any symbol similarity matrix into the framework, which was previously based on a simple frequency-based scoring function. We have previously demonstrated the usefulness of this score in the automatic detection of the conserved residues in a multiple sequence alignment, and compared its results on the SH2 domain with functionally and structurally important positions of the alignment <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Another application of the conservation scores includes the improvement of the reliability of HMMs in the sequence similarity search by decreasing the number of false positive search results <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. In the present study, the emphasis is on positional conservation rather than on individual residues, with the aim of assessing the quality of the full alignment.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Evaluating the maxZ score for positional conservation</p>
            </st>
            <p>In this section, we study the practical performance of the maxZ score in the SH2 domain, Ras-like proteins, peptidase M13, subtilase and <it>&#946;</it>-lactamase families. We first demonstrate the effect of five different scoring matrices and then we compare the performance of the maxZ score with those of the information content (IC) and Mean Distance (MD) scores <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B9">9</abbr></abbrgrp>. Finally, we demonstrate how the maxZ score can be used to generate a consensus sequence.</p>
            <sec>
               <st>
                  <p>Multiple sequence alignments</p>
               </st>
               <p>We used the multiple sequence alignments of the SH2 domains, Ras-like proteins, peptidase M13, subtilase and <it>&#946;</it>-lactamase families to evaluate the maxZ score. The alignments for the SH2 domain, peptidase M13, subtilase and <it>&#946;</it>-lactamase families were obtained from the Pfam database <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. The seed alignments of the SH2 domain, peptidase M13, subtilases and <it>&#946;</it>-lactamases consist of 58, 24, 45 and 128 sequences, respectively. These alignments also include gaps. The sequence alignment of the Ras-like proteins was downloaded from the web page of an article by Oliveira <it>et al</it>. <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. The alignment was built with a two-step alignment procedure <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. First, they classified sequences into groups with approximately 90% pairwise sequence identity. Sequences within each subgroup were aligned against the profile, then the groups were aligned, excluding positions with low sequence identity. The positions with gaps were also excluded from the final alignment. We used only the first sequence of each subgroup in order to avoid over-representation of profiles with many very similar sequences. This was necessary because the current maxZ score does not take the pairwise identity of the sequences into account or otherwise weight the sequences. The alignment of Ras-like proteins consists of 334 sequences.</p>
               <p>The upper panels of Figures <figr fid="F1">1</figr>, <figr fid="F2">2</figr>, <figr fid="F3">3</figr> illustrate parts of the alignments of the Ras-like proteins, SH2 domain, peptidase M13, subtilase and <it>&#946;</it>-lactamase families. The complete alignments of the Ras-like proteins and SH2 domain can be found as additional files (Additional files <supplr sid="S1">1</supplr>, <supplr sid="S2">2</supplr>, <supplr sid="S3">3</supplr>, <supplr sid="S4">4</supplr>, <supplr sid="S5">5</supplr>, <supplr sid="S6">6</supplr>, <supplr sid="S7">7</supplr>, <supplr sid="S8">8</supplr>, <supplr sid="S9">9</supplr>). The figures were generated using the MultiDisp graphics program developed to visualize multiple sequence alignments <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> (Riikonen <it>et al</it>., in preparation). The lower parts of the alignments include the maxZ, MD and IC score values. The Blosum62 matrix and the grouping of amino acids were used as scoring schemes in the maxZ score.</p>
               <fig id="F1">
                  <title>
                     <p>Figure 1</p>
                  </title>
                  <caption>
                     <p>MultiDisp visualization of part of the Ras-like proteins (upper) and the corresponding scaled -log(p)-values (lower)</p>
                  </caption>
                  <text>
                     <p><b>MultiDisp visualization of part of the Ras-like proteins (upper) and the corresponding scaled -log(p)-values (lower)</b>. The curves show the <it>p</it>-values calculated using (red) Blosum62, (green) Gonnet250, (black) PAM250, (magenta) identity scoring matrices and (blue) classification of the amino acids for the Ras-like proteins.</p>
                  </text>
                  <graphic file="1471-2105-7-484-1"/>
               </fig>
               <fig id="F2">
                  <title>
                     <p>Figure 2</p>
                  </title>
                  <caption>
                     <p>MultiDisp visualization of the a) <it>&#946;B</it>-strand, b) <it>&#946;D</it>-strand and c) <it>&#945;B</it>-helix of the SH2 domain (upper) and the corresponding conservation scores (lower)</p>
                  </caption>
                  <text>
                     <p><b>MultiDisp visualization of the a) <it>&#946;B</it>-strand, b) <it>&#946;D</it>-strand and c) <it>&#945;B</it>-helix of the SH2 domain (upper) and the corresponding conservation scores (lower)</b>. The curves show (red) the scaled -log(<it>p</it>)-values, (blue) Mean Distance and (green) Information content scores for the alignment. Consensus sequence for the alignment positions in c) is F P S L P E L V E H Y.</p>
                  </text>
                  <graphic file="1471-2105-7-484-2"/>
               </fig>
               <fig id="F3">
                  <title>
                     <p>Figure 3</p>
                  </title>
                  <caption>
                     <p>MultiDisp visualization of the a) I, b) II, c) III and d) IV motifs of the peptidase M13, e) I, f) II and g) III motifs of the subtilase, and h) I and i) II motifs of the <it>&#946;</it>-lactamase families and the table of the conservation scores</p>
                  </caption>
                  <text>
                     <p><b>MultiDisp visualization of the a) I, b) II, c) III and d) IV motifs of the peptidase M13, e) I, f) II and g) III motifs of the subtilase, and h) I and i) II motifs of the <it>&#946;</it>-lactamase families and the table of the conservation scores</b>. MD = mean distance, IC = information content scores and maxZ = scaled -log(<it>p</it>)-values for the alignment.</p>
                  </text>
                  <graphic file="1471-2105-7-484-3"/>
               </fig>
               <suppl id="S1">
                  <title>
                     <p>Additional File 1</p>
                  </title>
                  <text>
                     <p><b>MultiDisp visualization and conservation scores for the Ras-like protein positions 1&#8211;31</b>. PNG formatted figure includes MultiDisp visualization of the Ras-like protein positions 1&#8211;31 (upper) and the corresponding conservation scores (lower). The curves show (red) the scaled -log(<it>p</it>)-values with Blosum62 scoring matrix, (magenta) the scaled -log(<it>p</it>)-values with grouping of amino acids, (blue) Mean Distance and (green) Information content scores for the alignment.</p>
                  </text>
                  <file name="1471-2105-7-484-S1.png">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <suppl id="S2">
                  <title>
                     <p>Additional File 2</p>
                  </title>
                  <text>
                     <p><b>MultiDisp visualization and conservation scores for the Ras-like protein positions 32&#8211;62</b>. As <supplr sid="S1">Additional file 1</supplr>, but for the Ras-like protein positions 32&#8211;62.</p>
                  </text>
                  <file name="1471-2105-7-484-S2.png">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <suppl id="S3">
                  <title>
                     <p>Additional File 3</p>
                  </title>
                  <text>
                     <p><b>MultiDisp visualization and conservation scores for the Ras-like protein positions 63&#8211;92</b>. As <supplr sid="S1">Additional file 1</supplr>, but for the Ras-like protein positions 63&#8211;92.</p>
                  </text>
                  <file name="1471-2105-7-484-S3.png">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <suppl id="S4">
                  <title>
                     <p>Additional File 4</p>
                  </title>
                  <text>
                     <p><b>MultiDisp visualization and conservation scores for the Ras-like protein positions 93&#8211;122</b>. As <supplr sid="S1">Additional file 1</supplr>, but for the Ras-like protein positions 93&#8211;122.</p>
                  </text>
                  <file name="1471-2105-7-484-S4.png">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <suppl id="S5">
                  <title>
                     <p>Additional File 5</p>
                  </title>
                  <text>
                     <p><b>MultiDisp visualization and conservation scores for the Ras-like protein positions 123&#8211;152</b>. As <supplr sid="S1">Additional file 1</supplr>, but for the Ras-like protein positions 123&#8211;152.</p>
                  </text>
                  <file name="1471-2105-7-484-S5.png">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <suppl id="S6">
                  <title>
                     <p>Additional File 6</p>
                  </title>
                  <text>
                     <p><b>MultiDisp visualization and conservation scores for the SH2 domain positions 1&#8211;27</b>. PNG formatted figure includes MultiDisp visualization of the SH2 domain positions 1&#8211;27 (upper) and the corresponding conservation scores (lower). The curves show (red) the scaled -log(<it>p</it>)-values with Blosum62 scoring matrix, (magenta) the scaled -log(<it>p</it>)-values with grouping of amino acids, (blue) Mean Distance and (green) Information content scores for the alignment.</p>
                  </text>
                  <file name="1471-2105-7-484-S6.png">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <suppl id="S7">
                  <title>
                     <p>Additional File 7</p>
                  </title>
                  <text>
                     <p><b>MultiDisp visualization and conservation scores for the SH2 domain positions 28&#8211;55</b>. As <supplr sid="S6">Additional file 6</supplr>, but for SH2 domain positions 28&#8211;55.</p>
                  </text>
                  <file name="1471-2105-7-484-S7.png">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <suppl id="S8">
                  <title>
                     <p>Additional File 8</p>
                  </title>
                  <text>
                     <p><b>MultiDisp visualization and conservation scores for the SH2 domain positions 56&#8211;82</b>. As <supplr sid="S6">Additional file 6</supplr>, but for SH2 domain positions 56&#8211;82.</p>
                  </text>
                  <file name="1471-2105-7-484-S8.png">
                     <p>Click here for file</p>
                  </file>
               </suppl>
               <suppl id="S9">
                  <title>
                     <p>Additional File 9</p>
                  </title>
                  <text>
                     <p><b>MultiDisp visualization and conservation scores for the SH2 domain positions 83&#8211;109</b>. As <supplr sid="S6">Additional file 6</supplr>, but for SH2 domain positions 83&#8211;109.</p>
                  </text>
                  <file name="1471-2105-7-484-S9.png">
                     <p>Click here for file</p>
                  </file>
               </suppl>
            </sec>
            <sec>
               <st>
                  <p>Effect of the scoring matrices</p>
               </st>
               <p>One advantage of the maxZ score is that it can consider the physicochemical relationships of amino acids. The user is able to choose an arbitrary scoring matrix or classification of the amino acids, which can be incorporated into the calculation of the maxZ score. In addition to the identity matrix, we demonstrate the use of three different scoring matrices: Blosum62, Gonnet250 and PAM250 <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. Additionally, we classify amino acids into six physicochemically related groups as follows: hydrophobic {V, I, L, F, M, W, Y, C}, negatively charged {D, E}, positively charged {R, K}, conformational {G, P}, polar {N, Q, S} and {A, T}. This classification has been used, for example, by Shen and Vihinen <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Figure <figr fid="F1">1</figr> shows the scaled -log(<it>p</it>)-values for the Ras-like proteins using the five different scoring schemata.</p>
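               <p>The six-group classification listed above can be written down as a simple lookup table, sketched here in Python. The group labels (in particular "AT" for the unnamed sixth group), the label "other" for symbols outside the listed groups, such as gaps, and the function name are assumptions made for this example; the sketch only recodes a column into classes and does not compute the maxZ score itself.</p>
               <p>
```python
# Group membership as listed in the text; the labels are illustrative.
GROUPS = {
    "hydrophobic": "VILFMWYC",
    "negative": "DE",
    "positive": "RK",
    "conformational": "GP",
    "polar": "NQS",
    "AT": "AT",
}

# Invert the table so that each residue maps to its group label.
RESIDUE_TO_GROUP = {
    residue: name for name, members in GROUPS.items() for residue in members
}


def group_column(column):
    """Recode the residues of one alignment column as group labels.

    Symbols that are not in any listed group are labelled "other".
    """
    return [RESIDUE_TO_GROUP.get(residue, "other") for residue in column]
```
               </p>
               <p>A column such as "VILM" is then recoded into four "hydrophobic" labels, so any frequency-based score applied to the recoded column treats it as fully conserved at the class level.</p>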
               <p>The residue positions in the alignment of Ras-like proteins were divided into five groups according to the entropy and variability <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. The parameter values of the classification algorithm were chosen such that the groups represent the known structural and/or functional roles of the residue positions. A rough overview of the categories is the following:</p>
               <p>&#8226; Box 11 contains positions with low entropy and variability. The positions in this group form a main functional site.</p>
               <p>&#8226; Box 12 consists of positions with low variability and moderate entropy. These positions are located in the core of the structure next to the residues in Box 11.</p>
               <p>&#8226; Box 22 contains positions with moderate entropy and variability. These residue positions are located in the core structure but are not adjacent to the residues in Box 11. The positions are involved in the structure of the protein, but also in signal transmission between the modulators and the main functional site.</p>
               <p>&#8226; Box 23 consists of the positions with high entropy and moderate variability. These positions are located at the surface or in the core of the protein and are involved in modulator interaction.</p>
               <p>&#8226; Box 33 contains highly variable positions with high entropy. These positions are mainly located at the surface of the protein.</p>
               <p>For a more detailed description of the categories, see the original paper <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Table <tblr tid="T1">1</tblr> shows the median (lower and upper quartile) values of the -log(<it>p</it>)-values of the maxZ scores with different scoring matrices, along with MD and IC scores in each of the five groups. As expected, all conservation scores decreased gradually when moving from the positions with low entropy and variability to those with high entropy and variability. The performance of the MD and maxZ scores was very similar. The maxZ score with groups of amino acids distinguished the moderately conserved positions (Boxes 12&#8211;23) from the highly conserved positions (Box 11) and the unconserved ones (Box 33) slightly better than the other scores (Table <tblr tid="T1">1</tblr>, Figure <figr fid="F1">1</figr>).</p>
               <p>In both the Ras-like protein and SH2 domain examples, all the scoring schemes tend to provide very similar results (see Additional files <supplr sid="S1">1</supplr>, <supplr sid="S2">2</supplr>, <supplr sid="S3">3</supplr>, <supplr sid="S4">4</supplr>, <supplr sid="S5">5</supplr>, <supplr sid="S6">6</supplr>, <supplr sid="S7">7</supplr>, <supplr sid="S8">8</supplr>, <supplr sid="S9">9</supplr> for Blosum62 and grouping of amino acids). The results with Blosum, Gonnet and PAM matrices all rely heavily on the diagonal values of the scoring matrices. For instance, a position with highly or moderately conserved leucine obtains a relatively low maxZ score (Figure <figr fid="F1">1</figr>), whereas a position with an unconserved cysteine may also be assigned as highly conserved. This is especially critical when the Gonnet scoring matrix is used. The results with six amino acid groups differed most from the other scoring schemes, since this scheme calculates the maxZ score for the amino acid classes instead of single residues. The grouping of amino acids tends to give high scores for the positions where the majority of the residues belong to the same class. The use of the identity matrix corresponds to the special case where similarities among the symbols are ignored, and the amino acids are handled as if they were unrelated. The corresponding score is thus based solely on the relative frequencies of the residues and background probabilities. The scoring based on the identity matrix gives results quite similar to those of the Blosum62 and Gonnet matrices. For some positions, however, the identity matrix fails to detect the conserved positions. Similar behavior was seen with the PAM matrix (Figure <figr fid="F1">1</figr>, position 10).</p>
               <tbl id="T1">
                  <title>
                     <p>Table 1</p>
                  </title>
                  <caption>
                     <p>Median (lower and upper quartiles) of the -log(p)-values with different residue scoring schemes, together with the MD and IC scores.</p>
                  </caption>
                  <tblbdy cols="6">
                     <r>
                        <c ca="left">
                           <p>Score</p>
                        </c>
                        <c ca="left">
                           <p>Box11</p>
                        </c>
                        <c ca="left">
                           <p>Box12</p>
                        </c>
                        <c ca="left">
                           <p>Box22</p>
                        </c>
                        <c ca="left">
                           <p>Box23</p>
                        </c>
                        <c ca="left">
                           <p>Box33</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>LogP Blosum62</p>
                        </c>
                        <c ca="left">
                           <p>708 (708, 708)</p>
                        </c>
                        <c ca="left">
                           <p>611 (198, 708)</p>
                        </c>
                        <c ca="left">
                           <p>208 (161, 547)</p>
                        </c>
                        <c ca="left">
                           <p>120 (99, 177)</p>
                        </c>
                        <c ca="left">
                           <p>75 (47, 123)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>LogP Gonnet</p>
                        </c>
                        <c ca="left">
                           <p>708 (708, 708)</p>
                        </c>
                        <c ca="left">
                           <p>190 (164, 708)</p>
                        </c>
                        <c ca="left">
                           <p>158 (131, 189)</p>
                        </c>
                        <c ca="left">
                           <p>98 (78, 136)</p>
                        </c>
                        <c ca="left">
                           <p>64 (56, 106)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>LogP Indep</p>
                        </c>
                        <c ca="left">
                           <p>708 (708, 708)</p>
                        </c>
                        <c ca="left">
                           <p>202 (158, 708)</p>
                        </c>
                        <c ca="left">
                           <p>171 (108, 202)</p>
                        </c>
                        <c ca="left">
                           <p>75 (63, 113)</p>
                        </c>
                        <c ca="left">
                           <p>57 (35, 96)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>LogP PAM</p>
                        </c>
                        <c ca="left">
                           <p>708 (212, 708)</p>
                        </c>
                        <c ca="left">
                           <p>201 (166, 708)</p>
                        </c>
                        <c ca="left">
                           <p>153 (125, 201)</p>
                        </c>
                        <c ca="left">
                           <p>94 (81, 133)</p>
                        </c>
                        <c ca="left">
                           <p>66 (56, 105)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>LogP 6 groups</p>
                        </c>
                        <c ca="left">
                           <p>644 (631, 683)</p>
                        </c>
                        <c ca="left">
                           <p>312 (300, 333)</p>
                        </c>
                        <c ca="left">
                           <p>279 (241, 341)</p>
                        </c>
                        <c ca="left">
                           <p>216 (77, 240)</p>
                        </c>
                        <c ca="left">
                           <p>43 (26, 91)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>MD</p>
                        </c>
                        <c ca="left">
                           <p>92 (86, 97)</p>
                        </c>
                        <c ca="left">
                           <p>43 (29, 55)</p>
                        </c>
                        <c ca="left">
                           <p>34 (24, 42)</p>
                        </c>
                        <c ca="left">
                           <p>24 (19, 31)</p>
                        </c>
                        <c ca="left">
                           <p>20 (15, 25)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>IC</p>
                        </c>
                        <c ca="left">
                           <p>57 (55, 59)</p>
                        </c>
                        <c ca="left">
                           <p>39 (34, 48)</p>
                        </c>
                        <c ca="left">
                           <p>31 (27, 35)</p>
                        </c>
                        <c ca="left">
                           <p>21 (19, 23)</p>
                        </c>
                        <c ca="left">
                           <p>13 (10, 19)</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>Box11 and Box33 represent positions with low and high entropy and variability, respectively. The three middle columns represent the moderately conserved positions. A more detailed description of the categories can be found in Oliveira <it>et al</it>. [35].</p>
                  </tblfn>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Comparisons with other scores</p>
               </st>
               <p>The results of the maxZ score were compared with those of the MD and IC scores. Figures <figr fid="F2">2</figr> and <figr fid="F3">3</figr> show the MD and IC scores together with the -log(<it>p</it>)-values of the maxZ scores for the SH2 domain, peptidase M13, subtilase and <it>&#946;</it>-lactamase family sequences. The -log(<it>p</it>)-values were scaled using zero as the minimum. The maximum was obtained by calculating the -log(<it>p</it>)-values for each possible invariant position and taking their 5th percentile. Blosum62 was used as the scoring matrix in the maxZ score. The default multiple sequence alignment parameters of ClustalX were used to calculate the MD score.</p>
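               <p>The scaling described above can be sketched as follows. This is only a minimal illustration and not the authors' implementation: the function names, the nearest-rank percentile rule and the clipping of values above the maximum to 1 are assumptions made here, and the input values are invented.</p>

```python
import math

def percentile_nearest_rank(values, q):
    """Return the q-th percentile (0-100) using the nearest-rank rule."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]

def scale_neg_log_p(neg_log_p, invariant_neg_log_p, q=5.0):
    """Scale -log(p) values to [0, 1]: zero is the minimum, and the
    q-th percentile of the invariant-position values is the maximum."""
    upper = percentile_nearest_rank(invariant_neg_log_p, q)
    # Values above the chosen maximum are clipped to 1 (an assumption).
    return [min(max(v, 0.0) / upper, 1.0) for v in neg_log_p]

# Invented -log(p) values for the 20 possible invariant positions.
invariant = [400.0 + 10.0 * i for i in range(20)]
scaled = scale_neg_log_p([0.0, 200.0, 400.0, 708.0], invariant)
print(scaled)
```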
               <p><b>SH2 domain</b> SH2 domains are binding modules recognizing phosphotyrosines and surrounding residues in polypeptides and proteins <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp>. Many SH2 domains recognize especially residues +1 and +3 following the phosphotyrosine and form binding pockets for these amino acids <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. All known SH2 domains share the same architecture, consisting of a central antiparallel <it>&#946;</it>-sheet flanked by two <it>&#945;</it>-helices. The central <it>&#946;</it>-sheet (strands B, C and D) forms the core of the structure and includes most of the conserved residues.</p>
               <p>All scores consider the positions forming the binding pocket as highly conserved (&gt; 0.4). These include invariant <it>&#946;B</it>5, which interacts with phosphotyrosine, and <it>&#946;D</it>4 and <it>&#945;A</it>2 (data not shown), which form the binding pocket for the phosphotyrosine <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> (Figure <figr fid="F2">2ab</figr>). Position <it>&#946;D</it>6, which is also involved in forming the binding pocket, obtains lower conservation score values (&#8776; 0.2), indicating moderate conservation. The binding pockets for phosphotyrosine-following residues are formed by the <it>&#945;B</it>-helix; in particular, positions <it>&#945;B</it>5 &#8211; 6 are involved in forming the hydrophobic core for residue +3 <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Positions <it>&#946;B</it>2, <it>&#945;B</it>9 and <it>&#946;F</it>3 are occupied by aromatic residues. The maxZ and IC scores determine these five positions as highly conserved, whereas the MD score (0.2 &#8211; 0.4) determines positions <it>&#945;B</it>9 and <it>&#946;F</it>3 as moderately conserved (Figure <figr fid="F2">2c</figr>). The binding site for ligand residue +1 includes positions <it>&#946;D</it>3 and <it>&#946;D</it>5 <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. While the maxZ and IC scores determine position <it>&#946;D</it>5 as moderately conserved, the MD score (&lt; 0.2) considers that position unconserved (Figure <figr fid="F2">2b</figr>).</p>
               <p><b>Peptidase family M13</b> Peptidase family M13, also known as the neprilysin family, consists of type II integral transmembrane proteins with a short N-terminal cytoplasmic domain, a hydrophobic transmembrane region, and a large ectodomain containing an active site <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. Three conserved motifs characterize all known M13 endopeptidases (the numbers are Pfam alignment positions): I:<sup>0</sup>vNAfY<sup>4</sup>, II:<sup>63</sup>XXHEXXH- -XX<sup>73</sup>, III:<sup>147</sup>EXXXD<sup>151 </sup>(Figures <figr fid="F3">3abc</figr>). Additionally IV:<sup>217</sup>HXXXXXR<sup>223 </sup>is conserved in neprilysins (Figure <figr fid="F3">3d</figr>).</p>
               <p>All measures scored as highly conserved the residues H65, H69 and E147, which are ligands for <it>Zn</it><sup>2+</sup>, and E66 and H217, which are involved in catalysis (Figure <figr fid="F3">3bcd</figr>). The maxZ score values varied from 0.68 to 1 in the invariant positions occupied by different amino acids, whereas the corresponding MD score values were more stable. This was due to the different diagonal values of the scoring matrix. Similar behavior was found at position 219 of motif IV, where proline was the most frequent residue. The maxZ score determined that position as highly conserved (0.84), whereas the other scores considered it only moderately conserved (0.38 and 0.52). For the other important side-chains of N1, A2, D215, H217 and R223, which have a role in substrate binding, the behavior of the three scores was mostly very similar (Figure <figr fid="F3">3ad</figr>). The only exception was position D215, which was considered moderately conserved by the maxZ and IC scores (0.22 and 0.44), while the MD score considered it unconserved. Another difference between the scores was at positions 70 and 71 of motif II, where the IC score did not identify these positions as inserts but assigned them considerably high conservation score values.</p>
               <p><b>Subtilisins</b> Pfam subtilase is a family of serine proteases consisting of S8 and S53 peptidase families of the MEROPS database. The S8 peptidases are divided into two subfamilies: S8A (e.g. subtilisin) and S8B (e.g. kexin). The sequences in the S8 family have a catalytic triad Asp/His/Ser. In the subfamily S8A, the active site residues occur (in the Pfam alignments) in the motifs I:<sup>28</sup>D-T/SG<sup>31</sup>, II:<sup>254</sup>HGTH<sup>257 </sup>and III:<sup>687</sup>GTSMAXP<sup>693</sup>, and in the subfamily S8B in the motifs I:<sup>28</sup>D-DG<sup>31</sup>, II:<sup>254</sup>HGTR<sup>257</sup>, III:<sup>687</sup>GTSA/VA/SXP<sup>693 </sup>(Figure <figr fid="F3">3efg</figr>).</p>
               <p>All positions of the catalytic triad Asp/His/Ser were considered highly conserved by each of the conservation scores. In the first motif, the maxZ and MD scores obtained high conservation score values (0.70&#8211;0.90) for the aspartic acid and glycine residues (Figure <figr fid="F3">3e</figr>). The middle position had three possible side-chains in the first motif, and hence all scores determined that position as moderately conserved (0.34&#8211;0.42). In the second motif, there were more differences between the conservation scores: the maxZ score determined all the positions as highly conserved (0.71&#8211;1), and the MD score determined the first three positions as highly conserved (0.74&#8211;1), whereas the fourth position obtained a much lower score (0.31) (Figure <figr fid="F3">3f</figr>). The IC score determined only the first position as highly conserved (0.84), whereas the other positions obtained slightly lower conservation score values (0.49&#8211;0.63). Hence, only the maxZ score considered the whole motif as highly conserved. The MD score, on the contrary, obtained a rather low conservation score value for position 257, where subgroups S8A and S8B are conserved for different amino acids. The third motif was a good example of the behavior of the different scores in the invariant positions (Figure <figr fid="F3">3g</figr>). While the MD score obtained the highest score value 1 in all the invariant positions, the maxZ and IC scores were dependent on the side-chain. Nevertheless, the maxZ score determined all the invariant positions as highly conserved (0.68&#8211;1), whereas the IC score obtained somewhat lower values (0.49&#8211;0.73).</p>
               <p><b><it>&#946;</it>-Lactamases </b>The <it>&#946;</it>-lactamase family of Pfam contains sequences from many different groups including D-alanyl-D-alanine carboxypeptidase B, aminopeptidase, alkaline D-peptidase, animal D-Ala-D-Ala carboxypeptidase homologues, the class A and C <it>&#946;</it>-lactamases and eukaryotic <it>&#946;</it>-lactamase homologs. The family is very diverse outside the SXXK motif, S being the active site residue. For the sequences belonging to the S12 peptidase (D-Ala-D-Ala carboxypeptidase B) family in the MEROPS database, the active site motif is I:<sup>120</sup>SXTK<sup>123</sup>. It also has another motif: II:<sup>306</sup>YXN<sup>308 </sup>(Figure <figr fid="F3">3hi</figr>).</p>
               <p>All the scores determined the active site serine residue as highly conserved and provided a very similar conservation profile for the first motif (Figure <figr fid="F3">3h</figr>). In the second motif, the maxZ and IC scores correctly determined position 306, with tyrosine/serine residues, as highly conserved and considered the other residues as moderately conserved. The MD score, on the contrary, failed to detect the highly conserved position 306, giving it a score value of only 0.21 (Figure <figr fid="F3">3i</figr>).</p>
               <p>These results on the example families suggest that there are three main differences between the maxZ, MD and IC conservation scores. Firstly, since the maxZ score is strongly affected by the diagonal value of the scoring matrix used, it obtains slightly lower values for the positions occupied by very frequently occurring amino acids and slightly higher values for more rarely occurring amino acids than the other scores. For very frequently occurring amino acids, see for example position <it>&#945;B</it>5 of the SH2 domain (Figure <figr fid="F2">2c</figr>) with highly conserved leucine or positions III:G687 and III:S689 of the subtilase family (Figure <figr fid="F3">3g</figr>), which obtain a much lower maxZ than MD score. In the opposite case, at positions <it>&#945;B</it>8 &#8211; 9 of the SH2 domain, the high diagonal values of the scoring matrix for histidine and tryptophan offer a much more reliable scoring than the MD score (Figure <figr fid="F2">2c</figr>). Similarly, the result of the maxZ score for position II:306 of the <it>&#946;</it>-lactamase family indicates that the position may be functionally important, whereas the result of the MD score indicates the contrary (Figure <figr fid="F3">3i</figr>). The maxZ score also determines different values for invariant positions with different amino acids, whereas the MD score always gives the score of 1 for invariant positions. Secondly, as the maxZ score is entirely determined by the residue obtaining the greatest Z-score value, it is not affected by the other residues, whose proportions may be very low; a single conserved residue can already define a position as conserved (see position <it>&#946;F</it>3 in Figure <figr fid="F2">2c</figr> or position S/Y306 in Figure <figr fid="F3">3i</figr>). Hence, the maxZ score may find important positions of the alignment that are not found by the other scores. 
Thirdly, the maxZ and MD scores also take gaps into account, resulting in zero or very low scores for the insert positions of the alignment. The IC score, on the other hand, fails to detect the insert positions.</p>
               <p>Taken together, all three scores behaved in a rather similar manner. The IC score does not take gaps into account, and thus its use is relevant only when the alignment does not include gaps. The maxZ and MD scores differ in some positions, which generally depend on the similarity matrix or grouping used with the maxZ score. For the Ras-like proteins, the maxZ score with groups of amino acids separates the moderately conserved positions (Boxes 12&#8211;23) from the highly conserved positions (Box 11) and the unconserved ones (Box 33) slightly better than the other scores do. However, the results of Table <tblr tid="T1">1</tblr> cannot be used to evaluate the IC score since entropy was used in the classification. For the SH2 domains, on the one hand, the maxZ score determines the positions forming the binding pocket for the phosphotyrosine and surrounding molecules mostly as highly conserved, but on the other hand, it also correctly determines the more variable loops between the <it>&#945;</it>-helices and <it>&#946;</it>-strands as unconserved. The MD and IC scores also perform well, but sometimes the MD score fails to detect the important positions, and the IC score is not capable of detecting the loops between the conserved structures.</p>
            </sec>
            <sec>
               <st>
                  <p>Consensus sequence</p>
               </st>
               <p>As a by-product, the maxZ score also produces the consensus sequence for the multiple sequence alignment. According to formula (8), the consensus residue at each alignment position is defined as the residue with the greatest Z-score value. The legend of Figure <figr fid="F2">2</figr> shows the consensus sequence for part of the SH2 domain.</p>
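               <p>The consensus rule stated above can be illustrated with a short sketch. It assumes that the per-residue Z-scores have already been computed for every alignment position; the function name, the input layout (one dictionary per position) and the numbers are hypothetical and serve only as an example.</p>

```python
def consensus_from_z_scores(z_scores_by_position):
    """Pick, for each alignment position, the residue with the greatest
    Z-score and report that maximum alongside it."""
    consensus = []
    max_z = []
    for z_scores in z_scores_by_position:
        residue = max(z_scores, key=z_scores.get)
        consensus.append(residue)
        max_z.append(z_scores[residue])
    return "".join(consensus), max_z

# Invented Z-scores for three positions (residue mapped to Z-score).
example = [
    {"W": 12.4, "F": 3.1, "Y": 2.0},
    {"L": 1.2, "I": 0.9, "V": 1.1},
    {"R": 25.0, "K": 4.5},
]
sequence, scores = consensus_from_z_scores(example)
print(sequence, scores)
```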
            </sec>
         </sec>
         <sec>
            <st>
               <p>Evaluating the AQ score for alignment quality</p>
            </st>
            <p>In this section, we evaluate the output of the alignment programs using the alignment quality (AQ) score, which is based on the maxZ score, and compare it to the sum of pairs (SP) and the column score (CS) quality scores <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. First, we study the relationship between the individual AQ and SP scores. Then we compare the quality scores of 7 alignment methods using the BAliBASE database <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. Since the divergence from the reference values was substantially constant over different false discovery rate (FDR) values, the results are presented at FDR = 0.05.</p>
            <sec>
               <st>
                  <p>Comparison of individual AQ and SP scores</p>
               </st>
               <p>We built 7 test alignments for each set of sequences in the BAliBASE database and compared the results of the AQ and SP scores. Figure <figr fid="F4">4</figr> shows the scatterplot between the AQ and SP scores for the Mafft alignments (L-INS-i strategy) in different reference sets. The Spearman rank correlation coefficient between the AQ and SP scores was 0.53 for the L-INS-i alignments. The range of the correlation coefficient in the 7 alignments was from 0.53 to 0.67. Figure <figr fid="F4">4</figr> shows a clear relationship between the quality scores. Three of the four outlying alignments in the lower right corner of Figure <figr fid="F4">4</figr> are from the reference set 40. In these alignments, the SP scores also differed dramatically from the column score values (CS = 0 in these alignments).</p>
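               <p>For readers who wish to reproduce this kind of comparison, the Spearman rank correlation can be computed as the Pearson correlation of the ranks. The sketch below is a plain illustration under that standard definition; the helper names and the score values are invented and are not taken from the BAliBASE results.</p>

```python
def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        while (end + 1 != len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Invented AQ and SP scores for five alignments.
aq = [0.80, 0.95, 0.60, 0.90, 0.70]
sp = [0.75, 0.85, 0.65, 0.97, 0.60]
rho = spearman(aq, sp)
print(round(rho, 3))
```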
               <fig id="F4">
                  <title>
                     <p>Figure 4</p>
                  </title>
                  <caption>
                     <p>Scatterplot between the AQ and SP scores for the Mafft (L-INS-i) alignments (r = 0.53)</p>
                  </caption>
                  <text>
                     <p><b>Scatterplot between the AQ and SP scores for the Mafft (L-INS-i) alignments (r = 0.53)</b>. The four outlying alignments in the lower right corner are from the reference sets 11 and 40.</p>
                  </text>
                  <graphic file="1471-2105-7-484-4"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Alignment quality assessment</p>
               </st>
               <p>We compared the performance of the 7 alignment programs using five reference sets of the BAliBASE database. The first two reference sets of the BAliBASE include equi-distant sequences whose identity is less than 20% (ref 11) or between 20 and 40% (ref 12). According to the AQ score, the results on the reference set 11 indicate that Probcons was the best method, aligning on average 80% of the conserved residues correctly (Figure <figr fid="F5">5</figr>). The L-INS-i strategy of Mafft and Muscle also performed well, obtaining quality scores only 5&#8211;7% lower than that of the Probcons. In the reference set 12, all the tested programs performed rather well (Figure <figr fid="F5">5</figr>). The Probcons, Muscle, L-INS-i and TCoffee obtained the highest alignment quality score values (94&#8211;96%). These methods did not differ from each other, but they differed from all the other methods (Table <tblr tid="T2">2</tblr>). The quality score was the worst in the alignments produced by Dialign, Clustal, or the FFT-NS-2 strategy of Mafft, showing 41&#8211;54% (ref 11) and 12&#8211;16% (ref 12) divergence from the reference alignment. The result of the SP score was very similar. The only relevant difference was that the Probcons differed significantly from the other programs in both reference sets 11 and 12, even though the absolute difference between the methods was very small: the SP scores of the Probcons and L-INS-i, for instance, differed by only 2%. The absolute CS scores were in all programs approximately 20% (ref 11) and 10% (ref 12) lower than the AQ and SP scores. According to the CS score, in the reference set 12 the Probcons differed significantly from the other methods, and in the reference set 11 it differed significantly from all the other programs except the L-INS-i.</p>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>Barplots for the median (red) AQ, (green) SP and (blue) CS scores in the BAliBASE reference sets</p>
                  </caption>
                  <text>
                     <p><b>Barplots for the median (red) AQ, (green) SP and (blue) CS scores in the BAliBASE reference sets</b>. Error bars show the 25th and 75th percentile values.</p>
                  </text>
                  <graphic file="1471-2105-7-484-5"/>
               </fig>
               <p>The aim of the reference set 20 is to test the ability of programs to align sequence families disrupted by an "orphan" sequence. The reference set 30 consists of subgroups of sequences whose residue identities between the subgroups are less than 25%. According to the AQ and SP measures, the quality of all alignments was very high in the reference sets 20 and 30 (Figure <figr fid="F5">5</figr>). In the reference set 20, the median scores varied from 87% to 96% and from 92% to 97% in the AQ and SP scorings, respectively, whereas the CS score obtained clearly lower values varying from 39% to 62%. In the reference set 30, the overall SP (80&#8211;92%) and especially CS scoring (42&#8211;73%) was somewhat lower than the AQ scoring (92&#8211;98%). In the AQ scoring, the L-INS-i and Clustal were slightly better than the other methods, aligning 96/98% (ref 20/ref 30) of the conserved residues correctly. The Muscle and TCoffee scored almost as well and did not differ significantly from the L-INS-i and Clustal (Table <tblr tid="T2">2</tblr>). In the reference set 30, additionally, the FFT-NS-2 and Probcons did not differ from the best scoring methods. The Dialign obtained clearly the lowest quality scores (87% in ref 20 and 92% in ref 30), and differed significantly from the other methods. In the reference set 20, the SP scoring of the Probcons showed significantly better performance than the other programs. The SP scores of the four best programs, Probcons, L-INS-i, TCoffee and Muscle, were, however, within a 1.2% range of each other. 
Another difference between the AQ and the other scores was that while with the AQ scoring the Clustal (96%) was among the four top methods and the Probcons (90%) was the second worst method, with the SP and CS scoring the Probcons obtained the best results (97%/62%) and the Clustal (93%/45%) was the second worst method, differing significantly from the better scoring methods (Table <tblr tid="T2">2</tblr>).</p>
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>The alignment programs which obtained the highest AQ, SP and CS scores in different reference sets. </p>
                  </caption>
                  <tblbdy cols="4">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c cspan="3" ca="center">
                           <p>Top Programs</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Reference set</p>
                        </c>
                        <c ca="left">
                           <p>AQ</p>
                        </c>
                        <c ca="left">
                           <p>SP</p>
                        </c>
                        <c ca="left">
                           <p>CS</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>11</p>
                        </c>
                        <c ca="left">
                           <p>Probcons, L-INS-i, Muscle</p>
                        </c>
                        <c ca="left">
                           <p>Probcons</p>
                        </c>
                        <c ca="left">
                           <p>Probcons, L-INS-i</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>12</p>
                        </c>
                        <c ca="left">
                           <p>Probcons, Muscle, L-INS-i, TCoffee</p>
                        </c>
                        <c ca="left">
                           <p>Probcons</p>
                        </c>
                        <c ca="left">
                           <p>Probcons</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>20</p>
                        </c>
                        <c ca="left">
                           <p>L-INS-i, Clustal, Muscle, TCoffee</p>
                        </c>
                        <c ca="left">
                           <p>Probcons</p>
                        </c>
                        <c ca="left">
                           <p>Probcons, L-INS-i</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>30</p>
                        </c>
                        <c ca="left">
                           <p>L-INS-i, Clustal, TCoffee, FFT-NS-2, Muscle, Probcons</p>
                        </c>
                        <c ca="left">
                           <p>L-INS-i, Probcons, Muscle, TCoffee</p>
                        </c>
                        <c ca="left">
                           <p>L-INS-i, Probcons, TCoffee, Muscle</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>40</p>
                        </c>
                        <c ca="left">
                           <p>L-INS-i, TCoffee</p>
                        </c>
                        <c ca="left">
                           <p>L-INS-i, Probcons, TCoffee</p>
                        </c>
                        <c ca="left">
                           <p>L-INS-i, TCoffee, Probcons</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>50</p>
                        </c>
                        <c ca="left">
                           <p>L-INS-i, TCoffee, Probcons, Muscle, FFT-NS-2, Clustal</p>
                        </c>
                        <c ca="left">
                           <p>TCoffee, Probcons, L-INS-i</p>
                        </c>
                        <c ca="left">
                           <p>L-INS-i, TCoffee, Probcons, Muscle</p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>The programs listed for a given reference set and score did not differ statistically significantly from each other. Statistical analyses were performed using the Wilcoxon signed rank test.</p>
                  </tblfn>
               </tbl>
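               <p>The pairwise comparisons summarized in Table 2 rely on the Wilcoxon signed rank test. The following sketch computes only the signed rank sums for two programs scored on the same test cases. It follows the textbook procedure (zero differences dropped, tied absolute differences given average ranks), which is assumed here because the handling of ties and zeros is not specified in the text; the function name and the scores are invented.</p>

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.
    Zero differences are dropped; tied absolute differences share
    their average rank."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start != len(order):
        end = start
        while (end + 1 != len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if 0 > d)
    return w_plus, w_minus

# Invented quality scores (in percent) of two programs on six test cases.
method_a = [96, 94, 90, 88, 97, 91]
method_b = [93, 95, 90, 84, 92, 89]
w_plus, w_minus = wilcoxon_signed_rank(method_a, method_b)
print(w_plus, w_minus)
```

               <p>The smaller of the two sums is then compared with the critical value for the number of non-zero differences, or converted to a <it>p</it>-value, to decide whether the two programs differ significantly.</p>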
               <p>The reference set 40 contains sequences with N/C-terminal extensions. In this reference set, the median AQ and SP scores varied from 85% to 94% and the median CS score from 49% to 68%. The L-INS-i obtained the best AQ scores, aligning 94% of the conserved residues correctly (Figure <figr fid="F5">5</figr>). The differences between the L-INS-i and the other methods, except the TCoffee, were statistically significant (Table <tblr tid="T2">2</tblr>). The performance of the three quality scores was very similar; the only difference was that with the SP and CS scorings the quality of the Probcons alignments was comparable with the quality of the L-INS-i and TCoffee alignments.</p>
               <p>In the last reference set, the alignment includes sequences with internal insertions. In this reference set, the L-INS-i and TCoffee obtained approximately 5 to 6% better results than the other methods aligning more than 91% of the conserved residues correctly when the AQ scoring was used (Figure <figr fid="F5">5</figr>). The differences were, however, statistically significant only with respect to the Dialign (Table <tblr tid="T2">2</tblr>). According to the SP score, the TCoffee, Probcons and L-INS-i differentiated between the lower scoring methods FFT-NS-2, Muscle, Dialign and Clustal, even if the differences in the median values were very low. In the CP score, the result was similar to that of the SP score. The only difference was the Muscle, which ranked among the four best programs.</p>
               <p>To summarize, in the BAliBASE database the L-INS-i, Probcons, Muscle, TCoffee and Clustal all produced alignments with very high quality, whereas the FFT-NS-2 and Dialign performed generally worse than the other methods. The overall best method was the L-INS-i which was among the significantly best methods in all six reference sets (Table <tblr tid="T2">2</tblr>). The Probcons performed best in the reference sets 11 and 12, whereas in the other sets, the L-INS-i was the best scoring method. In the SP score, the Probcons differed significantly from the other methods in the reference sets 11&#8211;20 and was among the best scoring methods in all reference sets. The CS score results in much lower values than the other scores, but the ranking of the methods was very near to that of the SP score. In both scores, the Probcons, L-INS-, TCoffee and in some references Muscle produced the best alignments, whereas the Clustal, FFT-NS-2 and Dialign performed worse.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>In this paper, we have introduced a novel approach to objective alignment quality scoring. Unlike most of the existing methods, the proposed AQ score is not heuristic but is based on statistical theory. The score is mathematically motivated and its asymptotic properties are well known. The AQ score does not handle all alignment positions equally but concentrates on conserved positions only. In the present work, the AQ score is calculated with respect to the reference alignment. The future aim is to use the conserved alignment positions without the reference alignment. The proportion of conserved residues <it>ConsAA </it>can be used to assess the quality of the alignments also when the reference alignment is not available. Our preliminary results show a strong correlation between the predicted and reference alignment based AQ score values (data not shown here).</p>
         <p>The proposed scoring method is based on integrating statistical hypothesis testing methods into the profile analysis framework. The attraction of profile analysis lies in the convenient treatment of the symbol frequency vector, which allows not only the incorporation of any classification or symbol similarity matrix but also the possibility to consider the influence of gaps and weights in a very simple manner <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Hence a score based on profile analysis immediately fulfills the six criteria set as requirements of a good conservation score by Valdar <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. A drawback of the current maxZ score is that sequence weighting is not taken into consideration. Weighting the profiles with an appropriate value would allow the evolutionary relationships of the sequences in the multiple sequence alignment to be considered.</p>
         <p>The AQ score is based on comparing the number of conserved alignment positions between the test and the reference alignments, as assessed with the maxZ score, so that the dependency between the alignment positions is also considered. The multiple comparison problem is handled by using the false discovery rate when choosing the conserved positions. To estimate the significance of the observed maxZ scores, we used the IS method. In genetics applications, the IS method has previously been successfully applied to the binomial distribution <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp>. The application of the IS method to the multinomial distribution is, however, not a trivial task because the parameter space is multidimensional. We used a mixture distribution as a sampling distribution for the multinomial distribution. With the help of simulations, we searched for appropriate parameter values of the mixture distribution and approximated the number of samples needed for the proper estimation of the significance values (see Additional file <supplr sid="S10">10</supplr>).</p>
         <suppl id="S10">
            <title>
               <p>Additional File 10</p>
            </title>
            <text>
               <p><b>Tuning the importance sampling procedure</b>. The PDF file describes the simulation procedure used to find appropriate parameter values for the importance sampling procedure.</p>
            </text>
            <file name="1471-2105-7-484-S10.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <p>In addition to developing an alignment quality scoring framework, our second objective was to test the alignment quality of commonly used multiple sequence alignment programs. We evaluated the quality of 7 alignment methods. The overall performance of the L-INS-i strategy of Mafft was the best. The Probcons worked best with groups of equi-distant sequences having residue identity less than 20% (ref 11) or 20&#8211;40% (ref 12). The L-INS-i and Clustal performed best with the reference set that consists of families aligned with a highly divergent "orphan" sequence (ref 20) or groups of equi-distant distantly related sequences (ref 30). The L-INS-i and TCoffee worked best when the sequences contained N/C-terminal extensions (ref 40) or internal insertions (ref 50). It should be noted, however, that the differences between most of the alignment methods were negligible; in addition to the L-INS-i, the Probcons, Muscle, TCoffee and Clustal also produced alignments of very high quality. The Dialign and the FFT-NS-2 strategy of Mafft, on the contrary, performed clearly worse than the other methods. The comparison between the Mafft and Muscle was potentially biased because the Muscle was run using default settings. Running the Muscle with its most accurate options would probably have affected the results.</p>
         <p>We evaluated the quality of the alignment software using the BAliBASE 3 database. Previous studies using the BAliBASE database have been performed for the database version 2 <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B3">3</abbr><abbr bid="B10">10</abbr><abbr bid="B1">1</abbr></abbrgrp>. The drawback of that database is that some of the reference alignments consist of only a few sequences. In version 3, the reference sets have more sequences and therefore the current database is better suited to statistical scoring of the alignment quality. The results are rather similar to those obtained in the previous studies using the SP or CP scores or modified versions of them <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B3">3</abbr><abbr bid="B10">10</abbr><abbr bid="B1">1</abbr></abbrgrp>. In our study, the performance of Mafft is better than reported earlier. This is because the previous results were obtained using the NW-NS-i strategy, whereas we used the L-INS-i, the most accurate strategy of Mafft at the moment (see Mafft web page for more details <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>). Another difference was in the performance of Probcons: Do <it>et al</it>. <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> showed that in the reference sets 20&#8211;50, Probcons outperformed the other methods, while with our AQ scoring, the performance of the Probcons was poorer than that of the L-INS-i and TCoffee.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We have presented a statistical approach to alignment quality scoring. The quality is characterized on the basis of conserved position information only, which is defined by using the modified Z score in conjunction with the profile analysis framework. The significance tests based on the importance sampling method define the conserved positions, and the false discovery rate procedure corrects for the error caused by multiple testing. The final AQ score accounts for the residue frequency over the conserved alignment positions.</p>
         <p>We have compared the AQ scores of the 7 alignment methods using the BAliBASE as a benchmarking database. The results indicate that even though the L-INS-i obtained the best overall result, there are no great differences between the best scoring alignment methods (L-INS-i, Probcons, Muscle, TCoffee and Clustal), whereas the FFT-NS-2 and Dialign usually scored worse. The comparison of the AQ and SP scores gave similar results, indicating that the AQ score is a reliable method for assessing the quality of multiple sequence alignments.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>The maximum Z-score</p>
            </st>
            <p>Let us assume that the occurrences of symbols at each alignment position are sampled from a discrete distribution with <it>&#946;</it><sub>1</sub>, <it>&#946;</it><sub>2</sub>, ..., <it>&#946;</it><sub><it>J </it></sub>as the true symbol probabilities. For DNA sequences <it>J </it>= 4 (bases A, C, G, and T), and for protein sequences <it>J </it>= 20 (amino acids A, C,..., Y). The statistical properties of the alignment are then completely characterized by the multinomial distribution model. In particular, the probability of a position with observed symbol frequencies <it>n</it><sub>1</sub>, <it>n</it><sub>2</sub>, ..., <it>n</it><sub><it>J </it></sub>is proportional to the product:</p>
            <p>
               <m:math name="1471-2105-7-484-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>&#8473;</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:msub>
                           <m:mi>n</m:mi>
                           <m:mn>1</m:mn>
                        </m:msub>
                        <m:mo>,</m:mo>
                        <m:msub>
                           <m:mi>n</m:mi>
                           <m:mn>2</m:mn>
                        </m:msub>
                        <m:mo>,</m:mo>
                        <m:mn>...</m:mn>
                        <m:mo>,</m:mo>
                        <m:msub>
                           <m:mi>n</m:mi>
                           <m:mi>J</m:mi>
                        </m:msub>
                        <m:mo>|</m:mo>
                        <m:msub>
                           <m:mi>&#946;</m:mi>
                           <m:mn>1</m:mn>
                        </m:msub>
                        <m:mo>,</m:mo>
                        <m:msub>
                           <m:mi>&#946;</m:mi>
                           <m:mn>2</m:mn>
                        </m:msub>
                        <m:mo>,</m:mo>
                        <m:mn>...</m:mn>
                        <m:mo>,</m:mo>
                        <m:msub>
                           <m:mi>&#946;</m:mi>
                           <m:mi>J</m:mi>
                        </m:msub>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>~</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8719;</m:mo>
                              <m:mrow>
                                 <m:mi>j</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mi>J</m:mi>
                           </m:munderover>
                           <m:mrow>
                              <m:msubsup>
                                 <m:mi>&#946;</m:mi>
                                 <m:mi>j</m:mi>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>n</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:msub>
                                 </m:mrow>
                              </m:msubsup>
                           </m:mrow>
                        </m:mstyle>
                        <m:mo>.</m:mo>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaatuuDJXwAK1uy0HMmaeHbfv3ySLgzG0uy0HgiuD3BaGabaiab=LriqjabcIcaOiabd6gaUnaaBaaaleaacqaIXaqmaeqaaOGaeiilaWIaemOBa42aaSbaaSqaaiabikdaYaqabaGccqGGSaalcqGGUaGlcqGGUaGlcqGGUaGlcqGGSaalcqWGUbGBdaWgaaWcbaGaemOsaOeabeaakiabcYha8HGaciab+j7aInaaBaaaleaacqaIXaqmaeqaaOGaeiilaWIae4NSdi2aaSbaaSqaaiabikdaYaqabaGccqGGSaalcqGGUaGlcqGGUaGlcqGGUaGlcqGGSaalcqGFYoGydaWgaaWcbaGaemOsaOeabeaakiabcMcaPiabc6ha+naarahabaGae4NSdi2aa0baaSqaaiabdQgaQbqaaiabd6gaUnaaBaaameaacqWGQbGAaeqaaaaaaSqaaiabdQgaQjabg2da9iabigdaXaqaaiabdQeakbqdcqGHpis1aOGaeiOla4IaaCzcaiaaxMaadaqadaqaaiabigdaXaGaayjkaiaawMcaaaaa@6859@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>The probability vector <b><it>&#946; </it></b>= (<it>&#946;</it><sub>1</sub>, <it>&#946;</it><sub>2</sub>, ..., <it>&#946;</it><sub><it>J</it></sub>) must satisfy the stochastic constraints: <it>&#946;</it><sub><it>j </it></sub>&#8805; 0 and <m:math name="1471-2105-7-484-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>j</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>J</m:mi></m:msubsup><m:mrow><m:msub><m:mi>&#946;</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:mstyle><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaaeWaqaaGGaciab=j7aInaaBaaaleaacqWGQbGAaeqaaaqaaiabdQgaQjabg2da9iabigdaXaqaaiabdQeakbqdcqGHris5aOGaeyypa0JaeGymaedaaa@3844@</m:annotation></m:semantics></m:math>. Let <it>N </it>be the number of sequences in the alignment and <m:math name="1471-2105-7-484-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>n</m:mi><m:mo>=</m:mo><m:mstyle displaystyle="true"><m:msubsup><m:mo>&#8721;</m:mo><m:mrow><m:mi>j</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>J</m:mi></m:msubsup><m:mrow><m:msub><m:mi>n</m:mi><m:mi>j</m:mi></m:msub></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGUbGBcqGH9aqpdaaeWaqaaiabd6gaUnaaBaaaleaacqWGQbGAaeqaaaqaaiabdMgaPjabg2da9iabigdaXaqaaiabdQeakbqdcqGHris5aaaa@386A@</m:annotation></m:semantics></m:math> the actual number of symbols observed at the position, that is, the number of gaps subtracted from <it>N</it>. By maximizing the likelihood function (1) subject to the stochastic constraints, it can be easily shown that the maximum likelihood (ML) estimator <b><it>b </it></b>of the vector <b><it>&#946; </it></b>is given in the form of the observed relative frequencies</p>
            <p>
               <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-7-484-i4">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>b</m:mi>
                                       <m:mi>j</m:mi>
                                    </m:msub>
                                    <m:mo>=</m:mo>
                                    <m:msubsup>
                                       <m:mover accent="true">
                                          <m:mi>&#946;</m:mi>
                                          <m:mo>^</m:mo>
                                       </m:mover>
                                       <m:mi>j</m:mi>
                                       <m:mrow>
                                          <m:mtext>ML</m:mtext>
                                       </m:mrow>
                                    </m:msubsup>
                                    <m:mo>=</m:mo>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>n</m:mi>
                                             <m:mi>j</m:mi>
                                          </m:msub>
                                       </m:mrow>
                                       <m:mi>n</m:mi>
                                    </m:mfrac>
                                    <m:mo>,</m:mo>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>j</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mn>2</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mn>...</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mi>J</m:mi>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                        <m:mo>.</m:mo>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>2</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeqacaaabaGaemOyai2aaSbaaSqaaiabdQgaQbqabaGccqGH9aqpiiGacuWFYoGygaqcamaaDaaaleaacqWGQbGAaeaacqqGnbqtcqqGmbataaGccqGH9aqpdaWcaaqaaiabd6gaUnaaBaaaleaacqWGQbGAaeqaaaGcbaGaemOBa4gaaiabcYcaSaqaaiabdQgaQjabg2da9iabigdaXiabcYcaSiabikdaYiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiabdQeakbaacqGGUaGlcaWLjaGaaCzcamaabmaabaGaeGOmaidacaGLOaGaayzkaaaaaa@4BCF@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
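            <p>As an illustrative sketch that is not part of the original analysis, the estimator in (2) can be computed for a single alignment column as shown here. The function name <it>ml_frequencies</it>, the gap symbol "-" and the example column are assumptions made for this example only; symbols outside the chosen alphabet are ignored when counting <it>n</it>.</p>
            <p>
```python
from collections import Counter

DNA_ALPHABET = "ACGT"  # J = 4 symbols, as in the text


def ml_frequencies(column, alphabet=DNA_ALPHABET, gap="-"):
    """Return (b, n) for one alignment column.

    b[j] = n_j / n is the observed relative frequency of symbol j and
    n is the number of non-gap symbols from the alphabet in the column.
    """
    counts = Counter(s for s in column if s != gap)
    # Only symbols of the alphabet contribute to n (hypothetical choice).
    n = sum(counts[a] for a in alphabet)
    if n == 0:
        raise ValueError("column contains no symbols from the alphabet")
    return [counts[a] / n for a in alphabet], n


# Hypothetical column from an alignment of N = 6 sequences with one gap.
b, n = ml_frequencies("AAC-AG")
print(n, b)
```
            </p>
            <p>In this hypothetical column, <it>N </it>= 6 and one gap is present, so <it>n </it>= 5 and the entries of <b><it>b </it></b>sum to one, as required by the stochastic constraints.</p>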
            <p>According to the properties of the multinomial distribution, the expectation vector and the covariance matrix of the estimator <b><it>b </it></b>are <b><it>&#946; </it></b>and <b>&#931;</b>, respectively, where</p>
            <p>
               <m:math name="1471-2105-7-484-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>&#931;</m:mi>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>j</m:mi>
                                       </m:mrow>
                                    </m:msub>
                                    <m:mo>=</m:mo>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>&#946;</m:mi>
                                             <m:mi>i</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:msub>
                                             <m:mi>&#948;</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo>&#8722;</m:mo>
                                          <m:msub>
                                             <m:mi>&#946;</m:mi>
                                             <m:mi>j</m:mi>
                                          </m:msub>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                       <m:mi>n</m:mi>
                                    </m:mfrac>
                                    <m:mo>,</m:mo>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>j</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mn>2</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mn>...</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mi>J</m:mi>
                                    <m:mo>.</m:mo>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>3</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeqacaaabaWaaabeaeaacqGH9aqpdaWcaaqaaGGaciab=j7aInaaBaaaleaacqWGPbqAaeqaaOGaeiikaGIae8hTdq2aaSbaaSqaaiabdMgaPjabdQgaQbqabaGccqGHsislcqWFYoGydaWgaaWcbaGaemOAaOgabeaakiabcMcaPaqaaiabd6gaUbaaaSqaaiabdMgaPjabdQgaQbqab0GaeyyeIuoakiabcYcaSaqaaiabdMgaPjabcYcaSiabdQgaQjabg2da9iabigdaXiabcYcaSiabikdaYiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiabdQeakjabc6caUaaacaWLjaGaaCzcamaabmaabaGaeG4mamdacaGLOaGaayzkaaaaaa@53DC@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>The Kronecker delta is defined by <it>&#948;</it><sub><it>jj </it></sub>= 1 and <it>&#948;</it><sub><it>ij </it></sub>= 0 for all <it>i </it>&#8800; <it>j</it>.</p>
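            <p>A minimal sketch of the covariance matrix in (3) is given here for illustration; it is not taken from the original implementation. The function name <it>multinomial_covariance</it>, the uniform DNA background and the value <it>n </it>= 5 are assumptions chosen for the example.</p>
            <p>
```python
def multinomial_covariance(beta, n):
    """Covariance matrix of the relative frequencies b = counts / n.

    Entry (i, j) equals beta_i * (delta_ij - beta_j) / n, where delta_ij
    is the Kronecker delta.
    """
    size = len(beta)
    return [
        [beta[i] * ((1.0 if i == j else 0.0) - beta[j]) / n for j in range(size)]
        for i in range(size)
    ]


# Uniform DNA background, used here only as an illustrative assumption.
beta0 = [0.25, 0.25, 0.25, 0.25]
sigma0 = multinomial_covariance(beta0, n=5)
for row in sigma0:
    print([round(v, 4) for v in row])
```
            </p>
            <p>Each row of the resulting matrix sums to zero because the components of <b><it>b </it></b>sum to one, so the covariance matrix is singular.</p>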
            <p>Given an appropriate symbol similarity matrix <b><it>C</it></b>, the entries of the profile <b><it>f </it></b>= <b><it>Cb </it></b>are expressed as linear combinations</p>
            <p>
               <m:math name="1471-2105-7-484-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:msub>
                           <m:mi>f</m:mi>
                           <m:mi>i</m:mi>
                        </m:msub>
                        <m:mo>=</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>j</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mi>J</m:mi>
                           </m:munderover>
                           <m:mrow>
                              <m:msub>
                                 <m:mi>b</m:mi>
                                 <m:mi>j</m:mi>
                              </m:msub>
                              <m:msub>
                                 <m:mi>c</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mi>j</m:mi>
                                 </m:mrow>
                              </m:msub>
                           </m:mrow>
                        </m:mstyle>
                        <m:mo>=</m:mo>
                        <m:msubsup>
                           <m:mi>c</m:mi>
                           <m:mi>i</m:mi>
                           <m:mi>T</m:mi>
                        </m:msubsup>
                        <m:mi>b</m:mi>
                        <m:mo>,</m:mo>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>4</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGMbGzdaWgaaWcbaGaemyAaKgabeaakiabg2da9maaqahabaGaemOyai2aaSbaaSqaaiabdQgaQbqabaGccqWGJbWydaWgaaWcbaGaemyAaKMaemOAaOgabeaaaeaacqWGQbGAcqGH9aqpcqaIXaqmaeaacqWGkbGsa0GaeyyeIuoakiabg2da9Gqadiab=ngaJnaaDaaaleaacqWGPbqAaeaacqWGubavaaGccqWFIbGycqGGSaalcaWLjaGaaCzcamaabmaabaGaeGinaqdacaGLOaGaayzkaaaaaa@4968@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where the vector <m:math name="1471-2105-7-484-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>c</m:mi><m:mi>i</m:mi></m:msub><m:mo>=</m:mo><m:msubsup><m:mrow><m:mo stretchy="false">(</m:mo><m:msub><m:mi>c</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow></m:msub><m:mo stretchy="false">)</m:mo></m:mrow><m:mrow><m:mi>j</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mi>J</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieWacqWFJbWydaWgaaWcbaGaemyAaKgabeaakiabg2da9iabcIcaOiabdogaJnaaBaaaleaacqWGPbqAcqWGQbGAaeqaaOGaeiykaKYaa0baaSqaaiabdQgaQjabg2da9iabigdaXaqaaiabdQeakbaaaaa@3B26@</m:annotation></m:semantics></m:math> denotes the <it>i</it><sup>th </sup>row of <b><it>C </it></b>related to the symbols <it>i </it>= 1, 2, ..., <it>J</it>. The degree of positional conservation is calculated with respect to a predefined background distribution <b><it>&#946;</it></b><sup>0 </sup>= (<m:math name="1471-2105-7-484-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#946;</m:mi><m:mn>1</m:mn><m:mn>0</m:mn></m:msubsup><m:mo>,</m:mo><m:msubsup><m:mi>&#946;</m:mi><m:mn>2</m:mn><m:mn>0</m:mn></m:msubsup><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:msubsup><m:mi>&#946;</m:mi><m:mi>J</m:mi><m:mn>0</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWFYoGydaqhaaWcbaGaeGymaedabaGaeGimaadaaOGaeiilaWIae8NSdi2aa0baaSqaaiabikdaYaqaaiabicdaWaaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab=j7aInaaDaaaleaacqWGkbGsaeaacqaIWaamaaaaaa@3D3C@</m:annotation></m:semantics></m:math>) under the null hypothesis H<sub>0 </sub>: <b><it>&#946; </it></b>= <b><it>&#946;</it></b><sup>0</sup>. The theoretical expectation vector and covariance matrix of the profile under H<sub>0 </sub>are</p>
            <p>
               <m:math name="1471-2105-7-484-i9" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi mathvariant="double-struck">E</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>f</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>=</m:mo>
                                    <m:mi>C</m:mi>
                                    <m:msup>
                                       <m:mi>&#946;</m:mi>
                                       <m:mn>0</m:mn>
                                    </m:msup>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mtext>and</m:mtext>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>&#8450;</m:mi>
                                    <m:mtext>ov</m:mtext>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>f</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>=</m:mo>
                                    <m:mi>C</m:mi>
                                    <m:msup>
                                       <m:mi>&#931;</m:mi>
                                       <m:mn>0</m:mn>
                                    </m:msup>
                                    <m:msup>
                                       <m:mi>C</m:mi>
                                       <m:mi>T</m:mi>
                                    </m:msup>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                        <m:mo>,</m:mo>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>5</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeqadaaabaWefv3ySLgznfgDOjdaryqr1ngBPrginfgDObcv39gaiqaacqWFecFrcqGGOaakieWacqGFMbGzcqGGPaqkcqGH9aqpcqGFdbWqiiGacqqFYoGydaahaaWcbeqaaiabicdaWaaaaOqaaiabbggaHjabb6gaUjabbsgaKbqaaiab=jqidjabb+gaVjabbAha2jabcIcaOiab+zgaMjabcMcaPiabg2da9iab+neadHGabiab8n6atnaaCaaaleqabaGaeGimaadaaOGae43qam0aaWbaaSqabeaacqWGubavaaaaaOGaeiilaWIaaCzcaiaaxMaadaqadaqaaiabiwda1aGaayjkaiaawMcaaaaa@5716@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where the entries of <b>&#931;</b><sup>0 </sup>are defined as in (3) with <it>&#946;</it><sub><it>j </it></sub>replaced by <m:math name="1471-2105-7-484-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#946;</m:mi><m:mi>j</m:mi><m:mn>0</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWFYoGydaqhaaWcbaGaemOAaOgabaGaeGimaadaaaaa@30CC@</m:annotation></m:semantics></m:math>. After standardizing the individual profiles (4) with the corresponding quantities (5), the final <it>Z</it>-score takes the form</p>
            <p>
               <m:math name="1471-2105-7-484-i11" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>Z</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo>=</m:mo>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:msubsup>
                                             <m:mi>c</m:mi>
                                             <m:mi>i</m:mi>
                                             <m:mi>T</m:mi>
                                          </m:msubsup>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mi>b</m:mi>
                                          <m:mo>&#8722;</m:mo>
                                          <m:msup>
                                             <m:mi>&#946;</m:mi>
                                             <m:mn>0</m:mn>
                                          </m:msup>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:msqrt>
                                             <m:mrow>
                                                <m:msubsup>
                                                   <m:mi>c</m:mi>
                                                   <m:mi>i</m:mi>
                                                   <m:mi>T</m:mi>
                                                </m:msubsup>
                                                <m:msup>
                                                   <m:mi>&#931;</m:mi>
                                                   <m:mn>0</m:mn>
                                                </m:msup>
                                                <m:msub>
                                                   <m:mi>c</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                             </m:mrow>
                                          </m:msqrt>
                                       </m:mrow>
                                    </m:mfrac>
                                    <m:mo>,</m:mo>
                                 </m:mrow>
                              </m:mtd>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mn>2</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mn>...</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mi>J</m:mi>
                                    <m:mo>.</m:mo>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>6</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeqacaaabaGaemOwaO1aaSbaaSqaaiabdMgaPbqabaGccqGH9aqpdaWcaaqaaGqadiab=ngaJnaaDaaaleaacqWGPbqAaeaacqWGubavaaGccqGGOaakcqWFIbGycqGHsisliiGacqGFYoGydaahaaWcbeqaaiabicdaWaaakiabcMcaPaqaamaakaaabaGae83yam2aa0baaSqaaiabdMgaPbqaaiabdsfaubaaiiqakiab9n6atnaaCaaaleqabaGaeGimaadaaOGae83yam2aaSbaaSqaaiabdMgaPbqabaaabeaaaaGccqGGSaalaeaacqWGPbqAcqGH9aqpcqaIXaqmcqGGSaalcqaIYaGmcqGGSaalcqGGUaGlcqGGUaGlcqGGUaGlcqGGSaalcqWGkbGscqGGUaGlaaGaaCzcaiaaxMaadaqadaqaaiabiAda2aGaayjkaiaawMcaaaaa@5548@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>The statistic maps the residues of an alignment position onto a real number according to the observed symbol frequencies and their similarities among the symbol classes. If we use binary symbol similarities <m:math name="1471-2105-7-484-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>c</m:mi><m:mi>i</m:mi></m:msub><m:mo>=</m:mo><m:msubsup><m:mrow><m:mo stretchy="false">(</m:mo><m:msub><m:mi>&#948;</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow></m:msub><m:mo stretchy="false">)</m:mo></m:mrow><m:mrow><m:mi>j</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mn>20</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaieWacqWFJbWydaWgaaWcbaGaemyAaKgabeaakiabg2da9iabcIcaOGGaciab+r7aKnaaBaaaleaacqWGPbqAcqWGQbGAaeqaaOGaeiykaKYaa0baaSqaaiabdQgaQjabg2da9iabigdaXaqaaiabikdaYiabicdaWaaaaaa@3C45@</m:annotation></m:semantics></m:math>, we obtain a special case of the <it>Z</it>-score where the similarities among the symbols are ignored. Generally, <b><it>c</it></b><sub><it>i </it></sub>can have any fixed form appropriate for the study. In the present application, the score we propose for the positional conservation analysis is obtained by selecting the maximal <it>Z</it><sub><it>i</it></sub>-value over the symbol classes <it>i </it>= 1, 2, ..., <it>J</it>. We call this statistic the maximum <it>Z</it>-score and abbreviate it as maxZ:</p>
            <p>
               <m:math name="1471-2105-7-484-i13" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>max</m:mi>
                        <m:mo>&#8289;</m:mo>
                        <m:mtext>Z</m:mtext>
                        <m:mo>=</m:mo>
                        <m:munder>
                           <m:mrow>
                              <m:mi>max</m:mi>
                              <m:mo>&#8289;</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>i</m:mi>
                              <m:mo>=</m:mo>
                              <m:mn>1</m:mn>
                              <m:mo>,</m:mo>
                              <m:mn>2</m:mn>
                              <m:mo>,</m:mo>
                              <m:mn>...</m:mn>
                              <m:mo>,</m:mo>
                              <m:mi>J</m:mi>
                           </m:mrow>
                        </m:munder>
                        <m:msub>
                           <m:mi>Z</m:mi>
                           <m:mi>i</m:mi>
                        </m:msub>
                        <m:mo>.</m:mo>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>7</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacyGGTbqBcqGGHbqycqGG4baEcqqGAbGwcqGH9aqpdaWfqaqaaiGbc2gaTjabcggaHjabcIha4bWcbaGaemyAaKMaeyypa0JaeGymaeJaeiilaWIaeGOmaiJaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaemOsaOeabeaakiabdQfaAnaaBaaaleaacqWGPbqAaeqaaOGaeiOla4IaaCzcaiaaxMaadaqadaqaaiabiEda3aGaayjkaiaawMcaaaaa@49A3@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>We assume that a single conserved symbol class is sufficient to define the position as conserved. The symbol class obtaining the maximum <it>Z</it>-score, i.e.</p>
            <p>Cons = argmax<sub><it>i </it>= 1,2,...,<it>J</it></sub><it>Z</it><sub><it>i </it></sub>&#160;&#160;&#160; (8)</p>
            <p>defines the consensus residue for the particular alignment position.</p>
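            <p>As an illustration of (6)-(8), the following Python sketch computes the <it>Z</it><sub><it>i</it></sub>-scores, maxZ and the consensus class for a single alignment position. It is not the authors' implementation. It assumes that <b><it>b</it></b> is the vector of observed relative symbol frequencies and that <b>&#931;</b><sup>0</sup> is the multinomial covariance of these frequencies, (diag(<b><it>&#946;</it></b><sup>0</sup>) - <b><it>&#946;</it></b><sup>0</sup><b><it>&#946;</it></b><sup>0<it>T</it></sup>)/<it>N</it>, with <it>N</it> the number of sequences. The function names and the toy four-symbol alphabet are hypothetical.</p>

```python
import math

def z_scores(counts, beta0, C=None):
    """Z_i = c_i^T (b - beta0) / sqrt(c_i^T Sigma0 c_i) for every symbol class i.

    counts: observed symbol counts at one position; beta0: background
    probabilities (assumed strictly between 0 and 1); C: similarity matrix
    whose rows are c_i^T (identity, i.e. binary similarities, if omitted).
    """
    J = len(counts)
    N = sum(counts)
    b = [n / N for n in counts]                      # observed relative frequencies
    if C is None:
        C = [[1.0 if i == j else 0.0 for j in range(J)] for i in range(J)]
    # Assumed form of Sigma0: multinomial covariance of relative frequencies.
    sigma0 = [[(beta0[j] * (1.0 - beta0[j]) if j == k else -beta0[j] * beta0[k]) / N
               for k in range(J)] for j in range(J)]
    scores = []
    for c in C:
        num = sum(c[j] * (b[j] - beta0[j]) for j in range(J))
        var = sum(c[j] * sigma0[j][k] * c[k] for j in range(J) for k in range(J))
        scores.append(num / math.sqrt(var))
    return scores

def max_z(counts, beta0, C=None):
    """Return (maxZ, index of the consensus class), as in (7) and (8)."""
    z = z_scores(counts, beta0, C)
    cons = max(range(len(z)), key=lambda i: z[i])
    return z[cons], cons

# Toy position: 10 sequences, 4 symbols, uniform background.
score, cons = max_z([8, 1, 1, 0], [0.25] * 4)
```

            <p>For this toy position the first symbol class is the consensus, with a maxZ of about 4.02, while the unobserved fourth class has a negative <it>Z</it><sub><it>i</it></sub>.</p>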
         </sec>
         <sec>
            <st>
               <p>The significance of the observed maxZ score</p>
            </st>
            <p>Once the maxZ-statistic has been evaluated, the next question concerns its significance, namely, whether the observed value of the maxZ-statistic is large enough to justify the rejection of H<sub>0 </sub>at a particular position. In this section, the problem of identifying conserved positions of a multiple alignment is considered as a statistical hypothesis testing problem. This consists of testing the null hypothesis H<sub>0 </sub>: <it>&#946;</it><sub><it>j </it></sub>= <m:math name="1471-2105-7-484-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#946;</m:mi><m:mi>j</m:mi><m:mn>0</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWFYoGydaqhaaWcbaGaemOAaOgabaGaeGimaadaaaaa@30CC@</m:annotation></m:semantics></m:math> against the alternative H<sub><it>A </it></sub>: <it>&#946;</it><sub><it>j </it></sub>&#8805; <m:math name="1471-2105-7-484-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msubsup><m:mi>&#946;</m:mi><m:mi>j</m:mi><m:mn>0</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWFYoGydaqhaaWcbaGaemOAaOgabaGaeGimaadaaaaa@30CC@</m:annotation></m:semantics></m:math>, <it>j </it>= 1, 2, ..., <it>J</it>, where at least one inequality is strict. The problems caused by multiple comparisons within the position can be avoided by using the maxZ-statistic (7) as a test statistic, instead of the individual <it>Z</it><sub><it>i</it></sub>-scores (6). Our aim is to test whether the observed value of maxZ is significantly larger than that which would be likely to arise under H<sub>0 </sub>due to random variation. The significance <it>p </it>of the observed value maxZ is formally defined by the tail probability function &#8473;(maxZ) = 1 - &#8473;(maxZ|H<sub>0</sub>). The smaller the <it>p</it>-value, the more extreme the maxZ-statistic and the stronger the evidence against H<sub>0</sub>. When the exact null distribution &#8473;(maxZ|H<sub>0</sub>) is not available, it is essential to have widely applicable procedures that provide a good approximation. Two approaches to approximating the theoretical null distribution are described below.</p>
            <p><b>Monte Carlo </b>(MC) approximation is perhaps the most frequently used non-parametric method for estimating the significance of an observed test statistic <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. In the MC method, the samples are generated from the background distribution, and the null distribution is approximated through the cumulative sample distribution function. In other words, the significance of the maxZ score is obtained by calculating the proportion of samples whose maxZ score is greater than or equal to the observed maxZ value. However, because of the 20-dimensional parameter space, the probability of drawing such a sample is very close to zero even with very large sample sizes. The MC procedure is therefore very inefficient and yields zero <it>p</it>-values even for alignment positions which are only moderately conserved.</p>
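            <p>The MC procedure can be sketched in Python as follows. This is a minimal illustration rather than the authors' code. It assumes binary symbol similarities, so that <it>Z</it><sub><it>i</it></sub> reduces to the standardized relative frequency of class <it>i</it>, and a multinomial background with strictly positive probabilities <b><it>&#946;</it></b><sup>0</sup>. The function names, the number of samples <it>K</it> and the seed are hypothetical.</p>

```python
import math
import random

def max_z(counts, beta0):
    """maxZ for binary similarities: Z_i = (n_i/N - beta0_i) / sqrt(beta0_i (1 - beta0_i) / N)."""
    N = sum(counts)
    return max((n / N - p) / math.sqrt(p * (1.0 - p) / N)
               for n, p in zip(counts, beta0))

def sample_counts(beta0, N, rng):
    """Draw multinomial symbol counts for one position of N sequences under H0."""
    counts = [0] * len(beta0)
    for j in rng.choices(range(len(beta0)), weights=beta0, k=N):
        counts[j] += 1
    return counts

def mc_p_value(counts_obs, beta0, K=10000, seed=0):
    """Proportion of K background samples whose maxZ is >= the observed maxZ."""
    rng = random.Random(seed)
    N = sum(counts_obs)
    observed = max_z(counts_obs, beta0)
    hits = sum(1 for _ in range(K)
               if max_z(sample_counts(beta0, N, rng), beta0) >= observed)
    return hits / K
```

            <p>For a fully conserved position of 20 sequences and a uniform background over 20 symbols, a matching sample has probability 20 &#215; 0.05<sup>20</sup>, so this estimate is zero for any feasible <it>K</it>, which is the inefficiency described above.</p>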
            <p><b>Importance sampling </b>(IS), also referred to as the weighted bootstrap re-sampling method, is a variant of the ordinary MC method <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B50">50</abbr></abbrgrp>. Let us denote by <it>y </it>= (<it>n</it><sub>1</sub>, <it>n</it><sub>2</sub>, ..., <it>n</it><sub><it>J</it></sub>) a sample of symbol frequencies drawn from the multinomial distribution <it>g</it><sup>0 </sup>under H<sub>0 </sub>and by <it>y</it><sub><it>obs </it></sub>the observed symbol frequencies at one alignment position. Define <it>t</it>(<it>y</it>) as an indicator function <it>t</it>(<it>y</it>) = <m:math name="1471-2105-7-484-i14" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>I</m:mi><m:mrow><m:mo>{</m:mo><m:mi>max</m:mi><m:mo>&#8289;</m:mo><m:mtext>Z</m:mtext><m:mo stretchy="false">(</m:mo><m:mi>y</m:mi><m:mo stretchy="false">)</m:mo><m:mo>&#8805;</m:mo><m:mi>max</m:mi><m:mo>&#8289;</m:mo><m:mtext>Z</m:mtext><m:mo stretchy="false">(</m:mo><m:msub><m:mi>y</m:mi><m:mrow><m:mi>o</m:mi><m:mi>b</m:mi><m:mi>s</m:mi></m:mrow></m:msub><m:mo stretchy="false">)</m:mo><m:mo>}</m:mo></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGjbqsdaWgaaWcbaGaei4EaSNagiyBa0MaeiyyaeMaeiiEaGNaeeOwaOLaeiikaGIaemyEaKNaeiykaKIaeyyzImRagiyBa0MaeiyyaeMaeiiEaGNaeeOwaOLaeiikaGIaemyEaK3aaSbaaWqaaiabd+gaVjabdkgaIjabdohaZbqabaWccqGGPaqkcqGG9bqFaeqaaaaa@4830@</m:annotation></m:semantics></m:math>. The general idea of importance sampling is to draw samples from a distribution <it>g</it>* under which every realization that is possible in <it>g</it><sup>0 </sup>is also possible. By choosing <it>g</it>*(<it>y</it>) &#8733; <it>t</it>(<it>y</it>)<it>g</it><sup><it>o</it></sup>(<it>y</it>) as an IS distribution and taking infinitely many samples from <it>g</it>*, the exact observed significance level is given by</p>
            <p>
               <m:math name="1471-2105-7-484-i15" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>&#8473;</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>max</m:mi>
                        <m:mo>&#8289;</m:mo>
                        <m:mtext>Z</m:mtext>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>y</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>&#8805;</m:mo>
                        <m:mi>max</m:mi>
                        <m:mo>&#8289;</m:mo>
                        <m:mtext>Z</m:mtext>
                        <m:mo stretchy="false">(</m:mo>
                        <m:msub>
                           <m:mi>y</m:mi>
                           <m:mrow>
                              <m:mi>o</m:mi>
                              <m:mi>b</m:mi>
                              <m:mi>s</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>0</m:mn>
                              </m:mrow>
                              <m:mi>&#8734;</m:mi>
                           </m:munderover>
                           <m:mrow>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:mi>t</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:msub>
                                       <m:mi>y</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:msup>
                                       <m:mi>g</m:mi>
                                       <m:mi>o</m:mi>
                                    </m:msup>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:msub>
                                       <m:mi>y</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>g</m:mi>
                                    <m:mo>*</m:mo>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:msub>
                                       <m:mi>y</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mfrac>
                           </m:mrow>
                        </m:mstyle>
                        <m:mi>g</m:mi>
                        <m:mo>*</m:mo>
                        <m:mo stretchy="false">(</m:mo>
                        <m:msub>
                           <m:mi>y</m:mi>
                           <m:mi>i</m:mi>
                        </m:msub>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>.</m:mo>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaatuuDJXwAK1uy0HMmaeHbfv3ySLgzG0uy0HgiuD3BaGabaiab=LriqjabcIcaOiGbc2gaTjabcggaHjabcIha4jabbQfaAjabcIcaOiabdMha5jabcMcaPiabgwMiZkGbc2gaTjabcggaHjabcIha4jabbQfaAjabcIcaOiabdMha5naaBaaaleaacqWGVbWBcqWGIbGycqWGZbWCaeqaaOGaeiykaKIaeiykaKIaeyypa0ZaaabCaeaadaWcaaqaaiabdsha0jabcIcaOiabdMha5naaBaaaleaacqWGPbqAaeqaaOGaeiykaKIaem4zaC2aaWbaaSqabeaacqWGVbWBaaGccqGGOaakcqWG5bqEdaWgaaWcbaGaemyAaKgabeaakiabcMcaPaqaaiabdEgaNjabcQcaQiabcIcaOiabdMha5naaBaaaleaacqWGPbqAaeqaaOGaeiykaKcaaaWcbaGaemyAaKMaeyypa0JaeGimaadabaGaeyOhIukaniabggHiLdGccqWGNbWzcqGGQaGkcqGGOaakcqWG5bqEdaWgaaWcbaGaemyAaKgabeaakiabcMcaPiabc6caUaaa@7514@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>Since this is an expectation of <m:math name="1471-2105-7-484-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>t</m:mi><m:mo stretchy="false">(</m:mo><m:mi>y</m:mi><m:mo stretchy="false">)</m:mo><m:mfrac><m:mrow><m:msup><m:mi>g</m:mi><m:mi>o</m:mi></m:msup><m:mo stretchy="false">(</m:mo><m:mi>y</m:mi><m:mo stretchy="false">)</m:mo></m:mrow><m:mrow><m:mi>g</m:mi><m:mo>*</m:mo><m:mo stretchy="false">(</m:mo><m:mi>y</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWG0baDcqGGOaakcqWG5bqEcqGGPaqkdaWcaaqaaiabdEgaNnaaCaaaleqabaGaem4Ba8gaaOGaeiikaGIaemyEaKNaeiykaKcabaGaem4zaCMaeiOkaOIaeiikaGIaemyEaKNaeiykaKcaaaaa@3CDC@</m:annotation></m:semantics></m:math> with respect to <it>g</it>*, we can approximate the observed significance level by taking <it>K </it>samples from <it>g</it>* and calculating an empirical tail probability function</p>
            <p>
               <m:math name="1471-2105-7-484-i17" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:msub>
                           <m:mi>&#8473;</m:mi>
                           <m:mrow>
                              <m:mi>I</m:mi>
                              <m:mi>S</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>max</m:mi>
                        <m:mo>&#8289;</m:mo>
                        <m:mtext>Z</m:mtext>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>y</m:mi>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>&#8805;</m:mo>
                        <m:mi>max</m:mi>
                        <m:mo>&#8289;</m:mo>
                        <m:mtext>Z</m:mtext>
                        <m:mo stretchy="false">(</m:mo>
                        <m:msub>
                           <m:mi>y</m:mi>
                           <m:mrow>
                              <m:mi>o</m:mi>
                              <m:mi>b</m:mi>
                              <m:mi>s</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mn>1</m:mn>
                           <m:mi>K</m:mi>
                        </m:mfrac>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mi>K</m:mi>
                           </m:munderover>
                           <m:mrow>
                              <m:mi>t</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>y</m:mi>
                                 <m:mi>i</m:mi>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mfrac>
                                 <m:mrow>
                                    <m:msup>
                                       <m:mi>g</m:mi>
                                       <m:mi>o</m:mi>
                                    </m:msup>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:msub>
                                       <m:mi>y</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>g</m:mi>
                                    <m:mo>*</m:mo>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:msub>
                                       <m:mi>y</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mfrac>
                           </m:mrow>
                        </m:mstyle>
                        <m:mo>.</m:mo>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>9</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaatuuDJXwAK1uy0HMmaeHbfv3ySLgzG0uy0HgiuD3BaGabaiab=LriqnaaBaaaleaacqWGjbqscqWGtbWuaeqaaOGaeiikaGIagiyBa0MaeiyyaeMaeiiEaGNaeeOwaOLaeiikaGIaemyEaKNaeiykaKIaeyyzImRagiyBa0MaeiyyaeMaeiiEaGNaeeOwaOLaeiikaGIaemyEaK3aaSbaaSqaaiabd+gaVjabdkgaIjabdohaZbqabaGccqGGPaqkcqGGPaqkcqGH9aqpdaWcaaqaaiabigdaXaqaaiabdUealbaadaaeWbqaaiabdsha0jabcIcaOiabdMha5naaBaaaleaacqWGPbqAaeqaaOGaeiykaKYaaSaaaeaacqWGNbWzdaahaaWcbeqaaiabd+gaVbaakiabcIcaOiabdMha5naaBaaaleaacqWGPbqAaeqaaOGaeiykaKcabaGaem4zaCMaeiOkaOIaeiikaGIaemyEaK3aaSbaaSqaaiabdMgaPbqabaGccqGGPaqkaaaaleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGlbWsa0GaeyyeIuoakiabc6caUiaaxMaacaWLjaWaaeWaaeaacqaI5aqoaiaawIcacaGLPaaaaaa@763F@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>This gives us an IS estimate of the observed significance level.</p>
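            <p>The estimate (9) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses binary symbol similarities, strictly positive background probabilities, and a hypothetical IS distribution <it>g</it>* chosen only for the example. Here <it>g</it>* is a mixture of multinomials in which one symbol class, picked with probability <it>&#946;</it><sub><it>j</it></sub><sup>0</sup>, has its probability raised by a tuning constant <it>mix</it> (0 &lt; <it>mix</it> &lt; 1) while the others are scaled by 1 - <it>mix</it>. Every realization possible under the background remains possible under this <it>g</it>*, and the multinomial coefficient cancels in the weight.</p>

```python
import math
import random

def max_z(counts, beta0):
    """maxZ for binary similarities: Z_i = (n_i/N - beta0_i) / sqrt(beta0_i (1 - beta0_i) / N)."""
    N = sum(counts)
    return max((n / N - p) / math.sqrt(p * (1.0 - p) / N)
               for n, p in zip(counts, beta0))

def is_p_value(counts_obs, beta0, K=20000, mix=0.8, seed=0):
    """Importance-sampling estimate of P(maxZ(y) >= maxZ(y_obs) | H0), as in (9)."""
    rng = random.Random(seed)
    N, J = sum(counts_obs), len(beta0)
    observed = max_z(counts_obs, beta0)
    classes = range(J)
    total = 0.0
    for _ in range(K):
        # Draw y_i from the assumed g*: pick a class, then boost its probability.
        boosted = rng.choices(classes, weights=beta0)[0]
        probs = [(1.0 - mix) * p for p in beta0]
        probs[boosted] += mix
        counts = [0] * J
        for j in rng.choices(classes, weights=probs, k=N):
            counts[j] += 1
        if max_z(counts, beta0) >= observed:             # t(y_i) = 1
            # log of g*(y_i) / g0(y_i), summed over mixture components
            # with a log-sum-exp to avoid overflow for large N.
            terms = [math.log(p) + (N - n) * math.log(1.0 - mix)
                     + n * math.log((1.0 - mix) + mix / p)
                     for n, p in zip(counts, beta0)]
            top = max(terms)
            log_ratio = top + math.log(sum(math.exp(t - top) for t in terms))
            total += math.exp(-log_ratio)                # weight g0(y_i) / g*(y_i)
    return total / K
```

            <p>Unlike the plain MC proportion, this estimate stays positive for a fully conserved position, because samples as extreme as the observation are drawn often and then down-weighted. The quality of the estimate depends on how well <it>mix</it> matches the degree of conservation, so the constant should be treated as something to tune.</p>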
            <p>A possible drawback of using the simulated distribution is that the estimated <it>p</it>-value can be zero at highly conserved positions. Several highly conserved positions may then obtain the same score and cannot be distinguished from each other. To avoid this, we used the following approximation for the significance value:</p>
            <p>
               <m:math name="1471-2105-7-484-i18" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mtable columnalign="left">
                        <m:mtr>
                           <m:mtd>
                              <m:mi>&#8473;</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>max</m:mi>
                              <m:mo>&#8289;</m:mo>
                              <m:mtext>Z</m:mtext>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>y</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>&#8805;</m:mo>
                              <m:mi>max</m:mi>
                              <m:mo>&#8289;</m:mo>
                              <m:mtext>Z</m:mtext>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>y</m:mi>
                                 <m:mrow>
                                    <m:mi>o</m:mi>
                                    <m:mi>b</m:mi>
                                    <m:mi>s</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mtd>
                        </m:mtr>
                        <m:mtr>
                           <m:mtd>
                              <m:mo>=</m:mo>
                              <m:mi>&#8473;</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>max</m:mi>
                              <m:mo>&#8289;</m:mo>
                              <m:mtext>Z</m:mtext>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>y</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>=</m:mo>
                              <m:mi>max</m:mi>
                              <m:mo>&#8289;</m:mo>
                              <m:mtext>Z</m:mtext>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>y</m:mi>
                                 <m:mrow>
                                    <m:mi>o</m:mi>
                                    <m:mi>b</m:mi>
                                    <m:mi>s</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>+</m:mo>
                              <m:mi>&#8473;</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>max</m:mi>
                              <m:mo>&#8289;</m:mo>
                              <m:mtext>Z</m:mtext>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>y</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>></m:mo>
                              <m:mi>max</m:mi>
                              <m:mo>&#8289;</m:mo>
                              <m:mtext>Z</m:mtext>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>y</m:mi>
                                 <m:mrow>
                                    <m:mi>o</m:mi>
                                    <m:mi>b</m:mi>
                                    <m:mi>s</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mtd>
                        </m:mtr>
                        <m:mtr>
                           <m:mtd>
                              <m:mo>&#8776;</m:mo>
                              <m:mi>&#8473;</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>y</m:mi>
                                 <m:mrow>
                                    <m:mi>o</m:mi>
                                    <m:mi>b</m:mi>
                                    <m:mi>s</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>+</m:mo>
                              <m:mi>&#8473;</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>max</m:mi>
                              <m:mo>&#8289;</m:mo>
                              <m:mtext>Z</m:mtext>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>y</m:mi>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>&#8805;</m:mo>
                              <m:mi>max</m:mi>
                              <m:mo>&#8289;</m:mo>
                              <m:mtext>Z</m:mtext>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msub>
                                 <m:mi>y</m:mi>
                                 <m:mrow>
                                    <m:mi>o</m:mi>
                                    <m:mi>b</m:mi>
                                    <m:mi>s</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>;</m:mo>
                              <m:mi>y</m:mi>
                              <m:mo>&#8800;</m:mo>
                              <m:msub>
                                 <m:mi>y</m:mi>
                                 <m:mrow>
                                    <m:mi>o</m:mi>
                                    <m:mi>b</m:mi>
                                    <m:mi>s</m:mi>
                                 </m:mrow>
                              </m:msub>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>.</m:mo>
                           </m:mtd>
                        </m:mtr>
                     </m:mtable>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakqaabeqaamrr1ngBPrwtHrhAYaqeguuDJXwAKbstHrhAGq1DVbaceaGae8xgHaLaeiikaGIagiyBa0MaeiyyaeMaeiiEaGNaeeOwaOLaeiikaGIaemyEaKNaeiykaKIaeyyzImRagiyBa0MaeiyyaeMaeiiEaGNaeeOwaOLaeiikaGIaemyEaK3aaSbaaSqaaiabd+gaVjabdkgaIjabdohaZbqabaGccqGGPaqkcqGGPaqkaeaacqGH9aqpcqWFzecucqGGOaakcyGGTbqBcqGGHbqycqGG4baEcqqGAbGwcqGGOaakcqWG5bqEcqGGPaqkcqGH9aqpcyGGTbqBcqGGHbqycqGG4baEcqqGAbGwcqGGOaakcqWG5bqEdaWgaaWcbaGaem4Ba8MaemOyaiMaem4CamhabeaakiabcMcaPiabcMcaPiabgUcaRiab=LriqjabcIcaOiGbc2gaTjabcggaHjabcIha4jabbQfaAjabcIcaOiabdMha5jabcMcaPiabg6da+iGbc2gaTjabcggaHjabcIha4jabbQfaAjabcIcaOiabdMha5naaBaaaleaacqWGVbWBcqWGIbGycqWGZbWCaeqaaOGaeiykaKIaeiykaKcabaGaeyisISRae8xgHaLaeiikaGIaemyEaK3aaSbaaSqaaiabd+gaVjabdkgaIjabdohaZbqabaGccqGGPaqkcqGHRaWkcqWFzecucqGGOaakcyGGTbqBcqGGHbqycqGG4baEcqqGAbGwcqGGOaakcqWG5bqEcqGGPaqkcqGHLjYScyGGTbqBcqGGHbqycqGG4baEcqqGAbGwcqGGOaakcqWG5bqEdaWgaaWcbaGaem4Ba8MaemOyaiMaem4CamhabeaakiabcMcaPiabcUda7iabdMha5jabgcMi5kabdMha5naaBaaaleaacqWGVbWBcqWGIbGycqWGZbWCaeqaaOGaeiykaKIaeiOla4caaaa@B4D0@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>In other words, the probability of an observed value is separated from the probability of other sampled observations. Therefore, even if none of the sampled maxZ values are greater than the observed maxZ value, the corresponding <it>p</it>-value is nonzero.</p>
            <p><b>Importance sampling distribution</b> To avoid the problems of the MC procedure, it is natural to define the IS distribution so that it also yields observations from the upper tail of the maxZ distribution. As the importance sampling distribution, we chose a mixture of multinomial distributions</p>
            <p>
               <m:math name="1471-2105-7-484-i19" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>g</m:mi>
                                    <m:mo>*</m:mo>
                                    <m:mo>=</m:mo>
                                    <m:mi>&#8473;</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:msub>
                                       <m:mi>n</m:mi>
                                       <m:mn>1</m:mn>
                                    </m:msub>
                                    <m:mo>,</m:mo>
                                    <m:msub>
                                       <m:mi>n</m:mi>
                                       <m:mn>2</m:mn>
                                    </m:msub>
                                    <m:mo>,</m:mo>
                                    <m:mn>...</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:msub>
                                       <m:mi>n</m:mi>
                                       <m:mi>J</m:mi>
                                    </m:msub>
                                    <m:mo>|</m:mo>
                                    <m:msubsup>
                                       <m:mi>&#946;</m:mi>
                                       <m:mn>1</m:mn>
                                       <m:mn>0</m:mn>
                                    </m:msubsup>
                                    <m:mo>,</m:mo>
                                    <m:msubsup>
                                       <m:mi>&#946;</m:mi>
                                       <m:mn>2</m:mn>
                                       <m:mn>0</m:mn>
                                    </m:msubsup>
                                    <m:mo>,</m:mo>
                                    <m:mn>...</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:msubsup>
                                       <m:mi>&#946;</m:mi>
                                       <m:mi>J</m:mi>
                                       <m:mn>0</m:mn>
                                    </m:msubsup>
                                    <m:mo>,</m:mo>
                                    <m:mi>&#945;</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>&#949;</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mo>=</m:mo>
                                    <m:mi>&#945;</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mfrac>
                                       <m:mi>n</m:mi>
                                       <m:mrow>
                                          <m:msub>
                                             <m:mi>n</m:mi>
                                             <m:mrow>
                                                <m:mn>1</m:mn>
                                                <m:mo>,</m:mo>
                                                <m:mn>0</m:mn>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mn>...</m:mn>
                                          <m:msub>
                                             <m:mi>n</m:mi>
                                             <m:mrow>
                                                <m:mi>J</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mn>0</m:mn>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mfrac>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mstyle displaystyle="true">
                                       <m:msubsup>
                                          <m:mo>&#8719;</m:mo>
                                          <m:mrow>
                                             <m:mi>j</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mi>J</m:mi>
                                       </m:msubsup>
                                       <m:mrow>
                                          <m:msubsup>
                                             <m:mi>&#946;</m:mi>
                                             <m:mrow>
                                                <m:mi>j</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mn>0</m:mn>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mn>0</m:mn>
                                                <m:msub>
                                                   <m:mi>n</m:mi>
                                                   <m:mrow>
                                                      <m:mi>j</m:mi>
                                                      <m:mo>,</m:mo>
                                                      <m:mn>0</m:mn>
                                                   </m:mrow>
                                                </m:msub>
                                             </m:mrow>
                                          </m:msubsup>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mo>+</m:mo>
                                    <m:mfrac>
                                       <m:mrow>
                                          <m:mn>1</m:mn>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mi>&#945;</m:mi>
                                       </m:mrow>
                                       <m:mi>K</m:mi>
                                    </m:mfrac>
                                    <m:mstyle displaystyle="true">
                                       <m:msubsup>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>k</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mi>K</m:mi>
                                       </m:msubsup>
                                       <m:mrow>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mfrac>
                                             <m:mi>n</m:mi>
                                             <m:mrow>
                                                <m:msub>
                                                   <m:mi>n</m:mi>
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                      <m:mi>k</m:mi>
                                                   </m:mrow>
                                                </m:msub>
                                                <m:mn>...</m:mn>
                                                <m:msub>
                                                   <m:mi>n</m:mi>
                                                   <m:mrow>
                                                      <m:mi>J</m:mi>
                                                      <m:mo>,</m:mo>
                                                      <m:mi>k</m:mi>
                                                   </m:mrow>
                                                </m:msub>
                                             </m:mrow>
                                          </m:mfrac>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mstyle displaystyle="true">
                                             <m:msubsup>
                                                <m:mo>&#8719;</m:mo>
                                                <m:mrow>
                                                   <m:mi>j</m:mi>
                                                   <m:mo>=</m:mo>
                                                   <m:mn>1</m:mn>
                                                </m:mrow>
                                                <m:mi>J</m:mi>
                                             </m:msubsup>
                                             <m:mrow>
                                                <m:msubsup>
                                                   <m:mi>&#946;</m:mi>
                                                   <m:mrow>
                                                      <m:mi>j</m:mi>
                                                      <m:mo>,</m:mo>
                                                      <m:mi>k</m:mi>
                                                   </m:mrow>
                                                   <m:mrow>
                                                      <m:msub>
                                                         <m:mi>n</m:mi>
                                                         <m:mrow>
                                                            <m:mi>j</m:mi>
                                                            <m:mo>,</m:mo>
                                                            <m:mi>k</m:mi>
                                                         </m:mrow>
                                                      </m:msub>
                                                   </m:mrow>
                                                </m:msubsup>
                                             </m:mrow>
                                          </m:mstyle>
                                       </m:mrow>
                                    </m:mstyle>
                                    <m:mo>,</m:mo>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mrow>
                              <m:mn>10</m:mn>
                           </m:mrow>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqadeWabaaabaGaem4zaCMaeiOkaOIaeyypa0Zefv3ySLgznfgDOjdaryqr1ngBPrginfgDObcv39gaiqaacqWFzecucqGGOaakcqWGUbGBdaWgaaWcbaGaeGymaedabeaakiabcYcaSiabd6gaUnaaBaaaleaacqaIYaGmaeqaaOGaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaemOBa42aaSbaaSqaaiabdQeakbqabaGccqGG8baFiiGacqGFYoGydaqhaaWcbaGaeGymaedabaGaeGimaadaaOGaeiilaWIae4NSdi2aa0baaSqaaiabikdaYaqaaiabicdaWaaakiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiab+j7aInaaDaaaleaacqWGkbGsaeaacqaIWaamaaGccqGGSaalcqGFXoqycqGGSaalcqGF1oqzcqGGPaqkaeaacqGH9aqpcqGFXoqycqGGOaakdaWcaaqaaiabd6gaUbqaaiabd6gaUnaaBaaaleaacqaIXaqmcqGGSaalcqaIWaamaeqaaOGaeiOla4IaeiOla4IaeiOla4IaemOBa42aaSbaaSqaaiabdQeakjabcYcaSiabicdaWaqabaaaaOGaeiykaKYaaebmaeaacqGFYoGydaqhaaWcbaGaemOAaOMaeiilaWIaeGimaadabaGaeGimaaJaemOBa42aaSbaaWqaaiabdQgaQjabcYcaSiabicdaWaqabaaaaaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemOsaOeaniabg+GivdaakeaacqGHRaWkdaWcaaqaaiabigdaXiabgkHiTiab+f7aHbqaaiabdUealbaadaaeWaqaaiabcIcaOmaalaaabaGaemOBa4gabaGaemOBa42aaSbaaSqaaiabigdaXiabcYcaSiabdUgaRbqabaGccqGGUaGlcqGGUaGlcqGGUaGlcqWGUbGBdaWgaaWcbaGaemOsaOKaeiilaWIaem4AaSgabeaaaaGccqGGPaqkdaqeWaqaaiab+j7aInaaDaaaleaacqWGQbGAcqGGSaalcqWGRbWAaeaacqWGUbGBdaWgaaadbaGaemOAaOMaeiilaWIaem4AaSgabeaaaaaaleaacqWGQbGAcqGH9aqpcqaIXaqmaeaacqWGkbGsa0Gaey4dIunaaSqaaiabdUgaRjabg2da9iabigdaXaqaaiabdUealbqdcqGHris5aOGaeiilaWcaaiaaxMaacaWLjaWaaeWaaeaacqaIXaqmcqaIWaamaiaawIcacaGLPaaaaaa@B4F7@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where 0 &lt;<it>&#945; </it>&lt; 1,</p>
            <p>
               <m:math name="1471-2105-7-484-i20" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:msub>
                           <m:mi>&#946;</m:mi>
                           <m:mrow>
                              <m:mi>i</m:mi>
                              <m:mo>,</m:mo>
                              <m:mi>j</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:mo>=</m:mo>
                        <m:mrow>
                           <m:mo>{</m:mo>
                           <m:mrow>
                              <m:mtable columnalign="left">
                                 <m:mtr columnalign="left">
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:mi>&#949;</m:mi>
                                          <m:mo>+</m:mo>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mn>1</m:mn>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mi>&#949;</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:msubsup>
                                                   <m:mi>&#946;</m:mi>
                                                   <m:mi>i</m:mi>
                                                   <m:mn>0</m:mn>
                                                </m:msubsup>
                                             </m:mrow>
                                             <m:mi>K</m:mi>
                                          </m:mfrac>
                                          <m:mo>,</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mi>j</m:mi>
                                          <m:mo>,</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr columnalign="left">
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:mo stretchy="false">(</m:mo>
                                          <m:mn>1</m:mn>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mi>&#949;</m:mi>
                                          <m:mo stretchy="false">)</m:mo>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:msubsup>
                                                   <m:mi>&#946;</m:mi>
                                                   <m:mi>i</m:mi>
                                                   <m:mn>0</m:mn>
                                                </m:msubsup>
                                             </m:mrow>
                                             <m:mi>K</m:mi>
                                          </m:mfrac>
                                          <m:mo>,</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mo>&#8800;</m:mo>
                                          <m:mi>j</m:mi>
                                          <m:mo>,</m:mo>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                              </m:mtable>
                           </m:mrow>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaaiiGacqWFYoGydaWgaaWcbaGaemyAaKMaeiilaWIaemOAaOgabeaakiabg2da9maaceqabaqbaeaabiGaaaqaaiab=v7aLjabgUcaRiabcIcaOiabigdaXiabgkHiTiab=v7aLjabcMcaPmaalaaabaGae8NSdi2aa0baaSqaaiabdMgaPbqaaiabicdaWaaaaOqaaiabdUealbaacqGGSaalaeaacqWGPbqAcqGH9aqpcqWGQbGAcqGGSaalaeaacqGGOaakcqaIXaqmcqGHsislcqWF1oqzcqGGPaqkdaWcaaqaaiab=j7aInaaDaaaleaacqWGPbqAaeaacqaIWaamaaaakeaacqWGlbWsaaGaeiilaWcabaGaemyAaKMaeyiyIKRaemOAaOMaeiilaWcaaaGaay5Eaaaaaa@578B@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>and <it>n</it><sub><it>j</it>,<it>k </it></sub>denotes the frequency of the <it>j</it>th symbol in the <it>k</it>th mixture component. Note that the number of mixture components equals the number of symbols plus one, i.e. <it>K </it>+ 1 = <it>J </it>+ 1.</p>
            <p>The IS distribution consists of <it>K </it>+ 1 mixture components and has two parameters, <it>&#949; </it>and <it>&#945;</it>. The first component ensures that some samples are drawn from the background distribution. The other <it>K </it>components correspond to the symbols: in each, the probability of one symbol is high (&#8776; <it>&#949;</it>) and the probabilities of the other symbols are proportional to their background probabilities. The mixture parameter <it>&#945; </it>determines the proportion of samples drawn from the background distribution, with the remainder drawn from the other <it>K </it>components. The shape parameter <it>&#949; </it>determines the weight given to the favoured symbol in each of the <it>K </it>mixture components.</p>
            <p>This particular distribution was chosen for two reasons. First, if <it>&#949; </it>is chosen large enough, we can draw extreme observations from the 20-dimensional parameter space and thus obtain large maxZ values. Second, the mixture parameter <it>&#945; </it>gives good coverage of the parameter space, thereby ensuring that the process converges in a reasonable time. This is very important because the tests are made separately for each alignment position, and the number of simulations should therefore be as low as possible.</p>
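            <p>As an illustration only, drawing one observation from a mixture of this kind can be sketched as follows. The function names component_probs and sample_counts are hypothetical, and the sketch assumes that each of the <it>K </it>symbol-specific components gives probability <it>&#949; </it>+ (1 &#8722; <it>&#949;</it>)<it>&#946;</it><sub><it>k</it></sub><sup>0</sup> to its favoured symbol and (1 &#8722; <it>&#949;</it>)<it>&#946;</it><sub><it>j</it></sub><sup>0</sup> to every other symbol, so that each component sums to one. This normalisation is an assumption of the sketch; the definition given above takes precedence.</p>

```python
import numpy as np


def component_probs(beta0, eps):
    """Symbol probabilities of all K + 1 components as a (K + 1) x J array.

    Row 0 is the background distribution. Row k (k = 1..K) moves a share
    eps of the probability mass onto symbol k and scales the background by
    (1 - eps); this normalisation is an assumption of the sketch.
    """
    beta0 = np.asarray(beta0, dtype=float)
    J = beta0.size
    boosted = (1.0 - eps) * np.tile(beta0, (J, 1)) + eps * np.eye(J)
    return np.vstack([beta0, boosted])


def sample_counts(rng, n, beta0, alpha, eps):
    """Draw one vector of J symbol counts (summing to n) from the mixture."""
    probs = component_probs(beta0, eps)
    K = probs.shape[0] - 1
    # Background with weight alpha, each boosted component with (1 - alpha) / K.
    weights = np.concatenate(([alpha], np.full(K, (1.0 - alpha) / K)))
    k = rng.choice(K + 1, p=weights)
    return rng.multinomial(n, probs[k])


rng = np.random.default_rng(0)
beta0 = np.full(20, 0.05)  # hypothetical uniform background over 20 amino acids
print(sample_counts(rng, n=50, beta0=beta0, alpha=0.4, eps=0.7))
```

            <p>Sampling the component first and the counts second keeps the draw cheap, which matters because the procedure is repeated for every alignment position.</p>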
            <p><b>Importance sampling procedure</b> The following procedure summarizes the calculation of the observed significance levels using the importance sampling method.</p>
            <p>1. Calculate the observed maxZ value maxZ(<it>y</it><sub><it>obs</it></sub>) using formula (7).</p>
            <p>2. Choose parameter values <it>&#945; </it>and <it>&#949; </it>for the IS distribution <it>g* </it>in formula (10) and the number of samples <it>S</it>. Initialize the ratio <it>r </it>= 0.</p>
            <p>3. Generate a sample <it>y </it>from distribution <it>g</it>* in formula (10) with the chosen <it>&#945; </it>and <it>&#949;</it>.</p>
            <p>4. Calculate the maxZ(<it>y</it>) value for the sampled observation using formula (7).</p>
            <p>5. If maxZ(<it>y</it>) &#8805; maxZ(<it>y</it><sub><it>obs</it></sub>), update the ratio <it>r </it>= <it>r </it>+ <it>g</it><sup>0</sup>(<it>y</it>)/<it>g</it>*(<it>y</it>). Otherwise do nothing.</p>
            <p>6. Repeat steps 3&#8211;5 until <it>S </it>samples have been processed.</p>
            <p>7. Estimate the <it>p</it>-value as the ratio <it>r</it>/<it>S</it>.</p>
            <p>8. Output the -log(<it>p</it>) values.</p>
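            <p>The steps above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work. Formula (7) is not reproduced here, so the test statistic is passed in as a function, and the default stand_in_statistic (the largest standardised symbol count) is a hypothetical placeholder for maxZ. The mixture components are normalised so that each sums to one, which is likewise an assumption, and all function names are illustrative.</p>

```python
import math
import numpy as np


def log_multinomial_pmf(counts, probs):
    """Log-probability of a count vector under a multinomial (all probs positive)."""
    total = math.lgamma(int(np.sum(counts)) + 1)
    for c, p in zip(counts, probs):
        total += int(c) * math.log(p) - math.lgamma(int(c) + 1)
    return total


def mixture(beta0, alpha, eps):
    """Weights and symbol probabilities of the K + 1 mixture components."""
    J = beta0.size
    probs = np.vstack([beta0, (1.0 - eps) * np.tile(beta0, (J, 1)) + eps * np.eye(J)])
    weights = np.concatenate(([alpha], np.full(J, (1.0 - alpha) / J)))
    return weights, probs


def stand_in_statistic(counts, beta0):
    """Hypothetical stand-in for maxZ: the largest standardised symbol count."""
    n = np.sum(counts)
    return float(np.max((counts - n * beta0) / np.sqrt(n * beta0 * (1.0 - beta0))))


def is_p_value(y_obs, beta0, alpha, eps, S, rng, statistic=stand_in_statistic):
    """Importance sampling estimate of P(statistic(y) >= statistic(y_obs)) under g0."""
    y_obs = np.asarray(y_obs)
    beta0 = np.asarray(beta0, dtype=float)
    n = int(y_obs.sum())
    weights, probs = mixture(beta0, alpha, eps)
    t_obs = statistic(y_obs, beta0)                      # step 1
    r = 0.0                                              # step 2
    for _ in range(S):                                   # step 6
        k = rng.choice(weights.size, p=weights)          # step 3: pick a component,
        y = rng.multinomial(n, probs[k])                 # then draw the counts
        if statistic(y, beta0) >= t_obs:                 # steps 4 and 5
            log_g0 = log_multinomial_pmf(y, beta0)
            parts = [math.log(w) + log_multinomial_pmf(y, p)
                     for w, p in zip(weights, probs)]
            top = max(parts)                             # log-sum-exp for g*(y)
            log_gstar = top + math.log(sum(math.exp(v - top) for v in parts))
            r += math.exp(log_g0 - log_gstar)
    return r / S                                         # step 7


rng = np.random.default_rng(0)
print(is_p_value([1, 1, 4], [0.5, 0.3, 0.2], alpha=0.4, eps=0.7, S=2000, rng=rng))
```

            <p>Because the background distribution is itself a mixture component with weight <it>&#945;</it>, each weight <it>g</it><sup>0</sup>(<it>y</it>)/<it>g</it>*(<it>y</it>) in this sketch is bounded above by 1/<it>&#945;</it>, which keeps the variance of the estimate under control.</p>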
            <p><b>Parameter values</b> In the evaluation of the maxZ score, parameter values <it>&#945; </it>= 0.4 and <it>&#949; </it>= 0.7 were used for the alignments with <it>N </it>&#8804; 100, and <it>&#945; </it>= 0.4, <it>&#949; </it>= 0.8 for the remaining alignments. In the evaluation of the AQ score, <it>&#945; </it>= 0.4 and <it>&#949; </it>= 0.7 were used for all benchmarking datasets. The number of samples was <it>S </it>= 40,000 when evaluating the maxZ score, and <it>S </it>= 10,000 when calculating the quality of the alignments. After verifying that the procedure was sufficiently accurate, we reduced the number of samples in the alignment quality calculations, as the objective there was to determine the number of significant positions rather than the exact <it>p</it>-values. This enabled us to speed up the quality comparisons. The choice of the parameter values is described in detail in the supplementary material (see <supplr sid="S10">Additional file 10</supplr>). For the background distribution <it>g</it><sup>0</sup>, we used the distribution of amino acids in the full alignment under investigation.</p>
         </sec>
         <sec>
            <st>
               <p>The alignment quality (AQ) score</p>
            </st>
            <p>This section describes how the positional significance levels can be used to derive the alignment quality (AQ) score for the entire multiple sequence alignment. In the previous section, the conserved positions of the multiple sequence alignment were detected by the statistical hypothesis testing procedure. The significance tests were performed simultaneously at each alignment position. This simultaneous testing for the family of hypotheses causes a large multiple comparison problem which must be considered when deriving the quality score for the entire alignment.</p>
            <p>Traditionally, multiple comparison problems of this kind are solved by controlling the family-wise error rate (FWE), i.e., the probability that at least one hypothesis is erroneously rejected <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. Control of the FWE, however, decreases the probability of detecting the truly unconserved positions. Moreover, as several positions in the multiple sequence alignment can be considered to be conserved, a more natural approach is to control the false discovery rate (FDR), i.e., the expected proportion of erroneously rejected null hypotheses among all rejected hypotheses. This approach was first introduced by Benjamini and Hochberg <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
            <p>Controlling the FDR led us to derive the alignment quality score following the Benjamini&#8211;Yekutieli procedure <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. More precisely, we use the following step-up procedure for controlling the FDR among arbitrarily dependent test statistics:</p>
            <p>1. Calculate <it>p</it>-values for each alignment position. Let <it>P</it><sub>(1) </sub>&#8804; <it>P</it><sub>(2) </sub>&#8804; ... &#8804; <it>P</it><sub>(<it>m</it>) </sub>be the ordered list of the <it>p</it>-values.</p>
            <p>2. Calculate <it>j</it>* = max{<it>j </it>: <it>P</it><sub>(<it>j</it>) </sub>&#8804; <m:math name="1471-2105-7-484-i21" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mfrac><m:mrow><m:mi>j</m:mi><m:mi>q</m:mi></m:mrow><m:mi>m</m:mi></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaadaWcaaqaaiabdQgaQjabdghaXbqaaiabd2gaTbaaaaa@30E7@</m:annotation></m:semantics></m:math>}, where 0 &lt;<it>q </it>&lt; 1 is any fixed FDR and <it>m </it>is the length of the multiple sequence alignment.</p>
            <p>3. If such a <it>j</it>* exists, choose the positions associated with <it>P</it><sub>(1)</sub>, <it>P</it><sub>(2)</sub>, ..., <it>P</it><sub>(<it>j</it>*) </sub>as conserved.</p>
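            <p>The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this study: the function name step_up_conserved and its arguments are hypothetical, and the <it>p</it>-values are assumed to have been computed already.</p>

```python
def step_up_conserved(p_values, q):
    """Return the indices of the positions chosen as conserved.

    p_values: one p-value per alignment position (hypothetical input).
    q: the fixed FDR level, strictly between 0 and 1.
    """
    m = len(p_values)
    # Step 1: order the positions by ascending p-value, P_(1), ..., P_(m).
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step 2: j* is the largest rank j whose p-value is at most j*q/m.
    j_star = 0
    for j, idx in enumerate(order, start=1):
        if j * q / m >= p_values[idx]:
            j_star = j
    # Step 3: the positions holding the j* smallest p-values are conserved.
    return sorted(order[:j_star])
```

            <p>For example, with <it>m </it>= 4 and <it>q </it>= 0.05 the ordered <it>p</it>-values 0.02, 0.024, 0.5 and 0.9 give <it>j</it>* = 2, because <it>P</it><sub>(2) </sub>= 0.024 is below 2<it>q</it>/<it>m </it>= 0.025 even though <it>P</it><sub>(1) </sub>= 0.02 exceeds <it>q</it>/<it>m </it>= 0.0125. Both positions are then chosen as conserved, which is the characteristic behaviour of a step-up rule.</p>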
            <p>Given the number of conserved positions <it>j</it>*, we obtain the proportion of conserved residues <it>ConsAA </it>by dividing the number of residues in the conserved positions by the number of residues in the whole alignment:</p>
            <p>
               <m:math name="1471-2105-7-484-i22" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>C</m:mi>
                        <m:mi>o</m:mi>
                        <m:mi>n</m:mi>
                        <m:mi>s</m:mi>
                        <m:mi>A</m:mi>
                        <m:mi>A</m:mi>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:msubsup>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mi>j</m:mi>
                                       <m:mo>&#8727;</m:mo>
                                    </m:mrow>
                                 </m:msubsup>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>n</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:msubsup>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>m</m:mi>
                                 </m:msubsup>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mi>n</m:mi>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                        </m:mfrac>
                        <m:mo>.</m:mo>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mrow>
                              <m:mn>11</m:mn>
                           </m:mrow>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGdbWqcqWGVbWBcqWGUbGBcqWGZbWCcqWGbbqqcqWGbbqqcqGH9aqpdaWcaaqaamaaqadabaGaemOBa42aaSbaaSqaaiabdMgaPbqabaaabaGaemyAaKMaeyypa0JaeGymaedabaGaemOAaOMaey4fIOcaniabggHiLdaakeaadaaeWaqaaiabd6gaUnaaBaaaleaacqWGPbqAaeqaaaqaaiabdMgaPjabg2da9iabigdaXaqaaiabd2gaTbqdcqGHris5aaaakiabc6caUiaaxMaacaWLjaWaaeWaaeaacqaIXaqmcqaIXaqmaiaawIcacaGLPaaaaaa@4EDE@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
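            <p>Here <it>n</it><sub><it>i </it></sub>denotes the number of residues at position <it>i</it>. In the numerator, the positions are indexed in the order of their <it>p</it>-values, so that the sum runs over the <it>j</it>* conserved positions.</p>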
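            <p>Equation (11) can be illustrated with a short Python sketch. The function names, the representation of an alignment as a list of equal-length strings, and the gap character '-' are assumptions made for this example only.</p>

```python
def residue_counts(rows, gap="-"):
    """Return n_i, the number of non-gap characters in each alignment column."""
    if not rows:
        return []
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    return [sum(1 for row in rows if row[i] != gap) for i in range(length)]


def cons_aa(counts, conserved_positions):
    """Proportion of residues lying in the conserved positions (equation 11)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("alignment contains no residues")
    # set() guards against counting a position twice.
    return sum(counts[i] for i in set(conserved_positions)) / total
```

            <p>For the toy alignment AC-DE, ACGDE, A-GD- with the first and fourth columns taken as conserved, 6 of the 12 residues lie in conserved columns, so <it>ConsAA </it>= 0.5.</p>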
            <p>By comparing the <it>ConsAA </it>values calculated from the test and reference alignments, we can define the alignment quality score <it>AQ </it>as</p>
            <p><it>AQ </it>= [1 - (|<it>ConsAA</it><sub><it>ref </it></sub>- <it>ConsAA</it><sub><it>test</it></sub>|/<it>ConsAA</it><sub><it>ref</it></sub>)] * 100. &#160;&#160;&#160; (12)</p>
            <p>The <it>AQ </it>score expresses, as a percentage, how close the <it>ConsAA </it>of the test alignment is to that of the reference alignment. We presume that the higher the <it>AQ </it>value, the better the quality of the alignment. Note that the <it>AQ </it>score does not require the conserved positions to be the same in the test and reference alignments; only the number of conserved residues counts.</p>
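            <p>As a small numerical illustration of equation (12), the following Python sketch evaluates the score for made-up <it>ConsAA </it>values. The function name aq_score and the input values are hypothetical and are not results of this study.</p>

```python
def aq_score(cons_aa_ref, cons_aa_test):
    """Alignment quality score of equation (12), in percent."""
    if cons_aa_ref == 0:
        raise ValueError("reference ConsAA must be non-zero")
    return (1 - abs(cons_aa_ref - cons_aa_test) / cons_aa_ref) * 100


# Hypothetical ConsAA values, chosen only to illustrate the arithmetic.
print(aq_score(0.5, 0.375))  # test alignment has fewer conserved residues
print(aq_score(0.5, 0.625))  # test alignment has more conserved residues
```

            <p>Both calls give an <it>AQ </it>of 75, because the absolute difference penalises a deficit and an excess of conserved residues equally. It also follows from the formula that the score becomes negative when the test <it>ConsAA </it>is more than twice the reference <it>ConsAA</it>.</p>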
            <sec>
               <st>
                  <p>Multiple sequence alignment programs</p>
               </st>
               <p>We compared the alignment quality of six widely used multiple sequence alignment programs: Clustal <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, TCoffee <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, Dialign2 <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, ProbCons <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, Muscle <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, and Mafft <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. We used the default settings of the programs, apart from the Mafft L-INS-i strategy (Table <tblr tid="T3">3</tblr>). Of the 7 alignment strategies available in Mafft, we chose L-INS-i, the most accurate method at the time of writing, and FFT-NS-2, the default method.</p>
               <tbl id="T3">
                  <title>
                     <p>Table 3</p>
                  </title>
                  <caption>
                     <p>Alignment programs and parameters used.</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c ca="center">
                           <p>Program</p>
                        </c>
                        <c ca="left">
                           <p>Version</p>
                        </c>
                        <c ca="left">
                           <p>Parameters (strategy)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Clustal[9]</p>
                        </c>
                        <c ca="left">
                           <p>1.83</p>
                        </c>
                        <c ca="left">
                           <p>default</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>TCoffee[10]</p>
                        </c>
                        <c ca="left">
                           <p>2.66</p>
                        </c>
                        <c ca="left">
                           <p>default</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Dialign2[11]</p>
                        </c>
                        <c ca="left">
                           <p>2.2.1</p>
                        </c>
                        <c ca="left">
                           <p>default</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>ProbCons[12]</p>
                        </c>
                        <c ca="left">
                           <p>1.10</p>
                        </c>
                        <c ca="left">
                           <p>default</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Muscle[13]</p>
                        </c>
                        <c ca="left">
                           <p>3.52</p>
                        </c>
                        <c ca="left">
                           <p>default</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Mafft[14, 15]</p>
                        </c>
                        <c ca="left">
                           <p>5.667</p>
                        </c>
                        <c ca="left">
                           <p>-localpair -maxiterate 1000 (L-INS-i)</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Mafft[14, 15]</p>
                        </c>
                        <c ca="left">
                           <p>5.667</p>
                        </c>
                        <c ca="left">
                           <p>default (FFT-NS-2)</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Benchmarking database</p>
               </st>
               <p>We used the BAliBASE 3.0 database to test the alignment quality of the alignment programs. BAliBASE is designed for comparing alignment programs <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. The database contains 218 multiple protein sequence alignments, which are divided into 5 reference sets. The 1st reference set includes equidistant sequences whose identity is less than 20% (ref 11) or between 20 and 40% (ref 12). The 2nd reference set consists of families aligned with a highly divergent "orphan" sequence. The 3rd reference set includes subgroups of sequences whose residue identity between the subgroups is less than 25%. The sequences of the 4th reference set contain N/C-terminal extensions. The 5th reference set consists of sequences with internal insertions.</p>
               <p>Each alignment in BAliBASE has two versions: one with full-length sequences and another with sequences truncated to the homologous regions only. We used only the truncated sequences, except in reference set 4, which contains only full-length sequences. BAliBASE annotates reliably aligned regions as core blocks. As in most studies using BAliBASE, we compared the alignment programs using the core block sequences only.</p>
               <p>BAliBASE provides a program called bali_score for calculating the SP and CS quality measures for a test alignment <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. We used both of these scores for benchmarking.</p>
            </sec>
            <sec>
               <st>
                  <p>Comparison procedure</p>
               </st>
               <p>To compare the quality of the alignments, each alignment program was used to align each family in the BAliBASE database. The significance of the observed maxZ score was calculated for each alignment position. The proportion of conserved residues <it>ConsAA </it>was then calculated using 15 different FDRs varying from 0.01 to 0.15. The AQ score was calculated using the core blocks of each alignment. The core blocks of a new alignment were defined as the positions that include one or more residues belonging to a core block of the reference alignment. Additionally, the SP and CS scores were calculated for the core blocks using the bali_score program.</p>
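               <p>The rule for transferring core blocks from the reference alignment to a new alignment can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study: both alignments are assumed to be lists of equal-length strings with the sequences in the same order, the gap character is assumed to be '-', and the function name core_columns_in_test is hypothetical.</p>

```python
def core_columns_in_test(ref_rows, ref_core_cols, test_rows, gap="-"):
    """Return the test-alignment columns holding at least one core-block residue.

    ref_core_cols: column indices of the reference alignment annotated as core.
    A residue is identified by (sequence index, index within ungapped sequence).
    """
    core_cols = set(ref_core_cols)
    core_residues = set()
    for s, row in enumerate(ref_rows):
        k = 0  # running residue index within sequence s
        for col, ch in enumerate(row):
            if ch != gap:
                if col in core_cols:
                    core_residues.add((s, k))
                k += 1
    test_cols = set()
    for s, row in enumerate(test_rows):
        k = 0
        for col, ch in enumerate(row):
            if ch != gap:
                if (s, k) in core_residues:
                    test_cols.add(col)
                k += 1
    return sorted(test_cols)
```

               <p>With the reference rows AC-DE and ACGDE, core columns 0 and 1, and the test rows -ACDE and ACGDE, the function returns columns 0, 1 and 2: the core residues of the first sequence have shifted one column to the right in the test alignment, so three test columns contain at least one core residue.</p>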
               <p>The AQ, SP and CS scores were calculated for the 7 test alignments and the BAliBASE reference alignments for each set of sequences in the five reference sets. The results are presented as medians within each reference because the distributions of the AQ score values in the references were skewed. The statistical significance of the differences between the alignment programs was tested at FDR = 0.05 using the Wilcoxon signed rank test. This test was chosen because the same sequences were used with each method, and hence the scores could not be considered independent. The Bonferroni correction was used within each reference set to correct for the effect of making multiple tests simultaneously. In the Results section, corrected <it>p</it>-values less than 0.05 were considered statistically significant. To compare the relationship between the individual AQ and SP scores, we calculated the Spearman rank correlation coefficient separately for each of the 7 alignment methods.</p>
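               <p>The paired testing scheme described above can be illustrated with a self-contained Python sketch. It computes an exact two-sided Wilcoxon signed rank <it>p</it>-value by enumerating sign patterns, followed by a Bonferroni correction. The function names are hypothetical, zero differences are dropped and tied absolute differences receive average ranks (both are choices made for this sketch), and the software actually used in this study is not specified here.</p>

```python
def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided p-value of the Wilcoxon signed rank test for paired data."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank the absolute differences; ties get the average rank. Ranks are
    # stored doubled so that they remain integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i != n:
        j = i
        while j + 1 != n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # twice the mean of ranks i+1 .. j+1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total = sum(ranks2)
    # counts[s] = number of sign patterns whose positive (doubled) rank sum is s.
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    # Two-sided: patterns at least as far from the null mean as the observed one.
    dev = abs(2 * w_plus - total)
    extreme = sum(c for s, c in enumerate(counts) if abs(2 * s - total) >= dev)
    return extreme / 2 ** n


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]
```

               <p>For instance, if one method scores higher than another on all of five families and the five differences are distinct, the exact two-sided <it>p</it>-value is 2/32 = 0.0625. The enumeration grows with the sum of the ranks, so for large numbers of families a normal approximation or an established statistics package would be the more practical choice.</p>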
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>All authors participated in conceiving and designing the manuscript. VA, TA and EU carried out the theoretical considerations. VA performed the computational experiments and statistical analysis. VA, TA and MV drafted the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors thank Pentti Riikonen for help with the MultiDisp graphics program. This work was supported by a grant from the Graduate School in Computational Biology, Bioinformatics and Biometry (ComBi). Financial support from the Academy of Finland, the National Technology Agency and the Medical Research Fund of Tampere University Hospital is gratefully acknowledged.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>A comprehensive comparison of multiple sequence alignment programs</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Plewniak</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Poch</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>2682</fpage>
            <lpage>2690</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148477</pubid>
                  <pubid idtype="pmpid" link="fulltext">10373585</pubid>
                  <pubid idtype="doi">10.1093/nar/27.13.2682</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Evaluation of protein multiple alignments by SAM-T99 using the BAliBASE multiple alignment test set</p>
            </title>
            <aug>
               <au>
                  <snm>Karplus</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>713</fpage>
            <lpage>720</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.8.713</pubid>
                  <pubid idtype="pmpid" link="fulltext">11524372</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Quality assessment of multiple alignment programs</p>
            </title>
            <aug>
               <au>
                  <snm>Lassmann</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>ELL</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>2002</pubdate>
            <volume>529</volume>
            <fpage>126</fpage>
            <lpage>130</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0014-5793(02)03189-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">12354624</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>APDB: a novel measure for benchmarking sequence alignment methods without reference alignments</p>
            </title>
            <aug>
               <au>
                  <snm>O'Sullivan</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Zehnder</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Grosdidier</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Notredame</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>i215</fpage>
            <lpage>i221</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg1029</pubid>
                  <pubid idtype="pmpid" link="fulltext">12855461</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Automatic assessment of alignment quality</p>
            </title>
            <aug>
               <au>
                  <snm>Lassmann</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>ELL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>7120</fpage>
            <lpage>7128</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1316116</pubid>
                  <pubid idtype="pmpid" link="fulltext">16361270</pubid>
                  <pubid idtype="doi">10.1093/nar/gki1020</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Profile analysis &#8211; detection of distantly related proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Gribskov</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>McLachlan</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1987</pubdate>
            <volume>84</volume>
            <fpage>4355</fpage>
            <lpage>4358</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">305087</pubid>
                  <pubid idtype="pmpid" link="fulltext">3474607</pubid>
                  <pubid idtype="doi">10.1073/pnas.84.13.4355</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Using the SIR algorithm to simulate posterior distributions</p>
            </title>
            <aug>
               <au>
                  <snm>Rubin</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>Bayesian Statistics 3</source>
            <publisher>Oxford UK: Oxford University Press</publisher>
            <editor>Bernardo JM, DeGroot MH, Lindley DV, Smith AFM</editor>
            <pubdate>1988</pubdate>
            <fpage>395</fpage>
            <lpage>402</lpage>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Controlling the false discovery rate &#8211; a practical and powerful approach to multiple testing</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hochberg</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>J R Stat Soc Ser B</source>
            <pubdate>1995</pubdate>
            <volume>57</volume>
            <fpage>289</fpage>
            <lpage>300</lpage>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Plewniak</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Jeanmougin</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>4876</fpage>
            <lpage>4882</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147148</pubid>
                  <pubid idtype="pmpid" link="fulltext">9396791</pubid>
                  <pubid idtype="doi">10.1093/nar/25.24.4876</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>T-Coffee: a novel method for fast and accurate multiple sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Notredame</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Heringa</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>302</volume>
            <fpage>205</fpage>
            <lpage>217</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4042</pubid>
                  <pubid idtype="pmpid" link="fulltext">10964570</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1999</pubdate>
            <volume>15</volume>
            <fpage>211</fpage>
            <lpage>218</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/15.3.211</pubid>
                  <pubid idtype="pmpid" link="fulltext">10222408</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>ProbCons: probabilistic consistency-based multiple sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Do</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Mahabhashyam</snm>
                  <fnm>MSP</fnm>
               </au>
               <au>
                  <snm>Brudno</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Batzoglou</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>330</fpage>
            <lpage>340</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">546535</pubid>
                  <pubid idtype="pmpid" link="fulltext">15687296</pubid>
                  <pubid idtype="doi">10.1101/gr.2821705</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>MUSCLE: multiple sequence alignment with high accuracy and high throughput</p>
            </title>
            <aug>
               <au>
                  <snm>Edgar</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>1792</fpage>
            <lpage>1797</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">390337</pubid>
                  <pubid idtype="pmpid" link="fulltext">15034147</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh340</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform</p>
            </title>
            <aug>
               <au>
                  <snm>Katoh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Misawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kuma</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Miyata</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>3059</fpage>
            <lpage>3066</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">135756</pubid>
                  <pubid idtype="pmpid" link="fulltext">12136088</pubid>
                  <pubid idtype="doi">10.1093/nar/gkf436</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>MAFFT version 5: improvement in accuracy of multiple sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Katoh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kuma</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Toh</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Miyata</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>511</fpage>
            <lpage>518</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">548345</pubid>
                  <pubid idtype="pmpid" link="fulltext">15661851</pubid>
                  <pubid idtype="doi">10.1093/nar/gki198</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Multiple sequence alignment: algorithms and applications</p>
            </title>
            <aug>
               <au>
                  <snm>Gotoh</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Advances in Biophysics</source>
            <pubdate>1999</pubdate>
            <volume>36</volume>
            <fpage>159</fpage>
            <lpage>206</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0065-227X(99)80007-0</pubid>
                  <pubid idtype="pmpid">10463075</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Scoring residue conservation</p>
            </title>
            <aug>
               <au>
                  <snm>Valdar</snm>
                  <fnm>WSJ</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2002</pubdate>
            <volume>48</volume>
            <fpage>227</fpage>
            <lpage>241</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10146</pubid>
                  <pubid idtype="pmpid" link="fulltext">12112692</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Database of homology-derived protein structures and the structural meaning of sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>1991</pubdate>
            <volume>9</volume>
            <fpage>56</fpage>
            <lpage>68</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.340090107</pubid>
                  <pubid idtype="pmpid">2017436</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Information-theoretical entropy as a measure of sequence variability</p>
            </title>
            <aug>
               <au>
                  <snm>Shenkin</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Erman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Mastrandrea</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>1991</pubdate>
            <volume>11</volume>
            <fpage>297</fpage>
            <lpage>313</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.340110408</pubid>
                  <pubid idtype="pmpid">1758884</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Identifying DNA and protein patterns with statistically significant alignments of multiple sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Hertz</snm>
                  <fnm>GZ</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1999</pubdate>
            <volume>15</volume>
            <fpage>563</fpage>
            <lpage>577</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/15.7.563</pubid>
                  <pubid idtype="pmpid" link="fulltext">10487864</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Detecting subtle sequence signals &#8211; a Gibbs sampling strategy for multiple alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Boguski</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Neuwald</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Wootton</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1993</pubdate>
            <volume>262</volume>
            <fpage>208</fpage>
            <lpage>214</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.8211139</pubid>
                  <pubid idtype="pmpid" link="fulltext">8211139</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>The classification of amino-acid conservation</p>
            </title>
            <aug>
               <au>
                  <snm>Taylor</snm>
                  <fnm>WR</fnm>
               </au>
            </aug>
            <source>J Theor Biol</source>
            <pubdate>1986</pubdate>
            <volume>119</volume>
            <fpage>205</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-5193(86)80075-3</pubid>
                  <pubid idtype="pmpid">3461222</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Prediction of protein secondary structure and active sites using the alignment of homologous sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Zvelebil</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Barton</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Sternberg</snm>
                  <fnm>MJE</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1987</pubdate>
            <volume>195</volume>
            <fpage>957</fpage>
            <lpage>961</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(87)90501-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">3656439</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function</p>
            </title>
            <aug>
               <au>
                  <snm>Mirny</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Shakhnovich</snm>
                  <fnm>EI</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>291</volume>
            <fpage>177</fpage>
            <lpage>196</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.2911</pubid>
                  <pubid idtype="pmpid" link="fulltext">10438614</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Protein-sequence alignments &#8211; a strategy for the hierarchical analysis of residue conservation</p>
            </title>
            <aug>
               <au>
                  <snm>Livingstone</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Barton</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1993</pubdate>
            <volume>9</volume>
            <fpage>745</fpage>
            <lpage>756</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8143162</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Performance evaluation of amino-acid substitution matrices</p>
            </title>
            <aug>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>JG</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>1993</pubdate>
            <volume>17</volume>
            <fpage>49</fpage>
            <lpage>61</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.340170108</pubid>
                  <pubid idtype="pmpid">8234244</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Amino-acid substitution during functionally constrained divergent evolution of protein sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Benner</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Gonnet</snm>
                  <fnm>GH</fnm>
               </au>
            </aug>
            <source>Protein Eng</source>
            <pubdate>1994</pubdate>
            <volume>7</volume>
            <fpage>1323</fpage>
            <lpage>1332</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7700864</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>A model of evolutionary change in proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Dayhoff</snm>
                  <fnm>MO</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Orcutt</snm>
                  <fnm>BC</fnm>
               </au>
            </aug>
            <source>Atlas of protein sequence and structure</source>
            <publisher>Washington DC: National Biomedical Research Foundation</publisher>
            <editor>Dayhoff MO</editor>
            <pubdate>1978</pubdate>
            <volume>5</volume>
            <fpage>345</fpage>
            <lpage>358</lpage>
         </bibl>
         <bibl id="B29">
            <title>
               <p>The multiple sequence alignment problem in biology</p>
            </title>
            <aug>
               <au>
                  <snm>Carrillo</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>SIAM J Appl Math</source>
            <pubdate>1988</pubdate>
            <volume>48</volume>
            <fpage>1073</fpage>
            <lpage>1082</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1137/0148063</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Towards a reliable objective function for multiple sequence alignments</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Plewniak</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ripp</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Thierry</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Poch</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>314</volume>
            <fpage>937</fpage>
            <lpage>951</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.5187</pubid>
                  <pubid idtype="pmpid" link="fulltext">11734009</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>AL2CO: calculation of positional conservation in a protein sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Pei</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Grishin</snm>
                  <fnm>NV</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>700</fpage>
            <lpage>712</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.8.700</pubid>
                  <pubid idtype="pmpid" link="fulltext">11524371</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Statistical methods for identifying conserved residues in multiple sequence alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Ahola</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Aittokallio</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Uusipaikka</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Vihinen</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Stat Appl Genet Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>3</volume>
            <issue>1</issue>
            <fpage>Article28</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.2202/1544-6115.1074</pubid>
                  <pubid idtype="pmpid">16646807</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Efficient estimation of emission probabilities in profile hidden Markov models</p>
            </title>
            <aug>
               <au>
                  <snm>Ahola</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Aittokallio</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Uusipaikka</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Vihinen</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>2359</fpage>
            <lpage>2368</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg328</pubid>
                  <pubid idtype="pmpid" link="fulltext">14668219</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>The Pfam protein families database</p>
            </title>
            <aug>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Coin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Durbin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Finn</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Hollich</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Khanna</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Moxon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sonnhammer</snm>
                  <fnm>ELL</fnm>
               </au>
               <au>
                  <snm>Studholme</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Yeats</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D138</fpage>
            <lpage>D141</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308855</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681378</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh121</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Identification of functionally conserved residues with the use of entropy-variability plots</p>
            </title>
            <aug>
               <au>
                  <snm>Oliveira</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Paiva</snm>
                  <fnm>PB</fnm>
               </au>
               <au>
                  <snm>Paiva</snm>
                  <fnm>ACM</fnm>
               </au>
               <au>
                  <snm>Vriend</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2003</pubdate>
            <volume>52</volume>
            <fpage>544</fpage>
            <lpage>552</lpage>
            <url>http://www.gpcr.org/articles/2002_1/index.html</url>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10490</pubid>
                  <pubid idtype="pmpid" link="fulltext">12910454</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>A common motif in G-protein-coupled 7 transmembrane helix receptors</p>
            </title>
            <aug>
               <au>
                  <snm>Oliveira</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Paiva</snm>
                  <fnm>ACM</fnm>
               </au>
               <au>
                  <snm>Vriend</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>J Comput Aided Mol Des</source>
            <pubdate>1993</pubdate>
            <volume>7</volume>
            <fpage>649</fpage>
            <lpage>658</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/BF00125323</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>MultiDisp graphics program</p>
            </title>
            <url>http://bioinf.uta.fi/cgi-bin/MultiDisp.cgi</url>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Conservation and covariance in PH domain sequences: physicochemical profile and information theoretical analysis of XLA-causing mutations in the Btk PH domain</p>
            </title>
            <aug>
               <au>
                  <snm>Shen</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Vihinen</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Protein Eng Des Sel</source>
            <pubdate>2004</pubdate>
            <volume>17</volume>
            <issue>3</issue>
            <fpage>267</fpage>
            <lpage>276</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/protein/gzh030</pubid>
                  <pubid idtype="pmpid" link="fulltext">15082835</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>SH2 domains recognize specific phosphopeptide sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Songyang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Shoelson</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Chaudhuri</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pawson</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Haser</snm>
                  <fnm>WG</fnm>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ratnofsky</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lechleider</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Neel</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Birge</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Fajardo</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Chou</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Hanafusa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Schaffhausen</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Cantley</snm>
                  <fnm>LC</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1993</pubdate>
            <volume>72</volume>
            <fpage>767</fpage>
            <lpage>778</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(93)90404-E</pubid>
                  <pubid idtype="pmpid">7680959</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>SH2 domains, interaction modules and cellular wiring</p>
            </title>
            <aug>
               <au>
                  <snm>Pawson</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Nash</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Trends Cell Biol</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>504</fpage>
            <lpage>511</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0962-8924(01)02154-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">11719057</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Molecular recognition by SH2 domains</p>
            </title>
            <aug>
               <au>
                  <snm>Bradshaw</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Waksman</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Adv Protein Chem</source>
            <pubdate>2002</pubdate>
            <volume>61</volume>
            <fpage>161</fpage>
            <lpage>210</lpage>
            <xrefbib>
               <pubid idtype="pmpid">12461824</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Binding of a high-affinity phosphotyrosyl peptide to the Src SH2 domain &#8211; crystal-structures of the complexed and peptide-free forms</p>
            </title>
            <aug>
               <au>
                  <snm>Waksman</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Shoelson</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Pant</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Cowburn</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kuriyan</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1993</pubdate>
            <volume>72</volume>
            <fpage>779</fpage>
            <lpage>790</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(93)90405-F</pubid>
                  <pubid idtype="pmpid" link="fulltext">7680960</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>3-Dimensional solution structure of the Src homology-2 domain of C-Abl</p>
            </title>
            <aug>
               <au>
                  <snm>Overduin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rios</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Mayer</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Baltimore</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Cowburn</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1992</pubdate>
            <volume>70</volume>
            <fpage>697</fpage>
            <lpage>704</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(92)90437-H</pubid>
                  <pubid idtype="pmpid" link="fulltext">1505033</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>M13 endopeptidases: new conserved motifs correlated with structure, and simultaneous phylogenetic occurrence of PHEX and the bony fish</p>
            </title>
            <aug>
               <au>
                  <snm>Bianchetti</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Oudet</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Poch</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2002</pubdate>
            <volume>47</volume>
            <fpage>481</fpage>
            <lpage>488</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10075</pubid>
                  <pubid idtype="pmpid" link="fulltext">12001226</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations</p>
            </title>
            <aug>
               <au>
                  <snm>Bahr</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Thierry</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Poch</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>323</fpage>
            <lpage>326</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29792</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125126</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.323</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Maximum likelihood estimation by counting methods under polygenic and mixed models in human pedigrees</p>
            </title>
            <aug>
               <au>
                  <snm>Ott</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Am J Hum Genet</source>
            <pubdate>1979</pubdate>
            <volume>31</volume>
            <fpage>161</fpage>
            <lpage>175</lpage>
            <xrefbib>
               <pubid idtype="pmpid">453201</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Importance sampling <it>I</it>: Computing multimodel-P values in linkage analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Kong</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Frigge</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Irwin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Am J Hum Genet</source>
            <pubdate>1992</pubdate>
            <volume>51</volume>
            <fpage>1413</fpage>
            <lpage>1429</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1463020</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>mafft 5.7</p>
            </title>
            <url>http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/</url>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Discussion of paper by MS Bartlett</p>
            </title>
            <aug>
               <au>
                  <snm>Barnard</snm>
                  <fnm>GA</fnm>
               </au>
            </aug>
            <source>J R Stat Soc Ser B</source>
            <pubdate>1963</pubdate>
            <volume>25</volume>
            <fpage>294</fpage>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Bayesian statistics without tears &#8211; a sampling resampling perspective</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>AFM</fnm>
               </au>
               <au>
                  <snm>Gelfand</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>American Statistician</source>
            <pubdate>1992</pubdate>
            <volume>46</volume>
            <fpage>84</fpage>
            <lpage>88</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2684170</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <aug>
               <au>
                  <snm>Hochberg</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Tamhane</snm>
                  <fnm>AC</fnm>
               </au>
            </aug>
            <source>Multiple comparison procedures</source>
            <publisher>New York: John Wiley &amp; Sons</publisher>
            <pubdate>1987</pubdate>
         </bibl>
         <bibl id="B52">
            <title>
               <p>The control of the false discovery rate in multiple testing under dependency</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yekutieli</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Annals of Statistics</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>1165</fpage>
            <lpage>1188</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1214/aos/1013699998</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
