<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-44</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>Ranked Adjusted Rand: integrating distance and partition information in a measure of clustering agreement</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Pinto</snm>
               <mi>R</mi>
               <fnm>Francisco</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>fpinto@fm.ul.pt</email>
            </au>
            <au id="A2">
               <snm>Carri&#231;o</snm>
               <mi>A</mi>
               <fnm>Jo&#227;o</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>jcarrico@vinci.inesc-id.pt</email>
            </au>
            <au id="A3">
               <snm>Ramirez</snm>
               <fnm>M&#225;rio</fnm>
               <insr iid="I1"/>
               <email>ramirez@fm.ul.pt</email>
            </au>
            <au id="A4">
               <snm>Almeida</snm>
               <mi>S</mi>
               <fnm>Jonas</fnm>
               <insr iid="I2"/>
               <insr iid="I4"/>
               <email>jalmeida@mdanderson.org</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina de Lisboa, Av. Professor Egas Moniz, 1649-028 Lisboa, Portugal</p>
            </ins>
            <ins id="I2">
               <p>Grupo de Biomatem&#225;tica, Instituto de Tecnologia Qu&#237;mica e Biol&#243;gica, R. Quinta Grande, 6, 2780 Oeiras, Portugal</p>
            </ins>
            <ins id="I3">
               <p>Instituto de Engenharia de Sistemas e Computadores: Investiga&#231;&#227;o e Desenvolvimento (INESC-ID), R. Alves Redol 9, 1000-029 Lisboa, Portugal</p>
            </ins>
            <ins id="I4">
               <p>Department of Biostatistics, and Applied Mathematics, Univ. Texas, MDAnderson Cancer Center, Houston, Texas, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>44</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/44</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17286861</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-44</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>04</day>
               <month>10</month>
               <year>2006</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>07</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>07</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Pinto et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Biological information is commonly used to cluster or classify entities of interest such as genes, conditions, species or samples. However, different sources of data can be used to classify the same set of entities. Methods that compare the performance of two data sources, or that determine how well a given classification agrees with another, are therefore frequently needed, especially in the absence of a universally accepted "gold standard" classification.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here, we describe a novel measure &#8211; the Ranked Adjusted Rand (<it>RAR</it>) index. <it>RAR </it>differs from existing methods by taking intercluster distances into account when evaluating the extent of agreement between any two groupings. This characteristic is relevant when a pair of entities is grouped in the same cluster by one method and separated by another: the latter method may assign the two entities to neighbouring clusters or, on the contrary, to clusters that are far apart from each other. <it>RAR </it>is applicable even when intercluster distance information is absent for one or both of the groupings. When it is absent for both, <it>RAR </it>is equal to its predecessor, the Adjusted Rand (<it>HA</it>) index. Artificially designed clusterings were used to demonstrate situations in which only <it>RAR </it>was able to detect differences in the grouping patterns. A study with larger simulated clusterings showed that, in realistic conditions, <it>RAR </it>effectively integrates distance and partition information. The new method was applied to biological examples to compare 1) two microbial typing methods, 2) two gene regulatory network distances and 3) microarray gene expression data with pathway information. In the first application, one of the methods does not provide intercluster distances while the other produces a hierarchical clustering. <it>RAR </it>proved to be more sensitive than <it>HA </it>in choosing the threshold for defining clusters in the hierarchical method that maximizes agreement between the results of both methods.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The major advantage of <it>RAR </it>is that it combines cluster distance and partition information, while the previously available methods used only the latter. <it>RAR </it>should be used in the research problems where <it>HA </it>was previously used, because in the absence of intercluster distance effects it is an equally effective measure, and in the presence of distance effects it is a more complete one.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Grouping individual entities into sets with identical properties is a recurrent task in bioinformatics, taxonomy and phylogeny studies. When there are <it>a priori </it>reasons that allow the identification of properties that define each group, it is possible to use classification algorithms to distribute individuals among the possible classes. In other situations, different classes are defined without the absolute knowledge of what properties (and values of those properties) could identify "natural" classes. The usual procedure is to collect data characterizing each individual, relate every pair of individuals through a distance measure computed from the data, and apply clustering algorithms to find a "natural" grouping structure of those individuals based on the collected data. For simplicity, most of the remainder of this manuscript will use the term clustering, but the problems and methods presented are also applicable to classifications.</p>
         <p>In some well established fields, researchers may assume a "gold standard" classification. If such a gold standard is available, clustering results based on a particular kind of data can be evaluated against it. False positives and false negatives can be identified and counted, enabling the computation of several related statistics. Even when gold standards are not available, different clusterings still need to be compared. Faced with two different data sources that characterize the same set of biological entities and produce two different clusterings, one may wish to know to what extent, and under which conditions, one can maximize agreement or disagreement between the two clusterings. This information may help to decide whether it is worthwhile to collect and analyse both data sources: if their results are in complete agreement, it may be enough to collect data from a single source. On the other hand, if the two clusterings disagree, combining their results may offer additional information and discriminatory power. Additionally, if the two data sources carry independent information, clusters that have a good match in both clusterings can be more reliable than clusters resulting from one data source alone.</p>
         <sec>
            <st>
               <p>From previous measures to Ranked Adjusted Rand</p>
            </st>
            <p>Since the 1970s, researchers in statistics, psychology and biology have developed methods to compare clusterings. If distance matrices between individual entities are available for both clusterings, it may be possible to directly correlate the pairwise distances <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. More frequently, however, researchers are interested in knowing whether the resulting groups are similar. It is also possible to have highly correlated distance matrices that give rise to very different partitions due to scale heterogeneity in the distance values. Hence, the methods presented in the literature have focused on the comparison of partitions (also designated flat clusterings), neglecting the closeness relationships between clusters. There are two main families of methods comparing partitions: one evaluates pairwise agreement (Rand, Adjusted Rand, Fowlkes-Mallows, Jaccard and Wallace indices) <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>, the other searches for clusterwise agreement (Larsen, Meila's variation of information and Van Dongen indices) <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. In both families, some methods are asymmetric, that is, the agreement of clustering <it>A </it>with <it>B </it>is different from the agreement of <it>B </it>with <it>A </it><abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. This asymmetry can be helpful if the symmetric methods are being affected by the different discriminatory power of the two clusterings. Clusterwise methods are computed from a contingency table (<it>CT</it>, Table <tblr tid="T1">1</tblr>) that contains the dual classification of each individual entity in both clusterings, while pairwise methods are computed from a 2 by 2 mismatch matrix (<it>MM</it>, Table <tblr tid="T2">2</tblr>), derivable from the <it>CT</it>. Each of the four cells of <it>MM </it>counts the pairs of entities that are, or are not, placed in the same cluster by each of the two clusterings. Neither <it>CT </it>nor <it>MM </it>contains any information about the relatedness of the different clusters.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Contingency Table (<it>CT</it>).</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4" ca="center">
                        <p>
                           <it>C'</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <it>C'</it>
                           <sub>1</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>C' </it>
                           <sub>2</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>C'</it>
                           <sub>
                              <it>K'</it>
                           </sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>C </it>marginal totals</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>C</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>C</it>
                           <sub>1</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ct</it>
                           <sub>1,1</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ct</it>
                           <sub>1,2</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ct</it>
                           <sub>1,<it>K'</it></sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>n</it>
                           <sub>1</sub>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <it>C</it>
                           <sub>2</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ct</it>
                           <sub>2,1</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ct</it>
                           <sub>2,2</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ct</it>
                           <sub>2,<it>K'</it></sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>n</it>
                           <sub>2</sub>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <it>C</it>
                           <sub>
                              <it>K</it>
                           </sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ct</it>
                           <sub><it>K</it>,1</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ct</it>
                           <sub><it>K</it>,2</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>ct</it>
                           <sub><it>K</it>,<it>K'</it></sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>n</it>
                           <sub>
                              <it>K</it>
                           </sub>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2" ca="center">
                        <p><it>C' </it>marginal totals</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>n'</it>
                           <sub>1</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>n'</it>
                           <sub>2</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>n'</it>
                           <sub>
                              <it>K'</it>
                           </sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>n</it>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Table used for the computation of cluster-wise measures of clustering agreement. <it>ct</it><sub><it>ij </it></sub>is the number of entities that are both in cluster <it>C</it><sub><it>i </it></sub>and <it>C'</it><sub><it>j</it></sub>.</p>
               </tblfn>
            </tbl>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Mismatch Matrix (<it>MM</it>).</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <it>C'</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Match</p>
                     </c>
                     <c ca="center">
                        <p>Mismatch</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>C</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>Match</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>a</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>b</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Mismatch</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>c</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>d</it>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
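                  <p>An auxiliary matrix for the computation of pair-wise measures of clustering agreement. <it>a</it>, <it>b</it>, <it>c </it>and <it>d </it>are counts of unique entity pairs: <it>a </it>counts pairs placed in the same cluster by both clusterings, <it>b </it>pairs placed together only by the row clustering, <it>c </it>pairs placed together only by the column clustering, and <it>d </it>pairs separated by both.</p>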
               </tblfn>
            </tbl>
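            <p>As an illustration of how <it>MM </it>can be derived from <it>CT</it>, the following Python sketch counts the four kinds of entity pairs from two label vectors and then computes the Rand index and the Hubert and Arabie adjusted Rand index in their standard textbook forms. The function names, the label-vector input and the convention of returning 1 when the adjusted index is undefined are assumptions made for this example; they are not taken from the authors' implementation.</p>

```python
from collections import Counter
from math import comb


def contingency_table(labels_c, labels_c_prime):
    """Count the entities falling in each (cluster of C, cluster of C') cell."""
    if len(labels_c) != len(labels_c_prime):
        raise ValueError("both clusterings must label the same entities")
    return Counter(zip(labels_c, labels_c_prime))


def mismatch_matrix(labels_c, labels_c_prime):
    """Return the pair counts (a, b, c, d) of the 2 by 2 mismatch matrix."""
    ct = contingency_table(labels_c, labels_c_prime)
    row_totals = Counter(labels_c)        # marginal totals n_i of C
    col_totals = Counter(labels_c_prime)  # marginal totals n'_j of C'
    n = len(labels_c)
    # a: pairs placed in the same cluster by both clusterings
    a = sum(comb(count, 2) for count in ct.values())
    same_in_c = sum(comb(count, 2) for count in row_totals.values())
    same_in_c_prime = sum(comb(count, 2) for count in col_totals.values())
    b = same_in_c - a            # together in C, separated in C'
    c = same_in_c_prime - a      # separated in C, together in C'
    d = comb(n, 2) - a - b - c   # separated in both clusterings
    return a, b, c, d


def rand_index(a, b, c, d):
    """Fraction of entity pairs on which the two clusterings agree."""
    return (a + d) / (a + b + c + d)


def adjusted_rand_index(a, b, c, d):
    """Adjusted Rand index in its standard chance-corrected form."""
    total = a + b + c + d
    expected = (a + b) * (a + c) / total
    maximum = ((a + b) + (a + c)) / 2
    if maximum == expected:
        # Undefined case (both partitions trivial); returning 1.0 is an
        # assumed convention for this sketch.
        return 1.0
    return (a - expected) / (maximum - expected)


labels_c = [1, 1, 1, 2, 2, 2]
labels_c_prime = [1, 1, 2, 2, 3, 3]
a, b, c, d = mismatch_matrix(labels_c, labels_c_prime)
print(a, b, c, d, rand_index(a, b, c, d), adjusted_rand_index(a, b, c, d))
```

            <p>For these hypothetical labels the counts are <it>a </it>= 2, <it>b </it>= 4, <it>c </it>= 1 and <it>d </it>= 8, which gives a Rand index of 10/15 and an adjusted Rand index of about 0.24. Neither value would change if the clusters of <it>C' </it>were moved closer together or further apart, since only the partitions enter the computation.</p>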
            <p>Although research in this area has produced many different methods, the classical methods remain the most frequently cited. For example, in a recent reference book on microarray data analysis, the only method presented to compare clusterings is equivalent to the Rand index <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. On the other hand, there is no general consensus on the choice of method to compare clusterings, and active research on alternative methods has been motivated by microarray and systems biology approaches <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. It should be noted that the methods discussed here neither evaluate the quality of clustering algorithms nor validate them. Instead, the aim is to compare the information in clusterings obtained from different data sources. Nor is it a direct aim to obtain a better combined clustering that is closer to a hypothetical true classification. Nonetheless, these are possible secondary applications, which are not tested in the present report. Additionally, researchers comparing clustering results should be aware that the measured levels of agreement could be strongly influenced by the inherent quality of the individual clusterings and by the type and quality of the datasets from which the analysed clusterings were derived.</p>
            <p>The motivation to develop a new method stemmed from the observation that, for the available measures, when a pair of entities is in the same cluster in one clustering and in different clusters in the other, it is considered irrelevant whether these clusters are close neighbours or, on the contrary, very distant. A solution to a similar problem was developed in a related subject, the quantification of the agreement between different observers performing a diagnostic test <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. When the test has multiple possible categories with an ordinal relation (of disease severity, for example), weights are attributed to different degrees of disagreement: minimal (when one observer chooses a category close to the one chosen by the other observer), intermediate, and maximal (when the two observers choose categories at the two extremes of the ordinal scale). These weighted disagreements contribute proportionally to the overall measure of agreement.</p>
            <p>A similar weighting strategy is not directly applicable to the comparison of clusterings. First, and in contrast to the observer agreement case, the two clusterings are not forced to have the same number of groups. Second, there is no predetermined correspondence between the clusters in both clusterings. Third, the closeness relationships between clusters are frequently more complex (needing two or more dimensions to be correctly represented) than the simple ordinal scale of diagnostic categories (a unidimensional representation). The main achievements of the proposed measure are the solutions to these three problems. It consists in the definition of a new way to record pairwise agreements, the Ranked Mismatch Matrix (<it>RMM</it>), which enables partition and intercluster distance information to be accounted for jointly in the computation of an overall clustering agreement measure. The new measure was named Ranked Adjusted Rand (<it>RAR</it>) because it can be considered an extension of the earlier Hubert and Arabie (<it>HA</it>) adjusted Rand index; both measures are equivalent when no intercluster distance information is available for either clustering.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Interpretation of <it>MDD </it>and <it>RAR </it>values</p>
            </st>
            <p>The Methods section describes how to compute <it>RAR </it>from a Ranked Mismatch Matrix (<it>RMM</it>, represented in Table <tblr tid="T3">3</tblr>) and the quantities Mean Diagonal Deviation (<it>MDD</it>) and expected <it>MDD </it>under independence of clusterings (<it>MDD</it><sup><it>ind</it></sup>). <it>MDD </it>can be interpreted as the expected change in intercluster distance rank for a randomly chosen pair of entities. Consider one entity pair (<it>a</it>, <it>b</it>): if in clustering <it>C</it>, <it>b </it>belongs to the <it>r</it><sup><it>th </it></sup>closest cluster to <it>a</it>'s cluster, then it is expected that in <it>C'</it>, <it>b </it>is in the <it>r </it>&#177; (<it>MDD </it>&#215; <it>K'</it>)<sup><it>th </it></sup>closest cluster to <it>a</it>'s cluster (<it>K' </it>is the number of clusters in <it>C'</it>). This interpretation can be very useful if the aim is to predict the clusters obtained with one technique or data source from the clustering information obtained with a different technique or data source. <it>MDD </it>can take the value 1 in only one situation: when in one clustering all entities are in the same cluster and in the other every entity is in its own cluster, with all clusters equally distant from each other. On the other hand, an <it>MDD </it>value of 0 corresponds to two clusterings with identical partitions and equally ranked relative distances between clusters. The <it>RAR </it>values compare the observed <it>MDD </it>with the theoretical <it>MDD </it>value expected if the assignment of entities to clusters were independent in the two clusterings (<it>MDD</it><sup><it>ind</it></sup>), so that any agreement would be due to chance alone. The maximum value taken by <it>RAR </it>is 1, when <it>MDD </it>is 0. If <it>MDD </it>&lt;<it>MDD</it><sup><it>ind</it></sup>, the average entity pair tends to have smaller intercluster distance rank changes from one clustering to the other than it would have in the independence situation. In this case <it>RAR </it>takes positive values, meaning that the clusterings are more similar than expected by chance agreement. If <it>MDD </it>> <it>MDD</it><sup><it>ind</it></sup>, <it>RAR </it>takes negative values, meaning that the deviation from perfect agreement is greater than expected by chance. The last two situations are interpreted much as in the <it>HA </it>adjusted Rand case. <it>RAR </it>values are certainly less intuitive to interpret than the simple Rand index, but <it>RAR </it>provides richer information about clustering agreement. <it>RAR </it>will be especially useful to distinguish situations in which <it>HA </it>or other measures are almost or completely identical. For these reasons we propose using <it>RAR </it>in addition to previously available measures.</p>
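            <p>The sign behaviour described in this section can be made concrete with a minimal Python sketch. It assumes the chance-corrected form <it>RAR </it>= 1 - <it>MDD</it>/<it>MDD</it><sup><it>ind</it></sup>, which is consistent with the properties stated here; the Methods section remains the authoritative definition. The function names and the numerical values are hypothetical and serve only as illustration.</p>

```python
def rar_from_mdd(mdd, mdd_ind):
    """Chance-corrected agreement, assuming RAR = 1 - MDD / MDD_ind."""
    if mdd_ind == 0:
        raise ValueError("MDD under independence must be non-zero")
    return 1.0 - mdd / mdd_ind


def expected_rank_shift(mdd, k_prime):
    """Expected change in intercluster distance rank, MDD x K'."""
    return mdd * k_prime


# Hypothetical values, chosen only to show the three regimes.
mdd_ind = 0.4
for mdd in (0.0, 0.2, 0.4, 0.6):
    print(mdd, rar_from_mdd(mdd, mdd_ind))

# With MDD = 0.1 and K' = 10 clusters, the rank of b's cluster relative
# to a's cluster is expected to shift by about one position.
print(expected_rank_shift(0.1, 10))
```

            <p>Under this assumed form, an <it>MDD </it>of 0 gives <it>RAR </it>= 1, an <it>MDD </it>equal to <it>MDD</it><sup><it>ind </it></sup>gives <it>RAR </it>= 0, and any larger <it>MDD </it>gives a negative <it>RAR</it>.</p>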
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Ranked Mismatch Matrix (<it>RMM</it>).</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="5" ca="center">
                        <p>
                           <it>C'</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Match</p>
                     </c>
                     <c ca="center">
                        <p>Mismatch 1</p>
                     </c>
                     <c ca="center">
                        <p>Mismatch 2</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>Mismatch <sub><it>q</it></sub></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>
                           <it>C</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>Match</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub>1,1</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub>1,2</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub>1,3</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub>1,<it>q</it>+1</sub>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Mismatch 1</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub>2,1</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub>2,2</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub>2,3</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub>2,<it>q</it>+1</sub>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Mismatch 2</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub>3,1</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub>3,2</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub>3,3</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub>3,<it>q</it>+1</sub>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Mismatch <sub><it>p</it></sub></p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub><it>p</it>+1,1</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub><it>p</it>+1,2</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub><it>p</it>+1,3</sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>...</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>rmm</it>
                           <sub><it>p</it>+1,<it>q</it>+1</sub>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>An auxiliary expanded matrix for the computation of pair-wise measurements of clustering agreement, accounting for intercluster distances. The mismatch row <it>i </it>indicates that the two entities are in different clusters <it>C</it><sub><it>x </it></sub>and <it>C</it><sub><it>y</it></sub>, and <it>C</it><sub><it>y </it></sub>is the <it>i</it><sup><it>th </it></sup>closest cluster to <it>C</it><sub><it>x</it></sub>. Column meaning is analogous.</p>
               </tblfn>
            </tbl>
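            <p>The structure of the expanded matrix can be illustrated with a minimal sketch. It assumes that each clustering is given as a list of cluster labels (integers starting at 0) plus a hypothetical rank matrix in which entry [<it>x</it>][<it>y</it>] is the intercluster distance rank of cluster <it>y </it>as seen from cluster <it>x </it>(0 on the diagonal), and that ordered pairs of entities are counted. The function name <it>build_rmm </it>and these conventions are illustrative assumptions, not the implementation used in this work. Indices are 0-based, so cell [0][0] of the result corresponds to <it>rmm</it><sub>1,1</sub>.</p>

```python
def build_rmm(labels_c, ranks_c, labels_cp, ranks_cp):
    """Count ordered entity pairs by intercluster rank in C (rows) and C' (columns).

    Hypothetical helper: ranks_c[x][y] is the rank of cluster y seen from
    cluster x in clustering C, with 0 meaning "same cluster" (a match).
    """
    n_rows = max(max(row) for row in ranks_c) + 1
    n_cols = max(max(row) for row in ranks_cp) + 1
    rmm = [[0] * n_cols for _ in range(n_rows)]
    n = len(labels_c)
    for a in range(n):
        for b in range(n):
            if a == b:
                continue  # an entity is not paired with itself
            i = ranks_c[labels_c[a]][labels_c[b]]
            j = ranks_cp[labels_cp[a]][labels_cp[b]]
            rmm[i][j] += 1
    return rmm


# Four entities; C has three clusters with ranked distances, C' is flat.
labels_c = [0, 0, 1, 2]
ranks_c = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
labels_cp = [0, 0, 0, 1]
ranks_cp = [[0, 1], [1, 0]]
rmm = build_rmm(labels_c, ranks_c, labels_cp, ranks_cp)
```

            <p>In this toy case the flat clustering contributes only two columns (match and a single mismatch rank), while the ranked clustering contributes three rows.</p>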
         </sec>
         <sec>
            <st>
               <p><it>RAR </it>in the absence of intercluster distance information</p>
            </st>
            <p>When both clusterings being compared are flat, that is, when there is no intercluster distance information for either of them, two entities can only be either in the same cluster or in equally dissimilar clusters. <it>RMM </it>then becomes identical to <it>MM</it>, and <it>1-MDD </it>is equal to the Rand index. Analogously, <it>RAR </it>becomes <it>HA</it>, since the correction for chance agreement is similar for both measures in the absence of any intercluster distance information. Proof of this equivalence is presented in Additional file <supplr sid="S1">1</supplr>.</p>
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p><b>Demonstration</b>. PDF file with a demonstration of the equivalence of <it>RAR </it>and <it>HA </it>in the absence of intercluster distance information.</p>
               </text>
               <file name="1471-2105-8-44-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
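            <p>For reference, the two flat measures that <it>RAR </it>reduces to in this situation can be computed from the contingency counts of two label vectors. The following is a minimal sketch of the standard Rand index and Hubert and Arabie adjusted Rand index (<it>HA</it>); the function name <it>rand_and_ha </it>and the toy labels are illustrative assumptions.</p>

```python
from collections import Counter
from math import comb


def rand_and_ha(labels_a, labels_b):
    """Return (Rand index, adjusted Rand index) for two flat partitions."""
    total = comb(len(labels_a), 2)
    # Pairs placed in the same cluster by both partitions.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed in the same cluster by each partition separately.
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    rand = (total + 2 * both - same_a - same_b) / total
    expected = same_a * same_b / total
    max_index = (same_a + same_b) / 2
    # Undefined (division by zero) when max_index equals expected,
    # e.g. when both partitions place every entity in a single cluster.
    ha = (both - expected) / (max_index - expected)
    return rand, ha


rand, ha = rand_and_ha([0, 0, 1, 1], [0, 0, 1, 2])
```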
         </sec>
         <sec>
            <st>
               <p><it>RAR </it>with incomplete intercluster distance information</p>
            </st>
            <p>A major potential application of <it>RAR </it>is the comparison of a flat clustering with another for which intercluster distance information is available. Previously developed partition comparison measures can be used for this purpose, but they neglect the available distance information. <it>RAR</it>, in contrast, compares both clusterings while including the partial distance information available. The resulting <it>RMM </it>will have 2 &#215; (<it>q</it>+1) (or (<it>p</it>+1) &#215; 2) dimensions, making it possible to evaluate whether the mismatches of the flat clustering tend to originate mismatches with larger rank differences in the other clustering than the flat matches do.</p>
            <p>This and the previous section discussed the use of <it>RAR </it>when distance information is partially or totally absent. However, in most of the situations in which the Rand or Adjusted Rand indices have been used, information about distance was in fact available. For an example see reference <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. This is the most frequent situation when the partitions compared were produced by clustering algorithms. The clustering algorithm needs an inter-entity distance matrix, and this matrix is sufficient to derive the intercluster distances used for <it>RAR </it>computation. When the partitions are defined by classification methods, it may still be possible to have distance information, depending on the properties used to classify entities.</p>
         </sec>
         <sec>
            <st>
               <p><it>RAR </it>with ties in intercluster distances</p>
            </st>
            <p>A positive feature of <it>R</it>(<it>i</it>, <it>j</it>) is that no rules need to be defined to deal with rank ties. If a cluster has two neighbour clusters at the same distance, they will have the same intercluster distance rank. The only consequence is that the maximum intercluster distance rank (<it>p</it>+1 or <it>q</it>+1) decreases with the number of ties. <it>RMM </it>will have <it>p</it>+1 = <it>K </it>rows and <it>q</it>+1 = <it>K' </it>columns in the absence of ties in intercluster distance ranks. Each tie in <it>C </it>will remove one row and each tie in <it>C' </it>will remove one column from <it>RMM</it>. A higher number of ties can be due to a more discrete intercluster distance function and will produce a reduced <it>RMM </it>that is more similar to <it>MM</it>. The existence of ties therefore brings <it>RAR </it>closer to <it>HA</it>. This is consistent with the fact that more ties are a consequence of a lower resolution of the metric used to define the intercluster distance function. The minimal resolution corresponds to a binary distance function that can be 0 (same cluster) or 1 (different cluster), which is the case in which <it>RAR </it>is equal to <it>HA</it>, as discussed previously.</p>
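            <p>The effect of ties on the maximum rank can be sketched with a dense ranking, in which equal distances share a rank and no rank is skipped. The helper names <it>dense_ranks </it>and <it>rank_matrix</it>, the use of rank 0 for a cluster relative to itself, and the exact-equality test for ties are assumptions made for illustration only.</p>

```python
def dense_ranks(distance_row, own_index):
    """Rank the other clusters by distance; tied distances share one rank."""
    others = sorted({d for k, d in enumerate(distance_row) if k != own_index})
    rank_of = {d: r for r, d in enumerate(others, start=1)}
    # Rank 0 is reserved for the cluster itself (a match).
    return [0 if k == own_index else rank_of[d]
            for k, d in enumerate(distance_row)]


def rank_matrix(dist):
    """Apply dense_ranks to every row of a square intercluster distance matrix."""
    return [dense_ranks(row, x) for x, row in enumerate(dist)]


# Cluster 2 is equally distant from clusters 0 and 1, so both get rank 1.
with_one_tie = rank_matrix([[0, 2, 5], [2, 0, 5], [5, 5, 0]])
# All intercluster distances equal: every mismatch has rank 1 (binary case).
all_tied = rank_matrix([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
```

            <p>In the fully tied case the largest rank is 1, which corresponds to the binary distance function described above.</p>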
         </sec>
         <sec>
            <st>
               <p>Design of small scale examples</p>
            </st>
            <p>To clearly show the desirable properties of the <it>RAR </it>measure compared with previously available methods, four simple theoretical clusterings were created (Figure <figr fid="F1">1</figr>). One of the four, clustering <it>A</it>, is the original one, with 9 entities divided into 3 clusters. The position of the points in each clustering is relevant: two points that are more distant are less similar. Clusterings <it>B </it>to <it>D </it>were derived from <it>A </it>by splitting the {1, 2, 3, 4} cluster in two. One of the resulting clusters kept the same location, while the other varies in size and location in the different clusterings. The coordinates and cluster identity of each of the nine corresponding entities in the four clusterings were used to compute <it>RAR </it>and ten other measures of clustering agreement between <it>A </it>and each of its transformed clusterings. The description and formulas of these additional measures are available in the corresponding references given in Table <tblr tid="T4">4</tblr>, together with the values computed for these examples.</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Comparison of <it>RAR </it>with other measures applied to the small example of Figure 1.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Clusterings compared</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Clustering comparison measures [reference]</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>A-B</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>A-C</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>A-D</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Rand [2]</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>HA </it>[4]</p>
                     </c>
                     <c ca="center">
                        <p>0.77</p>
                     </c>
                     <c ca="center">
                        <p>0.77</p>
                     </c>
                     <c ca="center">
                        <p>0.77</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Jaccard [5]</p>
                     </c>
                     <c ca="center">
                        <p>0.70</p>
                     </c>
                     <c ca="center">
                        <p>0.70</p>
                     </c>
                     <c ca="center">
                        <p>0.70</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Wallace forward [6]</p>
                     </c>
                     <c ca="center">
                        <p>0.70</p>
                     </c>
                     <c ca="center">
                        <p>0.70</p>
                     </c>
                     <c ca="center">
                        <p>0.70</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Wallace reverse [6]</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Larsen forward [7]</p>
                     </c>
                     <c ca="center">
                        <p>0.95</p>
                     </c>
                     <c ca="center">
                        <p>0.95</p>
                     </c>
                     <c ca="center">
                        <p>0.95</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Larsen reverse [7]</p>
                     </c>
                     <c ca="center">
                        <p>0.81</p>
                     </c>
                     <c ca="center">
                        <p>0.81</p>
                     </c>
                     <c ca="center">
                        <p>0.81</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>MH </it>[8]</p>
                     </c>
                     <c ca="center">
                        <p>0.89</p>
                     </c>
                     <c ca="center">
                        <p>0.89</p>
                     </c>
                     <c ca="center">
                        <p>0.89</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Variation of Information [8]</p>
                     </c>
                     <c ca="center">
                        <p>0.11</p>
                     </c>
                     <c ca="center">
                        <p>0.11</p>
                     </c>
                     <c ca="center">
                        <p>0.11</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Van Dongen [9]</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                     <c ca="center">
                        <p>1.00</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>RAR</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.38</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.29</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.67</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Coordinates of the points in the four clusterings (<it>A</it>, <it>B</it>, <it>C </it>and <it>D</it>) of Figure 1 were used to compute <it>RAR </it>and 10 other measures of clustering agreement.</p>
               </tblfn>
            </tbl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Small clusterings example of <it>RAR</it>'s unique properties</p>
               </caption>
               <text>
                  <p><b>Small clusterings example of <it>RAR</it>'s unique properties</b>. Clustering <it>A </it>divides 9 points (numbered circles) in three clusters identified by rectangles. By splitting the {1, 2, 3, 4} cluster, the clusterings <it>B</it>, <it>C </it>and <it>D </it>were formed. One of the child clusters kept the same location. The second child cluster moved away from the original location. In <it>B </it>and <it>C</it>, the second child cluster has only one entity, while in <it>D </it>it has three. In <it>B </it>and <it>D </it>the two split clusters are nearest neighbours, while in <it>C </it>they are maximally separated. The two dimensional coordinates of the points in the figure were used to compute average distances between clusters and to calculate <it>RAR </it>and other clustering comparison measures. The results are presented in Table 4.</p>
               </text>
               <graphic file="1471-2105-8-44-1"/>
            </fig>
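            <p>The pair-counting measures in Table 4 depend only on cluster membership, which the following sketch illustrates for the Rand, Jaccard and Wallace coefficients. The label vectors are assumptions: the text specifies only that clustering <it>A </it>has 9 entities in 3 clusters and that the {1, 2, 3, 4} cluster is split into clusters of 3 and 1 entities, so the sizes of the other two clusters (3 and 2) and the choice of entity 4 as the one split off are hypothetical. The function names are likewise illustrative.</p>

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Classify every unordered entity pair by co-membership in A and in B."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def flat_measures(labels_a, labels_b):
    """Rand, Jaccard, and Wallace coefficients in both directions."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    total = both + only_a + only_b + neither
    return {
        "rand": (both + neither) / total,
        "jaccard": both / (both + only_a + only_b),
        "wallace_a_to_b": both / (both + only_a),
        "wallace_b_to_a": both / (both + only_b),
    }


# Hypothetical labels for entities 1 to 9 (see the assumptions above).
clustering_a = [0, 0, 0, 0, 1, 1, 1, 2, 2]
clustering_split = [0, 0, 0, 3, 1, 1, 1, 2, 2]
measures = flat_measures(clustering_a, clustering_split)
```

            <p>Because no coordinates enter the calculation, any clustering obtained by the same 3 + 1 split yields the same values regardless of where the split-off cluster is located.</p>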
         </sec>
         <sec>
            <st>
               <p>Analysis of the small scale examples</p>
            </st>
            <p>The first point that these theoretical examples demonstrate is that <it>RAR</it>, contrary to the previous partition comparison measures, detects a greater disagreement between clusterings if the entities causing the disagreement change not only the composition of the clusters but also the proximity relationships between clusters. This is shown by the difference in <it>RAR </it>value for the comparisons of <it>A </it>with <it>B </it>and <it>A </it>with <it>C </it>(Figure <figr fid="F1">1</figr>). All the other ten comparison measures consider <it>B </it>and <it>C </it>equally similar to <it>A</it>. In fact, the change in cluster composition from <it>A </it>to <it>B </it>is identical to the change from <it>A </it>to <it>C</it>. The difference is that the newly formed clusters in <it>B </it>are the closest neighbours, while in <it>C </it>they are the most distant clusters. As the clusterings discussed involve a small number of entities and clusters, and considering that the distance between points is proportional to the dissimilarity that was used to generate the clusterings, observation of Figure <figr fid="F1">1</figr> clearly indicates that clustering <it>B </it>is more similar to <it>A </it>than <it>C </it>is.</p>
            <p>In contrast, <it>RAR </it>indicates that clustering <it>D </it>is more similar to <it>A </it>than <it>B </it>is. This arises because in <it>D </it>only one entity has a different location compared with <it>A</it>. From <it>A </it>to <it>B </it>three entities changed position, although to the same relative location as the one entity cluster in <it>D</it>. Again, only <it>RAR </it>detected this difference, while all the other measures remained unchanged. This happens because <it>RAR </it>uses the intercluster distance ranks. The <it>RMM </it>comparing <it>A </it>with <it>B </it>will have more point pairs off the diagonal than the comparison of <it>A </it>with <it>D</it>. In the first comparison three points changed their relative position, affecting the <it>RMM </it>position of 3 &#215; 6 pairs of points. In the second comparison only one point moved; hence, only 1 &#215; 8 pairs of points can have intercluster distance ranks different from those they would have in a perfect match comparison. Consequently, <it>RAR </it>attributes more weight to the change from <it>A </it>to <it>B </it>than to the change from <it>A </it>to <it>D</it>. From the point of view of partition comparison measures, <it>B </it>and <it>D </it>differ equally from <it>A</it>. They both result from <it>A </it>by splitting a cluster of 4 entities into one cluster with 3 entities and another with a single entity. This is the reason why the 10 partition comparison measures in Table <tblr tid="T4">4</tblr> are not able to distinguish between the similarity of <it>A </it>with <it>B </it>and of <it>A </it>with <it>D</it>.</p>
         </sec>
         <sec>
            <st>
               <p>Simulation of large scale clusterings</p>
            </st>
            <p>The two small comparisons of the previous section are extreme cases in which the advantages of <it>RAR </it>were demonstrated, since it detected differences between clusterings that none of the previously available methods were able to detect. In realistic data sets, however, a variable number of entities is expected to change cluster membership and relative position simultaneously. Additionally, some entity changes may make the clusterings more similar while others may differentiate them. For these reasons, larger clusterings were simulated, with more extensive entity shuffling. A complete description of these simulations is provided in Additional file <supplr sid="S2">2</supplr>. Briefly, five factors were systematically varied in simulated clustering comparisons: 1) number of entities, 2) number of clusters, 3) cluster size distribution, 4) fraction of entities changing cluster membership and relative position and 5) extent of change in relative position. The only factor that had a substantial effect on the final <it>RAR </it>values was the fraction of entities changing cluster membership and location, producing a linear correlation coefficient of <it>r </it>= -0.918. The number of entities (<it>r </it>= 0.077), number of clusters (<it>r </it>= -0.109), and the cluster size distribution (<it>r </it>= -0.032) had negligible impact on <it>RAR </it>values. These low correlations support the conclusion that <it>RAR </it>values are not systematically influenced by the number of entities and clusters being compared or by the distribution of cluster sizes. As the entities changing cluster membership were randomly selected from every possible cluster, the change in relative position of some entities could be balanced by entities moving in the opposite direction. Consequently, varying the extent of change in relative position produced highly variable results and a low correlation with <it>RAR </it>values (<it>r </it>= -0.016). To evaluate more precisely the influence of this factor on <it>RAR </it>values, a partial correlation analysis was performed on the relation between <it>RAR </it>values, <it>HA </it>values (which can be interpreted as <it>RAR </it>values without intercluster distance information) and the net change in entity relative position (measured by the correlation coefficient between the distance matrices of the two clusterings being compared). The results, presented in detail in the supplementary material, show that <it>RAR </it>integrates independent information contained in the <it>HA </it>index and in the correlation coefficient between distance matrices. The partial correlations of <it>RAR </it>with both factors are strong and positive (0.758 and 0.720), which means that both a higher fraction of entities changing cluster membership and a higher net change in entity relative position independently induce higher <it>RAR </it>values.</p>
            <suppl id="S2">
               <title>
                  <p>Additional File 2</p>
               </title>
               <text>
                  <p><b>Comparison of simulated large clusterings</b>. Pdf file with methods, results and interpretation of the comparison of simulated large clusterings.</p>
               </text>
               <file name="1471-2105-8-44-S2.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
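            <p>As an illustration of the kind of analysis described above, the following sketch computes a first-order partial correlation between two variables while controlling for a third, using the standard formula based on pairwise Pearson coefficients. Whether the analysis in Additional file 2 used exactly this formula is an assumption, and the function names and toy data are hypothetical. In this setting, <it>xs </it>could hold <it>RAR </it>values, <it>ys </it>the <it>HA </it>values and <it>zs </it>the correlation coefficients between distance matrices.</p>

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation of xs with ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data only; these are not values from the simulations.
xs = [1, 2, 3, 4]
ys = [2, 1, 4, 3]
zs = [1, 3, 2, 4]
r_xy_given_z = partial_correlation(xs, ys, zs)
```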
         </sec>
         <sec>
            <st>
               <p>Biological examples</p>
            </st>
            <p>To substantiate the general applicability of <it>RAR</it>, three examples with biological data are presented that compare 1) two microbial typing methods, 2) two gene regulatory network distances and 3) microarray gene expression data with pathway information. The first example is presented in the main text while the other two are included in Additional file <supplr sid="S3">3</supplr>.</p>
            <suppl id="S3">
               <title>
                  <p>Additional File 3</p>
               </title>
               <text>
                  <p><b>Biological examples</b>. Pdf file with biological examples of the use of <it>RAR</it>.</p>
               </text>
               <file name="1471-2105-8-44-S3.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>Typing methods are major tools for the epidemiological characterization of bacterial pathogens, allowing the determination of clonal relationships between isolates based on their genotypic or phenotypic characteristics. Since typing schemes analyze different phenotypic or genotypic properties of bacteria, any congruence found between the methods suggests that a phylogenetic signal is being recovered by both, allowing greater confidence in evolutionary hypotheses or in the inferred clonal dispersion of the strains under study. The same collection of bacterial isolates can be typed by different methodologies, and it is of great epidemiological and evolutionary importance to understand the relationships between the clusters of isolates defined by the different methods. To this end we have recently evaluated the usefulness of a set of measures to quantitatively describe these relationships <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Data handling</p>
            </st>
            <p>We analyzed the data generated by the characterization of a collection of 325 macrolide-resistant <it>Streptococcus pyogenes </it>isolates <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>. This collection was characterized by <it>emm </it>sequencing, which generates groups of isolates differing by less than 92% in their DNA sequence, and by comparison of the patterns generated after digestion of total DNA with the <it>SmaI </it>endonuclease and separation by pulsed-field gel electrophoresis (PFGE). The Dice coefficient was used to compute the dissimilarity between PFGE band patterns, enabling subsequent hierarchical clustering with average linkage. The agreement between the <it>emm </it>classification and PFGE clusterings for the same data set has already been measured using <it>HA </it>and Wallace indices (W) <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. The Wallace index of clustering <it>A </it>relative to <it>B </it>is the probability that two entities are in the same cluster in <it>B</it>, given that they are in the same cluster in <it>A</it>. It is a pairwise asymmetric clustering agreement measure <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. For the present work, PFGE clusterings were produced for 70 different Dice dissimilarity thresholds, covering all the possible values. Each of these clusterings was compared with the <it>emm </it>classification through <it>RAR</it>, <it>HA </it>and Wallace indices.</p>
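            <p>A minimal sketch of a Dice dissimilarity between two band patterns is given below, expressed on a 0 to 100 scale. It assumes that each pattern has already been reduced to a set of discrete band positions, so that two bands match only when their positions are identical; real PFGE analysis software typically applies a position tolerance, which is not modelled here. The function name <it>dice_dissimilarity </it>and the example band positions are hypothetical.</p>

```python
def dice_dissimilarity(bands_x, bands_y):
    """Dice dissimilarity (0 to 100) between two sets of band positions."""
    x, y = set(bands_x), set(bands_y)
    if not x and not y:
        return 0.0  # two empty patterns are treated as identical
    shared = len(x.intersection(y))
    dice_similarity = 2.0 * shared / (len(x) + len(y))
    return 100.0 * (1.0 - dice_similarity)


# Two hypothetical patterns sharing two of their four bands.
pattern_1 = {1, 2, 3, 4}
pattern_2 = {3, 4, 5, 6}
d = dice_dissimilarity(pattern_1, pattern_2)
```

            <p>A matrix of such pairwise values is the kind of input that average linkage hierarchical clustering operates on, with each dissimilarity threshold defining one flat PFGE clustering.</p>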
         </sec>
         <sec>
            <st>
               <p>Practical example results and discussion</p>
            </st>
            <p>The dendrogram built with PFGE and <it>emm </it>type classification data is shown in Figure <figr fid="F1">1</figr> of Additional file <supplr sid="S3">3</supplr>. Although major agreements for half of the <it>emm </it>types (1, 4, 9, 11, 12 and 22) are identifiable within the dendrogram, it is hard to quantify the overall concordance including the other types. In order to compare the <it>emm </it>classification and the clustering with PFGE, it is first necessary to define the threshold that maximizes the agreement between the two microbial typing methods. Figure <figr fid="F2">2</figr> shows the values of <it>RAR</it>, <it>HA </it>and Wallace indices for the range of Dice dissimilarity thresholds used on the PFGE clustering.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Ranked Adjusted Rand (<it>RAR</it>), Adjusted Rand (<it>HA</it>) and Wallace (<it>W</it>) indices for the comparison of <it>emm </it>type with PFGE clusterings using different Dice dissimilarity thresholds</p>
               </caption>
               <text>
                  <p><b>Ranked Adjusted Rand (<it>RAR</it>), Adjusted Rand (<it>HA</it>) and Wallace (<it>W</it>) indices for the comparison of <it>emm </it>type with PFGE clusterings using different Dice dissimilarity thresholds</b>. Dice dissimilarity is on a 0&#8211;100 scale. The plot at the top indicates the number of PFGE clusters produced with the respective threshold, while the number of <it>emm </it>types is always 12. The minimum threshold studied, 1, does not produce 325 clusters because there are sets of isolates whose PFGE band patterns have a Dice dissimilarity of 0. <it>W</it>(<it>emm-PFGE</it>) is the probability that a pair of isolates is in the same PFGE cluster given that they have the same <it>emm </it>type. Analogously, <it>W</it>(<it>PFGE-emm</it>) is the probability that a pair of isolates has the same <it>emm </it>type given that they are in the same PFGE cluster. <it>HA </it>reflects the evolution of both Wallace indices. The plateau of maximum <it>HA</it>, between the thresholds of 28 and 41, is a region of compromise where both Wallace indices are high. The curve of <it>RAR </it>values shows a more complex behaviour, with a plateau of maximum values between the thresholds of 20 and 29, and a significant decrease between 29 and 41, where <it>HA </it>is nearly constant.</p>
               </text>
               <graphic file="1471-2105-8-44-2"/>
            </fig>
            <sec>
               <st>
                  <p>Wallace index</p>
               </st>
               <p>As the threshold increases, the number of PFGE clusters diminishes, resulting in a set of larger clusters. In this process, the Wallace index of <it>emm </it>classification relative to PFGE increases, meaning that the probability that two isolates are grouped in the same PFGE cluster, given that they share the same <it>emm </it>type, increases. In addition, larger PFGE clusters raise the probability that any two isolates belong to the same cluster. The step-like increases on the ascending curve correspond to the collapse of clusters that had many isolates with the same <it>emm </it>type. The Wallace index of PFGE relative to <it>emm </it>type, which is the probability that two isolates have the same <it>emm </it>type given that they are in the same PFGE cluster, shows the opposite behaviour. In this case, the step-like decreases on the curve correspond to the collapse of clusters rich in different <it>emm </it>types.</p>
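               <p>The two conditional probabilities described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name <it>wallace </it>and the toy labels are hypothetical, and the labels are not the isolates analysed here. The sketch counts unordered pairs that share a label in the first classification and reports the fraction that also share a label in the second.</p>
               <p><![CDATA[
```python
from itertools import combinations


def wallace(labels_a, labels_b):
    """Fraction of pairs sharing a label in A that also share a label in B."""
    same_a = 0
    same_both = 0
    for i, j in combinations(range(len(labels_a)), 2):
        if labels_a[i] == labels_a[j]:
            same_a += 1
            if labels_b[i] == labels_b[j]:
                same_both += 1
    if same_a == 0:
        raise ValueError("no pair shares a label in the first classification")
    return same_both / same_a


# Hypothetical toy labels (assumption for illustration only).
emm = ["1", "1", "1", "4", "4", "12"]
pfge = ["A", "A", "A", "C", "C", "C"]

w_emm_pfge = wallace(emm, pfge)   # same PFGE cluster, given same emm type
w_pfge_emm = wallace(pfge, emm)   # same emm type, given same PFGE cluster
```
]]></p>
               <p>In this toy case the large PFGE cluster C absorbs two <it>emm </it>types, so the first coefficient stays at its maximum while the second drops, which is the trade-off seen when clusters collapse at higher thresholds.</p>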
            </sec>
            <sec>
               <st>
                  <p>HA index</p>
               </st>
               <p>The <it>HA </it>curve reflects a compromise between the patterns of the two Wallace index curves, with a maximum around a 29% Dice dissimilarity threshold, where both Wallace index curves simultaneously present relatively high values. <it>HA </it>is therefore similar to an average of the two Wallace indices, corrected for chance agreement.</p>
            </sec>
            <sec>
               <st>
                  <p>RAR</p>
               </st>
               <p><it>RAR </it>behaves differently from the other measures. In contrast to <it>HA </it>and the Wallace indices, <it>RAR </it>variation is not dominated by large regions of little or no variability, meaning that <it>RAR </it>is sensitive to factors that do not influence the other measures. The <it>RAR </it>curve presents two similar maxima at thresholds of 20% and 29% Dice dissimilarity, with values of 0.2185 and 0.2178 respectively. These two points delimit a window where <it>RAR </it>is nearly constant. The <it>RAR </it>maximum at 29% coincides with the maximum value of <it>HA</it>, 0.9111. On the <it>HA </it>curve, this point marks the beginning of a low-variation region between Dice dissimilarity thresholds of 28% and 41%. This window is actually where the two measures, <it>RAR </it>and <it>HA</it>, disagree the most: <it>HA </it>is nearly constant while <it>RAR </it>decreases considerably. To clarify this difference in behaviour, <it>RMM </it>compositions for thresholds 20, 29 and 41 are shown in Figure <figr fid="F3">3</figr>.</p>
               <fig id="F3">
                  <title>
                     <p>Figure 3</p>
                  </title>
                  <caption>
                     <p>Ranked Mismatch Matrix (<it>RMM</it>) composition at different Dice dissimilarity thresholds for PFGE clustering</p>
                  </caption>
                  <text>
                     <p><b>Ranked Mismatch Matrix (<it>RMM</it>) composition at different Dice dissimilarity thresholds for PFGE clustering</b>. The <it>RMMs </it>for the comparison of <it>emm </it>type with PFGE clusterings have dimensions <it>p </it>&#215; 2, where <it>p </it>depends on the number of PFGE clusters and the two columns correspond to isolate pairs with the same or with different <it>emm </it>type. The PFGE intercluster distance rank is represented on the horizontal axis. Isolate pairs with the same <it>emm </it>type are represented with full lines, and pairs with different <it>emm </it>type with dashed lines. The frequencies plotted on the vertical axis are relative, meaning that the content of each <it>RMM </it>element was divided by the sum of all <it>RMM </it>elements. Each value therefore corresponds to the fraction of isolate pairs contributing to the respective <it>RMM </it>element. <it>RMM </it>composition was studied at three different thresholds (<it>T </it>= 21, 29 and 41) because 21 is an optimal threshold for <it>RAR </it>but not for <it>HA</it>, 29 is an optimal threshold for both measures, and 41 is a slightly sub-optimal threshold for <it>HA </it>(it is at the end of the maximal plateau of <it>HA </it>in Figure 2) and a bad threshold for <it>RAR</it>. The frequency distributions of isolate pairs with the same <it>emm </it>type are similar for the three thresholds. This is not the case for isolate pairs with different <it>emm </it>type. Here, as the threshold increases, the frequency peaks become larger and occur at lower cluster distance ranks, thereby contributing to a weaker agreement.</p>
                  </text>
                  <graphic file="1471-2105-8-44-3"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>RMM analysis for specific thresholds</p>
               </st>
               <p>Because the <it>emm </it>classification does not offer distances between the different types, two isolates can only be labelled as being of the same <it>emm </it>type or not. Consequently, <it>RMM </it>holds two columns (one with the isolate pairs with the same <it>emm </it>type and another with the isolate pairs with different <it>emm </it>types) and a number of rows corresponding to the maximum value of the PFGE intercluster distance rank for each threshold. Figure <figr fid="F3">3</figr> shows three plots where each of these two columns is represented by a curve. In these plots, the frequencies of the isolate pairs are relative, so that the sum of all the represented points, including both curves, is 1. For the three studied thresholds, the frequency distributions of isolate pairs with the same <it>emm </it>type over the cluster distance ranks are very similar. The major difference is that the plots for thresholds 29% and 41% show a higher frequency for isolate pairs with the same <it>emm </it>type and a cluster distance rank of 0, meaning that the pair is in the same PFGE cluster. This is what is responsible for the higher values of <it>HA </it>at these thresholds. On the other hand, <it>HA </it>is not able to detect the differences in the frequency distribution of isolate pairs with different <it>emm </it>types over the cluster distance ranks. Compared with thresholds 29% and 41%, this distribution for threshold 20% is flatter, meaning that isolate pairs with different <it>emm </it>types are more homogeneously distributed throughout the cluster distance rank scale. For the higher thresholds there are stronger peaks in this distribution, and they occur in the first half of the cluster distance rank scale. This contributes to a weaker agreement. 
For the 29% threshold, the increase in the frequency of pairs with the same <it>emm </it>type and in the same PFGE cluster balances the effect of the peaks in the distribution of pairs with different <it>emm </it>type, so <it>RAR </it>is practically identical to its value at the 20% threshold. For the 41% threshold the peaks are stronger, occur at lower values of cluster distance rank, and there is no counteracting effect, causing a significant decrease in <it>RAR </it>that is not observed for <it>HA</it>. In fact, to compute <it>HA </it>the frequencies of isolate pairs with different <it>emm </it>type and cluster distance rank of 1 or greater are grouped in just one class. This is not the case for <it>RAR</it>, which uses all the values in the <it>RMM </it>for its computation. One can argue that the higher peaks in the frequencies of isolate pairs at higher thresholds are due to the lower number of clusters. Fewer clusters correspond to fewer degrees of freedom in clustering formation, and with more clusters it becomes easier to build clusterings with closer agreement. As <it>RAR </it>computes a weighted average over all isolate pairs, in all <it>RMM </it>positions, it is more sensitive to the shape of the distribution of frequencies along the different matrix elements than to the actual frequency values. If the frequency distributions of isolate pairs over the cluster distance ranks were the same for the different thresholds studied, <it>RAR </it>would give results similar to <it>HA</it>, which is to be expected since <it>RAR </it>is an extension of the <it>HA </it>method. The main difference is that <it>RAR </it>is sensitive to changes in the discussed distributions or, in other words, to different levels of disagreement when a pair of isolates is not in the same class or cluster. 
This practical example shows the value of using <it>RAR </it>jointly with clustering comparison measures that only evaluate partition divergence, like <it>HA </it>or the Wallace coefficients. Using <it>HA </it>alone, it would be difficult to choose a Dice dissimilarity threshold in the interval of 28% to 41%. Between those values the partitions compared are almost equally similar, and the gains in <it>W</it>(<it>emm-PFGE</it>) are compensated by lower <it>W</it>(<it>PFGE-emm</it>). But <it>RAR </it>values are clearly higher for the 29% than for the 41% threshold. As the <it>HA </it>partition similarity is practically the same for both thresholds, it is safe to say that the change in <it>RAR </it>is due to an intercluster distance disagreement effect. Comparing the variation of <it>RAR </it>values with the corresponding variation of <it>HA </it>or other measures provides an easier way to infer the meaning of <it>RAR </it>values. <it>RAR </it>has another maximum at the 20% dissimilarity threshold, where the corresponding <it>HA </it>value is not in the maximal plateau. This means that, looking only at partition information, a 20% threshold would be inferior to 28% or 41%, but at 20% the entity pairs have a stronger tendency to be in equally separated clusters in both clusterings, which increases the <it>RAR </it>value. The existence of a <it>RAR </it>maximum at this value confirms the empirically accepted Dice dissimilarity threshold of 20% to define PFGE clusters <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>, a value that does not correspond to the <it>HA </it>maximum.</p>
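               <p>The construction of a <it>p </it>&#215; 2 <it>RMM </it>for one clustering with distance information and one flat classification can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: the function names, the toy distance matrix and the labels are hypothetical; intercluster distance is taken as the average distance between entities; ties in distance are broken by cluster label; and every ordered pair (<it>i</it>, <it>j</it>) with <it>i </it>different from <it>j </it>is counted, since the rank of one cluster seen from another need not be symmetric.</p>
               <p><![CDATA[
```python
from collections import defaultdict


def average_linkage(dist, members_a, members_b):
    """Average pairwise distance between the entities of two clusters."""
    total = sum(dist[i][j] for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))


def cluster_ranks(dist, clusters):
    """Rank of cluster v as seen from cluster u (0 when u == v)."""
    members = defaultdict(list)
    for idx, label in enumerate(clusters):
        members[label].append(idx)
    ranks = {}
    for u in members:
        # Assumption: ties in distance are broken by cluster label.
        others = sorted(
            (v for v in members if v != u),
            key=lambda v: (average_linkage(dist, members[u], members[v]), str(v)),
        )
        ranks[(u, u)] = 0
        for position, v in enumerate(others, start=1):
            ranks[(u, v)] = position
    return ranks


def rmm_flat(dist, clusters, types):
    """Rows: intercluster distance rank; columns: same type, different type."""
    ranks = cluster_ranks(dist, clusters)
    rmm = [[0, 0] for _ in range(len(set(clusters)))]
    n = len(clusters)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            x = ranks[(clusters[i], clusters[j])]
            y = 0 if types[i] == types[j] else 1
            rmm[x][y] += 1
    return rmm


# Hypothetical toy data: four entities placed on a line.
positions = [0, 1, 5, 9]
dist = [[abs(p - q) for q in positions] for p in positions]
clusters = ["A", "A", "B", "C"]      # e.g. clusters at one threshold
types = ["t1", "t1", "t1", "t2"]     # flat classification, no distances

rmm = rmm_flat(dist, clusters, types)
total = sum(map(sum, rmm))
relative = [[count / total for count in row] for row in rmm]
```
]]></p>
               <p>Dividing each element by the total, as in the last line, gives relative frequencies of the kind plotted in Figure <figr fid="F3">3</figr>; each column of <it>relative </it>corresponds to one curve.</p>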
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>As previously stated, comparing different clusterings for the same set of entities is a recurrent task. Hubert and Arabie's Adjusted Rand (<it>HA</it>) index is still commonly used to quantify these comparisons <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B13">13</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. The new method described here, the Ranked Adjusted Rand (<it>RAR</it>), can be useful in all instances where <it>HA </it>is applicable. <it>RAR </it>is an extension of <it>HA</it>, and produces identical results when there is no intercluster distance information. The novelty introduced by the <it>RAR </it>measure is the way the Ranked Mismatch Matrix (<it>RMM</it>) is built. Because the contribution of each entity pair in <it>RMM </it>is determined by the intercluster distance rank function, different levels of disagreement can be recorded, circumventing both the problem of pre-ordering clusters and that of the two clusterings having different numbers of clusters.</p>
         <p>The small artificial examples highlighted situations that <it>HA </it>and other available measures are not able to discriminate while <it>RAR </it>is. Namely, when from one clustering to another a cluster is split in two and one or both of the child clusters change their location relative to the remaining clusters, only <it>RAR </it>is sensitive to differences in the relative distances of these new clusters as compared with the original clustering.</p>
         <p>When applied to the comparison of larger clusterings, <it>RAR </it>proved to be robust to factors like the number of entities and clusters, and also to different cluster density patterns. Regarding the computation time needed to execute <it>RAR</it>, no special problems are anticipated even with its application to very large clusterings. Simulated clustering comparisons clarified that the distance information that <it>RAR </it>integrates is not the same as that already implicit in the partition information. For constant partition information, <it>RAR </it>is still sensitive to distance information changes. Analogously, for constant correlation between distance matrices, <it>RAR </it>is still sensitive to changes in partitions.</p>
         <p><it>RAR </it>was tested with experimental data from the field of molecular epidemiology. The test case was a comparison between one flat classification without interclass distance information, the <it>emm </it>types, and a hierarchical clustering from PFGE data, where there was intercluster distance information for several clusterings obtained from the same dendrogram. <it>RAR </it>produced different results from <it>HA </it>and the Wallace indices. Analysis of <it>RMM </it>content proved to be helpful in detecting which disagreements or agreements were causing changes in <it>RAR </it>and <it>HA </it>values. In conclusion, use of the <it>RAR </it>measure led to a more informed decision on the best threshold to generate a PFGE clustering with maximum agreement with the <it>emm </it>type classification. Although measures like the Rand, Jaccard and Wallace indices continue to be useful, especially because the numbers generated have an associated intuitive meaning, we argue that <it>RAR </it>supersedes the previous indices when measuring the overlap between clusterings or classifications. The foundation of this argument lies in the fact that <it>RAR </it>is sensitive to the same partition differences that previous methods detected, and in addition is sensitive to intercluster distance changes.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p><it>RAR </it>description</p>
            </st>
            <p>A clustering <it>C </it>is a partition of the set of objects <it>D</it>, with <it>n </it>elements (identified below by the letters <it>i </it>and <it>j</it>), into sets (clusters) <it>C</it><sub>1</sub>, <it>C</it><sub>2</sub>,...<it>C</it><sub><it>K</it></sub>, containing <it>n</it><sub>1</sub>, <it>n</it><sub>2</sub>,...<it>n</it><sub><it>K </it></sub>entities respectively, all greater than 0. The task of measuring clustering agreement arises when, for the same set <it>D</it>, two different methods are used to produce two different clusterings, <it>C </it>and <it>C'</it>, with <it>K </it>and <it>K' </it>clusters respectively. To evaluate the overlap of the two partitions, a contingency table (<it>CT</it>) is built, where every element of <it>D </it>contributes to the cell of the corresponding clusters in both <it>C </it>and <it>C' </it>as shown in Table <tblr tid="T1">1</tblr>. Focusing on the pairwise agreement, the information in <it>CT </it>can be further condensed in a mismatch matrix (<it>MM</it>), represented in Table <tblr tid="T2">2</tblr>, where <it>a</it>, <it>b</it>, <it>c </it>and <it>d </it>represent the counts of entity pairs that fall in each of the four possible categories. For example, entity pairs in the <it>b </it>category are in the same cluster in <it>C </it>but in different clusters in <it>C'</it>. The sum of <it>a</it>, <it>b</it>, <it>c </it>and <it>d </it>is <it>n</it>(<it>n</it>-1)/2, the total number of unique entity pairs.</p>
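            <p>As an illustration of the four pair categories, the following minimal Python sketch counts <it>a</it>, <it>b</it>, <it>c </it>and <it>d </it>directly from two label vectors. The function name and the toy labels are hypothetical, and the sketch assumes that <it>c </it>is the mirror image of <it>b </it>(same cluster in <it>C' </it>but different clusters in <it>C</it>), with <it>a </it>and <it>d </it>counting pairs together in both and apart in both.</p>
            <p><![CDATA[
```python
from itertools import combinations


def mismatch_counts(labels_c, labels_c2):
    """Return the pair counts (a, b, c, d) for two labelings of the same entities."""
    if len(labels_c) != len(labels_c2):
        raise ValueError("both clusterings must label the same entities")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_c)), 2):
        same_c = labels_c[i] == labels_c[j]
        same_c2 = labels_c2[i] == labels_c2[j]
        if same_c and same_c2:
            a += 1   # together in both clusterings
        elif same_c:
            b += 1   # together in C, apart in C'
        elif same_c2:
            c += 1   # apart in C, together in C' (assumed mirror of b)
        else:
            d += 1   # apart in both clusterings
    return a, b, c, d


# Hypothetical toy labelings of five entities.
counts = mismatch_counts([0, 0, 1, 1, 2], [0, 0, 0, 1, 1])
```
]]></p>
            <p>For these five entities the four counts add up to 5 &#215; 4 / 2 = 10, the number of unique pairs.</p>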
         </sec>
         <sec>
            <st>
               <p>Adjusted Rand, the <it>RAR </it>predecessor</p>
            </st>
            <p>Hubert and Arabie proposed an adjusted Rand index to quantify clustering agreement <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>:</p>
            <p>
               <m:math name="1471-2105-8-44-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>H</m:mi>
                        <m:mi>A</m:mi>
                        <m:mo stretchy="false">(</m:mo>
                        <m:mi>C</m:mi>
                        <m:mo>,</m:mo>
                        <m:msup>
                           <m:mi>C</m:mi>
                           <m:mo>&#8242;</m:mo>
                        </m:msup>
                        <m:mo stretchy="false">)</m:mo>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:mi>a</m:mi>
                              <m:mo>+</m:mo>
                              <m:mi>d</m:mi>
                              <m:mo>&#8722;</m:mo>
                              <m:msub>
                                 <m:mi>n</m:mi>
                                 <m:mi>c</m:mi>
                              </m:msub>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>a</m:mi>
                              <m:mo>+</m:mo>
                              <m:mi>b</m:mi>
                              <m:mo>+</m:mo>
                              <m:mi>c</m:mi>
                              <m:mo>+</m:mo>
                              <m:mi>d</m:mi>
                              <m:mo>&#8722;</m:mo>
                              <m:msub>
                                 <m:mi>n</m:mi>
                                 <m:mi>c</m:mi>
                              </m:msub>
                           </m:mrow>
                        </m:mfrac>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>1</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGibascqWGbbqqcqGGOaakcqWGdbWqcqGGSaalcuWGdbWqgaqbaiabcMcaPiabg2da9maalaaabaGaemyyaeMaey4kaSIaemizaqMaeyOeI0IaemOBa42aaSbaaSqaaiabdogaJbqabaaakeaacqWGHbqycqGHRaWkcqWGIbGycqGHRaWkcqWGJbWycqGHRaWkcqWGKbazcqGHsislcqWGUbGBdaWgaaWcbaGaem4yamgabeaaaaGccaWLjaGaaCzcamaabmaabaGaeGymaedacaGLOaGaayzkaaaaaa@4B69@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>where <it>n</it><sub><it>c </it></sub>is the correction for chance agreement, corresponding to the expected sum of <it>a </it>and <it>d </it>if <it>C </it>and <it>C' </it>were totally independent clusterings:</p>
            <p>
               <m:math name="1471-2105-8-44-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:msub>
                           <m:mi>n</m:mi>
                           <m:mi>c</m:mi>
                        </m:msub>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:mi>n</m:mi>
                              <m:mo stretchy="false">(</m:mo>
                              <m:msup>
                                 <m:mi>n</m:mi>
                                 <m:mn>2</m:mn>
                              </m:msup>
                              <m:mo>+</m:mo>
                              <m:mn>1</m:mn>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mo>&#8722;</m:mo>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>n</m:mi>
                              <m:mo>+</m:mo>
                              <m:mn>1</m:mn>
                              <m:mo stretchy="false">)</m:mo>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>k</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>K</m:mi>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:msubsup>
                                       <m:mi>n</m:mi>
                                       <m:mi>k</m:mi>
                                       <m:mn>2</m:mn>
                                    </m:msubsup>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>n</m:mi>
                                    <m:mo>+</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mstyle displaystyle="true">
                                       <m:munderover>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:msup>
                                                <m:mi>k</m:mi>
                                                <m:mo>&#8242;</m:mo>
                                             </m:msup>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:msup>
                                             <m:mi>K</m:mi>
                                             <m:mo>&#8242;</m:mo>
                                          </m:msup>
                                       </m:munderover>
                                       <m:mrow>
                                          <m:msubsup>
                                             <m:msup>
                                                <m:mi>n</m:mi>
                                                <m:mo>&#8242;</m:mo>
                                             </m:msup>
                                             <m:msup>
                                                <m:mi>k</m:mi>
                                                <m:mo>&#8242;</m:mo>
                                             </m:msup>
                                             <m:mn>2</m:mn>
                                          </m:msubsup>
                                          <m:mo>+</m:mo>
                                          <m:mn>2</m:mn>
                                          <m:mstyle displaystyle="true">
                                             <m:munderover>
                                                <m:mo>&#8721;</m:mo>
                                                <m:mrow>
                                                   <m:mi>k</m:mi>
                                                   <m:mo>=</m:mo>
                                                   <m:mn>1</m:mn>
                                                </m:mrow>
                                                <m:mi>K</m:mi>
                                             </m:munderover>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:munderover>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mrow>
                                                         <m:msup>
                                                            <m:mi>k</m:mi>
                                                            <m:mo>&#8242;</m:mo>
                                                         </m:msup>
                                                         <m:mo>=</m:mo>
                                                         <m:mn>1</m:mn>
                                                      </m:mrow>
                                                      <m:msup>
                                                         <m:mi>K</m:mi>
                                                         <m:mo>&#8242;</m:mo>
                                                      </m:msup>
                                                   </m:munderover>
                                                   <m:mrow>
                                                      <m:mfrac>
                                                         <m:mrow>
                                                            <m:msubsup>
                                                               <m:mi>n</m:mi>
                                                               <m:mi>k</m:mi>
                                                               <m:mn>2</m:mn>
                                                            </m:msubsup>
                                                            <m:msubsup>
                                                               <m:msup>
                                                                  <m:mi>n</m:mi>
                                                                  <m:mo>&#8242;</m:mo>
                                                               </m:msup>
                                                               <m:msup>
                                                                  <m:mi>k</m:mi>
                                                                  <m:mo>&#8242;</m:mo>
                                                               </m:msup>
                                                               <m:mn>2</m:mn>
                                                            </m:msubsup>
                                                         </m:mrow>
                                                         <m:mi>n</m:mi>
                                                      </m:mfrac>
                                                   </m:mrow>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mstyle>
                                       </m:mrow>
                                    </m:mstyle>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                           <m:mrow>
                              <m:mn>2</m:mn>
                              <m:mo stretchy="false">(</m:mo>
                              <m:mi>n</m:mi>
                              <m:mo>&#8722;</m:mo>
                              <m:mn>1</m:mn>
                              <m:mo stretchy="false">)</m:mo>
                           </m:mrow>
                        </m:mfrac>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>2</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGUbGBdaWgaaWcbaGaem4yamgabeaakiabg2da9maalaaabaGaemOBa4MaeiikaGIaemOBa42aaWbaaSqabeaacqaIYaGmaaGccqGHRaWkcqaIXaqmcqGGPaqkcqGHsislcqGGOaakcqWGUbGBcqGHRaWkcqaIXaqmcqGGPaqkdaaeWbqaaiabd6gaUnaaDaaaleaacqWGRbWAaeaacqaIYaGmaaGccqGHsislcqGGOaakcqWGUbGBcqGHRaWkcqaIXaqmcqGGPaqkdaaeWbqaaiqbd6gaUzaafaWaa0baaSqaaiqbdUgaRzaafaaabaGaeGOmaidaaOGaey4kaSIaeGOmaiZaaabCaeaadaaeWbqaamaalaaabaGaemOBa42aa0baaSqaaiabdUgaRbqaaiabikdaYaaakiqbd6gaUzaafaWaa0baaSqaaiqbdUgaRzaafaaabaGaeGOmaidaaaGcbaGaemOBa4gaaaWcbaGafm4AaSMbauaacqGH9aqpcqaIXaqmaeaacuWGlbWsgaqbaaqdcqGHris5aaWcbaGaem4AaSMaeyypa0JaeGymaedabaGaem4saSeaniabggHiLdaaleaacuWGRbWAgaqbaiabg2da9iabigdaXaqaaiqbdUealzaafaaaniabggHiLdaaleaacqWGRbWAcqGH9aqpcqaIXaqmaeaacqWGlbWsa0GaeyyeIuoaaOqaaiabikdaYiabcIcaOiabd6gaUjabgkHiTiabigdaXiabcMcaPaaacaWLjaGaaCzcamaabmaabaGaeGOmaidacaGLOaGaayzkaaaaaa@7BB3@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p>Milligan and Cooper <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and more recently Steinley <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> performed comparative studies of several pairwise clustering agreement criteria. They found <it>HA </it>to be the criterion with the most desirable properties, especially the zero expected value in the case of independent clusterings and the robustness to changes in cluster number and cluster size heterogeneity. The basic principle of <it>HA </it>is to compute the fraction of entity pairs in the diagonal of <it>MM</it>, because those pairs are the ones contributing to the general agreement. The pairs in <it>b </it>and <it>c </it>make no contribution to the agreement. This fraction must be corrected for the expected chance agreement.</p>
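            <p>Equations (1) and (2) can be evaluated directly from two label vectors, as in the following Python sketch. It is offered as an illustration under stated assumptions rather than as the authors' implementation: the function name and the toy labels are hypothetical, and the double sum in equation (2) is computed as the product of the two sums of squared cluster sizes divided by <it>n</it>. The sketch assumes at least two entities and that the two clusterings are not both trivial, otherwise the denominator of equation (1) is zero.</p>
            <p><![CDATA[
```python
from collections import Counter
from itertools import combinations


def adjusted_rand(labels_c, labels_c2):
    """Adjusted Rand index following equations (1) and (2)."""
    n = len(labels_c)
    a = d = 0
    for i, j in combinations(range(n), 2):
        same_c = labels_c[i] == labels_c[j]
        same_c2 = labels_c2[i] == labels_c2[j]
        if same_c and same_c2:
            a += 1
        elif not same_c and not same_c2:
            d += 1
    total_pairs = n * (n - 1) // 2            # a + b + c + d
    sum_sq = sum(size ** 2 for size in Counter(labels_c).values())
    sum_sq2 = sum(size ** 2 for size in Counter(labels_c2).values())
    # Equation (2): expected a + d for independent clusterings.
    n_c = (
        n * (n ** 2 + 1)
        - (n + 1) * sum_sq
        - (n + 1) * sum_sq2
        + 2 * sum_sq * sum_sq2 / n
    ) / (2 * (n - 1))
    # Equation (1).
    return (a + d - n_c) / (total_pairs - n_c)


# Hypothetical toy labelings of four entities.
ha_identical = adjusted_rand([0, 0, 1, 1], [0, 0, 1, 1])
ha_crossed = adjusted_rand([0, 0, 1, 1], [0, 1, 0, 1])
```
]]></p>
            <p>Identical partitions give the maximum value of 1, whereas the crossed labelling, in which no pair is together in both clusterings, gives a negative value because its agreement is below the chance expectation.</p>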
         </sec>
         <sec>
            <st>
               <p>Ranked Mismatch Matrix (<it>RMM</it>), a new format for the presentation of clustering data</p>
            </st>
            <p>To include the intercluster distance information, the entity pairs in <it>a </it>should continue to have a maximum contribution to the overall agreement, but <it>b, c </it>and <it>d </it>entity pairs should have different contributions according to the degree of mismatch in each of the two clusterings. First, an intercluster distance rank function <it>R </it>is defined for every pair of entities (<it>i</it>, <it>j</it>) of a data set <it>D </it>(expression 3).</p>
            <p><it>R</it>(<it>i</it>, <it>j</it>) = (<it>x</it>, <it>y</it>): <it>i</it>, <it>j </it>&#8712; {1,2,...<it>n</it>}; <it>x </it>&#8712; {0,1,...<it>K </it>- 1}; <it>y </it>&#8712; {0,1,...<it>K' </it>- 1} &#160;&#160;&#160; (3)</p>
            <p><it>R</it>(<it>i</it>, <it>j</it>) = (<it>x</it>, <it>y</it>) means that in clustering <it>C </it>the cluster of entity <it>j </it>is the <it>x</it><sup><it>th </it></sup>closest to the cluster of entity <it>i</it>, and in clustering <it>C' </it>the cluster of entity <it>j </it>is the <it>y</it><sup><it>th </it></sup>closest to the cluster of <it>i</it>. If <it>i </it>and <it>j </it>are in the same cluster in <it>C</it>, <it>x </it>will be 0. If <it>i </it>and <it>j </it>are in the same cluster in <it>C'</it>, <it>y </it>will be 0. The distance between two clusters is here measured as the average distance between their entities. This is only possible when distances between every pair of entities are available. Depending on the problem, other intercluster distance functions can be defined. For instance, the standard single, complete or other linkage functions of hierarchical clustering can be used. In the absence of any distance information, the distance between a cluster and itself is 0 and between two different clusters is 1. Additionally, the intercluster distance definition does not have to be the same in the two clusterings being compared. These definitions allow the <it>RAR </it>method to be applied to any pair of clusterings. With the help of the intercluster distance rank function, the Ranked Mismatch Matrix (<it>RMM</it>), represented in Table <tblr tid="T3">3</tblr>, can be computed, with the general element <it>rmm</it><sub><it>x</it>,<it>y </it></sub>defined as:</p>
            <p>
               <m:math name="1471-2105-8-44-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>r</m:mi>
                        <m:mi>m</m:mi>
                        <m:msub>
                           <m:mi>m</m:mi>
                           <m:mrow>
                              <m:mi>x</m:mi>
                              <m:mo>,</m:mo>
                              <m:mi>y</m:mi>
                           </m:mrow>
                        </m:msub>
                        <m:mo>=</m:mo>
                        <m:mstyle displaystyle="true">
                           <m:munderover>
                              <m:mo>&#8721;</m:mo>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mo>=</m:mo>
                                 <m:mn>1</m:mn>
                              </m:mrow>
                              <m:mi>n</m:mi>
                           </m:munderover>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>j</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mi>n</m:mi>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:mi>H</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8800;</m:mo>
                                    <m:mi>j</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mi>H</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>R</m:mi>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>i</m:mi>
                                    <m:mo>,</m:mo>
                                    <m:mi>j</m:mi>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo>=</m:mo>
                                    <m:mo stretchy="false">(</m:mo>
                                    <m:mi>x</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo>,</m:mo>
                                    <m:mi>y</m:mi>
                                    <m:mo>&#8722;</m:mo>
                                    <m:mn>1</m:mn>
                                    <m:mo stretchy="false">)</m:mo>
                                    <m:mo stretchy="false">)</m:mo>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                        </m:mstyle>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>4</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGYbGCcqWGTbqBcqWGTbqBdaWgaaWcbaGaemiEaGNaeiilaWIaemyEaKhabeaakiabg2da9maaqahabaWaaabCaeaacqWGibascqGGOaakcqWGPbqAcqGHGjsUcqWGQbGAcqGGPaqkcqWGibascqGGOaakcqWGsbGucqGGOaakcqWGPbqAcqGGSaalcqWGQbGAcqGGPaqkcqGH9aqpcqGGOaakcqWG4baEcqGHsislcqaIXaqmcqGGSaalcqWG5bqEcqGHsislcqaIXaqmcqGGPaqkcqGGPaqkaSqaaiabdQgaQjabg2da9iabigdaXaqaaiabd6gaUbqdcqGHris5aaWcbaGaemyAaKMaeyypa0JaeGymaedabaGaemOBa4ganiabggHiLdGccaWLjaGaaCzcamaabmaabaGaeGinaqdacaGLOaGaayzkaaaaaa@6280@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p><it>H</it>(<it>x</it>) is a Heaviside step function, used here as an indicator that takes the value 1 when the condition <it>x </it>is true and 0 otherwise. The double sum includes the equal entity pairs of type (<it>i</it>, <it>i</it>) and the repeated entity pairs of types (<it>i</it>, <it>j</it>) and (<it>j</it>, <it>i</it>). The pairs of the first type do not contribute to the final sum because of the factor <it>H</it>(<it>i</it>&#8800;<it>j</it>). The repeated pairs (differing only in the order of the entities inside the pair) must both be counted in the sum because, for each of the individual clusterings, the intercluster distance rank is not necessarily symmetric. As an example, cluster <it>A </it>may be the closest neighbour of cluster <it>B</it>, while the closest neighbour of cluster <it>A </it>is <it>C </it>and not <it>B</it>. In <it>RMM</it>, the intercluster distance rank information for every pair of entities is recorded without identifying which clusters are separated at which rank distance. This is important because, for each cluster, the <it>i</it><sup><it>th </it></sup>neighbour cluster can be different. In this way, the intercluster distance information can be integrated with the partition comparison without requiring a strict ordinal relationship between clusters (like the example of disease severity mentioned in the introduction), a known cluster correspondence between clusterings or an equal number of clusters in both clusterings.</p>
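            <p>The construction of <it>RMM </it>from expressions 3 and 4 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes NumPy, a full <it>n </it>x <it>n </it>distance matrix, average intercluster distances, and an arbitrary tie-breaking rule (ties between equidistant clusters fall back to cluster order). The function names <it>entity_ranks </it>and <it>ranked_mismatch_matrix </it>are hypothetical, and the case without distance information, where all different clusters are equidistant, is not handled.</p>

```python
import numpy as np


def entity_ranks(dist, labels):
    """n x n array whose (i, j) entry is the rank of j's cluster seen from i's cluster."""
    dist = np.asarray(dist, dtype=float)
    ids, inv = np.unique(np.asarray(labels), return_inverse=True)
    inv = inv.ravel()
    k = len(ids)
    # Average distance between the entities of every ordered pair of distinct clusters.
    avg = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            if a != b:
                avg[a, b] = dist[np.ix_(inv == a, inv == b)].mean()
    # rank[a, a] = 0; the other clusters get ranks 1..k-1 by increasing distance.
    rank = np.zeros((k, k), dtype=int)
    for a in range(k):
        others = [b for b in range(k) if b != a]
        # sorted() is stable, so ties fall back to cluster order (an assumption).
        for r, b in enumerate(sorted(others, key=lambda b: avg[a, b]), start=1):
            rank[a, b] = r
    return rank[inv[:, None], inv[None, :]]


def ranked_mismatch_matrix(dist, labels_c, labels_c2):
    """Count ordered entity pairs (i, j), i != j, by their rank pair (x, y)."""
    rx = entity_ranks(dist, labels_c)    # x component of R(i, j)
    ry = entity_ranks(dist, labels_c2)   # y component of R(i, j)
    n = rx.shape[0]
    rmm = np.zeros((rx.max() + 1, ry.max() + 1), dtype=int)
    off_diag = ~np.eye(n, dtype=bool)    # plays the role of H(i != j)
    # Array index (x, y) corresponds to element rmm_{x+1, y+1} of expression 4.
    np.add.at(rmm, (rx[off_diag], ry[off_diag]), 1)
    return rmm


# Five entities on a line; C has three clusters and C' has two.
pos = np.array([0.0, 1.0, 5.0, 6.0, 20.0])
dist = np.abs(pos[:, None] - pos[None, :])
rmm = ranked_mismatch_matrix(dist, [0, 0, 1, 1, 2], [0, 0, 0, 0, 1])
print(rmm)
```

            <p>For this toy data set the sketch gives the rows (4, 0), (8, 2) and (0, 6), which sum to <it>n</it><sup>2 </sup>- <it>n </it>= 20. The two ordered pairs leaving the isolated entity towards the middle cluster fall in a different row from the reverse pairs, which illustrates why both orderings must be counted.</p>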
         </sec>
         <sec>
            <st>
               <p>Measuring clustering agreement</p>
            </st>
            <p>For two very similar clusterings, the majority of the entity pairs contribute to elements close to the matrix diagonal. Even if <it>RMM </it>is not square, an alternative geometrical diagonal can be traced, linking the centre of the <it>rmm</it><sub>1,1 </sub>element (with coordinates (0,0)) with the centre of the <it>rmm</it><sub><it>p</it>+1,<it>q</it>+1 </sub>element (with coordinates (<it>p</it>, <it>q</it>)). If, on the contrary, the clusterings disagree to a large extent, most entity pairs will be far from the diagonal, concentrated around <it>rmm</it><sub><it>p</it>+1,1 </sub>and <it>rmm</it><sub>1,<it>q</it>+1</sub>. From these considerations it follows that a natural measure of cluster disagreement is the Mean Diagonal Deviation (<it>MDD</it>) over all the entity pairs in <it>RMM</it>.</p>
            <p>
               <m:math name="1471-2105-8-44-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>M</m:mi>
                        <m:mi>D</m:mi>
                        <m:mi>D</m:mi>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mi>p</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:mstyle displaystyle="true">
                                       <m:munderover>
                                          <m:mo>&#8721;</m:mo>
                                          <m:mrow>
                                             <m:mi>j</m:mi>
                                             <m:mo>=</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mi>q</m:mi>
                                             <m:mo>+</m:mo>
                                             <m:mn>1</m:mn>
                                          </m:mrow>
                                       </m:munderover>
                                       <m:mrow>
                                          <m:mi>r</m:mi>
                                          <m:mi>m</m:mi>
                                          <m:msub>
                                             <m:mi>m</m:mi>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                          </m:msub>
                                       </m:mrow>
                                    </m:mstyle>
                                    <m:mo>&#8901;</m:mo>
                                    <m:mrow>
                                       <m:mo>|</m:mo>
                                       <m:mrow>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mi>p</m:mi>
                                          </m:mfrac>
                                          <m:mo>&#8722;</m:mo>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mi>j</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mn>1</m:mn>
                                             </m:mrow>
                                             <m:mi>q</m:mi>
                                          </m:mfrac>
                                       </m:mrow>
                                       <m:mo>|</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                           <m:mrow>
                              <m:msup>
                                 <m:mi>n</m:mi>
                                 <m:mn>2</m:mn>
                              </m:msup>
                              <m:mo>&#8722;</m:mo>
                              <m:mi>n</m:mi>
                           </m:mrow>
                        </m:mfrac>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>5</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGnbqtcqWGebarcqWGebarcqGH9aqpdaWcaaqaamaaqahabaWaaabCaeaacqWGYbGCcqWGTbqBcqWGTbqBdaWgaaWcbaGaemyAaKMaeiilaWIaemOAaOgabeaaaeaacqWGQbGAcqGH9aqpcqaIXaqmaeaacqWGXbqCcqGHRaWkcqaIXaqma0GaeyyeIuoakiabgwSixpaaemaabaWaaSaaaeaacqWGPbqAcqGHsislcqaIXaqmaeaacqWGWbaCaaGaeyOeI0YaaSaaaeaacqWGQbGAcqGHsislcqaIXaqmaeaacqWGXbqCaaaacaGLhWUaayjcSdaaleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGWbaCcqGHRaWkcqaIXaqma0GaeyyeIuoaaOqaaiabd6gaUnaaCaaaleqabaGaeGOmaidaaOGaeyOeI0IaemOBa4gaaiaaxMaacaWLjaWaaeWaaeaacqaI1aqnaiaawIcacaGLPaaaaaa@630B@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
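            <p>Expression 5 can be evaluated directly from an <it>RMM </it>stored as a two-dimensional array. The following Python sketch is illustrative only and assumes NumPy; the function name <it>mean_diagonal_deviation </it>is hypothetical. It divides by the sum of the matrix, which equals <it>n</it><sup>2 </sup>- <it>n </it>when the matrix was built with expression 4, and it rejects matrices with a single row or column, for which the normalization by <it>p </it>or <it>q </it>is undefined (a choice made for this sketch).</p>

```python
import numpy as np


def mean_diagonal_deviation(rmm):
    """Weighted mean of |(i - 1)/p - (j - 1)/q| over the elements of rmm."""
    rmm = np.asarray(rmm, dtype=float)
    p = rmm.shape[0] - 1
    q = rmm.shape[1] - 1
    if p == 0 or q == 0:
        # Assumption of this sketch: a one-cluster clustering is not handled.
        raise ValueError("RMM needs at least two rows and two columns")
    rows = np.arange(p + 1)[:, None] / p   # (i - 1) / p for i = 1..p+1
    cols = np.arange(q + 1)[None, :] / q   # (j - 1) / q for j = 1..q+1
    deviation = np.abs(rows - cols)
    # rmm.sum() is the number of ordered entity pairs, n**2 - n.
    return float((rmm * deviation).sum() / rmm.sum())


# A 3 x 2 matrix (p = 2, q = 1) holding 20 ordered entity pairs.
print(mean_diagonal_deviation([[4, 0], [8, 2], [0, 6]]))
```

            <p>For this matrix only the middle row deviates from the diagonal, by 0.5 for each of its 10 pairs, so the sketch returns 5/20 = 0.25.</p>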
            <p>The quantity inside the modulus is the normalized distance of the element (<it>i</it>, <it>j</it>) to the <it>RMM </it>diagonal, such that for the most distant elements (<it>rmm</it><sub>1,<it>q</it>+1 </sub>and <it>rmm</it><sub><it>p</it>+1,1</sub>) it takes the value 1. Consequently, the maximum value of <it>MDD </it>is also 1. Because of the modulus, <it>MDD </it>is always greater than or equal to 0. To obtain a measure of agreement between clusterings it is enough to compute 1-<it>MDD</it>, although this quantity is not yet corrected for chance agreement. To perform this correction, the expected <it>MDD </it>value under independence of clusterings <it>C </it>and <it>C' </it>(conditional on the marginals of <it>CT </it>and on the intercluster ranked distances) must be known. To compute this <it>MDD</it><sup><it>ind</it></sup>, it is first necessary to build <it>RMM</it><sup><it>ind </it></sup>according to:</p>
            <p>
               <m:math name="1471-2105-8-44-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtable>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mi>r</m:mi>
                                    <m:mi>m</m:mi>
                                    <m:msubsup>
                                       <m:mi>m</m:mi>
                                       <m:mrow>
                                          <m:mi>x</m:mi>
                                          <m:mo>,</m:mo>
                                          <m:mi>y</m:mi>
                                       </m:mrow>
                                       <m:mrow>
                                          <m:mi>i</m:mi>
                                          <m:mi>n</m:mi>
                                          <m:mi>d</m:mi>
                                       </m:mrow>
                                    </m:msubsup>
                                    <m:mo>=</m:mo>
                                    <m:mrow>
                                       <m:mo>(</m:mo>
                                       <m:mrow>
                                          <m:mstyle displaystyle="true">
                                             <m:munderover>
                                                <m:mo>&#8721;</m:mo>
                                                <m:mrow>
                                                   <m:mi>i</m:mi>
                                                   <m:mo>=</m:mo>
                                                   <m:mn>1</m:mn>
                                                </m:mrow>
                                                <m:mi>K</m:mi>
                                             </m:munderover>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:munderover>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mrow>
                                                         <m:mi>j</m:mi>
                                                         <m:mo>=</m:mo>
                                                         <m:mn>1</m:mn>
                                                      </m:mrow>
                                                      <m:mi>K</m:mi>
                                                   </m:munderover>
                                                   <m:mrow>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mi>H</m:mi>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mi>R</m:mi>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mo>&#8704;</m:mo>
                                                      <m:mi>s</m:mi>
                                                      <m:mo>:</m:mo>
                                                      <m:mi>s</m:mi>
                                                      <m:mo>&#8712;</m:mo>
                                                      <m:msub>
                                                         <m:mi>C</m:mi>
                                                         <m:mi>i</m:mi>
                                                      </m:msub>
                                                      <m:mo stretchy="false">)</m:mo>
                                                      <m:mo>,</m:mo>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mo>&#8704;</m:mo>
                                                      <m:mi>t</m:mi>
                                                      <m:mo>:</m:mo>
                                                      <m:mi>t</m:mi>
                                                      <m:mo>&#8712;</m:mo>
                                                      <m:msub>
                                                         <m:mi>C</m:mi>
                                                         <m:mi>j</m:mi>
                                                      </m:msub>
                                                      <m:mo stretchy="false">)</m:mo>
                                                      <m:mo stretchy="false">)</m:mo>
                                                      <m:mo>=</m:mo>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mi>x</m:mi>
                                                      <m:mo>&#8722;</m:mo>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                      <m:mo>&#8901;</m:mo>
                                                      <m:mo stretchy="false">)</m:mo>
                                                      <m:mo stretchy="false">)</m:mo>
                                                      <m:mo>&#8901;</m:mo>
                                                      <m:msub>
                                                         <m:mi>n</m:mi>
                                                         <m:mi>i</m:mi>
                                                      </m:msub>
                                                      <m:mo>&#8901;</m:mo>
                                                      <m:msub>
                                                         <m:mi>n</m:mi>
                                                         <m:mi>j</m:mi>
                                                      </m:msub>
                                                      <m:mo>&#8722;</m:mo>
                                                      <m:mi>H</m:mi>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mi>i</m:mi>
                                                      <m:mo>=</m:mo>
                                                      <m:mi>j</m:mi>
                                                      <m:mo stretchy="false">)</m:mo>
                                                      <m:mo>&#8901;</m:mo>
                                                      <m:msub>
                                                         <m:mi>n</m:mi>
                                                         <m:mi>i</m:mi>
                                                      </m:msub>
                                                   </m:mrow>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mstyle>
                                          <m:mo stretchy="false">)</m:mo>
                                       </m:mrow>
                                       <m:mo>)</m:mo>
                                    </m:mrow>
                                    <m:mo>&#215;</m:mo>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                           <m:mtr>
                              <m:mtd>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:munderover>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mrow>
                                                         <m:mi>i</m:mi>
                                                         <m:mo>=</m:mo>
                                                         <m:mn>1</m:mn>
                                                      </m:mrow>
                                                      <m:msup>
                                                         <m:mi>K</m:mi>
                                                         <m:mo>&#8242;</m:mo>
                                                      </m:msup>
                                                   </m:munderover>
                                                   <m:mrow>
                                                      <m:mstyle displaystyle="true">
                                                         <m:munderover>
                                                            <m:mo>&#8721;</m:mo>
                                                            <m:mrow>
                                                               <m:mi>j</m:mi>
                                                               <m:mo>=</m:mo>
                                                               <m:mn>1</m:mn>
                                                            </m:mrow>
                                                            <m:msup>
                                                               <m:mi>K</m:mi>
                                                               <m:mo>&#8242;</m:mo>
                                                            </m:msup>
                                                         </m:munderover>
                                                         <m:mrow>
                                                            <m:mo stretchy="false">(</m:mo>
                                                            <m:mi>H</m:mi>
                                                            <m:mo stretchy="false">(</m:mo>
                                                            <m:mi>R</m:mi>
                                                            <m:mo stretchy="false">(</m:mo>
                                                            <m:mo stretchy="false">(</m:mo>
                                                            <m:mo>&#8704;</m:mo>
                                                            <m:mi>s</m:mi>
                                                            <m:mo>:</m:mo>
                                                            <m:mi>s</m:mi>
                                                            <m:mo>&#8712;</m:mo>
                                                            <m:msub>
                                                               <m:mi>C</m:mi>
                                                               <m:mi>i</m:mi>
                                                            </m:msub>
                                                            <m:mo>&#8242;</m:mo>
                                                            <m:mo stretchy="false">)</m:mo>
                                                            <m:mo>,</m:mo>
                                                            <m:mo stretchy="false">(</m:mo>
                                                            <m:mo>&#8704;</m:mo>
                                                            <m:mi>t</m:mi>
                                                            <m:mo>:</m:mo>
                                                            <m:mi>t</m:mi>
                                                            <m:mo>&#8712;</m:mo>
                                                            <m:msub>
                                                               <m:mi>C</m:mi>
                                                               <m:mi>j</m:mi>
                                                            </m:msub>
                                                            <m:mo>&#8242;</m:mo>
                                                            <m:mo stretchy="false">)</m:mo>
                                                            <m:mo stretchy="false">)</m:mo>
                                                            <m:mo>=</m:mo>
                                                            <m:mo stretchy="false">(</m:mo>
                                                            <m:mo>&#8901;</m:mo>
                                                            <m:mo>,</m:mo>
                                                            <m:mi>y</m:mi>
                                                            <m:mo>&#8722;</m:mo>
                                                            <m:mn>1</m:mn>
                                                            <m:mo stretchy="false">)</m:mo>
                                                            <m:mo stretchy="false">)</m:mo>
                                                            <m:mo>&#8901;</m:mo>
                                                            <m:msub>
                                                               <m:mi>n</m:mi>
                                                               <m:mi>i</m:mi>
                                                            </m:msub>
                                                            <m:mo>&#8242;</m:mo>
                                                            <m:mo>&#8901;</m:mo>
                                                            <m:msub>
                                                               <m:mi>n</m:mi>
                                                               <m:mi>j</m:mi>
                                                            </m:msub>
                                                            <m:mo>&#8242;</m:mo>
                                                            <m:mo>&#8722;</m:mo>
                                                            <m:mi>H</m:mi>
                                                            <m:mo stretchy="false">(</m:mo>
                                                            <m:mi>i</m:mi>
                                                            <m:mo>=</m:mo>
                                                            <m:mi>j</m:mi>
                                                            <m:mo stretchy="false">)</m:mo>
                                                            <m:mo>&#8901;</m:mo>
                                                            <m:msub>
                                                               <m:mi>n</m:mi>
                                                               <m:mi>i</m:mi>
                                                            </m:msub>
                                                            <m:mo>&#8242;</m:mo>
                                                            <m:mo stretchy="false">)</m:mo>
                                                         </m:mrow>
                                                      </m:mstyle>
                                                   </m:mrow>
                                                </m:mstyle>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                       </m:mrow>
                                       <m:mo>/</m:mo>
                                       <m:mrow>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:msup>
                                                   <m:mi>n</m:mi>
                                                   <m:mn>2</m:mn>
                                                </m:msup>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mi>n</m:mi>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:mrow>
                              </m:mtd>
                           </m:mtr>
                        </m:mtable>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>6</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeGabaaabaGaemOCaiNaemyBa0MaemyBa02aa0baaSqaaiabdIha4jabcYcaSiabdMha5bqaaiabdMgaPjabd6gaUjabdsgaKbaakiabg2da9maabmaabaWaaabCaeaadaaeWbqaaiabcIcaOiabdIeaijabcIcaOiabdkfasjabcIcaOiabcIcaOiabgcGiIiabdohaZjabcQda6iabdohaZjabgIGiolabdoeadnaaBaaaleaacqWGPbqAaeqaaOGaeiykaKIaeiilaWIaeiikaGIaeyiaIiIaemiDaqNaeiOoaOJaemiDaqNaeyicI4Saem4qam0aaSbaaSqaaiabdQgaQbqabaGccqGGPaqkcqGGPaqkcqGH9aqpcqGGOaakcqWG4baEcqGHsislcqaIXaqmcqGGSaalcqGHflY1cqGGPaqkcqGGPaqkcqGHflY1cqWGUbGBdaWgaaWcbaGaemyAaKgabeaakiabgwSixlabd6gaUnaaBaaaleaacqWGQbGAaeqaaOGaeyOeI0IaemisaGKaeiikaGIaemyAaKMaeyypa0JaemOAaOMaeiykaKIaeyyXICTaemOBa42aaSbaaSqaaiabdMgaPbqabaaabaGaemOAaOMaeyypa0JaeGymaedabaGaem4saSeaniabggHiLdaaleaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGlbWsa0GaeyyeIuoakiabcMcaPaGaayjkaiaawMcaaiabgEna0cqaamaalyaabaWaaeWaaeaadaaeWbqaamaaqahabaGaeiikaGIaemisaGKaeiikaGIaemOuaiLaeiikaGIaeiikaGIaeyiaIiIaem4CamNaeiOoaOJaem4CamNaeyicI4Saem4qam0aaSbaaSqaaiabdMgaPbqabaaccaGccqWFYaIOcqGGPaqkcqGGSaalcqGGOaakcqGHaiIicqWG0baDcqGG6aGocqWG0baDcqGHiiIZcqWGdbWqdaWgaaWcbaGaemOAaOgabeaakiab=jdiIkabcMcaPiabcMcaPiabg2da9iabcIcaOiabgwSixlabcYcaSiabdMha5jabgkHiTiabigdaXiabcMcaPiabcMcaPiabgwSixlabd6gaUnaaBaaaleaacqWGPbqAaeqaaOGae8NmGiQaeyyXICTaemOBa42aaSbaaSqaaiabdQgaQbqabaGccqWFYaIOcqGHsislcqWGibascqGGOaakcqWGPbqAcqGH9aqpcqWGQbGAcqGGPaqkcqGHflY1cqWGUbGBdaWgaaWcbaGaemyAaKgabeaakiab=jdiIkabcMcaPaWcbaGaemOAaOMaeyypa0JaeGymaedabaGafm4saSKbauaaa0GaeyyeIuoaaSqaaiabdMgaPjabg2da9iabigdaXaqaaiqbdUealzaafaaaniabggHiLdaakiaawIcacaGLPaaaaeaadaqadaqaaiabd6gaUnaaCaaaleqabaGaeGOmaidaaOGaeyOeI0IaemOBa4gacaGLOaGaayzkaaaaaaaacaWLjaGaaCzcamaabmaabaGaeGOnaydacaGLOaGaayzkaaaaaa@E72D@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
            <p><it>MDD</it><sup><it>ind </it></sup>is then computed in the same way as <it>MDD </it>(expression 5), replacing the elements of <it>RMM </it>with those of <it>RMM</it><sup><it>ind</it></sup>. <it>RAR </it>is the correction of (1-<it>MDD</it>) for chance agreement and is given by the following expression:</p>
            <p>
               <m:math name="1471-2105-8-44-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mi>R</m:mi>
                        <m:mi>A</m:mi>
                        <m:mi>R</m:mi>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mrow>
                              <m:mi>M</m:mi>
                              <m:mi>D</m:mi>
                              <m:msup>
                                 <m:mi>D</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mi>n</m:mi>
                                    <m:mi>d</m:mi>
                                 </m:mrow>
                              </m:msup>
                              <m:mo>&#8722;</m:mo>
                              <m:mi>M</m:mi>
                              <m:mi>D</m:mi>
                              <m:mi>D</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>M</m:mi>
                              <m:mi>D</m:mi>
                              <m:msup>
                                 <m:mi>D</m:mi>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mi>n</m:mi>
                                    <m:mi>d</m:mi>
                                 </m:mrow>
                              </m:msup>
                           </m:mrow>
                        </m:mfrac>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mrow>
                           <m:mo>(</m:mo>
                           <m:mn>7</m:mn>
                           <m:mo>)</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGsbGucqWGbbqqcqWGsbGucqGH9aqpdaWcaaqaaiabd2eanjabdseaejabdseaenaaCaaaleqabaGaemyAaKMaemOBa4MaemizaqgaaOGaeyOeI0Iaemyta0KaemiraqKaemiraqeabaGaemyta0KaemiraqKaemiraq0aaWbaaSqabeaacqWGPbqAcqWGUbGBcqWGKbazaaaaaOGaaCzcaiaaxMaadaqadaqaaiabiEda3aGaayjkaiaawMcaaaaa@483C@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
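            <p>As a minimal illustration of expression 7, the chance correction can be sketched in Python as follows. This is not the MATLAB toolbox described in this article: the function name ranked_adjusted_rand and the numeric inputs are hypothetical, and <it>MDD </it>and <it>MDD</it><sup><it>ind </it></sup>are assumed to have been computed beforehand.</p>
            <p>
```python
def ranked_adjusted_rand(mdd, mdd_ind):
    """Chance-corrected agreement following expression 7.

    mdd:     observed MDD between the two clusterings (expression 5)
    mdd_ind: MDD obtained from the independence matrix RMM^ind
    """
    if mdd_ind == 0:
        # Expression 7 divides by MDD^ind, so it is undefined here.
        raise ValueError("MDD^ind is zero, expression 7 is undefined")
    return (mdd_ind - mdd) / mdd_ind


# Hypothetical values, chosen only to illustrate the arithmetic.
print(ranked_adjusted_rand(0.0, 0.5))    # prints 1.0
print(ranked_adjusted_rand(0.5, 0.5))    # prints 0.0
print(ranked_adjusted_rand(0.125, 0.5))  # prints 0.75
```
            </p>
            <p>With these illustrative inputs, an <it>MDD </it>of zero gives <it>RAR </it>= 1, and an <it>MDD </it>equal to <it>MDD</it><sup><it>ind </it></sup>gives <it>RAR </it>= 0, as expected for an index corrected for chance agreement.</p>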
            <p>Functions to compute the <it>RAR </it>measure for any two clusterings were implemented in MATLAB (Release 14), and are available in Additional file <supplr sid="S4">4</supplr> or at the toolbox's webpage <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>.</p>
            <suppl id="S4">
               <title>
                  <p>Additional File 4</p>
               </title>
               <text>
                  <p><b>MATLAB toolbox</b>. Zip file with MATLAB functions to compute <it>RAR </it>and related measures.</p>
               </text>
               <file name="1471-2105-8-44-S4.zip">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>FRP, MR and JSA conceived the study. FRP and JAC computationally implemented the new methods. FRP, MR and JAC interpreted the results. FRP wrote the manuscript. All authors revised and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors thank Margarida Carrolo for invaluable help with manuscript preparation. Partial support for this work was provided by contract PREVIS (LSHM-CT-2003-503413 from the European Community) awarded to Jonas S. Almeida. Francisco R. Pinto and Jo&#227;o A. Carri&#231;o were financially supported by the Portuguese Foundation for Science and Technology through grants SFRH/BD/6488/2001, SFRH/BPD/21746/2005 and SFRH/BD/3123/2000.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Methods of Comparing Classifications</p>
            </title>
            <aug>
               <au>
                  <snm>Rohlf</snm>
                  <fnm>FJ</fnm>
               </au>
            </aug>
            <source>Annu Rev Ecol Syst</source>
            <pubdate>1974</pubdate>
            <volume>5</volume>
            <fpage>101</fpage>
            <lpage>113</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1146/annurev.es.05.110174.000533</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Objective criteria for the evaluation of clustering methods</p>
            </title>
            <aug>
               <au>
                  <snm>Rand</snm>
                  <fnm>WM</fnm>
               </au>
            </aug>
            <source>Journal of the American Statistical Association</source>
            <pubdate>1971</pubdate>
            <volume>66</volume>
            <fpage>846</fpage>
            <lpage>850</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2284239</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>A method for comparing two hierarchical clusterings</p>
            </title>
            <aug>
               <au>
                  <snm>Fowlkes</snm>
                  <fnm>EB</fnm>
               </au>
               <au>
                  <snm>Mallows</snm>
                  <fnm>CL</fnm>
               </au>
            </aug>
            <source>Journal of the American Statistical Association</source>
            <pubdate>1983</pubdate>
            <volume>78</volume>
            <fpage>553</fpage>
            <lpage>569</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2288117</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Comparing partitions</p>
            </title>
            <aug>
               <au>
                  <snm>Hubert</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Arabie</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Journal of Classification</source>
            <pubdate>1985</pubdate>
            <volume>2</volume>
            <fpage>193</fpage>
            <lpage>218</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/BF01908075</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <aug>
               <au>
                  <snm>Sneath</snm>
                  <fnm>PH</fnm>
               </au>
               <au>
                  <snm>Sokal</snm>
                  <fnm>RR</fnm>
               </au>
            </aug>
            <source>Numerical Taxonomy</source>
            <publisher>San Francisco: Freeman</publisher>
            <pubdate>1973</pubdate>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Comment</p>
            </title>
            <aug>
               <au>
                  <snm>Wallace</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>Journal of the American Statistical Association</source>
            <pubdate>1983</pubdate>
            <volume>78</volume>
            <fpage>569</fpage>
            <lpage>576</lpage>
            <xrefbib>
               <pubid idtype="doi">10.2307/2288118</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Fast and effective text mining using linear time document clustering</p>
            </title>
            <aug>
               <au>
                  <snm>Larsen</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Aone</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Conference on Knowledge Discovery and Data Mining</source>
            <pubdate>1999</pubdate>
            <fpage>16</fpage>
            <lpage>22</lpage>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Comparing clusterings by the variation of information</p>
            </title>
            <aug>
               <au>
                  <snm>Meila</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Sixteenth Annual Conference on Computational Learning Theory (COLT). Springer</source>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Performance criteria for graph clustering and Markov cluster experiments</p>
            </title>
            <aug>
               <au>
                  <snm>van Dongen</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Technical report INS-R0012, Centrum voor Wiskunde en Informatica</source>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Clustering microarray data</p>
            </title>
            <aug>
               <au>
                  <snm>Chipman</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Statistical Analysis of Gene Expression Microarray Data</source>
            <publisher>Boca Raton, Florida: Chapman &amp; Hall/CRC</publisher>
            <editor>Speed T</editor>
            <edition>1</edition>
            <pubdate>2003</pubdate>
            <fpage>159</fpage>
            <lpage>200</lpage>
            <note>[N. Keiding, B. Morgan, T. Speed, P. van der Heijden (Series Editors): <it>Interdisciplinary Statistics Series</it>]</note>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Evaluation and comparison of gene clustering methods in microarray analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Thalamuthu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mukhopadhyay</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Zheng</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Tseng</snm>
                  <fnm>GC</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16882653</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Measurement of observer agreement</p>
            </title>
            <aug>
               <au>
                  <snm>Kundel</snm>
                  <fnm>HL</fnm>
               </au>
               <au>
                  <snm>Polansky</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Radiology</source>
            <pubdate>2003</pubdate>
            <volume>228</volume>
            <fpage>303</fpage>
            <lpage>308</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12819342</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Illustration of a Common Framework for Relating Multiple Typing Methods by Application to Macrolide-Resistant Streptococcus pyogenes</p>
            </title>
            <aug>
               <au>
                  <snm>Carrico</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Silva-Costa</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Melo-Cristino</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pinto</snm>
                  <fnm>FR</fnm>
               </au>
               <au>
                  <snm>de Lencastre</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Almeida</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Ramirez</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Clin Microbiol</source>
            <pubdate>2006</pubdate>
            <volume>44</volume>
            <fpage>2524</fpage>
            <lpage>2532</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1489512</pubid>
                  <pubid idtype="pmpid" link="fulltext">16825375</pubid>
                  <pubid idtype="doi">10.1128/JCM.02536-05</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Rapid inversion of the prevalences of macrolide resistance phenotypes paralleled by a diversification of T and emm types among Streptococcus pyogenes in Portugal</p>
            </title>
            <aug>
               <au>
                  <snm>Silva-Costa</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ramirez</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Melo-Cristino</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pathogens</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Antimicrobial Agents and Chemotherapy</source>
            <pubdate>2005</pubdate>
            <volume>49</volume>
            <fpage>2109</fpage>
            <lpage>2111</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1128/AAC.49.5.2109-2111.2005</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Identification of macrolide-resistant clones of Streptococcus pyogenes in Portugal</p>
            </title>
            <aug>
               <au>
                  <snm>Silva-Costa</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ramirez</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Melo-Cristino</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Clin Microbiol Infect</source>
            <pubdate>2006</pubdate>
            <volume>12</volume>
            <fpage>513</fpage>
            <lpage>518</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1469-0691.2006.01408.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">16700698</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Characterization of the genetic lineages responsible for pneumococcal invasive disease in Portugal</p>
            </title>
            <aug>
               <au>
                  <snm>Serrano</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Melo-Cristino</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Carrico</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Ramirez</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Clin Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>43</volume>
            <fpage>1706</fpage>
            <lpage>1715</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1081348</pubid>
                  <pubid idtype="pmpid" link="fulltext">15814989</pubid>
                  <pubid idtype="doi">10.1128/JCM.43.4.1706-1715.2005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Assessment of band-based similarity coefficients for automatic type and subtype classification of microbial isolates analyzed by pulsed-field gel electrophoresis</p>
            </title>
            <aug>
               <au>
                  <snm>Carrico</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Pinto</snm>
                  <fnm>FR</fnm>
               </au>
               <au>
                  <snm>Simas</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Nunes</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sousa</snm>
                  <fnm>NG</fnm>
               </au>
               <au>
                  <snm>Frazao</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>de Lencastre</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Almeida</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>J Clin Microbiol</source>
            <pubdate>2005</pubdate>
            <volume>43</volume>
            <fpage>5483</fpage>
            <lpage>5490</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1287802</pubid>
                  <pubid idtype="pmpid" link="fulltext">16272474</pubid>
                  <pubid idtype="doi">10.1128/JCM.43.11.5483-5490.2005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Minimum entropy clustering and applications to gene expression analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Proc IEEE Comput Syst Bioinform Conf</source>
            <pubdate>2004</pubdate>
            <fpage>142</fpage>
            <lpage>151</lpage>
            <xrefbib>
               <pubid idtype="pmpid">16448008</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Dynamic model-based clustering for time-course gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Wu</snm>
                  <fnm>FX</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Kusalik</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>J Bioinform Comput Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>821</fpage>
            <lpage>836</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1142/S0219720005001314</pubid>
                  <pubid idtype="pmpid" link="fulltext">16078363</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>A study of the comparability of external criteria for hierarchical cluster analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Milligan</snm>
                  <fnm>GW</fnm>
               </au>
               <au>
                  <snm>Cooper</snm>
                  <fnm>MC</fnm>
               </au>
            </aug>
            <source>Multivariate Behavioral Research</source>
            <pubdate>1986</pubdate>
            <volume>21</volume>
            <fpage>441</fpage>
            <lpage>458</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1207/s15327906mbr2104_5</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Properties of the Hubert-Arabie adjusted Rand index</p>
            </title>
            <aug>
               <au>
                  <snm>Steinley</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Psychol Methods</source>
            <pubdate>2004</pubdate>
            <volume>9</volume>
            <fpage>386</fpage>
            <lpage>396</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1037/1082-989X.9.3.386</pubid>
                  <pubid idtype="pmpid" link="fulltext">15355155</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>RAR toolbox webpage</p>
            </title>
            <url>http://www.imm.ul.pt/html/uni13_fp.html</url>
         </bibl>
      </refgrp>
   </bm>
</art>

