<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-10-99</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>Markov clustering versus affinity propagation for the partitioning of protein interaction graphs</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Vlasblom</snm>
               <fnm>James</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jim.vlasblom@utoronto.ca</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Wodak</snm>
               <mi>J</mi>
               <fnm>Shoshana</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>shoshana@sickkids.ca</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Molecular Structure and Function Program, Hospital for Sick Children, 555 University Avenue, Toronto Ontario, M5G 1X8, Canada </p>
            </ins>
            <ins id="I2">
               <p>Department of Biochemistry, University of Toronto, 1 King's College Circle, Toronto Ontario, M5S 1A8, Canada</p>
            </ins>
            <ins id="I3">
               <p>Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto Ontario,  M5S 1A8, Canada</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2009</pubdate>
         <volume>10</volume>
         <issue>1</issue>
         <fpage>99</fpage>
         <url>http://www.biomedcentral.com/1471-2105/10/99</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19331680</pubid>
               <pubid idtype="doi">10.1186/1471-2105-10-99</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>12</day>
               <month>9</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>30</day>
               <month>3</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>30</day>
               <month>3</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Vlasblom and Wodak; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Genome scale data on protein interactions are generally represented as large networks, or graphs, where hundreds or thousands of proteins are linked to one another. Since proteins tend to function in groups, or complexes, an important goal has been to reliably identify protein complexes from these graphs. This task is commonly executed using clustering procedures, which aim at detecting densely connected regions within the interaction graphs. There exists a wealth of clustering algorithms, some of which have been applied to this problem. One of the most successful clustering procedures in this context has been the Markov Cluster algorithm (MCL), which was recently shown to outperform a number of other procedures, some of which were specifically designed for partitioning protein interaction graphs. A novel promising clustering procedure termed Affinity Propagation (AP) was recently shown to be particularly effective, and much faster than other methods for a variety of problems, but has not yet been applied to partition protein interaction graphs.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>In this work we compare the performance of the Affinity Propagation (AP) and Markov Clustering (MCL) procedures. To this end we derive an unweighted network of protein-protein interactions from a set of 408 protein complexes from <it>S. cerevisiae </it>hand curated in-house, and evaluate the performance of the two clustering algorithms in recalling the annotated complexes. In doing so the parameter space of each algorithm is sampled in order to select optimal values for these parameters, and the robustness of the algorithms is assessed by quantifying the level of complex recall as interactions are randomly added to or removed from the network to simulate noise. To evaluate the performance on a weighted protein interaction graph, we also apply the two algorithms to the consolidated protein interaction network of <it>S. cerevisiae</it>, derived from genome scale purification experiments and to versions of this network in which varying proportions of the links have been randomly shuffled.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our analysis shows that the MCL procedure is significantly more tolerant to noise and behaves more robustly than the AP algorithm. The advantage of MCL over AP is dramatic for unweighted protein interaction graphs, as AP displays severe convergence problems on the majority of the unweighted graph versions that we tested, whereas MCL continues to identify meaningful clusters, albeit fewer of them, as the level of noise in the graph increases. MCL thus remains the method of choice for identifying protein complexes from binary interaction networks.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Protein-protein interactions play a key role in cellular processes and significant efforts are being devoted worldwide to characterizing such interactions on the scale of whole genomes (for review see <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>). Genome scale data on protein interactions are typically obtained using experimental methods for detecting binary interactions <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>, or by affinity purifications of tagged proteins coupled to analytical methods for identifying the co-purified partners <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. These data are in general represented as large networks, or graphs, where hundreds or thousands of proteins are linked to one another <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. For a recent review of network analysis techniques as applied to protein interaction networks, see <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>.</p>
         <p>It is well known, however, that proteins tend to function in groups, or complexes, which in the yeast <it>S. cerevisiae </it>contain on average 4.7 different types of subunits <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. An important goal has therefore been to reliably identify protein complexes from the protein interaction graphs. This task is commonly carried out using graph clustering procedures, which aim at detecting densely connected regions within the interaction graphs.</p>
         <p>Clustering is an unsupervised learning method that tackles the task of producing an intrinsic grouping of data elements on the basis of some metric (a 'distance' or similarity measure between elements). It requires solving an optimization problem, which is usually achieved with the help of heuristic algorithms whose ability to approximate the best solution (global minimum) may vary widely<abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Their application in the context of protein interaction networks encounters the additional problem of dealing with the significant level of background noise in these networks<abbrgrp><abbr bid="B15">15</abbr></abbrgrp> (e.g. spurious interactions that have no biological meaning). Dealing with a high level of noise is a major challenge for clustering procedures, as this requires mitigating the effect of noise by various means &#8211; for example by taking into account the topological properties of the network, either during the clustering process or by modifying the distance metric to incorporate such properties prior to clustering.</p>
         <p>There exists a wealth of clustering algorithms of which hierarchical clustering (for review see <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>) and K-means<abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp> are classical examples. More recently a variety of other algorithms have been proposed<abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, and some of these have been applied to the identification of highly connected nodes in protein interaction graphs<abbrgrp><abbr bid="B7">7</abbr><abbr bid="B21">21</abbr></abbrgrp>.</p>
         <p>So far, one of the most successful clustering procedures used in deriving complexes from protein interaction networks seems to be the Markov Cluster algorithm (MCL)<abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Unlike most hierarchical clustering procedures, this algorithm considers the connectivity properties of the underlying network. It has been used to derive complexes from protein interaction data in two recent comprehensive analyses of the yeast interactome <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B21">21</abbr></abbrgrp>. Furthermore, in a recent benchmark carried out by Broh&#233;e et al<abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, MCL was shown to be especially effective for clustering protein interactions in that it possesses a high degree of noise-tolerance in comparison to other algorithms such as the Molecular Complex Detection (MCODE)<abbrgrp><abbr bid="B23">23</abbr></abbrgrp> and Super Paramagnetic Clustering (SPC)<abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
         <p>Over a year ago, a novel promising clustering procedure termed Affinity Propagation (AP) was proposed <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Affinity propagation identifies representative examples (exemplars) within the dataset by exchanging real-valued messages between all data points. Points are then grouped with their most representative exemplar to give the final set of clusters. AP was applied to a variety of problems including face recognition, and gene identification from putative exons using microarray data, and was shown to be faster and more accurate than the K-Centers<abbrgrp><abbr bid="B18">18</abbr></abbrgrp> clustering algorithm. A subsequent note suggested, however, that AP was similar to the earlier vertex substitution heuristic (VSH), and that it did not perform any better<abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. This prompted the AP authors to provide evidence that AP outperforms VSH on large problems &#8211; where it runs much faster and is more accurate than several clustering algorithms tested<abbrgrp><abbr bid="B27">27</abbr></abbrgrp>.</p>
         <p>In view of the interest in applying efficient clustering procedures to biological networks in order to identify and characterize functional modules, this paper expands the analysis of Broh&#233;e et al<abbrgrp><abbr bid="B15">15</abbr></abbrgrp> to the comparison of the AP and MCL algorithms. Such comparison has not been previously reported.</p>
         <p>Following Broh&#233;e et al, we first derive an unweighted network of protein-protein interactions from a set of up-to-date hand curated protein complexes from <it>S. cerevisiae</it><abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and evaluate the performance of the two clustering algorithms in recalling the annotated complexes. In doing so the parameter space of each algorithm is sampled in order to select optimal values for these parameters, and the robustness of the algorithms is assessed by quantifying the level of complex recall as interactions are randomly added to or removed from the network to simulate noise.</p>
         <p>To test performance on a more realistic weighted protein interaction graph, we also apply the two algorithms to the high confidence consolidated protein interaction network of <it>S. cerevisiae </it>recently derived by Collins et al<abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, and to versions of this network in which varying proportions of the links have been randomly shuffled. The computed clusters are compared to the same set of curated <it>S. cerevisiae </it>complexes in order to assess the robustness of the two algorithms.</p>
         <p>The comparative analysis on the unweighted networks proposed here has the advantage of representing a self-consistent approach, in which information on a predefined number of cliques is used to build the network, and hence the expected result from partitioning this network is well defined. The choice of the weighted high confidence consolidated network of <it>S. cerevisiae </it>recently derived from purification data also makes it possible to quantify the performance of the clustering procedures by comparing computed clusters to the annotated complexes. Such quantification is difficult with <it>S. cerevisiae </it>protein interaction networks built using yeast two hybrid data, because these interactions differ significantly from co-complex interactions<abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Partitioning this network using any method is hence unlikely to yield clusters comparable to complexes. The much larger human protein interaction networks compiled from different sources and stored in databases such as HPRD (~50,000 interactions), would not serve our purpose either, given the still limited number of fully annotated human protein complexes against which the clustering results can be compared.</p>
         <sec>
            <st>
               <p>The clustering algorithms</p>
            </st>
            <p>The Markov clustering algorithm (MCL) simulates random walks on the underlying interaction network, by alternating two operations: expansion, and inflation. First, loops are added to the input graph &#8211; by default, the loop weight for each node is assigned as the maximum weight of all edges connected to the node &#8211; and this graph is then translated into a stochastic "Markov" matrix. This matrix represents the transition probabilities between all pairs of nodes, and the probability of a random walk of length n between any two nodes can be calculated by raising this matrix to the exponent n &#8211; a process referred to as expansion. As longer paths are more common between nodes in the same cluster than between nodes in different clusters, the probabilities between nodes in the same complex will typically be higher in expanded matrices. MCL further exaggerates this effect by taking entry-wise exponents of the expanded matrix, and then rescaling each column so that it remains stochastic &#8211; a process called inflation. Clusters are identified by alternating expansion and inflation until the graph is partitioned into subsets so that there are no longer paths between these subsets.</p>
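            <p>The expansion and inflation steps described above can be illustrated with a minimal dense-matrix sketch. It is not the MCL implementation used in this study: the function name <it>mcl</it>, its default parameter values, the convergence tolerance and the way clusters are read from the final matrix are all assumptions made for illustration.</p>

```python
import numpy as np


def mcl(adjacency, inflation=2.0, expansion=2, max_iter=100, tol=1e-8):
    """Illustrative MCL sketch on a dense, symmetric adjacency matrix."""
    a = np.asarray(adjacency, dtype=float).copy()
    # Add self loops: the maximum weight of the edges touching each node
    # (1.0 for isolated nodes, which is an assumption of this sketch).
    loops = a.max(axis=0)
    loops[loops == 0] = 1.0
    np.fill_diagonal(a, loops)
    # Column-stochastic "Markov" matrix of transition probabilities.
    m = a / a.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        previous = m
        # Expansion: random walks of length `expansion`.
        m = np.linalg.matrix_power(m, expansion)
        # Inflation: entry-wise power, then rescale every column.
        m = m ** inflation
        m = m / m.sum(axis=0, keepdims=True)
        if np.allclose(m, previous, atol=tol):
            break
    # Each row that still carries probability mass defines one cluster.
    clusters = set()
    for row in m:
        members = frozenset(np.flatnonzero(row > 1e-6).tolist())
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles are recovered as two clusters.
triangle = np.ones((3, 3)) - np.eye(3)
graph = np.block([[triangle, np.zeros((3, 3))],
                  [np.zeros((3, 3)), triangle]])
print(mcl(graph))
```

            <p>In this sketch a larger inflation value sharpens the column distributions more aggressively, which is the mechanism by which the inflation parameter controls cluster granularity.</p>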
            <p>Affinity Propagation (AP) identifies cluster centers, or exemplars, from the graph, each of which is, in some sense, a representative member of its cluster. Initially, all nodes are considered as exemplars, though each node is manually assigned a "preference" that it should be chosen as an exemplar. If no prior knowledge is available on which nodes should be favored as exemplars, then all nodes can be assigned the same preference value &#8211; where the magnitude can be used to control cluster granularity. For each node i and each candidate exemplar k, AP computes the "responsibility" r(i, k), which indicates how well suited k is as an exemplar for i, and the "availability" a(i, k) reflecting the evidence that i should choose k as an exemplar.</p>
            <p>
               <display-formula>
                  <graphic file="1471-2105-10-99-i1.gif"/>
               </display-formula>
            </p>
            <p>where the matrix s(i, k) denotes the similarity (e.g. edge weight) between the two nodes i and k, and the diagonal of this matrix contains the preferences for each node. The above two equations are iterated until a good set of exemplars emerges. Each node i can then be assigned to the exemplar k which maximizes the sum a(i, k) + r(i, k), and if i = k, then i is an exemplar. A damping factor between 0 and 1 is used to control for numerical oscillations.</p>
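            <p>The message-passing scheme can be sketched as follows, using the responsibility and availability updates as commonly stated for affinity propagation; we assume, without having verified it against the formula image, that these correspond to the two equations above. The function name, the default damping value, the convergence test (an unchanged exemplar set over <it>stable_iter</it> iterations) and the use of a dense matrix are illustrative choices. In a dense sketch such as this one, missing edges would need a finite, low similarity.</p>

```python
import numpy as np


def affinity_propagation(similarity, preference, damping=0.8,
                         max_iter=500, stable_iter=20):
    """Dense AP sketch; returns (labels, converged)."""
    s = np.asarray(similarity, dtype=float).copy()
    n = s.shape[0]
    np.fill_diagonal(s, preference)   # the diagonal holds the preferences
    r = np.zeros((n, n))              # responsibilities r(i, k)
    a = np.zeros((n, n))              # availabilities a(i, k)
    rows = np.arange(n)
    previous, stable, converged = None, 0, False
    for _ in range(max_iter):
        # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        total = a + s
        best = total.argmax(axis=1)
        best_val = total[rows, best]
        total[rows, best] = -np.inf
        second_val = total.max(axis=1)
        r_new = s - best_val[:, None]
        r_new[rows, best] = s[rows, best] - second_val
        r = damping * r + (1.0 - damping) * r_new

        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        # a(k, k) = sum of positive r(i', k) over i' != k
        rp = np.maximum(r, 0.0)
        np.fill_diagonal(rp, r.diagonal())
        a_new = rp.sum(axis=0)[None, :] - rp
        diag = a_new.diagonal().copy()
        a_new = np.minimum(a_new, 0.0)
        np.fill_diagonal(a_new, diag)
        a = damping * a + (1.0 - damping) * a_new

        # Declare convergence once the exemplar set stops changing.
        exemplars = tuple(np.flatnonzero((a + r).diagonal() > 0))
        if exemplars and exemplars == previous:
            stable += 1
        else:
            stable = 0
        previous = exemplars
        if stable >= stable_iter:
            converged = True
            break
    # Assign node i to the k that maximizes a(i, k) + r(i, k).
    labels = (a + r).argmax(axis=1)
    return labels, converged


# Toy data: two well separated groups of points on a line.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
s = -(x[:, None] - x[None, :]) ** 2
labels, converged = affinity_propagation(s, preference=-50.0)
print(labels, converged)
```

            <p>Returning the <it>converged</it> flag separately from the labels makes it possible to distinguish a stable partition from one read off an oscillating run.</p>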
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Performance on unweighted protein interaction graphs</p>
            </st>
            <p>Both algorithms are first applied to partition unweighted protein interaction graphs. The original version of these graphs was built from a set of 408 <it>S. cerevisiae </it>protein complexes hand curated in-house<abbrgrp><abbr bid="B28">28</abbr></abbrgrp> (see Additional File <supplr sid="S1">1</supplr>). In this graph, nodes represent individual proteins from these complexes, and any two proteins belonging to the same complex are linked by an edge. Figure <figr fid="F1">1a</figr> illustrates this graph as rendered by the Cytoscape<abbrgrp><abbr bid="B31">31</abbr></abbrgrp> network visualization and analysis software. This rather disjoint graph comprises 11,249 interactions and 1,628 proteins, where the majority of the proteins are linked only to members of the same complex, forming distinct cliques, and only a small fraction are linked to members of different complexes. This graph is clearly a less challenging test for clustering procedures than protein interaction networks built from experimental data, since those networks include an appreciable level of spurious links (False Positive links). Networks built from experimental data typically feature more links between proteins in different complexes and not all members of a given complex are always linked to one another. To better mimic these more realistic networks we randomly add links to or remove links from this original network in various proportions, as done by Broh&#233;e et al. <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, thereby generating different versions of the original network which include varying levels of noise, representing different proportions of False Positives (FP) and False Negatives (FN) (links deleted from the graph, but present in the original network).</p>
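            <p>A minimal sketch of this construction is given below: each complex becomes a clique, and noisy versions are obtained by random edge removal and addition. The function names, and the choice to express both proportions relative to the number of edges in the original graph, are assumptions of this sketch rather than details taken from the procedure used here.</p>

```python
import itertools
import random


def complexes_to_edges(complexes):
    """Link every pair of proteins that belong to the same complex."""
    edges = set()
    for members in complexes:
        for u, v in itertools.combinations(sorted(set(members)), 2):
            edges.add((u, v))
    return edges


def add_noise(edges, nodes, add_fraction=0.0, remove_fraction=0.0, seed=0):
    """Return a noisy copy of `edges` (a set of sorted node pairs).

    Both fractions are taken relative to the original number of edges
    (an assumption). The graph must have enough unlinked pairs to add.
    """
    rng = random.Random(seed)
    original = sorted(edges)
    original_set = set(original)
    n_remove = round(remove_fraction * len(original))
    n_add = round(add_fraction * len(original))
    # False negatives: drop randomly chosen original edges.
    removed = set(rng.sample(original, n_remove))
    node_list = sorted(nodes)
    # False positives: add random pairs that were not linked originally.
    added = set()
    while len(added) != n_add:
        u, v = sorted(rng.sample(node_list, 2))
        if (u, v) not in original_set:
            added.add((u, v))
    return (original_set - removed) | added


toy_edges = complexes_to_edges([["a", "b", "c"], ["c", "d"]])
toy_nodes = {"a", "b", "c", "d", "e", "f"}
print(sorted(add_noise(toy_edges, toy_nodes, add_fraction=0.5, seed=1)))
```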
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p><b>Curated Complexes.</b> Curated complexes<abbrgrp><abbr bid="B28">28</abbr></abbrgrp> used to generate the unweighted networks, and taken as the reference set for computing the Geometric Accuracy (<it>Acc</it>) and Separation (<it>Sep</it>) values (see main text for detail).</p>
               </text>
               <file name="1471-2105-10-99-S1.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Original unweighted protein interaction graph and graphs of curated complexes linked through their shared components</p>
               </caption>
               <text>
                  <p><b>Original unweighted protein interaction graph and graphs of curated complexes linked through their shared components</b>. (a)<b/>Unweighted protein interaction graph comprising 1628 proteins (nodes) and 11 249 interactions (edges) generated from the 408 hand curated complexes of <it>S. cerevisiae</it><abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. (b, c)<b/>two copies of a portion of the graph in (a), where complexes (nodes) are linked to one another whenever they share at least 2 components, and the node size is proportional to the number of unique proteins each complex contains. (b)<b/>and (c)<b/>have the AP and MCL clusters respectively mapped onto the curated complexes, so that pie charts show proportions of complex components that are annotated to the same AP or MCL cluster. The mapped clusters are computed from versions of the original unweighted network shown in (a) in which 20% of the edges were randomly added and 20% randomly removed. Complexes whose components distribute among many clusters appear as multi-colored pie graphs, whereas those that are annotated to the same cluster appear solid-colored. The bright red color indicates the proportion of components that were assigned to singleton clusters by the AP or MCL algorithm. All the comparisons were performed with partitions obtained by optimizing the MCL and AP parameters respectively (see Methods). The pie graphs were generated using the GenePro plugin<abbrgrp><abbr bid="B32">32</abbr></abbrgrp> for Cytoscape<abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
               </text>
               <graphic file="1471-2105-10-99-1"/>
            </fig>
            <p>The MCL and AP clustering procedures were each applied to the different versions of the networks and the correspondence between the computed clusters and the original 408 curated complexes was evaluated for each network version. The correspondence was quantified using the Geometric Accuracy (<it>Acc</it>) and Geometric Separation (<it>Sep</it>) criteria as previously defined<abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. <it>Acc </it>is computed as the geometric mean of the Positive Predictive Value and Sensitivity with which the clusters recall the original complexes. The <it>Sep </it>parameter is defined as the geometric mean of two quantities that measure how cluster components are on average distributed amongst complexes and how complex components are distributed among clusters, respectively (see Methods for further details).</p>
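            <p>The two criteria can be sketched from a complex-by-cluster contingency table, following the definitions of Broh&#233;e et al. as we understand them; the Methods section remains the authoritative description. The function name is hypothetical. In this sketch the Positive Predictive Value is normalized by the column sums of the contingency table, which is an assumption: normalizing by cluster size instead gives different values when complexes share proteins.</p>

```python
import math


def accuracy_and_separation(complexes, clusters):
    """Return (Acc, Sep) for lists of complexes and clusters."""
    # t[i][j]: number of proteins shared by complex i and cluster j.
    t = [[len(set(co).intersection(cl)) for cl in clusters]
         for co in complexes]
    n_co, n_cl = len(complexes), len(clusters)
    row_sums = [sum(row) for row in t]
    col_sums = [sum(t[i][j] for i in range(n_co)) for j in range(n_cl)]

    # Sensitivity: best-matching cluster per complex, weighted by complex size.
    complex_sizes = sum(len(set(co)) for co in complexes)
    sn = sum(max(row) for row in t) / complex_sizes if complex_sizes else 0.0
    # PPV: best-matching complex per cluster, weighted by column sum.
    col_total = sum(col_sums)
    best_per_cluster = sum(max(t[i][j] for i in range(n_co))
                           for j in range(n_cl))
    ppv = best_per_cluster / col_total if col_total else 0.0
    acc = math.sqrt(sn * ppv)

    # Separation: product of row-wise and column-wise frequencies.
    total = 0.0
    for i in range(n_co):
        for j in range(n_cl):
            if t[i][j]:
                total += t[i][j] ** 2 / (row_sums[i] * col_sums[j])
    sep = math.sqrt((total / n_co) * (total / n_cl))
    return acc, sep


reference = [["a", "b"], ["c", "d"]]
print(accuracy_and_separation(reference, reference))
print(accuracy_and_separation(reference, [["a"], ["b"], ["c"], ["d"]]))
```

            <p>The second call corresponds to a "clustering" made only of singletons: every cluster is pure, so the Positive Predictive Value is 1, but Sensitivity drops because each complex is split, and both <it>Acc </it>and <it>Sep </it>fall below 1.</p>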
            <p>To enable as fair a comparison as possible, values of the adjustable parameters in each clustering algorithm were selected so as to maximize the sum of the <it>Acc </it>and <it>Sep </it>values for the clusters computed from each network (see Methods).</p>
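            <p>This selection amounts to a simple search over the sampled parameter values, sketched below. The names <it>select_parameter</it>, <it>cluster_fn</it> and <it>score_fn</it> are hypothetical placeholders: <it>cluster_fn</it> stands for a run of MCL (inflation) or AP (preference) at a given parameter value, and <it>score_fn</it> for the computation of <it>Acc </it>and <it>Sep </it>against the reference complexes.</p>

```python
def select_parameter(values, cluster_fn, score_fn):
    """Return (best_value, best_sum, clusters) maximizing Acc + Sep."""
    best = None
    for value in values:
        clusters = cluster_fn(value)      # run the clustering algorithm
        acc, sep = score_fn(clusters)     # compare with the reference set
        total = acc + sep
        if best is None or total > best[1]:
            best = (value, total, clusters)
    return best


# Toy stand-ins: the "clustering" is just the parameter value itself,
# and the score peaks at a value of 2.0.
def toy_cluster(value):
    return value


def toy_score(value):
    return 1.0 - abs(value - 2.0) / 10.0, 0.5


best_value, best_sum, _ = select_parameter([1.2, 2.0, 3.5], toy_cluster, toy_score)
print(best_value, best_sum)
```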
            <p>Figures <figr fid="F1">1b</figr> and <figr fid="F1">1c</figr> present a visual overview of the results obtained from an unweighted network, derived from the original network by adding 20% of the edges, and rendered using the GenePro<abbrgrp><abbr bid="B32">32</abbr></abbrgrp> plugin for Cytoscape. They show that the AP clusters are more fragmented than those obtained with MCL, as components annotated to the same curated complex are often distributed among several AP clusters, whereas the MCL clusters tend to map more fully into the curated complexes. This result is summarized by the <it>Acc </it>and <it>Sep </it>parameters listed in Additional File <supplr sid="S2">2</supplr>. To further understand how each algorithm handles noise, simulated here by random addition and subtraction of graph edges, we focus on the effects of either adding (Figure <figr fid="F2">2</figr>) or removing (Figure <figr fid="F3">3</figr>) edges. While AP and MCL yield solutions with virtually identical <it>Acc </it>and <it>Sep </it>values for the original network (zero noise level), the AP algorithm did not converge for most of the noisier networks. The one with 20% random edge addition was among the few for which it did converge, but the <it>Acc </it>and <it>Sep </it>values of the resulting clusters were much lower than those obtained with MCL on the same network. AP also did not converge for the majority of networks with simultaneous random edge addition and removal (Additional File <supplr sid="S2">2</supplr>). In contrast, MCL generated clustering solutions with relatively high <it>Acc </it>and <it>Sep </it>at all noise levels. Interestingly however, for networks containing high noise levels, the MCL clusters group only a fraction of the proteins comprising the interaction network, leaving the remaining proteins ungrouped (singletons) (Additional File <supplr sid="S2">2</supplr>).</p>
            <suppl id="S2">
               <title>
                  <p>Additional File 2</p>
               </title>
               <text>
                  <p><b>Clustering Results, Unweighted Network.</b> AP and MCL results for all parameters tested, at all noise levels for the unweighted network. Column descriptions are listed in the 'col_descriptions' worksheet tab of this spreadsheet. See also reference <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> and main text for descriptions of <it>PPV, Sensitivity, Acc, SepCl, SepCo</it>, and <it>Sep</it>.</p>
               </text>
               <file name="1471-2105-10-99-S2.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Effect of random addition of edges to the original unweighted protein interaction graph on the performance of the MCL and AP algorithms</p>
               </caption>
               <text>
                  <p><b>Effect of random addition of edges to the original unweighted protein interaction graph on the performance of the MCL and AP algorithms</b>. Performance is evaluated by plotting two parameters, the Geometric Accuracy (<it>Acc</it>) (a), and Separation (<it>Sep</it>) (b). The random addition of varying proportions of edges mimics noise created due to varying proportions of False Positive interactions (spurious interactions). For AP, only those points where the algorithm converged are plotted. Definitions of <it>Acc </it>and <it>Sep </it>are given in Methods. The open circle marks the <it>Acc </it>and <it>Sep </it>values achieved by the curated complexes used to generate the original protein interaction graph, as measured against themselves &#8211; note that separation is &lt; 1 due to shared components between complexes. Dashed lines indicate the values obtained from random graphs used as controls (see Methods). The solid horizontal line shows the <it>Acc </it>(a) or <it>Sep </it>(b) values achieved by not grouping any proteins (i.e. a "clustering" that consists entirely of singletons).</p>
               </text>
               <graphic file="1471-2105-10-99-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Effect of random removal of edges from the original unweighted protein interaction graph on the performance of the MCL and AP algorithms</p>
               </caption>
               <text>
                  <p><b>Effect of random removal of edges from the original unweighted protein interaction graph on the performance of the MCL and AP algorithms</b>. Performance is evaluated by plotting two parameters, the Geometric Accuracy (<it>Acc</it>) (a), and Separation (<it>Sep</it>) (b). The solid horizontal line, dashed lines, and the open circle are as described in the caption of Figure 2.</p>
               </text>
               <graphic file="1471-2105-10-99-3"/>
            </fig>
            <p>We also tested AP on an unweighted network of 15 982 human protein-protein interactions comprising 5850 unique proteins, annotated as experimentally characterized using affinity capture or reconstituted complexes in version 2.0.50 of the BioGRID database<abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Similar to the results obtained for unweighted networks to which artificial noise was added, AP did not converge for this more realistic network derived from inherently noisy experimental data. MCL produced clusterings containing between 663 and 1566 clusters, depending on the inflation value. A detailed analysis of these clusters is outside the scope of this report, but the size distributions of the clusters in the MCL partitions produced at various inflation values (Additional File <supplr sid="S3">3</supplr>) indicate that they are not all trivial singleton or extremely large clusters.</p>
            <suppl id="S3">
               <title>
                  <p>Additional File 3</p>
               </title>
               <text>
                  <p><b>MCL Human PPI Size Distribution.</b> Size distributions of clusters in partitions computed by MCL on a human PPI network (15 982 interactions and 5850 proteins) extracted from version 2.0.50 of the BioGRID database<abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. AP did not converge for this network, precluding any comparison.</p>
               </text>
               <file name="1471-2105-10-99-S3.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>The <it>Acc </it>and <it>Sep </it>were also evaluated for the 408 curated complexes directly. As expected, <it>Acc</it>, which quantifies the maximum extent of overlap between complexes and clusters &#8211; and vice versa &#8211; is 1 for these complexes (Figures <figr fid="F2">2a</figr>, <figr fid="F3">3a</figr>). Lower <it>Acc </it>values are obtained for the partitions derived by both clustering algorithms &#8211; largely due to shared components in the original complexes, which can obscure their detection, especially for smaller clusters. In contrast, shared components lower the <it>Sep </it>values of the original complexes, and hence as the clustering algorithms partition the graphs they can achieve higher <it>Sep </it>values at low noise levels (Figures <figr fid="F2">2b</figr>, <figr fid="F3">3b</figr>).</p>
            <p>These results depart sharply from those expected for random partitions, as also illustrated in Figures <figr fid="F2">2a, b</figr>, <figr fid="F3">3a</figr>, and <figr fid="F3">3b</figr>. Random partitions were generated by randomly permuting the assignments of the proteins to clusters within the MCL and AP predictions.</p>
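            <p>One way to generate such a control is sketched below: the proteins of a predicted partition are shuffled and redistributed over clusters of the original sizes. The function name is hypothetical, and keeping the cluster size distribution fixed is our reading of "permuting the assignments" rather than a detail stated here.</p>

```python
import random


def permute_assignments(clusters, seed=0):
    """Shuffle proteins across clusters while keeping every cluster size."""
    rng = random.Random(seed)
    proteins = [protein for cluster in clusters for protein in cluster]
    rng.shuffle(proteins)
    permuted, start = [], 0
    for cluster in clusters:
        permuted.append(proteins[start:start + len(cluster)])
        start += len(cluster)
    return permuted


predicted = [["a", "b", "c"], ["d", "e"], ["f"]]
control = permute_assignments(predicted, seed=42)
print(control)
```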
         </sec>
         <sec>
            <st>
               <p>Performance on a weighted biological protein interaction graph</p>
            </st>
            <p>A second series of tests was performed using interaction graphs built from the consolidated network of Collins et al<abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, where each protein-protein link has an associated confidence score ranging in values from 0 to 1. As in previous studies<abbrgrp><abbr bid="B21">21</abbr><abbr bid="B29">29</abbr></abbrgrp>, only the high confidence portion of the network was considered, comprising links whose scores are above a confidence threshold of 0.38<abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. The resulting network comprised 12,035 interactions and 1,921 proteins. Since this network represents predicted associations from data derived in two recent high-throughput experimental studies<abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>, some noise will naturally be present. We did, however, generate noisier versions of this network by randomly shuffling increasing fractions of edges, and re-evaluating the results for each of these versions. As for the performance tests on the unweighted graphs, the parameters of each algorithm are adjusted so as to optimize the correspondence with the curated complexes, by maximizing the sum of the <it>Acc </it>and <it>Sep </it>values.</p>
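            <p>The filtering and shuffling steps can be sketched as follows. The function names are hypothetical, and the shuffling rule (each selected edge is moved, with its confidence score, to a random pair of proteins that is not linked in the original network) is an assumption made for illustration; the exact procedure is the one described in Methods.</p>

```python
import random


def high_confidence(edges, threshold=0.38):
    """Keep links whose confidence score is above the threshold.

    `edges` maps a sorted protein pair (u, v) to its score.
    """
    return {pair: score for pair, score in edges.items() if score > threshold}


def shuffle_edges(edges, fraction, seed=0):
    """Move a fraction of the edges to random, previously unlinked pairs."""
    rng = random.Random(seed)
    pairs = sorted(edges)
    nodes = sorted({node for pair in pairs for node in pair})
    chosen = rng.sample(pairs, round(fraction * len(pairs)))
    shuffled = dict(edges)
    for pair in chosen:
        score = shuffled.pop(pair)
        # Draw pairs until one is found that is absent from both the
        # original network and the network built so far (assumes the
        # graph is sparse enough for such a pair to exist).
        while True:
            u, v = sorted(rng.sample(nodes, 2))
            if (u, v) not in edges and (u, v) not in shuffled:
                break
        shuffled[(u, v)] = score
    return shuffled


toy = {("a", "b"): 0.9, ("b", "c"): 0.5, ("c", "d"): 0.2, ("d", "e"): 0.7}
print(high_confidence(toy))
print(shuffle_edges(toy, fraction=0.5, seed=3))
```

            <p>Because the scores travel with the shuffled edges, the weight distribution of the network is unchanged in this sketch; only the topology is perturbed.</p>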
            <p>On this more realistic network, both AP and MCL were able to predict clusters at all the tested noise levels. The results illustrated in Figure <figr fid="F4">4</figr>, show that, as expected, the <it>Acc </it>value tends to decrease with added noise for both algorithms, and that the <it>Acc </it>of MCL is higher than AP at all noise levels. The shaded areas in Figure <figr fid="F4">4</figr> indicate the ranges in the <it>Acc </it>and <it>Sep </it>values covered by solutions obtained by varying the parameters of the AP and MCL algorithms, respectively. It can be seen from this figure that our parameter selection procedure was successful in identifying parameter values that approximately maximize the <it>Acc </it>and <it>Sep </it>measures independently at all noise levels.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>(a) Geometric accuracy (<it>Acc</it>) and (b) Separation (<it>Sep</it>) as varying numbers of edges are shuffled within the weighted network</p>
               </caption>
               <text>
                  <p><b>(a) Geometric accuracy (<it>Acc</it>) and (b) Separation (<it>Sep</it>) as varying numbers of edges are shuffled within the weighted network</b>. Shaded regions indicate the range of <it>Acc </it>(a) or <it>Sep </it>(b) values achieved by each clustering algorithm as its parameter values were varied. The open circle marks the <it>Acc </it>and <it>Sep </it>values achieved by the curated complexes used to generate the unweighted networks. The solid horizontal line and dashed lines are as described in the caption of Figure 2.</p>
               </text>
               <graphic file="1471-2105-10-99-4"/>
            </fig>
            <p>We also see that at high levels of noise, the results are no longer meaningful, as the clusters predicted by either algorithm consist almost entirely of singletons. Both algorithms have a slowly decreasing <it>Sep </it>as progressively more edges are shuffled. When no artificial noise is introduced, both algorithms are roughly comparable in terms of <it>Acc</it>, although the AP solution has slightly lower <it>Sep </it>and incorporates 61 fewer proteins than MCL (see Figure <figr fid="F5">5</figr>), which are classified as singletons. Examples of complexes recovered by MCL but not by AP are given in Figure S3 of the Additional File <supplr sid="S4">4</supplr>. As the level of artificial noise increases to 10%, both algorithms maintain approximately the same number of clusters and proteins. At 20%&#8211;30% noise, the optimal MCL solution in terms of <it>Acc+Sep </it>happens to correspond to a much coarser clustering than that obtained with AP (smaller number of clusters in Figure <figr fid="F5">5</figr>). However, using different Inflation values can generate solutions featuring finer granularities with only a minor decrease in <it>Acc+Sep </it>(Additional File <supplr sid="S5">5</supplr>). Overall, at around 60&#8211;70% noise, predictions from both algorithms begin degenerating into singletons. The relative performance of MCL and AP does not depend on the objective function <it>Acc+Sep</it>. Indeed, we verified that at any Preference value used for AP, clustering solutions produced by MCL have higher <it>Acc </it>and equivalent or higher <it>Sep </it>values at all Inflation values tested (Figure S1 in Additional File <supplr sid="S4">4</supplr>).</p>
            <suppl id="S4">
               <title>
                  <p>Additional File 4</p>
               </title>
               <text>
                  <p><b>Supplementary Figures.</b> Supplementary figures referred to in the main text.</p>
               </text>
               <file name="1471-2105-10-99-S4.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Number of clusters and proteins included in these clusters obtained by each algorithm on the weighted network at different noise levels, after removing singletons</p>
               </caption>
               <text>
                  <p><b>Number of clusters and proteins included in these clusters obtained by each algorithm on the weighted network at different noise levels, after removing singletons</b>. At higher noise levels, the optimal AP solution in terms of <it>Acc</it>+<it>Sep </it>consists almost entirely of singletons, though coarser solutions also exist (see text, and Additional File <supplr sid="S5">5</supplr>).</p>
               </text>
               <graphic file="1471-2105-10-99-5"/>
            </fig>
            <p>To gain insight into how the MCL and AP clustering solutions change as edges are randomly shuffled, we plotted the mass fraction and area fraction (Figures <figr fid="F6">6a, b</figr>) for the optimal clustering at each noise level as found above. The mass fraction of a clustering solution for a weighted graph is simply the fraction of the total edge weight that is entirely contained within clusters. The area fraction assumes that each identified cluster is a clique, and measures the number of these clique edges relative to the total number of edges in a clique of all nodes (see Methods). We see that for both algorithms, the mass fraction decreased as edges are shuffled &#8211; which is expected given that formerly intra-complex edges are being reassigned as inter-complex edges during the shuffling. The area fractions also decreased for both algorithms, suggesting more granular clusterings.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Mass and Area Fractions of the AP and MCL solutions at varying noise levels</p>
               </caption>
               <text>
                  <p><b>Mass and Area Fractions of the AP and MCL solutions at varying noise levels</b>. These values assess the intrinsic efficiency of a clustering in terms of the amount of edge mass captured, with higher values indicating improved efficiency (see Methods for details). The Mass fraction summarizes how much of the total edge weight is captured within clusters. The Area fraction is a measure of cluster granularity, such that higher area fractions correspond to coarser clusterings (see text).</p>
               </text>
               <graphic file="1471-2105-10-99-6"/>
            </fig>
            <p>Overall, we find that MCL tends to generate a more granular clustering in the presence of noise (Figures <figr fid="F7">7a, b</figr>) &#8211; although at very low noise levels AP produces more singletons and 2-member clusters than MCL. We also find that the higher <it>Acc </it>obtained with MCL in the presence of noise is maintained across the entire range of complex sizes (Figure S2b in Additional File <supplr sid="S4">4</supplr>), so that MCL's ability to recapitulate the curated complexes even at high noise levels (40% artificial noise) is better than that of AP for complexes of all sizes. In contrast, AP generally produces coarser clusterings as noise is increased, although the number of very large clusters (16 or more components) does decrease, reducing the overall area fraction.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Size distributions of clusters produced by the AP and MCL algorithms using the weighted network, when a) no edges are shuffled and b) 40% of edges are shuffled</p>
               </caption>
               <text>
                  <p><b>Size distributions of clusters produced by the AP and MCL algorithms using the weighted network, when a) no edges are shuffled and b) 40% of edges are shuffled</b>.</p>
               </text>
               <graphic file="1471-2105-10-99-7"/>
            </fig>
            <p>These results, together with the superior <it>Acc </it>and <it>Sep </it>values obtained with MCL at high noise levels, suggest that this algorithm is a better choice for weighted protein interaction networks.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In summary, our analysis has shown that the MCL procedure is significantly more tolerant to noise and behaves more robustly than the AP algorithm. The advantage of MCL over AP is dramatic for unweighted protein interaction graphs, as AP displays severe convergence problems on the majority of the unweighted graph versions that we tested, whereas MCL continues to identify meaningful clusters, albeit fewer of them, as the level of noise in the graph increases. It is possible that AP, as it stands, is not suitable for unweighted networks (as discussed below), although this is not specified in the instructions for using the program or in the original publication<abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
         <p>On weighted graphs constructed using data from high throughput experiments believed to be incomplete and usually quite noisy, the difference in performance is also notable. MCL achieves higher <it>Acc </it>and equivalent or better <it>Sep </it>at all significant noise levels. Furthermore, at low to moderate noise levels, these solutions include more proteins than those of AP. Parameters for either algorithm can be adjusted to affect the final granularity of the clustering, but either the <it>Acc </it>or the <it>Sep </it>will be lower.</p>
         <p>Thus for physical interaction networks, we find that MCL outperforms AP in terms of its ability to generate meaningful partitions. The other cited advantages of the AP algorithm, namely its speed and ability to tackle very large networks, play only a minor role in the present application. Indeed both MCL and AP run very fast (&lt; 10 seconds) on the weighted consolidated network of 12,035 interactions and 1,921 proteins. As noise is added to this network, AP can also fail to converge at certain Preference values (Figure S1 in Additional File <supplr sid="S4">4</supplr> and Additional File <supplr sid="S5">5</supplr>), and it can be difficult to determine which parameters will lead to convergence. For example, AP did not converge at any of the Preference values tested for unweighted networks with edges randomly removed. On weighted networks with 30% noise, the algorithm converged at Preference values 0.65 and 0.9 only (Additional File <supplr sid="S4">4</supplr>). Thus for this application, one difficulty in using AP is to determine an appropriate interval and level of granularity for searching Preference values. The AP authors provide tools to assist in choosing sensible Preference intervals, but not for choosing granularity. In situations where AP does not converge, the authors recommend increasing the Damping factor, the maximum number of iterations, and the number of iterations required for convergence &#8211; although increasing these parameters can increase the runtime of the algorithm.</p>
         <suppl id="S5">
            <title>
               <p>Additional File 5</p>
            </title>
            <text>
               <p><b>Clustering Results, Weighted Network.</b> AP and MCL results for all parameters tested, at all noise levels for the weighted network. Column descriptions are given in the 'col_descriptions' worksheet tab of this spreadsheet. See also reference <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> and main text for descriptions of <it>PPV, Sensitivity, Acc, SepCl, SepCo</it>, and <it>Sep</it>.</p>
            </text>
            <file name="1471-2105-10-99-S5.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <p>The MCL algorithm effectively considers both edge weight and graph topology (connectivity) information. AP, on the other hand, can fail in situations where high weight edges connect two clusters. Consider the artificial situation where two cliques, A and B, are connected by a single, relatively high weight edge. If one of the nodes comprising this edge is an exemplar in clique A, the adjacent node in clique B may be incorrectly assigned to A by AP, despite being highly connected to members of B. This suggests that MCL achieves its robust performance by always considering network topology, whereas AP relies in part on the 'distance metric' (edge weight) to capture this information. To overcome this limitation, one could define a modified distance metric that simultaneously captures both the propensity of two proteins to interact and the graph topology, and re-run AP on the modified graph. To some extent, the PE score is such a metric, as higher scores are assigned to proteins that repeatedly co-purify in affinity capture experiments, and lower scores are assigned to non-specific interactions that occur between promiscuous proteins. Indeed, on the PE weighted network of Collins et al, the performance of AP is much closer to that of MCL when the network is unperturbed than when edges are shuffled, as randomly shuffling edges distorts the topology information contained in the edge weights. In the unweighted network, where no topological information is captured by the distance metric, AP is only able to successfully cluster unperturbed networks with very few inter-complex edges (shared components).</p>
         <p>As noted in <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, the relative accuracy and performance of clustering algorithms can vary greatly for different datasets, and this report makes no attempt to address the breadth of problems for which one algorithm outperforms the other.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Building the protein interaction graphs</p>
            </st>
            <p>The unweighted interaction graph was defined by considering all possible pairs of proteins that were annotated to the same complex within a gold-standard set of yeast protein complexes<abbrgrp><abbr bid="B28">28</abbr></abbrgrp> (Additional File <supplr sid="S1">1</supplr>). Each edge was assigned a weight of 1. The resulting network comprised 11,238 interactions (edges) and 1,624 proteins (nodes). For AP, the input pairwise 'similarities' were defined twice for every pair of proteins i, j as S(i, j) = S(j, i) = 1 if proteins i and j were annotated to the same gold standard complex.</p>
            <p>The weighted interaction network is that derived by Collins et al<abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. The weight of each edge represents the confidence score of each putative interaction, as defined in ref<abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. These confidence scores range from 0.38 to 1. For AP, the input 'similarities' were again defined twice for every pair of interacting proteins i, j as S(i, j) = S(j, i) = c, where c is the confidence assigned to the interaction.</p>
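            <p>The construction of the unweighted graph and of the symmetric AP similarities can be illustrated with the following sketch. It is not the code used in the study: the function names <it>cocomplex_edges </it>and <it>ap_similarities</it>, the dictionary layout and the toy complexes are assumptions made for illustration.</p>

```python
from itertools import combinations

def cocomplex_edges(complexes):
    """Link every pair of proteins annotated to the same complex.

    complexes: dict mapping a complex name to a list of protein identifiers.
    Returns a set of (protein_a, protein_b) tuples with protein_a sorted
    before protein_b, so each undirected edge appears once.
    """
    edges = set()
    for members in complexes.values():
        for a, b in combinations(sorted(set(members)), 2):
            edges.add((a, b))
    return edges

def ap_similarities(edges, weight=1.0):
    """Define each similarity twice, S(i, j) = S(j, i), as described above."""
    sims = {}
    for a, b in edges:
        sims[(a, b)] = weight
        sims[(b, a)] = weight
    return sims

# Toy gold standard with one shared component (P3), hypothetical labels.
toy_complexes = {"complex1": ["P1", "P2", "P3"], "complex2": ["P3", "P4"]}
toy_edges = cocomplex_edges(toy_complexes)
toy_sims = ap_similarities(toy_edges)
```

            <p>For the weighted network, the same symmetric definition would apply with the confidence score c of each interaction in place of the constant weight of 1.</p>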
         </sec>
         <sec>
            <st>
               <p>Performance assessment</p>
            </st>
            <p>The ability of each clustering algorithm to recapitulate the known complexes from the weighted or unweighted interaction graph was measured using the Geometric Accuracy (<it>Acc</it>) and Geometric Separation (<it>Sep</it>), which are derived from a Confusion Table T, where each entry T<sub>i, j </sub>gives the number of proteins in common between complex i and cluster j<abbrgrp><abbr bid="B15">15</abbr></abbrgrp>:</p>
            <p>
               <display-formula>
                  <graphic file="1471-2105-10-99-i2.gif"/>
               </display-formula>
            </p>
            <p>
               <display-formula>
                  <graphic file="1471-2105-10-99-i3.gif"/>
               </display-formula>
            </p>
            <p>The <it>Acc </it>indicates the tradeoff between the Sensitivity and the Positive Predictive Value (PPV), and is calculated by taking the geometric mean of these two quantities. Sensitivity and PPV are defined as the weighted averages of the complex-wise sensitivities, S<sub>i</sub>, and the cluster-wise positive predictive values, P<sub>j</sub>, respectively. S<sub>i </sub>measures the best overlap of complex i with the predicted clusters, and P<sub>j </sub>measures the best overlap of cluster j with the gold standard complexes, relative to the number of components in cluster j that are contained in the original set of complexes. The <it>Acc </it>alone may not give an accurate evaluation of a clustering &#8211; for example, if the clustering consists of very large and very small clusters<abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. In this case both the complex-wise Sensitivity and cluster-wise PPV will be high.</p>
            <p>A second measure, the <it>Sep</it>, is therefore calculated to measure the one-to-one correspondence between predicted clusters and complexes. It is defined as the geometric mean of the average complex-wise and average cluster-wise separation, which are each derived from confusion tables modified, respectively, to indicate the fraction of overlap of each complex with every cluster, or each cluster with every complex. Unlike in the study of Broh&#233;e et al, all calculations done here consider only those components that exist in both datasets.</p>
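            <p>As an illustration of the two measures, the sketch below computes <it>Acc </it>and <it>Sep </it>from a confusion table for a toy example. It follows our reading of the verbal definitions above and of the usual formulation of these measures, and should be treated as an assumption rather than the exact implementation used here; in particular, the function names are hypothetical and the sketch assumes that every protein appears in both datasets and that at least one complex overlaps a cluster.</p>

```python
from math import sqrt

def confusion_table(complexes, clusters):
    """T[i][j] = number of proteins shared by complex i and cluster j."""
    return [[len(set(cx).intersection(cl)) for cl in clusters]
            for cx in complexes]

def geometric_accuracy(complexes, clusters):
    table = confusion_table(complexes, clusters)
    columns = list(zip(*table))
    # Size-weighted average of the best overlap of each complex.
    complex_sizes = [len(set(cx)) for cx in complexes]
    sensitivity = sum(max(row) for row in table) / sum(complex_sizes)
    # Weighted average of the best overlap of each cluster, relative to
    # the cluster members that occur in the set of complexes.
    ppv = sum(max(col) for col in columns) / sum(sum(col) for col in columns)
    return sqrt(sensitivity * ppv)

def geometric_separation(complexes, clusters):
    table = confusion_table(complexes, clusters)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, shared in enumerate(row):
            if shared:
                # Row-wise fraction times column-wise fraction of the overlap.
                total += (shared / row_sums[i]) * (shared / col_sums[j])
    sep_complexes = total / len(complexes)
    sep_clusters = total / len(clusters)
    return sqrt(sep_complexes * sep_clusters)

# Toy example with hypothetical labels: protein C is placed in the wrong cluster.
gold = [["A", "B", "C"], ["D", "E"]]
predicted = [["A", "B"], ["C", "D", "E"]]
acc = geometric_accuracy(gold, predicted)
sep = geometric_separation(gold, predicted)
```

            <p>A partition identical to the set of non-overlapping complexes yields a value of 1 for both measures in this sketch, and misplacing a single protein lowers both.</p>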
         </sec>
         <sec>
            <st>
               <p>Graph Properties</p>
            </st>
            <p>The mass and area fractions were computed for each unweighted graph using the clminfo tool provided with the MCL implementation. The mass fraction measures the total weight of all edges (interactions) that occur between proteins within the same cluster, relative to the total weight of all edges. For an interaction network of edges, <it>E</it>, and the subset of these edges contained entirely within clusters <it>E</it>* &#8838; <it>E</it>, the mass fraction is given by:</p>
            <p>
               <display-formula>
                  <graphic file="1471-2105-10-99-i4.gif"/>
               </display-formula>
            </p>
            <p>where <it>w</it>(<it>e</it>) denotes the weight of edge <it>e</it>.</p>
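            <p>The mass fraction defined above can be illustrated with a short sketch. This is not the clminfo implementation; the function name <it>mass_fraction</it>, the data layout and the toy values are assumptions made for illustration.</p>

```python
def mass_fraction(weighted_edges, cluster_of):
    """Fraction of the total edge weight that lies within clusters.

    weighted_edges: dict mapping (u, v) tuples to edge weights.
    cluster_of: dict mapping each protein to its cluster label; proteins
    missing from this dict are treated as unclustered.
    """
    total = sum(weighted_edges.values())
    inside = 0.0
    for (u, v), weight in weighted_edges.items():
        label = cluster_of.get(u)
        if label is not None and label == cluster_of.get(v):
            inside += weight
    return inside / total

# Toy example with hypothetical labels: the B-C edge crosses two clusters.
toy_weights = {("A", "B"): 0.9, ("B", "C"): 0.5, ("C", "D"): 0.6}
toy_clusters = {"A": 0, "B": 0, "C": 1, "D": 1}
toy_mass = mass_fraction(toy_weights, toy_clusters)
```

            <p>Here 1.5 of the total weight of 2.0 falls within clusters, giving a mass fraction of 0.75.</p>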
            <p>The area fraction is calculated by translating a clustering into an interaction graph by considering each cluster as a clique, and then dividing the number of clique edges by the number of edges within a full clique. For a graph with <it>N </it>nodes and a clustering of the graph containing <it>C </it>clusters, where the number of components in the <it>i</it><sup><it>th </it></sup>cluster is given by <it>n</it><sub><it>i</it></sub>:</p>
            <p>
               <display-formula>
                  <graphic file="1471-2105-10-99-i5.gif"/>
               </display-formula>
            </p>
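            <p>A minimal sketch of the area fraction follows from the verbal definition above: the clique edges summed over all clusters, divided by the number of edges in a clique of all <it>N </it>nodes. The function name <it>area_fraction </it>is hypothetical, and the expression used is our reading of that definition rather than the clminfo source.</p>

```python
def area_fraction(cluster_sizes, n_nodes):
    """Clique edges implied by the clusters over edges of a full clique.

    cluster_sizes: number of components n_i in each cluster.
    n_nodes: total number of nodes N in the graph.
    The factor of 1/2 in n(n-1)/2 cancels between numerator and denominator.
    """
    clique_edges = sum(n * (n - 1) for n in cluster_sizes)
    return clique_edges / (n_nodes * (n_nodes - 1))

# Toy example: five nodes split into clusters of sizes 3 and 2.
toy_area = area_fraction([3, 2], n_nodes=5)
```

            <p>In this toy case the clusters imply 3 + 1 = 4 clique edges out of 10 possible, so the area fraction is 0.4; a single cluster containing all nodes gives 1, and a partition of singletons gives 0.</p>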
         </sec>
         <sec>
            <st>
               <p>Parameter Optimization</p>
            </st>
            <p>Each clustering was performed with the parameters that maximized the sum of the Geometric Accuracy and Separation (<it>Acc</it>+<it>Sep</it>). For MCL this involved sampling Inflation parameter values of 1.5&#8211;4 in steps of 0.1. For the AP algorithm we sampled Preference values of 0.1&#8211;1 in steps of 0.05. The damping factor was set to 0.99, the maximum number of iterations to 15,000, and the number of iterations required for convergence to 1,500. For AP, all proteins were assigned the same preference.</p>
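            <p>The parameter sweep can be sketched as a simple grid search. The grids below reproduce the ranges and step sizes given above; the helper names <it>grid </it>and <it>select_best</it>, and the toy scoring function standing in for a run of the clustering program followed by the evaluation of <it>Acc</it>+<it>Sep</it>, are assumptions made for illustration.</p>

```python
def grid(start, stop, step):
    """Evenly spaced parameter values from start to stop, inclusive."""
    count = round((stop - start) / step) + 1
    return [round(start + k * step, 2) for k in range(count)]

def select_best(parameter_values, score):
    """Return the parameter value with the highest score.

    score: a callable that, in practice, would run the clustering at the
    given parameter value and return Acc + Sep against the curated complexes.
    """
    return max(parameter_values, key=score)

inflation_values = grid(1.5, 4.0, 0.1)     # MCL Inflation grid
preference_values = grid(0.1, 1.0, 0.05)   # AP Preference grid

# Toy stand-in for Acc + Sep that peaks at an Inflation of 2.3.
def toy_score(inflation):
    return -(inflation - 2.3) ** 2

best_inflation = select_best(inflation_values, toy_score)
```

            <p>In a real sweep, parameter values at which AP fails to converge would need to be excluded before the maximum is taken.</p>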
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JV participated in the design of the study and performed the analysis. SW assisted in the study design, analysis, and manuscript preparation. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We are grateful to Delbert Dueck and Brendan Frey for guidance in using the Affinity Propagation algorithm. Miguel Santos and the systems support team of the Centre for Computational Biology at the Hospital for Sick Children are thanked for help with the computer systems. S.J.W. is Tier 1 Canada Research Chair in Computational Biology and Bioinformatics and acknowledges support from the Canadian Institutes of Health Research, the Hospital for Sick Children and the SickKids Foundation, Toronto, Canada.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The social network of a cell: Recent advances in interactome mapping</p>
            </title>
            <aug>
               <au>
                  <snm>Charbonnier</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gallego</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Gavin</snm>
                  <fnm>AC</fnm>
               </au>
            </aug>
            <source>Biotechnology annual review</source>
            <pubdate>2008</pubdate>
            <volume>14</volume>
            <fpage>1</fpage>
            <lpage>28</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18606358</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Interactome: gateway into systems biology</p>
            </title>
            <aug>
               <au>
                  <snm>Cusick</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Klitgord</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Vidal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Human molecular genetics</source>
            <pubdate>2005</pubdate>
            <volume>14</volume>
            <issue>Spec No. 2</issue>
            <fpage>R171</fpage>
            <lpage>181</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16162640</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>A novel genetic system to detect protein-protein interactions</p>
            </title>
            <aug>
               <au>
                  <snm>Fields</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Song</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1989</pubdate>
            <volume>340</volume>
            <issue>6230</issue>
            <fpage>245</fpage>
            <lpage>246</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2547163</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Split ubiquitin as a sensor of protein interactions in vivo</p>
            </title>
            <aug>
               <au>
                  <snm>Johnsson</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Varshavsky</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proceedings of the National Academy of Sciences of the United States of America</source>
            <pubdate>1994</pubdate>
            <volume>91</volume>
            <issue>22</issue>
            <fpage>10340</fpage>
            <lpage>10344</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">45015</pubid>
                  <pubid idtype="pmpid">7937952</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Functional organization of the yeast proteome by systematic analysis of protein complexes</p>
            </title>
            <aug>
               <au>
                  <snm>Gavin</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Bosche</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Krause</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Grandi</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Marzioch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bauer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rick</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Michon</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Cruciat</snm>
                  <fnm>CM</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <issue>6868</issue>
            <fpage>141</fpage>
            <lpage>147</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11805826</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Proteome survey reveals modularity of the yeast cell machinery</p>
            </title>
            <aug>
               <au>
                  <snm>Gavin</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Aloy</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Grandi</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Krause</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Boesche</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marzioch</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rau</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Bastuck</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dumpelfeld</snm>
                  <fnm>B</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>440</volume>
            <issue>7084</issue>
            <fpage>631</fpage>
            <lpage>636</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16429126</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Global landscape of protein complexes in the yeast Saccharomyces cerevisiae</p>
            </title>
            <aug>
               <au>
                  <snm>Krogan</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Cagney</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Zhong</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Guo</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Ignatchenko</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Datta</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Tikuisis</snm>
                  <fnm>AP</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>440</volume>
            <issue>7084</issue>
            <fpage>637</fpage>
            <lpage>643</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16554755</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Functional and topological characterization of protein interaction networks</p>
            </title>
            <aug>
               <au>
                  <snm>Yook</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Oltvai</snm>
                  <fnm>ZN</fnm>
               </au>
               <au>
                  <snm>Barabasi</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Proteomics</source>
            <pubdate>2004</pubdate>
            <volume>4</volume>
            <issue>4</issue>
            <fpage>928</fpage>
            <lpage>942</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15048975</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Network genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Ideker</snm>
                  <fnm>TE</fnm>
               </au>
            </aug>
            <source>Ernst Schering Research Foundation workshop</source>
            <pubdate>2007</pubdate>
            <issue>61</issue>
            <fpage>89</fpage>
            <lpage>115</lpage>
            <xrefbib>
               <pubid idtype="pmpid">17249498</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Interaction networks for systems biology</p>
            </title>
            <aug>
               <au>
                  <snm>Bader</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kuhner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gavin</snm>
                  <fnm>AC</fnm>
               </au>
            </aug>
            <source>FEBS letters</source>
            <pubdate>2008</pubdate>
            <volume>582</volume>
            <issue>8</issue>
            <fpage>1220</fpage>
            <lpage>1224</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18282471</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Protein networking: insights into global functional organization of proteomes</p>
            </title>
            <aug>
               <au>
                  <snm>Pieroni</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>de la Fuente van Bentem</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mancosu</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Capobianco</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hirt</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>de la Fuente</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proteomics</source>
            <pubdate>2008</pubdate>
            <volume>8</volume>
            <issue>4</issue>
            <fpage>799</fpage>
            <lpage>816</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18297653</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>The cell as a collection of protein machines: preparing the next generation of molecular biologists</p>
            </title>
            <aug>
               <au>
                  <snm>Alberts</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1998</pubdate>
            <volume>92</volume>
            <issue>3</issue>
            <fpage>291</fpage>
            <lpage>294</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9476889</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Using protein affinity chromatography to probe structure of protein machines</p>
            </title>
            <aug>
               <au>
                  <snm>Formosa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Barry</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Alberts</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Greenblatt</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Methods Enzymol</source>
            <pubdate>1991</pubdate>
            <volume>208</volume>
            <fpage>24</fpage>
            <lpage>45</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1779837</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Algorithms for clustering data</p>
            </title>
            <aug>
               <au>
                  <snm>Jain</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Dubes</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Upper Saddle River: Prentice-Hall Advanced Reference Series archive</source>
            <pubdate>1988</pubdate>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Evaluation of clustering algorithms for protein-protein interaction networks</p>
            </title>
            <aug>
               <au>
                  <snm>Brohee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>van Helden</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>488</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1637120</pubid>
                  <pubid idtype="pmpid" link="fulltext">17087821</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Statistical Analysis of Gene Expression Microarray Data</p>
            </title>
            <aug>
               <au>
                  <snm>Chipman</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <publisher>Boca Raton, FL: Chapman and Hall</publisher>
            <pubdate>2003</pubdate>
            <fpage>159</fpage>
            <lpage>199</lpage>
         </bibl>
         <bibl id="B17">
            <title>
               <p>The Elements of Statistical Learning: Data Mining, Inference, and Prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Hastie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Friedman</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <publisher>New York: Springer</publisher>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Some methods for classification and analysis of multivariate observations</p>
            </title>
            <aug>
               <au>
                  <snm>MacQueen</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability</source>
            <pubdate>1967</pubdate>
            <volume>1</volume>
            <fpage>281</fpage>
            <lpage>297</lpage>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Least squares quantization in PCM</p>
            </title>
            <aug>
               <au>
                  <snm>Lloyd</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>IEEE Transactions on Information Theory</source>
            <pubdate>1982</pubdate>
            <volume>28</volume>
            <fpage>128</fpage>
            <lpage>137</lpage>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Network-based prediction of protein function</p>
            </title>
            <aug>
               <au>
                  <snm>Sharan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ulitsky</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Syst Biol</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <fpage>88</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1847944</pubid>
                  <pubid idtype="pmpid" link="fulltext">17353930</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Identifying functional modules in the physical interactome of Saccharomyces cerevisiae</p>
            </title>
            <aug>
               <au>
                  <snm>Pu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vlasblom</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Emili</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Greenblatt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wodak</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Proteomics</source>
            <pubdate>2007</pubdate>
            <volume>7</volume>
            <issue>6</issue>
            <fpage>944</fpage>
            <lpage>960</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17370254</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Graph Clustering by Flow Simulation</p>
            </title>
            <aug>
               <au>
                  <snm>van Dongen</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>PhD Thesis</source>
            <publisher>University of Utrecht</publisher>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B23">
            <title>
               <p>An automated method for finding molecular complexes in large protein interaction networks</p>
            </title>
            <aug>
               <au>
                  <snm>Bader</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Hogue</snm>
                  <fnm>CW</fnm>
               </au>
            </aug>
            <source>BMC bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>2</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">149346</pubid>
                  <pubid idtype="pmpid" link="fulltext">12525261</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Superparamagnetic clustering of data</p>
            </title>
            <aug>
               <au>
                  <snm>Blatt</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wiseman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Domany</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Physical review letters</source>
            <pubdate>1996</pubdate>
            <volume>76</volume>
            <issue>18</issue>
            <fpage>3251</fpage>
            <lpage>3254</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10060920</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Clustering by passing messages between data points</p>
            </title>
            <aug>
               <au>
                  <snm>Frey</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Dueck</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science (New York, NY)</source>
            <pubdate>2007</pubdate>
            <volume>315</volume>
            <issue>5814</issue>
            <fpage>972</fpage>
            <lpage>976</lpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Comment on "Clustering by passing messages between data points"</p>
            </title>
            <aug>
               <au>
                  <snm>Brusco</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Kohn</snm>
                  <fnm>HF</fnm>
               </au>
            </aug>
            <source>Science (New York, NY)</source>
            <pubdate>2008</pubdate>
            <volume>319</volume>
            <issue>5864</issue>
            <fpage>726</fpage>
            <note>author reply 726.</note>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Response to Comment on "Clustering by Passing Messages Between Data Points"</p>
            </title>
            <aug>
               <au>
                  <snm>Frey</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Dueck</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science (New York, NY)</source>
            <pubdate>2008</pubdate>
            <volume>319</volume>
            <issue>5864</issue>
            <fpage>726d</fpage>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Up-to-date catalogues of yeast protein complexes</p>
            </title>
            <aug>
               <au>
                  <snm>Pu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Turner</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Cho</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Wodak</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Nucleic acids research</source>
            <pubdate>2009</pubdate>
            <volume>37</volume>
            <issue>3</issue>
            <fpage>825</fpage>
            <lpage>831</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2647312</pubid>
                  <pubid idtype="pmpid" link="fulltext">19095691</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae</p>
            </title>
            <aug>
               <au>
                  <snm>Collins</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Kemmeren</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>XC</fnm>
               </au>
               <au>
                  <snm>Greenblatt</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Spencer</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Holstege</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Weissman</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Krogan</snm>
                  <fnm>NJ</fnm>
               </au>
            </aug>
            <source>Mol Cell Proteomics</source>
            <pubdate>2007</pubdate>
            <volume>6</volume>
            <issue>3</issue>
            <fpage>439</fpage>
            <lpage>450</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17200106</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>High-quality binary protein interaction map of the yeast interactome network</p>
            </title>
            <aug>
               <au>
                  <snm>Yu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Braun</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Yildirim</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Lemmens</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Venkatesan</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Sahalie</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hirozane-Kishikawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gebreab</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Simonis</snm>
                  <fnm>N</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science (New York, NY)</source>
            <pubdate>2008</pubdate>
            <volume>322</volume>
            <issue>5898</issue>
            <fpage>104</fpage>
            <lpage>110</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">18719252</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Cytoscape: a software environment for integrated models of biomolecular interaction networks</p>
            </title>
            <aug>
               <au>
                  <snm>Shannon</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Markiel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ozier</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Baliga</snm>
                  <fnm>NS</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Ramage</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Amin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Schwikowski</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ideker</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>11</issue>
            <fpage>2498</fpage>
            <lpage>2504</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403769</pubid>
                  <pubid idtype="pmpid" link="fulltext">14597658</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>GenePro: a cytoscape plug-in for advanced visualization and analysis of interaction networks</p>
            </title>
            <aug>
               <au>
                  <snm>Vlasblom</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Superina</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Orsi</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wodak</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Bioinformatics (Oxford, England)</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>17</issue>
            <fpage>2178</fpage>
            <lpage>2179</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16921162</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>BioGRID: a general repository for interaction datasets</p>
            </title>
            <aug>
               <au>
                  <snm>Stark</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Breitkreutz</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Reguly</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Boucher</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Breitkreutz</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Tyers</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic acids research</source>
            <pubdate>2006</pubdate>
            <issue>34 Database</issue>
            <fpage>D535</fpage>
            <lpage>539</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347471</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381927</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>

