<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-482</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Identifying protein complexes directly from high-throughput TAP data with Markov random fields</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Rungsarityotin</snm>
               <fnm>Wasinee</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>rungsari@molgen.mpg.de</email>
            </au>
            <au id="A2">
               <snm>Krause</snm>
               <fnm>Roland</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>rkrause@mpiib-berlin.mpg.de</email>
            </au>
            <au id="A3">
               <snm>Sch&#246;dl</snm>
               <fnm>Arno</fnm>
               <insr iid="I3"/>
               <email>aschoedl@think-cell.com</email>
            </au>
            <au id="A4" ca="yes">
               <snm>Schliep</snm>
               <fnm>Alexander</fnm>
               <insr iid="I1"/>
               <email>schliep@molgen.mpg.de</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Max Planck Institute for Molecular Genetics, Department of Computational Molecular Biology, Ihnestr. 73, D-14195 Berlin, Germany</p>
            </ins>
            <ins id="I2">
               <p>Max Planck Institute for Infection Biology, Department of Cellular Microbiology, Charit&#233;platz 1, D-10117 Berlin, Germany</p>
            </ins>
            <ins id="I3">
               <p>Think-cell software, Invalidenstr. 43, D-10115 Berlin, Germany</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>482</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/482</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18093306</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-482</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>18</day>
               <month>5</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>19</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>19</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Rungsarityotin et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Predicting protein complexes from experimental data remains a challenge due to limited resolution and stochastic errors of high-throughput methods. Current algorithms to reconstruct the complexes typically rely on a two-step process. First, they construct an interaction graph from the data, predominantly using heuristics, and subsequently cluster its vertices to identify protein complexes.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We propose a model-based identification of protein complexes directly from the experimental observations. Our model of protein complexes, based on Markov random fields, explicitly incorporates false negative and false positive errors and exhibits a high robustness to noise. A model-based quality score for the resulting clusters allows us to identify reliable predictions in the complete data set. Comparisons with prior work on reference data sets show favorable results, particularly for larger unfiltered data sets. Additional information on predictions, including the source code under the GNU General Public License, can be found at http://algorithmics.molgen.mpg.de/Static/Supplements/ProteinComplexes.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We can identify complexes in the data obtained from high-throughput experiments without prior elimination of proteins or weak interactions. The few parameters of our model, which does not rely on heuristics, can be estimated using maximum likelihood without a reference data set. This is particularly important for protein complex studies in organisms that do not have an established reference frame of known protein complexes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Recent advances in proteomic technologies allow comprehensive investigations of protein-protein interactions on a genomic scale. Interacting proteins provide detailed information on basic biomolecular mechanisms and are a valuable tool in the exploration of cellular life. Protein complexes are physical entities formed by stable associations of several proteins to perform a common, often complex function; in fact, most basic cellular processes, such as transcription, translation or cell cycle control, are carried out by protein complexes. The goal of our work is to identify protein complexes directly from experimental results obtained with co-immunoprecipitation techniques, in particular the important tandem affinity purification (TAP) approach <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. TAP employs a fusion protein carrying an affinity tag that is used to bind the protein to a matrix; subsequent washing and cleavage of the tag allow the complexes to be obtained under almost native conditions. The proteins in the resulting mixture are usually identified by mass spectrometry. Genome-wide screens using TAP are available for the yeast <it>Saccharomyces cerevisiae </it><abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>.</p>
         <p>In prior approaches for predicting protein complexes, the experimental observations had to be condensed into a protein interaction graph. A protein-protein interaction graph is an undirected graph <it>G </it>= (<it>V</it>, <it>E</it>) where <it>V </it>is a set of nodes representing proteins and <it>E </it>is a set of edges. An edge indicates, depending on the particular model, either a physical interaction or protein complex co-membership of two proteins and may be weighted to designate interaction probability. All approaches that use an unweighted (e.g., thresholded) interaction graph as an intermediate step suffer from the problem that the uncertainty contained in the observation is no longer represented in the interaction graph, and cannot be properly accounted for when computing the clustering.</p>
         <p>Moreover, most existing techniques for predicting protein complexes rely on heuristics for further analysis of the protein interaction graph. Often several parameters have to be chosen, usually with very little guidance from theory. Instead, parameters are optimized on benchmark data sets <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp> and thus depend on the existence of such data sets for successful prediction. Other, more stringent algorithms suffer from the requirement of having an absolute measure of an interaction as input <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>.</p>
         <p>In contrast to previous methods that rely on constructing an intermediate interaction graph, our model-based approach uses the experimental measurements directly, which should provide a more rigorous framework for protein-protein interaction analysis. Our probabilistic model explicitly and quantitatively states the assumptions about how protein interactions are revealed by the experimental technique. A suitable algorithm then uses this model to compute a clustering.</p>
         <p>For this work, we focus on partitioning proteins into complexes. Furthermore, any pair of proteins is assumed either to interact or not, independent of the context of other proteins in which it appears. As a consequence, clusters never overlap and each protein is assigned to exactly one cluster. Several proteins are known to be part of more than one protein complex. While this is biologically relevant, only a few proteins are <it>bona fide </it>members of many complexes <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, and even more complex methods, such as the one used by Gavin <it>et al.</it>, identify largely non-overlapping solutions (cores) as basic, reliable elements <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <p>Our work is inspired by an approach for evaluating protein-protein interactions from TAP data by Gilchrist <it>et al. </it><abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, which calculates maximum-likelihood estimates of the false negative error rate, the false positive error rate and the prior probability of interaction, but does not compute protein complexes. Our model uses their observation model, but we also compute likely protein complexes along with maximum-likelihood estimates of the error rates.</p>
         <p>There are two extreme cases in the interpretation of purification experiments. One is the minimally connected spoke model, which converts the purification results into pairwise interactions between the bait and its preys only. The other is the maximally connected matrix model, which assumes all proteins in a given purification to be connected to all others <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. While the real topology of the set of proteins must lie between these two extremes, most previous work has focused on the spoke model of interaction <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B9">9</abbr></abbrgrp>. From a sampling perspective, each purification with a given bait protein and its preys can be seen as a trial to gather information on which of these proteins interact. For illustration, we use the example given in <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> for a scenario involving four proteins <it>v</it>, <it>w</it>, <it>x</it>, <it>y </it>(Figure <figr fid="F1">1</figr>). Assuming the spoke model and choosing <it>v </it>as the bait protein, we can view this experiment as a trial to observe three interactions, between <it>v </it>and each of the proteins <it>w</it>, <it>x</it>, <it>y</it>. Repeating this experiment gives a second trial to observe these three interactions. A third experiment, now using protein <it>w </it>as the bait, provides a third trial to observe an interaction between <it>v </it>and <it>w</it>, as well as the first trial to observe an interaction between <it>w </it>and each of the proteins <it>x </it>and <it>y</it>. Combining these three experiments, we have three trials for observing an interaction between <it>v </it>and <it>w</it>, two trials for observing an interaction between <it>v </it>and <it>x </it>and no trials for observing an interaction between <it>x </it>and <it>y </it>(see Figure <figr fid="F1">1</figr>). We define <it>t </it>as the number of trials in which we might observe an interaction between two proteins. 
For example, from these three experiments and assuming the spoke model, <it>t </it>is equal to 3, 2, 1 and 0 for the protein pairs (<it>v</it>, <it>w</it>), (<it>v</it>, <it>x</it>), (<it>w</it>, <it>x</it>) and (<it>x</it>, <it>y</it>), respectively. Assuming the matrix model, <it>t </it>is equal to 3 for all protein pairs. Notice that in the matrix model the pair (<it>x</it>, <it>y</it>) is tested 3 times while in the spoke model this pair is not tested at all (<it>t </it>= 0).</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Observational model for three hypothetical trials</p>
            </caption>
            <text>
               <p><b>Observational model for three hypothetical trials</b>. Two proteins are connected by an edge if their interaction is tested by a trial. The last row summarizes the observations from the three trials as (<it>t</it>, <it>s</it>) values, assuming the spoke and the matrix model, respectively. The spoke model counts pairwise interactions only between bait and preys. The matrix model counts all pairs of proteins in a purification. As a result, the matrix model creates more unsuccessful trials.</p>
            </text>
            <graphic file="1471-2105-8-482-1"/>
         </fig>
         <p>However, in each trial we may or may not observe an interaction. Consequently, we define <it>s </it>(for success) as the number of trials in which we observe the two proteins to interact (0 &#8804; <it>s </it>&#8804; <it>t</it>). In Figure <figr fid="F1">1</figr>, we illustrate for the spoke and the matrix model how the results of the three experiments can be summarized as a set of (<it>t</it>, <it>s</it>) values for each possible pair of proteins; these values form the basis of our observation model. Given these counts, an interaction probability can be calculated using a statistical model of interaction <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. In this work, we directly use these counts to build a Markov random field (MRF) model of protein complexes and to estimate the number of clusters as well as the false negative and false positive rates.</p>
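         <p>The counting rules for <it>t </it>and <it>s </it>can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation. The prey lists of the three hypothetical purifications are our own assumptions, since the text only fixes the baits, and the helper names <it>spoke_counts</it> and <it>matrix_counts</it> are hypothetical. Under the spoke model, a purification is a trial for a pair only if one of the two proteins is the bait, and a success if the other one was recovered as a prey. Under the matrix model, every purification is a trial for every pair, and a success if both proteins occur in it.</p>
         <p>
```python
from itertools import combinations

# Hypothetical purifications modeled on Figure 1: (bait, preys found).
# The prey sets are assumptions made for illustration only.
PURIFICATIONS = [
    ("v", {"w", "x"}),
    ("v", {"w"}),
    ("w", {"v", "x", "y"}),
]
PROTEINS = ["v", "w", "x", "y"]


def spoke_counts(purifications, proteins):
    """Return {(a, b): (t, s)} under the spoke model.

    A purification is a trial for a pair only if one member is the bait;
    it is a success if the other member was recovered as a prey.
    """
    counts = {}
    for a, b in combinations(sorted(proteins), 2):
        t = s = 0
        for bait, preys in purifications:
            if bait == a or bait == b:
                t += 1
                other = b if bait == a else a
                if other in preys:
                    s += 1
        counts[(a, b)] = (t, s)
    return counts


def matrix_counts(purifications, proteins):
    """Return {(a, b): (t, s)} under the matrix model.

    Every purification is a trial for every pair; it is a success if
    both proteins occur in the purification (as bait or prey).
    """
    counts = {}
    for a, b in combinations(sorted(proteins), 2):
        t = len(purifications)
        s = sum(1 for bait, preys in purifications
                if {a, b}.issubset(preys | {bait}))
        counts[(a, b)] = (t, s)
    return counts


if __name__ == "__main__":
    for name, fn in [("spoke", spoke_counts), ("matrix", matrix_counts)]:
        print(name, fn(PURIFICATIONS, PROTEINS))
```
         </p>
         <p>With these assumed prey sets, the spoke model yields <it>t </it>= 3, 2, 1 and 0 for the pairs (<it>v</it>, <it>w</it>), (<it>v</it>, <it>x</it>), (<it>w</it>, <it>x</it>) and (<it>x</it>, <it>y</it>), as in the text. The matrix model yields <it>t </it>= 3 for every pair, so that, for example, (<it>x</it>, <it>y</it>) becomes one success out of three trials.</p>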
         <p>Markov random fields have been successfully applied as probabilistic models in many research areas, e.g. as a model for image segmentation in image processing <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. In biological network analysis, MRFs were used to model protein-protein interaction networks in order to predict the functions of uncharacterized proteins from proteins with known functions <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. They were also used to discover molecular pathways, for example by combining an MRF model of the protein-interaction graph with gene expression data <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Our model differs from these previous works in that we use MRFs to model protein complexes directly, without an intermediate interaction graph, and incorporate the observation error into the formulation of the model. We then apply Mean Field Annealing to estimate the assignment of proteins to complexes.</p>
         <p>For estimating protein-protein interaction graphs, several protein-protein interaction databases are available, in particular for the yeast proteome. They mostly rely on data from the yeast two-hybrid system <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>, on tandem affinity purification-mass spectrometry analysis of protein complexes <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B15">15</abbr></abbrgrp>, and on individual studies that focus on particular aspects <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. Creating a protein interaction network from high-throughput experiments is difficult due to high error rates; with present techniques, the resulting networks are therefore often inaccurate <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Current approaches merge the results of different types of experiments, such as two-hybrid systems, mRNA co-expression and co-immunoprecipitation techniques such as TAP-MS. In doing so, much of the information on experimental details, which we would like to exploit, is lost. We therefore focus on TAP-MS results as the experimental data source, since TAP-MS outperforms other techniques in accuracy and coverage in yeast <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>.</p>
         <p>In the following, we briefly describe two previously published methods that predict protein complexes from pairwise protein-protein interactions and that are most comparable and relevant to our approach <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B21">21</abbr></abbrgrp>. Molecular Complex Detection (MCODE) <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> detects densely connected regions in a protein-interaction graph. First, it assigns each vertex a weight based on its local neighborhood density, a measure related to the clustering coefficient of a vertex. Then, starting from the vertex with the highest weight, it recursively expands a cluster by including neighboring vertices whose weights are above a given threshold. Vertices with weights below the threshold are not considered by MCODE. The method can retrieve overlapping complexes, but in practice many proteins are left unassigned.</p>
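         <p>To make the seed-and-grow idea concrete, the following Python sketch weights each vertex by its local clustering coefficient and grows clusters from the highest-weighted seeds. It is a deliberate simplification under our own assumptions, not MCODE itself: MCODE's actual vertex weighting, its post-processing steps and its ability to return overlapping complexes are not reproduced, and all function names are hypothetical.</p>
         <p>
```python
from itertools import combinations


def make_graph(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def clustering_coefficient(graph, v):
    """Fraction of pairs of neighbours of v that are themselves adjacent."""
    nbrs = graph[v]
    k = len(nbrs)
    if k in (0, 1):
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return links / (k * (k - 1) / 2)


def seed_and_grow(graph, threshold):
    """Grow non-overlapping clusters from high-weight seeds (simplified, not MCODE)."""
    weights = {v: clustering_coefficient(graph, v) for v in graph}
    assigned, clusters = set(), []
    # Visit candidate seeds from the highest weight downwards.
    for seed in sorted(graph, key=lambda v: -weights[v]):
        if seed in assigned or threshold > weights[seed]:
            continue  # already used, or too weak to be considered at all
        cluster, frontier = {seed}, [seed]
        while frontier:
            u = frontier.pop()
            for w in graph[u]:
                if w not in cluster and w not in assigned and weights[w] >= threshold:
                    cluster.add(w)
                    frontier.append(w)
        assigned |= cluster
        clusters.append(cluster)
    return clusters


# Toy graph: a 4-clique {a, b, c, d} with a tail d - e - f.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("e", "f")]

if __name__ == "__main__":
    g = make_graph(EDGES)
    print(seed_and_grow(g, threshold=0.5))
```
         </p>
         <p>In this toy graph only the clique {a, b, c, d} is returned; the tail vertices e and f fall below the threshold and stay unassigned, mirroring the behavior described above.</p>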
         <p>Another popular approach applies the Markov Clustering algorithm (MCL) <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp> to predict protein complexes, usually after low-quality interactions have been removed from the data set. In the application of MCL by Krogan <it>et al. </it><abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, several machine learning techniques are first combined to model interaction probabilities from mass spectrometry results. Next, an intermediate interaction graph is generated by removing interactions with a probability lower than a given threshold. MCL is then applied to the resulting graph to predict complexes. MCL simulates a flow on the graph by alternating between computing powers of the transition matrix associated with the interaction graph (expansion) and raising its entries to a power followed by renormalization (inflation). Its two parameters are the expansion and inflation values, the latter influencing the number of clusters. MCL produces non-overlapping clusters.</p>
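         <p>The expansion and inflation steps can be illustrated with a compact, dependency-free Python sketch of the standard MCL iteration. It follows the textbook scheme (self-loops added, column-stochastic matrix, alternating expansion and inflation, clusters read off the attractor rows) but omits the pruning and sparse-matrix optimizations of production implementations. The function names and default parameter values are our assumptions, not those used by Krogan <it>et al.</it></p>
         <p>
```python
def normalize_columns(m):
    """Scale each column of the square matrix m (in place) to sum to one."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a, b):
    """Plain product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, expansion=2, inflation=2.0, iterations=50, prune=1e-6):
    """Return a list of clusters (sets of vertex indices) found by MCL."""
    n = len(adjacency)
    # Add self-loops and turn the adjacency matrix into a transition matrix.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(iterations):
        # Expansion: take the e-th power of the transition matrix.
        expanded = m
        for _ in range(expansion - 1):
            expanded = matmul(expanded, m)
        # Inflation: raise entries to a power, drop tiny ones, renormalize.
        m = normalize_columns([[x ** inflation if x > prune else 0.0 for x in row]
                               for row in expanded])
    # Rows that keep mass are attractors; their non-zero columns form a cluster.
    clusters = []
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > prune}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Toy graph: two disjoint triangles (0-1-2, 3-4-5) and an isolated vertex 6.
ADJ = [[0] * 7 for _ in range(7)]
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    ADJ[a][b] = ADJ[b][a] = 1

if __name__ == "__main__":
    print(mcl(ADJ))
```
         </p>
         <p>On this toy graph the sketch returns the two triangles and the isolated vertex as a singleton, which illustrates why MCL solutions can contain many singletons. Raising the inflation value tends to split the graph into more, smaller clusters.</p>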
         <p>Following the statistical approach to modeling protein interactions <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, we consider each purification experiment to be an independent set of observations of the interaction or non-interaction of proteins. We model the assignment of proteins to complexes as a Markov random field (MRF). The model incorporates the observational error as false positive and false negative error rates, which are assumed to be identical for all purifications. The cluster assignment is computed using Mean Field Annealing (MFA), which requires two input parameters, the number of clusters <it>K </it>and the log-ratio of error rates <it>&#968;</it>. We systematically estimate both the cluster assignment of proteins and the false positive and false negative error rates using maximum likelihood. We explore both the spoke and the matrix model and compare the solutions to other published sets of protein complexes. The data sets and a detailed description of the methods can be found in the Methods section.</p>
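         <p>The interplay between the observation model and Mean Field Annealing can be sketched in Python as follows. This is a simplified illustration under our own assumptions, not the implementation described in the Methods section. We fix <it>&#957; </it>and <it>&#966; </it>directly instead of working with <it>&#968;</it>. We assume binomial counts: for a pair in the same complex each trial succeeds with probability 1 - <it>&#957;</it>, and for a pair in different complexes with probability <it>&#966;</it>. We also use a hand-picked annealing schedule. Each pair then contributes a coupling equal to the cost of placing it in the same cluster minus the cost of separating it. MFA iteratively updates soft cluster memberships while the inverse temperature <it>&#946; </it>is increased.</p>
         <p>
```python
import math
import random
from itertools import combinations


def coupling(t, s, nu, phi):
    """Cost of putting a pair in the same cluster minus the cost of separating it.

    Binomial observation model (assumption): within a complex each of the t
    trials succeeds with probability 1 - nu, between complexes with
    probability phi. Binomial coefficients cancel in the difference.
    Negative values favour placing the pair in the same cluster.
    """
    same = -(s * math.log(1.0 - nu) + (t - s) * math.log(nu))
    diff = -(s * math.log(phi) + (t - s) * math.log(1.0 - phi))
    return same - diff


def mean_field_annealing(n, observations, k, nu, phi,
                         betas=(0.1, 0.3, 1.0, 3.0, 10.0), sweeps=30, seed=0):
    """Assign proteins 0..n-1 to k clusters; observations maps (i, j) to (t, s)."""
    rng = random.Random(seed)
    neighbours = [[] for _ in range(n)]
    for (i, j), (t, s) in observations.items():
        J = coupling(t, s, nu, phi)
        neighbours[i].append((j, J))
        neighbours[j].append((i, J))
    # Random soft assignments q[i][c]; every row sums to one.
    q = []
    for _ in range(n):
        row = [rng.random() for _ in range(k)]
        total = sum(row)
        q.append([x / total for x in row])
    for beta in betas:  # annealing: gradually lower the temperature
        for _ in range(sweeps):
            for i in range(n):
                # Mean field felt by protein i in each cluster c.
                field = [sum(J * q[j][c] for j, J in neighbours[i])
                         for c in range(k)]
                lowest = min(field)  # shift for numerical stability
                weights = [math.exp(-beta * (f - lowest)) for f in field]
                total = sum(weights)
                q[i] = [w / total for w in weights]
    # Hard assignment: most probable cluster per protein.
    return [max(range(k), key=lambda c: q[i][c]) for i in range(n)]


# Toy data: complexes {0, 1, 2} and {3, 4, 5}; 4 trials per pair,
# 3 successes within complexes and none between them.
TOY_OBSERVATIONS = {
    (i, j): (4, 3 if (i // 3) == (j // 3) else 0)
    for i, j in combinations(range(6), 2)
}

if __name__ == "__main__":
    print(mean_field_annealing(6, TOY_OBSERVATIONS, k=2, nu=0.3, phi=0.01))
```
         </p>
         <p>On the toy data the couplings are strongly negative within the two planted complexes and positive between them, so the annealed assignment recovers the partition {0, 1, 2}, {3, 4, 5} up to a permutation of cluster labels. On real data, <it>K </it>would be scanned and the error rates estimated by maximum likelihood, as described above.</p>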
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Performance on simulated data</p>
            </st>
            <p>To test the convergence of our algorithm irrespective of the starting point, we first ran it on simulated data. We created the data from a set of <it>N </it>nodes, which we randomly assigned to <it>K </it>clusters. The number of trials <it>t </it>was the same for each pair of nodes, with the number of successes <it>s </it>reflecting the specified values of the false negative rate <it>&#957; </it>and the false positive rate <it>&#966;</it>. We ran the algorithm multiple times with different random starting points and initial values for <it>&#968;</it>. We tested the algorithm on two problem sizes: (1) a small size, <it>N </it>= 500, <it>K </it>= 11, and (2) a large size, <it>N </it>= 3000, <it>K </it>= 500. We set <it>&#966; </it>to 0.005, which is similar to the value for the MIPS data (Table <tblr tid="T1">1</tblr>), and tested two values of <it>&#957;</it>: 0.2 and 0.5 <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. We computed the average minimum cost for a given number of clusters, as shown in Figures <figr fid="F2">2(a)</figr> and <figr fid="F2">2(c)</figr>. Figures <figr fid="F2">2(b)</figr> and <figr fid="F2">2(d)</figr> depict the quality of our solution as the geometric mean of sensitivity (SN) and specificity (SP).</p>
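            <p>A data generator of the kind described above can be sketched as follows. The helper names, the number of trials and the random seed are our own assumptions. Each of the <it>t </it>trials of a pair succeeds with probability 1 - <it>&#957; </it>if both nodes share a cluster and with probability <it>&#966; </it>otherwise. The second function recovers the empirical error rates from the planted clusters, which serves as a sanity check before the clustering itself is run.</p>
            <p>
```python
import random
from itertools import combinations


def simulate(n, k, t, nu, phi, seed=0):
    """Plant n nodes in k random clusters and draw (t, s) for every pair."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n)]
    observations = {}
    for i, j in combinations(range(n), 2):
        p = 1.0 - nu if labels[i] == labels[j] else phi
        # Each of the t trials succeeds independently with probability p.
        s = sum(1 for _ in range(t) if p > rng.random())
        observations[(i, j)] = (t, s)
    return labels, observations


def empirical_rates(labels, observations):
    """Empirical false negative and false positive rates given the true labels."""
    within_t = within_s = between_t = between_s = 0
    for (i, j), (t, s) in observations.items():
        if labels[i] == labels[j]:
            within_t += t
            within_s += s
        else:
            between_t += t
            between_s += s
    return 1.0 - within_s / within_t, between_s / between_t


if __name__ == "__main__":
    labels, obs = simulate(n=200, k=11, t=5, nu=0.2, phi=0.005)
    print(empirical_rates(labels, obs))
```
            </p>
            <p>For a few hundred nodes, the empirical rates already lie close to the specified <it>&#957; </it>and <it>&#966;</it>, so any remaining error in the clustering can be attributed to the optimization rather than to the data generator.</p>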
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Maximum likelihood solution for the spoke model (<it>&#968; </it>= 3.5) and the matrix model (<it>&#968; </it>= 10.0). We choose the number of clusters that maximizes the likelihood by searching over a range of values of <it>K</it>. The estimated false negative rate is denoted by <it>&#957;</it>* and the estimated false positive rate by <it>&#966;</it>*. For comparison, we show the error estimates based on the MIPS complexes, <it>&#957;</it><sub><it>MIPS </it></sub>and <it>&#966;</it><sub><it>MIPS</it></sub>, restricted to proteins with MIPS annotation. See also Table 2.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>Dataset</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>
                           <it>K</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p><it>&#957;</it>*</p>
                     </c>
                     <c ca="right">
                        <p><it>&#966;</it>*</p>
                     </c>
                     <c ca="right">
                        <p>
                           <it>&#957;</it>
                           <sub>
                              <it>MIPS</it>
                           </sub>
                        </p>
                     </c>
                     <c ca="right">
                        <p>
                           <it>&#966;</it>
                           <sub>
                              <it>MIPS</it>
                           </sub>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Gavin02</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Spoke model</p>
                     </c>
                     <c ca="right">
                        <p>393</p>
                     </c>
                     <c ca="right">
                        <p>0.423</p>
                     </c>
                     <c ca="right">
                        <p>1.3 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.598</p>
                     </c>
                     <c ca="right">
                        <p>6.5 &#215; 10<sup>-3</sup></p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Matrix model</p>
                     </c>
                     <c ca="right">
                        <p>310</p>
                     </c>
                     <c ca="right">
                        <p>0.752</p>
                     </c>
                     <c ca="right">
                        <p>1.7 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.717</p>
                     </c>
                     <c ca="right">
                        <p>5.2 &#215; 10<sup>-3</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Gavin06</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>Spoke model</p>
                     </c>
                     <c ca="right">
                        <p>698</p>
                     </c>
                     <c ca="right">
                        <p>0.547</p>
                     </c>
                     <c ca="right">
                        <p>2.4 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.637</p>
                     </c>
                     <c ca="right">
                        <p>8.3 &#215; 10<sup>-3</sup></p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Matrix model</p>
                     </c>
                     <c ca="right">
                        <p>550</p>
                     </c>
                     <c ca="right">
                        <p>0.807</p>
                     </c>
                     <c ca="right">
                        <p>2.7 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c ca="right">
                        <p>0.901</p>
                     </c>
                     <c ca="right">
                        <p>6.4 &#215; 10<sup>-3</sup></p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>MRF on simulated data</p>
               </caption>
               <text>
                  <p><b>MRF on simulated data</b>. We tested two sets of simulated data, (1) <it>N </it>= 500, <it>K </it>= 11 and (2) <it>N </it>= 3000, <it>K </it>= 500, with the false positive rate <it>&#966; </it>set to 0.005 and the false negative rate <it>&#957; </it>set to 0.2 or 0.5. With <it>&#957; </it>= 0.2 (2(a), 2(b)), MRF recovers the true clustering, with the minimum negative log-likelihood attained at 11 clusters. Additional clusters do not reduce the cost any further; they simply remain empty. For <it>&#957; </it>= 0.5, the accuracy is lower and more (empty) clusters are needed to reach convergence. In 2(c) and 2(d), the convergence rate fluctuates more.</p>
               </text>
               <graphic file="1471-2105-8-482-2"/>
            </fig>
            <p>For the small problem size, Figures <figr fid="F2">2(a)</figr> and <figr fid="F2">2(b)</figr> show that the algorithm converges to the correct solution, with correct cluster assignments as well as correct estimates of the model parameters <it>&#957; </it>and <it>&#966;</it>. With the high false negative rate of 0.5, the algorithm needs more clusters, some of which remain empty, to arrive at the correct solution. For the larger problem size of <it>K </it>= 500, we searched <it>K </it>from 400 to 600 in steps of 20. The estimates of the error rates are approximately correct and the negative log-likelihood reaches its minimum around <it>K </it>= 480 (see Figure <figr fid="F2">2(c)</figr>), but we only come close to the correct cluster assignment, with about 85% of all pairs correctly identified.</p>
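            <p>The pair-based quality measures can be sketched as below. The exact definitions of SN and SP are given in the Methods section and not repeated here, so we assume a pair-counting version for illustration. SN is the fraction of pairs that share a true cluster and are also placed together. SP is the fraction of pairs in different true clusters that are also kept apart. The combined score is their geometric mean. The function name is hypothetical; all four numbers are invariant to relabeling of clusters.</p>
            <p>
```python
import math
from itertools import combinations


def pair_scores(true_labels, predicted_labels):
    """Return (SN, SP, geometric mean, fraction of correctly classified pairs)."""
    tp = fn = fp = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = predicted_labels[i] == predicted_labels[j]
        if same_true and same_pred:
            tp += 1          # co-complexed pair correctly kept together
        elif same_true:
            fn += 1          # co-complexed pair wrongly split
        elif same_pred:
            fp += 1          # unrelated pair wrongly merged
        else:
            tn += 1          # unrelated pair correctly kept apart
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sn, sp, math.sqrt(sn * sp), accuracy


if __name__ == "__main__":
    # Protein 2 is misassigned to the second complex.
    print(pair_scores([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]))
```
            </p>
            <p>In this example, misplacing a single protein lowers SN, SP, their geometric mean and the fraction of correctly classified pairs to 2/3 each.</p>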
            <p>Ideally, we can estimate the number of clusters <it>K </it>from the likelihood of the solution for each <it>K</it>. As <it>K </it>increases, the likelihood of the computed solution increases as long as the added clusters are used for a better assignment of proteins. The likelihood reaches its maximum once all proteins are correctly assigned; any additional clusters remain empty and the likelihood increases no further (Figure <figr fid="F2">2(a)</figr>). In practice, with large problem sizes, the solution does not converge to the optimal cluster assignment, in particular when noise is present. However, the flattening of the likelihood curve indicates that the correct number of clusters has been reached (Figure <figr fid="F2">2(c)</figr>).</p>
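            <p>The flattening criterion can be turned into a simple rule for picking <it>K</it>. The sketch below is our own heuristic, not necessarily the procedure used for the results reported here. It scans the negative log-likelihood (cost) over increasing <it>K </it>and returns the first <it>K </it>after which adding clusters improves the cost by no more than a relative tolerance. Both the tolerance and the cost values in the example are hypothetical.</p>
            <p>
```python
def choose_k(costs, rel_tol=1e-3):
    """Pick K where the cost curve flattens.

    costs: iterable of (K, negative log-likelihood) pairs.
    Returns the first K whose successor improves the cost by at most
    rel_tol times the current cost; falls back to the largest K scanned.
    """
    ordered = sorted(costs)
    for (k, cost), (_, next_cost) in zip(ordered, ordered[1:]):
        if rel_tol * abs(cost) >= cost - next_cost:
            return k
    return ordered[-1][0]


# Hypothetical cost curve that flattens once K reaches 11.
CURVE = [(8, 1000.0), (9, 900.0), (10, 850.0), (11, 800.0),
         (12, 800.0), (13, 799.9)]

if __name__ == "__main__":
    print(choose_k(CURVE))  # 11
```
            </p>
            <p>On noisy curves such as the one for the large simulated problem, a single comparison between neighboring values of <it>K </it>can be misleading, so in practice one would smooth the curve or average over several optimization runs before applying such a rule.</p>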
         </sec>
         <sec>
            <st>
               <p>Clustering of data sets obtained in high-throughput experiments</p>
            </st>
            <p>For clustering proteins, we compute clusters for two types of observation models: the spoke model and the matrix model of protein interactions. To find a maximum likelihood solution, we first use a large number of clusters to search for a value of <it>&#968; </it>that maximizes the likelihood. For that <it>&#968;</it>, we then run the optimization for different numbers of clusters. We do three runs per value of <it>K </it>to control for the influence of the starting point of the optimization, and keep the run with the highest likelihood. The maximum likelihood solutions are shown in Table <tblr tid="T1">1</tblr>. The estimated false positive rate <it>&#966;</it>* of our clustering solution is on the order of 10<sup>-3</sup>, which agrees with previously published results <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Note that, by our definition, the false positive rate is the number of interactions observed between distinct complexes of the model divided by the number of all trials between distinct complexes present in the observation. For example, given our cluster solution for the spoke model, there are approximately 6 million trials between distinct complexes (2760 proteins), and among them we observe about 14100 false positives. The number of trials within complexes is much smaller, about 14000 in total, but only about half of them are successful, resulting in a false negative rate of approximately 0.5. Measured instead against the experimentally observed interactions, about 70% are false positives; however, this is not the definition of the error rates used by our model.</p>
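            <p>The arithmetic behind these error rates is simple enough to spell out. The sketch below plugs in the approximate counts quoted in the text: roughly 6 million trials and 14100 successes between complexes, and roughly 14000 trials within complexes, of which about half succeed (the within-complex success count of 7000 is our rounding). It contrasts the model's false positive rate, defined per trial between complexes, with the share of observed interactions that fall between complexes. The function name is hypothetical.</p>
            <p>
```python
def error_rates(within_trials, within_successes, between_trials, between_successes):
    """Return (nu, phi, share of observed interactions that are false positives)."""
    nu = 1.0 - within_successes / within_trials        # missed within-complex trials
    phi = between_successes / between_trials           # per-trial false positive rate
    observed = within_successes + between_successes    # all observed interactions
    return nu, phi, between_successes / observed


if __name__ == "__main__":
    nu, phi, fp_share = error_rates(14_000, 7_000, 6_000_000, 14_100)
    print(f"nu = {nu:.2f}, phi = {phi:.2e}, false positive share = {fp_share:.0%}")
```
            </p>
            <p>This prints a false negative rate of 0.50, a per-trial false positive rate of about 2.4 &#215; 10<sup>-3</sup> (in line with the order of magnitude in Table <tblr tid="T1">1</tblr>), and a false positive share of 67%, close to the roughly 70% quoted above. The two false positive figures differ by orders of magnitude only because their denominators differ: all trials between complexes versus only the observed interactions.</p>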
            <p>We have also calculated the error rates based on the MIPS data <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> (Table <tblr tid="T1">1</tblr>). The false negative rate is similar to the one we estimated for our solution. The false positive rate is of the same order of magnitude, but 2 to 5 times larger than the false positive rate computed for our solution. The decisions underlying the manually curated MIPS data set were thus about as conservative as our algorithm in assigning proteins to the same cluster. Later, we discuss a method to distinguish reliable from less reliable clusters in our solution. As has been reported earlier, false positive rates in TAP-MS experiments are much lower than those of other experimental techniques <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>.</p>
            <p>The approach presented here does not rely on a benchmark set. However, to evaluate how well the algorithm extracts relevant information from high-throughput data sets, we compared its results to those of other algorithms (MCL, MCODE) and to the protein complexes published with the data sets. We use two data sets, <it>Gavin02 </it>and <it>Gavin06 </it><abbrgrp><abbr bid="B2">2</abbr><abbr bid="B15">15</abbr></abbrgrp>, to compare the results to earlier studies. The first data set was used in previous work to benchmark predictions <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> and is essentially a subset of the second. See Table <tblr tid="T2">2</tblr> and the Methods section for a description of the data sets.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Sizes of the data sets and of the clustering results. MCL and MRF consider the same number of proteins, namely all proteins in the experiments. However, their clustering solutions differ: MCL produces more singletons than MRF.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c ca="left">
                        <p>Dataset</p>
                     </c>
                     <c ca="right">
                        <p>Num. Proteins</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>MCL</p>
                     </c>
                     <c ca="right">
                        <p>MRF</p>
                     </c>
                     <c ca="right">
                        <p>MCODE</p>
                     </c>
                     <c ca="right">
                        <p>Gavin06 (all)</p>
                     </c>
                     <c ca="right">
                        <p>Gavin06 (core)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Gavin02</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>1390</p>
                     </c>
                     <c ca="center">
                        <p>Proteins clustered</p>
                     </c>
                     <c ca="right">
                        <p>1390</p>
                     </c>
                     <c ca="right">
                        <p>1390</p>
                     </c>
                     <c ca="right">
                        <p>112</p>
                     </c>
                     <c ca="right">
                        <p>-</p>
                     </c>
                     <c ca="right">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>with MIPS</p>
                     </c>
                     <c ca="right">
                        <p>494</p>
                     </c>
                     <c ca="right">
                        <p>494</p>
                     </c>
                     <c ca="right">
                        <p>53</p>
                     </c>
                     <c ca="right">
                        <p>-</p>
                     </c>
                     <c ca="right">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>with Reguly</p>
                     </c>
                     <c ca="right">
                        <p>136</p>
                     </c>
                     <c ca="right">
                        <p>136</p>
                     </c>
                     <c ca="right">
                        <p>20</p>
                     </c>
                     <c ca="right">
                        <p>-</p>
                     </c>
                     <c ca="right">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Gavin06</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>2760</p>
                     </c>
                     <c ca="center">
                        <p>Proteins clustered</p>
                     </c>
                     <c ca="right">
                        <p>2760</p>
                     </c>
                     <c ca="right">
                        <p>2760</p>
                     </c>
                     <c ca="right">
                        <p>243</p>
                     </c>
                     <c ca="right">
                        <p>1488</p>
                     </c>
                     <c ca="right">
                        <p>1147</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>with MIPS</p>
                     </c>
                     <c ca="right">
                        <p>819</p>
                     </c>
                     <c ca="right">
                        <p>819</p>
                     </c>
                     <c ca="right">
                        <p>141</p>
                     </c>
                     <c ca="right">
                        <p>633</p>
                     </c>
                     <c ca="right">
                        <p>492</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>with Reguly</p>
                     </c>
                     <c ca="right">
                        <p>520</p>
                     </c>
                     <c ca="right">
                        <p>520</p>
                     </c>
                     <c ca="right">
                        <p>120</p>
                     </c>
                     <c ca="right">
                        <p>429</p>
                     </c>
                     <c ca="right">
                        <p>336</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>Because MCL and MCODE require an interaction graph as input, we construct one for each data set using the spoke model. MCL accepts both weighted and unweighted graphs as input. For the weighted interaction graph, we compute the interaction probabilities using the statistical model of <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, without applying a threshold.</p>
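            <p>To make the graph construction concrete, the sketch below derives unweighted edge sets from TAP purifications under the spoke model (bait linked only to its preys) and, for contrast, the matrix model (all co-purified proteins linked). The input layout, one hypothetical (bait, preys) tuple per purification, and the function names are our assumptions; the edge weights from the statistical model of [9] are not reproduced here.</p>

```python
from itertools import combinations

# Hypothetical toy input: one (bait, preys) tuple per TAP purification.
purifications = [
    ("A", ["B", "C", "D"]),
    ("B", ["A", "C"]),
]


def spoke_edges(purifications):
    """Spoke model: connect each bait only to the preys it retrieved."""
    edges = set()
    for bait, preys in purifications:
        for prey in preys:
            if prey != bait:
                # frozenset makes the edge undirected, so A-B and B-A coincide
                edges.add(frozenset((bait, prey)))
    return edges


def matrix_edges(purifications):
    """Matrix model: connect every pair of proteins seen in the same purification."""
    edges = set()
    for bait, preys in purifications:
        members = sorted(set(preys) | {bait})
        for a, b in combinations(members, 2):
            edges.add(frozenset((a, b)))
    return edges


spoke = spoke_edges(purifications)
matrix = matrix_edges(purifications)
print(sorted(tuple(sorted(e)) for e in spoke))
print(len(spoke), len(matrix))  # 4 6
```

            <p>The prey-prey pair C-D appears only in the matrix model, which illustrates why the matrix model yields a much denser graph than the spoke model for the same purifications.</p>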
            <p>When setting the inflation parameter for MCL, we found that the optimal setting published in <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> is suitable for the smaller data set (<it>Gavin02</it>), but yields a biologically implausible, small number of clusters for the larger <it>Gavin06 </it>data set. We therefore explored several inflation parameters from the recommended range of 1.1 to 5.0. With an inflation parameter of 3.0, the number of clusters containing at least two proteins is close to the published number of 487 complexes <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. The trade-off between sensitivity and specificity across inflation parameters is shown in Figure <figr fid="F3">3</figr>. We summarize the parameter settings for all three algorithms in Table <tblr tid="T3">3</tblr>. To compare the clustering algorithms, we evaluate each clustering solution with the same performance measures against the MIPS and Reguly data sets <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B25">25</abbr></abbrgrp>. We also compute these measures for random complexes and observe a good separation from the clustering results. For the evaluation, we do not consider singletons as valid clusters and exclude them from the distribution of cluster sizes (see Table <tblr tid="T4">4</tblr> and Table <tblr tid="T5">5</tblr>). The measurements for the <it>Gavin02 </it>and <it>Gavin06 </it>data sets are summarized in Table <tblr tid="T6">6</tblr>.</p>
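            <p>Excluding singletons changes every statistic in the size distributions, so the convention is worth making explicit. The sketch below summarizes a clustering in the layout of Tables 4 and 5, computing the mean and quartiles only over clusters with at least two proteins. The published means are consistent with this convention; for MRF (spoke) on <it>Gavin02</it>, (1390 - 226)/167 is about 6.97. The function name and the toy clusters are hypothetical, and the quartiles use linear interpolation (<tt>method="inclusive"</tt> in Python's <tt>statistics.quantiles</tt>). The quantile convention behind the published tables is not stated, so individual values may differ slightly.</p>

```python
import statistics


def cluster_size_summary(clusters):
    """Summarize cluster sizes, treating singletons as unclustered proteins.

    `clusters` is any iterable of collections of protein identifiers.
    """
    sizes = sorted(len(c) for c in clusters)
    non_singleton = [s for s in sizes if s >= 2]  # only these count as valid clusters
    summary = {
        "num_clusters": len(sizes),
        "num_singletons": sum(1 for s in sizes if s == 1),
        "size_ge_2": len(non_singleton),
        "largest": max(sizes, default=0),
    }
    if len(non_singleton) >= 2:
        # Linear interpolation between order statistics (inclusive method).
        q1, median, q3 = statistics.quantiles(non_singleton, n=4, method="inclusive")
        summary.update(
            mean=statistics.fmean(non_singleton), q1=q1, median=median, q3=q3
        )
    return summary


# Hypothetical toy clustering: two singletons and clusters of sizes 2, 3 and 4.
toy = [{"a"}, {"b"}, {"c", "d"}, {"e", "f", "g"}, {"h", "i", "j", "k"}]
print(cluster_size_summary(toy))
```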
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Parameter settings for MCL, MRF and MCODE.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Dataset</p>
                     </c>
                     <c ca="center">
                        <p>MCL</p>
                     </c>
                     <c ca="center">
                        <p>MCL with interaction prob. [9]</p>
                     </c>
                     <c ca="center">
                        <p>MRF</p>
                     </c>
                     <c ca="center">
                        <p>MCODE</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Gavin02</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>From [24]</p>
                     </c>
                     <c ca="center">
                        <p>Inflation = 1.8</p>
                     </c>
                     <c ca="center">
                        <p><it>&#968; </it>= 3.5</p>
                     </c>
                     <c ca="center">
                        <p>From [24]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Spoke model</p>
                     </c>
                     <c ca="center">
                        <p>Inflation = 1.8</p>
                     </c>
                     <c ca="center">
                        <p><it>&#957; </it>= 0.346</p>
                     </c>
                     <c ca="center">
                        <p>Maximum likelihood</p>
                     </c>
                     <c ca="center">
                        <p>Node score percentage = 0.0</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>&#966; </it>= 1.07 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Complex fluff = 0.2</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>&#961; </it>= 1.88 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Depth = 100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Gavin06</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>Inflation = 3.0</p>
                     </c>
                     <c ca="center">
                        <p>Inflation = 3.0</p>
                     </c>
                     <c ca="center">
                        <p><it>&#968; </it>= 3.5</p>
                     </c>
                     <c ca="center">
                        <p>From [24]</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Spoke model</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>&#957; </it>= 0.407</p>
                     </c>
                     <c ca="center">
                        <p>Maximum likelihood</p>
                     </c>
                     <c ca="center">
                        <p>Node score percentage = 0.0</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>&#966; </it>= 1.35 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Complex fluff = 0.2</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>&#961; </it>= 3.89 &#215; 10<sup>-3</sup></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Depth = 100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Gavin06</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p><it>&#968; </it>= 10.0</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Matrix model</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Maximum likelihood</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Distribution of cluster sizes for the <it>Gavin02 </it>data</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>MCL</p>
                     </c>
                     <c ca="right">
                        <p>MCL with inter. prob.</p>
                     </c>
                     <c ca="right">
                        <p>MRF (spoke)</p>
                     </c>
                     <c ca="right">
                        <p>MRF (matrix)</p>
                     </c>
                     <c ca="right">
                        <p>MCODE</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Num. of clusters</p>
                     </c>
                     <c ca="right">
                        <p>351</p>
                     </c>
                     <c ca="right">
                        <p>352</p>
                     </c>
                     <c ca="right">
                        <p>393</p>
                     </c>
                     <c ca="right">
                        <p>310</p>
                     </c>
                     <c ca="right">
                        <p>24</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Num. of singletons</p>
                     </c>
                     <c ca="right">
                        <p>177</p>
                     </c>
                     <c ca="right">
                        <p>178</p>
                     </c>
                     <c ca="right">
                        <p>226</p>
                     </c>
                     <c ca="right">
                        <p>79</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Size &#8805; 2</p>
                     </c>
                     <c ca="right">
                        <p>174</p>
                     </c>
                     <c ca="right">
                        <p>174</p>
                     </c>
                     <c ca="right">
                        <p>167</p>
                     </c>
                     <c ca="right">
                        <p>231</p>
                     </c>
                     <c ca="right">
                        <p>24</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mean</p>
                     </c>
                     <c ca="right">
                        <p>6.97</p>
                     </c>
                     <c ca="right">
                        <p>6.97</p>
                     </c>
                     <c ca="right">
                        <p>6.97</p>
                     </c>
                     <c ca="right">
                        <p>5.67</p>
                     </c>
                     <c ca="right">
                        <p>4.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Median</p>
                     </c>
                     <c ca="right">
                        <p>4</p>
                     </c>
                     <c ca="right">
                        <p>5</p>
                     </c>
                     <c ca="right">
                        <p>5</p>
                     </c>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1st quartile</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3rd quartile</p>
                     </c>
                     <c ca="right">
                        <p>8</p>
                     </c>
                     <c ca="right">
                        <p>8</p>
                     </c>
                     <c ca="right">
                        <p>10</p>
                     </c>
                     <c ca="right">
                        <p>6</p>
                     </c>
                     <c ca="right">
                        <p>6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>90th percentile</p>
                     </c>
                     <c ca="right">
                        <p>15</p>
                     </c>
                     <c ca="right">
                        <p>14</p>
                     </c>
                     <c ca="right">
                        <p>14</p>
                     </c>
                     <c ca="right">
                        <p>13</p>
                     </c>
                     <c ca="right">
                        <p>7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>99th percentile</p>
                     </c>
                     <c ca="right">
                        <p>42</p>
                     </c>
                     <c ca="right">
                        <p>40</p>
                     </c>
                     <c ca="right">
                        <p>34</p>
                     </c>
                     <c ca="right">
                        <p>36</p>
                     </c>
                     <c ca="right">
                        <p>9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Largest cluster</p>
                     </c>
                     <c ca="right">
                        <p>51</p>
                     </c>
                     <c ca="right">
                        <p>45</p>
                     </c>
                     <c ca="right">
                        <p>36</p>
                     </c>
                     <c ca="right">
                        <p>44</p>
                     </c>
                     <c ca="right">
                        <p>11</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Distribution of cluster sizes for the <it>Gavin06 </it>data.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>MCL</p>
                     </c>
                     <c ca="right">
                        <p>MCL with inter. prob.</p>
                     </c>
                     <c ca="right">
                        <p>MRF (spoke)</p>
                     </c>
                     <c ca="right">
                        <p>MRF (matrix)</p>
                     </c>
                     <c ca="right">
                        <p>Gavin06 (all)</p>
                     </c>
                     <c ca="right">
                        <p>Gavin06 (core)</p>
                     </c>
                     <c ca="right">
                        <p>MCODE</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Num. of clusters</p>
                     </c>
                     <c ca="right">
                        <p>781</p>
                     </c>
                     <c ca="right">
                        <p>732</p>
                     </c>
                     <c ca="right">
                        <p>698</p>
                     </c>
                     <c ca="right">
                        <p>550</p>
                     </c>
                     <c ca="right">
                        <p>487</p>
                     </c>
                     <c ca="right">
                        <p>477</p>
                     </c>
                     <c ca="right">
                        <p>55</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Num. singletons</p>
                     </c>
                     <c ca="right">
                        <p>331</p>
                     </c>
                     <c ca="right">
                        <p>269</p>
                     </c>
                     <c ca="right">
                        <p>4</p>
                     </c>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                     <c ca="right">
                        <p>55</p>
                     </c>
                     <c ca="right">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Size &#8805; 2</p>
                     </c>
                     <c ca="right">
                        <p>450</p>
                     </c>
                     <c ca="right">
                        <p>463</p>
                     </c>
                     <c ca="right">
                        <p>694</p>
                     </c>
                     <c ca="right">
                        <p>548</p>
                     </c>
                     <c ca="right">
                        <p>487</p>
                     </c>
                     <c ca="right">
                        <p>422</p>
                     </c>
                     <c ca="right">
                        <p>55</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mean</p>
                     </c>
                     <c ca="right">
                        <p>5.39</p>
                     </c>
                     <c ca="right">
                        <p>5.38</p>
                     </c>
                     <c ca="right">
                        <p>3.97</p>
                     </c>
                     <c ca="right">
                        <p>5.03</p>
                     </c>
                     <c ca="right">
                        <p>13.46</p>
                     </c>
                     <c ca="right">
                        <p>3.33</p>
                     </c>
                     <c ca="right">
                        <p>4.42</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Median</p>
                     </c>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                     <c ca="right">
                        <p>9</p>
                     </c>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>1st quartile</p>
                     </c>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                     <c ca="right">
                        <p>4</p>
                     </c>
                     <c ca="right">
                        <p>2</p>
                     </c>
                     <c ca="right">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>3rd quartile</p>
                     </c>
                     <c ca="right">
                        <p>4</p>
                     </c>
                     <c ca="right">
                        <p>4</p>
                     </c>
                     <c ca="right">
                        <p>4</p>
                     </c>
                     <c ca="right">
                        <p>5</p>
                     </c>
                     <c ca="right">
                        <p>18</p>
                     </c>
                     <c ca="right">
                        <p>4</p>
                     </c>
                     <c ca="right">
                        <p>5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>90th percentile</p>
                     </c>
                     <c ca="right">
                        <p>8</p>
                     </c>
                     <c ca="right">
                        <p>7</p>
                     </c>
                     <c ca="right">
                        <p>7</p>
                     </c>
                     <c ca="right">
                        <p>8</p>
                     </c>
                     <c ca="right">
                        <p>33</p>
                     </c>
                     <c ca="right">
                        <p>6</p>
                     </c>
                     <c ca="right">
                        <p>7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>99th percentile</p>
                     </c>
                     <c ca="right">
                        <p>36</p>
                     </c>
                     <c ca="right">
                        <p>29</p>
                     </c>
                     <c ca="right">
                        <p>32</p>
                     </c>
                     <c ca="right">
                        <p>31</p>
                     </c>
                     <c ca="right">
                        <p>66</p>
                     </c>
                     <c ca="right">
                        <p>12</p>
                     </c>
                     <c ca="right">
                        <p>16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Largest cluster</p>
                     </c>
                     <c ca="right">
                        <p>561</p>
                     </c>
                     <c ca="right">
                        <p>607</p>
                     </c>
                     <c ca="right">
                        <p>65</p>
                     </c>
                     <c ca="right">
                        <p>49</p>
                     </c>
                     <c ca="right">
                        <p>96</p>
                     </c>
                     <c ca="right">
                        <p>23</p>
                     </c>
                     <c ca="right">
                        <p>16</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T6">
               <title>
                  <p>Table 6</p>
               </title>
               <caption>
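                   <p>Clustering performance of MCODE, MCL and MRF: comparison with the MIPS annotations. Only proteins in the experiment that have a MIPS annotation are used. CO: complex coverage (clustering-wise sensitivity); PPV: positive predictive value; <it>Acc</it>: accuracy, the geometric mean of CO and PPV. All pairs: pair-wise sensitivity (SN), specificity (SP) and their geometric average. The best value in each row is shown in bold.</p>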
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>Dataset</p>
                     </c>
                     <c>
                         <p>Measure</p>
                     </c>
                     <c ca="right">
                        <p>MCODE</p>
                     </c>
                     <c ca="right">
                        <p>MCL</p>
                     </c>
                     <c ca="right">
                        <p>MCL with inter. prob.</p>
                     </c>
                     <c ca="right">
                        <p>MRF (spoke)</p>
                     </c>
                     <c ca="right">
                        <p>MRF (matrix)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Gavin02</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>CO</p>
                     </c>
                     <c ca="right">
                        <p>29.0</p>
                     </c>
                     <c ca="right">
                        <p>61.5</p>
                     </c>
                     <c ca="right">
                        <p>62.6</p>
                     </c>
                     <c ca="right">
                        <p>64.4</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>66.4</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>73.6</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>71.3</p>
                     </c>
                     <c ca="right">
                        <p>71.7</p>
                     </c>
                     <c ca="right">
                        <p>73.5</p>
                     </c>
                     <c ca="right">
                        <p>66.9</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Acc</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>46.2</p>
                     </c>
                     <c ca="right">
                        <p>66.2</p>
                     </c>
                     <c ca="right">
                        <p>67.0</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>68.8</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>66.6</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>All pairs</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>SN</p>
                     </c>
                     <c ca="right">
                        <p>2.3</p>
                     </c>
                     <c ca="right">
                        <p>68.6</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>68.9</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>66.7</p>
                     </c>
                     <c ca="right">
                        <p>62.6</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>SP</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>92.5</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>78.7</p>
                     </c>
                     <c ca="right">
                        <p>82.4</p>
                     </c>
                     <c ca="right">
                        <p>87.9</p>
                     </c>
                     <c ca="right">
                        <p>64.7</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Geo. average</p>
                     </c>
                     <c ca="right">
                        <p>14.7</p>
                     </c>
                     <c ca="right">
                        <p>73.0</p>
                     </c>
                     <c ca="right">
                        <p>75.4</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>76.6</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>63.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Gavin06</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>CO</p>
                     </c>
                     <c ca="right">
                        <p>33.7</p>
                     </c>
                     <c ca="right">
                        <p>64.0</p>
                     </c>
                     <c ca="right">
                        <p>65.7</p>
                     </c>
                     <c ca="right">
                        <p>66.0</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>67.7</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PPV</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>79.0</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>62.6</p>
                     </c>
                     <c ca="right">
                        <p>68.6</p>
                     </c>
                     <c ca="right">
                        <p>70.4</p>
                     </c>
                     <c ca="right">
                        <p>67.3</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Acc</it>
                        </p>
                     </c>
                     <c ca="right">
                        <p>51.6</p>
                     </c>
                     <c ca="right">
                        <p>63.3</p>
                     </c>
                     <c ca="right">
                        <p>67.2</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>68.2</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>67.5</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>All pairs</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>SN</p>
                     </c>
                     <c ca="right">
                        <p>4.9</p>
                     </c>
                     <c ca="right">
                        <p>44.1</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>44.7</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>37.2</p>
                     </c>
                     <c ca="right">
                        <p>38.2</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>SP</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>79.6</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>18.0</p>
                     </c>
                     <c ca="right">
                        <p>22.5</p>
                     </c>
                     <c ca="right">
                        <p>70.0</p>
                     </c>
                     <c ca="right">
                        <p>66.1</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Geo. average</p>
                     </c>
                     <c ca="right">
                        <p>19.7</p>
                     </c>
                     <c ca="right">
                        <p>28.2</p>
                     </c>
                     <c ca="right">
                        <p>31.7</p>
                     </c>
                     <c ca="right">
                        <p>
                           <b>51.0</b>
                        </p>
                     </c>
                     <c ca="right">
                        <p>50.2</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Comparison of sensitivity and specificity for all clustering solutions on Gavin06</p>
               </caption>
               <text>
                   <p><b>Comparison of sensitivity and specificity for all clustering solutions on Gavin06</b>. Only proteins with annotation from MIPS (a) and Reguly (b) are considered. The curve for MRF is generated by progressively filtering out clusters with high observed errors. The curve for MCL is generated by varying the inflation parameter from 1.2 to 5.0 in steps of 0.2, the range recommended by the MCL program. The most specific solutions, MCODE and <it>Gavin06 </it>(core), show low sensitivity because many proteins are left unassigned. MRF maintains higher sensitivity while losing only a few percent in specificity. In this respect, MRF performs better than MCL because it keeps specificity high without losing sensitivity.</p>
               </text>
               <graphic file="1471-2105-8-482-3"/>
            </fig>
             <p>For each data set, we use the set of annotated and clustered proteins for the evaluation. Note that this can lower sensitivity and complex coverage for algorithms such as MCODE that leave proteins unassigned. The results are shown in Table <tblr tid="T6">6</tblr> and the ROC curves in Figure <figr fid="F3">3</figr>. As expected, the MCODE solutions have low sensitivity (low complex coverage) and high specificity, because MCODE assigns only a few proteins and ignores the majority of proteins present in the experiment. We set the parameters of MCODE as described by Broh&#233;e and van Helden <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. When we changed the MCODE settings to produce more clusters and assign more proteins, accuracy dropped substantially in all measures.</p>
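             <p>The pair-wise evaluation behind the "All pairs" rows of Table <tblr tid="T6">6</tblr> can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: proteins are restricted to the annotated set, so annotated proteins left unassigned (as by MCODE) contribute no predicted pairs and lower SN; SP is assumed to be TP/(TP + FP), the fraction of co-clustered pairs that are also co-complexed; and the function names are hypothetical. The geometric average is the square root of SN &#215; SP, which agrees with the table (for MRF (spoke) on <it>Gavin02</it>, SN = 66.7 and SP = 87.9 give 76.6).</p>
             <p>
```python
from itertools import combinations
from math import sqrt


def co_member_pairs(groups):
    """Unordered protein pairs that share at least one group."""
    pairs = set()
    for members in groups:
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(clusters, complexes):
    """Return (SN, SP, geometric average) in percent.

    Hypothetical helper. Only annotated proteins (those in some complex)
    are scored; SP is assumed to be TP / (TP + FP).
    """
    annotated = {p for members in complexes for p in members}
    restricted = [[p for p in members if p in annotated] for members in clusters]
    predicted = co_member_pairs(restricted)   # pairs placed in one cluster
    reference = co_member_pairs(complexes)    # pairs sharing a complex
    tp = len(predicted.intersection(reference))
    sn = 100.0 * tp / len(reference) if reference else 0.0
    sp = 100.0 * tp / len(predicted) if predicted else 0.0
    return sn, sp, sqrt(sn * sp)


clusters = [["A", "B", "C"], ["D", "E"]]
complexes = [["A", "B"], ["C", "D", "E"]]
sn, sp, geo = pairwise_scores(clusters, complexes)
print(sn, sp, geo)  # 50.0 50.0 50.0
```
             </p>
             <p>Because overlapping complexes are merged into one set of co-complexed pairs, the sketch tolerates overlapping reference data such as MIPS.</p>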
         </sec>
         <sec>
            <st>
               <p>Testing</p>
            </st>
             <p>To evaluate our clusters, we compare them with the MIPS and Reguly data sets. We apply two evaluation procedures: one based on a set of benchmark measures recently introduced by Broh&#233;e and van Helden <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, and the other based on pair-wise comparisons of proteins.</p>
             <p>Comparing a clustering result with annotated complexes using the evaluation procedure of Broh&#233;e and van Helden <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> starts with building a contingency table. With <it>n </it>complexes and <it>m </it>clusters, the contingency table <it>T </it>is an <it>n &#215; m </it>matrix whose entry <it>T</it><sub><it>ij </it></sub>is the number of proteins shared by the <it>i</it>th complex and the <it>j</it>th cluster. Given a contingency table <it>T</it>, the overall accuracy and the separation value can be computed to measure the correspondence between the clustering result and the annotated complexes <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The separation measure behaves undesirably when the reference data set contains overlapping complexes: by its definition <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, a cluster that matches more than one complex well receives a low separation value. This situation arises for the MIPS and Reguly benchmarks, whose complexes overlap, whereas the clusters computed by MCL and MRF do not. Furthermore, when matching the reference data set to itself, we found that its separation value can be lower than that of some clustering solutions. For these reasons, we do not use the separation measure. The definitions of the benchmark measures are summarized in the Methods section.</p>
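             <p>The contingency-table measures can likewise be sketched. Following the definitions of Broh&#233;e and van Helden as we understand them (the exact formulas are in the Methods section, not reproduced here), complex-wise sensitivity takes, for each complex, the largest fraction of its proteins found in a single cluster, and PPV takes, for each cluster, the largest fraction of its annotated proteins that fall into a single complex; both are size-weighted averages, and accuracy is their geometric mean. The geometric-mean relation can be checked against Table <tblr tid="T6">6</tblr>: for MCODE on <it>Gavin02</it>, the square root of 29.0 &#215; 73.6 is about 46.2, the reported <it>Acc</it>. The function names below are hypothetical.</p>
             <p>
```python
from math import sqrt


def contingency_table(complexes, clusters):
    """T[i][j] = number of proteins shared by complex i and cluster j."""
    return [[len(set(cx).intersection(cl)) for cl in clusters] for cx in complexes]


def brohee_scores(complexes, clusters):
    """Return (CO, PPV, Acc) in percent from the contingency table.

    CO:  sum_i max_j T_ij / sum_i N_i   (size-weighted complex-wise sensitivity)
    PPV: sum_j max_i T_ij / sum_j T_.j  (weighted by column marginals T_.j)
    Acc: geometric mean of CO and PPV
    """
    table = contingency_table(complexes, clusters)
    complex_sizes = [len(set(cx)) for cx in complexes]   # N_i
    columns = list(zip(*table))                          # column j holds T_ij over i
    matched = sum(sum(col) for col in columns)           # sum_j T_.j
    co = 100.0 * sum(max(row) for row in table) / sum(complex_sizes)
    ppv = 100.0 * sum(max(col) for col in columns) / matched if matched else 0.0
    return co, ppv, sqrt(co * ppv)


complexes = [["A", "B", "C"], ["D", "E"]]
clusters = [["A", "B"], ["C", "D", "E", "X"]]   # "X" has no annotation
print(brohee_scores(complexes, clusters))       # (80.0, 80.0, 80.0)
```
             </p>
             <p>Unannotated proteins such as "X" never enter the table, so they affect neither CO nor PPV.</p>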
            <sec>
               <st>
                  <p>Quality of clusters</p>
               </st>
                <p>In any given solution, some clusters have more support from the observations than others. Support for a cluster is high if the proteins in this cluster are less likely to be involved in false positive or false negative observations. We can therefore compute a cluster quality score as the difference between the <it>actual </it>number of false positives and false negatives and their <it>expected </it>number, based on the number of trials involving proteins of this cluster. Let <it>Q</it><sub><it>i </it></sub>be the cluster assignment of protein <it>i</it>, <it>&#957; </it>* the estimated false negative rate and <it>&#966; </it>* the estimated false positive rate. Then the difference <it>E</it>(<it>k</it>) between actual and expected errors for each cluster <it>k </it>is</p>
               <p>
                  <display-formula>
                     <graphic file="1471-2105-8-482-i1.gif"/>
                  </display-formula>
               </p>
               <p>where <inline-formula><m:math name="1471-2105-8-482-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>E</m:mi><m:mrow><m:mi>f</m:mi><m:mi>n</m:mi></m:mrow></m:msub><m:mo stretchy="false">(</m:mo><m:mi>k</m:mi><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:msup><m:mi>&#957;</m:mi><m:mo>&#8727;</m:mo></m:msup><m:mstyle displaystyle="true"><m:munder><m:mo>&#8721;</m:mo><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>i</m:mi><m:mo>,</m:mo><m:mi>j</m:mi><m:mo stretchy="false">)</m:mo><m:mo>:</m:mo><m:msub><m:mi>Q</m:mi><m:mi>i</m:mi></m:msub><m:mo>=</m:mo><m:msub><m:mi>Q</m:mi><m:mi>j</m:mi></m:msub><m:mo>=</m:mo><m:mi>k</m:mi></m:mrow></m:munder><m:mrow><m:msub><m:mi>t</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow></m:msub></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyrau0aaSbaaSqaaiabdAgaMjabd6gaUbqabaGccqGGOaakcqWGRbWAcqGGPaqkcqGH9aqpiiGacqWF9oGBdaahaaWcbeqaaiabgEHiQaaakmaaqafabaGaemiDaq3aaSbaaSqaaiabdMgaPjabdQgaQbqabaaabaGaeiikaGIaemyAaKMaeiilaWIaemOAaOMaeiykaKIaeiOoaOJaemyuae1aaSbaaWqaaiabdMgaPbqabaWccqGH9aqpcqWGrbqudaWgaaadbaGaemOAaOgabeaaliabg2da9iabdUgaRbqab0GaeyyeIuoaaaa@4C6E@</m:annotation></m:semantics></m:math></inline-formula> and <inline-formula><m:math name="1471-2105-8-482-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:msub><m:mi>E</m:mi><m:mrow><m:mi>f</m:mi><m:mi>p</m:mi></m:mrow></m:msub><m:mo stretchy="false">(</m:mo><m:mi>k</m:mi><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:msup><m:mi>&#981;</m:mi><m:mo>&#8727;</m:mo></m:msup><m:mstyle displaystyle="true"><m:munder><m:mo>&#8721;</m:mo><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>i</m:mi><m:mo>,</m:mo><m:mi>j</m:mi><m:mo stretchy="false">)</m:mo><m:mo>:</m:mo><m:msub><m:mi>Q</m:mi><m:mi>i</m:mi></m:msub><m:mo>&#8800;</m:mo><m:msub><m:mi>Q</m:mi><m:mi>j</m:mi></m:msub><m:mo>=</m:mo><m:mi>k</m:mi></m:mrow></m:munder><m:mrow><m:msub><m:mi>t</m:mi><m:mrow><m:mi>i</m:mi><m:mi>j</m:mi></m:mrow></m:msub></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyrau0aaSbaaSqaaiabdAgaMjabdchaWbqabaGccqGGOaakcqWGRbWAcqGGPaqkcqGH9aqpiiGacqWFvpGzdaahaaWcbeqaaiabgEHiQaaakmaaqafabaGaemiDaq3aaSbaaSqaaiabdMgaPjabdQgaQbqabaaabaGaeiikaGIaemyAaKMaeiilaWIaemOAaOMaeiykaKIaeiOoaOJaemyuae1aaSbaaWqaaiabdMgaPbqabaWccqGHGjsUcqWGrbqudaWgaaadbaGaemOAaOgabeaaliabg2da9iabdUgaRbqab0GaeyyeIuoaaaa@4D43@</m:annotation></m:semantics></m:math></inline-formula>.</p>
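                <p>Because the display formula for <it>E</it>(<it>k</it>) is available only as an image, the following Python sketch reconstructs the score from the prose above as (actual minus expected) false negatives plus (actual minus expected) false positives, using expected counts of the form <it>E</it><sub><it>fn</it></sub>(<it>k</it>) and <it>E</it><sub><it>fp</it></sub>(<it>k</it>) defined above. It rests on assumptions not stated in this section: <it>t</it><sub><it>ij</it></sub> is taken to be the number of purifications in which the pair (<it>i</it>, <it>j</it>) could be observed; <it>s</it><sub><it>ij</it></sub> is a hypothetical count of those in which it was observed; a missing observation inside a cluster counts as a false negative; and an observation between cluster <it>k</it> and another cluster counts as a false positive. It is an illustration, not the authors' implementation.</p>
                <p>
```python
def cluster_error_score(k, assignment, trials, successes, nu, phi):
    """Hypothetical reconstruction of E(k): actual minus expected errors.

    assignment: dict protein -> cluster label (Q_i)
    trials:     dict (i, j) -> t_ij, purifications in which the pair was tested
    successes:  dict (i, j) -> s_ij, purifications in which it was observed
                (assumed notation)
    nu, phi:    estimated false negative and false positive rates
    Each unordered pair should appear once as a key.
    """
    actual_fn = expected_fn = 0.0
    actual_fp = expected_fp = 0.0
    for (i, j), t in trials.items():
        s = successes.get((i, j), 0)
        i_in_k = assignment[i] == k
        j_in_k = assignment[j] == k
        if i_in_k and j_in_k:
            # Same cluster: trials without an observation are false negatives.
            actual_fn += t - s
            expected_fn += nu * t
        elif i_in_k != j_in_k:
            # Pair crosses the border of cluster k: observations are false positives.
            actual_fp += s
            expected_fp += phi * t
    return (actual_fn - expected_fn) + (actual_fp - expected_fp)


assignment = {"A": 1, "B": 1, "C": 2}
trials = {("A", "B"): 10, ("A", "C"): 10, ("B", "C"): 4}
successes = {("A", "B"): 8, ("A", "C"): 1}
score = cluster_error_score(1, assignment, trials, successes, nu=0.3, phi=0.05)
print(round(score, 2))  # -0.7
```
                </p>
                <p>The negative score means cluster 1 shows fewer errors than expected, matching the sign convention described in the caption of Figure <figr fid="F4">4</figr>.</p>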
                <p>Figure <figr fid="F4">4</figr> shows the distribution of <it>E</it>(<it>k</it>) for the spoke and matrix models. The score is positive for some clusters and negative for others, with the mode around zero. Rather than giving an absolute measure of quality for the whole solution, the score therefore distinguishes, within a given solution, high-confidence clusters from low-confidence ones. Figure <figr fid="F4">4</figr> also shows that there is no correlation between the score <it>E</it>(<it>k</it>) and cluster size, and that some large clusters are supported by quite reliable observations. MRF has also identified some outliers with extremely high error scores; they consist of abundant proteins that are found nonspecifically in many purifications, typically more than 50.</p>
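                <p>Since <it>E</it>(<it>k</it>) ranks clusters within a solution, it can be used to flag or drop low-confidence clusters, as done for the MRF curve in Figure <figr fid="F3">3</figr>. The sketch below is an assumption about how such filtering might be organized, not the procedure used in the paper: clusters are sorted from best (most negative) to worst score and removed worst-first, producing nested cluster sets that could each be re-scored against a benchmark, and a plain Pearson coefficient checks for a linear relation between score and cluster size. All function names and example scores are hypothetical.</p>
                <p>
```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def keep_reliable(scores, max_error):
    """Cluster ids whose score E(k) does not exceed max_error (lower is better)."""
    return sorted(k for k, e in scores.items() if max_error >= e)


def nested_by_quality(scores):
    """Cluster sets obtained by dropping the worst-scoring cluster one at a time."""
    ranked = sorted(scores, key=scores.get)  # most negative E(k) first
    return [ranked[:n] for n in range(len(ranked), 0, -1)]


scores = {"c1": -2.0, "c2": 0.5, "c3": 40.0}   # hypothetical E(k) values
sizes = {"c1": 5, "c2": 3, "c3": 12}
print(keep_reliable(scores, 1.0))             # ['c1', 'c2']
print(nested_by_quality(scores))              # [['c1', 'c2', 'c3'], ['c1', 'c2'], ['c1']]
print(round(pearson([sizes[k] for k in scores], [scores[k] for k in scores]), 2))
```
                </p>
                <p>With three toy clusters the coefficient is meaningless; on a real solution, a value near zero would correspond to the absence of correlation reported above.</p>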
               <fig id="F4">
                  <title>
                     <p>Figure 4</p>
                  </title>
                  <caption>
                     <p>Cluster quality</p>
                  </caption>
                  <text>
                      <p><b>Cluster quality</b>. The distribution of the quality of clusters as predicted by MRF. Negative values of the quality score indicate that a cluster is better supported by the data than expected; the zero line indicates that the observed error equals the expectation. (a) Spoke model: most predicted clusters fit the model. Outliers with high error appear at the top of the figure; these clusters consist largely of artifacts. (b) Matrix model: MFA is also robust against the high false negative rate. For the list of clusters, refer to the supplementary material.</p>
                  </text>
                  <graphic file="1471-2105-8-482-4"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Complex-size distribution</p>
               </st>
                <p>Principal properties and potential artifacts are visible in a simple plot of the population of proteins by cluster size (see Figures <figr fid="F5">5</figr> and <figr fid="F6">6</figr>). In Figure <figr fid="F5">5</figr>, we only consider proteins from the <it>Gavin06 </it>data set that are assigned to MIPS complexes, ignoring singletons; this leaves 819 proteins. For each clustering solution, we compute the cluster size distribution of the MIPS proteins that have cluster assignments. Notably, there are no MIPS complexes with sizes between 20 and 30. The proteins in the largest complex, of size 60, all belong to a single complex (the ribosome), whereas the 60 proteins in clusters of size 12 belong to 5 different clusters. In Figure <figr fid="F6">6</figr>, which considers all proteins, all clustering solutions deviate substantially from the MIPS size distribution. MCL has a large cluster containing 607 proteins, which is likely an artifact. The Gavin core set is only a subset; it contains a substantial number of small clusters and fewer complexes than the MIPS set, most prominently lacking the mitochondrial ribosome and the mediator complex. The larger, complete solution (Gavin06 (all)) contains few small clusters; although it contains larger clusters (of size up to 50), these do not map accurately to larger complexes. In Figure <figr fid="F6">6</figr>, our MRF solution for the spoke model contains more clusters of size 2 than the matrix model, but otherwise both have similar size distributions, with more small clusters than large ones.</p>
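                <p>The population-by-size plots can be computed with a small counting routine. In this hypothetical sketch, each protein contributes to the bin of its cluster's full size, optionally only if it belongs to a chosen protein set such as the MIPS-annotated proteins, and singletons are skipped; whether the figures bin by the full cluster size or by the number of MIPS proteins in a cluster is our assumption. The example reproduces the contrast drawn above: 60 proteins in one cluster of size 60 versus 60 proteins spread over five clusters of size 12.</p>
                <p>
```python
from collections import Counter


def proteins_by_cluster_size(clusters, keep=None, drop_singletons=True):
    """Map cluster size -> number of counted proteins in clusters of that size.

    clusters: iterable of protein collections
    keep:     optional set of proteins to count (e.g. MIPS-annotated ones);
              the bin is still the full size of the cluster (an assumption)
    """
    counts = Counter()
    for members in clusters:
        members = set(members)
        if drop_singletons and len(members) == 1:
            continue
        counted = members if keep is None else members.intersection(keep)
        if counted:
            counts[len(members)] += len(counted)
    return dict(sorted(counts.items()))


big = [f"r{i}" for i in range(60)]                          # one cluster of size 60
small = [[f"s{j}_{i}" for i in range(12)] for j in range(5)]  # five clusters of size 12
hist = proteins_by_cluster_size([big] + small + [["lonely"]])
print(hist)  # {12: 60, 60: 60}
```
                </p>
                <p>Both bins hold 60 proteins, so the plot alone does not reveal how many clusters lie behind a bar; dividing a bin by its size recovers that number.</p>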
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>Cluster sizes by MIPS proteins on the Gavin06 set</p>
                  </caption>
                  <text>
                      <p><b>Cluster sizes by MIPS proteins on the Gavin06 set</b>. The x-axis shows cluster sizes on a log scale. The y-axis shows the number of proteins found in MIPS complexes that fall in clusters of a given size. Note that singletons and proteins not contained in the MIPS set are not considered. Each column also shows the total number of proteins. Cluster sizes are taken either from the primary data sources, MIPS (a), Gavin06 (core) (e) and Gavin06 (all) (f), or from solutions obtained on the <it>Gavin06 </it>data set: MCL (b), MRF (spoke) (c) and MRF (matrix) (d).</p>
                  </text>
                  <graphic file="1471-2105-8-482-5"/>
               </fig>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>Cluster sizes by all proteins of the Gavin06 set</p>
                  </caption>
                  <text>
                     <p><b>Cluster sizes by all proteins of the Gavin06 set</b>. When considering all proteins, not only those contained in the MIPS set, all clustering solutions deviate substantially from the MIPS set's size distribution. A cluster from MCL with 607 proteins is a giant component which merges smaller MIPS complexes with many other proteins.</p>
                  </text>
                  <graphic file="1471-2105-8-482-6"/>
               </fig>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Cluster visualization</p>
            </st>
             <p>For each clustering solution, we can visualize matches to the MIPS complexes by generating a contingency table whose rows are complexes and whose columns are clusters. For each cell in the table, we calculate the Simpson coefficient <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> and order the rows and columns so that matches lie along the diagonal, sorted by increasing match size. Clusters without any match to an annotated complex are not part of the table, nor are complexes without a match to any cluster. In Figure <figr fid="F7">7</figr>, we summarize the mapping of the MRF (spoke model), MCL and core Gavin06 solutions. For visualizations of other clustering solutions and of the mapping to the Reguly benchmark, refer to the supplementary material. We also visualize how well each solution matches the complex-size distribution: for each clustering solution, we plot the histogram of cluster sizes on a log scale.</p>
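             <p>A minimal Python sketch of such a table follows. The Simpson coefficient of two protein sets is the size of their intersection divided by the size of the smaller set, so it lies in [0, 1] and equals 1 when one set contains the other. How rows and columns are ordered is an assumption here: each complex is paired with its best-matching cluster, complexes without any overlap are dropped, only clusters that are the best match of some complex become columns, and the pairs are sorted by increasing overlap so that matches run along the diagonal. Function and variable names are hypothetical.</p>
             <p>
```python
def simpson(a, b):
    """Simpson coefficient: |A intersect B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a.intersection(b)) / min(len(a), len(b))


def ordered_simpson_table(complexes, clusters):
    """Return (row names, column names, table) for matched complexes and clusters.

    complexes, clusters: dicts mapping a name to a collection of proteins.
    Only clusters that are the best match of some complex become columns.
    """
    pairs = []
    for cx, members in complexes.items():
        best = max(clusters, key=lambda cl: simpson(members, clusters[cl]))
        overlap = len(set(members).intersection(clusters[best]))
        if overlap:
            pairs.append((overlap, cx, best))
    pairs.sort()  # increasing match size along the diagonal
    rows = [cx for _, cx, _ in pairs]
    cols = []
    for _, _, cl in pairs:
        if cl not in cols:
            cols.append(cl)
    table = [[simpson(complexes[r], clusters[c]) for c in cols] for r in rows]
    return rows, cols, table


complexes = {"cx1": ["a", "b", "c"], "cx2": ["d", "e"], "cx3": ["z"]}
clusters = {"cl1": ["a", "b", "c", "q"], "cl2": ["d", "e"], "cl3": ["y"]}
rows, cols, table = ordered_simpson_table(complexes, clusters)
print(rows, cols, table)  # ['cx2', 'cx1'] ['cl2', 'cl1'] [[1.0, 0.0], [0.0, 1.0]]
```
             </p>
             <p>The unmatched complex "cx3" and cluster "cl3" are left out, mirroring the rule stated above; the resulting matrix can be drawn directly as a heat map.</p>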
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Mapping to the MIPS complexes</p>
               </caption>
               <text>
                   <p><b>Mapping to the MIPS complexes</b>. Visualization of the best mapping to the MIPS complexes on the <it>Gavin06 </it>data set. Each panel is a contingency table in which each cell holds the Simpson coefficient, with values in [0, 1]. We show three solutions: MRF (spoke) (a), MCL (b) and the core set of <it>Gavin06 </it>(c). Rows are MIPS complexes and columns are clusters obtained using the respective algorithm. The order of complexes and clusters differs between panels. Complexes without a mapping to any cluster are not part of a table, and likewise for clusters without a mapping to any complex. Each panel has a different range on the x-axis and y-axis, because each solution has a different number of clusters mapping to a different subset of the MIPS complexes. The <it>Gavin06 </it>(core) solution maps to fewer complexes because it assigns fewer proteins.</p>
               </text>
               <graphic file="1471-2105-8-482-7"/>
            </fig>
             <p>Figure <figr fid="F5">5</figr> shows cluster sizes by proteins found in MIPS complexes, while Figure <figr fid="F6">6</figr> uses all proteins. Note that the largest cluster in the MCL solution, which contains a very diverse range of proteins, is likely to be an artifact.</p>
         </sec>
         <sec>
            <st>
               <p>Examples</p>
            </st>
             <p>A positive evaluation of a clustering procedure by internal and external clustering indices does not necessarily mean that the results are useful and match a user's expectations. Above, we compared the results on a large scale; here we inspect the solutions in detail. When selecting biological examples, the MRF solution under the spoke model seems to produce better results for smaller complexes; note, for example, the underrepresentation of size-2 complexes under the matrix model in Figure <figr fid="F5">5</figr> (see also Table <tblr tid="T5">5</tblr>). The high false negative rate of the matrix model could imply that it is less capable than the spoke model. Nonetheless, it recovers meaningful clusters, showing that it is robust against such a high error rate, as can be seen in the benchmark in Figure <figr fid="F3">3</figr>.</p>
             <p>Our MRF solution for the spoke model contains two large clusters of size > 60 that are presumably of high quality. Although the quality scores in Figure <figr fid="F4">4</figr> suggest that the spoke model produces good results for larger complexes, manual inspection suggests that these structures are not similar to complexes like the ribosome or the proteasome, but rather a spurious collection of interacting proteins. The two largest clusters in the spoke model do not constitute known complexes and highlight a peculiar property of the TAP-MS data set. Apparently, high-quality results are sporadically obtained for rather well-characterized proteins that seem to link very different pathways and cellular localizations. Although the two clusters have high quality with respect to the model, we find that there are not enough repetitions for these proteins, so in practice we must interpret their interactions as being of medium confidence only.</p>
             <p>There is no general agreement in the field on how a protein complex should be defined biochemically. Many factors &#8211; binding constants, protein concentration and localization, different purification protocols &#8211; lead to different associations of proteins into aggregates that we consider complexes. Moreover, paralogous proteins that give rise to variant complexes complicate the distinction between similar complexes. Disagreeing solutions for protein complexes offered by different methods do not necessarily indicate that either solution is wrong. For some complexes, all methods compared here lead to identical solutions, such as the Arp2/3 complex (MRF259) or the origin of replication complex (MRF567). These complexes, which are found similarly by all methods, generally receive good (negative) quality scores <it>E</it>(<it>k</it>) in our model, indicating that all methods work for simple cases.</p>
             <p>For larger, well-studied complexes such as the proteasome, the results appear fairly consistent across the different solutions. The best MCL solution according to our benchmark only splits off Pre8, an element of the 20S subunit, into a separate complex and assigns several components of the 19S subunit to a giant cluster together with many unrelated proteins. The MRF solution appears slightly superior to the predicted cores in Gavin <it>et al. </it><abbrgrp><abbr bid="B2">2</abbr></abbrgrp> in that no components of the 19S subunit are assigned to other elements. Complexes that are not well represented in our solution are the RNA polymerases, three complexes of 12&#8211;14 proteins that share 5 proteins. An ideal solution would place all elements of these complexes either into one cluster or into three clusters. While the Gavin solution neatly separates the complexes, the MRF solution only places several elements of the RNA polymerases into clusters of low quality. The high-quality cluster containing most elements of RNAPII, the best characterized of the three complexes in terms of experimental data, is "contaminated" with specific members of the other two complexes. The MCL solution displays similar problems.</p>
            <p>One cluster that we find superior in our results is complex 239 from the spoke model, consisting of Sol2, Ade16, Ade17, Ste23, Sol1, Rtt101, and Yol063c. Sol1 and Sol2 are not placed in the same complex in either the Gavin set of complexes or the MCL solution, and they are not observed to interact, but they are homologues. The isoenzymes Ade16 and Ade17 are not part of the same complex in the solution of Gavin <it>et al. </it><abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, but can be assumed to have the same binding partners.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Before discussing the details of the results, we point out that MRF is essentially a parameter-free method. Although MFA requires two inputs, <it>&#968; </it>and the number of clusters <it>K</it>, we provide a systematic way to estimate both using maximum likelihood. Methods such as MCODE require more parameters and offer no systematic way to select them other than trying several values and comparing the results to benchmark data. If no such data set is available, these methods cannot assess the quality of their solution, whereas for our MRF approach the value of the likelihood function can serve this purpose. MCL suffers from the same problem of parameter selection and essentially has three parameters: the expansion value, the inflation value, and the number of clusters. So to choose a solution from MCL, we must not only compare it with the benchmark but also decide whether the number of clusters is biologically plausible. Regarding the number of predicted clusters, it is not surprising that MRF estimates a higher number of clusters, because it does not eliminate proteins prior to clustering, unlike other solutions <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
         <p>Although we recommend the spoke model over the matrix model due to its lower false negative rate, it is noteworthy that the solution of the matrix model is also biologically meaningful when compared to the MIPS data set, albeit with slightly lower specificity than the spoke model (on the <it>Gavin06 </it>data set, compared to the MIPS data set). In reality, the true model of interaction likely lies between the two extremes. With regard to the quality of the clusters, we observe that almost all predicted clusters fit the model, except for some outliers that should not be regarded as complexes due to extremely high observed errors (shown as data points at the top of Figure <figr fid="F4">4(a)</figr> and <figr fid="F4">4(b)</figr>). Closer inspection reveals that these clusters consist mostly of proteins that are systematic contaminants; one would not assign them to any complex manually. By giving these "junk" clusters the worst quality score, MRF can separate them from the other complexes. MCL provides no such indicator.</p>
         <p>The performance of MCL and MRF on the <it>Gavin02 </it>data set is comparable, as both achieve high accuracy. This is a result of the lower level of noise in the <it>Gavin02 </it>data, which was filtered for abundant proteins; on such data, error modeling does not necessarily yield higher accuracy. Note also the similar distribution of cluster sizes (see Table <tblr tid="T4">4</tblr>).</p>
         <p>The performance gain from error modeling is more noticeable on the larger <it>Gavin06 </it>data set, which is not filtered and likely contains more errors. The accuracy <it>Acc </it>is the average agreement of a cluster with a complex; it penalizes complexes that are split more than complexes that are merged. To detect merged complexes, we have to consider the all-pairs comparison, where merging shows up as high sensitivity combined with low specificity. Because many complexes are merged into a giant component, MCL performs quite well on <it>Gavin06 </it>as measured by accuracy, but not when we consider the all-pairs sensitivity (SN) and specificity (SP) compared to the MIPS data set. To avoid the giant component, the inflation parameter of MCL must be set to the maximum recommended level (inflation = 5.0), which reduces sensitivity (Figure <figr fid="F3">3</figr>). MRF, in contrast, maintains high specificity without sacrificing sensitivity, and it does not produce giant components. Compared to highly specific solutions such as MCODE or <it>Gavin06 </it>(core), which assign fewer proteins, MRF loses only a few percent (less than 10%) in specificity but gains about 30% in sensitivity while clustering more proteins (Table <tblr tid="T6">6</tblr>).</p>
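         <p>To illustrate why a giant merged cluster can score high sensitivity but low specificity in an all-pairs comparison, the following Python sketch counts protein pairs that share a cluster. It assumes that SN is the fraction of reference co-complex pairs recovered by the clustering and that SP is the fraction of predicted co-clustered pairs (restricted to proteins annotated in the reference) that fall within one reference complex. These definitions, the function names, and the toy data are assumptions for illustration and are not necessarily the exact measures behind Table <tblr tid="T6">6</tblr>.</p>

```python
from itertools import combinations


def co_clustered_pairs(partition):
    """Return all unordered protein pairs that share a group.

    `partition` maps a group label to an iterable of protein names.
    """
    pairs = set()
    for members in partition.values():
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs


def all_pairs_sn_sp(predicted, reference):
    """All-pairs sensitivity and specificity (assumed definitions, see text).

    SN: fraction of reference co-complex pairs that are also co-clustered.
    SP: fraction of predicted co-clustered pairs (both proteins annotated in
        the reference) that lie within a single reference complex.
    """
    annotated = {p for members in reference.values() for p in members}
    ref_pairs = co_clustered_pairs(reference)
    pred_pairs = {
        (a, b)
        for (a, b) in co_clustered_pairs(predicted)
        if a in annotated and b in annotated
    }
    true_pairs = len(ref_pairs.intersection(pred_pairs))
    sn = true_pairs / len(ref_pairs) if ref_pairs else 0.0
    sp = true_pairs / len(pred_pairs) if pred_pairs else 0.0
    return sn, sp


reference = {"C1": ["a", "b", "c"], "C2": ["d", "e", "f"]}
merged = {"giant": ["a", "b", "c", "d", "e", "f"]}           # two complexes merged
split = {"X": ["a", "b"], "Y": ["c"], "Z": ["d", "e", "f"]}  # one complex split

print(all_pairs_sn_sp(merged, reference))  # SN = 6/6, SP = 6/15
print(all_pairs_sn_sp(split, reference))   # SN = 4/6, SP = 4/4
```

         <p>The merged clustering recovers every reference pair but also reports nine pairs that span two complexes, so only its SP drops; the split clustering is the mirror image.</p>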
         <p>In general, both MCL and MRF perform better when compared to the MIPS benchmark than to the Reguly data, and on the <it>Gavin06 </it>data set MRF matches both benchmark sets better than MCL. Many complexes in the Reguly set are redundant and overlap, some even completely, which no method could possibly recover from the data. Hence, MCL and MRF will never be able to fully reconstruct the Reguly data set, as both assume that protein complexes do not overlap. On the MIPS complexes, based on the all-pairs comparison, MRF outperforms MCL. This indicates that, in general, modeling complex formation from pairwise interactions alone is a reasonable assumption that produces few false positive errors. We can observe the giant component of the MCL solution in Figure Z(b) as the first column, which includes several complexes. A perfect mapping would appear as a diagonal line with no off-diagonal entries. The results show that no single solution provides the best mapping in all respects. Although the core solution of Gavin06 appears to have the cleanest mapping with few off-diagonal entries, it contains only 1147 proteins, whereas our solution includes all 2760 proteins. When comparing all solutions to the MIPS size distribution in Figure F, we clearly see that MCL is particularly far off, due to the giant component that assigns about 140 proteins from different MIPS complexes to the same cluster. The MRF solution appears to be the closest match in this regard, although it still cannot reconstruct MIPS complexes larger than 30 proteins. Other solutions have the same problem; the Gavin06 (core) solution maps only to small complexes (size &#8804; 20). In place of large complexes, MRF produces more small clusters (size &#8804; 10) than MIPS contains.</p>
         <p>In summary, if the data has already been filtered, as in the <it>Gavin02 </it>data set, MRF has no advantage over MCL and is computationally more expensive. When clustering large, noisy data sets, however, the evaluation demonstrates that MRF is the more suitable method, owing to its rigorous framework that allows parameter selection by maximum likelihood.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We introduce a probabilistic model based on Markov random fields to identify protein complexes from data produced by large-scale purification experiments using tandem affinity purification and mass spectrometric identification. Unlike previous work, our model incorporates observational errors, which enables us to use the experimental data directly, without requiring an intermediate interaction graph and without prior elimination of proteins from the sets. The assignments of proteins to clusters, which correspond to protein complexes, are computed with the mean field annealing algorithm. Because some proteins cannot be clustered well, we also provide a model-based quality score for each predicted complex. Our method does not rely on heuristics, which is particularly important for protein complex studies in organisms that lack an established reference set of complexes. The model has two parameters, which are estimated from the experimental data using maximum likelihood, providing a principled solution to the parameter selection problem. Our results compare favorably on reference data sets, notably on the larger, unfiltered data sets.</p>
         <p>In future work, the hard assignments imposed by our model could be relaxed to capture overlapping complexes; this would require changes to both the model and the minimization algorithm.</p>
         <p>It would also be useful to have a quantitative estimate of the number of clusters <it>K</it>. One would need to trade off the increase in likelihood against the increase in the number of clusters, in effect finding the smallest number of clusters with nearly maximal likelihood. One approach would be the minimum description length (MDL) criterion <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, a rigorous technique that assigns costs to both the observation likelihood and the number of clusters.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Data sets</p>
            </st>
            <sec>
               <st>
                  <p>Experimental data sets</p>
               </st>
                <p>We focused on the data published by Gavin <it>et al. </it>in 2002 (<it>Gavin02</it>) and 2006 (<it>Gavin06</it>) <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B15">15</abbr></abbrgrp>, which were found to be of high quality <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. The experimental data sets were downloaded and parsed from the supplementary information accompanying the original publications. We found further data sets <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp> less suitable for benchmarking because the baits used in these studies were chosen to address specific questions; hence they do not constitute representative samples. Another recent large-scale screen in yeast did not publish the individual, repeated purifications, making it impossible to estimate the error model used here <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>Protein complex annotation</p>
               </st>
                <p>&#8226; <b>MIPS</b>: The MIPS data set <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> is a standard data set for benchmarking methods for protein complex prediction. Note that it was largely created before high-throughput data sets were published.</p>
                <p>&#8226; <b>Reguly</b>: A manually curated data set of protein-protein interactions that encompasses protein complexes taken from the literature <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. It is less selective than the MIPS benchmark and contains several complexes that overlap significantly due to differences between individual descriptions of complexes.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>A model of protein complexes using Markov random fields</p>
            </st>
             <p>We assume that clusters do not overlap and that each protein <it>i </it>belongs to exactly one cluster <it>Q</it><sub><it>i </it></sub>&#8712; {1, ..., <it>K</it>}, where <it>K </it>is the number of clusters. We expect proteins in the same cluster to interact, and proteins belonging to different clusters not to interact. Our observations contain errors: a false negative rate <it>&#957; </it>, the probability that proteins of the same cluster are not observed to interact, and a false positive rate <it>&#966;</it>, the probability that proteins belonging to different clusters are observed to interact. These error rates are assumed to be the same for all interactions. We estimate them while computing the cluster assignments of proteins.</p>
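             <p>For a fixed assignment <it>Q</it>, the two error rates have closed-form maximum-likelihood estimates: setting the derivative of the log-likelihood to zero gives <it>&#957;</it> as the fraction of within-cluster observations that show no interaction, and <it>&#966;</it> as the fraction of between-cluster observations that show an interaction. The Python sketch below simulates observation counts under this error model and recovers both rates that way. It covers only the parameter step for a fixed <it>Q</it> and does not reproduce how the estimates are combined with the updates of the cluster assignments; the function names and simulation settings are hypothetical.</p>

```python
import random


def simulate_counts(Q, nu, phi, trials, seed=0):
    """Simulate (t, s) observation counts for every protein pair.

    Q[i] is the cluster of protein i. Each pair is observed `trials` times;
    a same-cluster pair shows an interaction with probability 1 - nu,
    a different-cluster pair with probability phi.
    """
    rng = random.Random(seed)
    counts = {}
    for i in range(len(Q)):
        for j in range(i + 1, len(Q)):
            p_seen = 1.0 - nu if Q[i] == Q[j] else phi
            s = sum(1 for _ in range(trials) if p_seen > rng.random())
            counts[(i, j)] = (trials, s)
    return counts


def estimate_error_rates(Q, counts):
    """Closed-form maximum-likelihood estimates of nu and phi for a fixed Q."""
    missed = within = 0      # within-cluster observations: without interaction / total
    spurious = between = 0   # between-cluster observations: with interaction / total
    for (i, j), (t, s) in counts.items():
        if Q[i] == Q[j]:
            missed += t - s
            within += t
        else:
            spurious += s
            between += t
    nu = missed / within if within else 0.0
    phi = spurious / between if between else 0.0
    return nu, phi


Q = [0] * 10 + [1] * 10 + [2] * 10   # 30 proteins in three clusters
counts = simulate_counts(Q, nu=0.2, phi=0.05, trials=20)
print(estimate_error_rates(Q, counts))  # approximately (0.2, 0.05)
```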
            <p>Define <it>S</it><sub><it>ij </it></sub>to be the event that proteins <it>i </it>and <it>j </it>are observed to interact, and, likewise, <it>F</it><sub><it>ij </it></sub>the event that they are not observed to interact. The probabilities of these two events, given <it>&#957;</it>, <it>&#966; </it>and <it>Q</it>, are</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-8-482-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>P</m:mi>
                           <m:mo stretchy="false">[</m:mo>
                           <m:msub>
                              <m:mi>S</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>|</m:mo>
                           <m:mi>&#957;</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>&#981;</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>Q</m:mi>
                           <m:mo stretchy="false">]</m:mo>
                           <m:mo>=</m:mo>
                           <m:mrow>
                              <m:mo>{</m:mo>
                              <m:mrow>
                                 <m:mtable columnalign="right">
                                    <m:mtr columnalign="right">
                                       <m:mtd columnalign="right">
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mn>1</m:mn>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>&#957;</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo>:</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="right">
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>Q</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>=</m:mo>
                                             <m:msub>
                                                <m:mi>Q</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr columnalign="right">
                                       <m:mtd columnalign="right">
                                          <m:mrow>
                                             <m:mi>&#981;</m:mi>
                                             <m:mo>:</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="right">
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>Q</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>&#8800;</m:mo>
                                             <m:msub>
                                                <m:mi>Q</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                             <m:mo>,</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                 </m:mtable>
                              </m:mrow>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaacbeGae8huaaLaei4waSLaem4uam1aaSbaaSqaaiabdMgaPjabdQgaQbqabaGccqGG8baFiiGacqGF9oGBcqGGSaalcqGFvpGzcqGGSaalcqWGrbqucqGGDbqxcqGH9aqpdaGabaqaauaabiqaciaaaeaacqGGOaakcqaIXaqmcqGHsislcqGF9oGBcqGGPaqkcqGG6aGoaeaacqWGrbqudaWgaaWcbaGaemyAaKgabeaakiabg2da9iabdgfarnaaBaaaleaacqWGQbGAaeqaaaGcbaGae4x1dyMaeiOoaOdabaGaemyuae1aaSbaaSqaaiabdMgaPbqabaGccqGHGjsUcqWGrbqudaWgaaWcbaGaemOAaOgabeaakiabcYcaSaaaaiaawUhaaaaa@55A5@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>and</p>
            <p>
               <display-formula>
                  <m:math name="1471-2105-8-482-i5" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>P</m:mi>
                           <m:mo stretchy="false">[</m:mo>
                           <m:msub>
                              <m:mi>F</m:mi>
                              <m:mrow>
                                 <m:mi>i</m:mi>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                           </m:msub>
                           <m:mo>|</m:mo>
                           <m:mi>&#957;</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>&#981;</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>Q</m:mi>
                           <m:mo stretchy="false">]</m:mo>
                           <m:mo>=</m:mo>
                           <m:mrow>
                              <m:mo>{</m:mo>
                              <m:mrow>
                                 <m:mtable columnalign="right">
                                    <m:mtr columnalign="right">
                                       <m:mtd columnalign="right">
                                          <m:mrow>
                                             <m:mi>&#957;</m:mi>
                                             <m:mo>:</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="right">
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>Q</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>=</m:mo>
                                             <m:msub>
                                                <m:mi>Q</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr columnalign="right">
                                       <m:mtd columnalign="right">
                                          <m:mrow>
                                             <m:mo stretchy="false">(</m:mo>
                                             <m:mn>1</m:mn>
                                             <m:mo>&#8722;</m:mo>
                                             <m:mi>&#981;</m:mi>
                                             <m:mo stretchy="false">)</m:mo>
                                             <m:mo>:</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="right">
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>Q</m:mi>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>&#8800;</m:mo>
                                             <m:msub>
                                                <m:mi>Q</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                             <m:mo>.</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                 </m:mtable>
                              </m:mrow>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaacbeGae8huaaLaei4waSLaemOray0aaSbaaSqaaiabdMgaPjabdQgaQbqabaGccqGG8baFiiGacqGF9oGBcqGGSaalcqGFvpGzcqGGSaalcqWGrbqucqGGDbqxcqGH9aqpdaGabaqaauaabiqaciaaaeaacqGF9oGBcqGG6aGoaeaacqWGrbqudaWgaaWcbaGaemyAaKgabeaakiabg2da9iabdgfarnaaBaaaleaacqWGQbGAaeqaaaGcbaGaeiikaGIaeGymaeJaeyOeI0Iae4x1dyMaeiykaKIaeiOoaOdabaGaemyuae1aaSbaaSqaaiabdMgaPbqabaGccqGHGjsUcqWGrbqudaWgaaWcbaGaemOAaOgabeaakiabc6caUaaaaiaawUhaaaaa@558F@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
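             <p>The two case distinctions above can be expressed as a single lookup. The following Python sketch, with a hypothetical function name, returns the probability of one observation given whether an interaction was seen and whether the two proteins share a cluster; for either case of <it>Q</it>, the probabilities of <it>S</it><sub><it>ij</it></sub> and <it>F</it><sub><it>ij</it></sub> sum to one.</p>

```python
def observation_probability(interaction_seen, same_cluster, nu, phi):
    """P[S_ij | nu, phi, Q] if an interaction was seen, else P[F_ij | nu, phi, Q].

    same_cluster is True when Q_i == Q_j; nu is the false negative rate and
    phi the false positive rate.
    """
    if same_cluster:
        return 1.0 - nu if interaction_seen else nu
    return phi if interaction_seen else 1.0 - phi


# The two outcomes are exhaustive, so their probabilities sum to one in each case.
for same in (True, False):
    total = (observation_probability(True, same, 0.2, 0.05)
             + observation_probability(False, same, 0.2, 0.05))
    print(same, round(total, 12))
```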
            <p>A single purification experiment generates a set of such observations. Over the course of multiple purification experiments, each pair of proteins may be observed multiple times. We define <it>t</it><sub><it>ij </it></sub>to be the total number of observations made for the protein pair (<it>i</it>, <it>j</it>), and <it>s</it><sub><it>ij </it></sub>to be the number of these observations where an interaction was observed.</p>
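             <p>As an illustration of how the counts <it>t</it><sub><it>ij</it></sub> and <it>s</it><sub><it>ij</it></sub> and the likelihood below might be computed, the following Python sketch assumes a spoke-model reading of the data: every purification of a bait <it>b</it> yields one observation for each pair (<it>b</it>, <it>j</it>) with <it>j</it> &#8800; <it>b</it>, which counts as an interaction if <it>j</it> was retrieved as a prey. This counting rule, the function names, and the toy data are assumptions for illustration only. The log-likelihood sums <it>s</it><sub><it>ij</it></sub> log <it>P</it>[<it>S</it><sub><it>ij</it></sub>] + (<it>t</it><sub><it>ij</it></sub> &#8722; <it>s</it><sub><it>ij</it></sub>) log <it>P</it>[<it>F</it><sub><it>ij</it></sub>] over all observed pairs; pairs that were never observed contribute nothing.</p>

```python
import math
from collections import defaultdict


def spoke_counts(purifications, proteins):
    """Count (t_ij, s_ij) per protein pair under an assumed spoke-model reading.

    purifications: list of (bait, preys) tuples; a bait may appear repeatedly.
    proteins: all proteins under consideration.
    """
    t = defaultdict(int)
    s = defaultdict(int)
    for bait, preys in purifications:
        for j in proteins:
            if j == bait:
                continue
            pair = tuple(sorted((bait, j)))
            t[pair] += 1          # one observation of (bait, j) per purification
            if j in preys:
                s[pair] += 1      # ... that showed an interaction
    return {pair: (t[pair], s[pair]) for pair in t}


def log_likelihood(counts, Q, nu, phi):
    """log P[{(t_ij, s_ij)} | nu, phi, Q]; requires 0 < nu < 1 and 0 < phi < 1."""
    total = 0.0
    for (i, j), (t_ij, s_ij) in counts.items():
        if Q[i] == Q[j]:
            p_seen, p_missed = 1.0 - nu, nu
        else:
            p_seen, p_missed = phi, 1.0 - phi
        total += s_ij * math.log(p_seen) + (t_ij - s_ij) * math.log(p_missed)
    return total


proteins = ["A", "B", "C", "D"]
purifications = [("A", {"B"}), ("A", {"B", "C"}), ("D", set())]
counts = spoke_counts(purifications, proteins)
print(counts[("A", "B")], counts[("A", "D")])   # (2, 2) (3, 0)

together = {"A": 0, "B": 0, "C": 0, "D": 1}
apart = {"A": 0, "B": 0, "C": 1, "D": 1}
# Placing C in the same cluster as its observed partner A gives the higher likelihood.
print(log_likelihood(counts, together, 0.2, 0.1) > log_likelihood(counts, apart, 0.2, 0.1))
```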
            <p>Then, given <it>&#957;</it>, <it>&#966; </it>and a configuration <it>Q</it>, the likelihood of observing a particular sequence of experimental outcomes (<it>t</it><sub><it>ij</it></sub>, <it>s</it><sub><it>ij</it></sub>) for all pairs (<it>i</it>, <it>j</it>) is</p>
            <p>
               <display-formula id="M1">
                  <m:math name="1471-2105-8-482-i6" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mi>P</m:mi>
                                       <m:mo stretchy="false">[</m:mo>
                                       <m:mo>{</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>t</m:mi>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mi>j</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo>,</m:mo>
                                       <m:msub>
                                          <m:mi>s</m:mi>
                                          <m:mrow>
                                             <m:mi>i</m:mi>
                                             <m:mi>j</m:mi>
                                          </m:mrow>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>}</m:mo>
                                       <m:mo>|</m:mo>
                                       <m:mi>&#957;</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>&#981;</m:mi>
                                       <m:mo>,</m:mo>
                                       <m:mi>Q</m:mi>
                                       <m:mo stretchy="false">]</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mo>=</m:mo>
                                       <m:mstyle displaystyle="true">
                                          <m:munder>
                                             <m:mo>&#8719;</m:mo>
                                             <m:mrow>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>i</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>j</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                             </m:mrow>
                                          </m:munder>
                                          <m:mrow>
                                             <m:mi>P</m:mi>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">[</m:mo>
                                                   <m:msub>
                                                      <m:mi>S</m:mi>
                                                      <m:mrow>
                                                         <m:mi>i</m:mi>
                                                         <m:mi>j</m:mi>
                                                      </m:mrow>
                                                   </m:msub>
                                                   <m:mo>|</m:mo>
                                                   <m:mi>&#957;</m:mi>
                                                   <m:mo>,</m:mo>
                                                   <m:mi>&#981;</m:mi>
                                                   <m:mo stretchy="false">]</m:mo>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>s</m:mi>
                                                      <m:mrow>
                                                         <m:mi>i</m:mi>
                                                         <m:mi>j</m:mi>
                                                      </m:mrow>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:msup>
                                             <m:mi>P</m:mi>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">[</m:mo>
                                                   <m:msub>
                                                      <m:mi>F</m:mi>
                                                      <m:mrow>
                                                         <m:mi>i</m:mi>
                                                         <m:mi>j</m:mi>
                                                      </m:mrow>
                                                   </m:msub>
                                                   <m:mo>|</m:mo>
                                                   <m:mi>&#957;</m:mi>
                                                   <m:mo>,</m:mo>
                                                   <m:mi>&#981;</m:mi>
                                                   <m:mo stretchy="false">]</m:mo>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>t</m:mi>
                                                      <m:mrow>
                                                         <m:mi>i</m:mi>
                                                         <m:mi>j</m:mi>
                                                      </m:mrow>
                                                   </m:msub>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:msub>
                                                      <m:mi>s</m:mi>
                                                      <m:mrow>
                                                         <m:mi>i</m:mi>
                                                         <m:mi>j</m:mi>
                                                      </m:mrow>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:msup>
                                          </m:mrow>
                                       </m:mstyle>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow/>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:mo>=</m:mo>
                                       <m:mstyle displaystyle="true">
                                          <m:munder>
                                             <m:mo>&#8719;</m:mo>
                                             <m:mrow>
                                                <m:mo stretchy="false">(</m:mo>
                                                <m:mi>i</m:mi>
                                                <m:mo>,</m:mo>
                                                <m:mi>j</m:mi>
                                                <m:mo stretchy="false">)</m:mo>
                                                <m:mo>:</m:mo>
                                                <m:msub>
                                                   <m:mi>Q</m:mi>
                                                   <m:mi>i</m:mi>
                                                </m:msub>
                                                <m:mo>=</m:mo>
                                                <m:msub>
                                                   <m:mi>Q</m:mi>
                                                   <m:mi>j</m:mi>
                                                </m:msub>
                                             </m:mrow>
                                          </m:munder>
                                          <m:mrow>
                                             <m:msup>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:mn>1</m:mn>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mi>&#957;</m:mi>
                                                   <m:mo stretchy="false">)</m:mo>
                                                </m:mrow>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>s</m:mi>
                                                      <m:mrow>
                                                         <m:mi>i</m:mi>
                                                         <m:mi>j</m:mi>
                                                      </m:mrow>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:msup>
                                             <m:msup>
                                                <m:mi>&#957;</m:mi>
                                                <m:mrow>
                                                   <m:mo stretchy="false">(</m:mo>
                                                   <m:msub>
                                                      <m:mi>t</m:mi>
                           