<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-6-119</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Are scale-free networks robust to measurement errors?</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Lin</snm>
               <fnm>Nan</fnm>
               <insr iid="I1"/>
               <email>nlin@math.wustl.edu</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Zhao</snm>
               <fnm>Hongyu</fnm>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>hongyu.zhao@yale.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Mathematics, Washington University in St. Louis, St. Louis, MO 63143, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Epidemiology and Public Health, Yale University, New Haven, CT 06520, USA</p>
            </ins>
            <ins id="I3">
               <p>Department of Genetics, Yale University, New Haven, CT 06520, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2005</pubdate>
         <volume>6</volume>
         <issue>1</issue>
         <fpage>119</fpage>
         <url>http://www.biomedcentral.com/1471-2105/6/119</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15904487</pubid>
               <pubid idtype="doi">10.1186/1471-2105-6-119</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>31</day>
               <month>10</month>
               <year>2004</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>16</day>
               <month>5</month>
               <year>2005</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>16</day>
               <month>5</month>
               <year>2005</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2005</year>
         <collab>Lin and Zhao; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Many complex random networks have been found to be scale-free. Existing literature on scale-free networks has rarely considered potential false positive and false negative links in the observed networks, especially in biological networks inferred from high-throughput experiments. Therefore, it is important to study the impact of these measurement errors on the topology of the observed networks.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>This article addresses the impact of erroneous links on network topological inference and explores possible error mechanisms for scale-free networks with an emphasis on <it>Saccharomyces cerevisiae </it>protein interaction networks. We study this issue by both theoretical derivations and simulations. We show that ignoring erroneous links in network analysis may lead to biased estimates of the scale parameter, and we recommend robust estimators in such scenarios. Possible error mechanisms of yeast protein interaction networks are explored by comparisons between real data and simulated data.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our studies show that, in the presence of erroneous links, the connectivity distribution of scale-free networks is still scale-free for the middle range connectivities, but can be greatly distorted for low and high connectivities. It is more appropriate to use robust estimators such as the least trimmed squares estimator to estimate the scale parameter <it>&#947; </it>under such circumstances. Moreover, we show by simulation studies that the scale-free property is robust to some error mechanisms but not to others. The simulation results also suggest that different error mechanisms may be operating in the yeast protein interaction networks produced from different data sources. In the MIPS gold standard protein interaction data, there appears to be a high rate of false negative links, and the false negative and false positive rates are more or less constant across proteins with different connectivities. However, the error mechanism of yeast two-hybrid data may be very different, where the overall false negative rate is low and the false negative rates tend to be higher for links involving proteins with more interacting partners.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Recent studies have found that many complex networks, ranging from the World-Wide Web <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and the scientific collaboration network <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> to biological systems such as the yeast protein interaction network <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, are scale-free. The scale-free property states that the distribution of the connectivity <it>k </it>(number of links per node) in a network can be described by the power law, i.e.,</p>
         <p><it>P</it>(<it>k</it>) = <it>ck</it><sup>-<it>&#947;</it></sup>, <it>c </it>> 0, <it>&#947; </it>> 0. &#160;&#160;&#160; (1)</p>
         <p>A visual diagnosis of the scale-free behavior can be made through the log-log plot of the connectivity distribution, in which a straight line with slope -<it>&#947; </it>is expected. In scale-free networks, links are not distributed randomly or evenly across nodes; instead, a few nodes ("hubs") are highly connected. The ratio of the number of "hubs" to that of nodes in the rest of the network remains constant as the network changes in size. One attractive feature is that scale-free networks are more resistant to random failures than random networks, owing to the existence of a few highly connected "hubs" <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Remarkably, it has been observed that the scale parameter <it>&#947; </it>varies only within the narrow range of 2.1 &#8211; 4 in the aforementioned real-world networks. Existing studies on scale-free networks have generally assumed that the observed links represent the underlying structure of the network and have paid little attention to the fact that the observed links often involve errors, namely, false positives and false negatives. For example, Jeong <it>et al</it>. <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> considered the <it>Saccharomyces cerevisiae </it>protein interaction network inferred from yeast two-hybrid (Y2H) experiments. It is well known that the Y2H system has many false positives as well as false negatives <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. A natural question to ask is whether a scale-free network is still observed as scale-free in the presence of errors. If it is, what are the possible underlying error mechanisms, and how variable is the observed scale parameter <it>&#947;</it>? Answering these questions may lead to further insight into the scale-free property and to a better understanding and more appropriate use of the observed network data. For convenience, we refer to networks observed with erroneous links as perturbed networks in the rest of this article.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>In this article, we address the above questions by both theoretical derivations and simulation studies using the yeast protein interaction network as a prototype. However, the results apply to general scale-free networks.</p>
         <sec>
            <st>
               <p>Connectivity distribution of scale-free networks with erroneous links under a simple model</p>
            </st>
            <p>We first study how the connectivity distribution of a scale-free network is affected when errors are present. Following previous studies on the reliability of protein interaction networks <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, we assume a simple error mechanism in which the false positive rate (<it>r</it><sub><it>FP</it></sub>) and false negative rate (<it>r</it><sub><it>FN</it></sub>) are the same for all node pairs, and false positives and false negatives are independently generated. The false positive rate and false negative rate of a node pair refer, respectively, to the probability that the pair is observed as linked when it is actually unlinked and the probability that the pair is observed as unlinked when it is actually linked. Under this assumption, every truly linked pair of nodes is observed as unlinked with probability <it>r</it><sub><it>FN</it></sub>, and every truly unlinked pair of nodes is observed as linked with probability <it>r</it><sub><it>FP</it></sub>.</p>
            <p>The above assumption is similar to the grand canonical ensembles of random networks in Chapter 4 of Dorogovtsev and Mendes <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, in which networks evolve by removing existing edges and adding new edges with certain probabilities. We can also view the perturbed network as obtained by removing edges from the underlying network (false negatives) and adding edges to it (false positives). The probability of adding an edge between two non-linked nodes is the false positive rate <it>r</it><sub><it>FP</it></sub>, and the probability of removing the edge between two linked nodes is the false negative rate <it>r</it><sub><it>FN</it></sub>. However, while Dorogovtsev and Mendes mostly discussed the connectivity distribution of equilibrium networks (networks obtained after infinitely many rounds of edge addition and removal), we focus on the connectivity distribution of observed networks that are obtained by considering the removal of every existing edge and the addition of every non-existing edge just once.</p>
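            <p>The single-pass perturbation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name perturb_edges, the representation of links as sorted pairs of node indices, and the fixed random seed are assumptions made purely for illustration.</p>
            <p><![CDATA[
```python
import random
from itertools import combinations


def perturb_edges(n, true_edges, r_fp, r_fn, seed=0):
    """Return the observed edge set after one pass of independent errors.

    true_edges is a set of pairs (i, j) with i < j (an assumed encoding).
    Every node pair is visited exactly once, as in the simple error model.
    """
    rng = random.Random(seed)
    observed = set()
    for pair in combinations(range(n), 2):
        if pair in true_edges:
            # A true link is missed (false negative) with probability r_fn.
            if rng.random() >= r_fn:
                observed.add(pair)
        else:
            # A non-link is reported (false positive) with probability r_fp.
            if rng.random() < r_fp:
                observed.add(pair)
    return observed


# Toy example: a star on 5 nodes, with node 0 acting as the hub.
true_edges = {(0, j) for j in range(1, 5)}
no_error = perturb_edges(5, true_edges, r_fp=0.0, r_fn=0.0)
all_flipped = perturb_edges(5, true_edges, r_fp=1.0, r_fn=1.0)
```
]]></p>
            <p>With both rates set to zero the observed network equals the true one, and with both rates set to one every pair is flipped; intermediate rates give a random perturbed network whose connectivities can then be tabulated.</p>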
            <sec>
               <st>
                  <p>Connectivity distribution of the perturbed network</p>
               </st>
               <p>In the following, we derive the distribution of the observed connectivities for a scale-free network of size <it>n </it>for given values of <it>r</it><sub><it>FP </it></sub>and <it>r</it><sub><it>FN</it></sub>. Let <it>N</it><sub><it>P </it></sub>and <it>N</it><sub><it>T </it></sub>denote the observed and true connectivity of a node, respectively. Then the probability of observing a node with <it>k </it>links is</p>
               <p>
                  <graphic file="1471-2105-6-119-i1.gif"/>
               </p>
               <p>The minimum and maximum connectivity of a node, <it>T</it><sub><it>min </it></sub>and <it>T</it><sub><it>max</it></sub>, are assumed to be the same for all the nodes in the network, and their values depend on the specific network. In general, we set <it>T</it><sub><it>min </it></sub>= 0 and <it>T</it><sub><it>max </it></sub>= <it>n </it>- 1 when expert knowledge is not available, where <it>n </it>denotes the size of the network, i.e., the total number of nodes in the network. The following elucidates how to calculate (2) analytically. Let <it>N</it><sub><it>FP</it></sub>, <it>N</it><sub><it>TP</it></sub>, <it>N</it><sub><it>FN</it></sub>, <it>N</it><sub><it>TN</it></sub>, and <it>N</it><sub><it>N </it></sub>be the numbers of false positive links (observed as linked but actually not), true positive links (observed as linked and actually linked), false negative links (observed as unlinked but actually linked), true negative (observed as unlinked and actually unlinked) and negative links (actually unlinked) associated with the node, respectively. Since the observed links of a node consist of both false positive and true positive ones, and the true links consist of true positive and false negative ones, we have <it>N</it><sub><it>P </it></sub>= <it>N</it><sub><it>FP </it></sub>+ <it>N</it><sub><it>TP</it></sub>, <it>N</it><sub><it>T </it></sub>= <it>N</it><sub><it>FN </it></sub>+ <it>N</it><sub><it>TP</it></sub>, <it>N</it><sub><it>N </it></sub>= <it>N</it><sub><it>FP </it></sub>+ <it>N</it><sub><it>TN</it></sub>, and <it>T</it><sub><it>max </it></sub>= <it>N</it><sub><it>T </it></sub>+ <it>N</it><sub><it>N</it></sub>. 
Furthermore, under our assumed error mechanism, following derivations similar to those shown in <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, <it>N</it><sub><it>FP </it></sub>and <it>N</it><sub><it>FN </it></sub>follow the binomial distributions <it>Bin</it>(<it>T</it><sub><it>max </it></sub>- <it>N</it><sub><it>T</it></sub>, <it>r</it><sub><it>FP</it></sub>) and <it>Bin</it>(<it>N</it><sub><it>T</it></sub>, <it>r</it><sub><it>FN</it></sub>), respectively, for a given value of <it>N</it><sub><it>T</it></sub>. This implies that <it>r</it><sub><it>FP </it></sub>= <it>E</it>(<it>N</it><sub><it>FP</it></sub>)/(<it>T</it><sub><it>max </it></sub>- <it>N</it><sub><it>T</it></sub>) = <it>E</it>(<it>N</it><sub><it>FP</it></sub>)/(<it>N</it><sub><it>FP </it></sub>+ <it>N</it><sub><it>TN</it></sub>) and <it>r</it><sub><it>FN </it></sub>= <it>E</it>(<it>N</it><sub><it>FN</it></sub>)/<it>N</it><sub><it>T </it></sub>= <it>E</it>(<it>N</it><sub><it>FN</it></sub>)/(<it>N</it><sub><it>TP </it></sub>+ <it>N</it><sub><it>FN</it></sub>), where <it>E</it>(<it>X</it>) denotes the expectation of random variable <it>X</it>. Then the conditional probability <it>P</it>(<it>N</it><sub><it>P </it></sub>= <it>k</it>|<it>N</it><sub><it>T </it></sub>= <it>j</it>) in (2) can be written as follows.</p>
               <p>
                  <graphic file="1471-2105-6-119-i2.gif"/>
               </p>
               <p>where <it>dBin</it>(<it>k</it>; <it>p</it>, <it>n</it>) = <it>P</it>(<it>X </it>= <it>k</it>) with <it>X </it>~ <it>Bin</it>(<it>n</it>, <it>p</it>). Moreover, the power law of the scale-free network implies that <it>P</it>(<it>N</it><sub><it>T </it></sub>= <it>j</it>) = <it>cj</it><sup>-<it>&#947;</it></sup>. Hence, the observed connectivity distribution can be calculated by</p>
               <p>
                  <graphic file="1471-2105-6-119-i3.gif"/>
               </p>
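               <p>As a hedged illustration of this calculation, the Python sketch below evaluates the observed connectivity distribution numerically. It assumes that, given <it>N</it><sub><it>T </it></sub>= <it>j</it>, the true positive count <it>N</it><sub><it>TP </it></sub>= <it>j </it>- <it>N</it><sub><it>FN </it></sub>follows <it>Bin</it>(<it>j</it>, 1 - <it>r</it><sub><it>FN</it></sub>) independently of <it>N</it><sub><it>FP</it></sub>, which is implied by the binomial statements above. The helper names binom_pmf and observed_distribution, and the small network size used in the example, are illustrative choices and not part of the original analysis.</p>
               <p><![CDATA[
```python
from math import exp, lgamma, log, log1p


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p), computed on the log scale for stability."""
    if k < 0 or k > n:
        return 0.0
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))
    return exp(log_pmf)


def observed_distribution(n, gamma, r_fp, r_fn, t_min=1):
    """Return [P(N_P = k) for k = 0..n-1] under the simple error mechanism.

    Requires t_min >= 1 so that the power law j**(-gamma) is defined.
    """
    t_max = n - 1
    # True connectivity: P(N_T = j) = c * j**(-gamma) on t_min..t_max.
    weights = {j: j ** (-gamma) for j in range(t_min, t_max + 1)}
    c = 1.0 / sum(weights.values())
    probs = [0.0] * (t_max + 1)
    for j, w in weights.items():
        # Given N_T = j: N_TP ~ Bin(j, 1 - r_fn), N_FP ~ Bin(t_max - j, r_fp).
        tp = [binom_pmf(i, j, 1.0 - r_fn) for i in range(j + 1)]
        fp = [binom_pmf(i, t_max - j, r_fp) for i in range(t_max - j + 1)]
        for k in range(t_max + 1):
            lo = max(0, k - (t_max - j))
            hi = min(j, k)
            # Convolution: N_P = N_TP + N_FP.
            cond = sum(tp[i] * fp[k - i] for i in range(lo, hi + 1))
            probs[k] += c * w * cond
    return probs


# Small illustrative network (n = 50) to keep the triple loop cheap.
probs = observed_distribution(n=50, gamma=3.0, r_fp=0.0001, r_fn=0.3)
exact = observed_distribution(n=50, gamma=3.0, r_fp=0.0, r_fn=0.0)
```
]]></p>
               <p>When both error rates are zero the function returns the original power law, which gives a simple sanity check; plotting log probs[k] against log <it>k </it>for non-zero rates is one way to inspect the deviation from linearity. The cubic cost of this direct sum makes it suitable only for small <it>n </it>without further optimization.</p>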
            </sec>
            <sec>
               <st>
                  <p>Simulations</p>
               </st>
               <p>We next explore the impact of the erroneous links on the topology of the scale-free networks. With an emphasis on the yeast protein interaction network, we compute the distribution of the observed connectivity of scale-free networks with the false positive rate (<it>r</it><sub><it>FP</it></sub>) and false negative rate (<it>r</it><sub><it>FN</it></sub>) similar to the yeast protein interaction network under the assumption of the aforementioned simple error mechanism. We set the scale parameter <it>&#947; </it>= 3, the size of the network <it>n </it>= 1000 or 7000, and vary <it>r</it><sub><it>FP </it></sub>from 0.0001 to 0.0003 and <it>r</it><sub><it>FN </it></sub>from 0.1 to 0.9 on 9 equally spaced values. These ranges of <it>r</it><sub><it>FP </it></sub>and <it>r</it><sub><it>FN </it></sub>are based on Deng <it>et al</it>. <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, in which the authors estimated the false positive rate and false negative rate to be less than 0.000285 and greater than 0.64, respectively, based on the Y2H data. We consider a larger range of <it>r</it><sub><it>FP </it></sub>to cover other data sources, such as the MIPS complex data, where false positives are less frequent. In the calculations, we use <it>T</it><sub><it>min </it></sub>= 1 and <it>T</it><sub><it>max </it></sub>= <it>n </it>- 1.</p>
               <p>In the log-log plot (Figures <figr fid="F1">1</figr> and <figr fid="F2">2</figr>) of the observed connectivity distribution of the perturbed networks when (<it>r</it><sub><it>FP </it></sub>= 0.0001, <it>r</it><sub><it>FN </it></sub>= 0.3) and (<it>r</it><sub><it>FP </it></sub>= 0.00015, <it>r</it><sub><it>FN </it></sub>= 0.8), it can be seen that the connectivity distribution after perturbation still maintains the scale-free property in the middle range of the connectivity, but deviates from the original linear pattern at both the small and large connectivity regions. The slope of the linear part is close to the true value -3 (see Tables A.1 and A.2 in <supplr sid="S1">Additional file 1</supplr>). The deviation is more significant in the large connectivity region than that in the small connectivity region. This deviation pattern is consistent across networks of different sizes considered in our calculations (data not shown). Comparisons among the observed connectivity distributions (figures not shown) of perturbed networks with different values of <it>r</it><sub><it>FP </it></sub>and <it>r</it><sub><it>FN </it></sub>suggest that the deviation depends little on <it>r</it><sub><it>FP </it></sub>but largely on <it>r</it><sub><it>FN</it></sub>. As <it>r</it><sub><it>FN </it></sub>increases, the deviation of the tail probability becomes more significant. This deviation is also more obvious in a smaller network.</p>
               <fig id="F1">
                  <title>
                     <p>Figure 1</p>
                  </title>
                  <caption>
                     <p>Connectivity distribution of the perturbed scale-free networks (<it>r</it><sub><it>FP </it></sub>= 0.0001, <it>r</it><sub><it>FN </it></sub>= 0.3)</p>
                  </caption>
                  <text>
                     <p><b>Connectivity distribution of the perturbed scale-free networks (<it>r</it><sub><it>FP </it></sub>= 0.0001, <it>r</it><sub><it>FN </it></sub>= 0.3)</b>. This picture shows the connectivity distribution of the perturbed networks using (3) provided that <it>r</it><sub><it>FP </it></sub>= 0.0001 and <it>r</it><sub><it>FN </it></sub>= 0.3. Figure 1(b) and 1(d) are the linear parts of Figure 1(a) and 1(c), respectively, imposed with the regression lines fitted by OLS.</p>
                  </text>
                  <graphic file="1471-2105-6-119-1"/>
               </fig>
               <fig id="F2">
                  <title>
                     <p>Figure 2</p>
                  </title>
                  <caption>
                     <p>Connectivity distribution of the perturbed scale-free networks (<it>r</it><sub><it>FP </it></sub>= 0.00015, <it>r</it><sub><it>FN </it></sub>= 0.8)</p>
                  </caption>
                  <text>
                     <p><b>Connectivity distribution of the perturbed scale-free networks (<it>r</it><sub><it>FP </it></sub>= 0.00015, <it>r</it><sub><it>FN </it></sub>= 0.8)</b>. This picture shows the connectivity distribution of the perturbed networks using (3) provided that <it>r</it><sub><it>FP </it></sub>= 0.00015 and <it>r</it><sub><it>FN </it></sub>= 0.8. Figure 2(b) and 2(d) are the linear parts of Figure 2(a) and 2(c), respectively, imposed with the regression lines fitted by OLS.</p>
                  </text>
                  <graphic file="1471-2105-6-119-2"/>
               </fig>
               <suppl id="S1">
                  <title>
                     <p>Additional File 1</p>
                  </title>
                  <text>
                     <p>Tables of the estimates of the scale parameter <it>&#947;</it>.</p>
                  </text>
                  <file name="1471-2105-6-119-S1.pdf">
                     <p>Click here for file</p>
                  </file>
               </suppl>
            </sec>
            <sec>
               <st>
                  <p>Estimation of <it>&#947;</it></p>
               </st>
               <p>The connectivity distribution of the perturbed network suggests caution in using the observed link data, especially when estimating <it>&#947;</it>. The scaling parameter <it>&#947;</it>, an important characteristic measure of the scale-free network, is commonly estimated using ordinary least squares (OLS) in the linear model obtained from the log transformation of (1):</p>
               <p>log <it>P</it>(<it>k</it>) = log <it>c </it>- <it>&#947; </it>log <it>k</it>. &#160;&#160;&#160; (4)</p>
               <p>It is well known that the OLS estimator can be very sensitive to even a small number of outliers. For example, applying the OLS estimator in Figure 1(a) will not capture the linear trend if the right-most point is included in the estimation. Therefore, robust estimators, such as the M-estimator and the least trimmed squares (LTS) estimator <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, are more appropriate choices in such situations because of their resistance to outliers. Our simulations suggest that the LTS estimator can correctly capture the linear trend without visual diagnosis of the connectivity distribution, while the OLS and M-estimator often fail to estimate the slope of the linear part correctly. Therefore, we use the LTS estimator in the simulation studies that follow.</p>
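               <p>To make the contrast between OLS and LTS concrete, the following Python sketch fits model (4) to a synthetic log-log data set whose tail has been shifted upward. It is only an approximation written for illustration and not the procedure used in this study: the LTS fit starts from every line through two points and refines each by concentration steps, the trimming size <it>h </it>defaults to just over half the points, and the names ols_fit and lts_fit as well as the synthetic data are assumptions.</p>
               <p><![CDATA[
```python
from itertools import combinations
from math import log


def ols_fit(xs, ys):
    """Ordinary least squares line y = a + b*x; returns (a, b).

    Assumes the x values are not all identical.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def lts_fit(xs, ys, h=None, max_c_steps=20):
    """Approximate least trimmed squares line; returns (a, b).

    Minimizes the sum of the h smallest squared residuals, searching from
    all two-point starting lines (an illustrative, not optimized, scheme).
    """
    n = len(xs)
    if h is None:
        h = n // 2 + 1  # assumed default: just over half the points

    def sq_res(a, b):
        return [(y - a - b * x) ** 2 for x, y in zip(xs, ys)]

    best_obj, best_line = float("inf"), None
    for i, j in combinations(range(n), 2):
        if xs[i] == xs[j]:
            continue
        a, b = ols_fit([xs[i], xs[j]], [ys[i], ys[j]])
        subset = None
        for _ in range(max_c_steps):
            res = sq_res(a, b)
            # Concentration step: keep the h best-fitting points and refit.
            new_subset = sorted(sorted(range(n), key=res.__getitem__)[:h])
            if new_subset == subset:
                break
            subset = new_subset
            a, b = ols_fit([xs[t] for t in subset], [ys[t] for t in subset])
        obj = sum(sorted(sq_res(a, b))[:h])
        if obj < best_obj:
            best_obj, best_line = obj, (a, b)
    return best_line


# Synthetic log-log data with slope -3; the last four points are raised
# to mimic a distorted tail.
ks = list(range(1, 21))
xs = [log(k) for k in ks]
ys = [-3.0 * log(k) for k in ks]
for t in range(16, 20):
    ys[t] += 2.0

gamma_ols = -ols_fit(xs, ys)[1]
gamma_lts = -lts_fit(xs, ys)[1]
```
]]></p>
               <p>In this toy example the trimmed fit recovers the slope of the uncontaminated points, whereas the OLS slope is pulled toward the raised tail. Established robust regression software should be preferred over this sketch for real analyses.</p>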
            </sec>
         </sec>
         <sec>
            <st>
               <p>Exploring error mechanisms of yeast protein interaction networks by simulations</p>
            </st>
            <p>In the previous section, we found that the scale-free property can be conserved to a large extent under a simple error mechanism. However, the error mechanisms of the real data are often more complicated. For more complicated error mechanisms, theoretical derivations of the connectivity distribution of the perturbed networks are often intractable. But it is also important to know how the empirical connectivity distributions of real networks are affected by the erroneous links. Therefore, we conduct extensive simulation studies to investigate the finite-sample impact of the error mechanisms on the connectivity distribution. Our study focuses on the yeast protein-interaction network data.</p>
            <p>For real network data, whether or not erroneous links are involved, the empirical connectivity distribution will not display a linear pattern as clear as the ones in Figure <figr fid="F1">1</figr> because of sampling variation and the discrete approximation to the tiny probabilities of nodes with large connectivities. For example, Figure <figr fid="F3">3</figr> shows the connectivity distribution of a simulated scale-free network <it>Net</it><sub>0 </sub>and Figure <figr fid="F4">4</figr> shows the connectivity distribution of <it>Net</it><sub>0 </sub>after perturbation by the simple error mechanism discussed above. In Figure <figr fid="F4">4</figr>, we observe a much larger curvature deviation from the linear trend at the small connectivity region than that in Figures <figr fid="F1">1</figr> and <figr fid="F2">2</figr>. It is not clear why the empirical distributions of the simulated networks are so different from the theoretical calculations, but this observation demonstrates that simulation studies are necessary to complement the findings from the theoretical calculations. In addition, simulation studies can also explore possible error mechanisms by comparing the connectivity distributions of simulated perturbed scale-free networks with those of the observed networks, under the assumption that their underlying structures are indeed scale-free.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Connectivity distribution of <it>Net</it><sub>0</sub></p>
               </caption>
               <text>
                  <p><b>Connectivity distribution of <it>Net</it><sub>0</sub></b>. This picture shows the connectivity distribution of the simulated scale-free random network <it>Net</it><sub>0 </sub>imposed with regression lines given by different methods (dashed line: OLS; dotted line: M-estimation; solid line: LTS).</p>
               </text>
               <graphic file="1471-2105-6-119-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Connectivity distribution of <it>Net</it><sub>0 </sub>after perturbation (<it>r</it><sub><it>FP </it></sub>= 0.0002, <it>r</it><sub><it>FN </it></sub>= 0.7)</p>
               </caption>
               <text>
                  <p><b>Connectivity distribution of <it>Net</it><sub>0 </sub>after perturbation (<it>r</it><sub><it>FP </it></sub>= 0.0002, <it>r</it><sub><it>FN </it></sub>= 0.7)</b>. This picture shows the connectivity distribution of the simulated scale-free random network <it>Net</it><sub>0 </sub>perturbed by the simple error mechanism using <it>r</it><sub><it>FP </it></sub>= 0.0002 and <it>r</it><sub><it>FN </it></sub>= 0.7. Regression lines given by different methods are also imposed (dashed line: OLS; dotted line: M-estimation; solid line: LTS).</p>
               </text>
               <graphic file="1471-2105-6-119-4"/>
            </fig>
            <p>In the following, we investigate the error mechanisms of two real yeast protein interaction network data sets used in Jeong <it>et al</it>. <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and Deng <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> by comparing the connectivity distribution of these two networks with that of the simulated network perturbed by different error mechanisms. We assume that the true underlying topology of the yeast protein interaction network is scale-free <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Then if we perturb the simulated scale-free network by the error mechanisms similar to the ones of the real yeast protein interaction networks, the resulting connectivity distribution should be similar to the ones of the real networks.</p>
            <sec>
               <st>
                  <p>MIPS and Y2H yeast protein networks</p>
               </st>
               <p>Jeong <it>et al</it>. derived the yeast protein network from combined, non-overlapping Y2H data <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. This network has 1,870 proteins as nodes, connected by 2,240 identified direct physical interactions <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. The other network was obtained from the gold standard of yeast protein interactions based on the MIPS complex data <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. This gold standard data set has 1,376 proteins and 2,876 interacting protein pairs, out of which 2,559 are also recorded in the Yeast Proteome Database (YPD) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The YPD subset has 1,373 proteins. Estimates of <it>&#947; </it>from the Y2H network, the gold standard data and the YPD subset are 2.396, 2.721 and 2.870, respectively. The connectivity distributions of these two networks are shown in Figure <figr fid="F5">5</figr> and Figure <figr fid="F6">6</figr>, respectively.</p>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>Connectivity distribution of the Y2H yeast protein interaction network</p>
                  </caption>
                  <text>
                     <p><b>Connectivity distribution of the Y2H yeast protein interaction network</b>. This picture shows the connectivity distribution of the protein interaction network in Jeong <it>et al</it>. [3] inferred from the Y2H data. The imposed regression line is fitted by the LTS method.</p>
                  </text>
                  <graphic file="1471-2105-6-119-5"/>
               </fig>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>Connectivity distribution of the MIPS yeast protein interaction network</p>
                  </caption>
                  <text>
                     <p><b>Connectivity distribution of the MIPS yeast protein interaction network</b>. This picture shows the connectivity distribution of the protein interaction network in Deng <it>et al</it>. [6] inferred from the MIPS gold standard data. The imposed regression line is fitted by the LTS method.</p>
                  </text>
                  <graphic file="1471-2105-6-119-6"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>Error mechanisms</p>
               </st>
               <p>We consider different error mechanisms in terms of different types of false positive rates (<it>p</it><sub><it>ij </it></sub>= <it>P </it>(<it>x</it><sub><it>i </it></sub>and <it>x</it><sub><it>j </it></sub>are observed linked|<it>x</it><sub><it>i </it></sub>and <it>x</it><sub><it>j </it></sub>are actually unlinked)) and false negative rates (<it>q</it><sub><it>ij </it></sub>= <it>P </it>(<it>x</it><sub><it>i </it></sub>and <it>x</it><sub><it>j </it></sub>are observed unlinked|<it>x</it><sub><it>i </it></sub>and <it>x</it><sub><it>j </it></sub>are actually linked)) for node pair (<it>x</it><sub><it>i</it></sub>, <it>x</it><sub><it>j</it></sub>), <it>i </it>= 1,..., <it>n</it>, <it>j </it>= 1,..., <it>n</it>, <it>i </it>&#8800; <it>j</it>. Assume that the overall false positive rate and false negative rate are <it>r</it><sub><it>FP </it></sub>and <it>r</it><sub><it>FN</it></sub>, in the sense that the expected number of false positive links and false negative links are <it>E</it>(<it>N</it><sub><it>FP</it></sub>) = <it>r</it><sub><it>FP </it></sub><it>N</it><sub><it>N </it></sub>and <it>E</it>(<it>N</it><sub><it>FN</it></sub>) = <it>r</it><sub><it>FN </it></sub><it>N</it><sub><it>P</it></sub>. We consider nine different error mechanisms by letting <it>p</it><sub><it>ij </it></sub>and <it>q</it><sub><it>ij </it></sub>be one of the following three different types:</p>
               <p>1. <b>constant</b>: <it>p</it><sub><it>ij </it></sub>= <it>r</it><sub><it>FP </it></sub>and <it>q</it><sub><it>ij </it></sub>= <it>r</it><sub><it>FN </it></sub>for all (<it>x</it><sub><it>i</it></sub>, <it>x</it><sub><it>j</it></sub>);</p>
               <p>2. <b>increasing (with connectivity)</b>:</p>
               <p>
                  <graphic file="1471-2105-6-119-i4.gif"/>
               </p>
               <p>3. <b>decreasing (with connectivity)</b>:</p>
               <p>
                  <graphic file="1471-2105-6-119-i5.gif"/>
               </p>
               <p>where <it>L</it>(<it>x</it>) denotes the true connectivity of node <it>x</it>. For <it>Net</it><sub>0</sub>, the numbers of truly linked and truly unlinked node pairs are <it>N</it><sub><it>P </it></sub>= 49,007 and <it>N</it><sub><it>N </it></sub>= 24,503,521, respectively. The combinations of different structures on false positive rates and false negative rates produce nine error mechanisms in Table <tblr tid="T1">1</tblr>.</p>
               <tbl id="T1">
                  <title>
                     <p>Table 1</p>
                  </title>
                  <caption>
                     <p>Nine error mechanisms.</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c ca="center">
                           <p>Error mechanism</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>p</it>
                              <sub>
                                 <it>ij</it>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>q</it>
                              <sub>
                                 <it>ij</it>
                              </sub>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>S</it>1</p>
                        </c>
                        <c ca="center">
                           <p>constant</p>
                        </c>
                        <c ca="center">
                           <p>constant</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>S</it>2</p>
                        </c>
                        <c ca="center">
                           <p>constant</p>
                        </c>
                        <c ca="center">
                           <p>increasing</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>S</it>3</p>
                        </c>
                        <c ca="center">
                           <p>constant</p>
                        </c>
                        <c ca="center">
                           <p>decreasing</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>S</it>4</p>
                        </c>
                        <c ca="center">
                           <p>increasing</p>
                        </c>
                        <c ca="center">
                           <p>constant</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>S</it>5</p>
                        </c>
                        <c ca="center">
                           <p>increasing</p>
                        </c>
                        <c ca="center">
                           <p>increasing</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>S</it>6</p>
                        </c>
                        <c ca="center">
                           <p>increasing</p>
                        </c>
                        <c ca="center">
                           <p>decreasing</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>S</it>7</p>
                        </c>
                        <c ca="center">
                           <p>decreasing</p>
                        </c>
                        <c ca="center">
                           <p>constant</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>S</it>8</p>
                        </c>
                        <c ca="center">
                           <p>decreasing</p>
                        </c>
                        <c ca="center">
                           <p>increasing</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>S</it>9</p>
                        </c>
                        <c ca="center">
                           <p>decreasing</p>
                        </c>
                        <c ca="center">
                           <p>decreasing</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
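               <p>The constraint linking the overall rates to the expected numbers of erroneous links can be checked with a short calculation. The following Python sketch is an illustration only and is not part of the original analysis; the function names are hypothetical. It uses the <it>n </it>= 7,008 nodes and 49,007 links reported for <it>Net</it><sub>0 </sub>in the simulation study and the four rate combinations considered there.</p>
               <p>
```python
def pair_counts(n_nodes, n_links):
    """Return (N_P, N_N): the numbers of linked and unlinked node pairs."""
    total_pairs = n_nodes * (n_nodes - 1) // 2
    return n_links, total_pairs - n_links


def expected_errors(n_nodes, n_links, r_fp, r_fn):
    """Expected numbers of false positive and false negative links.

    Uses E(N_FP) = r_FP * N_N and E(N_FN) = r_FN * N_P from the text.
    """
    n_p, n_n = pair_counts(n_nodes, n_links)
    return r_fp * n_n, r_fn * n_p


if __name__ == "__main__":
    # Net0 from the text: 7,008 nodes and 49,007 links.
    rates = [(0.00025, 0.5), (0.00025, 0.8), (0.00015, 0.5), (0.00015, 0.8)]
    for r_fp, r_fn in rates:
        e_fp, e_fn = expected_errors(7008, 49007, r_fp, r_fn)
        print(f"r_FP={r_fp}, r_FN={r_fn}: E(N_FP)={e_fp:.1f}, E(N_FN)={e_fn:.1f}")
```
               </p>
               <p>Because <it>N</it><sub><it>N </it></sub>is roughly 500 times larger than <it>N</it><sub><it>P</it></sub>, even a false positive rate of 0.00025 corresponds to about 6,126 expected false positive links, compared with 24,503.5 expected false negative links at a false negative rate of 0.5.</p>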
            </sec>
            <sec>
               <st>
                  <p>Simulation studies</p>
               </st>
               <p>We simulate a scale-free network <it>Net</it><sub>0 </sub>using the preferential attachment growth model <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. In this algorithm, we start from <it>m</it><sub>0 </sub>= 7 isolated nodes and, in each of the <it>T </it>= 7,000 evolving steps, add a new node with <it>m </it>= 7 links to existing nodes chosen with probability proportional to their connectivity. <it>Net</it><sub>0 </sub>has <it>L </it>= 49,007 links and <it>n </it>= 7,008 nodes. The mean-field theory <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> suggests that the theoretical value of <it>&#947; </it>for <it>Net</it><sub>0 </sub>is 3, which agrees well with the estimates in Table <tblr tid="T2">2</tblr>.</p>
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>Parameter estimates for <it>Net</it><sub>0</sub>.</p>
                  </caption>
                  <tblbdy cols="4">
                     <r>
                        <c ca="center">
                           <p>Parameter</p>
                        </c>
                        <c ca="center">
                           <p>OLS</p>
                        </c>
                        <c ca="center">
                           <p>M-estimation</p>
                        </c>
                        <c ca="center">
                           <p>LTS</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>log <it>c</it></p>
                        </c>
                        <c ca="center">
                           <p>1.4600</p>
                        </c>
                        <c ca="center">
                           <p>1.7846</p>
                        </c>
                        <c ca="center">
                           <p>4.008</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <it>&#947;</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>2.0918</p>
                        </c>
                        <c ca="center">
                           <p>2.1769</p>
                        </c>
                        <c ca="center">
                           <p>2.803</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <p>We always assume that false positives and false negatives are independently generated. In the simulations, a link is added (false positive) between every two unlinked nodes (<it>x</it><sub><it>i</it></sub>, <it>x</it><sub><it>j</it></sub>) in <it>Net</it><sub>0 </sub>with probability <it>p</it><sub><it>ij</it></sub>, and the link is removed (false negative) between two linked nodes (<it>x</it><sub><it>i</it></sub>, <it>x</it><sub><it>j</it></sub>) in <it>Net</it><sub>0 </sub>with probability <it>q</it><sub><it>ij</it></sub>. We also consider these error mechanisms under high and low overall false positive (<it>r</it><sub><it>FP</it></sub>) and false negative rates (<it>r</it><sub><it>FN</it></sub>). The connectivity distributions of <it>Net</it><sub>0 </sub>after perturbation are shown in Figures <figr fid="F7">7</figr>, <figr fid="F8">8</figr>, <figr fid="F9">9</figr>, <figr fid="F10">10</figr> for different values of <it>r</it><sub><it>FP </it></sub>and <it>r</it><sub><it>FN</it></sub>: (0.00025, 0.5), (0.00025, 0.8), (0.00015, 0.5), (0.00015, 0.8).</p>
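               <p>The perturbation step can be sketched in Python for the simplest mechanism, <it>S</it>1, in which every unlinked pair has the same false positive probability and every linked pair the same false negative probability. This is a minimal illustration under stated assumptions, not the code used for the reported simulations: the function name and the representation of links as a set of unordered pairs are hypothetical, and the loop over all node pairs is practical only for small networks.</p>
               <p>
```python
import itertools
import random


def perturb_constant(n_nodes, links, r_fp, r_fn, seed=0):
    """Perturb an undirected network under constant error rates (S1).

    links is a set of frozenset({i, j}) pairs. Each unlinked pair gains a
    link with probability r_fp; each true link is dropped with probability
    r_fn. The two kinds of error are generated independently.
    """
    rng = random.Random(seed)
    observed = set()
    for i, j in itertools.combinations(range(n_nodes), 2):
        pair = frozenset((i, j))
        if pair in links:
            # True link: keep it unless a false negative occurs.
            if rng.random() >= r_fn:
                observed.add(pair)
        elif r_fp > rng.random():
            # Unlinked pair: a false positive adds a spurious link.
            observed.add(pair)
    return observed
```
               </p>
               <p>The non-constant mechanisms would replace the scalar rates with pair-specific probabilities that depend on the true connectivities of the two nodes.</p>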
               <fig id="F7">
                  <title>
                     <p>Figure 7</p>
                  </title>
                  <caption>
                     <p>Connectivity distribution of <it>Net</it><sub>0 </sub>perturbed by different error mechanisms (<it>r</it><sub><it>FP </it></sub>= 0.00025, <it>r</it><sub><it>FN </it></sub>= 0.5)</p>
                  </caption>
                  <text>
                     <p>Connectivity distribution of <it>Net</it><sub>0 </sub>perturbed by different error mechanisms (<it>r</it><sub><it>FP </it></sub>= 0.00025, <it>r</it><sub><it>FN </it></sub>= 0.5).</p>
                  </text>
                  <graphic file="1471-2105-6-119-7"/>
               </fig>
               <fig id="F8">
                  <title>
                     <p>Figure 8</p>
                  </title>
                  <caption>
                     <p>Connectivity distribution of <it>Net</it><sub>0 </sub>perturbed by different error mechanisms (<it>r</it><sub><it>FP </it></sub>= 0.00025, <it>r</it><sub><it>FN </it></sub>= 0.8)</p>
                  </caption>
                  <text>
                     <p>Connectivity distribution of <it>Net</it><sub>0 </sub>perturbed by different error mechanisms (<it>r</it><sub><it>FP </it></sub>= 0.00025, <it>r</it><sub><it>FN </it></sub>= 0.8).</p>
                  </text>
                  <graphic file="1471-2105-6-119-8"/>
               </fig>
               <fig id="F9">
                  <title>
                     <p>Figure 9</p>
                  </title>
                  <caption>
                     <p>Connectivity distribution of <it>Net</it><sub>0 </sub>perturbed by different error mechanisms (<it>r</it><sub><it>FP </it></sub>= 0.00015, <it>r</it><sub><it>FN </it></sub>= 0.5)</p>
                  </caption>
                  <text>
                     <p>Connectivity distribution of <it>Net</it><sub>0 </sub>perturbed by different error mechanisms (<it>r</it><sub><it>FP </it></sub>= 0.00015, <it>r</it><sub><it>FN </it></sub>= 0.5).</p>
                  </text>
                  <graphic file="1471-2105-6-119-9"/>
               </fig>
               <fig id="F10">
                  <title>
                     <p>Figure 10</p>
                  </title>
                  <caption>
                     <p>Connectivity distribution of <it>Net</it><sub>0 </sub>perturbed by different error mechanisms (<it>r</it><sub><it>FP </it></sub>= 0.00015, <it>r</it><sub><it>FN </it></sub>= 0.8)</p>
                  </caption>
                  <text>
                     <p>Connectivity distribution of <it>Net</it><sub>0 </sub>perturbed by different error mechanisms (<it>r</it><sub><it>FP </it></sub>= 0.00015, <it>r</it><sub><it>FN </it></sub>= 0.8).</p>
                  </text>
                  <graphic file="1471-2105-6-119-10"/>
               </fig>
               <p>The connectivity distribution of the perturbed <it>Net</it><sub>0 </sub>differs dramatically across the nine error mechanisms. Under error mechanisms <it>S</it>2, <it>S</it>5, <it>S</it>6 and <it>S</it>9, the perturbed networks contain only a small proportion of nodes with low connectivity, which differs greatly from the observed yeast protein interaction networks (Figures <figr fid="F5">5</figr> and <figr fid="F6">6</figr>). This finding suggests that these four mechanisms are far from the true error structure, and we do not discuss them further. We also observe that changes in <it>r</it><sub><it>FP </it></sub>have little impact on the connectivity distribution under all error mechanisms, but a higher value of <it>r</it><sub><it>FN </it></sub>increases the probability of nodes with small connectivity under <it>S</it>1, <it>S</it>3 and <it>S</it>8. Mechanisms <it>S</it>4 and <it>S</it>7 are highly stable: the connectivity distribution changes little in response to changes in <it>r</it><sub><it>FP </it></sub>or <it>r</it><sub><it>FN </it></sub>under these two error mechanisms. This suggests that scale-free networks with constant false negative rates can still provide very credible information about their topological structure. This finding is also confirmed by the fact that the estimates of <it>&#947; </it>vary little when <it>r</it><sub><it>FN </it></sub>changes (see Tables A.5 and A.6 in <supplr sid="S1">Additional file 1</supplr>). The estimated values of <it>&#947; </it>vary only from 2.61 to 3.03 with a standard error of 0.125 under <it>S</it>4 and only from 2.56 to 3.31 with a standard error of 0.161 under <it>S</it>7, whereas the estimate of <it>&#947; </it>clearly decreases as <it>r</it><sub><it>FN </it></sub>increases under <it>S</it>3 and <it>S</it>8 (Tables A.4 and A.7 in <supplr sid="S1">Additional file 1</supplr>). Under <it>S</it>1, there is no clear pattern in the estimated <it>&#947; </it>as <it>r</it><sub><it>FN </it></sub>changes (Table A.3 in <supplr sid="S1">Additional file 1</supplr>), but the estimates of <it>&#947; </it>vary over a much wider range (1.16 &#8211; 4.35) than those under <it>S</it>3 and <it>S</it>8. Our conclusions are restricted to the particular ranges of <it>r</it><sub><it>FP </it></sub>and <it>r</it><sub><it>FN </it></sub>we have studied; however, these ranges are believed to be reasonable for describing the Y2H systems.</p>
               <p>The simple error mechanism <it>S</it>1 with a high false negative rate produces patterns (Figures 8(a) and 10(a)) similar to those of the gold standard data (Figure <figr fid="F6">6</figr>). For the Y2H yeast protein interaction network (Figure <figr fid="F5">5</figr>), <it>S</it>4 gives the best approximation, but it still differs slightly in the probabilities of nodes with small connectivity. This suggests that the real error structure of the Y2H analyses may be more complicated than any of the simple proposals we have considered.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>This article first investigates the impact of erroneous links on network topological inference. From our theoretical and simulation results, we find that, under a simple error mechanism, the scale-free property is preserved for moderate connectivities, but the linear pattern is distorted in both the small and large connectivity regions. Accordingly, we recommend using robust estimators (e.g. LTS), which are more resistant to the outliers at both ends of the distribution, to estimate the scale parameter <it>&#947;</it>.</p>
         <p>Moreover, we have also explored possible error mechanisms of the yeast protein interaction data by simulations considering nine different error mechanisms. The results suggest that changes in the overall false positive rates have little impact on the resulting connectivity distribution, but increasing the overall false negative rates can increase the probability of nodes with small connectivities under some error mechanisms, and hence decrease the scale parameter <it>&#947;</it>. The connectivity distribution can be very stable under several error mechanisms when the overall false positive rates and false negative rates change, which suggests that in certain situations the observed data can provide sufficient topological information on the underlying network structure even when the false negative rates are quite high.</p>
         <p>The simple error mechanism that assumes that the false positive rate and false negative rate of each protein pair are constants agrees well with the MIPS gold standard data when the false negative rate is high. A different error mechanism is suggested for the Y2H data, where more connected protein pairs tend to have higher false positive rates and lower false negative rates. As this error mechanism provides only a reasonable approximation to the Y2H data, more sophisticated mechanisms might be needed to better capture its error structure.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Preferential attachment growth model</p>
            </st>
            <p>In a series of papers <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>, Barab&#225;si <it>et al</it>. demonstrated that a scale-free network could be obtained by growing from a small number of isolated nodes by preferential attachment. The simulation scheme is defined in two steps:</p>
            <p>1. <it>Growth</it>: Starting with a small number (<it>m</it><sub>0</sub>) of nodes, add a new node at every time step and connect it to <it>m </it>(&#8804; <it>m</it><sub>0</sub>) nodes already present in the system.</p>
            <p>2. <it>Preferential attachment</it>: The new node is more likely to connect to nodes with larger connectivity. The probability &#928;<sub><it>i </it></sub>that a new node will be connected to node <it>i </it>depends on its connectivity <it>k</it><sub><it>i</it></sub>, such that <graphic file="1471-2105-6-119-i6.gif"/>.</p>
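            <p>The two steps above can be sketched in Python as follows. This is a minimal illustration and not the code used to generate <it>Net</it><sub>0</sub>. It assumes the standard form in which the attachment probability of node <it>i </it>is its connectivity divided by the sum of all connectivities. Two further choices are assumptions of the sketch: when no links exist yet, targets are chosen uniformly at random, and the <it>m </it>distinct targets of a new node are obtained by redrawing duplicates. The function name is hypothetical.</p>
            <p>
```python
import random


def preferential_attachment(m0, m, steps, seed=0):
    """Grow a network from m0 isolated nodes; return a list of (new, old) links."""
    if m > m0:
        raise ValueError("m must not exceed m0")
    rng = random.Random(seed)
    links = []
    # Each node appears in `stubs` once per incident link, so a uniform draw
    # from `stubs` picks node i with probability k_i / sum_j k_j.
    stubs = []
    n_nodes = m0
    for _ in range(steps):
        new = n_nodes
        targets = set()
        if not stubs:
            # Assumption: with no links yet, pick targets uniformly at random.
            targets.update(rng.sample(range(n_nodes), m))
        while m > len(targets):
            # Duplicate draws are ignored by the set, i.e. redrawn.
            targets.add(rng.choice(stubs))
        for old in sorted(targets):
            links.append((new, old))
            stubs.extend((new, old))
        n_nodes += 1
    return links
```
            </p>
            <p>In this sketch each step adds one node and <it>m </it>links, so after <it>T </it>steps the network has <it>m</it><sub>0 </sub>+ <it>T </it>nodes and <it>mT </it>links.</p>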
         </sec>
         <sec>
            <st>
               <p>Least Trimmed Squares (LTS)</p>
            </st>
            <p>The basic idea of LTS estimation is to minimize the sum of the <it>h </it>smallest squared residuals, rather than the sum of all squared residuals as in OLS, to achieve robustness while maintaining good efficiency. Please refer to <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> for more details of the algorithm, such as practical choices of <it>h</it>. In this article, the LTS estimation is performed using the lqs() function implemented in R <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
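            <p>To make the LTS criterion concrete, the following Python sketch fits a straight line by approximately minimizing the sum of the <it>h </it>smallest squared residuals. It is an illustration only and does not reproduce the algorithm inside lqs(). The search strategy (random two-point starts followed by repeated refitting on the <it>h </it>best-fitting points), the function names and the default settings are all assumptions of the sketch, and it assumes that the <it>x </it>values are distinct.</p>
            <p>
```python
import random


def ols(points):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def trimmed_loss(points, a, b, h):
    """LTS objective: the sum of the h smallest squared residuals."""
    res = sorted((y - a - b * x) ** 2 for x, y in points)
    return sum(res[:h])


def lts(points, h, n_starts=200, n_refits=20, seed=0):
    """Approximate LTS line fit via random starts and repeated refitting."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        start = rng.sample(points, 2)
        if start[0][0] == start[1][0]:
            continue  # vertical pair: slope undefined, try another start
        a, b = ols(start)
        for _ in range(n_refits):
            # Keep the h points the current line fits best, then refit on them.
            keep = sorted(points, key=lambda p: (p[1] - a - b * p[0]) ** 2)[:h]
            a, b = ols(keep)
        loss = trimmed_loss(points, a, b, h)
        if best is None or best[0] > loss:
            best = (loss, a, b)
    return best[1], best[2]
```
            </p>
            <p>When estimating <it>&#947;</it>, the points would be the connectivities and their observed frequencies on the log scale, so that the distorted points at both ends of the distribution fall among the trimmed residuals rather than influencing the fitted slope.</p>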
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>HZ had the initial idea and initiated the study. NL conducted the data analyses, and created all tables and figures, under the supervision of HZ. Both authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was supported in part by NSF grant DMS 0241160 and NIH grant R01 GM59507.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Scale-free characteristics of random networks: The topology of the World Wide Web</p>
            </title>
            <aug>
               <au>
                  <snm>Barab&#225;si</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Albert</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Jeong</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Physica A</source>
            <pubdate>2000</pubdate>
            <volume>281</volume>
            <fpage>69</fpage>
            <lpage>77</lpage>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Evolution of the social network of scientific collaborations</p>
            </title>
            <aug>
               <au>
                  <snm>Barab&#225;si</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Jeong</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>N&#233;da</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Ravasz</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Schubert</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vicsek</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Physica A</source>
            <pubdate>2002</pubdate>
            <volume>311</volume>
            <fpage>590</fpage>
            <lpage>614</lpage>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Lethality and centrality in protein networks</p>
            </title>
            <aug>
               <au>
                  <snm>Jeong</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mason</snm>
                  <fnm>SP</fnm>
               </au>
               <au>
                  <snm>Barab&#225;si</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Oltvai</snm>
                  <fnm>ZN</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>411</volume>
            <fpage>41</fpage>
            <lpage>42</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35075138</pubid>
                  <pubid idtype="pmpid" link="fulltext">11333967</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Error and attack tolerance of complex networks</p>
            </title>
            <aug>
               <au>
                  <snm>Albert</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Jeong</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Barab&#225;si</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>406</volume>
            <fpage>378</fpage>
            <lpage>382</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35019019</pubid>
                  <pubid idtype="pmpid" link="fulltext">10935628</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Yeast two-hybrid: State of the art</p>
            </title>
            <aug>
               <au>
                  <snm>Van Criekinge</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Beyaert</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Biol Proced Online</source>
            <pubdate>1999</pubdate>
            <volume>2</volume>
            <fpage>1</fpage>
            <lpage>38</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">140126</pubid>
                  <pubid idtype="pmpid">12734586</pubid>
                  <pubid idtype="doi">10.1251/bpo16</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Assessment of the reliability of protein-protein interactions and protein function prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Deng</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>2003</pubdate>
            <fpage>140</fpage>
            <lpage>151</lpage>
            <xrefbib>
               <pubid idtype="pmpid">12603024</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <aug>
               <au>
                  <snm>Dorogovtsev</snm>
                  <fnm>SN</fnm>
               </au>
               <au>
                  <snm>Mendes</snm>
                  <fnm>JFF</fnm>
               </au>
            </aug>
            <source>Evolution of Networks</source>
            <publisher>New York: Oxford University Press</publisher>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Inferring domain-domain interactions from protein-protein interactions</p>
            </title>
            <aug>
               <au>
                  <snm>Deng</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mehta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2002</pubdate>
            <fpage>1540</fpage>
            <lpage>1548</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187530</pubid>
                  <pubid idtype="pmpid" link="fulltext">12368246</pubid>
                  <pubid idtype="doi">10.1101/gr.153002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <aug>
               <au>
                  <snm>Rousseeuw</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Leroy</snm>
                  <fnm>AM</fnm>
               </au>
            </aug>
            <source>Robust regression and outlier detection</source>
            <publisher>New York: Wiley</publisher>
            <pubdate>1987</pubdate>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A comprehensive analysis of protein-protein interactions in <it>Saccharomyces cerevisiae</it></p>
            </title>
            <aug>
               <au>
                  <snm>Uetz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Giot</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cagney</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Mansfield</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Judson</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Knight</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Lockshon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Narayan</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Srinivasan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pochart</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <fpage>601</fpage>
            <lpage>603</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35001165</pubid>
                  <pubid idtype="pmpid" link="fulltext">10688178</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>DIP: the database of interacting proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Xenarios</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Salwinski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Baron</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>289</fpage>
            <lpage>291</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102387</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592249</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.289</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Y2H protein interaction network data</p>
            </title>
            <url>http://www.nd.edu/~networks/database/protein/bo.dat.gz</url>
         </bibl>
         <bibl id="B13">
            <title>
               <p>MIPS gold standard protein interaction network data</p>
            </title>
            <url>http://hto-b.usc.edu/~msms/AssessInteraction/MIPSMatchYPD.txt</url>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Yeast Proteome Database</p>
            </title>
            <url>http://www.proteome.com/YPDhome.html</url>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Mean-field theory for scale-free random networks</p>
            </title>
            <aug>
               <au>
                  <snm>Barab&#225;si</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Albert</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Jeong</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Physica A</source>
            <pubdate>1999</pubdate>
            <volume>272</volume>
            <fpage>173</fpage>
            <lpage>187</lpage>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Topology of evolving networks: local events and universality</p>
            </title>
            <aug>
               <au>
                  <snm>Albert</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Barab&#225;si</snm>
                  <fnm>AL</fnm>
               </au>
            </aug>
            <source>Phys Rev Lett</source>
            <pubdate>2000</pubdate>
            <volume>85</volume>
            <fpage>5234</fpage>
            <lpage>5237</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1103/PhysRevLett.85.5234</pubid>
                  <pubid idtype="pmpid" link="fulltext">11102229</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>R: A language and environment for statistical computing</p>
            </title>
            <aug>
               <au>
                  <cnm>R Development Core Team</cnm>
               </au>
            </aug>
            <source>R Foundation for Statistical Computing, Vienna, Austria</source>
            <pubdate>2004</pubdate>
            <url>http://www.R-project.org</url>
         </bibl>
      </refgrp>
   </bm>
</art>

