<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-13-S16-S1</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>A statistical model-building perspective to identification of MS/MS spectra with PeptideProphet</p>
         </title>
         <aug>
            <au id="A1"><snm>Ma</snm><fnm>Kelvin</fnm><insr iid="I1"/><email>kkma@purdue.edu</email></au>
            <au id="A2"><snm>Vitek</snm><fnm>Olga</fnm><insr iid="I1"/><insr iid="I2"/><email>ovitek@stat.purdue.edu</email></au>
            <au ca="yes" id="A3"><snm>Nesvizhskii</snm><mi>I</mi><fnm>Alexey</fnm><insr iid="I3"/><email>nesvi@umich.edu</email></au>
         </aug>
         <insg>
            <ins id="I1"><p>Department of Statistics, Purdue University, 250 N. University Street, West Lafayette, Indiana, USA</p></ins>
            <ins id="I2"><p>Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, Indiana, USA</p></ins>
            <ins id="I3"><p>Department of Pathology, University of Michigan, 4237 Medical Science I, Ann Arbor, Michigan, USA</p></ins>
         </insg>
         <source>BMC Bioinformatics</source>
         
         <supplement><title><p>Statistical mass spectrometry-based proteomics</p></title><editor>Predrag Radivojac and Olga Vitek</editor><note>Research and reviews</note></supplement><issn>1471-2105</issn>
         <pubdate>2012</pubdate>
         <volume>13</volume>
         <issue>Suppl 16</issue>
         <fpage>S1</fpage>
         <url>http://www.biomedcentral.com/1471-2105/13/S16/S1</url>
         <xrefbib><pubidlist><pubid idtype="pmpid">23176103</pubid><pubid idtype="doi">10.1186/1471-2105-13-S16-S1</pubid></pubidlist></xrefbib>
      </bibl>
      <history><pub><date><day>5</day><month>11</month><year>2012</year></date></pub></history>
      <cpyrt><year>2012</year><collab>Ma et al.; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>PeptideProphet is a post-processing algorithm designed to evaluate the confidence in identifications of MS/MS spectra returned by a database search. In this manuscript we describe the "what and how" of PeptideProphet in a manner aimed at statisticians and life scientists who would like to gain a more in-depth understanding of the underlying statistical modeling. The theory and rationale behind the mixture-modeling approach taken by PeptideProphet are discussed from a statistical model-building perspective, followed by a description of how a model can be used to express confidence in the identification of individual peptides or sets of peptides. We also demonstrate how to evaluate the quality of model fit and select an appropriate model from several available alternatives. We illustrate the use of PeptideProphet in association with the Trans-Proteomic Pipeline, a free suite of software used for protein identification.</p>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Introduction</p>
         </st>
         <p>In mass-spectrometry shotgun proteomics, the first phase of analysis is the identification of peptides in complex biological mixtures digested by enzymes such as trypsin. Depending on the peptides in the biological mixture, an experiment will produce a certain number of spectra (call it <it>N</it>). MS/MS spectra are individually matched to peptides by searching through a database of peptides predicted from the genome of the organism. The searches can be constrained using different search parameters, such as the number of tryptic termini (NTT), the number of missed cleavages (NMC), or the difference between the observed precursor ion mass and the mass of the theoretical peptide (&#916;<it>M</it>).</p>
         <p>We will discuss PeptideProphet in the context of two database search algorithms: SEQUEST <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and Tandem with the k-score plugin <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. SEQUEST attempts to determine a direct correlation between an observed spectrum and sequences of amino acids in a protein sequence database. Typical quantities associated with SEQUEST include: <it>XCorr</it>, &#916;<it>Cn</it>, <it>SpRank</it>. Typical quantities associated with Tandem with the k-score plugin include: <it>logDot </it>(logarithm of dot product between observed and theoretical spectrum) and &#916;<it>Dot</it>. PeptideProphet can be used with any database search algorithm that returns a quantitative score.</p>
         <p>Given a database search algorithm, every spectrum that is observed will be scored against the peptides in the database. For each spectrum, the highest scoring peptide (depending on the scoring criterion) is typically chosen as the best match. The best match is the potential peptide sequence that generated its corresponding observed spectrum. Thus, we have <it>N </it>spectra that have been matched to a peptide and we will refer to these spectra as identified spectra.</p>
         <p>PeptideProphet is needed because spectra are subject to noise, which makes it difficult to determine whether the peptide matched to a spectrum is correct. A spectrum is generated from a peptide sequence, and peaks can be missing or reduced in intensity. Because of this noise, the database search score varies when theoretical spectra are compared to observed spectra. Additionally, the correct peptide sequence may be absent from the searched database. Given this noise, how do we determine confidence in an identified spectrum? Traditional rules (such as accepting all identifications with <it>XCorr </it>&gt; 2.5) do not reflect the quality of the identification, and such a rule may accept too many incorrectly identified spectra. Thus, statistical inference is needed to account for the noise.</p>
         <p>PeptideProphet <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> is a post-processing and rescoring algorithm for determining confidence in identified spectra found using a database search. PeptideProphet is one of the first methods for the assessment of confidence. It is based on a probability model and an Empirical Bayesian approach to model fitting. It is no longer a single model but a family of models <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
         <p>The overview of PeptideProphet is as follows:</p>
         <p indent="1">1. Rescoring: produce a score which reflects the quality of an identified spectrum, while summarizing multiple quantities, such as <it>XCorr </it>and &#916;<it>Cn </it>or <it>logDot </it>and &#916;<it>Dot</it>. The rescoring separates incorrectly and correctly identified spectra scores as much as possible.</p>
         <p indent="1">2. Modeling: produce a probability-based model for the distribution of correctly and incorrectly identified spectra. The model must then be fit to the scores of all identified spectra.</p>
         <p indent="1">3. Evaluation of the Quality of Fit: determine how well the scores fit the probability-based model.</p>
         <p indent="1">4. Inference</p>
         <p indent="2">(a) Evaluation of confidence in individual identified spectra using the posterior probability.</p>
         <p indent="2">(b) Evaluation of confidence in sets of identified spectra: produce a cutoff on the scores to determine a set of correctly identified spectra while controlling the False-Discovery Rate, defined as the expected proportion of false positives.</p>
         <p>We will first discuss the basic version of PeptideProphet and then discuss the three extensions.</p>
      </sec>
      <sec>
         <st>
            <p>Materials</p>
         </st>
         <sec>
            <st>
               <p>Human plasma dataset</p>
            </st>
            <p>This dataset uses the first LC-MS/MS replicate file from the Western Consortium of the National Cancer Institute's Mouse Models of Human Cancer <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The data were obtained using the Multiple Affinity Removal System and were matched using a semitryptic SEQUEST search against an IPI human protein database allowing a 3 Dalton mass tolerance and 0-1 missed cleavage sites. More details on the spectra can be found in <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Controlled mixture</p>
            </st>
            <p>This dataset uses spectra generated on a linear ion trap Fourier transform instrument and published in <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. In particular, the spectra from Mixture 3 were used, where 16 known trypsin-digested proteins from different mammals were analyzed. Spectra were matched using a semitryptic SEQUEST search against a database file with the 16 known proteins concatenated with human influenza proteins, allowing a 3 Dalton mass tolerance and 0-2 missed cleavage sites. Matches to human influenza proteins are known to be incorrect. More details on the dataset can be found in <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Statement of the problem from a statistical perspective, and terminology</p>
            </st>
            <p>Every statistical approach requires the definition of the following components in the problem:</p>
            <p indent="1">1. PeptideProphet treats each observed spectrum as the <it>experimental unit</it>. There are <it>N </it>observed spectra, with <it>N </it>generally large (in the thousands or more). Because <it>N </it>is typically very large, the identified spectra can be viewed as the underlying <it>population</it>.</p>
            <p indent="1">2. The summarized score <it>S </it>is interpreted as a <it>test statistic</it>: it is the function of the observed experimental unit that is used to test our hypotheses.</p>
            <p indent="1">3. PeptideProphet assumes that the test statistic comes from a mixture of two distributions: the distribution of correct identifications and the distribution of incorrect identifications. The distributions may be characterized by a few parameters (parametric) or many parameters (semi- or non-parametric).</p>
            <p indent="1">4. The goal of PeptideProphet is to test two competing <it>hypotheses </it>for each identified spectrum. Let <it>T<sub>i </sub></it>be the true status of identified spectrum <it>i </it>where <it>T<sub>i </sub></it>= 0 indicates that the identified spectrum was incorrectly identified and where <it>T<sub>i </sub></it>= 1 indicates that the identified spectrum was correctly identified. We then wish to compare:</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i1"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>H</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>0</m:mn>
         <m:mi>i</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">:</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>T</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mn>0</m:mn>
   <m:mspace class="thinspace" width="0.3em"/>
   <m:mspace class="thinspace" width="0.3em"/>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">null</m:mtext>
         </m:mstyle>
         <m:mspace class="thinspace" width="0.3em"/>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">hypothesis)</m:mtext>
         </m:mstyle>
         <m:mspace class="thinspace" width="0.3em"/>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">versus&#160;</m:mtext>
         </m:mstyle>
         <m:msub>
            <m:mrow>
               <m:mi>H</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">1</m:mtext>
               </m:mstyle>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">:</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>T</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
         <m:mspace class="thinspace" width="0.3em"/>
         <m:mspace class="thinspace" width="0.3em"/>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">alternative</m:mtext>
               </m:mstyle>
               <m:mspace class="thinspace" width="0.3em"/>
               <m:mspace class="thinspace" width="0.3em"/>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">hypothesis</m:mtext>
               </m:mstyle>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
   </m:mrow>
</m:mrow>
</m:math>
               </display-formula>
            </p>
            <p indent="1">5. Inference: confidence is determined for individual spectra or sets of spectra.</p>
            <p indent="2">&#8226; If the researcher is interested in a set of spectrum identifications, the False Discovery Rate should be controlled.</p>
            <p indent="2">We determine the confidence in a set of spectra by controlling the False Discovery Rate. The False Discovery Rate, given a cutoff <it>&#948;</it>, is the expected proportion of all scores <it>S<sub>i </sub></it>&gt;<it>&#948; </it>that are truly incorrect (the proportion of accepted identified spectra that are false positives). This situation is equivalent to performing <it>N </it>hypothesis tests simultaneously, where <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i2"><m:mi>F</m:mi>
<m:mi>D</m:mi>
<m:mi>R</m:mi>
<m:mo class="MathClass-rel">=</m:mo>
<m:mi>E</m:mi>
<m:mspace class="thinspace" width="0.3em"/>
<m:mrow>
   <m:mo class="MathClass-open">[</m:mo>
   <m:mrow>
      <m:mfrac>
         <m:mrow>
            <m:mi>V</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>R</m:mi>
         </m:mrow>
      </m:mfrac>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:mi>R</m:mi>
      <m:mo class="MathClass-rel">&gt;</m:mo>
      <m:mn>0</m:mn>
   </m:mrow>
   <m:mo class="MathClass-close">]</m:mo>
</m:mrow>
<m:mspace class="thinspace" width="0.3em"/>
<m:mi>P</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:mi>R</m:mi>
      <m:mo class="MathClass-rel">&gt;</m:mo>
      <m:mn>0</m:mn>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula> using the values in Table <tblr tid="T1">1</tblr>. <it>P </it>(<it>R </it>&gt; 0) is assumed to be approximately 1 when we perform many tests (<it>N </it>is large). In other words, the False Discovery Rate is the expected proportion of incorrectly rejected null hypotheses out of all rejected hypotheses. If we were to repeat the experiment an infinite number of times with the same cutoff, the False Discovery Rate would be the average proportion of incorrectly identified spectra among the accepted spectra.</p>
            <tbl id="T1"><title><p>Table 1</p></title><caption><p>Table of multiple hypothesis testing quantities</p></caption><tblbdy cols="4">
      <r>
         <c>
            <p/>
         </c>
         <c ca="center">
            <p>
               <b># Not Rejected</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b># Rejected</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Total</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p># True Nulls</p>
         </c>
         <c ca="center">
            <p>
               <it>U</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>V</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>N</it>
               <sub>0</sub>
            </p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p># True Alternatives</p>
         </c>
         <c ca="center">
            <p>
               <it>T</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>S</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>N - N</it>
               <sub>0</sub>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>Total</p>
         </c>
         <c ca="center">
            <p>
               <it>N - R</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>R</it>
            </p>
         </c>
         <c ca="center">
            <p>
               <it>N</it>
            </p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Table 1: <it>U</it>, <it>V</it>, <it>T</it>, and <it>S </it>correspond to the number of true negatives, false positives, false negatives, and true positives respectively.</p>
   </tblfn></tbl>
            <p indent="2">An alternative confidence rate that is rarely used is the False Positive Rate (FPR). The False Positive Rate, given a cutoff <it>&#948; </it>is the expected proportion of all truly incorrectly identified spectra that are considered to be correctly identified. From the terms in Table <tblr tid="T1">1</tblr> it is represented by <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i3"><m:mi>F</m:mi>
<m:mi>P</m:mi>
<m:mi>R</m:mi>
<m:mo class="MathClass-rel">=</m:mo>
<m:mi>E</m:mi>
<m:mspace class="thinspace" width="0.3em"/>
<m:mrow>
   <m:mo class="MathClass-open">[</m:mo>
   <m:mrow>
      <m:mfrac>
         <m:mrow>
            <m:mi>V</m:mi>
         </m:mrow>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>N</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mn>0</m:mn>
               </m:mrow>
            </m:msub>
         </m:mrow>
      </m:mfrac>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>N</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">&gt;</m:mo>
      <m:mn>0</m:mn>
   </m:mrow>
   <m:mo class="MathClass-close">]</m:mo>
</m:mrow>
<m:mspace class="thinspace" width="0.3em"/>
<m:mi>P</m:mi>
<m:mspace class="thinspace" width="0.3em"/>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>N</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">&gt;</m:mo>
      <m:mn>0</m:mn>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula></p>
            <p indent="2">Many users prefer the q-value, which is the minimum False Discovery Rate at which a score <it>s<sub>i </sub></it>is considered significant. It is represented by <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i4"><m:mi>q</m:mi>
<m:mi>v</m:mi>
<m:mi>a</m:mi>
<m:mi>l</m:mi>
<m:mi>u</m:mi>
<m:mi>e</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>s</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
<m:mo class="MathClass-rel">=</m:mo>
<m:mi>i</m:mi>
<m:mi>n</m:mi>
<m:msub>
   <m:mrow>
      <m:mi>f</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mrow>
         <m:mo class="MathClass-open">{</m:mo>
         <m:mrow>
            <m:mi>&#915;</m:mi>
            <m:mo class="MathClass-rel">:</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">&#8712;</m:mo>
            <m:mi>&#915;</m:mi>
         </m:mrow>
         <m:mo class="MathClass-close">}</m:mo>
      </m:mrow>
   </m:mrow>
</m:msub>
<m:mi>F</m:mi>
<m:mi>D</m:mi>
<m:mi>R</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:mi>&#915;</m:mi>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula>, where the infimum is taken over all acceptance sets &#915; (sets of scores defined by a cutoff) that contain <it>s<sub>i </sub></it><abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. The q-value is thus attached to a single score <it>s<sub>i</sub></it>, but it is computed from the False Discovery Rates of all cutoffs that accept that score. Unlike the False Discovery Rate, the q-value is a monotonic quantity with respect to the score cutoff.</p>
            <p indent="2">&#8226; If the researcher is interested in specific spectrum identifications, the posterior error probability is most commonly used, as it quantifies the confidence in a single identified spectrum.</p>
            <p indent="2">The posterior error probability represents <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i5"><m:mi>P</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>T</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:mn>0</m:mn>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>S</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula>, which we also denote as <it>PEP</it>. In other words, using a probability model for <it>S<sub>i</sub></it>, we can find the probability that an identified spectrum is incorrect given its test statistic. Note that we can also calculate <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i6"><m:mi>P</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>T</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:mn>1</m:mn>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>S</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
<m:mo class="MathClass-rel">=</m:mo>
<m:mn>1</m:mn>
<m:mo class="MathClass-bin">-</m:mo>
<m:mi>P</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>T</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:mn>0</m:mn>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>S</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula> which is the probability of an identified spectrum being correct given its test statistic. The posterior error probability is also called the local false discovery rate (locfdr) <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>.</p>
            <p indent="2">Alternatively the p-value can be used. If <it>s<sub>i </sub></it>is the <it>i</it>th observed score then the p-value represents <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i7"><m:mi>P</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>S</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">&#8805;</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>s</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>H</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mn>0</m:mn>
               </m:mrow>
            </m:msub>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula>, or the probability of observing a score equal to or greater than <it>s<sub>i </sub></it>assuming that the <it>i</it>th identified spectrum was incorrectly identified. The p-value is similar to the FPR in that the p-value is the probability of observing a score equal to or greater than <it>s<sub>i </sub></it>assuming that it is one of the <it>N</it><sub>0 </sub>truly null hypotheses.</p>
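            <p>As a minimal sketch of the set-level quantities defined above, the following Python code computes the realized counts <it>V</it>, <it>R </it>and <it>N</it><sub>0 </sub>of Table <tblr tid="T1">1</tblr>, the realized proportions whose expectations are the False Discovery Rate and the False Positive Rate, and the corresponding q-values. It assumes that the true status of every identified spectrum is known, as is approximately the case for a controlled mixture, and that all scores are distinct. The function names, the toy scores and the labels are illustrative assumptions; they are not part of PeptideProphet, which estimates these rates from a fitted model rather than from known labels.</p>
            <p>
```python
def error_rates(scores, is_correct, cutoff):
    """Counts and proportions for the rule 'accept spectrum i if score > cutoff'."""
    accepted = [s > cutoff for s in scores]
    r = sum(accepted)  # R: accepted identifications (rejected nulls)
    v = sum(a and not c for a, c in zip(accepted, is_correct))  # V: false positives
    n0 = sum(not c for c in is_correct)  # N0: truly incorrect identifications
    fdp = v / r if r > 0 else 0.0    # realized V/R; the FDR is its expectation
    fpp = v / n0 if n0 > 0 else 0.0  # realized V/N0; the FPR is its expectation
    return v, r, n0, fdp, fpp


def q_values(scores, is_correct):
    """q-value of each score: smallest V/R over all cutoffs that still accept it."""
    # Indices sorted from the highest score to the lowest (scores assumed distinct).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # V/R when exactly the top-k scores are accepted, for k = 1, ..., N.
    fdp_at = []
    false_count = 0
    for k, i in enumerate(order, start=1):
        false_count += 0 if is_correct[i] else 1
        fdp_at.append(false_count / k)

    # Running minimum from the most permissive cutoff back to the strictest one.
    q = [0.0] * len(scores)
    running_min = 1.0
    for pos in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdp_at[pos])
        q[order[pos]] = running_min
    return q


# Toy, hand-made data (hypothetical): six scores with known true status.
scores = [3.1, 2.4, 1.9, 0.7, -0.2, -1.5]
is_correct = [True, True, False, True, False, False]

print(error_rates(scores, is_correct, cutoff=0.0))
print(q_values(scores, is_correct))
```
            </p>
            <p>On the toy data, the proportion <it>V</it>/<it>R </it>is not monotone in the cutoff (it falls from 1/3 to 1/4 when the fourth-ranked score is accepted), whereas the q-values never decrease as the score decreases.</p>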
         </sec>
         <sec>
            <st>
               <p>For each spectrum, PeptideProphet establishes a score reflecting the quality of an identified spectrum</p>
            </st>
            <p>First, each spectrum (experimental unit) is observed and matched to a peptide using database search scores (<it>XCorr</it>, &#916;<it>Cn</it>, <it>logDot</it>, &#916;<it>Dot</it>, etc.). PeptideProphet then rescores the identified spectrum with a discriminant function that uses these database search scores as covariates. The goal is to fit a function that separates the scores of correct identifications from those of incorrect identifications. If <it>S<sub>i </sub></it>is the summarized score for the <it>i</it>th identified spectrum from a SEQUEST search result, the discriminant function is a linear function <it>f</it>:</p>
            <p>
               <display-formula id="M1">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i8"><m:mrow>
   <m:mi>S</m:mi>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>S</m:mi>
         <m:mi>E</m:mi>
         <m:mi>Q</m:mi>
         <m:mi>U</m:mi>
         <m:mi>E</m:mi>
         <m:mi>S</m:mi>
         <m:mi>T</m:mi>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>X</m:mi>
         <m:mi>C</m:mi>
         <m:mi>o</m:mi>
         <m:mi>r</m:mi>
         <m:mi>r</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>&#916;</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>C</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>n</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>S</m:mi>
         <m:mi>p</m:mi>
         <m:mi>R</m:mi>
         <m:mi>a</m:mi>
         <m:mi>n</m:mi>
         <m:mi>k</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#946;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#946;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msub>
   <m:mi>X</m:mi>
   <m:mi>C</m:mi>
   <m:mi>o</m:mi>
   <m:mi>r</m:mi>
   <m:mi>r</m:mi>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#946;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msub>
   <m:mi>&#916;</m:mi>
   <m:msub>
      <m:mrow>
         <m:mi>C</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>n</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#946;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>3</m:mn>
      </m:mrow>
   </m:msub>
   <m:mi>S</m:mi>
   <m:mi>p</m:mi>
   <m:mi>R</m:mi>
   <m:mi>a</m:mi>
   <m:mi>n</m:mi>
   <m:mi>k</m:mi>
</m:mrow>
</m:math>
               </display-formula>
            </p>
            <p>such that <it>S </it>&gt; 0 for correctly identified spectra and <it>S </it>&lt; 0 for incorrectly identified spectra.</p>
            <p>If <it>S<sub>i </sub></it>is the summarized score for the <it>i</it>th identified spectrum from a Tandem search result, a linear discriminant function is also used, but with different covariates and coefficients:</p>
            <p>
               <display-formula id="M2">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i9"><m:mrow>
   <m:mi>S</m:mi>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mi>A</m:mi>
         <m:mi>N</m:mi>
         <m:mi>D</m:mi>
         <m:mi>E</m:mi>
         <m:mi>M</m:mi>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>l</m:mi>
         <m:mi>o</m:mi>
         <m:mi>g</m:mi>
         <m:mi>D</m:mi>
         <m:mi>o</m:mi>
         <m:mi>t</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>&#916;</m:mi>
         <m:mi>D</m:mi>
         <m:mi>o</m:mi>
         <m:mi>t</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#946;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#946;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msub>
   <m:mi>l</m:mi>
   <m:mi>o</m:mi>
   <m:mi>g</m:mi>
   <m:mi>D</m:mi>
   <m:mi>o</m:mi>
   <m:mi>t</m:mi>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#946;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msub>
   <m:mi>&#916;</m:mi>
   <m:mi>D</m:mi>
   <m:mi>o</m:mi>
   <m:mi>t</m:mi>
</m:mrow>
</m:math>
               </display-formula>
            </p>
            <p>In the basic version of PeptideProphet the <it>&#946;</it>'s are estimated empirically from a controlled mixture and depend on the precursor ion charge (i.e. a separate discriminant function is trained for each of the 1+, 2+ and 3+ precursor ion charges).</p>
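            <p>As a minimal sketch of how such a charge-specific discriminant score could be evaluated, the Python fragment below computes <it>&#946;</it><sub>0 </sub>+ <it>&#946;</it><sub>1</sub><it>logDot </it>+ <it>&#946;</it><sub>2</sub>&#916;<it>Dot</it>. The function name and the coefficient values are hypothetical placeholders chosen for illustration only; they are not the trained PeptideProphet coefficients, which are not given here.</p>

```python
# HYPOTHETICAL coefficients (beta0, beta1, beta2) per precursor ion charge.
# These numbers are placeholders for illustration, not trained values.
HYPOTHETICAL_COEFFS = {
    1: (-1.0, 2.0, 3.0),
    2: (-1.5, 2.5, 3.5),
    3: (-2.0, 3.0, 4.0),
}


def discriminant_score(log_dot, delta_dot, charge, coeffs=HYPOTHETICAL_COEFFS):
    """Return beta0 + beta1 * logDot + beta2 * DeltaDot for the given charge."""
    beta0, beta1, beta2 = coeffs[charge]
    return beta0 + beta1 * log_dot + beta2 * delta_dot


# Example: one identified spectrum with a 2+ precursor ion.
score = discriminant_score(log_dot=1.0, delta_dot=0.5, charge=2)
```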
         </sec>
         <sec>
            <st>
               <p>PeptideProphet relates observable and unobservable quantities via a joint probability distribution</p>
            </st>
            <p>PeptideProphet relates scores <it>S<sub>i </sub></it>to parameters via a <it>sampling distribution </it>of the test statistic under <it>H</it><sub>0<it>i </it></sub>and <it>H<sub>ai</sub></it>. All scores <it>S<sub>i </sub></it>are assumed to be independent and identically distributed (iid). The sampling distribution of <it>S<sub>i </sub></it>is assumed to be a <it>Normal</it>(<it>&#956;</it>, <it>&#963;</it>) distribution if the identified spectrum is correct (<it>T<sub>i </sub></it>= 1) and a <it>Gamma</it>(<it>&#945;</it>, <it>&#946;</it>, <it>&#947;</it>) distribution if the identified spectrum is incorrect (<it>T<sub>i </sub></it>= 0). In notation, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i10"><m:mi>p</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>S</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>T</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:mn>0</m:mn>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
<m:mo class="MathClass-rel">~</m:mo>
<m:mi>G</m:mi>
<m:mi>a</m:mi>
<m:mi>m</m:mi>
<m:mi>m</m:mi>
<m:mi>a</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:mi>&#945;</m:mi>
      <m:mo class="MathClass-punc">,</m:mo>
      <m:mi>&#946;</m:mi>
      <m:mo class="MathClass-punc">,</m:mo>
      <m:mi>&#947;</m:mi>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula> and that <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i11"><m:mi>p</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>S</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>T</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:mn>1</m:mn>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
<m:mo class="MathClass-rel"> ~ </m:mo>
<m:mi>N</m:mi>
<m:mi>o</m:mi>
<m:mi>r</m:mi>
<m:mi>m</m:mi>
<m:mi>a</m:mi>
<m:mi>l</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:mi>&#956;</m:mi>
      <m:mo class="MathClass-punc">,</m:mo>
      <m:mi>&#963;</m:mi>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula>. Note that other forms of the distribution of scores for incorrect identifications, such as the Gumbel distribution, are often used, with no effect on the theory presented here. An additional parameter <it>&#960;</it><sub>0 </sub>represents the overall proportion of incorrect identifications among all identified spectra in the population. This formulation results in a two-group mixture model similar to that established by Efron <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, for which we may write that</p>
            <p>
               <display-formula id="M3">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i12"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>S</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">~</m:mo>
   <m:mi>P</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>T</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>0</m:mn>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mi>p</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>S</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">|</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>T</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>0</m:mn>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:mi>P</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>T</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mi>p</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>S</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">|</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>T</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#960;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msub>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mn>1</m:mn>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>&#960;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>0</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
               </display-formula>
            </p>
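            <p>The last equality holds because all scores are independent and identically distributed (iid), so that <it>P</it>(<it>T<sub>i </sub></it>= 0) = <it>&#960;</it><sub>0 </sub>and the component densities <it>f<sub>T</sub></it><sub>=0 </sub>and <it>f<sub>T</sub></it><sub>=1 </sub>are the same for every <it>i</it>. Because a different discriminant function is used for each charge, a separate sampling distribution and set of parameters are estimated for each precursor ion charge (we will refer to this simply as the charge).</p>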
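            <p>The two-group mixture density above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the PeptideProphet implementation: the function names are hypothetical, <it>&#946; </it>is treated as a scale parameter, and <it>&#947; </it>is treated as a location shift of the Gamma distribution.</p>

```python
import math


def normal_pdf(s, mu, sigma):
    """Normal(mu, sigma) density, with sigma the standard deviation."""
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def shifted_gamma_pdf(s, alpha, beta, gamma):
    """Gamma density with shape alpha and scale beta, shifted to start at gamma.

    ASSUMPTION: beta is a scale and gamma a location shift.
    """
    x = s - gamma
    if not x > 0.0:
        return 0.0  # no mass at or below the shift
    log_pdf = ((alpha - 1.0) * math.log(x) - x / beta
               - math.lgamma(alpha) - alpha * math.log(beta))
    return math.exp(log_pdf)


def mixture_pdf(s, pi0, mu, sigma, alpha, beta, gamma):
    """pi0 * f_{T=0}(s) + (1 - pi0) * f_{T=1}(s)."""
    incorrect = pi0 * shifted_gamma_pdf(s, alpha, beta, gamma)
    correct = (1.0 - pi0) * normal_pdf(s, mu, sigma)
    return incorrect + correct
```

            <p>In such a sketch one set of arguments (<it>&#960;</it><sub>0</sub>, <it>&#956;</it>, <it>&#963;</it>, <it>&#945;</it>, <it>&#946;</it>, <it>&#947;</it>) would be kept per charge, mirroring the separate sampling distribution for each precursor ion charge.</p>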
            <p>Additional information may be available, such as the NTT (number of tryptic termini), NMC (number of missed cleavages), and &#916;<it>M </it>(delta mass), that can be used to improve the estimation of the sampling distribution of the identified spectra <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. For example, the use of NTT = 0 in unconstrained searches often leads to improved estimation of the parameters even in lower quality datasets <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. This information is incorporated into the model above by assuming the existence of additional distributions for incorrect and correct identifications:</p>
            <p>
               <display-formula id="M4">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i13"><m:mrow>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>S</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>N</m:mi>
         <m:mi>T</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>T</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>N</m:mi>
         <m:mi>M</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>C</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>&#916;</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>M</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">~</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>&#960;</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msub>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msub>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>0</m:mn>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>N</m:mi>
         <m:mi>T</m:mi>
         <m:mi>T</m:mi>
      </m:mrow>
   </m:msub>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>0</m:mn>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>N</m:mi>
         <m:mi>M</m:mi>
         <m:mi>C</m:mi>
      </m:mrow>
   </m:msub>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>0</m:mn>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>&#916;</m:mi>
         <m:mi>M</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mn>1</m:mn>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:msub>
            <m:mrow>
               <m:mi>&#960;</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>0</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:msub>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>N</m:mi>
         <m:mi>T</m:mi>
         <m:mi>T</m:mi>
      </m:mrow>
   </m:msub>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>N</m:mi>
         <m:mi>M</m:mi>
         <m:mi>C</m:mi>
      </m:mrow>
   </m:msub>
   <m:msub>
      <m:mrow>
         <m:mi>f</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>&#916;</m:mi>
         <m:mi>M</m:mi>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
               </display-formula>
            </p>
            <p>Note that the density functions <it>f<sub>T</sub></it><sub>=0, <it>NTT</it></sub>, <it>f<sub>T</sub></it><sub>=0, <it>NMC</it></sub>, <it>f<sub>T</sub></it><sub>=0, &#916;<it>M</it></sub>, <it>f<sub>T</sub></it><sub>=1, <it>NTT</it></sub>, <it>f<sub>T</sub></it><sub>=1, <it>NMC</it></sub>, and <it>f<sub>T</sub></it><sub>=1, &#916;<it>M </it></sub>are discrete. It is assumed that, conditional on the identified spectrum being incorrect or correct, the members of (<it>S<sub>i</sub></it>, <it>NTT<sub>i</sub></it>, <it>NMC<sub>i</sub></it>, &#916;<it>M<sub>i</sub></it>) are independent, which is why each mixture component above is written as a product.</p>
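            <p>Under this conditional independence assumption, each mixture component is a product of one score density and several discrete probabilities. The following Python sketch shows the arithmetic; the function name and argument layout are hypothetical, and the numerical values in the example are invented for illustration only.</p>

```python
def joint_mixture_terms(pi0, score_dens, discrete_probs):
    """Return the two weighted components of the joint mixture.

    score_dens     : pair (f_{T=0}(s), f_{T=1}(s)) evaluated at the score.
    discrete_probs : list of pairs (P(x | T=0), P(x | T=1)), one pair per
                     discrete covariate (e.g. NTT, NMC, binned delta mass).
    The sum of the two returned terms is the joint mixture density.
    """
    w_incorrect = pi0 * score_dens[0]
    w_correct = (1.0 - pi0) * score_dens[1]
    # Conditional independence: multiply in each covariate's probability.
    for p_incorrect, p_correct in discrete_probs:
        w_incorrect *= p_incorrect
        w_correct *= p_correct
    return w_incorrect, w_correct


# Invented example values, not taken from any dataset.
w0, w1 = joint_mixture_terms(0.5, (2.0, 1.0), [(0.5, 0.25), (0.5, 1.0)])
joint_density = w0 + w1
```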
         </sec>
         <sec>
            <st>
               <p>PeptideProphet estimates parameters of interest in an Empirical Bayesian approach</p>
            </st>
            <p>PeptideProphet is considered an Empirical Bayesian approach because it uses each identified spectrum twice: once to estimate the parameters of the sampling distribution (&#960;<sub>0</sub>, <it>&#956;</it>, <it>&#963;</it>, <it>&#945;</it>, <it>&#946;</it>, and <it>&#947;</it>) via the Expectation-Maximization (EM) algorithm <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, and a second time to estimate the confidence in the correctness of an identified spectrum. The EM-algorithm iterates between two steps, called the E-step and the M-step, in order to estimate the values of the model parameters. With a large enough set of identified spectra (say 100), the EM-algorithm converges in practice <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, although, as with any EM procedure, the solution may be a local maximum of the likelihood and can depend on the initial values. The algorithm starts with initial values of the model parameters &#960;<sub>0</sub>, <it>&#956;</it>, <it>&#963;</it>, <it>&#945;</it>, <it>&#946;</it>, and <it>&#947;</it>.</p>
            <p>In the E-step, given the current estimates of the model parameters, the probability of each identification being correct (or incorrect) is calculated. Given a single observed score <it>s<sub>i </sub></it>and its unobserved correctness status <it>T<sub>i</sub></it>, Bayes' theorem yields <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i14"><m:mi>P</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>T</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:mn>0</m:mn>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>S</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>s</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
<m:mo class="MathClass-rel">=</m:mo>
<m:mfrac>
   <m:mrow>
      <m:mi>P</m:mi>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>T</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>0</m:mn>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
      <m:mi>p</m:mi>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>S</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">|</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mi>T</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>0</m:mn>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
   </m:mrow>
   <m:mrow>
      <m:mi>P</m:mi>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>T</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>0</m:mn>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
      <m:mi>p</m:mi>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>S</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">|</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mi>T</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>0</m:mn>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
      <m:mo class="MathClass-bin">+</m:mo>
      <m:mi>P</m:mi>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>T</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>1</m:mn>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
      <m:mi>p</m:mi>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>S</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">|</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mi>T</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>1</m:mn>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
   </m:mrow>
</m:mfrac>
</m:math></inline-formula>, which is the ratio of the Gamma density weighted by <it>&#960;</it><sub>0 </sub>to the sum of the Gamma density weighted by <it>&#960;</it><sub>0 </sub>and the Normal density weighted by 1 - <it>&#960;</it><sub>0</sub>, all evaluated at score <it>s<sub>i</sub></it>.</p>
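            <p>The E-step can be sketched as follows. This is a minimal illustration, not the PeptideProphet source: the function names are hypothetical, and the two component densities are passed in as callables so that the Gamma and Normal densities (or any alternative such as a Gumbel density) can be supplied by the caller.</p>

```python
def posterior_incorrect(s, pi0, f0, f1):
    """Bayes' theorem: P(T = 0 | S = s) for a two-group mixture.

    f0, f1 : callables returning the densities of incorrect (T = 0) and
             correct (T = 1) identifications at score s.
    Raises ZeroDivisionError if both weighted densities are zero at s.
    """
    weighted_incorrect = pi0 * f0(s)
    weighted_correct = (1.0 - pi0) * f1(s)
    return weighted_incorrect / (weighted_incorrect + weighted_correct)


def e_step(scores, pi0, f0, f1):
    """Membership probabilities p_i = P(T_i = 0 | S_i = s_i) for all scores."""
    return [posterior_incorrect(s, pi0, f0, f1) for s in scores]


# Toy densities for illustration only: constant values 2.0 and 1.0.
p = e_step([0.3, 1.7], 0.5, lambda s: 2.0, lambda s: 1.0)
```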
            <p>In the M-step, given estimated membership probabilities <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i15"><m:mi>P</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>T</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:mn>0</m:mn>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>S</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>s</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
<m:mo class="MathClass-rel">=</m:mo>
<m:msub>
   <m:mrow>
      <m:mi>p</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>i</m:mi>
   </m:mrow>
</m:msub>
</m:math></inline-formula> for each score <it>s<sub>i</sub></it>, the model parameters are re-estimated by finding the values that maximize the likelihood. The estimate of <it>&#960;</it><sub>0 </sub>is the average of the membership probabilities, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i16"><m:mfrac>
   <m:mrow>
      <m:msubsup>
         <m:mrow>
            <m:mo class="MathClass-op">&#8721;</m:mo>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>1</m:mn>
         </m:mrow>
         <m:mrow>
            <m:mi>N</m:mi>
         </m:mrow>
      </m:msubsup>
      <m:msub>
         <m:mrow>
            <m:mi>p</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mrow>
      <m:mi>N</m:mi>
   </m:mrow>
</m:mfrac>
</m:math></inline-formula>. For the Normal distribution the estimates of <it>&#956; </it>and <it>&#963;</it><sup>2 </sup>are:</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i17"><m:mrow>
   <m:mover accent="true">
      <m:mrow>
         <m:mi>&#956;</m:mi>
      </m:mrow>
      <m:mo class="MathClass-op">^</m:mo>
   </m:mover>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#8721;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>N</m:mi>
            </m:mrow>
         </m:msubsup>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>p</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>s</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mrow>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#8721;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>N</m:mi>
            </m:mrow>
         </m:msubsup>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>p</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math>
               </display-formula>
            </p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i18"><m:mrow>
   <m:msup>
      <m:mrow>
         <m:mover accent="true">
            <m:mrow>
               <m:mi>&#963;</m:mi>
            </m:mrow>
            <m:mo class="MathClass-op">^</m:mo>
         </m:mover>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msup>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#8721;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>N</m:mi>
            </m:mrow>
         </m:msubsup>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>p</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:msup>
            <m:mrow>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:msub>
                        <m:mrow>
                           <m:mi>s</m:mi>
                        </m:mrow>
                        <m:mrow>
                           <m:mi>i</m:mi>
                        </m:mrow>
                     </m:msub>
                     <m:mo class="MathClass-bin">-</m:mo>
                     <m:mover accent="true">
                        <m:mrow>
                           <m:mi>&#956;</m:mi>
                        </m:mrow>
                        <m:mo>^</m:mo>
                     </m:mover>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#8721;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>N</m:mi>
            </m:mrow>
         </m:msubsup>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>p</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math>
               </display-formula>
            </p>
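            <p>The updates for <it>&#960;</it><sub>0</sub>, <it>&#956; </it>and <it>&#963;</it><sup>2 </sup>are weighted averages, which the following Python sketch makes explicit. The function name is hypothetical, and the sketch assumes that at least one score has a membership probability below 1, so that the Normal weights do not sum to zero.</p>

```python
def m_step_normal(scores, p_incorrect):
    """Re-estimate pi0, mu and sigma^2 from E-step membership probabilities.

    scores      : observed scores s_1, ..., s_N.
    p_incorrect : p_i = P(T_i = 0 | S_i = s_i) for each score.
    """
    n = len(scores)
    pi0 = sum(p_incorrect) / n
    # Each score contributes to the Normal component with weight 1 - p_i.
    weights = [1.0 - p for p in p_incorrect]
    weight_sum = sum(weights)
    mu = sum(w * s for w, s in zip(weights, scores)) / weight_sum
    var = sum(w * (s - mu) ** 2 for w, s in zip(weights, scores)) / weight_sum
    return pi0, mu, var


# Toy example: the first two scores are certainly incorrect, the last two
# certainly correct, so mu and sigma^2 depend only on the scores 2.0 and 3.0.
pi0_hat, mu_hat, var_hat = m_step_normal([0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 0.0, 0.0])
```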
            <p>For the Gamma distribution, the estimate of <it>&#947; </it>is simply the minimum of the scores <it>s<sub>i</sub></it>, <it>i </it>= 1, ..., <it>N</it>. In order to estimate <it>&#945; </it>and <it>&#946; </it>let <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i19"><m:msub>
   <m:mrow>
      <m:mi>m</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mn>1</m:mn>
   </m:mrow>
</m:msub>
<m:mo class="MathClass-rel">=</m:mo>
<m:mfrac>
   <m:mrow>
      <m:msubsup>
         <m:mrow>
            <m:mo class="MathClass-op">&#8721;</m:mo>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>1</m:mn>
         </m:mrow>
         <m:mrow>
            <m:mi>N</m:mi>
         </m:mrow>
      </m:msubsup>
      <m:msub>
         <m:mrow>
            <m:mi>p</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-bin">-</m:mo>
            <m:mover accent="true">
               <m:mrow>
                  <m:mi>&#947;</m:mi>
               </m:mrow>
               <m:mo class="MathClass-op">^</m:mo>
            </m:mover>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
   </m:mrow>
   <m:mrow>
      <m:msubsup>
         <m:mrow>
            <m:mo class="MathClass-op">&#8721;</m:mo>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>1</m:mn>
         </m:mrow>
         <m:mrow>
            <m:mi>N</m:mi>
         </m:mrow>
      </m:msubsup>
      <m:msub>
         <m:mrow>
            <m:mi>p</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
</m:mfrac>
</m:math></inline-formula> and <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i20"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>m</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#8721;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>N</m:mi>
            </m:mrow>
         </m:msubsup>
         <m:msub>
            <m:mrow>
               <m:mi>p</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:msup>
            <m:mrow>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:msub>
                        <m:mrow>
                           <m:mi>s</m:mi>
                        </m:mrow>
                        <m:mrow>
                           <m:mi>i</m:mi>
                        </m:mrow>
                     </m:msub>
                     <m:mo class="MathClass-bin">-</m:mo>
                     <m:mover accent="true">
                        <m:mrow>
                           <m:mi>&#947;</m:mi>
                        </m:mrow>
                        <m:mo>^</m:mo>
                     </m:mover>
                     <m:mo class="MathClass-bin">-</m:mo>
                     <m:msub>
                        <m:mrow>
                           <m:mi>m</m:mi>
                        </m:mrow>
                        <m:mrow>
                           <m:mn>1</m:mn>
                        </m:mrow>
                     </m:msub>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#8721;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>N</m:mi>
            </m:mrow>
         </m:msubsup>
         <m:msub>
            <m:mrow>
               <m:mi>p</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math></inline-formula>. Then the estimates of <it>&#945; </it>and <it>&#946; </it>are</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i21"><m:mrow>
   <m:mover accent="true">
      <m:mrow>
         <m:mi>&#945;</m:mi>
      </m:mrow>
      <m:mo class="MathClass-op">^</m:mo>
   </m:mover>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:msubsup>
            <m:mrow>
               <m:mi>m</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msubsup>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>m</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math>
               </display-formula>
            </p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i22"><m:mrow>
   <m:mover accent="true">
      <m:mrow>
         <m:mi>&#946;</m:mi>
      </m:mrow>
      <m:mo class="MathClass-op">^</m:mo>
   </m:mover>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>m</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>1</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>m</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math>
               </display-formula>
            </p>
            <p>Because the algorithm is fast when working with only two mixture components, the E- and M-steps can be iterated until no model parameter changes by more than a specified tolerance <it>&#949;</it>, where <it>&#949; </it>is a small number such as 0.0001. The algorithm then outputs the estimated parameters <it>&#945;</it>, <it>&#946;</it>, <it>&#947;</it>, <it>&#956; </it>and <it>&#963;</it>, as well as the estimate of <it>&#960;</it><sub>0 </sub>(estimates are denoted with hats). The algorithm is detailed in Figure <figr fid="F1">1</figr>. Figures <figr fid="F2">2b</figr> and <figr fid="F2">2a</figr> show two fits of PeptideProphet to the Human Plasma dataset of charges 2 and 3. Note that the EM algorithm can be replaced by alternative estimation procedures such as the Method of Moments.</p>
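            <p>The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PeptideProphet implementation: the shift <it>&#947; </it>is held fixed (<it>gamma_shift</it>), <it>&#946; </it>is treated as a rate parameter so that it matches the moment estimate <it>m</it><sub>1</sub>/<it>m</it><sub>2</sub>, the weights <it>p<sub>i </sub></it>of the Gamma update are taken to be the posterior probabilities of an incorrect identification, and all function names are hypothetical.</p>

```python
import math


def normal_pdf(s, mu, sigma):
    """Density of the Normal(mu, sigma) distribution at s."""
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def shifted_gamma_pdf(s, alpha, beta, gamma_shift):
    """Density of a Gamma(shape=alpha, rate=beta) shifted right by gamma_shift."""
    x = s - gamma_shift
    if x > 0.0:
        log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
                   + (alpha - 1.0) * math.log(x) - beta * x)
        return math.exp(log_pdf)
    return 0.0


def gamma_moment_step(scores, weights, gamma_shift):
    """Weighted method-of-moments update: alpha = m1^2 / m2, beta = m1 / m2."""
    total = sum(weights)
    m1 = sum(p * (s - gamma_shift) for s, p in zip(scores, weights)) / total
    m2 = sum(p * (s - gamma_shift - m1) ** 2 for s, p in zip(scores, weights)) / total
    return m1 * m1 / m2, m1 / m2


def fit_mixture(scores, gamma_shift, init, eps=1e-4, max_iter=500):
    """Iterate E- and M-steps until no parameter moves by more than eps.

    init is (pi0, mu, sigma, alpha, beta); the same tuple layout is returned.
    """
    pi0, mu, sigma, alpha, beta = init
    for _ in range(max_iter):
        # E-step: posterior probability that each score is an incorrect identification.
        w0 = []
        for s in scores:
            f0 = pi0 * shifted_gamma_pdf(s, alpha, beta, gamma_shift)
            f1 = (1.0 - pi0) * normal_pdf(s, mu, sigma)
            w0.append(f0 / (f0 + f1))
        w1 = [1.0 - w for w in w0]
        # M-step: weighted moments for the Gamma, weighted mean and sd for the Normal.
        new_alpha, new_beta = gamma_moment_step(scores, w0, gamma_shift)
        new_pi0 = sum(w0) / len(scores)
        new_mu = sum(w * s for w, s in zip(w1, scores)) / sum(w1)
        new_sigma = math.sqrt(
            sum(w * (s - new_mu) ** 2 for w, s in zip(w1, scores)) / sum(w1))
        new = (new_pi0, new_mu, new_sigma, new_alpha, new_beta)
        change = max(abs(a - b) for a, b in zip(new, (pi0, mu, sigma, alpha, beta)))
        pi0, mu, sigma, alpha, beta = new
        if eps > change:
            break
    return pi0, mu, sigma, alpha, beta
```

            <p>The stopping rule compares the largest absolute change over all five parameters with <it>&#949;</it>, which is one reasonable reading of "do not change by a specified <it>&#949;</it>"; a relative change criterion would work equally well.</p>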
            <fig id="F1"><title><p>Figure 1</p></title><caption><p>Pseudocode of the EM-algorithm for iteratively estimating model parameters and membership probabilities</p></caption><text>
   <p><b>Pseudocode of the EM-algorithm for iteratively estimating model parameters and membership probabilities</b>.</p>
</text><graphic file="1471-2105-13-S16-S1-1"/></fig>
            <fig id="F2"><title><p>Figure 2</p></title><caption><p>PeptideProphet fits on the Human Plasma Dataset</p></caption><text>
   <p><b>PeptideProphet fits on the Human Plasma Dataset</b>. PeptideProphet fits on the Human Plasma Dataset with Tandem Scores for charges 2 (left) and 3 (right). The blue and red curves correspond to the fitted frequency curves of the correct (Normal) and incorrect (Gamma) distributions. The Charge 2 fit yields a mixture distribution with a much stronger separation than the fit to Charge 3.</p>
</text><graphic file="1471-2105-13-S16-S1-2"/></fig>
         </sec>
         <sec>
            <st>
               <p>Evaluation of the quality of fit of PeptideProphet</p>
            </st>
            <p>Deviations from the model assumptions, or a low number of identified spectra, can lead to an inadequate or unstable model fit and to incorrect conclusions. Both problems can be diagnosed by visual inspection and by the bootstrap. We recommend visual inspection over goodness-of-fit tests, because such tests do not explore the specific fitting issues that may influence subsequent inference about the identified spectra. Goodness-of-fit tests condense the quality of fit into a single summary statistic, whereas we are typically interested in the fit at particular locations of the mixture distribution. Researchers should be aware of several visual attributes of the mixture distribution, and of some remedies for them.</p>
            <p><it>Do the empirical scores follow the fitted curves well? </it>Particular attention needs to be paid to the tails of the distributions, especially the right tail of the distribution of scores of incorrect identifications (red) and the left tail of the distribution of scores of correct identifications (blue). These regions are often of most interest to researchers, as the identified spectra in them are considered borderline correct or incorrect. In Figure <figr fid="F2">2b</figr> the curves fit the histogram well, but in Figure <figr fid="F2">2a</figr> there are many mismatches between the bars and the fitted curves. These mismatches are likely due to the small number of spectra: the right portion of the Normal distribution is fit with only approximately 30 spectra. If the data comprise a large number of spectra but still deviate from the fitted curves, robust procedures can also be considered; these are discussed later.</p>
            <p><it>Do the curves highly overlap? </it>Although high overlap does not necessarily indicate a poor fit, it will lead to smaller sets of confidently identified spectra. Overlap that occurs in highly constrained searches can be remedied with techniques described in later sections. Overlap in the case of a small number of spectra (Figure <figr fid="F2">2a</figr>) may be remedied by artificially adding observations using decoys, which is also demonstrated later.</p>
            <p>An issue that is not commonly addressed, however, is the number of identified spectra available to fit the mixture model. The number of identified spectra required to fit a reliable model depends strongly on the separation and the form of the observed scores. The stability of the fitted model can be examined statistically via the bootstrap.</p>
            <p>Bootstrapping is performed by drawing <it>B </it>samples of spectra, each of size <it>N</it>, with replacement from the original dataset. At least 100 to 500 bootstrapped samples are recommended. For each bootstrapped sample <it>b</it>, we can refit the PeptideProphet model to obtain bootstrapped estimates of <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i23"><m:msubsup>
   <m:mrow>
      <m:mover accent="true">
         <m:mrow>
            <m:mi>&#960;</m:mi>
         </m:mrow>
         <m:mo class="MathClass-op">^</m:mo>
      </m:mover>
   </m:mrow>
   <m:mrow>
      <m:mn>0</m:mn>
      <m:mo class="MathClass-punc">,</m:mo>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo class="MathClass-bin">*</m:mo>
   </m:mrow>
</m:msubsup>
</m:math></inline-formula>, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i24"><m:msubsup>
   <m:mrow>
      <m:mover accent="true">
         <m:mrow>
            <m:mi>&#956;</m:mi>
         </m:mrow>
         <m:mo class="MathClass-op">^</m:mo>
      </m:mover>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo class="MathClass-bin">*</m:mo>
   </m:mrow>
</m:msubsup>
</m:math></inline-formula>, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i25"><m:msubsup>
   <m:mrow>
      <m:mover accent="true">
         <m:mrow>
            <m:mi>&#963;</m:mi>
         </m:mrow>
         <m:mo class="MathClass-op">^</m:mo>
      </m:mover>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo class="MathClass-bin">*</m:mo>
   </m:mrow>
</m:msubsup>
</m:math></inline-formula>, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i26"><m:msubsup>
   <m:mrow>
      <m:mover accent="true">
         <m:mrow>
            <m:mi>&#945;</m:mi>
         </m:mrow>
         <m:mo class="MathClass-op">^</m:mo>
      </m:mover>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo class="MathClass-bin">*</m:mo>
   </m:mrow>
</m:msubsup>
</m:math></inline-formula>, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i27"><m:msubsup>
   <m:mrow>
      <m:mover accent="true">
         <m:mrow>
            <m:mi>&#946;</m:mi>
         </m:mrow>
         <m:mo class="MathClass-op">^</m:mo>
      </m:mover>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo class="MathClass-bin">*</m:mo>
   </m:mrow>
</m:msubsup>
</m:math></inline-formula>, and <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i28"><m:msubsup>
   <m:mrow>
      <m:mover accent="true">
         <m:mrow>
            <m:mi>&#947;</m:mi>
         </m:mrow>
         <m:mo class="MathClass-op">^</m:mo>
      </m:mover>
   </m:mrow>
   <m:mrow>
      <m:mi>b</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mo class="MathClass-bin">*</m:mo>
   </m:mrow>
</m:msubsup>
</m:math></inline-formula>. The bias, variance, and mean squared error (MSE) of the procedure used to estimate a parameter can be found using the bootstrapped estimates. In the case of <it>&#956;</it>, the bootstrap bias estimate is <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i29"><m:mrow>
   <m:mover accent="false">
      <m:mrow>
         <m:mi>b</m:mi>
         <m:mi>i</m:mi>
         <m:mi>a</m:mi>
         <m:mi>s</m:mi>
      </m:mrow>
      <m:mo class="MathClass-op"> ^</m:mo>
   </m:mover>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#8721;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>B</m:mi>
            </m:mrow>
         </m:msubsup>
         <m:msubsup>
            <m:mrow>
               <m:mover accent="true">
                  <m:mrow>
                     <m:mi>&#956;</m:mi>
                  </m:mrow>
                  <m:mo>^</m:mo>
               </m:mover>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mo class="MathClass-bin">*</m:mo>
            </m:mrow>
         </m:msubsup>
      </m:mrow>
      <m:mrow>
         <m:mi>B</m:mi>
      </m:mrow>
   </m:mfrac>
   <m:mo class="MathClass-bin">-</m:mo>
   <m:mover accent="true">
      <m:mrow>
         <m:mi>&#956;</m:mi>
      </m:mrow>
      <m:mo class="MathClass-op">^</m:mo>
   </m:mover>
</m:mrow>
</m:math></inline-formula>. Large biases imply that the estimation procedure systematically over- or underestimates the true value of a parameter. Note that the bias does not move towards 0 as <it>B </it>increases. The bootstrap variance estimate is defined as <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i30"><m:mrow>
   <m:mover accent="false">
      <m:mrow>
         <m:mi>v</m:mi>
         <m:mi>a</m:mi>
         <m:mi>r</m:mi>
         <m:mi>i</m:mi>
         <m:mi>a</m:mi>
         <m:mi>n</m:mi>
         <m:mi>c</m:mi>
         <m:mi>e</m:mi>
      </m:mrow>
      <m:mo class="MathClass-op"> ^</m:mo>
   </m:mover>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:msubsup>
            <m:mrow>
               <m:mo mathsize="big">&#8721;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:mi>b</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mrow>
               <m:mi>B</m:mi>
            </m:mrow>
         </m:msubsup>
         <m:msup>
            <m:mfenced close=")" open="(" separators="">
               <m:mrow>
                  <m:msubsup>
                     <m:mrow>
                        <m:mover accent="true">
                           <m:mrow>
                              <m:mi>&#956;</m:mi>
                           </m:mrow>
                           <m:mo>^</m:mo>
                        </m:mover>
                     </m:mrow>
                     <m:mrow>
                        <m:mi>b</m:mi>
                     </m:mrow>
                     <m:mrow>
                        <m:mo class="MathClass-bin">*</m:mo>
                     </m:mrow>
                  </m:msubsup>
                  <m:mo class="MathClass-bin">-</m:mo>
                  <m:mfrac>
                     <m:mrow>
                        <m:msubsup>
                           <m:mrow>
                              <m:mo mathsize="big">&#8721;</m:mo>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>b</m:mi>
                              <m:mo class="MathClass-rel">=</m:mo>
                              <m:mn>1</m:mn>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>B</m:mi>
                           </m:mrow>
                        </m:msubsup>
                        <m:msubsup>
                           <m:mrow>
                              <m:mover accent="true">
                                 <m:mrow>
                                    <m:mi>&#956;</m:mi>
                                 </m:mrow>
                                 <m:mo>^</m:mo>
                              </m:mover>
                           </m:mrow>
                           <m:mrow>
                              <m:mi>b</m:mi>
                           </m:mrow>
                           <m:mrow>
                              <m:mo class="MathClass-bin">*</m:mo>
                           </m:mrow>
                        </m:msubsup>
                     </m:mrow>
                     <m:mrow>
                        <m:mi>B</m:mi>
                     </m:mrow>
                  </m:mfrac>
               </m:mrow>
            </m:mfenced>
            <m:mrow>
               <m:mn>2</m:mn>
            </m:mrow>
         </m:msup>
      </m:mrow>
      <m:mrow>
         <m:mi>B</m:mi>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math></inline-formula>. Smaller variability is desired. The bias and variability of an estimation procedure are often summarized using the mean squared error, which is <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i31"><m:mover accent="false">
   <m:mrow>
      <m:mi>M</m:mi>
      <m:mi>S</m:mi>
      <m:mi>E</m:mi>
   </m:mrow>
   <m:mo class="MathClass-op"> ^</m:mo>
</m:mover>
<m:mo class="MathClass-rel">=</m:mo>
<m:mover accent="false">
   <m:mrow>
      <m:mi>v</m:mi>
      <m:mi>a</m:mi>
      <m:mi>r</m:mi>
      <m:mi>i</m:mi>
      <m:mi>a</m:mi>
      <m:mi>n</m:mi>
      <m:mi>c</m:mi>
      <m:mi>e</m:mi>
   </m:mrow>
   <m:mo class="MathClass-op"> ^</m:mo>
</m:mover>
<m:mo class="MathClass-bin">+</m:mo>
<m:msup>
   <m:mrow>
      <m:mover accent="false">
         <m:mrow>
            <m:mi>b</m:mi>
            <m:mi>i</m:mi>
            <m:mi>a</m:mi>
            <m:mi>s</m:mi>
         </m:mrow>
         <m:mo class="MathClass-op"> ^</m:mo>
      </m:mover>
   </m:mrow>
   <m:mrow>
      <m:mn>2</m:mn>
   </m:mrow>
</m:msup>
</m:math></inline-formula>.</p>
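            <p>As a rough illustration of these bootstrap summaries, the following Python sketch resamples a dataset with replacement and reports the bias, variance, and mean squared error of an arbitrary estimator. The function name <it>bootstrap_summary </it>and the <it>estimator </it>argument are hypothetical; in practice the estimator would refit the mixture model and return one parameter such as <it>&#956;</it>. The variance is computed as the usual sample variance of the bootstrapped estimates, with divisor <it>B </it>- 1.</p>

```python
import random
import statistics


def bootstrap_summary(data, estimator, n_boot=300, seed=0):
    """Return (bias, variance, mse) of `estimator` from n_boot bootstrap resamples."""
    rng = random.Random(seed)
    original = estimator(data)
    # Each resample draws len(data) observations with replacement.
    estimates = [estimator(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    # Bias: mean of the bootstrapped estimates minus the original estimate.
    bias = statistics.fmean(estimates) - original
    # Sample variance of the bootstrapped estimates (divisor n_boot - 1).
    variance = statistics.variance(estimates)
    return bias, variance, variance + bias ** 2
```

            <p>For example, <it>bootstrap_summary(scores, statistics.fmean) </it>summarizes the stability of a plain sample mean; substituting a function that refits the model gives the corresponding summaries for a model parameter.</p>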
            <p>Three hundred bootstrapped samples were drawn from the Human Plasma data for each of charges 2 and 3, and the bootstrapped estimates of <it>&#960;</it><sub>0</sub>, <it>&#956;</it>, and <it>&#963; </it>are shown in Figure <figr fid="F3">3</figr>. Although the means of the bootstrapped distributions are close to the original estimates (marked in red), the bootstrapped distributions for these parameters are more skewed for Charge 2 than for Charge 3. Additionally, the variance of the bootstrapped estimates of <it>&#956; </it>and <it>&#963; </it>is considerably greater for Charge 2, showing how unstable the estimates for the Charge 2 distribution are given the small number of identified spectra.</p>
            <fig id="F3"><title><p>Figure 3</p></title><caption><p>Bootstrapped samples of <b>&#960;<sub>0</sub>, <it>&#956; </it></b>, and <b><it>&#963; </it></b>for Charges 2 and 3 of the Human Plasma data</p></caption><text>
   <p><b>Bootstrapped samples of <b>&#960;<sub>0</sub>, <it>&#956; </it></b>, and <b><it>&#963; </it></b>for Charges 2 and 3 of the Human Plasma data</b>. The original estimates are marked by the vertical line. The horizontal axes have equal length for the plots of a particular parameter. The Charge 2 distributions are slightly skewed compared to the Charge 3 distributions, and the mean squared errors are much greater for Charge 2. The variability of the Charge 2 distributions is visibly much greater, indicating unstable estimates.</p>
</text><graphic file="1471-2105-13-S16-S1-3"/></fig>
            <p>The mean squared error summarizes how far the parameter estimates obtained from the <it>B </it>bootstrapped samples deviate from the original estimates. The experimenter may also examine the deviations between the original sample and a single bootstrapped sample. Although histograms of both samples would suffice, a quantile-to-quantile plot is an easy-to-read display of the deviations between the two samples. It plots the quantiles of one distribution against the matched quantiles of the other. For example, if there are 10 values in each of two datasets, the quantile-to-quantile plot displays the 10th, 20th, 30th, ..., and 100th percentiles of one distribution against the respective percentiles of the second distribution. Distributions that are alike should result in a quantile-to-quantile plot that is linear. Deviations from linearity at particular quantiles imply differences between the two distributions at those quantiles. Although no quantile-to-quantile plot will be perfectly linear, the plot should not deviate much in its center and right portions, as the accuracy of the estimated confidence of identified spectra relies heavily upon a good fit at these locations. The quantile-to-quantile plot for Charge 2 in Figure <figr fid="F4">4</figr> displays the deviation between the quantiles of the original mixture distribution and the quantiles of a random bootstrapped sample. The deviations noticeably occur in the right half of the plot, which corresponds to the right portion of the axis in Figure <figr fid="F2">2a</figr>, indicating that the instability of the estimates arises in this score range. More specifically, it is due to the low number of identified spectra in this area of the plot.</p>
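            <p>The matched quantiles behind such a plot are simple to compute when both samples have the same size, as is the case for an original sample and a bootstrapped sample of size <it>N</it>. The sketch below is illustrative only; the function names are hypothetical, and the deviation summary is measured against the line of equality, which is the relevant reference when the two samples are expected to share one distribution.</p>

```python
def qq_pairs(sample_a, sample_b):
    """Pair the sorted values of two equally sized samples (matched quantiles)."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes samples of equal size")
    return list(zip(sorted(sample_a), sorted(sample_b)))


def max_deviation_from_identity(pairs):
    """Largest vertical distance of the quantile pairs from the line y = x."""
    return max(abs(b - a) for a, b in pairs)
```

            <p>Plotting the pairs returned by <it>qq_pairs </it>as points gives the quantile-to-quantile plot; inspecting where the largest deviations fall indicates which score range drives the instability.</p>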
            <fig id="F4"><title><p>Figure 4</p></title><caption><p>Quantile-to-quantile plot comparing the quantiles of the original mixture distribution of the Human Plasma data</p></caption><text>
   <p><b>Quantile-to-quantile plot comparing the quantiles of the original mixture distribution of the Human Plasma data</b>. Quantile-to-quantile plots comparing the quantiles of the original mixture distribution of the Human Plasma data for Charges 2 (left) and 3 (right) with the quantiles of randomly bootstrapped samples. The quantile-to-quantile plot for Charge 2 shows more deviation in quantiles due to the low number of identified spectra in the score range between 2 and 8.</p>
</text><graphic file="1471-2105-13-S16-S1-4"/></fig>
         </sec>
         <sec>
            <st>
               <p>Estimating the confidence of spectrum identifications</p>
            </st>
            <sec>
               <st>
                  <p>Estimating the confidence of a set of spectrum identifications</p>
               </st>
               <p>In order to determine the correctness of the spectrum identifications, a decision rule is defined under which any spectrum identification with a score above <it>&#948; </it>is declared correct. In many experiments we are interested in the statistical properties of the list of spectrum identifications with scores above <it>&#948;</it>.</p>
               <p>Two approaches may be used to estimate the False Discovery Rate for a given decision rule cutoff. In the first, because all scores are assumed to follow the same fitted distribution, the False Discovery Rate can be estimated with <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i32"><m:mi>F</m:mi>
<m:mover accent="true">
   <m:mrow>
      <m:mi>D</m:mi>
   </m:mrow>
   <m:mo class="MathClass-op"> ^</m:mo>
</m:mover>
<m:mi>R</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:mi>t</m:mi>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
<m:mo class="MathClass-rel">=</m:mo>
<m:mfrac>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mover accent="true">
               <m:mrow>
                  <m:mi>&#960;</m:mi>
               </m:mrow>
               <m:mo class="MathClass-op">^</m:mo>
            </m:mover>
         </m:mrow>
         <m:mrow>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
      <m:mi>P</m:mi>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:mi>S</m:mi>
            <m:mo class="MathClass-rel">&gt;</m:mo>
            <m:mi>t</m:mi>
            <m:mo class="MathClass-rel">|</m:mo>
            <m:mi>T</m:mi>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>0</m:mn>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
   </m:mrow>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mover accent="true">
               <m:mrow>
                  <m:mi>&#960;</m:mi>
               </m:mrow>
               <m:mo class="MathClass-op">^</m:mo>
            </m:mover>
         </m:mrow>
         <m:mrow>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
      <m:mi>P</m:mi>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:mi>S</m:mi>
            <m:mo class="MathClass-rel">&gt;</m:mo>
            <m:mi>t</m:mi>
            <m:mo class="MathClass-rel">|</m:mo>
            <m:mi>T</m:mi>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>0</m:mn>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
      <m:mo class="MathClass-bin">+</m:mo>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:mn>1</m:mn>
            <m:mo class="MathClass-bin">-</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mover accent="true">
                     <m:mrow>
                        <m:mi>&#960;</m:mi>
                     </m:mrow>
                     <m:mo class="MathClass-op">^</m:mo>
                  </m:mover>
               </m:mrow>
               <m:mrow>
                  <m:mn>0</m:mn>
               </m:mrow>
            </m:msub>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
      <m:mi>P</m:mi>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:mi>S</m:mi>
            <m:mo class="MathClass-rel">&gt;</m:mo>
            <m:mi>t</m:mi>
            <m:mo class="MathClass-rel">|</m:mo>
            <m:mi>T</m:mi>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>1</m:mn>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
   </m:mrow>
</m:mfrac>
</m:math></inline-formula><abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. This can be seen by using the areas under the colored curves in Figure <figr fid="F5">5</figr>. In a second approach, PeptideProphet traditionally estimates the False Discovery Rate by interpreting the posterior error probabilities as local false discovery rates <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. The estimated overall False Discovery Rate at point <it>t </it>is the average of the estimated local false discovery rates of identified spectra with scores greater than <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i33"><m:mi>t</m:mi>
<m:mo class="MathClass-rel">:</m:mo>
<m:mi>F</m:mi>
<m:mover accent="true">
   <m:mrow>
      <m:mi>D</m:mi>
   </m:mrow>
   <m:mo class="MathClass-op"> ^</m:mo>
</m:mover>
<m:mi>R</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:mi>t</m:mi>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
<m:mo class="MathClass-rel">=</m:mo>
<m:mfrac>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mo class="MathClass-op">&#8721;</m:mo>
         </m:mrow>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">&#8805;</m:mo>
            <m:mi>t</m:mi>
         </m:mrow>
      </m:msub>
      <m:mi>P</m:mi>
      <m:mi>E</m:mi>
      <m:msub>
         <m:mrow>
            <m:mi>P</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mrow>
      <m:mrow>
         <m:mo class="MathClass-open">{</m:mo>
         <m:mrow>
            <m:mi>#</m:mi>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">:</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">&#8805;</m:mo>
            <m:mi>t</m:mi>
         </m:mrow>
         <m:mo class="MathClass-close">}</m:mo>
      </m:mrow>
   </m:mrow>
</m:mfrac>
</m:math></inline-formula>.</p>
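               <p>Both estimators can be written compactly. The sketch below is an illustration under stated assumptions rather than the PeptideProphet code: the function names are hypothetical, the survival functions P(<it>S </it>&gt; <it>t</it>) of the two components are passed in as callables, and the example survival function for incorrect identifications uses a shifted Gamma with shape 1 (an exponential) only because its tail has a closed form.</p>

```python
import math


def normal_sf(t, mu, sigma):
    """P(S > t) for a Normal(mu, sigma) score distribution."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))


def shifted_exponential_sf(t, beta, gamma_shift):
    """P(S > t) for a shifted Gamma with shape 1, i.e. an exponential with rate beta."""
    return math.exp(-beta * max(t - gamma_shift, 0.0))


def model_based_fdr(t, pi0, sf_incorrect, sf_correct):
    """pi0 * P(S > t | T = 0) divided by the total mixture mass above t."""
    false_mass = pi0 * sf_incorrect(t)
    return false_mass / (false_mass + (1.0 - pi0) * sf_correct(t))


def pep_average_fdr(t, scores, peps):
    """Average posterior error probability over identifications with score >= t."""
    selected = [pep for s, pep in zip(scores, peps) if s >= t]
    if not selected:
        raise ValueError("no identifications with score at or above the cutoff")
    return sum(selected) / len(selected)
```

               <p>With a general Gamma shape the incorrect-score tail requires the regularized incomplete gamma function, which a statistics library would normally supply; only the two survival callables change, not the estimator itself.</p>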
               <fig id="F5"><title><p>Figure 5</p></title><caption><p>The PeptideProphet fit to the Human Plasma dataset of Tandem scores of Charge 2</p></caption><text>
   <p><b>The PeptideProphet fit to the Human Plasma dataset of Tandem scores of Charge 2</b>. The PeptideProphet fit to the Human Plasma dataset of Tandem scores of Charge 2 with fitted frequency curves from Figure 2b. The four confidence measures of the Posterior Error Probability (PEP), p-value, False Discovery Rate (FDR), and False Positive Rate (FPR) are shown at a score of 1. The Posterior Error Probability at 1 is 0.156 and the estimated False Discovery Rate is 0.083. The p-value and FPR are equivalent and equal to 0.004. In the formula for the estimated FDR, <it>red </it>is the estimate for <it>V </it>from Table 1 while <it>blue </it>combined with <it>red </it>is an estimate for <it>R </it>from Table 1.</p>
</text><graphic file="1471-2105-13-S16-S1-5"/></fig>
               <p>The False Positive Rate for a cutoff <it>t </it>can also be estimated using the area under the fitted frequency curve of the distribution of scores for incorrect identifications as seen in Figure <figr fid="F5">5</figr>. Mathematically this is equivalent to the p-value, or <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i34"><m:mi>F</m:mi>
<m:mover accent="true">
   <m:mrow>
      <m:mi>P</m:mi>
   </m:mrow>
   <m:mo class="MathClass-op"> ^</m:mo>
</m:mover>
<m:mi>R</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:mi>t</m:mi>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
<m:mo class="MathClass-rel">=</m:mo>
<m:mi>P</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:mi>S</m:mi>
      <m:mo class="MathClass-rel">&gt;</m:mo>
      <m:mi>t</m:mi>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>H</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula> since each incorrect score follows the same distribution. Note that the False Positive Rate ignores the distribution of scores for correct identifications.</p>
               <p>The estimation of the q-value at a specific point <it>&#961; </it>requires the estimation of the False Discovery Rate at every point <it>s<sub>i </sub></it>for <it>i </it>= 1, 2, ..., <it>N</it>. The q-value for a point <it>&#961; </it>is the minimum False Discovery Rate among all points <it>s<sub>i </sub></it>such that <it>s<sub>i </sub></it>&#8804; <it>&#961;</it>. The estimated False Discovery Rate can be found using the model-based estimates or by interpreting each posterior error probability as a local false discovery rate. The q-value is useful when an error rate that never decreases as the cutoff is lowered is desired. For example, in the case of Figure <figr fid="F5">5</figr>, suppose the experimenter is only interested in scores around 4. Using model-based estimates, the estimated False Discovery Rate with a cutoff at 4 is 0.01503874, but the estimate of the False Discovery Rate with a cutoff at 3.8 is 0.01489971, suggesting that the error rate is lower for a lower cutoff value. To avoid this issue, the q-value can be used, as it takes the minimum False Discovery Rate over all cutoffs at or below each cutoff value. The q-value at 4 is 0.01489939 (found using increments of 0.01, searching all FDR values from -4 to 4).</p>
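               <p>Given an estimated False Discovery Rate at each observed score, the q-values follow from a running minimum taken from the lowest score upwards. The following sketch implements the definition above; the function name is hypothetical, and tied scores are assumed to carry equal FDR estimates.</p>

```python
def q_values(scores, fdr_estimates):
    """q-value at each score: minimum estimated FDR over all cutoffs at or below it."""
    # Visit the scores from lowest to highest, keeping the smallest FDR seen so far.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    q = [0.0] * len(scores)
    running_min = float("inf")
    for i in order:
        running_min = min(running_min, fdr_estimates[i])
        q[i] = running_min
    return q
```

               <p>The result is returned in the original order of the scores, so it can be stored alongside the identifications without re-sorting them.</p>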
            </sec>
            <sec>
               <st>
                  <p>Estimating the confidence of an individual spectrum identification</p>
               </st>
               <p>We now discuss the estimation of the posterior error probability and the p-value. These measures are properties of a single spectrum and correspond to performing a single hypothesis test. In Figure <figr fid="F5">5</figr> the posterior error probability and p-value only apply to spectra at a single point.</p>
               <p>According to Bayes' theorem, the posterior probability of <it>T<sub>i </sub></it>= 0 (our hypothesis of interest) given its test statistic is <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i35"><m:mi>P</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>T</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:mn>0</m:mn>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>S</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>s</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
<m:mo class="MathClass-rel">=</m:mo>
<m:mfrac>
   <m:mrow>
      <m:mi>P</m:mi>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>T</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>0</m:mn>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
      <m:mi>p</m:mi>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>S</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">|</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mi>T</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>0</m:mn>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
   </m:mrow>
   <m:mrow>
      <m:mi>p</m:mi>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>S</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
   </m:mrow>
</m:mfrac>
</m:math></inline-formula>. Following the Empirical Bayesian step where parameters are estimated we have that <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i36"><m:mi>P</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>T</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:mn>0</m:mn>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>S</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
<m:mo class="MathClass-rel">=</m:mo>
<m:mfrac>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mover accent="true">
               <m:mrow>
                  <m:mi>&#960;</m:mi>
               </m:mrow>
               <m:mo class="MathClass-op">^</m:mo>
            </m:mover>
         </m:mrow>
         <m:mrow>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
      <m:msub>
         <m:mrow>
            <m:mi>f</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>T</m:mi>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
   </m:mrow>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mover accent="true">
               <m:mrow>
                  <m:mi>&#960;</m:mi>
               </m:mrow>
               <m:mo class="MathClass-op">^</m:mo>
            </m:mover>
         </m:mrow>
         <m:mrow>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
      <m:msub>
         <m:mrow>
            <m:mi>f</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>T</m:mi>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>0</m:mn>
         </m:mrow>
      </m:msub>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
      <m:mo class="MathClass-bin">+</m:mo>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:mn>1</m:mn>
            <m:mo class="MathClass-bin">-</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mover accent="true">
                     <m:mrow>
                        <m:mi>&#960;</m:mi>
                     </m:mrow>
                     <m:mo class="MathClass-op">^</m:mo>
                  </m:mover>
               </m:mrow>
               <m:mrow>
                  <m:mn>0</m:mn>
               </m:mrow>
            </m:msub>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>f</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>T</m:mi>
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mn>1</m:mn>
         </m:mrow>
      </m:msub>
      <m:mrow>
         <m:mo class="MathClass-open">(</m:mo>
         <m:mrow>
            <m:msub>
               <m:mrow>
                  <m:mi>s</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>i</m:mi>
               </m:mrow>
            </m:msub>
         </m:mrow>
         <m:mo class="MathClass-close">)</m:mo>
      </m:mrow>
   </m:mrow>
</m:mfrac>
</m:math></inline-formula>. Because the posterior error probability is equivalent to the local false discovery rate we also have that <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i37"><m:mi>l</m:mi>
<m:mi>o</m:mi>
<m:mi>c</m:mi>
<m:mi>f</m:mi>
<m:mi>d</m:mi>
<m:mi>r</m:mi>
<m:mo class="MathClass-rel">=</m:mo>
<m:mi>P</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>T</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">=</m:mo>
      <m:mn>0</m:mn>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>S</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula>.</p>
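               <p>As a hedged illustration of the formula above, the following Python sketch evaluates the posterior error probability for a two-component mixture. It assumes that the Gamma distribution is parameterized by a shape <it>alpha</it>, a scale <it>beta</it>, and a shift <it>gamma_shift</it>; the function names and all parameter values are hypothetical and are not estimates from any dataset in this paper.</p>

```python
import math
from statistics import NormalDist


def gamma_pdf(s, alpha, beta, gamma_shift):
    """Density of a shifted Gamma (assumed shape alpha, scale beta, shift gamma_shift)."""
    x = s - gamma_shift
    if not x > 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)


def posterior_error_probability(s, pi0, alpha, beta, gamma_shift, mu, sigma):
    """P(T = 0 | S = s) for a Gamma (incorrect) plus Normal (correct) mixture."""
    f0 = gamma_pdf(s, alpha, beta, gamma_shift)   # incorrect identifications
    f1 = NormalDist(mu, sigma).pdf(s)             # correct identifications
    denom = pi0 * f0 + (1 - pi0) * f1
    return pi0 * f0 / denom if denom > 0 else float("nan")


# Hypothetical parameter values chosen only to make the example run.
params = dict(pi0=0.8, alpha=3.0, beta=0.5, gamma_shift=-3.0, mu=3.0, sigma=1.5)
for score in (0.0, 2.0, 4.0):
    print(score, posterior_error_probability(score, **params))
```

               <p>With these illustrative values the posterior error probability is high for low scores, where the Gamma component dominates, and falls towards zero as the score moves into the region covered mainly by the Normal component.</p>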
               <p>The p-value is estimated as <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i38"><m:mi>P</m:mi>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:msub>
         <m:mrow>
            <m:mi>S</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">&gt;</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>s</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
      <m:mo class="MathClass-rel">|</m:mo>
      <m:msub>
         <m:mrow>
            <m:mi>H</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mn>0</m:mn>
            <m:mi>i</m:mi>
         </m:mrow>
      </m:msub>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula>, which is the area under the right tail of the Gamma density beyond <it>s<sub>i</sub></it>.</p>
               <p>The posterior error probability may be preferred over the p-value because it also yields an estimate of the probability of an identified spectrum being correct (1 - <it>PEP</it>). The advantage of the p-value is that it only requires the distribution of scores for incorrect identifications, as it ignores the distribution of scores for correct identifications. Notice that in Figure <figr fid="F5">5</figr> the p-value at a score of 1 is a low value of 0.004, but the Posterior Error Probability at 1 is much higher, at 0.156.</p>
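               <p>The tail area can be computed without external libraries. The sketch below is an illustration under stated assumptions rather than the TPP implementation: it assumes a shifted Gamma with shape <it>alpha</it>, scale <it>beta</it>, and shift <it>gamma_shift</it>, evaluates the regularized lower incomplete gamma function by its power series, and uses hypothetical parameter values.</p>

```python
import math


def regularized_lower_gamma(alpha, x):
    """P(alpha, x): regularized lower incomplete gamma function via its power series."""
    if not x > 0:
        return 0.0
    term = 1.0 / alpha
    total = term
    n = 0
    # Each new term is the previous one times x / (alpha + n); stop once terms are negligible.
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= x / (alpha + n)
        total += term
    return total * math.exp(-x + alpha * math.log(x) - math.lgamma(alpha))


def gamma_p_value(s, alpha, beta, gamma_shift):
    """Right-tail area of the shifted Gamma beyond the observed score s."""
    x = (s - gamma_shift) / beta
    return max(0.0, 1.0 - regularized_lower_gamma(alpha, x))


# Hypothetical parameters, for illustration only.
print(gamma_p_value(1.0, alpha=3.0, beta=0.5, gamma_shift=-3.0))
```

               <p>Because the tail is obtained as one minus the cumulative probability, this simple approach loses precision for extremely small p-values; a dedicated upper-tail routine would be preferable in that regime.</p>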
            </sec>
         </sec>
         <sec>
            <st>
               <p>PeptideProphet can use a decoy database to estimate the parameters of the distributions of scores for incorrect identifications</p>
            </st>
            <p>When there is significant overlap between the two density functions or a low number of identified spectra, it is difficult for the EM-algorithm to estimate <it>&#960;</it><sub>0 </sub>and the parameters of the Gamma and Normal distributions. In this case PeptideProphet employs the Target-Decoy approach to better estimate the Gamma distribution. We first describe the two forms of Target-Decoy: the concatenated strategy and the separate strategy <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. The objective of both strategies is to introduce decoys in order to estimate the error rate, since any match to a decoy sequence is known to be incorrect. Decoy sequences are commonly generated by taking the target database and reversing each target sequence. An alternative is to use randomized sequences, where amino acid sequences are generated using a pre-specified probability distribution <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
            <p>In the concatenated Target-Decoy strategy each spectrum is searched in a single database that is composed of both target and decoy sequences. This involves competition between the best correct peptide sequence, the best incorrect forward peptide sequence, and the best (incorrect) decoy peptide sequence. Hits where the best incorrect decoy peptide sequence is found to be the match are used to estimate the FDR.</p>
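            <p>A decoy-based estimate of this kind can be sketched as follows. The example uses the ratio of the number of decoy matches to the number of target matches above a cutoff; other conventions exist, and the function name and toy scores are assumptions made for illustration.</p>

```python
def decoy_fdr(psms, cutoff):
    """Estimate the FDR at a score cutoff from a concatenated target-decoy search.

    psms is a list of (score, is_decoy) pairs, one per spectrum's best match.
    """
    decoys = sum(1 for score, is_decoy in psms if score >= cutoff and is_decoy)
    targets = sum(1 for score, is_decoy in psms if score >= cutoff and not is_decoy)
    if targets == 0:
        return float("nan")   # no target matches pass the cutoff
    return decoys / targets


# Toy data: (discriminant score, matched a decoy sequence?)
psms = [(5.0, False), (4.2, False), (3.9, True), (3.5, False), (1.0, True), (0.5, False)]
print(decoy_fdr(psms, 3.0))
print(decoy_fdr(psms, 4.0))
```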
            <p>In the separate Target-Decoy strategy each spectrum is searched once in the forward database and searched again independently in the decoy database. The distribution of scores from the peptides identified via the decoy database is used to estimate the form of the distribution of incorrectly identified spectra. This approach ignores competition between forward and decoy sequences.</p>
            <p>The semisupervised version of PeptideProphet utilizes the concatenated Target-Decoy strategy by simply combining the target and decoy sequences into the <it>same </it>database. The decoy scores are forced to only contribute to the estimation of <it>&#945;</it>, <it>&#946;</it>, and <it>&#947; </it>of the Gamma distribution. PeptideProphet accomplishes this by assuming any decoy match has a posterior error probability of 1. In the EM-algorithm as described earlier, <it>p<sub>i </sub></it>for any decoy is assumed to be 1 at every iteration. The semisupervised version of PeptideProphet helps estimate the parameters of the Gamma distribution better and thus indirectly improves the estimation of <it>&#960;</it><sub>0</sub>, <it>&#956;</it>, and <it>&#963; </it>as well. As seen in Figure <figr fid="F6">6a</figr> for the case of the Human Plasma dataset the improved estimation of the distributions also increased the separation between the distributions. As seen in Figure <figr fid="F6">6b</figr> the use of decoys helped prevent the possible mistake of having high confidence in scores around the 0 to 1 range.</p>
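            <p>The clamping of decoys can be illustrated with a simplified E-step. This is a sketch under stated assumptions, not the PeptideProphet source: the component densities are passed in as functions, the returned value for each spectrum is taken to be the posterior probability of an incorrect identification, and two Normal densities stand in for the fitted Gamma and Normal components purely to keep the example short.</p>

```python
from statistics import NormalDist


def e_step(scores, is_decoy, pi0, f0, f1):
    """Posterior probability of an incorrect identification for each score.

    Decoy matches are known to be incorrect, so their posterior is fixed at 1
    at every iteration instead of being computed from the mixture.
    """
    post = []
    for s, decoy in zip(scores, is_decoy):
        if decoy:
            post.append(1.0)
            continue
        num = pi0 * f0(s)
        den = num + (1 - pi0) * f1(s)
        post.append(num / den)
    return post


# Stand-in densities (hypothetical): a Normal replaces the Gamma for brevity.
f_incorrect = NormalDist(0.0, 1.0).pdf
f_correct = NormalDist(3.0, 1.0).pdf
print(e_step([0.0, 0.0, 3.0], [True, False, False], 0.5, f_incorrect, f_correct))
```

            <p>In the M-step that would follow, these posteriors act as weights, so decoys contribute fully to the parameters of the distribution of incorrect scores and not at all to those of the distribution of correct scores.</p>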
            <fig id="F6"><title><p>Figure 6</p></title><caption><p>Semisupervised estimation of parameters</p></caption><text>
   <p><b>Semisupervised estimation of parameters</b>. Semisupervised estimation of parameters of the same distribution of scores as in Figures 2b and 2a. For Charge 3 the slight rightward shift of the Gamma distribution (distribution of scores for incorrect identifications) also encouraged a large rightward shift of the Normal Distribution (distribution of scores for correct identifications). The two vertical lines indicate the means of the Normal distributions. The addition of decoys for Charge 2 allowed the algorithm to learn that most of the identified spectra with scores from 0 to 1 are likely to be incorrect. Without decoys this may have been overlooked.</p>
</text><graphic file="1471-2105-13-S16-S1-6"/></fig>
         </sec>
         <sec>
            <st>
               <p>PeptideProphet can use a decoy database for semiparametric estimation of the probability distribution</p>
            </st>
            <p>The quality of fit of the Gamma and Normal distributions may depend on how the database is searched (constrained versus unconstrained search) or on the search algorithm that is used <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. As in much of statistical modeling, there is no guarantee that the scores of the identified spectra necessarily follow the Gamma and Normal distributions. Previously, decoys were used to estimate parameters of pre-specified distributions. Now we will use decoys for data-dependent estimation of the distributions themselves.</p>
            <p>One approach is to estimate the distributional forms using a kernel density (semiparametric) approach <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> as opposed to maximum likelihood estimation. Kernel density estimation first discretizes the horizontal axis into bins. For a specified bandwidth <it>h</it>, the distribution of scores for incorrect identifications is estimated using <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i39"><m:mrow>
   <m:mi>p</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>S</m:mi>
         <m:mo class="MathClass-rel">|</m:mo>
         <m:mi>h</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>n</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>0</m:mn>
            </m:mrow>
         </m:msub>
         <m:mi>h</m:mi>
      </m:mrow>
   </m:mfrac>
   <m:msubsup>
      <m:mrow>
         <m:mo mathsize="big"> &#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>i</m:mi>
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>n</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>0</m:mn>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msubsup>
   <m:mi>K</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mfrac>
            <m:mrow>
               <m:mi>S</m:mi>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>S</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mrow>
               <m:mi>h</m:mi>
            </m:mrow>
         </m:mfrac>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
</m:mrow>
</m:math></inline-formula> where <it>n</it><sub>0 </sub>is the number of decoys, <it>K </it>is the Normal density function, and <it>S<sub>i </sub></it>is the score of decoy <it>i</it>. The larger the <it>h</it>, the smoother the estimated function; the smaller the <it>h</it>, the rougher the estimated function. The bandwidth <it>h </it>can be chosen by any standard method, such as minimizing the mean integrated squared error or using cross-validation. The distribution of scores for correct identifications is estimated in the same fashion, but using only forward scores. Pseudocode of the semiparametric approach can be seen in Figure <figr fid="F8">8</figr>.</p>
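            <p>The kernel density formula above translates directly into code. The following sketch evaluates the estimate at a single score using the standard Normal density as the kernel; the function name, the decoy scores, and the bandwidth are illustrative assumptions, and the binning step is omitted for brevity.</p>

```python
import math


def kernel_density(s, decoy_scores, h):
    """Gaussian kernel density estimate at score s with bandwidth h."""
    n0 = len(decoy_scores)
    # Sum of standard Normal kernels K((s - S_i) / h) over all decoy scores.
    total = sum(math.exp(-0.5 * ((s - s_i) / h) ** 2) for s_i in decoy_scores)
    return total / (n0 * h * math.sqrt(2.0 * math.pi))


# Hypothetical decoy scores and bandwidth.
decoy_scores = [-1.8, -1.2, -0.9, -0.4, 0.1, 0.6]
for h in (0.2, 1.0):
    print(h, kernel_density(0.0, decoy_scores, h))
```

            <p>Evaluating the same data with a small and a large bandwidth shows the trade-off described in the text: the small bandwidth follows individual decoy scores closely, while the large bandwidth averages over them.</p>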
            <fig id="F7"><title><p>Figure 7</p></title><caption><p>The Controlled Mixture dataset fit with the basic PeptideProphet and the semiparametric version</p></caption><text>
   <p><b>The Controlled Mixture dataset fit with the basic PeptideProphet and the semiparametric version</b>. The Controlled Mixture dataset fit with the basic PeptideProphet and the semiparametric version of PeptideProphet utilizing the kernel density estimator. The smoothed estimator allowed for a more fine-tuned fit to the estimated (asymmetric) distribution of the correctly identified spectra.</p>
</text><graphic file="1471-2105-13-S16-S1-7"/></fig>
            <fig id="F8"><title><p>Figure 8</p></title><caption><p>Pseudocode of the semiparametric version of PeptideProphet</p></caption><text>
   <p><b>Pseudocode of the semiparametric version of PeptideProphet</b>.</p>
</text><graphic file="1471-2105-13-S16-S1-8"/></fig>
            <p>An example of this approach can be seen in Figure <figr fid="F7">7</figr>. The distribution of scores for correct identifications clearly deviates from the fitted Normal curve, as the mode of the correct hits is shifted to the right. The semiparametric approach produces a curve that more robustly fits the left-skewed distribution of scores for correct identifications.</p>
            <p>To avoid overfitting, this approach should only be used in cases of strong deviations between the fitted distributions and the observed scores, such as the parametric fit (dashed lines) in Figure <figr fid="F7">7</figr>. Overfitting typically occurs in experiments with a small number of spectra, such as in Figure <figr fid="F2">2a</figr>. Overfitting can be checked via bootstrapping, by examining whether bootstrapped samples show the same need for a semiparametric fit at the same score values; if they do not, the semiparametric fit is likely overfitting. This can be done via quantile-quantile plots or by checking mean squared errors. If users anticipate good separation, parametric PeptideProphet is often sufficient for practical purposes.</p>
         </sec>
         <sec>
            <st>
               <p>PeptideProphet can be extended to dynamically estimate the coefficients of the discriminant function from the data</p>
            </st>
            <p>Overlap in the distributions of scores of correct and incorrect identifications can be due to a suboptimal scoring function, which does not discriminate well between the properties of correct and incorrect identifications. This often occurs in cases of constrained searches where the database that is searched is much smaller than the unconstrained search space that was used to find the coefficients in the fixed discriminant function. For additional information on constrained versus unconstrained searches, see <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. A solution to this is to adapt the discriminant function to each experiment or search approach which can improve the separation between the distribution of scores for incorrect and correct identifications <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
            <p>Pseudocode of the adaptive version of PeptideProphet can be seen in Figure <figr fid="F10">10</figr>. The main step in the algorithm is to update the <it>&#946;</it>'s from Equations 1 or 2 using identified spectra with high posterior error probabilities and identified spectra with low posterior error probabilities. When retraining the <it>&#946;</it>'s, the algorithm randomly samples identified spectra with low posterior error probabilities <it>I </it>times and produces <it>I </it>different estimates. The average of these <it>I </it><it>&#946;</it>'s is the updated <it>&#946;</it>. This entire step is repeated, re-estimating posterior error probabilities and updating <it>&#946;</it>, until the <it>&#946;</it>'s change by less than a small <it>&#949;</it>.</p>
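            <p>One retraining step of this kind can be sketched as follows. The sketch is an illustration only: it uses Fisher's linear discriminant on two features as a stand-in for the discriminant function, omits the intercept and the re-estimation of posterior error probabilities by the mixture model, and relies on hypothetical choices for the PEP cut-offs (0.9 and 0.1), the number of repetitions <it>I </it>(10), and the sample size (50).</p>

```python
import random


def fisher_direction(incorrect, correct):
    """Fisher discriminant direction for two classes of 2-feature rows."""
    def mean(rows):
        n = len(rows)
        return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

    def scatter(rows, m):
        a = sum((r[0] - m[0]) ** 2 for r in rows)
        b = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
        d = sum((r[1] - m[1]) ** 2 for r in rows)
        return a, b, d

    m0, m1 = mean(incorrect), mean(correct)
    a0, b0, d0 = scatter(incorrect, m0)
    a1, b1, d1 = scatter(correct, m1)
    a, b, d = a0 + a1, b0 + b1, d0 + d1          # pooled within-class scatter
    det = a * d - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Solve the 2x2 system (pooled scatter) * w = (difference of class means).
    return [(d * dx - b * dy) / det, (a * dy - b * dx) / det]


def update_beta(features, pep, n_rounds=10, sample_size=50,
                high_cut=0.9, low_cut=0.1, rng=None):
    """Average n_rounds discriminant estimates, each from a random low-PEP sample."""
    rng = rng or random.Random(0)
    incorrect = [x for x, p in zip(features, pep) if p >= high_cut]
    confident = [x for x, p in zip(features, pep) if low_cut >= p]
    estimates = []
    for _ in range(n_rounds):
        sample = rng.sample(confident, min(sample_size, len(confident)))
        estimates.append(fisher_direction(incorrect, sample))
    return [sum(e[k] for e in estimates) / n_rounds for k in range(2)]


def converged(old, new, eps=1e-4):
    """True when no coefficient changed by eps or more."""
    return eps > max(abs(o - n) for o, n in zip(old, new))


# Toy data: the first feature separates the classes, the second is pure noise.
rng = random.Random(1)
features = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
features += [(rng.gauss(3, 1), rng.gauss(0, 1)) for _ in range(200)]
pep = [1.0] * 200 + [0.0] * 200
beta = update_beta(features, pep, rng=rng)
print(beta)
```

            <p>In a full implementation this update would alternate with re-fitting the mixture model on the rescored spectra, stopping once <it>converged </it>reports that successive coefficient vectors agree to within the chosen tolerance.</p>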
            <fig id="F9"><title><p>Figure 9</p></title><caption><p>Semiparametric fits with dynamically estimated coefficients</p></caption><text>
   <p><b>Semiparametric fits with dynamically estimated coefficients</b>. Semiparametric fits of the distributions of scores for correct and incorrect identifications on the Controlled Mixture Dataset from a constrained search (tryptic peptides, narrow mass tolerance) using fixed discriminant coefficients (left) versus adaptive discriminant coefficients (center). The right tail of the distribution of scores for incorrect identifications can be seen penetrating the distribution of scores for correct identifications more deeply in the fixed case, implying greater discriminative ability when using the adaptive discriminant function. The improved performance of adaptive coefficients can be seen in the plot of the estimated FDR versus the estimated number of significant correctly identified spectra (right). Recall that in this dataset, target scores are assumed correct. The FDR here was estimated by the ratio of the number of decoys to target scores.</p>
</text><graphic file="1471-2105-13-S16-S1-9"/></fig>
            <fig id="F10"><title><p>Figure 10</p></title><caption><p>Pseudocode of the adaptive version of PeptideProphet</p></caption><text>
   <p><b>Pseudocode of the adaptive version of PeptideProphet</b>.</p>
</text><graphic file="1471-2105-13-S16-S1-10"/></fig>
            <p>The improvement of the adaptive discriminant function over the fixed discriminant function for the Controlled Mixture dataset in a constrained search space is displayed in Figure <figr fid="F9">9</figr>. Only tryptic peptides with a narrow mass tolerance were searched.</p>
            <p>This approach is also useful for incorporating lower ranked peptide matches (i.e. for a given spectrum, instead of only considering the <it>best </it>peptide sequence match, use the new discriminant function to also rescore peptide sequence matches that ranked close to the <it>best </it>peptide sequence match). Every time a new discriminant function is estimated (when the <it>I </it><inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i46"><m:mrow>
   <m:msup>
      <m:mrow>
         <m:mover accent="false">
            <m:mrow>
               <m:mi>&#946;</m:mi>
            </m:mrow>
            <m:mo class="MathClass-op">^</m:mo>
         </m:mover>
      </m:mrow>
      <m:mrow>
         <m:mi>&#8242;</m:mi>
      </m:mrow>
   </m:msup>
   <m:mstyle class="text">
      <m:mtext class="textsf" mathvariant="sans-serif">s</m:mtext>
   </m:mstyle>
</m:mrow>
</m:math></inline-formula> are averaged), a new summarized score is calculated for the top 5 peptide matches (this number can be changed) for every spectrum. The highest-scoring peptide-spectrum match is then used in the training of the next discriminant function.</p>
         </sec>
         <sec>
            <st>
               <p>Implementation of the PeptideProphet in the Trans-Proteomic Pipeline</p>
            </st>
            <p>The Trans-Proteomic Pipeline (TPP) is an open source program developed at the Institute for Systems Biology, designed for complete proteomic analysis from spectrum identification to protein identification and quantification. It can be downloaded from <url>http://sourceforge.net/projects/sashimi/</url><abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. In this section we assume that search results have already been converted to pepXML files, which is the standard input for PeptideProphet. A discussion of this can be found at (<url>http://tools.proteomecenter.org/wiki/index.php?title=TPP_Tutorial</url>).</p>
            <p>We present an example using the Human Plasma dataset where the spectra are searched through Tandem with the k-score plugin with TPP version 4.4. PeptideProphet automatically models all precursor ion charges and outputs the probability of correct identification. The mixture model uses the Normal distribution for the distribution of correct scores and a Gumbel distribution for the distribution of incorrect scores.</p>
            <p>In Figure <figr fid="F11">11</figr>, a subset of the 17543 identified spectra is listed. The first column lists the probability of correct identification (1 - <it>PEP</it>), so numbers close to 1 here are desirable. The remaining columns list, in order, the spectrum label, Tandem expect score, the fraction of ions matched, the peptide sequence match, the protein match, and the calculated neutral peptide mass. In this example any protein label with a "rev" is a decoy. Each hyperlink will lead to additional information. For example, clicking on a peptide sequence will lead to a BLAST search, and clicking on the fraction of ions matched will display the observed spectrum.</p>
            <fig id="F11"><title><p>Figure 11</p></title><caption><p>pepXML viewer from TPP</p></caption><text>
   <p><b>pepXML viewer from TPP</b>. The output of PeptideProphet is stored in pepXML format. The pepXML viewer visualizes the content of pepXML and posterior probabilities associated with each identified spectrum.</p>
</text><graphic file="1471-2105-13-S16-S1-11"/></fig>
            <p>Clicking on 0.7664, the probability shown for the ninth entry "2b_plasma_0mM_C1.00024.00024.1" on the identified spectra list, displays information about the model fit by PeptideProphet in Figure <figr fid="F12">12</figr> and the estimated parameter values for charge 2 in Figure <figr fid="F13">13</figr>.</p>
            <fig id="F12"><title><p>Figure 12</p></title><caption><p>Scoring results for identified spectra from a PeptideProphet fit in TPP</p></caption><text>
   <p><b>Scoring results for identified spectra from a PeptideProphet fit in TPP</b>. PeptideProphet output of sensitivity and error analysis and figures of estimated mixture models. The bottom portion shows the fitted curves for different charges. The light blue curves represent the distribution of scores for incorrect identifications, purple for correct identifications, and black the sum of the two distributions. The red vertical line indicates the score of the identified spectrum that we clicked on, with its additional information at the bottom of the figure.</p>
</text><graphic file="1471-2105-13-S16-S1-12"/></fig>
            <fig id="F13"><title><p>Figure 13</p></title><caption><p>Parameter estimates for a PeptideProphet fit in TPP</p></caption><text>
   <p><b>Parameter estimates for a PeptideProphet fit in TPP</b>. Estimated parameter values of the PeptideProphet mixture model for charge 2. The parameters of accurate mass difference (&#916;<it>M</it>) are not fully displayed.</p>
</text><graphic file="1471-2105-13-S16-S1-13"/></fig>
            <p>We will now discuss how to use the information in Figures <figr fid="F12">12</figr> and <figr fid="F13">13</figr> to estimate the confidence measures discussed previously:</p>
            <p indent="1">1. False Discovery Rate: estimates of the False Discovery Rate can be obtained three ways. In the upper-right hand corner of Figure <figr fid="F12">12</figr>, estimated False Discovery Rates under the "Error" column are given for 1 - <it>PEP </it>values under the "Min Prob" column. In other words, "Min Prob" represents the minimum posterior probability of being correct in order to conclude that an identified spectrum is correct. For example, a "Min Prob" of 0.95 implies that only identified spectra with PEPs less than 0.05 are considered correct, that is, (1 - <it>PEP</it>) must be greater than 0.95 to be considered correct.</p>
            <p indent="1">A second approach is to use the estimated model parameters in Figure <figr fid="F13">13</figr> to estimate the False Discovery Rate for identified spectra of charge 2. The estimate for (1 - &#960;<sub>0</sub>) is 0.04, which yields an estimate of &#960;<sub>0 </sub>of 0.96. The Normal's (Gaussian) estimated mean <it>&#956; </it>is 2.6 with an estimated standard deviation <it>&#963; </it>of 1.90. The Gumbel's estimated <it>&#956;<sub>G </sub></it>parameter is -1.60 with an estimated <it>&#946; </it>parameter of 0.76. Equivalently, the expected value (mean) of the Gumbel is -1.16 with a standard deviation of 0.98. If the experimenter is not interested in <it>NTT</it>, <it>NMC</it>, and &#916;<it>M</it>, for a cutoff score <it>t</it>, the FDR can then be estimated by <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i40"><m:mrow>
   <m:mi>F</m:mi>
   <m:mover accent="true">
      <m:mrow>
         <m:mi>D</m:mi>
         <m:mi>R</m:mi>
      </m:mrow>
      <m:mo class="MathClass-op"> ^</m:mo>
   </m:mover>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>t</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mover accent="true">
                  <m:mrow>
                     <m:mi>p</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-op"> ^</m:mo>
               </m:mover>
            </m:mrow>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">0</m:mtext>
               </m:mstyle>
            </m:mrow>
         </m:msub>
         <m:mi>P</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>S</m:mi>
               <m:mo class="MathClass-rel">&gt;</m:mo>
               <m:mi>t</m:mi>
               <m:mo class="MathClass-rel">|</m:mo>
               <m:mi>T</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>0</m:mn>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mover accent="true">
                  <m:mrow>
                     <m:mi>p</m:mi>
                  </m:mrow>
                  <m:mo class="MathClass-op"> ^</m:mo>
               </m:mover>
            </m:mrow>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">0</m:mtext>
               </m:mstyle>
            </m:mrow>
         </m:msub>
         <m:mi>P</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>S</m:mi>
               <m:mo class="MathClass-rel">&gt;</m:mo>
               <m:mi>t</m:mi>
               <m:mo class="MathClass-rel">|</m:mo>
               <m:mi>T</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>0</m:mn>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mo class="MathClass-bin">+</m:mo>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mn>1</m:mn>
               <m:mo class="MathClass-bin">-</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mover accent="true">
                        <m:mrow>
                           <m:mi>p</m:mi>
                        </m:mrow>
                        <m:mo class="MathClass-op"> ^</m:mo>
                     </m:mover>
                  </m:mrow>
                  <m:mrow>
                     <m:mstyle class="text">
                        <m:mtext class="textsf" mathvariant="sans-serif">0</m:mtext>
                     </m:mstyle>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mi>P</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>S</m:mi>
               <m:mo class="MathClass-rel">&gt;</m:mo>
               <m:mi>t</m:mi>
               <m:mo class="MathClass-rel">|</m:mo>
               <m:mi>T</m:mi>
               <m:mo class="MathClass-rel">=</m:mo>
               <m:mn>1</m:mn>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math></inline-formula> where <it>P</it>(<it>S </it>&gt;<it>t</it>|<it>T </it>= 0) is found using the Gumbel distribution and <it>P</it>(<it>S </it>&gt;<it>t</it>|<it>T </it>= 1) is found using the Normal distribution. Suppose the experimenter wants to restrict the FDR calculation to identified spectra with 0 missed cleavages. According to the output for the distribution of correct scores, a randomly selected correctly identified spectrum has a 0.926 probability of having 0 missed cleavages and a 0.074 probability of having 1 to 2 missed cleavages. For the distribution of incorrect scores, the corresponding probabilities are 0.404 and 0.596 for 0 and 1 to 2 missed cleavages, respectively. The estimated FDR would then be</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i41"><m:mtable class="align-star" columnalign="left">
   <m:mtr>
      <m:mtd class="align-odd" columnalign="right">
         <m:mi>F</m:mi>
         <m:mover accent="true">
            <m:mrow>
               <m:mi>D</m:mi>
               <m:mi>R</m:mi>
            </m:mrow>
            <m:mo class="MathClass-op"> ^</m:mo>
         </m:mover>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mtd>
      <m:mtd class="align-even">
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mover accent="true">
                        <m:mrow>
                           <m:mi>&#960;</m:mi>
                        </m:mrow>
                        <m:mo class="MathClass-op">^</m:mo>
                     </m:mover>
                  </m:mrow>
                  <m:mrow>
                     <m:mstyle class="text">
                        <m:mtext class="textsf" mathvariant="sans-serif">0</m:mtext>
                     </m:mstyle>
                  </m:mrow>
               </m:msub>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>S</m:mi>
                     <m:mo class="MathClass-rel">&gt;</m:mo>
                     <m:mi>t</m:mi>
                     <m:mo class="MathClass-rel">|</m:mo>
                     <m:mi>T</m:mi>
                     <m:mo class="MathClass-rel">=</m:mo>
                     <m:mn>0</m:mn>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>f</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>T</m:mi>
                     <m:mo class="MathClass-rel">=</m:mo>
                     <m:mn>0</m:mn>
                     <m:mo class="MathClass-punc">,</m:mo>
                     <m:mi>N</m:mi>
                     <m:mspace class="thinspace" width="0.3em"/>
                     <m:mi>M</m:mi>
                     <m:mspace class="thinspace" width="0.3em"/>
                     <m:mi>C</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mn>0</m:mn>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mover accent="true">
                        <m:mrow>
                           <m:mi>&#960;</m:mi>
                        </m:mrow>
                        <m:mo class="MathClass-op">^</m:mo>
                     </m:mover>
                  </m:mrow>
                  <m:mrow>
                     <m:mstyle class="text">
                        <m:mtext class="textsf" mathvariant="sans-serif">0</m:mtext>
                     </m:mstyle>
                  </m:mrow>
               </m:msub>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>S</m:mi>
                     <m:mo class="MathClass-rel">&gt;</m:mo>
                     <m:mi>t</m:mi>
                     <m:mo class="MathClass-rel">|</m:mo>
                     <m:mi>T</m:mi>
                     <m:mo class="MathClass-rel">=</m:mo>
                     <m:mn>0</m:mn>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>f</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>T</m:mi>
                     <m:mo class="MathClass-rel">=</m:mo>
                     <m:mn>0</m:mn>
                     <m:mo class="MathClass-punc">,</m:mo>
                     <m:mi>N</m:mi>
                     <m:mspace class="thinspace" width="0.3em"/>
                     <m:mi>M</m:mi>
                     <m:mspace class="thinspace" width="0.3em"/>
                     <m:mi>C</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mn>0</m:mn>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mo class="MathClass-bin">+</m:mo>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mn>1</m:mn>
                     <m:mo class="MathClass-bin">-</m:mo>
                     <m:msub>
                        <m:mrow>
                           <m:mover accent="true">
                              <m:mrow>
                                 <m:mi>&#960;</m:mi>
                              </m:mrow>
                              <m:mo class="MathClass-op">^</m:mo>
                           </m:mover>
                        </m:mrow>
                        <m:mrow>
                           <m:mstyle class="text">
                              <m:mtext class="textsf" mathvariant="sans-serif">0</m:mtext>
                           </m:mstyle>
                        </m:mrow>
                     </m:msub>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>S</m:mi>
                     <m:mo class="MathClass-rel">&gt;</m:mo>
                     <m:mi>t</m:mi>
                     <m:mo class="MathClass-rel">|</m:mo>
                     <m:mi>T</m:mi>
                     <m:mo class="MathClass-rel">=</m:mo>
                     <m:mn>1</m:mn>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>f</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>T</m:mi>
                     <m:mo class="MathClass-rel">=</m:mo>
                     <m:mn>1</m:mn>
                     <m:mo class="MathClass-punc">,</m:mo>
                     <m:mi>N</m:mi>
                     <m:mspace class="thinspace" width="0.3em"/>
                     <m:mi>M</m:mi>
                     <m:mspace class="thinspace" width="0.3em"/>
                     <m:mi>C</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mn>0</m:mn>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
         </m:mfrac>
         <m:mspace width="2em"/>
      </m:mtd>
      <m:mtd class="align-label" columnalign="right"/>
      <m:mtd class="align-label">
         <m:mspace width="2em"/>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd class="align-odd" columnalign="right"/>
      <m:mtd class="align-even">
         <m:mo class="MathClass-rel">=</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mover accent="true">
                        <m:mrow>
                           <m:mi>&#960;</m:mi>
                        </m:mrow>
                        <m:mo class="MathClass-op">^</m:mo>
                     </m:mover>
                  </m:mrow>
                  <m:mrow>
                     <m:mstyle class="text">
                        <m:mtext class="textsf" mathvariant="sans-serif">0</m:mtext>
                     </m:mstyle>
                  </m:mrow>
               </m:msub>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>S</m:mi>
                     <m:mo class="MathClass-rel">&gt;</m:mo>
                     <m:mi>t</m:mi>
                     <m:mo class="MathClass-rel">|</m:mo>
                     <m:mi>T</m:mi>
                     <m:mo class="MathClass-rel">=</m:mo>
                     <m:mn>0</m:mn>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mn>0</m:mn>
                     <m:mi>.</m:mi>
                     <m:mn>404</m:mn>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mover accent="true">
                        <m:mrow>
                           <m:mi>&#960;</m:mi>
                        </m:mrow>
                        <m:mo class="MathClass-op">^</m:mo>
                     </m:mover>
                  </m:mrow>
                  <m:mrow>
                     <m:mstyle class="text">
                        <m:mtext class="textsf" mathvariant="sans-serif">0</m:mtext>
                     </m:mstyle>
                  </m:mrow>
               </m:msub>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>S</m:mi>
                     <m:mo class="MathClass-rel">&gt;</m:mo>
                     <m:mi>t</m:mi>
                     <m:mo class="MathClass-rel">|</m:mo>
                     <m:mi>T</m:mi>
                     <m:mo class="MathClass-rel">=</m:mo>
                     <m:mn>0</m:mn>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mn>0</m:mn>
                     <m:mi>.</m:mi>
                     <m:mn>404</m:mn>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mo class="MathClass-bin">+</m:mo>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mn>1</m:mn>
                     <m:mo class="MathClass-bin">-</m:mo>
                     <m:msub>
                        <m:mrow>
                           <m:mover accent="true">
                              <m:mrow>
                                 <m:mi>&#960;</m:mi>
                              </m:mrow>
                              <m:mo class="MathClass-op">^</m:mo>
                           </m:mover>
                        </m:mrow>
                        <m:mrow>
                           <m:mstyle class="text">
                              <m:mtext class="textsf" mathvariant="sans-serif">0</m:mtext>
                           </m:mstyle>
                        </m:mrow>
                     </m:msub>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mi>P</m:mi>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mi>S</m:mi>
                     <m:mo class="MathClass-rel">&gt;</m:mo>
                     <m:mi>t</m:mi>
                     <m:mo class="MathClass-rel">|</m:mo>
                     <m:mi>T</m:mi>
                     <m:mo class="MathClass-rel">=</m:mo>
                     <m:mn>1</m:mn>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mn>0</m:mn>
                     <m:mi>.</m:mi>
                     <m:mn>926</m:mn>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
         </m:mfrac>
         <m:mspace width="2em"/>
      </m:mtd>
      <m:mtd class="align-label" columnalign="right"/>
      <m:mtd class="align-label">
         <m:mspace width="2em"/>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd class="align-odd" columnalign="right"/>
      <m:mtd class="align-even">
         <m:mspace width="2em"/>
      </m:mtd>
      <m:mtd class="align-label" columnalign="right"/>
   </m:mtr>
</m:mtable>
</m:math>
               </display-formula>
            </p>
            <p indent="1">The calculation takes into account that an estimated 92.6% of correctly identified spectra have 0 missed cleavages, compared with only 40.4% of incorrectly identified spectra.</p>
            <p indent="1">A third approach to estimating the False Discovery Rate is to download all posterior probabilities, convert them to posterior error probabilities (local false discovery rates) by taking the complement, define a cutoff point <it>t</it>, and then calculate <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i42"><m:mrow>
   <m:mi>F</m:mi>
   <m:mover accent="true">
      <m:mrow>
         <m:mi>D</m:mi>
      </m:mrow>
      <m:mo class="MathClass-op"> ^</m:mo>
   </m:mover>
   <m:mi>R</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>t</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mo mathsize="big">&#8721;</m:mo>
            </m:mrow>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>s</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-rel">&#8805;</m:mo>
               <m:mi>t</m:mi>
            </m:mrow>
         </m:msub>
         <m:mi>P</m:mi>
         <m:mi>E</m:mi>
         <m:msub>
            <m:mrow>
               <m:mi>P</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
      <m:mrow>
         <m:mrow>
            <m:mo class="MathClass-open">{</m:mo>
            <m:mrow>
               <m:mi>#</m:mi>
               <m:msub>
                  <m:mrow>
                     <m:mi>s</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-rel">:</m:mo>
               <m:msub>
                  <m:mrow>
                     <m:mi>s</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
               <m:mo class="MathClass-rel">&#8805;</m:mo>
               <m:mi>t</m:mi>
            </m:mrow>
            <m:mo class="MathClass-close">}</m:mo>
         </m:mrow>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math></inline-formula>.</p>
            <p indent="1">2. False Positive Rate or p-value: using the estimated parameters of the Gumbel distribution, the false positive rate at a cutoff <it>t </it>can be found as the upper tail area of the distribution above <it>t</it>.</p>
            <p indent="1">3. q-value: the q-value at a specific point <it>&#948; </it>can be calculated by estimating the False Discovery Rate at the score of every identified spectrum and then finding the minimum False Discovery Rate among all scores <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i43"><m:msub>
   <m:mrow>
      <m:mi>s</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>i</m:mi>
   </m:mrow>
</m:msub>
<m:mo class="MathClass-rel">&#8804;</m:mo>
<m:mi>&#948;</m:mi>
</m:math></inline-formula>.</p>
            <p indent="1">4. Posterior Error Probability and Local False Discovery Rate: these are most easily obtained by taking the complement of the values in the first column of Figure <figr fid="F11">11</figr> or the complement of "prob" at the bottom center of Figure <figr fid="F12">12</figr>. Note that these probabilities automatically incorporate <it>NTT</it>, <it>NMC</it>, and &#916;<it>M</it>. If the experimenter is interested in the posterior error probability of a score independent of <it>NTT</it>, <it>NMC</it>, and &#916;<it>M</it>, it can still be calculated using the estimated model parameters.</p>
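            <p>As a minimal sketch of the model-based FDR calculation in item 1, the following Python fragment evaluates the estimate with and without the restriction to 0 missed cleavages. The mixing proportion, the Gumbel parameters, the Normal parameters, and the cutoff are hypothetical values chosen for illustration only; they are not taken from the TPP output. Following item 2, the tail area for incorrect identifications is taken from the Gumbel fit, and a Normal fit is assumed for correct identifications. Only the weights 0.404 and 0.926 come from the example above.</p>
            <p>
```python
import math


def gumbel_upper_tail(t, mu, beta):
    """Upper tail area P(S > t) of a Gumbel(mu, beta) distribution."""
    return 1.0 - math.exp(-math.exp(-(t - mu) / beta))


def normal_upper_tail(t, mean, sd):
    """Upper tail area P(S > t) of a Normal(mean, sd) distribution."""
    return 0.5 * math.erfc((t - mean) / (sd * math.sqrt(2.0)))


def model_fdr(pi0, tail_incorrect, tail_correct, w_incorrect=1.0, w_correct=1.0):
    """Estimated FDR at a cutoff, given the two tail areas.

    w_incorrect and w_correct are the probabilities of the selected
    NMC category under the incorrect and correct distributions.
    With both weights equal to 1 no restriction is applied.
    """
    false_part = pi0 * tail_incorrect * w_incorrect
    true_part = (1.0 - pi0) * tail_correct * w_correct
    return false_part / (false_part + true_part)


# Hypothetical fitted values, for illustration only.
pi0 = 0.7   # assumed estimate of the proportion of incorrect identifications
t = 2.0     # assumed score cutoff
tail0 = gumbel_upper_tail(t, mu=-1.0, beta=0.8)   # P(S > t | T = 0)
tail1 = normal_upper_tail(t, mean=3.0, sd=1.5)    # P(S > t | T = 1)

fdr_all = model_fdr(pi0, tail0, tail1)
# Weights 0.404 and 0.926 are the NMC = 0 probabilities quoted in the text.
fdr_nmc0 = model_fdr(pi0, tail0, tail1, w_incorrect=0.404, w_correct=0.926)
print(fdr_all, fdr_nmc0)
```
            </p>
            <p>With these illustrative values the restricted estimate is lower than the unrestricted one, because restricting to 0 missed cleavages removes a larger share of the incorrect identifications than of the correct ones.</p>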
            <p>Inference for the semisupervised and semiparametric versions of PeptideProphet is identical. Inference would also be identical for the adaptive version of PeptideProphet, which is not implemented in TPP at this time but is available from the authors upon request.</p>
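            <p>The third FDR approach in item 1 and the q-value in item 3 can be sketched in the same way. The scores and posterior probabilities below are made up for illustration; in practice they would be the values exported from PeptideProphet. The function names are hypothetical and are not part of TPP.</p>
            <p>
```python
def pep_based_fdr(scores, posteriors, t):
    """Average posterior error probability among spectra with score >= t."""
    peps = [1.0 - p for s, p in zip(scores, posteriors) if s >= t]
    if not peps:
        return 0.0  # nothing passes the cutoff, so there are no discoveries
    return sum(peps) / len(peps)


def q_values(scores, posteriors):
    """q-value at each observed score.

    For a score delta, this is the minimum estimated FDR over all
    cutoffs s_i that do not exceed delta.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    q = [0.0] * len(scores)
    running_min = float("inf")
    for i in order:  # visit spectra from the lowest score to the highest
        fdr_here = pep_based_fdr(scores, posteriors, scores[i])
        running_min = min(running_min, fdr_here)
        q[i] = running_min
    return q


# Hypothetical scores and PeptideProphet posterior probabilities.
scores = [1.0, 2.0, 3.0, 4.0]
posteriors = [0.2, 0.6, 0.9, 1.0]

print(pep_based_fdr(scores, posteriors, 3.0))
print(q_values(scores, posteriors))
```
            </p>
            <p>The q-value loop recomputes the FDR at every observed score, which is quadratic in the number of spectra; this keeps the sketch short but would be replaced by a cumulative sum for large data sets.</p>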
            <p>Following the execution of PeptideProphet, the next step in the analysis is often the identification of proteins present in the sample. In this analysis, the experimental unit changes from a spectrum to a peptide. TPP can be used to run ProteinProphet, a computational algorithm that uses PeptideProphet's estimated probabilities to determine, in two steps, the probability that a protein is present <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. In the first step, the posterior probability of a peptide being correctly identified by PeptideProphet is decreased for peptides that are the only peptide linked to a protein and increased for peptides that are linked to proteins explained by many peptides. In the second step, the probability of a protein being in the sample is calculated as the probability that at least one of its associated peptides was correctly identified. This is <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i44"><m:mn>1</m:mn>
<m:mo class="MathClass-bin">-</m:mo>
<m:msub>
   <m:mrow>
      <m:mo class="MathClass-op">&#8719;</m:mo>
   </m:mrow>
   <m:mrow>
      <m:mi>i</m:mi>
   </m:mrow>
</m:msub>
<m:mrow>
   <m:mo class="MathClass-open">(</m:mo>
   <m:mrow>
      <m:mn>1</m:mn>
      <m:mo class="MathClass-bin">-</m:mo>
      <m:msubsup>
         <m:mrow>
            <m:mi>p</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>i</m:mi>
         </m:mrow>
         <m:mrow>
            <m:mi>&#8242;</m:mi>
         </m:mrow>
      </m:msubsup>
   </m:mrow>
   <m:mo class="MathClass-close">)</m:mo>
</m:mrow>
</m:math></inline-formula> where <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-13-S16-S1-i45"><m:msubsup>
   <m:mrow>
      <m:mi>p</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>i</m:mi>
   </m:mrow>
   <m:mrow>
      <m:mi>&#8242;</m:mi>
   </m:mrow>
</m:msubsup>
</m:math></inline-formula> is the adjusted probability of peptide <it>i </it>being in the sample, and <it>i </it>ranges from 1 to the number of peptides linked to the protein in question.</p>
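            <p>The second step can be illustrated with a short Python sketch. The adjusted peptide probabilities below are invented for illustration, and the function name is hypothetical; the adjustment performed in the first step is not reproduced here, and this is not ProteinProphet's actual implementation.</p>
            <p>
```python
def protein_probability(adjusted_peptide_probs):
    """Probability that at least one linked peptide is present.

    Computes 1 - prod_i (1 - p_i'), where p_i' are the adjusted
    peptide probabilities for one protein.
    """
    prob_none_present = 1.0
    for p in adjusted_peptide_probs:
        prob_none_present *= (1.0 - p)
    return 1.0 - prob_none_present


# Hypothetical adjusted probabilities for three peptides of one protein.
example = protein_probability([0.9, 0.5, 0.2])
print(example)
```
            </p>
            <p>A protein supported by a single peptide simply inherits that peptide's adjusted probability, while each additional peptide can only raise the protein probability.</p>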
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>PeptideProphet is available for use on the Trans-Proteomic Pipeline with many other database search tools (X!Tandem, MASCOT, OMSSA, Phenyx, ProbID, InsPecT, MyriMatch). The statistical approach of PeptideProphet is generalizable to any database search algorithm that returns a quantitative score for each identified spectrum.</p>
         <p>Although we used the Gamma and Normal distributions to model the components of the PeptideProphet model, there is no limitation on the choice of parametric distribution for describing the scores of incorrect and correct identifications in PeptideProphet. The Gumbel distribution, with parameters <it>&#956; </it>and <it>&#946;</it>, is another distribution commonly used for the scores of incorrect identifications. A generalization of the Gumbel distribution is the generalized Extreme Value Distribution. Additional information, such as the NTT, may be incorporated into the summarized score by using a different machine learning approach instead of a discriminant function. Quantities like the NTT were left out of the summarized discriminant score due to their discrete nature. For example, a logistic regression function would allow discrete and continuous covariates to be transformed into a single summarized score while separating identified spectra with <it>T </it>= 0 from identified spectra with <it>T </it>= 1.</p>
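         <p>To make the logistic regression idea concrete, the following Python sketch fits a logistic regression by batch gradient descent to a continuous score and the discrete NTT, and returns a single summarized score on the log-odds scale. The training rows and labels are a hypothetical toy data set with known <it>T</it>; in practice <it>T </it>is unobserved and the labels would have to come from some other source. The learning rate and iteration count are arbitrary choices for this sketch.</p>
         <p>
```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(weights, x):
    """Single summarized score (log-odds scale) for one identification."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def fit_logistic(rows, labels, learning_rate=0.1, n_iter=5000):
    """Batch gradient descent; returns [bias, weight_1, weight_2, ...]."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = sigmoid(linear_score(weights, x)) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data: (continuous search score, NTT) and label T.
rows = [(0.5, 0), (1.0, 1), (1.2, 0), (1.5, 1),
        (2.8, 2), (3.1, 1), (3.5, 2), (4.0, 2)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights = fit_logistic(rows, labels)
high = linear_score(weights, (4.0, 2))   # strong match, fully tryptic
low = linear_score(weights, (0.5, 0))    # weak match, non-tryptic
print(high, low)
```
         </p>
         <p>The resulting score could then play the role of the discriminant score in the mixture model; whether that improves on the current treatment of NTT is not something this toy example can show.</p>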
         <p>The Target-Decoy approach used in this manuscript pioneered the use of decoys for the estimation of the False Discovery Rate, and its results are often compared to those of other techniques <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. In our experience, PeptideProphet and Target-Decoy methods produce similar estimates of the False Discovery Rate, especially when the semisupervised version of PeptideProphet is used, as its search approach is similar to the concatenated version of Target-Decoy. In fact, PeptideProphet can be considered an extension of the concatenated version of Target-Decoy because of its additional modeling objectives. PeptideProphet makes distributional assumptions and can be used to estimate the confidence of individual spectrum identifications or of sets of spectrum identifications (local and global FDR estimates), whereas Target-Decoy is limited to sets (global FDR estimate only). Also, if the distributions of scores for correct and incorrect identifications overlap heavily, Target-Decoy will outperform basic PeptideProphet, but semisupervised PeptideProphet and Target-Decoy should perform similarly.</p>
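         <p>For comparison, a Target-Decoy estimate of the global FDR can be sketched as follows. The estimator used here, the number of decoy matches at or above the cutoff divided by the number of target matches at or above the cutoff in a concatenated search, is one commonly used form and is an assumption of this sketch; the text above does not specify the estimator, and other variants exist. The score lists are invented for illustration.</p>
         <p>
```python
def target_decoy_fdr(target_scores, decoy_scores, t):
    """Global FDR estimate at cutoff t from a concatenated search.

    Assumed estimator: decoys passing the cutoff stand in for the
    number of incorrect target matches passing the cutoff.
    """
    n_target = sum(1 for s in target_scores if s >= t)
    n_decoy = sum(1 for s in decoy_scores if s >= t)
    if n_target == 0:
        return 0.0  # no accepted target matches, so no discoveries
    return min(1.0, n_decoy / n_target)


# Hypothetical top-match scores from a concatenated target-decoy search.
target_scores = [5.1, 4.2, 3.9, 3.0, 2.2, 1.5]
decoy_scores = [3.1, 1.9, 1.2, 0.8]

td_fdr = target_decoy_fdr(target_scores, decoy_scores, 2.0)
print(td_fdr)
```
         </p>
         <p>Unlike the posterior probabilities from PeptideProphet, this estimate applies only to the whole set of accepted matches and says nothing about any individual spectrum.</p>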
         <p>An alternative approach that relaxes the parametric assumptions is the variable component approach, which uses a mixture of an unknown number of Gaussians to represent the incorrect and correct distributions of scores <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. The correct distribution is represented by a mixture of <it>k</it><sub>0 </sub>normal distributions (that may have different means and variances), and the incorrect distribution is represented by a separate mixture of <it>k</it><sub>1 </sub>normal distributions. Parameters <it>k</it><sub>0 </sub>and <it>k</it><sub>1 </sub>are unknown. Each score <it>s<sub>i </sub></it>is a member of either the overall correct or the overall incorrect distribution, and is then further assigned to one of the sub-components of the corresponding mixture. Gibbs sampling is used to estimate the forms of the sub-components (which also indicates the complexity of this approach). Although the variable component and kernel methods perform similarly, there are minor computational and modeling issues to consider <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. The advantages of the variable component method are that: (1) the model is still parametric, which may help reduce the chance of overfitting; (2) kernel estimation may overfit, especially if the bandwidth is too small; and (3) it does not rely completely on decoys for the negative distribution, whereas kernel density estimation uses decoys <it>only </it>for estimating the negative distribution. The advantages of the kernel approach are that: (1) it is much less computationally intensive and less complicated than the variable component method; (2) it does not require the specification of priors, as the variable component method does; and (3) kernel estimation is very well known and commonly used.</p>
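         <p>The remark about bandwidth can be illustrated with a small Gaussian kernel density estimate. This is a generic textbook estimator written for illustration, not the semiparametric PeptideProphet code; the decoy scores and the two bandwidths are hypothetical.</p>
         <p>
```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = len(data) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
        return total / norm

    return density


# Hypothetical decoy scores used to estimate the negative distribution.
decoy_scores = [-1.2, -0.8, -0.5, -0.3, 0.0, 0.4, 1.1]

smooth = gaussian_kde(decoy_scores, bandwidth=0.5)
spiky = gaussian_kde(decoy_scores, bandwidth=0.05)

# Evaluate both estimates in a gap between two observed scores.
gap = 0.75
print(smooth(gap), spiky(gap))
```
         </p>
         <p>With the small bandwidth the estimate collapses to nearly zero between observed scores and spikes at each observation, which is the overfitting behaviour referred to above; the larger bandwidth spreads the same probability mass into a smoother curve.</p>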
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>K.M. implemented the statistical analysis framework, analyzed the datasets and wrote the manuscript. O.V. supervised the statistical aspects of the work, and wrote the manuscript. A.N. supervised the statistical and the mass spectrometry-based aspects of the work.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank Hyungwon Choi for providing R-code for the PeptideProphet model fits. The work was supported in part by the NSF CAREER award DBI-1054826 to OV, and by NIH grants R01-GM-094231 and R01-CA-126239 to AN.</p>
            <p>This article has been published as part of <it>BMC Bioinformatics </it>Volume 13 Supplement 16, 2012: Statistical mass spectrometry-based proteomics. The full contents of the supplement are available online at <url>http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S16</url>.</p>
         </sec>
      </ack>
      <refgrp><bibl id="B1"><title><p>An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database</p></title><aug><au><snm>Eng</snm><fnm>J</fnm></au><au><snm>McCormack</snm><fnm>A</fnm></au><au><snm>Yates</snm><fnm>J</fnm></au></aug><source>American Society for Mass Spectrometry</source><pubdate>1994</pubdate><volume>5</volume><fpage>976</fpage><lpage>989</lpage><xrefbib><pubid idtype="doi">10.1016/1044-0305(94)80016-2</pubid></xrefbib></bibl><bibl id="B2"><title><p>TANDEM: matching proteins with tandem mass spectra</p></title><aug><au><snm>Craig</snm><fnm>R</fnm></au><au><snm>Beavis</snm><fnm>R</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><issue>9</issue><fpage>1466</fpage><lpage>1467</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth092</pubid><pubid idtype="pmpid" link="fulltext">14976030</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>General framework for developing and evaluating database scoring algorithms using the TANDEM search engine</p></title><aug><au><snm>MacLean</snm><fnm>B</fnm></au><au><snm>Eng</snm><fnm>J</fnm></au><au><snm>Beavis</snm><fnm>R</fnm></au><au><snm>McIntosh</snm><fnm>M</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><issue>22</issue><fpage>2830</fpage><lpage>2832</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl379</pubid><pubid idtype="pmpid" link="fulltext">16877754</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search</p></title><aug><au><snm>Keller</snm><fnm>A</fnm></au><au><snm>Nesvizhskii</snm><fnm>A</fnm></au><au><snm>Kolker</snm><fnm>E</fnm></au><au><snm>Aebersold</snm><fnm>R</fnm></au></aug><source>Analytical 
Chemistry</source><pubdate>2002</pubdate><volume>74</volume><fpage>5383</fpage><lpage>5392</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/ac025747h</pubid><pubid idtype="pmpid">12403597</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics</p></title><aug><au><snm>Nesvizhskii</snm><fnm>A</fnm></au></aug><source>Journal of Proteomics</source><pubdate>2010</pubdate><volume>73</volume><fpage>2092</fpage><lpage>2123</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.jprot.2010.08.009</pubid><pubid idtype="pmcid">2956504</pubid><pubid idtype="pmpid" link="fulltext">20816881</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Head-to-head comparison of serum fractionation techniques</p></title><aug><au><snm>Whiteaker</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>H</fnm></au><au><snm>Eng</snm><fnm>J</fnm></au><au><snm>Fang</snm><fnm>R</fnm></au><au><snm>Piening</snm><fnm>B</fnm></au><au><snm>Feng</snm><fnm>L</fnm></au><au><snm>Lorentzen</snm><fnm>T</fnm></au><au><snm>Schoenherr</snm><fnm>R</fnm></au><au><snm>Keane</snm><fnm>J</fnm></au><au><snm>Holzman</snm><fnm>T</fnm></au><au><snm>Fitzgibbon</snm><fnm>M</fnm></au><au><snm>Lin</snm><fnm>C</fnm></au><au><snm>Zhang</snm><fnm>H</fnm></au><au><snm>Cooke</snm><fnm>K</fnm></au><au><snm>Liu</snm><fnm>T</fnm></au><au><snm>II</snm><fnm>DC</fnm></au><au><snm>Anderson</snm><fnm>L</fnm></au><au><snm>Watts</snm><fnm>J</fnm></au><au><snm>Smith</snm><fnm>R</fnm></au><au><snm>McIntosh</snm><fnm>M</fnm></au><au><snm>Paulovich</snm><fnm>A</fnm></au></aug><source>Journal of Proteome Research</source><pubdate>2007</pubdate><volume>6</volume><fpage>828</fpage><lpage>836</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/pr0604920</pubid><pubid idtype="pmpid" link="fulltext">17269739</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>Semisupervised model-based validation 
of peptide identifications in mass spectrometry-based proteomics</p></title><aug><au><snm>Choi</snm><fnm>H</fnm></au><au><snm>Nesvizhskii</snm><fnm>A</fnm></au></aug><source>Journal of Proteome Research</source><pubdate>2008</pubdate><volume>7</volume><fpage>254</fpage><lpage>265</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/pr070542g</pubid><pubid idtype="pmpid" link="fulltext">18159924</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>The standard protein mix database: a diverse data set to assist in the production of improved peptide and protein identification software tools</p></title><aug><au><snm>Klimek</snm><fnm>J</fnm></au><au><snm>Eddes</snm><fnm>J</fnm></au><au><snm>Hohmann</snm><fnm>L</fnm></au><au><snm>Jackson</snm><fnm>J</fnm></au><au><snm>Peterson</snm><fnm>A</fnm></au><au><snm>Letarte</snm><fnm>S</fnm></au><au><snm>Gafken</snm><fnm>P</fnm></au><au><snm>Katz</snm><fnm>J</fnm></au><au><snm>Mallick</snm><fnm>P</fnm></au><au><snm>Lee</snm><fnm>H</fnm></au><au><snm>Schmidt</snm><fnm>A</fnm></au><au><snm>Ossola</snm><fnm>R</fnm></au><au><snm>Eng</snm><fnm>J</fnm></au><au><snm>Aebersold</snm><fnm>R</fnm></au><au><snm>Martin</snm><fnm>D</fnm></au></aug><source>Journal of proteome research</source><pubdate>2007</pubdate><volume>7</volume><fpage>96</fpage><lpage>103</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2577160</pubid><pubid idtype="pmpid" link="fulltext">17711323</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>A direct approach to false discovery rates</p></title><aug><au><snm>Storey</snm><fnm>J</fnm></au></aug><source>Journal of the Royal Statistical Society. 
Series B</source><pubdate>2002</pubdate><volume>64</volume><issue>3</issue><fpage>479</fpage><lpage>498</lpage><xrefbib><pubid idtype="doi">10.1111/1467-9868.00346</pubid></xrefbib></bibl><bibl id="B10"><title><p>Microarrays, empirical Bayes and the two-groups model</p></title><aug><au><snm>Efron</snm><fnm>B</fnm></au></aug><source>Statistical Science</source><pubdate>2008</pubdate><volume>23</volume><fpage>1</fpage><lpage>22</lpage><xrefbib><pubid idtype="doi">10.1214/07-STS236</pubid></xrefbib></bibl><bibl id="B11"><title><p>Posterior error probabilities and false discovery rates: two sides of the same coin</p></title><aug><au><snm>K&#228;ll</snm><fnm>L</fnm></au><au><snm>Storey</snm><fnm>J</fnm></au><au><snm>MacCoss</snm><fnm>M</fnm></au></aug><source>Journal of Proteome Research</source><pubdate>2008</pubdate><volume>7</volume><fpage>40</fpage><lpage>44</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/pr700739d</pubid><pubid idtype="pmpid" link="fulltext">18052118</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling</p></title><aug><au><snm>Choi</snm><fnm>H</fnm></au><au><snm>Ghosh</snm><fnm>D</fnm></au><au><snm>Nesvizhskii</snm><fnm>A</fnm></au></aug><source>Journal of Proteome Research</source><pubdate>2008</pubdate><volume>7</volume><fpage>286</fpage><lpage>292</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/pr7006818</pubid><pubid idtype="pmpid" link="fulltext">18078310</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Adaptive discriminant function analysis and reranking of MS/MS database search results for improved peptide identification in shotgun proteomics</p></title><aug><au><snm>Ding</snm><fnm>Y</fnm></au><au><snm>Choi</snm><fnm>H</fnm></au><au><snm>Nesvizhskii</snm><fnm>A</fnm></au></aug><source>Journal of Proteome 
Research</source><pubdate>2008</pubdate><volume>7</volume><fpage>4878</fpage><lpage>4889</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/pr800484x</pubid><pubid idtype="pmpid" link="fulltext">18788775</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Maximum likelihood from incomplete data via the EM algorithm</p></title><aug><au><snm>Dempster</snm><fnm>A</fnm></au><au><snm>Laird</snm><fnm>N</fnm></au><au><snm>Rubin</snm><fnm>D</fnm></au></aug><source>Journal of the Royal Statistical Society. Series B</source><pubdate>1977</pubdate><volume>39</volume><fpage>1</fpage><lpage>38</lpage><url>http://www.jstor.org/discover/10.2307/2984875?uid=3738032&amp;uid=2&amp;uid=4&amp;sid=21101269442551</url></bibl><bibl id="B15"><title><p>The positive false discovery rate: a Bayesian interpretation and the q-value</p></title><aug><au><snm>Storey</snm><fnm>J</fnm></au></aug><source>Annals of Statistics</source><pubdate>2003</pubdate><volume>31</volume><issue>6</issue><fpage>2013</fpage><lpage>2035</lpage><xrefbib><pubid idtype="doi">10.1214/aos/1074290335</pubid></xrefbib></bibl><bibl id="B16"><title><p>Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry</p></title><aug><au><snm>Elias</snm><fnm>J</fnm></au><au><snm>Gygi</snm><fnm>S</fnm></au></aug><source>Nature Methods</source><pubdate>2007</pubdate><volume>4</volume><issue>3</issue><fpage>207</fpage><lpage>214</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nmeth1019</pubid><pubid idtype="pmpid" link="fulltext">17327847</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Assigning significance to peptides identified by tandem mass spectrometry using decoy databases</p></title><aug><au><snm>K&#228;ll</snm><fnm>L</fnm></au><au><snm>Storey</snm><fnm>J</fnm></au><au><snm>MacCoss</snm><fnm>M</fnm></au><au><snm>Noble</snm><fnm>W</fnm></au></aug><source>Journal of Proteome 
Research</source><pubdate>2008</pubdate><volume>7</volume><fpage>29</fpage><lpage>34</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/pr700600n</pubid><pubid idtype="pmpid" link="fulltext">18067246</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>A guided tour of the Trans Proteomic Pipeline</p></title><aug><au><snm>Deutsch</snm><fnm>E</fnm></au><au><snm>Mendoza</snm><fnm>L</fnm></au><au><snm>Shteynberg</snm><fnm>D</fnm></au><au><snm>Farrah</snm><fnm>T</fnm></au><au><snm>Lam</snm><fnm>H</fnm></au><au><snm>Tasman</snm><fnm>N</fnm></au><au><snm>Sun</snm><fnm>Z</fnm></au><au><snm>Nilsson</snm><fnm>E</fnm></au><au><snm>Pratt</snm><fnm>B</fnm></au><au><snm>Prazen</snm><fnm>B</fnm></au><au><snm>Eng</snm><fnm>JK</fnm></au><au><snm>Martin</snm><fnm>DB</fnm></au><au><snm>Nesvizhskii</snm><fnm>AI</fnm></au><au><snm>Aebersold</snm><fnm>R</fnm></au></aug><source>Proteomics</source><pubdate>2010</pubdate><volume>10</volume><fpage>1150</fpage><lpage>1159</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/pmic.200900375</pubid><pubid idtype="pmcid">3017125</pubid><pubid idtype="pmpid" link="fulltext">20101611</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>A statistical model for identifying proteins by tandem mass spectrometry</p></title><aug><au><snm>Nesvizhskii</snm><fnm>A</fnm></au><au><snm>Keller</snm><fnm>A</fnm></au><au><snm>Kolker</snm><fnm>E</fnm></au><au><snm>Aebersold</snm><fnm>R</fnm></au></aug><source>Analytical Chemistry</source><pubdate>2003</pubdate><volume>75</volume><url>http://pubs.acs.org/doi/abs/10.1021/ac0341261</url><xrefbib><pubid idtype="pmpid" link="fulltext">14632076</pubid></xrefbib></bibl></refgrp>
   </bm>
</art>