<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1755-8794-1-12</ui>
   <ji>1755-8794</ji>
   <fm>
      <dochead>Database</dochead>
      <bibl>
         <title>
            <p>HIP<sup>2</sup>: An online database of human plasma proteins from healthy individuals</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Saha</snm>
               <fnm>Sudipto</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>sahas@iupui.edu</email>
            </au>
            <au id="A2">
               <snm>Harrison</snm>
               <mi>H</mi>
               <fnm>Scott</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>sharrisn@iupui.edu</email>
            </au>
            <au id="A3">
               <snm>Shen</snm>
               <fnm>Changyu</fnm>
               <insr iid="I3"/>
               <email>chashen@iupui.edu</email>
            </au>
            <au id="A4">
               <snm>Tang</snm>
               <fnm>Haixu</fnm>
               <insr iid="I4"/>
               <email>hatang@indiana.edu</email>
            </au>
            <au id="A5">
               <snm>Radivojac</snm>
               <fnm>Predrag</fnm>
               <insr iid="I4"/>
               <email>predrag@indiana.edu</email>
            </au>
            <au id="A6">
               <snm>Arnold</snm>
               <mi>J</mi>
               <fnm>Randy</fnm>
               <insr iid="I5"/>
               <email>rarnold@indiana.edu</email>
            </au>
            <au id="A7">
               <snm>Zhang</snm>
               <fnm>Xiang</fnm>
               <insr iid="I6"/>
               <email>xiang.zhang@louisville.edu</email>
            </au>
            <au id="A8" ca="yes">
               <snm>Chen</snm>
               <mnm>Yue</mnm>
               <fnm>Jake</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I7"/>
               <email>jakechen@iupui.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>School of Informatics, Indiana University &#8211; Purdue University, Indianapolis, USA</p>
            </ins>
            <ins id="I2">
               <p>Indiana Center for Systems Biology and Personalized Medicine, Indiana University &#8211; Purdue University, Indianapolis, USA</p>
            </ins>
            <ins id="I3">
               <p>Division of Biostatistics, Indiana University School of Medicine, Indianapolis, USA</p>
            </ins>
            <ins id="I4">
               <p>School of Informatics, Indiana University, Bloomington, USA</p>
            </ins>
            <ins id="I5">
               <p>Department of Chemistry, Indiana University, Bloomington, USA</p>
            </ins>
            <ins id="I6">
               <p>Department of Chemistry, University of Louisville, Louisville, USA</p>
            </ins>
            <ins id="I7">
               <p>Department of Computer &amp; Information Science, Purdue University, Indianapolis, USA</p>
            </ins>
         </insg>
         <source>BMC Medical Genomics</source>
         <issn>1755-8794</issn>
         <pubdate>2008</pubdate>
         <volume>1</volume>
         <issue>1</issue>
         <fpage>12</fpage>
         <url>http://www.biomedcentral.com/1755-8794/1/12</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18439290</pubid>
               <pubid idtype="doi">10.1186/1755-8794-1-12</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>13</day>
               <month>2</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>25</day>
               <month>4</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>25</day>
               <month>4</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Saha et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>With the introduction of increasingly powerful mass spectrometry (MS) techniques for clinical research, several recent large-scale MS proteomics studies have sought to characterize the entire human plasma proteome, with the general objective of identifying thousands of proteins leaked from tissues into the circulating blood. Understanding the basic constituents, diversity, and variability of the human plasma proteome is essential to the development of sensitive molecular diagnosis and treatment monitoring solutions for future biomedical applications. Biomedical researchers today, however, do not have an integrated online resource in which they can search for plasma proteins from healthy individuals collected across different mass spectrometry platforms, experimental protocols, and search software. The lack of such a resource for comparisons has made it difficult to interpret proteomics profile changes in patients' plasma and to design protein biomarker discovery experiments.</p>
            </sec>
            <sec>
               <st>
                  <p>Description</p>
               </st>
               <p>To aid future protein biomarker studies of disease and health from human plasma, we developed an online database, HIP<sup>2</sup> (Healthy Human Individual's Integrated Plasma Proteome). The current version contains 12,787 protein entries linked to 86,831 peptide entries identified using different MS platforms.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>This web-based database will be useful to biomedical researchers involved in biomarker discovery research. It has been developed as a comprehensive collection of healthy human plasma proteins, with protein data captured in a relational database schema built to contain mappings of supporting peptide evidence from several high-quality, high-throughput mass spectrometry (MS) experimental data sets. Users can search for plasma protein/peptide annotations, peptide/protein alignments, and experimental/sample conditions, with options for filter-based retrieval to achieve greater analytical power for discovery and validation.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Interest in defining molecular biomarkers of health and disease from human plasma has surged with the recent launch of the pilot Plasma Proteome Project (PPP) by the Human Proteome Organization (HUPO) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The ease of clinical access to and processing of plasma samples, together with the abundance of proteins and metabolites that may collectively define a person's health status, has made human plasma the top choice among bio-fluids for future clinical molecular diagnostic applications. However, the fluctuating nature of blood across individuals, the huge dynamic range of protein concentrations (up to 10<sup>12</sup>), and the protein detection limits of most MS platforms have made the plasma proteome elusive to define. Many proteomics researchers even believe that the "plasma proteome" observed in a single shotgun MS experiment is analogous to a stochastic sampling of the human proteome, with low run-to-run consistency and inherent detection biases peculiar to each type of MS platform <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Even for plasma from healthy individuals, and even with multi-dimensional separations and advanced bioinformatics search software, the proteins identified in different shotgun MS/MS plasma proteomics experiments are often inconsistent with each other except for the most abundant proteins. To overcome the poor coverage and potential bias of each experimental measurement of the human plasma proteome, and to exploit the complementary nature of these measurements, biomedical researchers need to collect and assess all reliable, publicly available plasma protein data sets generated for healthy individuals on different MS analytical and computational platforms. 
A comprehensive, integrated resource of human plasma proteins from healthy individuals, currently missing in the field of clinical proteomics, would enable researchers to understand the basic constituents, diversity, and variability of the human plasma proteome. Such a resource would provide considerable comparative power for interpreting proteomics profile changes in patients' plasma, and may supplement or compensate for limitations and biases associated with the set of controls in a given study. It would also make it easier to recognize candidate protein biomarkers that are already known to occur in healthy human plasma in cases where a protein is differentially expressed in a patient sample relative to the quantities observed in the study control.</p>
         <p>Although multiple projects to profile the human plasma proteome have been attempted, including PeptideAtlas, GPMO, HUPO PPP, and several recent publications <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>, there remains an unaddressed need for a compiled, central repository structured to enable stable retrieval, comparison, and querying of results. The existing sources vary widely in terms of data set size, available user interface, experimental protocol or sample details, choices of protein identifiers, linking to peptide evidence from MS experiments, MS search software used, and extent of data annotation. This information needs to be compiled further and assembled for end users before they can consider incorporating human plasma proteome data into their studies. The largest single source of data is an independently conducted experimental study that used an ion-mobility spectrometry (IMS) platform to chart the existence of 9,087 proteins based on 37,842 unique inferred peptide sequences, of which 2,928 proteins are high-confidence <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. There is, however, no web-based interface or other online resource that makes these data widely available. The other sources of data we examined have some form of online presentation (aside from publication) and range in size from fewer than one thousand to several thousand identified proteins. The Plasma Proteome Database (PPD) provides a web interface and is geared toward providing detailed functional annotations of 3,778 distinct proteins based on data extracted from the literature, yet it provides information on neither the experimental protocol nor the associated MS-detected peptides used for protein identification <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. 
HUPO PPP information consists of 3,020 proteins and 47,950 peptides along with experimental protocol information and is available to the public online, but the data are only accessible as flat files <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. The Institute for Systems Biology (ISB) has surveyed and analyzed a comparatively smaller set of data produced by 28 human plasma proteomics experiments, and has reported an approximate count of 960 proteins based on the 6,929 distinct observed peptides in its web-interfaced PeptideAtlas database <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Another resource, providing evidence for human proteomics based mainly on data from HUPO PPP, is hosted by the Global Proteome Machine Organization (GPMO). An important feature of the GPMO database is that it provides annotated information to assist with the difficult process of validating peptide MS/MS spectra and patterns of protein coverage <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. GPMO also includes data from non-human organisms such as cats, guinea pigs, and rabbits, from unicellular eukaryotes such as yeasts, and from a number of prokaryotic organisms.</p>
         <p>By gathering the protein and peptide data used to characterize the proteome of a healthy individual, we set out to develop a resource that presents a comparative baseline of plasma proteomics results against which proteomic data from patients with diseases such as cancer, neurodegenerative diseases, metabolic diseases, and other genetic disorders may be studied. In this effort, we define "healthy" or "normal" as human adults without major known life-threatening diseases, genetic diseases, HIV, or inflammation at the time of blood drawing (a slightly more stringent variation of the HUPO definition in Omenn <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>). On these premises, we developed an integrated database, HIP<sup>2</sup> (Healthy Human Individual's Integrated Plasma Proteome), by compiling all of the existing data from experiments performed on samples from healthy individuals, and by creating a web-based interface to aid the many upcoming protein biomarker studies of health and disease <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. With HIP<sup>2</sup>, proteins found in clinical samplings of patient plasma can be checked for overlap, random or non-random, with the reported set of healthy human plasma proteins.</p>
      </sec>
      <sec>
         <st>
            <p>Construction and content</p>
         </st>
         <p>We collected 3,020 proteins and 47,950 peptides from HUPO PPP in text format <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>; 9,087 proteins and 37,842 peptides from the supplemental material (in PDF format) of a recent publication by David Clemmer's group (DCG) <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>; 788 proteins and 6,039 peptides from PeptideAtlas (through successive querying of the web interface) <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>; and 1,175 proteins from Leigh Anderson's group (LAG) (in PDF format) <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. HUPO PPP contains two datasets: i) a dataset consisting of 3,020 proteins, and ii) a dataset consisting of 9,504 proteins. We included the high-confidence core dataset of 3,020 proteins, each supported by two or more peptide hits. The HUPO datasets come from 35 collaborating laboratories, and for the single-peptide hits in the second dataset, the diversity of laboratory procedures and MS platforms can itself introduce a large amount of noise, potentially reducing the value of the data resource for large-scale interpretation. In contrast, the single-peptide hits used in HIP<sup>2</sup> come from a single laboratory source, DCG. The uniformity attributable to this single laboratory source (and single MS platform) merited inclusion of all 9,087 proteins from this group. Data from the PeptideAtlas database were also included, but were filtered to include only those with publicly available experimental details. The integration of data from different resources is a non-trivial bioinformatics task, because the data sets are not standardized syntactically or semantically. To resolve syntactic incompatibility, we wrote Perl programs that converted all the original data sets into comma-delimited files, and loaded these files into an Oracle 10g database server as staging tables. To resolve semantic incompatibility, we first developed a data model, as shown in Fig. 
<figr fid="F1">1</figr>, and then converted all the incompatible identifiers to standard IPI accession numbers <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, using a mapping table downloaded from BioMart <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. HUPO PPP and PeptideAtlas proteins are indexed with IPI accession numbers, whereas the other sources of data are indexed with other types of identifiers such as Swiss-Prot accession numbers and RefSeq IDs. For some of the text-based sources, correspondence with other identifiers was achieved by parsing the sequence header data inside FASTA files, and the integrity of the mapping was validated through comparisons of the referenced sequences. A subset of 247 proteins from the LAG data set had no corresponding IPI accession numbers, and this subset was excluded from comparisons between the four sources of data. The set of 788 proteins from PeptideAtlas represents a subset of unique proteins from a larger set of 960 proteins. The data from DCG contain multiple splice variants labeled by numbers appended to Swiss-Prot identifier strings, and the first enumerated variant was found to map to the IPI accession number in all cases. Table <tblr tid="T1">1</tblr> presents a union built from the four sources of data used in HIP<sup>2</sup>, and also categorizes proteins in terms of MS-based detection platform and search software.</p>
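         <p>The identifier harmonization step can be illustrated with a minimal Python sketch. It is not the Perl code used to build the database: the mapping rows, the accession strings, and the assumption that splice variants are written with a trailing "-N" suffix are all hypothetical, chosen only to show how source identifiers might be normalized to IPI accessions while entries without a mapping are set aside.</p>

```python
import csv
import io

# Hypothetical mapping table (source identifier -> IPI accession). The real
# table came from BioMart; these rows and accession strings are made up.
MAPPING_CSV = """source_id,ipi_accession
SP000001,IPI99999901
RS_000002,IPI99999901
SP000003,IPI99999902
"""


def load_mapping(text):
    """Read a two-column CSV into a dict keyed by source identifier."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["source_id"]: row["ipi_accession"] for row in reader}


def strip_variant_suffix(identifier):
    """Drop a trailing '-N' variant number (assumed format), if present."""
    base, sep, suffix = identifier.rpartition("-")
    return base if sep and suffix.isdigit() else identifier


def to_ipi(identifiers, mapping):
    """Return (set of IPI accessions, list of identifiers with no mapping)."""
    mapped, unmapped = set(), []
    for identifier in identifiers:
        if identifier.startswith("IPI"):
            mapped.add(identifier)  # already an IPI accession
            continue
        ipi = mapping.get(strip_variant_suffix(identifier))
        if ipi is None:
            unmapped.append(identifier)  # left out of cross-source comparisons
        else:
            mapped.add(ipi)
    return mapped, unmapped


mapping = load_mapping(MAPPING_CSV)
mapped, unmapped = to_ipi(
    ["SP000001-1", "RS_000002", "IPI99999902", "XX000009"], mapping
)
print(sorted(mapped))
print(unmapped)
```

         <p>In this sketch two different source identifiers collapse onto one IPI accession, which is why the union of several sources can be smaller than the sum of their sizes. Stripping the variant suffix is a simplification of the observation that the first enumerated variant mapped to the IPI accession.</p>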
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Relational data model schema for the table</p>
            </caption>
            <text>
               <p><b>Relational data model schema for the table</b>. Protein, peptide evidence, MS Experiment and sample.</p>
            </text>
            <graphic file="1755-8794-1-12-1"/>
         </fig>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Summary of HIP<sup>2 </sup>database. The numbers of peptides and proteins represent unique entries that are the union of multiple subjects, possibly from different ethnic groups.</p>
            </caption>
            <tblbdy cols="5">
               <r>
                  <c ca="left">
                     <p>Source</p>
                  </c>
                  <c ca="left">
                     <p>Platform</p>
                  </c>
                  <c ca="left">
                     <p>Peptides</p>
                  </c>
                  <c ca="left">
                     <p>Proteins</p>
                  </c>
                  <c ca="left">
                     <p>Search Software</p>
                  </c>
               </r>
               <r>
                  <c cspan="5">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p><b>HUPO PPP </b>(3,020 proteins and 47,950 peptides)</p>
                  </c>
                  <c ca="left">
                     <p>ESI-MS/MS_DECA</p>
                  </c>
                  <c ca="left">
                     <p>712</p>
                  </c>
                  <c ca="left">
                     <p>348</p>
                  </c>
                  <c ca="left">
                     <p>SEQUEST</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>ESI-MS/MS_DECAXP</p>
                  </c>
                  <c ca="left">
                     <p>5796</p>
                  </c>
                  <c ca="left">
                     <p>2149</p>
                  </c>
                  <c ca="left">
                     <p>SEQUEST</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>ESI-MS/MS_LCQ</p>
                  </c>
                  <c ca="left">
                     <p>1818</p>
                  </c>
                  <c ca="left">
                     <p>427</p>
                  </c>
                  <c ca="left">
                     <p>SEQUEST/SONAR</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>ESI-MS/MS_QSTAR</p>
                  </c>
                  <c ca="left">
                     <p>309</p>
                  </c>
                  <c ca="left">
                     <p>137</p>
                  </c>
                  <c ca="left">
                     <p>SEQUEST</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>ESI-MS/MS_QTOF</p>
                  </c>
                  <c ca="left">
                     <p>5078</p>
                  </c>
                  <c ca="left">
                     <p>573</p>
                  </c>
                  <c ca="left">
                     <p>MASCOT</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>ESI-MS/MS_QTRAP</p>
                  </c>
                  <c ca="left">
                     <p>195</p>
                  </c>
                  <c ca="left">
                     <p>51</p>
                  </c>
                  <c ca="left">
                     <p>MASCOT</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>MALDI_MS/MS_QSTAR</p>
                  </c>
                  <c ca="left">
                     <p>384</p>
                  </c>
                  <c ca="left">
                     <p>60</p>
                  </c>
                  <c ca="left">
                     <p>SEQUEST/MASCOT</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p><b>David Clemmer's group </b>(9,087 proteins and 37,842 peptides)</p>
                  </c>
                  <c ca="left">
                     <p>IMS_MS/MS_TOF</p>
                  </c>
                  <c ca="left">
                     <p>35781</p>
                  </c>
                  <c ca="left">
                     <p>9,087</p>
                  </c>
                  <c ca="left">
                     <p>MASCOT</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p><b>PeptideAtlas </b>(788 proteins and 6,039 peptides)</p>
                  </c>
                  <c ca="left">
                     <p>ESI-MS/MS_DECA</p>
                  </c>
                  <c ca="left">
                     <p>260</p>
                  </c>
                  <c ca="left">
                     <p>110</p>
                  </c>
                  <c ca="left">
                     <p>SEQUEST</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>ESI-MS/MS_DECAXP</p>
                  </c>
                  <c ca="left">
                     <p>317</p>
                  </c>
                  <c ca="left">
                     <p>159</p>
                  </c>
                  <c ca="left">
                     <p>SEQUEST</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>ESI-MS/MS_LCQ</p>
                  </c>
                  <c ca="left">
                     <p>1101</p>
                  </c>
                  <c ca="left">
                     <p>263</p>
                  </c>
                  <c ca="left">
                     <p>SEQUEST</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>ESI-MS/MS_QSTAR</p>
                  </c>
                  <c ca="left">
                     <p>14</p>
                  </c>
                  <c ca="left">
                     <p>14</p>
                  </c>
                  <c ca="left">
                     <p>SEQUEST</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>ESI-MS/MS_QTOF</p>
                  </c>
                  <c ca="left">
                     <p>728</p>
                  </c>
                  <c ca="left">
                     <p>215</p>
                  </c>
                  <c ca="left">
                     <p>SEQUEST</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="left">
                     <p>LC_MS/MS*</p>
                  </c>
                  <c ca="left">
                     <p>1153</p>
                  </c>
                  <c ca="left">
                     <p>250</p>
                  </c>
                  <c ca="left">
                     <p>SEQUEST</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p><b>Leigh Anderson's group </b>(1,175 proteins)</p>
                  </c>
                  <c ca="left">
                     <p>2DEMS &amp; LC_MS/MS*</p>
                  </c>
                  <c ca="left">
                     <p>----</p>
                  </c>
                  <c ca="left">
                     <p>928</p>
                  </c>
                  <c ca="left">
                     <p>SEQUEST</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>*Not defined</p>
            </tblfn>
         </tbl>
         <p>The overall design of the database is shown in Fig. <figr fid="F2">2</figr>. Queries from web site users are passed to a backend relational Oracle 10g database hosted at Indiana University's High-Performance Computing Facility. The web interface and database connectivity were implemented with PHP and Perl. The HIP<sup>2</sup> web interface presents the results of user queries on three types of web pages: protein pages, peptide pages, and experimental information pages. The result pages are also linked to external web pages (see details in the "Help" section of the HIP<sup>2</sup> database home site). The protein information page indicates whether a protein is in the healthy human individual's plasma proteome, what function it has, what peptides can be mapped to the protein, and how the peptides are aligned according to trypsin digestion criteria <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. The HIP<sup>2</sup> protein-peptide alignment map highlights potential trypsin cleavage sites in red. Identified peptides are displayed in green and are aligned underneath the corresponding sequence in the protein. The peptide information page provides peptide summaries, experimental evidence, and the aforementioned peptide-protein alignment maps. Peptide summaries provide the amino acid sequence length, a link to the PeptideAtlas database for each peptide, and a link to the Statistical Analysis of Protein Sequences server <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> for researchers interested in peptide compositional patterns. The experimental information page contains additional details about the experiment whenever disclosed, including the human subject's ethnic group and gender, sample preparation method, protein separation, material type, peptide separation, depletion method, and reduction method (treatment with iodoacetamide). 
HIP<sup>2</sup> also includes information on material type to distinguish plasma from serum. We plan to develop an automated web interface for future data contributions, using a standard data submission format that is compliant with the upcoming MIAPE proteomics standards <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>.</p>
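         <p>A protein-peptide alignment map of the kind described above could be approximated as follows. This is a sketch under stated assumptions, not the code behind the web interface: it assumes the common rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), it places a peptide by exact substring match, and the sequences are invented for illustration.</p>

```python
def tryptic_sites(protein):
    """1-based positions of K or R residues that are not followed by P."""
    sites = []
    for i in range(len(protein) - 1):  # the last residue has nothing after it
        if protein[i] in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    return sites


def align_peptide(protein, peptide):
    """Pad the peptide so it sits under its first exact match; None if absent."""
    start = protein.find(peptide)
    if start == -1:
        return None
    return " " * start + peptide


def site_markers(protein):
    """A line with '|' under each potential cleavage site."""
    sites = set(tryptic_sites(protein))
    return "".join("|" if i + 1 in sites else " " for i in range(len(protein)))


protein = "MKWVTFRPLLKAAGR"  # made-up sequence for illustration
peptide = "WVTFRPLLK"        # made-up peptide contained in the sequence

print(protein)
print(site_markers(protein))
print(align_peptide(protein, peptide))
```

         <p>Printing the three lines one under another gives a plain-text analogue of the colored map: the markers play the role of the red cleavage sites and the padded peptide plays the role of the green aligned peptide. The R at position 7 is not marked because it is followed by P.</p>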
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>The overall design of the HIP<sup>2 </sup>database</p>
            </caption>
            <text>
               <p><b>The overall design of the HIP<sup>2</sup> database</b>. 1 = Query of protein "X" observed in plasma proteome; 2 = Query of peptide "Y" observed in plasma proteome; 3 = Output result of "X" protein page; 4 = Link to external database; 5 = Link to experimental page of protein "X"; 6 = Link to the peptide page of the protein "X"; 7 = Output result of peptide "Y"; 8 = Iterative query from peptide page to search for other proteins associated with the peptide "Y".</p>
            </text>
            <graphic file="1755-8794-1-12-2"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Utility and discussion</p>
         </st>
         <p>The HIP<sup>2</sup> database provides protein biologists and clinical biomedical researchers with a new gateway for exploring proteins from the human proteome with peptide-level evidence found in plasma. The basic questions that the HIP<sup>2</sup> database helps biological researchers answer include "whether a protein may be found in human plasma" and "how likely or easily a protein is to be observed in healthy human plasma with mass spectrometry." The HIP<sup>2</sup> database allows its users to assess the confidence of identifying plasma proteins in "normal" plasma MS proteomics experiments by examining such evidence as the number of matched peptide hits, data sources covered, MS experiments observed, types of MS platforms, and search software used. The protein and peptide sequence information also enables the user to examine peptide evidence that may be mapped to different gene splice variants and protein isoforms. Partial trypsin digestion of a protein can be evaluated from the multiple peptide-to-protein alignments presented in the database. The overall quality of the digested peptides mapped to proteins can further be used by mass spectrometry data analysts to assess the performance of different MS proteomics platforms or samples. A typical example in the HIP<sup>2</sup> database of a protein-peptide sequence comparison is how the peptide sequence 'KQSAGLVLWGAILFVAWNALLLLFFWTRPAPGRPPSVSALDGDPASLTR' is present in two proteins, IPI00000138 and IPI00179044, and aligns with the same sites of trypsin cleavage (as shown in Fig. <figr fid="F6">6</figr>). For protein IPI00000138, evidence was found in three data sources, three MS experiments, three MS platforms, two MS search software packages, and six mapped peptide sequences, whereas for protein IPI00179044 there are no experimentally proven peptides from MS results.</p>
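         <p>The kind of per-protein evidence summary described above (counts of mapped peptides, data sources, MS experiments, MS platforms, and search software) can be sketched in Python as shown here. The row layout, protein names, peptide strings, and source names are hypothetical and do not reflect the actual database schema or its contents; only the platform and software labels are borrowed from Table 1.</p>

```python
from collections import defaultdict

# Made-up evidence rows:
# (protein, peptide, source, experiment, platform, software)
EVIDENCE = [
    ("PROT_A", "PEPTIDEK", "SourceOne", "exp1", "ESI-MS/MS_LCQ", "SEQUEST"),
    ("PROT_A", "PEPTIDEK", "SourceTwo", "exp2", "IMS_MS/MS_TOF", "MASCOT"),
    ("PROT_A", "ANOTHERR", "SourceTwo", "exp2", "IMS_MS/MS_TOF", "MASCOT"),
    ("PROT_B", "PEPTIDEK", "SourceOne", "exp1", "ESI-MS/MS_LCQ", "SEQUEST"),
]
FIELDS = ("peptides", "sources", "experiments", "platforms", "software")


def summarize(rows):
    """Count distinct values of each evidence field for every protein."""
    seen = defaultdict(lambda: {name: set() for name in FIELDS})
    for protein, *values in rows:
        for name, value in zip(FIELDS, values):
            seen[protein][name].add(value)
    return {
        protein: {name: len(values) for name, values in fields.items()}
        for protein, fields in seen.items()
    }


def proteins_for_peptide(rows, peptide):
    """Proteins that have at least one evidence row for the given peptide."""
    return sorted({row[0] for row in rows if row[1] == peptide})


summary = summarize(EVIDENCE)
print(summary["PROT_A"])
print(proteins_for_peptide(EVIDENCE, "PEPTIDEK"))
```

         <p>A protein that matches a peptide by sequence alone, with no evidence rows, would simply be absent from such a summary, which corresponds to the case of a protein without experimentally proven peptides.</p>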
         <p>The HIP<sup>2 </sup>database has been mapped to cover a significant portion of the human proteome, which can be retrieved by user queries of plasma proteins or peptides, with full MS experimental context and detailed peptide-to-protein mapping relationships. As of the latest version (Feb 2008), the HIP<sup>2 </sup>database aggregates information from 14 different protein separation/MS analytical platforms, 63 different samples, and 6 different types of MS search software. The HIP<sup>2 </sup>database provides associated identifications of 12,787 protein entries representing 11,588 unique proteins, covering 17% of all 67,511 proteins in the human IPI database <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Over 77% of the proteins included are identified from two or more different peptide sequences as shown in Fig. <figr fid="F3">3</figr>, where identification is defined by both peptide detection and a protein sequence match. We observed that almost 24% of the proteins included have been identified from two or more experimental platforms as shown in Fig. <figr fid="F4">4</figr>, while only 106 proteins (&lt;1%) are identified by all four sources, indicating substantial variation in protein detection caused by a number of factors, including experimental parameters and subject-to-subject variation (Fig. <figr fid="F5">5</figr>). Detailed records of which experimentally proven peptides led to the detection of each protein are necessary for evaluating future options for experimental detection. For biostatisticians, the HIP<sup>2 </sup>database can be a supplemental resource for assessing the likelihood of observing different peptides of the same plasma proteins from different platforms. 
Our resource is also useful to computational proteomics researchers, since we enable navigation among various ranges of association between a peptide sequence and a candidate protein sequence; this information may be valuable for assigning probabilistic confidence to all human proteins, whether or not they have been experimentally observed in human plasma <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. The study of how search software is used in the experimental protocol, especially how sequence alignment and the locations of potential tryptic cleavage sites (highlighted by the HIP<sup>2 </sup>interface) influence peptide detection, can now be directly addressed by the proteomics community. This online HIP<sup>2 </sup>database can be a valuable source of information to biomedical researchers as they interpret proteomics profile changes in patients' plasma and pursue biomarker discovery.</p>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Protein identification</p>
            </caption>
            <text>
               <p><b>Protein identification</b>. Number of proteins (Y axis) versus the number of distinct peptides used for the protein identification (X axis).</p>
            </text>
            <graphic file="1755-8794-1-12-3"/>
         </fig>
         <fig id="F4">
            <title>
               <p>Figure 4</p>
            </title>
            <caption>
               <p>Total proteins categorized by the number of platforms that identify them</p>
            </caption>
            <text>
               <p><b>Total proteins categorized by the number of platforms that identify them</b>. Numbers in the legend refer to the number of platforms.</p>
            </text>
            <graphic file="1755-8794-1-12-4"/>
         </fig>
         <fig id="F5">
            <title>
               <p>Figure 5</p>
            </title>
            <caption>
               <p>Overlapping of plasma proteins identified from different sources</p>
            </caption>
            <text>
               <p><b>Overlapping of plasma proteins identified from different sources</b>. C = Clemmer's group; H = HUPO PPP; L = Leigh Anderson's group; P = PepAtlas data source.</p>
            </text>
            <graphic file="1755-8794-1-12-5"/>
         </fig>
         <fig id="F6">
            <title>
               <p>Figure 6</p>
            </title>
            <caption>
               <p>Snapshot of the peptide query search</p>
            </caption>
            <text>
               <p>
                  <b>Snapshot of the peptide query search.</b>
               </p>
            </text>
            <graphic file="1755-8794-1-12-6"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The primary goal of the HIP<sup>2 </sup>database is to support future clinical proteomics research, especially the discovery of biomarkers through plasma proteomics profiling. For biomedical researchers interested in MS-based plasma biomarker studies, HIP<sup>2 </sup>can be the first database against which to search a list of candidate proteins/genes or peptides before biomarker candidates are prioritized. As the database grows, additional annotation of human plasma proteins will be added, such as relative abundance, normal range of variability, detectability, peptidomic patterns associated with cleavage, mutations and putative sites of glycosylation. HIP<sup>2 </sup>helps provide an integrated interface where database curators and data contributors can work together to collect published data from healthy human plasma proteomics experiments on an ongoing basis, foster community-based assessment of the presence and absence of proteins in healthy human plasma, and provide a centralized data repository for subsequent bioinformatics analysis of the consistency and biases of each MS proteomics platform or search software. We expect this database to become an essential clinical proteomics resource, helping link together the community of biomedical researchers engaged in biomarker studies and the community of mass spectrometry researchers developing sensitive analytical solutions.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>The online content of HIP<sup>2 </sup>is freely available to all WWW users. The database infrastructure and software tools used to develop the database are subject to the intellectual property protection terms of Indiana University.</p>
         <p>Project name: Database for Healthy Human Individual's Integrated Plasma Proteome</p>
         <p>Project home page: HIP<sup>2 </sup>website <abbrgrp><abbr bid="B17">17</abbr></abbrgrp></p>
         <p>Browser requirements: Modern browsers (e.g., Firefox or Microsoft Internet Explorer) will function satisfactorily.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JYC conceived the idea and constructed the general design of the database. SS collected data. SS and SHH processed data for insertion into an integrated database. SS and JYC built the web interface, and online documentation was written by RJA and JYC. SS, SHH and CS performed the statistical analyses and wrote the paper. RJA and XZ resolved issues with describing and categorizing MS platform technology. HT and PR helped characterize the analytical scope of peptide-based detection methodology. All authors participated in the overall planning, editing and reviewing of the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work was partially supported by a Clinical Proteomic Technology Assessment for Cancer (CPTAC) grant from the National Cancer Institute (U24CA126480-01), part of NCI's Clinical Proteomic Technologies Initiative. We thank Fred Regnier and Charles Buck from Purdue University for support of this project. We thank Ron Beavis, David Tabb, and Steve Stein for helpful initial discussions that led to the conceptualization of this work.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Advancement of biomarker discovery and validation through the HUPO plasma proteome project</p>
            </title>
            <aug>
               <au>
                  <snm>Omenn</snm>
                  <fnm>GS</fnm>
               </au>
            </aug>
            <source>Disease markers</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>3</issue>
            <fpage>131</fpage>
            <lpage>134</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15502245</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The need for guidelines in publication of peptide and protein identification data: Working Group on Publication Guidelines for Peptide and Protein Identification Data</p>
            </title>
            <aug>
               <au>
                  <snm>Carr</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Aebersold</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Baldwin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Burlingame</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Clauser</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nesvizhskii</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Cell Proteomics</source>
            <pubdate>2004</pubdate>
            <volume>3</volume>
            <issue>6</issue>
            <fpage>531</fpage>
            <lpage>533</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">15075378</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Human Plasma PeptideAtlas</p>
            </title>
            <aug>
               <au>
                  <snm>Deutsch</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Eng</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Nesvizhskii</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yi</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Ossola</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Aebersold</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proteomics</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <issue>13</issue>
            <fpage>3497</fpage>
            <lpage>3500</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16052627</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Using the global proteome machine for protein identification</p>
            </title>
            <aug>
               <au>
                  <snm>Beavis</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Methods in molecular biology (Clifton, NJ)</source>
            <pubdate>2006</pubdate>
            <volume>328</volume>
            <fpage>217</fpage>
            <lpage>228</lpage>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database</p>
            </title>
            <aug>
               <au>
                  <snm>Omenn</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>States</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Adamski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Blackwell</snm>
                  <fnm>TW</fnm>
               </au>
               <au>
                  <snm>Menon</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hermjakob</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Haab</snm>
                  <fnm>BB</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Eddes</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Kapp</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>Moritz</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Chan</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Rai</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Admon</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Aebersold</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Eng</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hancock</snm>
                  <fnm>WS</fnm>
               </au>
               <au>
                  <snm>Hefta</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Paik</snm>
                  <fnm>YK</fnm>
               </au>
               <au>
                  <snm>Yoo</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Ping</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Pounds</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Adkins</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Qian</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wasinger</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>CY</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>X</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proteomics</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <issue>13</issue>
            <fpage>3226</fpage>
            <lpage>3245</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16104056</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Mapping the human plasma proteome by SCX-LC-IMS-MS</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Valentine</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Plasencia</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Trimpin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Naylor</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Clemmer</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Journal of the American Society for Mass Spectrometry</source>
            <pubdate>2007</pubdate>
            <volume>18</volume>
            <issue>7</issue>
            <fpage>1249</fpage>
            <lpage>1264</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17553692</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Plasma Proteome Database as a resource for proteomics research</p>
            </title>
            <aug>
               <au>
                  <snm>Muthusamy</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hanumanthu</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Suresh</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rekha</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Srinivas</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Karthick</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Vrushabendra</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Sharma</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mishra</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chatterjee</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Mangala</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Shivashankar</snm>
                  <fnm>HN</fnm>
               </au>
               <au>
                  <snm>Chandrika</snm>
                  <fnm>KN</fnm>
               </au>
               <au>
                  <snm>Deshpande</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Suresh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kannabiran</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Niranjan</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Nalli</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Prasad</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Arun</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Reddy</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chandran</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jadhav</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Julie</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Mahesh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>John</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Palvankar</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Sudhir</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bala</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rashmi</snm>
                  <fnm>NS</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proteomics</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <issue>13</issue>
            <fpage>3531</fpage>
            <lpage>3536</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16041672</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Plasma Proteome Project</p>
            </title>
            <url>http://www.bioinformatics.med.umich.edu/hupo/ppp</url>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Clinical Proteomics Technologies for Cancer</p>
            </title>
            <url>http://proteomics.cancer.gov/</url>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The human plasma proteome: a nonredundant list developed by combination of four separate sources</p>
            </title>
            <aug>
               <au>
                  <snm>Anderson</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Polanski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pieper</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gatlin</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tirumalai</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Conrads</snm>
                  <fnm>TP</fnm>
               </au>
               <au>
                  <snm>Veenstra</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Adkins</snm>
                  <fnm>JN</fnm>
               </au>
               <au>
                  <snm>Pounds</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Fagan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lobley</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Cell Proteomics</source>
            <pubdate>2004</pubdate>
            <volume>3</volume>
            <issue>4</issue>
            <fpage>311</fpage>
            <lpage>326</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14718574</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>The International Protein Index: an integrated database for proteomics experiments</p>
            </title>
            <aug>
               <au>
                  <snm>Kersey</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Duarte</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Karavidopoulou</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Proteomics</source>
            <pubdate>2004</pubdate>
            <volume>4</volume>
            <issue>7</issue>
            <fpage>1985</fpage>
            <lpage>1988</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15221759</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>BioMart</p>
            </title>
            <url>http://www.biomart.org/biomart/martview/</url>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Trypsin cleaves exclusively C-terminal to arginine and lysine residues</p>
            </title>
            <aug>
               <au>
                  <snm>Olsen</snm>
                  <fnm>JV</fnm>
               </au>
               <au>
                  <snm>Ong</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Mann</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Cell Proteomics</source>
            <pubdate>2004</pubdate>
            <volume>3</volume>
            <issue>6</issue>
            <fpage>608</fpage>
            <lpage>614</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15034119</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Statistical Analysis of Protein Sequences</p>
            </title>
            <url>http://www.ebi.ac.uk/saps/</url>
         </bibl>
         <bibl id="B15">
            <title>
               <p>The minimum information about a proteomics experiment (MIAPE)</p>
            </title>
            <aug>
               <au>
                  <snm>Taylor</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Paton</snm>
                  <fnm>NW</fnm>
               </au>
               <au>
                  <snm>Lilley</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Binz</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Julian</snm>
                  <fnm>RK</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Aebersold</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Deutsch</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Dunn</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Heck</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Leitner</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Macht</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Martens</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Neubert</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Patterson</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Ping</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Seymour</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Souda</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Tsugita</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vandekerckhove</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Vondriska</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Whitelegge</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Wilkins</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Xenarios</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Yates</snm>
                  <fnm>JR</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Hermjakob</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nature biotechnology</source>
            <pubdate>2007</pubdate>
            <volume>25</volume>
            <issue>8</issue>
            <fpage>887</fpage>
            <lpage>893</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">17687369</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>A machine learning approach to predicting peptide fragmentation spectra</p>
            </title>
            <aug>
               <au>
                  <snm>Arnold</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Jayasankar</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Aggarwal</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Radivojac</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Pacific Symposium on Biocomputing</source>
            <pubdate>2006</pubdate>
            <fpage>219</fpage>
            <lpage>230</lpage>
            <xrefbib>
               <pubid idtype="pmpid">17094241</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>HIP<sup>2 </sup>website</p>
            </title>
            <url>http://bio.informatics.iupui.edu/HIP2/</url>
         </bibl>
      </refgrp>
      <sec>
         <st>
            <p>Pre-publication history</p>
         </st>
         <p>The pre-publication history for this paper can be accessed here:</p>
         <p>
            <url>http://www.biomedcentral.com/1755-8794/1/12/prepub</url>
         </p>
      </sec>
   </bm>
</art>
