<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2105-11-220</ui><ji>1471-2105</ji><fm>
<dochead>Software</dochead>
<bibl>
<title>
<p>Integration of open access literature into the RCSB Protein Data Bank using BioLit</p>
</title>
<aug>
<au ca="yes" id="A1"><snm>Prli&#263;</snm><fnm>Andreas</fnm><insr iid="I1"/><email>andreas.prlic@gmail.com</email></au>
<au id="A2"><snm>Martinez</snm><mi>A</mi><fnm>Marco</fnm><insr iid="I2"/><email>mam002@ucsd.edu</email></au>
<au id="A3"><snm>Dimitropoulos</snm><fnm>Dimitris</fnm><insr iid="I1"/><email>ddimitrop@sdsc.edu</email></au>
<au id="A4"><snm>Beran</snm><fnm>Bojan</fnm><insr iid="I1"/><email>bberan@sdsc.edu</email></au>
<au id="A5"><snm>Yukich</snm><mi>T</mi><fnm>Benjamin</fnm><insr iid="I1"/><email>byukich@gmail.com</email></au>
<au id="A6"><snm>Rose</snm><mi>W</mi><fnm>Peter</fnm><insr iid="I1"/><email>pwrose@ucsd.edu</email></au>
<au id="A7"><snm>Bourne</snm><mi>E</mi><fnm>Philip</fnm><insr iid="I1"/><insr iid="I2"/><email>bourne@sdsc.edu</email></au>
<au ca="yes" id="A8"><snm>Fink</snm><fnm>J Lynn</fnm><insr iid="I2"/><email>lynn.fink@gmail.com</email></au>
</aug>
<insg>
<ins id="I1"><p>San Diego Supercomputer Center, University of California San Diego, 9500 Gilman Drive, Mailcode 0505 La Jolla, CA 92093-0505 USA</p></ins>
<ins id="I2"><p>Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, Mailcode 0743, La Jolla, CA, 92093-0743 USA</p></ins>
</insg>
<source>BMC Bioinformatics</source>
<issn>1471-2105</issn>
<pubdate>2010</pubdate>
<volume>11</volume>
<issue>1</issue>
<fpage>220</fpage>
<url>http://www.biomedcentral.com/1471-2105/11/220</url>
<xrefbib><pubidlist><pubid idtype="pmpid">20429930</pubid><pubid idtype="doi">10.1186/1471-2105-11-220</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>13</day><month>8</month><year>2009</year></date></rec><acc><date><day>29</day><month>4</month><year>2010</year></date></acc><pub><date><day>29</day><month>4</month><year>2010</year></date></pub></history>
<cpyrt><year>2010</year><collab>Prli&#263; et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>Biological data have traditionally been stored and made publicly available through a variety of on-line databases, whereas biological knowledge has traditionally been found in the printed literature. With journals now on-line and providing an increasing amount of open access content, often free of copyright restriction, this distinction between database and literature is blurring. To exploit this opportunity we present the integration of open access literature with the RCSB Protein Data Bank (PDB).</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>BioLit provides an enhanced view of articles with markup of semantic data and links to biological databases, based on the content of the article. For example, words matching terms in existing biological ontologies are highlighted and database identifiers are linked to their database of origin. Among other functions, it identifies PDB IDs that are mentioned in the open access literature by parsing the full text of all research articles in PubMed Central (PMC) and exposing the results as simple XML Web Services. Here, we integrate BioLit results with the RCSB PDB website by using these services to find PDB IDs that are mentioned in research articles and subsequently retrieving abstracts, figures, and text excerpts for those articles. A new RCSB PDB literature view permits browsing through the figures and abstracts of the articles that mention a given structure. The BioLit Web Services that provide the underlying data are publicly accessible. A Java client library that supports querying these services is provided.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>The integration between literature and websites, as demonstrated here with the RCSB PDB, provides a broader view of how a given structure has been analyzed and used. This approach detects the mention of a PDB structure even if it is not formally cited in the paper. Other structures related through the same literature references can also be identified, possibly providing new scientific insight. To our knowledge, this is the first time that database and literature have been integrated in this way, and it speaks to the opportunities afforded by open and free access to both database and literature content.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>Biological databases and the biological literature have traditionally been distinct resources. The main difference is that databases provide structured information, whereas articles are essentially unstructured free text. As we have argued in the past <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B2">2</abbr>
<abbr bid="B3">3</abbr>
</abbrgrp>, this is an artificial distinction, in part defined by the way each resource is perceived, and in part because databases are an on-line medium and journals have traditionally been hardcopy. As the scientific literature has moved on-line, the distinction has blurred. Databases have become more like the literature by including an increased amount of annotation, often extracted from the literature <abbrgrp>
<abbr bid="B4">4</abbr>
</abbrgrp>. Conversely, the literature has become more like databases by including an increasing amount of supplemental information, including the data derived from the experiments described in the research article.</p>
<p>The blurring of the distinction between the biological literature and biological databases is furthered by open access that (depending on the specific open access license) provides literature in a free and unrestricted XML marked-up form as defined by the National Library of Medicine (NLM) Document Type Definition (DTD). By parsing the literature available in this XML form it is possible to extract semantic meaning. We have taken a relatively simple approach to this by extracting semantics associated with database identifiers and ontology terms <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. With these terms, which are typically already well defined in a variety of biological databases, it is possible to create interesting associations between database and literature content.</p>
<p>We illustrate this here by integrating the content of PubMed Central (PMC) with that of the RCSB Protein Data Bank (PDB) <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp> via the BioLit <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp> resource. At this time only about 21% of structures in the PDB (as defined by their PDB identifiers) are referenced in the full text of open access articles contained in PMC, but already some interesting associations can be made. The immediate impact when viewed from the RCSB PDB is that new literature references to the structure are uncovered, even when the primary citation to the structure (if one exists) is not cited in that literature. As described subsequently, appropriate components of that literature can then be integrated with database content and presented as a unified view; a small step towards true literature-data integration.</p>
<p>Many groups have made significant contributions in extracting semantic data from both open- and closed-access literature in the life sciences, although none focus on PDB IDs. In particular, GoPubMed <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>, SEGOPubmed <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>, and Textpresso <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>, all web resources, have improved the classification and searchability of articles by inferring semantic relationships in articles using existing or customized ontologies <abbrgrp>
<abbr bid="B9">9</abbr>
<abbr bid="B10">10</abbr>
</abbrgrp>. In structural biology, PDBsum <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp> has gained permission from publishers to use selected figures and captions for the PDBsum pages; the figures are extracted from the journals' websites using custom-made scripts. The journal FEBS Letters has established a collaboration with the MINT database aimed at integrating each manuscript with a structured summary that precisely describes the protein interactions reported in the manuscript. This is achieved using database identifiers and predefined controlled vocabularies <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>.</p>
<p>We proceed by describing the methods used to establish RCSB PDB-BioLit integration, follow with examples of how that integration is presented, and conclude with future directions afforded by free and unrestricted access to database and literature content.</p>
</sec>
<sec>
<st>
<p>Implementation</p>
</st>
<sec>
<st>
<p>BioLit pipeline</p>
</st>
<p>BioLit is a resource that delivers semantically enriched content for all research articles from PubMed Central (PMC) <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. Specifically, this content includes database identifiers and ontology terms found within the full text of the articles. BioLit uses a local copy of PMC and applies a text-mining pipeline to identify ontology terms provided by a number of ontologies from the National Center for Biomedical Ontology (NCBO, <url>http://bioportal.bioontology.org/</url>). If matches to multiple terms are found, BioLit identifies and applies the longest possible match. In addition, BioLit identifies PDB IDs in the articles. The IDs are identified by pattern matching, which includes heuristics to avoid false positives. A validity check is performed by comparing possible matches with existing PDB IDs. BioLit provides weekly updates from PMC using a cron job to fetch the latest articles (approximately 1,000-2,000 per week at the time of writing).</p>
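<p>The two-stage identification described above (pattern matching followed by a validity check) can be illustrated with a minimal sketch. The regular expression and the function name <it>find_pdb_ids</it> are assumptions made for illustration and are not taken from the BioLit source code; the set <it>known_ids</it> stands in for the list of existing PDB IDs.</p>

```python
import re

# Hypothetical candidate pattern: a digit 1-9 followed by three
# alphanumeric characters. This is the general shape of a PDB ID,
# not necessarily the pattern used by BioLit.
PDB_ID_PATTERN = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")


def find_pdb_ids(text: str, known_ids: set[str]) -> list[str]:
    """Return candidate PDB IDs in text that also occur in known_ids.

    known_ids stands in for the list of existing PDB IDs used for the
    validity check. IDs are compared in upper case and each ID is
    reported once, in order of first appearance.
    """
    found = []
    for match in PDB_ID_PATTERN.finditer(text):
        candidate = match.group(0).upper()
        # Validity check: drop strings that are not real PDB entries.
        if candidate in known_ids and candidate not in found:
            found.append(candidate)
    return found


if __name__ == "__main__":
    known = {"1HIV", "1KX3", "3BY7"}
    sentence = "The protease structure (PDB ID 1HIV) was solved in 1992."
    print(find_pdb_ids(sentence, known))
```

<p>In this sketch the string "1992" fits the pattern but is discarded by the validity check, which shows why the pattern alone is not sufficient.</p>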
</sec>
<sec>
<st>
<p>Web Service Communication between BioLit and RCSB PDB</p>
</st>
<p>The BioLit web and database servers are independent of the servers that host the RCSB PDB site. For integration between the two resources, RESTful Web Services are used to communicate XML documents over HTTP. Two example URLs demonstrate the communication process. The first URL allows access to BioLit information for all articles containing a particular PDB ID: <url>http://biolit.ucsd.edu/ws/rest/articles/pdbid/1HIV/metadata</url> (to request articles for PDB ID 1HIV). The second URL allows access to descriptions of the figures in a specific article, based on its PMC ID: <url>http://biolit.ucsd.edu/ws/rest/files/pmcid/1483839/figures</url>. For more detailed documentation of these Web Services, see the online documentation at <url>http://biolit.ucsd.edu/doc/rest.jsp</url>. To simplify communication with these services and to allow others to access these data, we provide a simple Web Service client (Java). This client library is used by the RCSB PDB web site to request data from the BioLit servers. The data are then rendered in a user's browser using standard JavaScript and JSON web technologies.</p>
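<p>As an illustration of the two request types, the following minimal sketch builds the URLs from the patterns shown above. The function names are hypothetical and this is not the Java client library; the response schema is not assumed, so the fetch helper simply returns the parsed root element. The services may not be reachable at these addresses today.</p>

```python
from urllib.parse import quote
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Base URL taken from the two example URLs in the text.
BIOLIT_REST = "http://biolit.ucsd.edu/ws/rest"


def articles_url(pdb_id: str) -> str:
    """URL for the metadata of all articles that mention a PDB ID."""
    return f"{BIOLIT_REST}/articles/pdbid/{quote(pdb_id.upper())}/metadata"


def figures_url(pmc_id: str) -> str:
    """URL for the figure descriptions of one article, by PMC ID."""
    return f"{BIOLIT_REST}/files/pmcid/{quote(str(pmc_id))}/figures"


def fetch_xml(url: str, timeout: float = 10.0) -> ET.Element:
    """Fetch a URL and parse the body as XML (requires network access).

    Nothing is assumed about the response schema; the caller receives
    the root element and can inspect it.
    """
    with urlopen(url, timeout=timeout) as response:
        return ET.fromstring(response.read())


if __name__ == "__main__":
    print(articles_url("1hiv"))
    print(figures_url("1483839"))
```

<p>Keeping URL construction separate from the network call makes the request logic easy to check without contacting the server.</p>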
<p>To make sure the latest literature associations are available to RCSB PDB users, the data are requested dynamically from BioLit prior to visualization on the website. However, to provide a fast response once a given PDB structure is requested, results are cached on the RCSB PDB server side using Memcached <url>http://www.danga.com/memcached</url>.</p>
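<p>The trade-off between fresh data and fast responses can be sketched with a small expiring cache. This is an in-memory stand-in written for illustration, not the Memcached setup used on the RCSB PDB servers; the class name, the one-hour lifetime, and the lookup function are all assumptions.</p>

```python
import time
from typing import Callable


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, object]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], object]) -> object:
        """Return the cached value for key; call fetch(key) if the
        entry is missing or older than the configured lifetime."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None:
            stored_at, value = entry
            if self._ttl > now - stored_at:
                return value  # still fresh, no remote request needed
        value = fetch(key)
        self._store[key] = (now, value)
        return value


if __name__ == "__main__":
    calls = []

    def slow_lookup(pdb_id: str) -> str:
        # Placeholder for a remote request to the literature service.
        calls.append(pdb_id)
        return f"articles for {pdb_id}"

    cache = TTLCache(ttl_seconds=3600)
    cache.get_or_fetch("1HIV", slow_lookup)
    cache.get_or_fetch("1HIV", slow_lookup)
    print(len(calls))  # the second request is served from the cache
```

<p>A shorter lifetime surfaces new literature associations sooner at the cost of more remote requests.</p>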
</sec>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<sec>
<st>
<p>PDB Integration</p>
</st>
<p>At present, BioLit has identified articles in PMC for approximately 21% of PDB structures (at this time 13,273 PDB IDs). Currently 44,984 articles in the BioLit copy of PMC mention PDB IDs.</p>
<p>If articles are available from BioLit, the new "Literature" tab on the structure summary page of the RCSB PDB website provides a comprehensive summary of the associated literature. An example of how this is displayed is given in Figure <figr fid="F1">1</figr>.</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Literature tab view for PDB ID <ext-link ext-link-id="1KX3" ext-link-type="pdb">1KX3</ext-link> from the RCSB PDB</p></caption><text>
   <p><b>Literature tab view for PDB ID </b><ext-link ext-link-id="1KX3" ext-link-type="pdb">1KX3</ext-link><b> from the RCSB PDB</b>. The view provides the following data fields: <it>Primary Citation </it>for the protein structure; <it>Publication Details</it>: MeSH Keywords for the article and related citations from iHOP and GeneRIF; <it>Related Citations in PDB entry </it>as provided by the depositors of the structure; <it>PubMed Central articles </it>are articles identified by BioLit that mention the PDB ID; <it>Other PDB IDs </it>(not shown) that co-occur with 1KX3 in PubMed Central articles.</p>
</text><graphic file="1471-2105-11-220-1" hint_layout="double"/></fig>
<p>The "Literature" tab provides several sections of information (if available) for a given PDB structure:</p>
<p indent="1">1. <it>Primary Citation </it>presents the abstract and literature reference information for the original article associated with the PDB structure.</p>
<p indent="1">2. <it>Publication Details </it>displays MeSH Keywords for the article and links to related citations from the iHOP <abbrgrp>
<abbr bid="B13">13</abbr>
</abbrgrp> and GeneRIF sites <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>.</p>
<p indent="1">3. <it>Related Citations </it>shows secondary literature references provided by the depositors of the structure.</p>
<p indent="1">4. <it>PubMed Central articles </it>displays articles that BioLit has identified as containing references to the PDB structure. In this section of the web page it is possible to browse through all the figures provided in the original article. If a thumbnail is selected, a larger version of the figure is displayed together with the figure legend. Additionally, the article abstract and copyright information can be displayed. The <it>Context </it>in which a PDB ID has been mentioned in an article can be investigated as well. Usually this is the surrounding text paragraph; for figures, it is the figure legend. All content is made available under the same copyright that applies to the original material.</p>
<p indent="1">5. <it>Other PDB IDs </it>that are found in the same articles as the target PDB ID are also listed. This list provides the user with an additional set of structures that are referenced in the same paper(s). Structures grouped by the same literature references may or may not share common features. To provide further information on what it means when two entries are cited in the same article, we also show the sequence similarity between the co-occurring IDs. In cases where a PDB structure contains multiple chains, the chain with the highest similarity is displayed.</p>
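<p>The co-occurrence listing in item 5 can be sketched as follows. The function name and the input mapping are assumptions made for illustration, the example data are invented, and the sequence similarity calculation is omitted.</p>

```python
from collections import Counter


def co_occurring_ids(target: str,
                     article_ids: dict[str, set[str]]) -> list[tuple[str, int]]:
    """List PDB IDs that share at least one article with target.

    article_ids maps an article identifier to the set of PDB IDs
    mentioned in that article. The result is sorted by the number of
    shared articles (descending) and then by ID.
    """
    counts = Counter()
    for ids in article_ids.values():
        if target in ids:
            # Count every other ID mentioned in the same article.
            counts.update(ids - {target})
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Invented example data; article keys and groupings are hypothetical.
    mentions = {
        "article-A": {"1KX3", "1JJ2"},
        "article-B": {"1KX3", "1JJ2", "1FFK"},
        "article-C": {"1TSR"},
    }
    print(co_occurring_ids("1KX3", mentions))
```

<p>Sorting by the number of shared articles puts the structures most often discussed together with the target at the top of the list.</p>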
</sec>
<sec>
<st>
<p>Finding Citations that are not in the Article Reference List</p>
</st>
<p>A major value of integrating literature and database content is that it identifies mentions of a structure in articles that do not include a formal citation for it in their reference lists. For example, the PDB ID <ext-link ext-link-id="3BY7" ext-link-type="pdb">3BY7</ext-link> was cited by an article in Genome Biology <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp> even before the primary citation for the protein structure was published <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>. The identification of such citations would be impossible without PMC and subsequently BioLit.</p>
</sec>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>At present, research articles for approximately 21% of the structures in the Protein Data Bank (PDB) are found in PubMed Central (PMC). This number is expected to increase rapidly given that many of the world's research funding agencies have specified that the publications associated with the research they fund must be made open access and hence available through PMC. For example, the NIH stipulates that publications arising from research it funds be accessible within one year of publication <url>http://publicaccess.nih.gov/</url>.</p>
<p>One difficulty in mining PDB IDs from text is that, at only four characters in length, they can coincide with other terms. For example, the PDB ID 3DNA had several false positive hits that were found to refer to a software package and a wet lab kit, both named "3DNA". By applying contextual criteria within the text mining and data analysis pipeline, such misleading results can be filtered out. Still, subsequent manual review is the only method for achieving total accuracy. We maintain a list of PDB IDs that are known to produce false positive hits and for which stricter criteria are applied. This simple approach has worked well for filtering out false positive articles.</p>
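<p>The idea of applying stricter criteria to IDs that are known to be ambiguous can be sketched as follows. The contextual cue used here (the word "PDB" near the mention), the window size, and all names are assumptions chosen for illustration; the actual criteria used in the BioLit pipeline are not described in this article.</p>

```python
import re

# Hypothetical list of IDs known to produce false positive hits.
AMBIGUOUS_IDS = {"3DNA"}

# Hypothetical contextual cue: the word "PDB" close to the mention.
CONTEXT_CUE = re.compile(r"\bPDB\b", re.IGNORECASE)


def accept_mention(pdb_id: str, text: str, position: int,
                   window: int = 40) -> bool:
    """Decide whether an ID matched at text[position] counts as a
    mention of a PDB structure.

    IDs on the ambiguous list are accepted only if the cue occurs
    within `window` characters on either side of the match. All other
    IDs are accepted without the extra check.
    """
    if pdb_id.upper() not in AMBIGUOUS_IDS:
        return True
    start = max(0, position - window)
    end = position + len(pdb_id) + window
    return CONTEXT_CUE.search(text[start:end]) is not None


if __name__ == "__main__":
    software = "Base-pair parameters were computed with the 3DNA software package."
    structure = "The duplex (PDB ID 3DNA) was used as a starting model."
    print(accept_mention("3DNA", software, software.index("3DNA")))
    print(accept_mention("3DNA", structure, structure.index("3DNA")))
```

<p>A rule of this kind reduces false positives for listed IDs but can also reject genuine mentions that lack the cue, which is one reason manual review remains necessary for full accuracy.</p>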
<p>The future calls for unique identifiers. Digital Object Identifiers (DOIs) are already provided for each PDB structure, in a manner similar to research articles, so the authoritative reference to the content associated with the DOI can be resolved in an Internet environment. This is a positive step, but at present few research articles use DOIs to reference structures. Moreover, DOIs do not resolve the finer parts of a macromolecular structure, for example individual polypeptide chains and ligands, each of which is often referenced specifically in research articles.</p>
<p>We describe an initial step in database and literature integration using common and reliable semantics to create the linkage between the two previously disparate resources. The intent is to show the promise of this approach for improving comprehension, not just for users of the RCSB PDB but, if implemented by other databases, for their users as well. Semantic association through database identifiers is just a first step in what is possible as PMC increases in content. Ontological terms commonly used by databases can be located in the literature, making richer and more contextual interrelationships possible. In the case of the RCSB PDB, a sign of success would be for the user to identify from literature integration a function of the protein previously unknown to them. Tool development to facilitate this type of discovery is ongoing.</p>
<sec>
<st>
<p>Availability and requirements</p>
</st>
<sec>
<st>
<p>Most Frequently Cited PDB Structures</p>
</st>
<p>Table <tblr tid="T1">1</tblr> lists the eight PDB structures most frequently cited in PMC, together with their protein names. A longer-term goal is to extract functional associations automatically from PMC.</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Structures appearing most frequently in PubMed Central (PMC), based on the citations identified using the BioLit pipeline</p></caption><tblbdy cols="3">
      <r>
         <c ca="center">
            <p>
               <b>PDB ID</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Protein Name</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Number of Articles</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <ext-link ext-link-id="1JJ2" ext-link-type="pdb">1JJ2</ext-link>
            </p>
         </c>
         <c ca="center">
            <p>Large Ribosomal Subunit</p>
         </c>
         <c ca="center">
            <p>27</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <ext-link ext-link-id="1J5E" ext-link-type="pdb">1J5E</ext-link>
            </p>
         </c>
         <c ca="center">
            <p>30S Ribosomal Subunit</p>
         </c>
         <c ca="center">
            <p>19</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <ext-link ext-link-id="1FFK" ext-link-type="pdb">1FFK</ext-link>
            </p>
         </c>
         <c ca="center">
            <p>Large Ribosomal Subunit</p>
         </c>
         <c ca="center">
            <p>19</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <ext-link ext-link-id="1LMB" ext-link-type="pdb">1LMB</ext-link>
            </p>
         </c>
         <c ca="center">
            <p>Lambda Repressor</p>
         </c>
         <c ca="center">
            <p>19</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <ext-link ext-link-id="1AAY" ext-link-type="pdb">1AAY</ext-link>
            </p>
         </c>
         <c ca="center">
            <p>Zinc Finger</p>
         </c>
         <c ca="center">
            <p>17</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <ext-link ext-link-id="1TSR" ext-link-type="pdb">1TSR</ext-link>
            </p>
         </c>
         <c ca="center">
            <p>P53</p>
         </c>
         <c ca="center">
            <p>16</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <ext-link ext-link-id="1F88" ext-link-type="pdb">1F88</ext-link>
            </p>
         </c>
         <c ca="center">
            <p>Rhodopsin</p>
         </c>
         <c ca="center">
            <p>15</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <ext-link ext-link-id="1BRS" ext-link-type="pdb">1BRS</ext-link>
            </p>
         </c>
         <c ca="center">
            <p>Barnase/Barstar complex</p>
         </c>
         <c ca="center">
            <p>14</p>
         </c>
      </r>
   </tblbdy></tbl>
<p>The open access literature for RCSB PDB entries is available from the Literature tab for each structure entry at <url>http://www.rcsb.org</url>. The RESTful Web Services provided by BioLit <url>http://biolit.ucsd.edu</url> are documented at <url>http://biolit.ucsd.edu/doc/rest.jsp</url>. A client library, written in Java to simplify communication with these services, can be downloaded at <url>http://biojava.org/wiki/BioLit</url>.</p>
</sec>
</sec>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>AP wrote the BioLit client library, provided the new literature view and drafted the manuscript. MAM &amp; JLF developed BioLit and provided the Web Services. MAM maintains updates of PMC and implemented the false positive filters. BTY and DD provide development and support of the servers. BB contributed to the development of the literature view. PR participated in the design and coordination. PEB conceived of the idea, directed the research and wrote sections of the manuscript. JLF coordinated and managed the BioLit project and helped draft the manuscript. All authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>We gratefully acknowledge all scientists who openly share their data in public repositories and publish their research findings in the open access literature. The RCSB PDB is managed by two members of the RCSB: Rutgers and UCSD, and is funded by NSF, NIGMS, DOE, NLM, NCI, NINDS, and NIDDK. The RCSB PDB is a member of the wwPDB.</p>
<p>This work was supported by the RCSB PDB grant NSF DBI 0829586 and by the BioLit grant NSF DBI 0544575.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Will a biological database be different from a biological journal?</p></title><aug><au><snm>Bourne</snm><fnm>PE</fnm></au></aug><source>PLoS Computational Biology</source><pubdate>2005</pubdate><volume>1</volume><issue>3</issue><fpage>179</fpage><lpage>181</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pcbi.0010034</pubid><pubid idtype="pmcid">1193993</pubid><pubid idtype="pmpid">16158097</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Open access: taking full advantage of the content</p></title><aug><au><snm>Bourne</snm><fnm>PE</fnm></au><au><snm>Fink</snm><fnm>JL</fnm></au><au><snm>Gerstein</snm><fnm>M</fnm></au></aug><source>PLoS Computational Biology</source><pubdate>2008</pubdate><volume>4</volume><issue>3</issue><fpage>e1000037</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pcbi.1000037</pubid><pubid idtype="pmcid">2275780</pubid><pubid idtype="pmpid">18369428</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Reinventing Scholarly Communication for the Electronic Age</p></title><aug><au><snm>Fink</snm><fnm>L</fnm></au><au><snm>Bourne</snm><fnm>P</fnm></au></aug><source>CTWatch Quarterly</source><pubdate>2007</pubdate><volume>3</volume><issue>3</issue></bibl><bibl id="B4"><title><p>Biocurators: contributors to the world of science</p></title><aug><au><snm>Bourne</snm><fnm>PE</fnm></au><au><snm>McEntyre</snm><fnm>J</fnm></au></aug><source>PLoS Computational Biology</source><pubdate>2006</pubdate><volume>2</volume><issue>10</issue><fpage>e142</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pcbi.0020142</pubid><pubid idtype="pmcid">1626157</pubid><pubid idtype="pmpid">17411327</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>BioLit: Integrating Biological Literature with 
Databases</p></title><aug><au><snm>Fink</snm><fnm>J</fnm></au><au><snm>Kushch</snm><fnm>S</fnm></au><au><snm>Williams</snm><fnm>P</fnm></au><au><snm>Bourne</snm><fnm>P</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2008</pubdate><volume>36</volume><issue>11</issue><fpage>W385</fpage><lpage>9</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkn317</pubid><pubid idtype="pmcid">2447735</pubid><pubid idtype="pmpid">18515836</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>The Protein Data Bank</p></title><aug><au><snm>Berman</snm><fnm>HM</fnm></au><au><snm>Westbrook</snm><fnm>J</fnm></au><au><snm>Feng</snm><fnm>Z</fnm></au><au><snm>Gilliland</snm><fnm>G</fnm></au><au><snm>Bhat</snm><fnm>TN</fnm></au><au><snm>Weissig</snm><fnm>H</fnm></au><au><snm>Shindyalov</snm><fnm>IN</fnm></au><au><snm>Bourne</snm><fnm>PE</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2000</pubdate><volume>28</volume><fpage>235</fpage><lpage>242</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/28.1.235</pubid><pubid idtype="pmcid">102472</pubid><pubid idtype="pmpid">10592235</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>GoPubMed: exploring PubMed with the Gene Ontology</p></title><aug><au><snm>Doms</snm><fnm>A</fnm></au><au><snm>Schroeder</snm><fnm>M</fnm></au></aug><source>Nucleic acids research</source><pubdate>2005</pubdate><issue>33 Web Server</issue><fpage>W783</fpage><lpage>786</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gki470</pubid><pubid idtype="pmcid">1160231</pubid><pubid idtype="pmpid">15980585</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Semantically linking and browsing PubMed abstracts with gene ontology</p></title><aug><au><snm>Vanteru</snm><fnm>BC</fnm></au><au><snm>Shaik</snm><fnm>JS</fnm></au><au><snm>Yeasin</snm><fnm>M</fnm></au></aug><source>BMC Genomics</source><pubdate>2008</pubdate><volume>9</volume><issue>Suppl 1</issue><fpage>S10</fpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1186/1471-2164-9-S1-S10</pubid><pubid idtype="pmcid">2386052</pubid><pubid idtype="pmpid">18366599</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Textpresso: an ontology-based information retrieval and extraction system for biological literature</p></title><aug><au><snm>Muller</snm><fnm>HM</fnm></au><au><snm>Kenny</snm><fnm>EE</fnm></au><au><snm>Sternberg</snm><fnm>PW</fnm></au></aug><source>PLoS Biology</source><pubdate>2004</pubdate><volume>2</volume><issue>11</issue><fpage>e309</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pbio.0020309</pubid><pubid idtype="pmcid">517822</pubid><pubid idtype="pmpid">15383839</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Automatic document classification of biological literature</p></title><aug><au><snm>Chen</snm><fnm>D</fnm></au><au><snm>Muller</snm><fnm>HM</fnm></au><au><snm>Sternberg</snm><fnm>PW</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2006</pubdate><volume>7</volume><fpage>370</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-7-370</pubid><pubid idtype="pmcid">1559726</pubid><pubid idtype="pmpid">16893465</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Enhancing the functional annotation of PDB structures in PDBsum using key figures extracted from the literature</p></title><aug><au><snm>Laskowski</snm><fnm>R</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>15;23</volume><issue>14</issue></bibl><bibl id="B12"><title><p>The FEBS Letters experiment</p></title><aug><au><snm>Ceol</snm><fnm>A</fnm></au><au><snm>Chatr-Aryamontri</snm><fnm>A</fnm></au><au><snm>Licata</snm><fnm>L</fnm></au><au><snm>Cesareni</snm><fnm>G</fnm></au></aug><source>FEBS Letters</source><volume>582</volume><issue>8</issue><fpage>1171</fpage><lpage>1177</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.febslet.2008.02.071</pubid><pubid idtype="pmpid" link="fulltext">18328820</pubid></pubidlist></xrefbib></bibl><bibl 
id="B13"><title><p>A gene network for navigating the literature</p></title><aug><au><snm>Hoffmann</snm><fnm>R</fnm></au><au><snm>Valencia</snm><fnm>A</fnm></au></aug><source>Nat Genet</source><pubdate>2004</pubdate><volume>36</volume><issue>7</issue><fpage>664</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng0704-664</pubid><pubid idtype="pmpid" link="fulltext">15226743</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Gene indexing: characterization and analysis of NLM's GeneRIFs</p></title><aug><au><snm>Mitchell</snm><fnm>JA</fnm></au><au><snm>Aronson</snm><fnm>AR</fnm></au><au><snm>Mork</snm><fnm>JG</fnm></au><au><snm>Folk</snm><fnm>LC</fnm></au><au><snm>Humphrey</snm><fnm>SM</fnm></au><au><snm>Ward</snm><fnm>JM</fnm></au></aug><source>AMIA Annu Symp Proc</source><pubdate>2003</pubdate><fpage>460</fpage><lpage>4</lpage><xrefbib><pubidlist><pubid idtype="pmcid">1480312</pubid><pubid idtype="pmpid">14728215</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Taxonomic distribution of large DNA viruses in the sea</p></title><aug><au><snm>Monier</snm><fnm>A</fnm></au><au><snm>Claverie</snm><fnm>JM</fnm></au><au><snm>Ogata</snm><fnm>H</fnm></au></aug><source>Genome Biol</source><pubdate>2008</pubdate><volume>9</volume><issue>7</issue><fpage>R106</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2008-9-7-r106</pubid><pubid idtype="pmcid">2530865</pubid><pubid idtype="pmpid">18598358</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Crystal structure of uncharacterized protein (JCVI_PEP_1096686650277) from an environmental metagenome (unidentified marine microbe, Sorcerer II Global Ocean Sampling experiment) at 2.60 A resolution</p></title><aug><au><snm>Das</snm><fnm>D</fnm></au><au><snm>Kozbial</snm><fnm>P</fnm></au><etal/></aug><source>Proteins</source><pubdate>2009</pubdate><volume>75</volume><fpage>296</fpage><lpage>307</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/prot.22360</pubid><pubid 
idtype="pmcid">2785455</pubid><pubid idtype="pmpid">19173316</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>