<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-11-S4-S24</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Proceedings</dochead>
      <bibl>
         <title>
            <p>Algorithms and semantic infrastructure for mutation impact extraction and grounding</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Laurila</snm>
               <mi>B</mi>
               <fnm>Jonas</fnm>
               <insr iid="I1"/>
               <email>j02h9@unb.ca</email>
            </au>
            <au id="A2">
               <snm>Naderi</snm>
               <fnm>Nona</fnm>
               <insr iid="I2"/>
               <email>n_nad@encs.concordia.ca</email>
            </au>
            <au id="A3">
               <snm>Witte</snm>
               <fnm>Ren&#233;</fnm>
               <insr iid="I2"/>
               <email>rwitte@encs.concordia.ca</email>
            </au>
            <au id="A4">
               <snm>Riazanov</snm>
               <fnm>Alexandre</fnm>
               <insr iid="I1"/>
               <email>alexr@unb.ca</email>
            </au>
            <au id="A5">
               <snm>Kouznetsov</snm>
               <fnm>Alexandre</fnm>
               <insr iid="I1"/>
               <email>alexk@unb.ca</email>
            </au>
            <au ca="yes" id="A6">
               <snm>Baker</snm>
               <mi>JO</mi>
               <fnm>Christopher</fnm>
               <insr iid="I1"/>
               <email>bakerc@unb.ca</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Computer Science <it>&amp;</it> Applied Statistics, University of New Brunswick, Saint John, New Brunswick, E2L 4L5, Canada</p>
            </ins>
            <ins id="I2">
               <p>Department of Computer Science &amp; Software Engineering, Concordia University, Montr&#233;al, Qu&#233;bec, H3G 1M8, Canada</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <supplement>
            <title>
               <p>Ninth International Conference on Bioinformatics (InCoB2010): Computational Biology</p>
            </title>
            <editor>Christian Sch&#246;nbach, Kenta Nakai, Tin Wee Tan and Shoba Ranganathan</editor>
            <note>Proceedings</note>
            <url>http://www.biomedcentral.com/content/pdf/1471-2164-11-S4-info.pdf</url>
         </supplement>
         <conference>
            <title>
               <p>Asia Pacific Bioinformatics Network (APBioNet) Ninth International Conference on Bioinformatics (InCoB2010)</p>
            </title>
            <location>Tokyo, Japan</location>
            <date-range>26-28 September 2010</date-range>
            <url>http://incob.apbionet.org/incob10/</url>
         </conference>
         <issn>1471-2164</issn>
         <pubdate>2010</pubdate>
         <volume>11</volume>
         <issue>Suppl 4</issue>
         <fpage>S24</fpage>
         <url>http://www.biomedcentral.com/1471-2164/11/S4/S24</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">21143808</pubid>
               <pubid idtype="doi">10.1186/1471-2164-11-S4-S24</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>2</day>
               <month>12</month>
               <year>2010</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2010</year>
         <collab>Laurila et al; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Mutation impact extraction is a hitherto unaccomplished task in state-of-the-art mutation extraction systems. Protein mutations and their impacts on protein properties are hidden in scientific literature, making them poorly accessible to protein engineers and inaccessible to phenotype-prediction systems that currently depend on manually curated genomic variation databases.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We present the first rule-based approach for the extraction of mutation impacts on protein properties, categorizing their directionality as positive, negative or neutral. Furthermore, protein and mutation mentions are grounded to their respective UniProtKB IDs, and selected protein properties, namely protein functions, are grounded to concepts found in the Gene Ontology. The extracted entities are used to populate an OWL-DL Mutation Impact ontology, facilitating complex querying for mutation impacts using SPARQL. We illustrate retrieval of proteins and mutant sequences for a given direction of impact on specific protein properties. Moreover, we provide programmatic access to the data through semantic web services using the SADI (Semantic Automated Discovery and Integration) framework.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We address the problem of access to legacy mutation data in unstructured form through the creation of novel mutation impact extraction methods, which are evaluated on a corpus of full-text articles on haloalkane dehalogenases, tagged by domain experts. Our approaches show state-of-the-art levels of precision and recall for Mutation Grounding and a respectable level of precision but lower recall for the task of Mutant-Impact relation extraction. The system is deployed using text mining and semantic web technologies with the goal of publishing to a broad spectrum of consumers.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Introduction</p>
         </st>
         <p>Annotation of protein mutants with their new properties is crucial to the understanding of genetic mechanisms, biological processes and the complex diseases or phenotypes that may result. Despite attempts to manually organize variation information, e.g. the Protein Mutant Database <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and the Human Genome Variation Society <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, the amount of information is increasing exponentially, so that such databases are perpetually out of date, with a latency of many years. In recent years the extraction of mutation mentions from biomedical documents has been a growing area of research. A number of information systems target the extraction of mutation mentions from the biomedical literature to permit the reuse of knowledge about mutation impacts. These include work by Rebholz-Schuhmann et al. <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, <it>MuteXt</it> by <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> and <it>Mutation Miner</it> by <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The <it>MutationFinder</it> system <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> extended the rules of <it>MuteXt</it> for point mutation extraction. The <it>mSTRAP</it> system created by <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> was developed to extract mutations, represent them as instances of an ontology and, through the <it>mSTRAPviz</it> client, query the populated ontology and visualize the mutations and annotations on protein structures and homology models. <it>Mutation GraB</it> <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> proposed the use of graph bigrams to disambiguate the extracted protein point mutations. The <it>MuGeX</it> system extracts mutation-gene pairs <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Two recent systems by Krallinger et al. <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> and Winnenburg et al. <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> ground mutation mentions, as does the mSTRAP system <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
         <p>However, little work exists on automatically detecting and extracting mutation impacts. An exception is EnzyMiner <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, which was developed to automatically classify PubMed abstracts based on the impact of a protein-level mutation on the stability and the activity of a given enzyme. In EnzyMiner, the predefined patterns of <it>MuGeX</it> are used to extract the mutations, and a machine learning approach is taken to distinguish cell line names and strain names from mutations. Using a document classifier, the abstracts containing mutations without any impacts are removed and the remaining abstracts are classified into two groups, disease related and non-disease related documents, after which extracted mutations are listed for each group. The non-disease related abstracts are sub-classified into two groups: documents containing impacts on stability, and documents containing impacts on functionality. This method of document classification can be useful in narrowing down search results, but from the perspective of reuse and document annotation, more detailed methods for sentence-level detection, extraction and grounding of mutation impact information are required. In the current paper we present a rule-based approach for the extraction of mutation impacts on protein properties, categorizing their directionality and grounding these entities to external resources. The system populates an RDF triple store and the algorithms are deployed as semantic web services.</p>
         <sec>
            <st>
               <p>Content overview</p>
            </st>
            <p>The Methods section starts by describing our text mining pipeline (with named entity recognition and grounding of named entities to real-world entities). It then outlines a mutation impact ontology specification and describes the methods used to deploy mutation impact knowledge on the web. The Results section presents evaluations of the different subtasks and discusses these results in the context of future improvements. Finally, we provide a Conclusion and an outline of future work.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Named entity recognition</p>
            </st>
            <p>The first step of a mutation impact extraction system is to find named entities throughout the text. These include <it>mutations</it>, <it>protein properties</it> and words describing impact <it>directionality</it>, as in the following sentence:</p>
            <p>&#8220;The <b>W125F</b> mutant showed only a slight <b>reduction</b> of <b>activity</b> (V<it><sub>max</sub></it>) and a larger <b>increase</b> of <b>K</b><it><sub>m</sub></it> with 1,2-dibromoethane.&#8221; <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
            <p><it>Protein, gene</it> and <it>organism names</it> also have to be recognized for the system to be able to properly ground mutations and protein properties:</p>
            <p>&#8220;<b>Haloalkane dehalogenase (DhlA)</b> from <b>Xanthobacter autotrophicus GJ10</b> hydrolyses terminally chlorinated and brominated n-alkanes to the corresponding alcohols.&#8221; <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
            <p>We use GATE in combination with gazetteer lists created from a variety of resources and rules written in the JAPE language to find these entities. The following sections describe these methods in more detail. See Figure <figr fid="F1">1</figr> for a system overview.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Extraction and grounding framework</p>
               </caption>
               <text>
                  <p><b>Extraction and grounding framework</b>. Full-text documents (1) are run through a GATE pipeline with gazetteers derived from Swiss-Prot (2) and created with MutationFinder (3). Mutations and proteins are grounded (4). Protein properties are extracted with use of MuNPEx and custom JAPE rules (5) and grounded to the Gene Ontology when applicable. The impact extractor (6) makes use of the previous annotations to establish relations between mutants and impacts on protein properties. The output consists of annotated text (8).</p>
               </text>
               <graphic file="1471-2164-11-S4-S24-1"/>
            </fig>
            <sec>
               <st>
                  <p>Mutations</p>
               </st>
               <p>To extract mutation mentions we used the MutationFinder system <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The system employs a complex set of regular expressions and is currently the best available tool for point mutation extraction. Full-text documents are first run through MutationFinder to create gazetteer lists containing mutation mentions that are compliant with the GATE framework. MutationFinder is also able to normalize mutations into <it>wNm</it> format, where <it>w</it> and <it>m</it> are one-letter codes for the wildtype and mutant residues, and <it>N</it> is the position in the amino acid sequence. Because normalization is required prior to the mutation grounding task, we add the normalized form as a feature to each gazetteer entry.</p>
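               <p>The <it>wNm</it> normalization described above can be illustrated with a minimal sketch. It is not MutationFinder's implementation, whose rule set is far larger; the function name <it>normalize_mutation</it> and the two patterns are assumptions made for illustration, and only one-letter and three-letter point mutation mentions are handled.</p>
               <p><![CDATA[
```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
ONE_LETTER = "".join(sorted(THREE_TO_ONE.values()))

# Hypothetical patterns: "W125F" and "Trp125Phe" style mentions only.
SHORT = re.compile(rf"^([{ONE_LETTER}])(\d+)([{ONE_LETTER}])$")
LONG = re.compile(r"^([A-Za-z]{3})(\d+)([A-Za-z]{3})$")


def normalize_mutation(mention):
    """Return the mention in wNm form, or None if it is not recognized."""
    if SHORT.match(mention):
        return mention  # already in wNm form
    match = LONG.match(mention)
    if match:
        wild = THREE_TO_ONE.get(match.group(1).upper())
        mutant = THREE_TO_ONE.get(match.group(3).upper())
        if wild and mutant:
            return f"{wild}{match.group(2)}{mutant}"
    return None
```
]]></p>
               <p>Under these assumptions, both <it>W125F</it> and <it>Trp125Phe</it> map to the same normalized form, which is the property the grounding step relies on.</p>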
            </sec>
            <sec>
               <st>
                  <p>Proteins, genes and organisms</p>
               </st>
               <p>The protein database Swiss-Prot, a manually annotated part of UniProtKB <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, was used to select protein, gene and organism names. The use of Swiss-Prot is motivated by its high-quality naming and its mappings between names and protein sequences. The text format version of Swiss-Prot was encoded into gazetteer lists compliant with GATE. Mappings between names and primary accession numbers, and mappings between primary accession numbers and amino acid sequences, are exported to a local database named the Mutation Grounding Database (MGDB), for later use in the grounding / disambiguation step described in the <it>Grounding</it> section. Protein and gene names containing more than one word are separated from names with only one word. The former are put in a gazetteer list for case-insensitive matching of longer names to increase recall, and the latter are used for case-sensitive matching of shorter names to increase precision. The organism names, both scientific (Latin genus and species) and English, are put in a single gazetteer list for case-insensitive matching.</p>
            </sec>
            <sec>
               <st>
                  <p>Protein properties</p>
               </st>
               <p>Functions of proteins, as described in the Gene Ontology, are either activities e.g. <it>carbonate dehydratase activity</it>, or bindings to another entity e.g. <it>zinc ion binding.</it> To capture mentions of these functions in text we look for noun phrases with one of the words <it>activity</it>, <it>binding</it>, <it>affinity</it> or <it>specificity</it> as the head noun. This is accomplished by using <it>MuNPEx</it>, which is a multi-lingual noun phrase extraction component developed for the GATE architecture <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
               <p>Kinetic variables are used to describe different features of an enzymatic reaction. They can, for example, describe how well the enzyme binds to the substrate or how efficient the overall catalysis is. Although they have to be interpreted in the context of the specific enzyme and substrates to be understood fully, we still want to extract how these variables are impacted by mutations. This information can then be used in further enzyme-dependent reasoning or by domain experts who are already capable of interpreting the meaning of these kinetic variables. In our implementation we annotate the Michaelis constant <it>K<sub>M</sub></it>, the rate constant <it>k<sub>cat</sub></it> and the compound variable <it>k<sub>cat</sub>/K<sub>M</sub>.</it> This is accomplished with rules written in the JAPE language, which also ensure that variables are not part of a more complex variable or equation. Other protein properties such as <it>stability</it> are not considered in the current implementation.</p>
            </sec>
            <sec>
               <st>
                  <p>Impact directions</p>
               </st>
               <p>To extract the actual impacts on protein properties we need terms describing directionality or the existence of a change. For example the negative impact on carbonate dehydratase activity of carbonic anhydrase II, which is due to two point mutations, might be described as: &#8220;<it>The double mutant had intact conformation but reduced catalytic activity (30-40%) compared to HCA II<sub>pwt</sub></it>&#8221; <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. In this example the word <it>reduced</it> and to some extent <it>intact</it> are keywords describing directionality of impacts.</p>
               <p>In our implementation we used five different gazetteer lists categorized as positive, negative, neutral, non-neutral and negation. The gazetteers were created by domain experts who extracted words describing directionality from sentences containing protein functions. To avoid the need for a stemmer, the gazetteers were extended with other grammatical forms of the words already extracted. A total of 337 sentences containing protein functions were extracted from a corpus of documents about mutations in carbonic anhydrases and apolipoproteins, and the resulting gazetteer lists contain a total of 85 words describing directionality. An overview of the direction gazetteer lists is presented in Table <tblr tid="T1">1</tblr>.</p>
               <tbl id="T1">
                  <title>
                     <p>Table 1</p>
                  </title>
                  <caption>
                     <p>Categorized directionality words.</p>
                  </caption>
                  <tblbdy cols="6">
                     <r>
                        <c ca="left">
                           <p>
                              <b>Positive</b>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <b>Negative</b>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <b>(cont.)</b>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <b>Neutral</b>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <b>Negation</b>
                           </p>
                        </c>
                        <c ca="left">
                           <p>
                              <b>Non-Neutral</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="6">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>increase</p>
                        </c>
                        <c ca="left">
                           <p>abolish</p>
                        </c>
                        <c ca="left">
                           <p>loose</p>
                        </c>
                        <c ca="left">
                           <p>identical</p>
                        </c>
                        <c ca="left">
                           <p>without</p>
                        </c>
                        <c ca="left">
                           <p>affect</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>-increases</p>
                        </c>
                        <c ca="left">
                           <p>decrease</p>
                        </c>
                        <c ca="left">
                           <p>defect</p>
                        </c>
                        <c ca="left">
                           <p>similar</p>
                        </c>
                        <c ca="left">
                           <p>no</p>
                        </c>
                        <c ca="left">
                           <p>effect</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>-increased</p>
                        </c>
                        <c ca="left">
                           <p>reduce</p>
                        </c>
                        <c ca="left">
                           <p>disrupt</p>
                        </c>
                        <c ca="left">
                           <p>full</p>
                        </c>
                        <c ca="left">
                           <p>not</p>
                        </c>
                        <c ca="left">
                           <p>alter</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>-increasing</p>
                        </c>
                        <c ca="left">
                           <p>lower</p>
                        </c>
                        <c ca="left">
                           <p>diminish</p>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                        <c ca="left">
                           <p>differ</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>enhance</p>
                        </c>
                        <c ca="left">
                           <p>inhibit</p>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>higher</p>
                        </c>
                        <c ca="left">
                           <p>impair</p>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>improve</p>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                        <c ca="left">
                           <p/>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Grounding</p>
            </st>
            <p>Grounding is the task of cross-linking entities found in text with their real-world counterparts. Protein mentions are grounded when they have been assigned the correct UniProtKB ID, and for mutation entities the grounding task is to map mutation mentions to the correct amino acid residues of sequences stored in the UniProtKB <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. In the case of protein functions we define grounding as the establishment of a link from these mentions to the correct Gene Ontology concept. Kinetic variable grounding is straightforward in our current implementation, as we only consider three different variables: <it>K<sub>M</sub>, k<sub>cat</sub></it> and <it>k<sub>cat</sub>/K<sub>M</sub>.</it> Links to the substrates being acted upon would serve as a more granular grounding and would make it possible to query for impact information more precisely, but for the time being we do not establish these links to substrates.</p>
            <sec>
               <st>
                  <p>Proteins and mutations</p>
               </st>
               <p>The method we use for protein and mutation grounding was previously described by <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> and is summarized below:</p>
               <p>In the first stage a pool of candidate protein accession numbers is generated based on mappings of gene and protein names occurring in the target documents to accession numbers in MGDB. To ensure a comprehensive pool of candidate accession numbers, and avoid errors as a result of poor co-reference resolution techniques (i.e. not linking shorter names in text to the previously mentioned long form stated earlier in text), all accessions for names in MGDB with additional suffixes to the original protein or gene name are also extracted. A pool of candidate accession numbers is generated for each document and trimmed to contain only the most frequently occurring accession numbers. For these proteins all extracted organism mentions are cross checked. Accession numbers not related to any retrieved organism mentions are discarded and the protein sequences of candidate proteins are retrieved from MGDB.</p>
               <p>In the second step mutations extracted from the text are mapped onto the candidate sequences using regular expressions generated from the mutation mentions extracted from the text. Mapping mentioned mutations to the correct position on the correct sequence is a non-trivial task. False positives can occur as a consequence of DNA level variations, plasmid names and cases where the numbering scheme used by authors can differ from the one used in sequence databases, e.g. as a consequence of N-terminal methionine cleavage or other post-translational modifications. These issues are discussed further in <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>.</p>
               <p>In brief, the mutation grounding algorithm works as follows. For each possible pair of mutations, we create a regular expression from the wildtype residues and the distance between them: for two normalized mutation mentions <it>w</it><sub>1</sub><it>N</it><sub>1</sub><it>m</it><sub>1</sub> and <it>w</it><sub>2</sub><it>N</it><sub>2</sub><it>m</it><sub>2</sub>, sorted in ascending order of <it>N<sub>i</sub></it>, the regular expression is <it>w</it><sub>1</sub>.{<it>N</it><sub>2</sub> &#8211; <it>N</it><sub>1</sub> &#8211; 1}<it>w</it><sub>2</sub>, where the dot matches any single residue. For example, <it>A</it>378<it>C</it> and <it>S</it>381<it>L</it> result in <it>A..S</it>. If a regular expression matches a sequence, we check for the remaining mutations in the set, one after another, taking into account the numbering displacement found when using the regular expression.</p>
               <p>The output of the algorithm is the accession number and corresponding sequence onto which most mutations are grounded, which is considered to be the wildtype sequence of the protein described in the document. Mutation mentions that do not match the sequence are discarded, and in cases where two sequences are identified, the sequence with the least displacement from the mutation numbering in the paper is chosen.</p>
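               <p>The pairwise matching and displacement check summarized above can be sketched as follows. This is a simplified illustration rather than the implementation described in <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>: the function names <it>parse</it>, <it>pair_pattern</it>, <it>ground</it> and <it>best_candidate</it> are hypothetical, candidate sequences are passed in as a plain dictionary instead of being read from MGDB, and the handling of ties is an assumption based on the least-displacement rule.</p>
               <p><![CDATA[
```python
import re
from itertools import combinations


def parse(mutation):
    """Split a normalized wNm mention into (wildtype, position, mutant)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if m is None:
        raise ValueError(f"not in wNm form: {mutation}")
    return m.group(1), int(m.group(2)), m.group(3)


def pair_pattern(first, second):
    """Regex w1.{N2 - N1 - 1}w2 for two parsed mutations with N1 < N2."""
    gap = second[1] - first[1] - 1
    return f"{first[0]}.{{{gap}}}{second[0]}"


def ground(mutations, sequence):
    """Return (offset, grounded mentions) for the best alignment found.

    offset is the displacement such that residue N in the paper sits at
    0-based index N - 1 + offset in the sequence.
    """
    parsed = sorted((parse(m) for m in mutations), key=lambda t: t[1])
    best = (0, [])
    for first, second in combinations(parsed, 2):
        if first[1] == second[1]:
            continue  # same position: no usable distance
        # Lookahead so that overlapping matches are all reported.
        pattern = re.compile(f"(?={pair_pattern(first, second)})")
        for hit in pattern.finditer(sequence):
            offset = hit.start() - (first[1] - 1)
            grounded = [
                f"{w}{n}{m}" for w, n, m in parsed
                if 0 <= n - 1 + offset < len(sequence)
                and sequence[n - 1 + offset] == w
            ]
            better = len(grounded) > len(best[1])
            tie = len(grounded) == len(best[1]) and abs(offset) < abs(best[0])
            if better or tie:
                best = (offset, grounded)
    return best


def best_candidate(mutations, candidates):
    """Pick the accession grounding most mutations; ties go to the
    smallest displacement (assumed tie-break)."""
    scored = {acc: ground(mutations, seq) for acc, seq in candidates.items()}
    return max(scored.items(), key=lambda kv: (len(kv[1][1]), -abs(kv[1][0])))
```
]]></p>
               <p>With the example from the text, <it>pair_pattern</it> applied to <it>A</it>378<it>C</it> and <it>S</it>381<it>L</it> yields a pattern with two wildcard positions between <it>A</it> and <it>S</it>. Mentions whose wildtype residue does not appear at the displaced position are left out of the grounded list, mirroring the discard step.</p>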
            </sec>
            <sec>
               <st>
                  <p>Protein functions</p>
               </st>
               <p>For grounding of protein function mentions we use the Molecular Function part of the Gene Ontology as a reference vocabulary. The terms in the Gene Ontology are already used for annotation of Swiss-Prot entries to describe the properties of proteins. This means that we can leverage these mappings between the proteins we have grounded and the protein functions we are looking for. We can then use the information on related functions to ground protein function mentions found throughout the document. In addition to creating links to Gene Ontology concepts, the relevance of each protein function mention is scored based on its similarity to synonyms of a given Gene Ontology concept. To measure this similarity, the protein function mentions are first split into words, stop words are then removed, and finally the remaining words are stemmed using the Snowball English stemmer <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. The resulting set of words (<it>N</it>) is then compared with each synonym (<it>G</it>) of the Gene Ontology concept, prepared in the same way, by measuring the relative intersection as below:</p>
               <p>
                  <display-formula>
                     <graphic file="1471-2164-11-S4-S24-i1.gif"/>
                  </display-formula>
               </p>
               <p>After comparisons have been made to all synonyms, the highest similarity score is chosen and added as a feature, together with the id of the related Gene Ontology concept, to the protein function mention annotation. In the next section, <it>Relation detection</it>, we show how these similarity scores, together with mutant-impact relation scores and impact scores, are used to resolve contradictions in the output annotations. In order to increase the number of synonyms, and hence the number of highly and correctly scored protein function mentions, synonyms of ancestors of the retrieved Gene Ontology concepts are also used for comparison.</p>
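               <p>The scoring procedure can be sketched under explicit assumptions. The exact normalization of the relative intersection is given by the display formula; the sketch below assumes division by the size of the union of the two word sets, which may differ from the formula used. The stop word list and the <it>crude_stem</it> function are simplified stand-ins for the stop word removal and Snowball stemming described above, and all function names are hypothetical.</p>
               <p><![CDATA[
```python
# Hypothetical, deliberately small stop word list.
STOP_WORDS = {"of", "the", "a", "an", "to", "in", "and", "or", "for"}


def crude_stem(word):
    """Stand-in for the Snowball English stemmer: strips a few suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def prepare(phrase):
    """Split into words, drop stop words, stem what remains."""
    words = phrase.lower().replace("-", " ").split()
    return {crude_stem(w) for w in words if w not in STOP_WORDS}


def similarity(mention, synonym):
    """Relative intersection of the word sets N and G.

    Assumption: normalized by the size of the union of N and G.
    """
    n, g = prepare(mention), prepare(synonym)
    if not n or not g:
        return 0.0
    return len(n & g) / len(n | g)


def best_score(mention, synonyms):
    """Highest similarity over all synonyms of one concept."""
    return max((similarity(mention, s) for s in synonyms), default=0.0)
```
]]></p>
               <p>In this sketch, a mention that shares every stemmed word with a synonym scores 1, and the score falls as either side contains words the other lacks. Adding ancestor synonyms simply enlarges the list passed to <it>best_score</it>.</p>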
            </sec>
         </sec>
         <sec>
            <st>
               <p>Relation detection</p>
            </st>
            <p>In order to establish legitimate links between previously recognized and in some cases grounded entities, we need to detect the relations between them. For the purpose of mutation impact extraction we recognize relations between directionality words and protein properties which, taken together as a triple, constitute impact statements. Relations between mutants and these impacts are also detected. The two methods make use of heuristics based on entity distance.</p>
            <sec>
               <st>
                  <p>Impacts</p>
               </st>
               <p>Impacts can be seen as relations between protein properties and words describing directionality, or change. In order to extract these relations we use a set of rules (Figure <figr fid="F2">2</figr>) which are applied to the documents with properties (protein functions and kinetic variables) and directionality words found in them.</p>
               <fig id="F2">
                  <title>
                     <p>Figure <figr fid="F2">2</figr></p>
                  </title>
                  <caption>
                     <p>Rules for impact classification.</p>
                  </caption>
                  <text>
                     <p>
                        <b>Rules for impact classification.</b>
                     </p>
                  </text>
                  <graphic file="1471-2164-11-S4-S24-2"/>
               </fig>
               <p>Since impacts on different properties can occur in the same sentence, sentences containing two or more properties are split at the comma character or the word <it>and</it>. If neither delimiter is found, the sentence is split just before or after the next or previous property, depending on order. The impacts are also scored according to the distance between directionality words and protein properties:</p>
               <p>
                  <display-formula>
                     <graphic file="1471-2164-11-S4-S24-i2.gif"/>
                  </display-formula>
               </p>
               <p>where <it>tokenDistance</it> is the number of space tokens between the directionality word and the protein property. If the directionality word is part of the noun phrase of a property, the distance is set to 1.</p>
            </sec>
            <sec>
               <st>
                  <p>Mutant-impact relations</p>
               </st>
               <p>Once impacts have been extracted and classified according to directionality, we need to find the mutant that exhibits this change in protein property relative to the wildtype. Mutants can be described in many ways: (i) as a series of mutations, e.g. &#8220;Arg172Lys+His65Ala&#8221;, (ii) with a short <it>nickname</it> specific to the paper, e.g. &#8220;Mut1&#8221;, (iii) as a pronominal reference, e.g. &#8220;The triple mutant&#8221;, or (iv) simply by a single point mutation. In our implementation, each grounded mutation mention constitutes one single mutant. To extract the relation between mutants and impacts, we assume that when an impact is found, the closest mutants all have that impact. Closeness is measured by sentence distance and is scored as:</p>
               <p>
                  <display-formula>
                     <graphic file="1471-2164-11-S4-S24-i3.gif"/>
                  </display-formula>
               </p>
               <p>where <it>sentenceDistance</it> equals 1 if a mutation mention occurs in the same sentence as the impact and increases by 1 for each previous sentence, limited to at most three previous sentences. Only mutations with the shortest distance are considered.</p>
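               <p>The distance rule can be sketched as follows. The sketch covers only the computation of <it>sentenceDistance</it> and the selection of the closest mutations, not the score itself. Sentence indices are hypothetical, and the treatment of mutations mentioned after the impact sentence (they are ignored) is an assumption.</p>

```python
MAX_PREVIOUS_SENTENCES = 3


def sentence_distance(impact_sentence, mutation_sentence):
    """1 for the same sentence, plus 1 per previous sentence.
    Returns None when the mention is out of range (assumed to be ignored)."""
    gap = impact_sentence - mutation_sentence
    if gap in range(0, MAX_PREVIOUS_SENTENCES + 1):
        return gap + 1
    return None


def closest_mutations(impact_sentence, mutation_mentions):
    """mutation_mentions: list of (mutation, sentence index) pairs.
    Returns the mutations at the shortest distance and that distance."""
    distances = {}
    for mutation, index in mutation_mentions:
        d = sentence_distance(impact_sentence, index)
        if d is not None:
            distances[mutation] = min(d, distances.get(mutation, d))
    if not distances:
        return [], None
    shortest = min(distances.values())
    return sorted(m for m, d in distances.items() if d == shortest), shortest
```

               <p>For an impact in sentence 5 with mentions of D260N in sentence 4 and of W125R and W175R in sentence 5, only W125R and W175R are kept, both at distance 1.</p>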
               <p>To resolve contradictions in the output annotations, e.g. when a mutant is said to have both a negative and a positive impact on a specific property, the arithmetic mean of all scores gathered through the process is used, i.e. the mutant-impact relation score, the impact score and the similarity score between function mentions and Gene Ontology concepts. For kinetic variables the similarity score is omitted since it is not measured. A higher score means higher similarity to the Gene Ontology concept and shorter distances between directionality, property and mutant terms, making the overall assertion more likely to be correct.</p>
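               <p>A minimal sketch of this step is given below. The dictionary layout and the numeric scores are hypothetical, and keeping only the highest-scoring assertion per mutant and property is an assumption about how the mean is applied.</p>

```python
from statistics import mean


def assertion_score(relation_score, impact_score, similarity_score=None):
    """Arithmetic mean of the available scores.
    The similarity score is omitted (None) for kinetic variables."""
    scores = [relation_score, impact_score]
    if similarity_score is not None:
        scores.append(similarity_score)
    return mean(scores)


def resolve(assertions):
    """assertions: dicts with keys mutant, property, direction and score.
    Keeps, per (mutant, property) pair, the assertion with the highest score."""
    best = {}
    for assertion in assertions:
        key = (assertion["mutant"], assertion["property"])
        if key not in best or assertion["score"] > best[key]["score"]:
            best[key] = assertion
    return list(best.values())
```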
            </sec>
         </sec>
         <sec>
            <st>
               <p>Mutation impact ontology</p>
            </st>
            <p>To ensure that the results of our text mining pipeline are reusable and understandable by both humans and machines, we have formally specified the concepts used by our system in an OWL-DL ontology, with a small set of SWRL rules added for more convenient querying. The ontology we use is an extension of the ontology proposed by <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and will serve both as the T-Box for a triple store populated with results from our text mining pipeline and as the basis for publishing our text mining pipeline as SADI services, making it possible to deploy our pipeline as semantic web services connected to other existing services. These two ways of publishing knowledge are described in more detail in the next section, <it>Web based deployment</it>. Table <tblr tid="T2">2</tblr> gives more precise definitions of the most important concepts and Figure <figr fid="F3">3</figr> displays a schematic view of the concepts and the relations between them. In addition to object properties connecting instances of concepts, datatype properties are used to associate data values with such instances, e.g. <it>hasSequence</it> and <it>hasWildtypeResidue</it> associate string values with instances of <it>Protein</it> and <it>PointMutation</it> respectively. Some of the concepts are closely related to concepts in already existing ontologies. For example, the concept <it>ProteinFunction</it> in our ontology can be considered equivalent to <it>Molecular Function</it> in the Gene Ontology. Making these alignments further enhances the querying ability and the options for knowledge discovery. A user could, for example, search for all mutations that have had a positive impact on a specific protein function, specified as a sub-concept of <it>MolecularFunction</it>. This type of query would not be possible without the grounding of protein properties provided by our algorithm.
The ontology, hereafter named Mutation Impact Ontology, is publicly available <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Concepts in the Mutation Impact Ontology and their descriptions.</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Concept</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Description</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Protein</p>
                     </c>
                     <c ca="left">
                        <p>Proteins, also known as polypeptides, are organic compounds made of amino acids arranged in a linear chain and folded into a globular form.</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Protein Mutant</p>
                     </c>
                     <c ca="left">
                        <p>A protein mutant is a protein where the amino acid sequence is altered compared to the wildtype protein. These alterations are called mutations.</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Protein Property</p>
                     </c>
                     <c ca="left">
                        <p>The physical, chemical and biological properties of proteins, for example stability and function.</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Elementary Mutation</p>
                     </c>
                     <c ca="left">
                        <p>An elementary change in the amino acid sequence of a protein.</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mutation Series</p>
                     </c>
                     <c ca="left">
                        <p>A set of elementary mutations.</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mutation Specification</p>
                     </c>
                     <c ca="left">
                        <p>An umbrella concept introduced as a link between mutations, their corresponding proteins, the impacts they cause and the texts.</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mutation Impact</p>
                     </c>
                     <c ca="left">
                        <p>A mutation impact describes a directional alteration of a protein.</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F3">
               <title>
                  <p>Figure <figr fid="F3">3</figr></p>
               </title>
               <caption>
                  <p>Mutation impact ontology structure.</p>
               </caption>
               <text>
                  <p><b>Mutation impact ontology structure.</b> Visualization of top-level concepts such as <it>Mutation Specification, Protein, Mutation Impact</it> and <it>Protein Property</it>, connected through object properties. Detailed descriptions of the concepts are provided in Table <tblr tid="T2">2</tblr>.</p>
               </text>
               <graphic file="1471-2164-11-S4-S24-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Web based deployment</p>
            </st>
            <p>The most straightforward way to deliver the results of our text mining pipeline to end users is to run the pipeline on available publications, store the results in a triplestore and provide a query interface. We have set up such a triplestore using Sesame <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, which is a framework that allows different storage and querying engines to be used via a unified interface. Our users can query the populated RDF triplestore via a SPARQL <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> endpoint <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Figure <figr fid="F4">4</figr> shows an example query which, translated into a natural language question, reads &#8220;<it>Which proteins have been mutated so that there is a negative impact on haloalkane dehalogenase activity and what are the sequences of the corresponding mutants?</it>&#8221;. Figure <figr fid="F5">5</figr> shows how mutation impact information is made available for the user through both SPARQL endpoints and SADI clients as discussed below.</p>
            <fig id="F4">
               <title>
                  <p>Figure <figr fid="F4">4</figr></p>
               </title>
               <caption>
                  <p>SPARQL query and answers.</p>
               </caption>
               <text>
                  <p><b>SPARQL query and answers.</b> A SPARQL query expressing the natural language question &#8220;Which proteins have been mutated so that there is a negative impact on haloalkane dehalogenase activity and what are the sequences of the corresponding mutants?&#8221; is shown to the left. The first four answers (result rows) are displayed to the right.</p>
               </text>
               <graphic file="1471-2164-11-S4-S24-4"/>
            </fig>
            <fig id="F5">
               <title>
                  <p>Figure <figr fid="F5">5</figr></p>
               </title>
               <caption>
                  <p>Mutation impact knowledge flow.</p>
               </caption>
               <text>
                  <p><b>Mutation impact knowledge flow.</b> The text-to-entity SADI service uses the text mining pipeline to extract mutations and impacts from a given text. The results are saved in an RDF triple store. The triple store can then be interrogated, either by a user through a SPARQL endpoint or by a second layer of entity-to-entity SADI services that in turn can be accessed through a SADI client.</p>
               </text>
               <graphic file="1471-2164-11-S4-S24-5"/>
            </fig>
            <sec>
               <st>
                  <p>SADI-compliant semantic web services</p>
               </st>
               <p>Although querying the triplestore can serve many useful information requests, such as searching for publications related to various biological entities or simply searching for links between the entities, we aim to make this data available in a format that is suitable for rapid data integration. This can be achieved by integrating our pipeline with other sources of semantically described biological data and analytical resources, so that queries can combine our data with external data and with data generated by externally hosted algorithms. For example, if some other resource is able to link proteins to pathways, combining it with our pipeline (which can link mutations to proteins) would make it possible to find a pathway in which a mutated protein participates. The SADI framework <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> provides a convenient way to facilitate such combinations. SADI is a set of conventions for creating Semantic Web Services (SWS) that can be <it>automatically discovered and orchestrated</it>. A SADI-compliant SWS consumes an RDF graph with some designated node (individual) as input. The output is an RDF graph similar to the input but with some new property assertions. The most important feature of SADI is that the predicates for these property assertions are fixed for each service. A declaration of these predicates, available online, constitutes a <it>semantic description</it> of the service. For example, if a service is declared with the predicate <it>myontology</it>:<it>isTargetOfDrug</it>, described in an ontology as a relation linking proteins to drugs, we know that we can use the service to search for drugs targeting a given protein. More importantly, such semantic descriptions allow <it>completely automatic discovery</it> and <it>composition</it> of SADI services (see, e.g., <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>).
Practically, this means that the publication of our pipeline as SADI services will allow automatic integration with hundreds of external resources dealing with mutations, proteins and related biomedical entities, e.g., pathways and drugs. As an initial implementation with SADI, we created a service that takes a reference to a text and outputs the property assertions derived from that text, such as links to the identified grounded mutations. Note that those grounded mutations also have links to ungrounded mutations, proteins and impacts. This service is most useful in combination with services that find documents, as well as for users who simply wish to use our pipeline remotely (with no installation effort). In fact, we use this service ourselves to populate the previously mentioned RDF triple store. As the service output already constitutes an RDF graph, no intermediate processing is necessary.</p>
               <p>We also created services that provide mappings in different directions: from entities to texts and from entities to entities derived from texts. In fact, all these services produce instances of <it>MutationSpecification</it>, which are blank nodes linked to other objects that may be of interest. For example, we can ask about grounded mutations applying to a certain protein, and the extracted <it>MutationSpecification</it> instances will lead us to relevant impacts, or just to the documents mentioning them. Our entity-to-text and entity-to-entity services serve data from the same triplestore providing the SPARQL interface. Our services are registered at the SADI Registry and can be viewed at <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>Automatic data integration example</p>
               </st>
               <p>To exemplify SADI service composition, we present an example of a query which in natural language reads: &#8220;Retrieve all mutated proteins, together with their 3D-structure information and mutant sequence, where mutations had a positive impact on haloalkane dehalogenase activity.&#8221;</p>
               <p>To answer this query, two services have to be used together. The first service is represented by the predicate <it>impactIsSpecifiedBy</it> (inverse of <it>specifiesImpact</it>) and, for a given mutation impact, retrieves a mutation specification containing protein and mutant information, which in part answers the service request. The second service is represented by the predicate <it>has3DStructure</it> from the central SADI ontology <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. It makes use of the protein information retrieved by the first service to further retrieve the related 3D structure information in the form of Protein Data Bank identifiers.</p>
               <p>The discovery and integration of these two services can be done automatically by the use of SHARE (Semantic Health and Research Environment) <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, a SPARQL query engine that enables composition of registered SADI services.</p>
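               <p>The composition itself can be illustrated with a toy sketch in which each service is a function that returns a set of triples. This is not the SADI or SHARE API: the chaining is written by hand here, whereas SHARE performs it automatically. The predicates <it>impactIsSpecifiedBy</it>, <it>has3DStructure</it> and <it>hasSequence</it> are taken from the text; the predicate <it>specifiesProtein</it> and all node names and values are hypothetical placeholders.</p>

```python
def objects(triples, predicate):
    """Return the objects of all triples that use the given predicate."""
    return {o for (_, p, o) in triples if p == predicate}


def impact_is_specified_by(impact):
    """Toy first service: mutation impact to mutation specification.
    The specification carries protein and mutant information."""
    spec = "spec-for-" + impact
    return {
        (impact, "impactIsSpecifiedBy", spec),
        (spec, "specifiesProtein", "protein-1"),   # hypothetical predicate and node
        (spec, "hasSequence", "MUTANT-SEQUENCE"),  # placeholder value
    }


def has_3d_structure(protein):
    """Toy second service: protein to a Protein Data Bank identifier."""
    return {(protein, "has3DStructure", "pdb-entry-1")}  # placeholder identifier


def answer(impact):
    """Chain the two services: the output of the first feeds the second."""
    graph = set(impact_is_specified_by(impact))
    for protein in objects(graph, "specifiesProtein"):
        graph.update(has_3d_structure(protein))
    return graph
```

               <p>Because each service only adds assertions with its own fixed predicate, the merged graph can be built by set union without any intermediate conversion.</p>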
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Evaluation</p>
            </st>
            <p>To evaluate the methods of mutation grounding and impact extraction, a gold standard corpus was built as an extension to the corpus used by <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, containing documents about haloalkane dehalogenases. Full-text papers mainly about a single haloalkane dehalogenase were chosen. They also had to contain more than one point mutation in order for our grounding algorithm to work properly. The resulting corpus contains 13 documents, from which a domain expert extracted 54 unique (per document) mutation mentions and 73 unique mutant-impact relations, with tables and figures excluded. Mutants containing more than one point mutation were split so that each mutation was considered as one mutant. This was done to better evaluate the impact extraction task without interference from the variety of ways in which mutants are described.</p>
            <p>For both tasks we measure performance with precision and recall. For mutation grounding, precision is defined as the number of correctly grounded mutations over all grounded mutations, and recall as the number of correctly grounded mutations over all uniquely mentioned mutations. For mutant-impact relations, precision is defined as the number of correct relations over all retrieved relations, and recall as the number of correct relations over all uniquely mentioned relations. For an extracted mutant-impact relation to be considered correct, all of its parts have to be correct, i.e. the protein property that is being impacted, the direction of the impact and the causal mutation. The results are displayed in Table <tblr tid="T3">3</tblr>.</p>
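            <p>These definitions translate directly into a short sketch. Representing a relation as a (mutation, property, direction) tuple is an assumption made for illustration; comparing whole tuples enforces the requirement that every part of a relation be correct.</p>

```python
def precision_recall(retrieved, gold):
    """Precision and recall of retrieved items against a gold standard.
    Items are compared as whole tuples, so every part must match."""
    retrieved, gold = set(retrieved), set(gold)
    correct = retrieved.intersection(gold)
    precision = len(correct) / len(retrieved) if retrieved else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall
```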
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Performance evaluation made on a haloalkane dehalogenase corpus</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="center">
                        <p>Task</p>
                     </c>
                     <c ca="right">
                        <p>Precision</p>
                     </c>
                     <c ca="right">
                        <p>Recall</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Mutation grounding</p>
                     </c>
                     <c ca="right">
                        <p>0.83</p>
                     </c>
                     <c ca="right">
                        <p>0.73</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Mutant-Impact relation extraction</p>
                     </c>
                     <c ca="right">
                        <p>0.86</p>
                     </c>
                     <c ca="right">
                        <p>0.34</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>The performance of the underlying algorithms for mutation grounding and mutation-impact detection shows respectable levels of precision and recall. The performance of the grounding algorithm is in line with our previous evaluation on a medical corpus built from the COSMIC database <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, with an average <it>precision =</it> 0.84 and <it>recall =</it> 0.63. The lower performance of Mutant-Impact relation retrieval <it>(recall =</it> 0.34) in our current study is caused by several factors. Out of 45 <it>false negatives</it> (correct relations that were not retrieved), 16 were influenced by mutation mentions that were not grounded and 14 were caused by co-reference issues, e.g. when &#8220;<it>double mutant</it>&#8221; was used instead of mentions of single point mutations. Other contributing factors include shortcomings in our rules for extracting kinetic variables and protein functions, which gave rise to 12 false negatives, and lastly our method for extracting directionality words, which accounts for 8 false negatives. The two latter categories of false negatives can in some cases be illustrated by the special case of a total loss of function. This can be described in text as an inactive enzyme rather than as a decrease of function relative to wildtype, as in the example sentences below:</p>
         <p>&#8220;Replacement of Trp-125 or Trp-175 with arginine leads to a <b>nonactive</b> enzyme.&#8221; <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
         <p>&#8220;Mutation of Asp260 to asparagine resulted in a catalytically <b>inactive</b> D260N <b>mutant.</b>&#8221; <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>.</p>
         <p>We believe these issues can be addressed by developing methods for co-reference resolution of mutation mentions and by improving mutation extraction and grounding algorithms, as well as by extending the gazetteers containing words describing directionality. Textual descriptions of kinetic variables could also be used as an extension to our current abbreviation-centric method and thereby improve the recall of Mutant-Impact relation extraction. Finally, the special cases where the impact is a total loss of function can be handled by a new set of rules connecting terms describing enzymes/mutants and terms describing inactivity. Until now the tools for the extraction of mutation mentions from text have been considered appropriate for augmenting the manual curation of mutation databases, providing candidate protein point-mutation impact suggestions <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, <it>de novo</it>. However, the number of <it>reuse</it> cases where mutation information is used to facilitate new annotation and prediction algorithms is growing <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B11">11</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>, albeit dependent on semi-automatic processing of information from databases or text mining pipelines.</p>
         <p>The dedicated infrastructure we have developed for <it>fully automated</it> mutation impact extraction from unstructured text has a respectable precision of 0.86, albeit with moderate recall. Although further testing of these grounding and impact extraction algorithms on a larger corpus of documents from open access journals is required, such platforms will make it possible to assess the range of impacts that have been investigated through mutational analysis of target protein sequences and the outcomes of these investigations. This will give researchers insight into the type and scale of improvements that have been made to enzymes using existing mutagenesis approaches. Moreover, cross-referencing these improvements with the methodologies used to generate the mutations will provide further guidance to scientists in deciding on strategies for further enzyme improvement, e.g. site directed mutagenesis versus directed evolution. Beyond the summarization of such information for trend analyses, extracted and grounded mutation impact annotations will also aid protein engineers when reviewing 3D visualizations of protein structures, as described by <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Finally, publishing services that deliver mutation impact information in a format that can be readily integrated with other services will facilitate the reuse of mutation impacts by other communities, e.g. as training data for Machine Learning algorithms <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, so that tools that predict the impacts of mutations can be improved.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The challenges we addressed, namely the extraction and publication of mutation impacts, required the development and deployment of advanced solutions leveraging named entity recognition, grounding techniques and knowledge representation for mutation impacts, as well as the setup and registration of semantic web services. The major innovations were (i) to design novel impact grounding techniques and couple them with existing approaches for grounding mutations to protein sequences, and (ii) to exploit the SADI framework to expose the grounding and relation detection algorithms as semantic web services. Once operational, these services are readily findable and easy to integrate with existing semantic web services in the SADI registry. This combination provides enhanced access to legacy information using a contemporary publishing medium.</p>
      </sec>
      <sec>
         <st>
            <p>Abbreviations used</p>
         </st>
         <p>GATE: General Architecture for Text Engineering; MuNPEx: Multi-lingual Noun Phrase Extractor; JAPE: Java Annotation Patterns Engine; MGDB: Mutation Grounding Database; OWL: Web Ontology Language; SWRL: Semantic Web Rule Language; SADI: Semantic Automated Discovery and Integration; RDF: Resource Description Framework; SPARQL: SPARQL Protocol and RDF Query Language; SWS: Semantic Web Service; SHARE: Semantic Health and Research Environment; COSMIC: Catalogue Of Somatic Mutations In Cancer.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>JBL developed the rules for grounding of mutations and protein properties, contributed to the ontology design and corpora annotation. NN contributed to the pipeline design and corpora preparation. RW participated in coordinating the work and contributed to the ontology design. AR developed the web based deployment and wrote the corresponding section. AK contributed to the methods for relation scoring. CJOB led the work coordination and study design. All authors contributed to the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This research was funded in part by the New Brunswick Innovation Foundation, New Brunswick, Canada; the NSERC, Discovery Grant Program, Canada and the Quebec-New Brunswick University Co-operation in Advanced Education - Research Program, Government of New Brunswick, Canada.</p>
            <p>This article has been published as part of <it>BMC Genomics</it> Volume 11 Supplement 4, 2010: Ninth International Conference on Bioinformatics (InCoB2010): Computational Biology. The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1471-2164/11?issue=S4</url>.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Constructing a protein mutant database</p>
            </title>
            <aug>
               <au>
                  <snm>Nishikawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ishino</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Takenaka</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Norioka</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Hirai</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yao</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Seto</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Protein Eng</source>
            <pubdate>1993</pubdate>
            <volume>7</volume>
            <issue>5</issue>
            <fpage>733</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1093/protein/7.5.733</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The Challenge of Documenting Mutation Across the Genome: The Human Genome Variation Society Approach</p>
            </title>
            <aug>
               <au>
                  <snm>Cotton</snm>
                  <fnm>RG</fnm>
               </au>
               <au>
                  <snm>Horaitis</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Hum Mutat</source>
            <pubdate>2004</pubdate>
            <volume>23</volume>
            <fpage>447</fpage>
            <lpage>452</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/humu.20038</pubid>
                  <pubid idtype="pmpid" link="fulltext">15108276</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Automatic extraction of mutations from Medline and cross-validation with OMIM</p>
            </title>
            <aug>
               <au>
                  <snm>Rebholz-Schuhmann</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Marcel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Albert</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tolle</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Casari</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kirsch</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>135</fpage>
            <lpage>142</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkh162</pubid>
                  <pubid idtype="pmcid">373272</pubid>
                  <pubid idtype="pmpid" link="fulltext">14704350</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Automated extraction of mutation data from the literature: application of MuteXt to G protein-coupled receptors and nuclear hormone receptors</p>
            </title>
            <aug>
               <au>
                  <snm>Horn</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Lau</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>FE</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>557</fpage>
            <lpage>568</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg449</pubid>
                  <pubid idtype="pmpid" link="fulltext">14990452</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Mutation Mining-A Prospector's Tale</p>
            </title>
            <aug>
               <au>
                  <snm>Baker</snm>
                  <fnm>CJO</fnm>
               </au>
               <au>
                  <snm>Witte</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Information Systems Frontiers</source>
            <pubdate>2006</pubdate>
            <volume>8</volume>
            <fpage>47</fpage>
            <lpage>57</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1007/s10796-006-6103-2</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>MutationFinder: a high-performance system for extracting point mutation mentions from text</p>
            </title>
            <aug>
               <au>
                  <snm>Caporaso</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Baumgartner</snm>
                  <fnm>WA Jr</fnm>
               </au>
               <au>
                  <snm>Randolph</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hunter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <fpage>1862</fpage>
            <lpage>1865</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btm235</pubid>
                  <pubid idtype="pmcid">2516306</pubid>
                  <pubid idtype="pmpid" link="fulltext">17495998</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>A Workflow for Mutation Extraction and Structure Annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Kanagasabai</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Choo</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Ranganathan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>CJO</fnm>
               </au>
            </aug>
            <source>J Bioinform Comput Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <issue>6</issue>
            <fpage>1319</fpage>
            <lpage>1337</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1142/S0219720007003119</pubid>
                  <pubid idtype="pmpid" link="fulltext">18172931</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Automatic Extraction of Protein Point Mutations Using a Graph Bigram Association</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>Horn</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>FE</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <issue>2</issue>
            <fpage>e16</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pcbi.0030016</pubid>
                  <pubid idtype="pmcid">1794323</pubid>
                  <pubid idtype="pmpid" link="fulltext">17274683</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Application of automatic mutation-gene pair extraction to diseases</p>
            </title>
            <aug>
               <au>
                  <snm>Erdogmus</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sezerman</snm>
                  <fnm>U</fnm>
               </au>
            </aug>
            <source>J Bioinform Comput Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <issue>6</issue>
            <fpage>1261</fpage>
            <lpage>1275</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1142/S021972000700317X</pubid>
                  <pubid idtype="pmpid" link="fulltext">18172928</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Extraction of human kinase mutations from literature, databases and genotyping studies</p>
            </title>
            <aug>
               <au>
                  <snm>Krallinger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Izarzugaza</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Rodriguez-Penagos</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Valencia</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <issue>Suppl 8</issue>
            <fpage>S1</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-10-S8-S1</pubid>
                  <pubid idtype="pmcid">2745582</pubid>
                  <pubid idtype="pmpid" link="fulltext">19758464</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Improved mutation tagging with gene identifiers applied to membrane protein stability prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Winnenburg</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Plake</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <issue>Suppl 8</issue>
            <fpage>S3</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-10-S8-S3</pubid>
                  <pubid idtype="pmcid">2745585</pubid>
                  <pubid idtype="pmpid" link="fulltext">19758467</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>EnzyMiner: automatic identification of protein level mutations and their impact on target enzymes from PubMed abstracts</p>
            </title>
            <aug>
               <au>
                  <snm>Yeniterzi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sezerman</snm>
                  <fnm>U</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <issue>Suppl 8</issue>
            <fpage>S2</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2105-10-S8-S2</pubid>
                  <pubid idtype="pmcid">2745584</pubid>
                  <pubid idtype="pmpid" link="fulltext">19758466</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Replacement of tryptophan residues in haloalkane dehalogenase reduces halide binding and catalytic activity</p>
            </title>
            <aug>
               <au>
                  <snm>Kennes</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Pries</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Krooshof</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Bokma</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Kingma</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Janssen</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>Eur J Biochem</source>
            <pubdate>1995</pubdate>
            <volume>228</volume>
            <fpage>403</fpage>
            <lpage>407</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1432-1033.1995.00403.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">7705355</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Activation of an Asp-124-Asn mutant of haloalkane dehalogenase by hydrolytic deamidation of asparagine</p>
            </title>
            <aug>
               <au>
                  <snm>Pries</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kingma</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Janssen</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>1995</pubdate>
            <volume>358</volume>
            <issue>2</issue>
            <fpage>171</fpage>
            <lpage>174</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0014-5793(94)01420-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">7828730</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>The Swiss-Prot Protein Knowledgebase and its supplement TrEMBL in 2003</p>
            </title>
            <aug>
               <au>
                  <snm>Boeckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Blatter</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Estreicher</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Michoud</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>O'Donovan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Phan</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pilbout</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>365</fpage>
            <lpage>370</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkg095</pubid>
                  <pubid idtype="pmcid">165542</pubid>
                  <pubid idtype="pmpid" link="fulltext">12520024</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Multi-lingual Noun Phrase Extractor</p>
            </title>
            <url>http://www.semanticsoftware.info/munpex</url>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Subtle Differences in Dissociation Rates of Interactions between Destabilized Human Carbonic Anhydrase II Mutants and Immobilized Benzenesulfonamide Inhibitors Probed by a Surface Plasmon Resonance Biosensor</p>
            </title>
            <aug>
               <au>
                  <snm>Svedhem</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Enander</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Karlsson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sj&#246;bom</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Liedberg</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>L&#246;f&#229;s</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>M&#229;rtensson</snm>
                  <fnm>LG</fnm>
               </au>
               <au>
                  <snm>Sj&#246;strand</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Svensson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Carlsson</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Lundstr&#246;m</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Anal Biochem</source>
            <pubdate>2001</pubdate>
            <volume>296</volume>
            <issue>2</issue>
            <fpage>188</fpage>
            <lpage>196</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/abio.2001.5301</pubid>
                  <pubid idtype="pmpid" link="fulltext">11554714</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Towards a Systematic Evaluation of Protein Mutation Extraction Systems</p>
            </title>
            <aug>
               <au>
                  <snm>Witte</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>CJO</fnm>
               </au>
            </aug>
            <source>J Bioinform Comput Biol</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <issue>6</issue>
            <fpage>1339</fpage>
            <lpage>1359</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1142/S0219720007003193</pubid>
                  <pubid idtype="pmpid" link="fulltext">18172932</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Algorithm for Grounding Mutation Mentions from Text to Protein Sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Laurila</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Kanagasabai</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>CJO</fnm>
               </au>
            </aug>
            <source>Lecture Notes in Computer Science</source>
            <pubdate>2010</pubdate>
            <volume>6254</volume>
            <fpage>122</fpage>
            <lpage>131</lpage>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Snowball</p>
            </title>
            <url>http://snowball.tartarus.org/index.php</url>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Enhanced semantic access to the protein engineering literature using ontologies populated by text mining</p>
            </title>
            <aug>
               <au>
                  <snm>Witte</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kappler</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>CJO</fnm>
               </au>
            </aug>
            <source>Int J Bioinform Res Appl</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <issue>3</issue>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1504/IJBRA.2007.015009</pubid>
                  <pubid idtype="pmpid" link="fulltext">18048198</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Mutation Impact Ontology</p>
            </title>
            <url>http://unbsj.biordf.net/ontologies/mutation-impact-ontology.owl</url>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema</p>
            </title>
            <aug>
               <au>
                  <snm>Broekstra</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kampman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>van Harmelen</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>The Semantic Web ISWC 2002</source>
            <pubdate>2002</pubdate>
            <fpage>54</fpage>
            <lpage>68</lpage>
         </bibl>
         <bibl id="B24">
            <title>
               <p>SPARQL Query Language for RDF, W3C Recommendation 15 January 2008</p>
            </title>
            <url>http://www.w3.org/TR/rdf-sparql-query/</url>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Mutation Impact RDF triplestore SPARQL endpoint</p>
            </title>
            <url>http://unbsj.biordf.net/openrdf-workbench/repositories/mutation-impact-db/query</url>
         </bibl>
         <bibl id="B26">
            <title>
               <p>SADI framework</p>
            </title>
            <url>http://sadiframework.org</url>
         </bibl>
         <bibl id="B27">
            <title>
               <p>SADI Semantic Web Services - &#8217;cause you can&#8217;t always GET what you want!</p>
            </title>
            <aug>
               <au>
                  <snm>Wilkinson</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Vandervalk</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>McCarthy</snm>
                  <fnm>EL</fnm>
               </au>
            </aug>
            <source>APSCC</source>
            <pubdate>2009</pubdate>
            <fpage>13</fpage>
            <lpage>18</lpage>
         </bibl>
         <bibl id="B28">
            <title>
               <p>SHARE: A Semantic Web Query Engine for Bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Vandervalk</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>McCarthy</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Wilkinson</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>The Semantic Web (ISWC 2009)</source>
            <pubdate>2009</pubdate>
            <fpage>367</fpage>
            <lpage>369</lpage>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Registered SADI Services</p>
            </title>
            <url>http://unbsj.biordf.net/mutation-impact</url>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Central SADI Ontology</p>
            </title>
            <url>http://sadiframework.org/ontologies/predicates.owl</url>
         </bibl>
         <bibl id="B31">
            <title>
               <p>The Catalogue of Somatic Mutations in Cancer (COSMIC)</p>
            </title>
            <aug>
               <au>
                  <snm>Forbes</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bhamra</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bamford</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dawson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Kok</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Clements</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Menzies</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Teague</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Futreal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Stratton</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Curr Protoc Hum Genet</source>
            <pubdate>2008</pubdate>
            <volume>57</volume>
            <fpage>10.11.1</fpage>
            <lpage>10.11.26</lpage>
         </bibl>
         <bibl id="B32">
            <title>
               <p>The importance of reactant positioning in enzyme catalysis: A hybrid quantum mechanics/molecular mechanics study of a haloalkane dehalogenase</p>
            </title>
            <aug>
               <au>
                  <snm>Lau</snm>
                  <fnm>EY</fnm>
               </au>
               <au>
                  <snm>Kahn</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bash</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Bruice</snm>
                  <fnm>TC</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>9937</fpage>
            <lpage>9942</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.97.18.9937</pubid>
                  <pubid idtype="pmcid">27632</pubid>
                  <pubid idtype="pmpid" link="fulltext">10963662</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Repositioning the Catalytic Triad Aspartic Acid of Haloalkane Dehalogenase: Effects on Stability, Kinetics, and Structure</p>
            </title>
            <aug>
               <au>
                  <snm>Krooshof</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Kwant</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Damborsky</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Koca</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Janssen</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1997</pubdate>
            <volume>36</volume>
            <fpage>9571</fpage>
            <lpage>9580</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi971014t</pubid>
                  <pubid idtype="pmpid" link="fulltext">9236003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Intrinsic evaluation of text mining tools may not predict performance on realistic tasks</p>
            </title>
            <aug>
               <au>
                  <snm>Caporaso</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Deshpande</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Fink</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Bourne</snm>
                  <fnm>PE</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Hunter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>2008</pubdate>
            <volume>13</volume>
            <fpage>640</fpage>
            <lpage>651</lpage>
         </bibl>
         <bibl id="B35">
            <title>
               <p>From SNPs to pathways: integration of functional effect of sequence variations on models of cell signalling pathways</p>
            </title>
            <aug>
               <au>
                  <snm>Bauer-Mehren</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Furlong</snm>
                  <fnm>LI</fnm>
               </au>
               <au>
                  <snm>Rautschka</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sanz</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <issue>Suppl 8</issue>
            <fpage>S6</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">19758470</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-10-S8-S6</pubid>
                  <pubid idtype="pmcid">2745588</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>SNAP: predict effect of non-synonymous polymorphisms on function</p>
            </title>
            <aug>
               <au>
                  <snm>Bromberg</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>3823</fpage>
            <lpage>3835</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkm238</pubid>
                  <pubid idtype="pmcid">1920242</pubid>
                  <pubid idtype="pmpid" link="fulltext">17526529</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
