<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2105-11-423</ui><ji>1471-2105</ji><fm>
<dochead>Methodology article</dochead>
<bibl>
<title>
<p>Ranked retrieval of Computational Biology models</p>
</title>
<aug>
<au id="A1"><snm>Henkel</snm><fnm>Ron</fnm><insr iid="I1"/><insr iid="I2"/><email>ron.henkel@uni-rostock.de</email></au>
<au id="A2"><snm>Endler</snm><fnm>Lukas</fnm><insr iid="I2"/><email>lukas@ebi.ac.uk</email></au>
<au id="A3"><snm>Peters</snm><fnm>Andre</fnm><insr iid="I1"/><email>ap@informatik.uni-rostock.de</email></au>
<au id="A4"><snm>Le Nov&#232;re</snm><fnm>Nicolas</fnm><insr iid="I2"/><email>lenov@ebi.ac.uk</email></au>
<au ca="yes" id="A5"><snm>Waltemath</snm><fnm>Dagmar</fnm><insr iid="I1"/><email>dagmar.waltemath@uni-rostock.de</email></au>
</aug>
<insg>
<ins id="I1"><p>Database and Information Systems, University of Rostock, Rostock, Germany</p></ins>
<ins id="I2"><p>Computational Neurobiology, European Bioinformatics Institute, Hinxton, UK</p></ins>
</insg>
<source>BMC Bioinformatics</source>
<issn>1471-2105</issn>
<pubdate>2010</pubdate>
<volume>11</volume>
<issue>1</issue>
<fpage>423</fpage>
<url>http://www.biomedcentral.com/1471-2105/11/423</url>
<xrefbib><pubidlist><pubid idtype="pmpid">20701772</pubid><pubid idtype="doi">10.1186/1471-2105-11-423</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>12</day><month>5</month><year>2010</year></date></rec><acc><date><day>11</day><month>8</month><year>2010</year></date></acc><pub><date><day>11</day><month>8</month><year>2010</year></date></pub></history>
<cpyrt><year>2010</year><collab>Henkel et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>The study of biological systems demands computational support. When addressing a biological problem, reusing existing computational models can save time and effort. Selecting potentially suitable models, however, becomes more challenging as the number of available computational models increases, and even more so given the models' growing complexity. Firstly, among a set of candidate models it is difficult to decide which model best suits one's needs. Secondly, it is hard to grasp the nature of an unknown model listed in a search result set, and to judge how well it fits the particular problem one has in mind.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>Here we present an improved search approach for computational models of biological processes. It is based on existing retrieval and ranking methods from Information Retrieval. The approach incorporates annotations suggested by MIRIAM, and additional meta-information. It is now part of the search engine of BioModels Database, a standard repository for computational models.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>The introduced concept and implementation are, to our knowledge, the first application of Information Retrieval techniques to model search in Computational Systems Biology. Using the example of BioModels Database, it was shown that the approach is feasible and extends the current possibilities to search for relevant models. The advantages of our system over existing solutions are that we incorporate a rich set of meta-information, and that we provide the user with a relevance ranking of the models found for a query. Better search capabilities in model databases are expected to have a positive effect on the reuse of existing models.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<sec>
<st>
<p>Importance of model exchange and reuse</p>
</st>
<p>The study of a complex biological system now frequently includes the use of modelling and simulation techniques, in order to help understand the system of interest and to suggest promising experimental procedures <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>. The rising complexity of <it>modelled </it>systems (see Figure <figr fid="F1">1</figr>, number of encoded species and reactions in BioModels Database <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>), and the fact that research activities overlap between different research groups call for model reuse. Modellers do not want to, or cannot, build their models of biological systems from scratch; instead, they need to look for existing pieces to build their models on, especially when composing complex systems by combining smaller sub-models (see for example <abbrgrp>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
</abbrgrp>).</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Growing number of Computational Biology models and their components in BioModels Database</p></caption><text>
   <p><b>Growing number of Computational Biology models and their components in BioModels Database</b>. Upper chart: Numbers of models in BioModels Database, as of January 2010. Lower chart: Number of species (black bar) and reactions (gray bar) in BioModels Database, as of January 2010. BioModels Database started with 20 models and a total of 322 species when it was launched in April 2005. In 2007 it already reached almost 200 models and 10482 species. The release in January 2010 recorded 453 models with 33702 species and 41069 reactions.</p>
</text><graphic file="1471-2105-11-423-1" hint_layout="single"/></fig>
<p>Standard formats for model exchange and open model repositories are crucial tools for making existing models available and accessible to the community, as it has become impossible to be aware of all existing models and of all research groups involved in modelling a system of interest. Some standard formats developed for model representation are widely accepted. Examples include the <it>Systems Biology Markup Language </it>(SBML, <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>), CellML <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>, or BioPAX <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>. Computational models of biological systems (bio-models) in standardised representation formats are available from different model repositories, including BioModels Database <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>, the JWS Online Model Database <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>, or the CellML Model Repository <abbrgrp>
<abbr bid="B9">9</abbr>
</abbrgrp>.</p>
<p>However, although it is becoming more frequent, model reuse is not yet commonplace. The reasons are similar to those hampering code reuse in computer science, where insufficient code documentation and missing modularisation have been the biggest hindrances <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp>. Most models are created using computational modelling environments; the constituents' names are often generated automatically and are therefore semantically poor. Models with unspecific species names such as P<it>o</it>1, P<it>o</it>2, P<it>c</it>1, P<it>c</it>2 (for instance, see model BIOMD0000000060 in BioModels Database), or unspecific reaction names <it>re</it>1 to <it>re</it>76 (model BIOMD0000000227 in BioModels Database) are commonplace. Documentation of the names' meaning, among other things, is therefore essential.</p>
</sec>
<sec>
<st>
<p>Standardised meta-information representation helps to grasp the nature of models</p>
</st>
<p>To avoid the problems experienced in computer science, efforts to document the nature of models have been developed. Many journals request that a minimum set of meta-information be provided with each published bio-model, namely the <it>Minimum Information Required in the Annotation of a Model </it>(MIRIAM, <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp>). Such meta-information provides a better understanding of a bio-model's complex and diverse <it>semantics </it>and, if computationally processed, facilitates model reuse.</p>
<p>MIRIAM meta-information encompasses general information about the model itself, e. g. the model's name, authors, or publication reference. It also includes detailed descriptions of the model constituents, including the identification of encoded species, reactions, and compartments. MIRIAM itself is a textual recommendation, in the form of a <it>Minimum Information </it>guideline following the MIBBI idea of coherent reporting guidelines for biological and biomedical investigations <abbrgrp>
<abbr bid="B12">12</abbr>
</abbrgrp>.</p>
<p>A technical, standardised way of providing the MIRIAM-recommended meta-information is the <it>MIRIAM standard annotation </it>
<abbrgrp>
<abbr bid="B11">11</abbr>
<abbr bid="B13">13</abbr>
</abbrgrp>. The proposed format is a triplet referencing a piece of meta-information, also referred to as <it>annotation</it>, in an external resource. The reference to that meta-information is built from (1) the data type, (2) the identifier, and (3) a qualifier from a set of pre-defined qualifiers. Here the <it>data type </it>specifies the namespace within which to interpret the identifier. Some resources encode their knowledge as controlled vocabularies or ontologies. Among existing ontologies that are also used as data types by the MIRIAM standard are the <it>Systems Biology Ontology </it>(SBO, <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>), the <it>Gene Ontology </it>(GO, <abbrgrp>
<abbr bid="B15">15</abbr>
</abbrgrp>), or the NCBI Taxonomy <url>http://www.ncbi.nlm.nih.gov/Taxonomy/</url>. One advantage of using ontologies, i. e. "explicit specifications of a conceptualization" <abbrgrp>
<abbr bid="B16">16</abbr>
</abbrgrp>, over free text information is the standardised encoding of biological knowledge that is then put into relation with other ontology terms. The MIRIAM standard <it>identifier </it>refers to the actual entry within the data type. It corresponds to the identifier (ID) the entry has in the external resource. Finally, the <it>qualifier </it>is used to characterise the relation between the annotated model element and the encoded meta-information. The possible qualifiers are defined at BioModels.net and include relationships such as <monospace>is</monospace>, <monospace>isVersionOf</monospace>, or <monospace>hasPart</monospace>
<abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>.</p>
<p>For example, a <monospace>species</monospace> element encoded in a particular SBML model could stand for the compound "phosphoenolpyruvate" and simply be called "PEP" in the model, offering little information to the user. This compound, on the other hand, is described by the entry <monospace>CHEBI:18021</monospace> in the <it>Chemical Entities of Biological Interest</it> (ChEBI, <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>) ontology. Referring to this identifier by linking the resource and ID to the <monospace>species</monospace> element via the qualifier <monospace>is</monospace> gives software and users access to a wealth of information independent of the element's name, such as synonyms, molecular and structural formulae, and cross-links to other databases. Technically, the link is encoded in a standard form using URNs, e. g. <monospace>urn:miriam:obo.chebi:CHEBI%3A18021</monospace> for the given annotation. Another example is the annotation of a <monospace>reaction</monospace> element in an SBML document. Suppose a <monospace>reaction</monospace> element in a particular model stands for the "phosphorylation of glucose by hexokinase during glycolysis". This enzymatic reaction is also described by the Gene Ontology entry GO:0004396 (hexokinase activity). Attaching the URN <monospace>urn:miriam:obo.go:GO%3A0004396</monospace> to the reaction element using the qualifier <monospace>isVersionOf</monospace> semantically enriches it and again gives access to further information, such as alternative terms and enzyme nomenclature codes.</p>
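<p>The two URNs quoted above can be reproduced with a minimal Python sketch. The function name <monospace>miriam_urn</monospace> and the tuple layout of the triplet are illustrative assumptions, not part of the MIRIAM standard; only the URN strings themselves are taken from the text.</p>

```python
from urllib.parse import quote


def miriam_urn(data_type: str, identifier: str) -> str:
    """Build a MIRIAM-style URN from a data type namespace and an entry ID.

    Hypothetical helper for illustration; it only mirrors the URN shape
    quoted in the text.
    """
    # Percent-encode every reserved character in the identifier, so that the
    # colon in "CHEBI:18021" becomes "%3A" as in the URNs shown above.
    return "urn:miriam:" + data_type + ":" + quote(identifier, safe="")


# An annotation triplet: (data type, identifier, qualifier).
annotation = ("obo.chebi", "CHEBI:18021", "is")
urn = miriam_urn(annotation[0], annotation[1])
print(annotation[2], urn)  # is urn:miriam:obo.chebi:CHEBI%3A18021
```

<p>Keeping the qualifier separate from the URN reflects the triplet structure: the URN identifies the external entry, while the qualifier states how the model element relates to it.</p>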
<sec>
<st>
<p>Extending the MIRIAM information</p>
</st>
<p>In order to enable a fine-grained retrieval of bio-models, <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp> proposes to consider more information than MIRIAM requires. This includes versioning information on both the model and its annotations, information on the model encoding format, and information that is only related to the model, such as model behavior under certain conditions, simulation experiments applicable to the model, or simulation results available for the model. A detailed description of the different kinds of meta-information considered in this work, including those beyond MIRIAM, is given in <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp>.</p>
</sec>
</sec>
<sec>
<st>
<p>Finding models in model repositories using Information Retrieval techniques</p>
</st>
<p>We argued that a crucial step for a computational system to return relevant models upon a user's query is the availability - and then incorporation - of meta-information on top of a model's structure <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp>. With the advent and growth of Computational Systems Biology research, the number of available bio-models is increasing rapidly. For example, the number of bio-models available from BioModels Database is steadily growing, doubling about every 18 months (see Figure <figr fid="F1">1</figr>, number of models in BioModels Database). As a consequence, searching an existing model base for relevant models can return a rather large number of models. Therefore, it is very important to support the user in <it>finding relevant </it>models in existing resources. Commonly, however, the user is left with an unordered result set of models, without any explanation of why a particular model was found. For complex models the user is typically unable to grasp the model's nature at first sight <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp>. Without information to assess <it>how well </it>a model matches the query, the user cannot decide on its relevance. <it>Information Retrieval </it>techniques, which have been widely and successfully used in other areas, address exactly these shortcomings for bio-model retrieval.</p>
<p>Information Retrieval is "the process to recover an information stored in a system (i. e. a database) on users demand" <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. One application for which the successful ranked retrieval of annotated documents has already been shown is <it>Multimedia Information Retrieval </it>(MIR). MIR models describe songs, images or videos annotated with different kinds of information, including meta-information like author or title, but also temporal or spectral information, as well as keywords. Currently, MIR distinguishes three independent classes of similarity measures depending on the kinds of identified features <abbrgrp>
<abbr bid="B21">21</abbr>
</abbrgrp>:</p>
<p>
<b>Metadata-based similarity measure (MBSM) </b>defines queries by connecting keywords gained from the media object with Boolean operators like &#8896;, &#8897;. Text retrieval techniques are then used to compare these query keywords with features of the multimedia objects.</p>
<p>
<b>Content-based similarity measure (CBSM) </b>utilizes so-called low-level features, i. e. automatically extractable items, such as rhythm. Queries make use of these features to search the content of music pieces. Different methods have been developed to retrieve the items represented by low-level features, e. g. humming, tapping or query-by-example.</p>
<p>
<b>Semantic-description-based similarity measure (SDSM) </b>evaluates meta-information on multimedia objects that are described with predefined words of different vocabularies.</p>
<p>Motivated by the above observations, we propose a novel retrieval and ranking framework that takes into account different model meta-information to perform similarity-measure-based operations on bio-models. We are aware that <it>data </it>retrieval techniques have already successfully been applied to Life Science data in general <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp>. Existing approaches, however, do not consider the retrieval and ranking of <it>models</it>.</p>
</sec>
</sec>
<sec>
<st>
<p>Results and discussion</p>
</st>
<p>Here we apply an adapted version of the aforementioned MIR solutions to bio-model retrieval. To re-use MBSM for bio-model retrieval, the MIRIAM-required meta-information on the model and its constituents is essential. Furthermore, we use parts of the meta-information suggested by <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp> and <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp>. When adapting CBSM techniques to bio-model retrieval, low level features (such as the encoded species, reactions, and so on) can be used. Finally, SDSM techniques can be used by tagging the models manually with relevant terms.</p>
<sec>
<st>
<p>Definitions</p>
</st>
<p>Our study necessitates a collection of <it>k </it>models from a pool of bio-models <it>M </it>and associated meta-information that is sufficient to rank the retrieved results with respect to a user's query. An annotated bio-model is defined as:</p>
<p>
<b>Definition 1 </b>(Annotated bio-model). <it>An annotated bio-model m </it>&#8712; <it>M is described as a tuple m = (m<sub>S</sub>, m<sub>A</sub>) of</it>
</p>
<p indent="1">
<it>1. model source code m<sub>S </sub>in a machine-readable format</it>
</p>
<p indent="1">
<it>2. annotation information m<sub>A </sub>describing the nature of a bio-model, and of its constituents</it>.</p>
<p>In the following, we will not distinguish annotations of the model <it>m </it>from annotations of the model's constituents. All annotations will be processed equally, denoted as <it>m<sub>A</sub>
</it>. The annotation information <it>m<sub>A </sub>
</it>might be referred to as third party knowledge linked to <it>m<sub>S</sub>
</it>.</p>
<p>A feature is defined as:</p>
<p>
<b>Definition 2 </b>(Feature). <it>A feature f </it>&#8712; <it>F is an attribute or aspect of a model m instantiated either through its model encoding m<sub>S </sub>or its annotation information m<sub>A</sub>
</it>.</p>
<p>
<b>Definition 3 </b>(Term). <it>Let T be a set of words called terms, then <inline-formula>
<m:math name="1471-2105-11-423-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi mathvariant="script">P</m:mi>
   <m:mo stretchy="false">(</m:mo>
   <m:mi>T</m:mi>
   <m:mo stretchy="false">)</m:mo>
   <m:mo>=</m:mo>
   <m:mo>{</m:mo>
   <m:mi>&#961;</m:mi>
   <m:mo>:</m:mo>
   <m:mi>&#961;</m:mi>
   <m:mo>&#8838;</m:mo>
   <m:mi>T</m:mi>
   <m:mo>}</m:mo>
</m:mrow>
</m:math>
</inline-formula> is the set of all subsets of T called power set</it>.</p>
<p>A model collection is then:</p>
<p>
<b>Definition 4 </b>(Model collection). <it>A model collection C<sub>M </sub>is a representation of M. Each m<sub>j </sub>
</it>&#8712; <it>M can be mapped on a c<sub>j </sub>
</it>&#8712; <it>C<sub>M </sub>by splitting the model m<sub>j </sub>into features f </it>&#8712; <it>F and their instances <inline-formula>
<m:math name="1471-2105-11-423-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mi>&#961;</m:mi>
      <m:mi>f</m:mi>
   </m:msub>
   <m:mo>&#8712;</m:mo>
   <m:mi mathvariant="script">P</m:mi>
   <m:mo stretchy="false">(</m:mo>
   <m:mi>T</m:mi>
   <m:mo stretchy="false">)</m:mo>
</m:mrow>
</m:math>
</inline-formula>. So <inline-formula>
<m:math name="1471-2105-11-423-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mi>c</m:mi>
      <m:mi>j</m:mi>
   </m:msub>
   <m:mo>=</m:mo>
   <m:mo>{</m:mo>
   <m:mo stretchy="false">(</m:mo>
   <m:msub>
      <m:mi>f</m:mi>
      <m:mn>1</m:mn>
   </m:msub>
   <m:mo>,</m:mo>
   <m:msub>
      <m:mi>&#961;</m:mi>
      <m:mrow>
         <m:msub>
            <m:mi>f</m:mi>
            <m:mn>1</m:mn>
         </m:msub>
      </m:mrow>
   </m:msub>
   <m:mo stretchy="false">)</m:mo>
   <m:mo>,</m:mo>
   <m:mtext>&#8192;</m:mtext>
   <m:mo>.</m:mo>
   <m:mtext>&#8192;</m:mtext>
   <m:mo>.</m:mo>
   <m:mtext>&#8192;</m:mtext>
   <m:mo>.</m:mo>
   <m:mtext>&#8192;</m:mtext>
   <m:mo>,</m:mo>
   <m:mo stretchy="false">(</m:mo>
   <m:msub>
      <m:mi>f</m:mi>
      <m:mi>n</m:mi>
   </m:msub>
   <m:mo>,</m:mo>
   <m:msub>
      <m:mi>&#961;</m:mi>
      <m:mrow>
         <m:msub>
            <m:mi>f</m:mi>
            <m:mi>n</m:mi>
         </m:msub>
      </m:mrow>
   </m:msub>
   <m:mo stretchy="false">)</m:mo>
   <m:mo>}</m:mo>
   <m:mo>.</m:mo>
</m:mrow>
</m:math>
</inline-formula>
</it>
</p>
<p>Definitions 1 to 4 hold for each model <it>m<sub>j </sub>
</it>&#8712; <it>M </it>classified into features and represented by <it>c<sub>j </sub>
</it>&#8712; <it>C<sub>M</sub>
</it>.</p>
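<p>As an illustration of Definition 4, the following Python sketch maps a model onto its feature-classified representation, pairing each feature with a set of terms. The feature names, the example values, and the whitespace tokenisation are assumptions made for the sketch; the definitions above do not prescribe a concrete feature set or tokeniser.</p>

```python
def to_collection_entry(model: dict) -> dict:
    """Map a model m_j onto c_j: each feature f is paired with a term set rho_f."""
    entry = {}
    for feature, text in model.items():
        # Lower-casing and whitespace splitting stand in for real tokenisation.
        entry[feature] = frozenset(text.lower().split())
    return entry


# Hypothetical model with three features drawn from its encoding and annotations.
model = {
    "species": "PEP glucose",
    "reaction": "hexokinase activity",
    "name": "example glycolysis model",
}
c_j = to_collection_entry(model)
print(sorted(c_j["species"]))  # ['glucose', 'pep']
```

<p>Using sets for the feature instances mirrors the definition, in which each instance is an element of the power set of the terms.</p>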
<p>We furthermore define a query as:</p>
<p>
<b>Definition 5 </b>(Query). <it>A query <inline-formula>
<m:math name="1471-2105-11-423-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>q</m:mi>
   <m:mo>=</m:mo>
   <m:mo>{</m:mo>
   <m:msub>
      <m:mi>q</m:mi>
      <m:mrow>
         <m:msub>
            <m:mi>f</m:mi>
            <m:mn>1</m:mn>
         </m:msub>
      </m:mrow>
   </m:msub>
   <m:mo>,</m:mo>
   <m:mtext>&#8192;</m:mtext>
   <m:mo>.</m:mo>
   <m:mtext>&#8192;</m:mtext>
   <m:mo>.</m:mo>
   <m:mtext>&#8192;</m:mtext>
   <m:mo>.</m:mo>
   <m:mtext>&#8192;</m:mtext>
   <m:mo>,</m:mo>
   <m:mtext>&#8192;</m:mtext>
   <m:msub>
      <m:mi>q</m:mi>
      <m:mrow>
         <m:msub>
            <m:mi>f</m:mi>
            <m:mi>n</m:mi>
         </m:msub>
      </m:mrow>
   </m:msub>
   <m:mo>}</m:mo>
   <m:mo>&#8712;</m:mo>
   <m:mi>Q</m:mi>
</m:mrow>
</m:math>
</inline-formula> is a set of query parts <inline-formula>
<m:math name="1471-2105-11-423-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mi>q</m:mi>
      <m:mi>f</m:mi>
   </m:msub>
   <m:mo>&#8712;</m:mo>
   <m:mi>F</m:mi>
   <m:mo>&#215;</m:mo>
   <m:mi mathvariant="script">P</m:mi>
   <m:mo stretchy="false">(</m:mo>
   <m:mi>T</m:mi>
   <m:mo stretchy="false">)</m:mo>
</m:mrow>
</m:math>
</inline-formula> with q<sub>f </sub>= </it>(<it>f, &#961;<sub>f </sub>
</it>); <it>f </it>&#8712; <it>F and <inline-formula>
<m:math name="1471-2105-11-423-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mi>&#961;</m:mi>
      <m:mi>f</m:mi>
   </m:msub>
   <m:mo>&#8712;</m:mo>
   <m:mi mathvariant="script">P</m:mi>
   <m:mo stretchy="false">(</m:mo>
   <m:mi>T</m:mi>
   <m:mo stretchy="false">)</m:mo>
</m:mrow>
</m:math>
</inline-formula>. All query parts q<sub>f </sub>of a query q are pairwise disjoint</it>.</p>
<p>
<it>q </it>&#8712; <it>Q </it>represents the user query. The parts <it>q<sub>f </sub>
</it>of <it>q </it>can either be mapped on the full set of defined features <it>F</it>, or on a subset of <it>F</it>.</p>
<p>Assuming a collection <it>C<sub>M </sub>
</it>of processed models <it>M </it>and extracted model features <it>f<sub>1</sub>, ..., f<sub>n </sub>
</it>&#8712; <it>F</it>, we now define bio-model retrieval.</p>
<p>
<b>Definition 6 </b>(Bio-model retrieval based on <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp>). <it>An Information Retrieval model is a quadruple (C<sub>M</sub>, Q, FW, R</it>(<it>q, c</it>)<it>) where</it>
</p>
<p indent="1">
<it>1. C<sub>M </sub>is a feature-classified representation of M</it>
</p>
<p indent="1">
<it>2. Q is a set of queries q, where each part q</it>
<sub>f&#8712;<it>F </it>
</sub>&#8712; <it>q can be mapped on a f </it>&#8712; <it>F</it>
</p>
<p indent="1">
<it>3. FW is a framework for model representations, queries and their relationships</it>
</p>
<p indent="1">
<it>4. R</it>(<it>q, c</it>) <it>is a set of ranking functions defining an order among c </it>&#8712; <it>C<sub>M </sub>with regard to q</it>.</p>
<p>The framework <it>FW </it>realises the retrieval functionality. Each ranking function <it>r</it>, when applied to a query <it>q</it>, returns a ranked list of model representations <it>c</it>. The order of retrieved results is determined by the ranking function itself, the underlying collection and by the particular query. From the ranked list of feature-based model representations <it>c<sub>j</sub>
</it>, we deduce the ranking of the corresponding models <it>m<sub>j </sub>
</it>represented by <it>c<sub>j</sub>
</it>.</p>
</sec>
<sec>
<st>
<p>Conceptual architecture of the framework</p>
</st>
<p>To perform ranked retrieval of annotated bio-models, we use a combination of text retrieval, ontologies, simulation dependent data, and model meta-data. The conceptual architecture for the developed retrieval and ranking framework is shown in Figure <figr fid="F2">2</figr>.</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Conceptual architecture</p></caption><text>
   <p><b>Conceptual architecture</b>. Overview of the conceptual architecture of the proposed ranking and retrieval system. A version has been implemented in BioModels Database. The architecture shows the process of transforming a <it>user-given query </it>by creating sub-queries, which are then assembled after enrichment with structural information and semantic indexing (see also Figure 3). The re-assembled query is then sent to the retrieval and ranking module, which uses the Extended Boolean Model to retrieve a list of matching models, and the Vector Space Model to rank the retrieved models. To determine the ranking, different weight information is used; these weights are, however, not shown in the figure.</p>
</text><graphic file="1471-2105-11-423-2" hint_layout="double"/></fig>
<p>For a user-given query <it>q</it>, consisting of a set of feature-assigned terms (<it>f</it>, <it>&#961;</it>), we return a ranked list of models. The ordered list of models <it>m<sub>j </sub>... m<sub>k </sub>
</it>is inferred from the order that is defined by the ranking function <it>r</it>(<it>c<sub>j</sub>, q</it>) &gt; ... &gt;<it>r</it>(<it>c<sub>k</sub>
</it>, <it>q</it>), where <it>c<sub>j </sub>
</it>is the most relevant model representation with regard to the query <it>q </it>(see definition 6). To achieve this order, each query <it>q </it>is first disassembled into a set of sub-queries <it>q<sub>1 </sub>
</it>to <it>q<sub>n</sub>
</it>. Each sub-query <it>q<sub>i </sub>
</it>now contains a set of terms that will be mapped on a particular feature <it>f<sub>i</sub>
</it>. However, the query parts <it>q<sub>i </sub>
</it>are not directly executed on the data resources, but rather expanded using the Query Expander. So far we distinguish two different kinds of sub-queries:</p>
<p>
<b>Semantic sub-query </b>is any query addressing model constituents. This type of query is applied to the <smcaps>SEMANTIC INDEX</smcaps>.</p>
<p>
<b>Ontology sub-query </b>is any query enriching the user query by finding related ontological terms. This type of query is applied to the <smcaps>BIOLOGY ONTOLOGIES</smcaps>.</p>
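<p>The disassembly and expansion steps described above can be sketched as follows. The lookup table <monospace>RELATED_TERMS</monospace> is a hypothetical stand-in for the ontology access, and the function names are assumptions; the sketch only illustrates the idea of expanding each feature-specific sub-query before re-assembly, not the implemented Query Expander.</p>

```python
# Hypothetical stand-in for an ontology lookup: term -> related terms.
RELATED_TERMS = {
    "pep": {"phosphoenolpyruvate"},
    "glycolysis": {"glycolytic process"},
}


def expand_sub_query(terms: frozenset) -> frozenset:
    """Add related terms to one sub-query; unknown terms are kept unchanged."""
    expanded = set(terms)
    for term in terms:
        expanded |= RELATED_TERMS.get(term, set())
    return frozenset(expanded)


def expand_query(query: dict) -> dict:
    """Split q into per-feature sub-queries, expand each, and reassemble them."""
    return {feature: expand_sub_query(terms) for feature, terms in query.items()}


q = {"species": frozenset({"pep"}), "author": frozenset({"smith"})}
q_star = expand_query(q)
print(sorted(q_star["species"]))  # ['pep', 'phosphoenolpyruvate']
```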
<p>All expanded sub-queries are assembled into a final query <it>q</it>* which is sent to the retrieval and ranking system. The Extended Boolean Model <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp> is used to select all models that are relevant to the query, and then the Vector Space Model <abbrgrp>
<abbr bid="B24">24</abbr>
</abbrgrp> is used to define the ranking on those models. Both IR models work on the <smcaps>MODEL INDEX</smcaps> which contains all models and their associated URIs. The result of the process is a ranked list of model IDs.</p>
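<p>The two-stage process of selection followed by ranking can be illustrated with a simplified Python sketch. Note the assumptions: selection uses a plain Boolean OR match rather than the Extended Boolean Model, ranking uses cosine similarity on raw term counts as a basic Vector Space Model, and the index contents and model identifiers are invented for the example.</p>

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_and_rank(index: dict, query_terms: list) -> list:
    q = Counter(query_terms)
    # Step 1 (selection): keep models sharing at least one term with the query.
    # This plain Boolean OR is a simplification of the Extended Boolean Model.
    candidates = {
        model_id: Counter(terms)
        for model_id, terms in index.items()
        if any(t in q for t in terms)
    }
    # Step 2 (ranking): order the candidates by decreasing cosine similarity.
    scored = [(cosine(q, vec), model_id) for model_id, vec in candidates.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [model_id for _, model_id in scored]


# Hypothetical index: model ID -> list of indexed terms.
index = {
    "model_a": ["calcium", "pancreas", "insulin"],
    "model_b": ["calcium", "calcium", "oscillation"],
    "model_c": ["caffeine", "blood", "pressure"],
}
print(retrieve_and_rank(index, ["calcium", "oscillation"]))  # ['model_b', 'model_a']
```

<p>Separating selection from ranking keeps the ranking step cheap, because similarity is only computed for models that match the query at all.</p>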
</sec>
<sec>
<st>
<p>Architectural components of the framework</p>
</st>
<sec>
<st>
<p>Types of user queries</p>
</st>
<p>We process and store information from different resources, and map it onto our internal structures, i. e. full-text indexes and databases. As a result it becomes feasible to answer very specific queries. We distinguish two different types of queries. A query may consist of a number of terms (<it>query by value, QBV</it>) or of a complete set of features representing a model (<it>query by model example, QBME</it>).</p>
<p>
<b>Query by value (QBV) </b>Using QBV, the user query <it>q </it>consists of features and free-text terms (<it>f</it>, <it>&#961;</it>). The user given features <it>f </it>are a subset of all available features <it>F</it>.</p>
<p>
<b>Query by model example (QBME) </b>Using QBME, a model forms the basis of a search for similar results, i. e. the complete set of features <it>F </it>is aligned.</p>
<p>Questions a user might have in mind are "Which models describe calcium concentrations in pancreatic cells?" (QBV), or "Are there any models dealing with the effects of caffeine on blood pressure in humans?" (QBV). One could also imagine searching for a model that "is similar to model BIOMD0000000227" (QBME).</p>
</sec>
<sec>
<st>
<p>Model index: incorporating model meta-information</p>
</st>
<p>The <smcaps>MODEL INDEX</smcaps> contains references to all models <it>m<sub>i </sub>
</it>&#8712; <it>M</it>, as well as encoded information about constituents and meta-information.</p>
<p>Relevant features representing a bio-model were defined and grouped into several content-related <it>dimensions </it>to facilitate the creation of the bio-model collection <it>C<sub>M</sub>
</it>. Each of those dimensions has a certain importance associated with it, i. e. a measure of how relevant the information it carries is (see Table <tblr tid="T1">1</tblr>). <it>(1) Model constituents </it>is an important dimension which contains several features describing a model's constituents, e. g. species or reactions. <it>(2) </it>Information about authors, encoders or submitters of a model is grouped into a <it>persons </it>dimension. <it>(3) </it>Publications or published abstracts are contained in the <it>publication </it>dimension. The <it>(4) user generated content </it>dimension holds information like keywords or tags. To restrict search results by time, a <it>(5) dates </it>dimension holds time information, for example submission or modification dates. Finally, the <it>(6) administrative data </it>dimension contains specific information about the model file or the representation format used to encode the model.</p>
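<p>One way to make the dimension importance usable in a ranking is to translate the qualitative levels of Table 1 into numeric weights, as in the following Python sketch. The numeric values and the additive scoring are assumptions chosen for illustration only; the text specifies the qualitative levels, not the weights.</p>

```python
# Assumed numeric weights; the text only gives qualitative importance levels.
LEVEL_WEIGHT = {"low": 1.0, "medium": 2.0, "high": 3.0, "very high": 4.0}

# Qualitative importance per dimension, as listed in Table 1.
DIMENSION_IMPORTANCE = {
    "administrative data": "low",
    "persons": "medium",
    "dates": "low",
    "publication": "high",
    "constituents": "very high",
    "user generated content": "very high",
}


def weighted_score(matches_per_dimension: dict) -> float:
    """Sum per-dimension match scores, scaled by the dimension's importance."""
    return sum(
        LEVEL_WEIGHT[DIMENSION_IMPORTANCE[dim]] * score
        for dim, score in matches_per_dimension.items()
    )


print(weighted_score({"constituents": 0.5, "persons": 1.0}))  # 4.0
```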
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Importance of different information dimensions</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>
               <b>dim </b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Part </b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Importance </b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Description </b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>1</p>
         </c>
         <c ca="left">
            <p>
               <monospace>Administrative data</monospace>
            </p>
         </c>
         <c ca="left">
            <p>low</p>
         </c>
         <c ca="left">
            <p>administrative data like id, file</p>
            <p>name, file version, encoding formalism</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>2</p>
         </c>
         <c ca="left">
            <p>
               <monospace>Persons</monospace>
            </p>
         </c>
         <c ca="left">
            <p>medium</p>
         </c>
         <c ca="left">
            <p>covers the author, encoder and submitter</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>3</p>
         </c>
         <c ca="left">
            <p>
               <monospace>Dates</monospace>
            </p>
         </c>
         <c ca="left">
            <p>low</p>
         </c>
         <c ca="left">
            <p>submission or modification date</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>4</p>
         </c>
         <c ca="left">
            <p>
               <monospace>Publication</monospace>
            </p>
         </c>
         <c ca="left">
            <p>high</p>
         </c>
         <c ca="left">
            <p>main publication or description of the model</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>5</p>
         </c>
         <c ca="left">
            <p>
               <monospace>Constituents</monospace>
            </p>
         </c>
         <c ca="left">
            <p>very high</p>
         </c>
         <c ca="left">
            <p>information about the model constituents</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>6</p>
         </c>
         <c ca="left">
            <p>
               <monospace>User generated content</monospace>
            </p>
         </c>
         <c ca="left">
            <p>very high</p>
         </c>
         <c ca="left">
            <p>additional user-provided information,</p>
            <p>e. g. keywords</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Information dimensions sorted by relevance. The information that is relevant for the characterisation of a bio-model's ranking is grouped into six different dimensions (dim). Each dimension has a different influence on the ranking. The least important dimension is the <it>administrative data</it>; the most important dimensions are the one encoding information about the model <it>constituents </it>and the one created from <it>user generated content</it>.</p>
   </tblfn></tbl>
<p>The concept of dimension is a rather general one. Each dimension can, however, be refined into <it>features f</it>. A full list of features that make up the model index for all aforementioned dimensions can be found in Table <tblr tid="T2">2</tblr>. For example, the dimension <it>model constituents </it>is split into several features, among them <it>species, compartment, reaction</it>. Limiting a query to certain model features allows a user to be more specific. For example, it is possible to restrict a query <monospace>caffeine</monospace> to the feature <it>species </it>- and to disregard a "tribute to caffeine for the writing" in the <it>publication </it>feature. The values for each defined feature can be automatically extracted from a bio-model <it>m </it>if <it>m </it>complies with the given model definition 1. The additional assignment of weights for each distinct feature helps to determine similarity values, as will be explained later.</p>
<tbl id="T2"><title><p>Table 2</p></title><caption><p>Assigned feature weights by dimension</p></caption><tblbdy cols="3">
      <r>
         <c ca="left">
            <p>
               <b>Dimension </b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Feature </b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Weight </b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Constituents</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>(description)</p>
         </c>
         <c ca="left">
            <p>modelName</p>
         </c>
         <c ca="left">
            <p>4</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>species</p>
         </c>
         <c ca="left">
            <p>3</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>compartment</p>
         </c>
         <c ca="left">
            <p>3</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>reaction</p>
         </c>
         <c ca="left">
            <p>3</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>parameter</p>
         </c>
         <c ca="left">
            <p>1.5</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>event</p>
         </c>
         <c ca="left">
            <p>1.5</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>function</p>
         </c>
         <c ca="left">
            <p>1.5</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>modelDescription</p>
         </c>
         <c ca="left">
            <p>0.5</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>(URI)</p>
         </c>
         <c ca="left">
            <p>modelURI</p>
         </c>
         <c ca="left">
            <p>5</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>speciesURI</p>
         </c>
         <c ca="left">
            <p>5</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>compartmentURI</p>
         </c>
         <c ca="left">
            <p>5</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>reactionURI</p>
         </c>
         <c ca="left">
            <p>5</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>parameterURI</p>
         </c>
         <c ca="left">
            <p>3</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>eventURI</p>
         </c>
         <c ca="left">
            <p>3</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>functionURI</p>
         </c>
         <c ca="left">
            <p>3</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Persons</p>
         </c>
         <c ca="left">
            <p>Author</p>
         </c>
         <c ca="left">
            <p>4</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>Encoder</p>
         </c>
         <c ca="left">
            <p>1</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>submitter</p>
         </c>
         <c ca="left">
            <p>1</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Publications</p>
         </c>
         <c ca="left">
            <p>publicationURI</p>
         </c>
         <c ca="left">
            <p>5</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>publicationText</p>
         </c>
         <c ca="left">
            <p>2.5</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>content</p>
         </c>
         <c ca="left">
            <p>1</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>User generated</p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Content</p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
         <c ca="left">
            <p>-</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Dates</p>
         </c>
         <c ca="left">
            <p>CreationDate</p>
         </c>
         <c ca="left">
            <p>1</p>
         </c>
      </r>
      <r>
         <c/>
         <c ca="left">
            <p>modificationDate</p>
         </c>
         <c ca="left">
            <p>1</p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Administrative data</p>
         </c>
         <c ca="left">
            <p>ID</p>
         </c>
         <c ca="left">
            <p>1</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>additionalID</p>
         </c>
         <c ca="left">
            <p>1</p>
         </c>
      </r>
      <r>
         <c/>
         <c ca="left">
            <p>path</p>
         </c>
         <c ca="left">
            <p>1</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>content</p>
         </c>
         <c ca="left">
            <p>1</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Feature weights for the different model dimensions. Each dimension is further separated into the features it covers. For each feature, a concrete relevance value, i. e. weight, is given. For example, in the <it>Constituents </it>dimension, one important feature for the model description is the <monospace>modelName</monospace>. The different URIs (<monospace>modelURI</monospace>, <monospace>speciesURI</monospace>, <monospace>compartmentURI</monospace> and <monospace>reactionURI</monospace>) also play an important role in determining the ranking. A less influential feature is the <monospace>modelDescription</monospace>, as for example found in the SBML &lt;notes> tag.</p>
   </tblfn></tbl>
</sec>
<sec>
<st>
<p>Semantic index: identifying biological entities</p>
</st>
<p>Bio-model entities can be described by annotation information <it>m<sub>A </sub>
</it>encoded in MIRIAM standard URIs and stored in the Model Index. When searching for a model, a user cannot be expected to know the URIs for each biological entity of interest. Instead, searches for a constituent or bio-model must be possible using <it>characterising terms</it>, i. e. keywords. Therefore, the URIs must be parsed and the extracted information processed. The textual representation of each known constituent found in the external resources is resolved from its URI and then indexed. Because this text is searchable, <it>keywords </it>describing a model constituent can be used to retrieve models. For example, when searching for models dealing with caffeine, one may type either <monospace>caffeine</monospace>, <monospace>1,3,7-trimethylpurine-2,6-dione</monospace>, or even <monospace>C<sub>8</sub>H<sub>10</sub>N<sub>4</sub>O<sub>2</sub>
</monospace>.</p>
<p>To map the textual descriptions, and also synonyms of a term, onto a set of URIs representing the best matches for a defining term, a so-called <smcaps>SEMANTIC INDEX</smcaps> is used (see Table <tblr tid="T3">3</tblr> for the structure of the Semantic Index). This index contains all URIs found in the models included. It furthermore has one column for each existing qualifier. Every model <it>m </it>that contains a particular URI is added to the set of model IDs in the relevant qualifier column. The semantic index therefore makes it possible to link a URI, resolved from search terms, to a set of bio-models within the collection <it>C<sub>M</sub>
</it>.</p>
<tbl id="T3"><title><p>Table 3</p></title><caption><p>Semantic index</p></caption><tblbdy cols="5">
      <r>
         <c ca="left">
            <p>
               <b>URI </b>
            </p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>qualifier </b>
            </p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>content</b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="3">
            <hr/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>bqbiol_is</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>bqbiol_isVersionOf</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>bqmodel_is</b>
            </p>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>urn:miriam:obo.chebi:CHEBI%3A27732</p>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
         </c>
         <c ca="left">
            <p>BIOMD0000000241</p>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
         </c>
         <c ca="left">
            <p>BIOMD0000000241</p>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>caffeine chebi 27732</p>
            <p>chebi home advanced search</p>
            <p>browse ontology periodic</p>
            <p>... moleculeschebimain</p>
            <p>caffeine chebi 116485</p>
            <p>central nervous system</p>
            <p>stimulant caffeine ryanodine</p>
            <p>receptor modulator mutagen</p>
            <p>1,3,7-trimethyl-3,7</p>
            <p>dihydro-1 h-purine-2,6 iuphar</p>
            <p>1,3,7-trimethylxanthine dion</p>
            <p>msdchem d00528 kegg drug</p>
            <p>[..]</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>urn:miriam:kegg.compound:C07481</p>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
         </c>
         <c ca="left">
            <p>BIOMD0000000241</p>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>BIOMD0000000241</p>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
         </c>
         <c ca="left">
            <p>kegg compound c07481 entry</p>
            <p>c07481 compound name caffeine</p>
            <p>1,3,7-trimethylxanthine</p>
            <p>formula c8h10n4o2 mass 194.0804</p>
            <p>structure remark d00528 comment</p>
            <p>source coffea arabica tax 13443</p>
            <p>xanthines reaction r07920 r07921</p>
            <p>27732 knapsack c00001492 [..]</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>urn:miriam:kegg.compound:C00385</p>
            <p/>
            <p/>
            <p/>
            <p/>
         </c>
         <c ca="left">
            <p>BIOMD0000000015</p>
            <p/>
            <p/>
            <p/>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>kegg compound c00385</p>
            <p>name xanthine formula</p>
            <p>c5h4n4o2 mass 152.0334</p>
            <p>ko00230 purine metabolism</p>
            <p>caffeine metabolism [...]</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>urn:miriam:kegg.compound:C00048</p>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>BIOMD0000000221</p>
            <p>BIOMD0000000222</p>
            <p>BIOMD0000000219</p>
            <p>BIOMD0000000218</p>
            <p/>
            <p/>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>kegg compound c00048 entry</p>
            <p>c00048 glyoxylate glyoxylic acid</p>
            <p>formula c2h2o3 mass 74.0004</p>
            <p>structure reaction r00013 r00364</p>
            <p>purine metabolism path ko00232</p>
            <p>caffeine metabolism glycine serine</p>
            <p>null_1 threonine metabolism [...]</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p> . . .</p>
         </c>
         <c cspan="3">
            <p/>
         </c>
         <c>
            <p>. . .</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>The semantic index is used to connect each existing URI in the database to the models in which it occurs. Thus, each qualifier column contains a set of IDs identifying bio-models in <it>C<sub>M</sub></it>. We additionally store <it>how </it>the URI is connected to the annotated model constituent (through the <it>qualifier </it>column). For each URI, the content, i. e. the textual representation that has been extracted from the ontology term corresponding to the URI, is normalised and indexed as well. A query can then be enriched by further related URIs (see also Figure 2, Ontology Query), resulting in an expanded query.</p>
   </tblfn></tbl>
<p>Having built the Semantic Index, queries may now be limited to models that use a particular qualifier to link a constituent to an annotation. For example, a user searching for <monospace>caffeine</monospace> can limit the result to models qualifying the annotation with <monospace>is</monospace> and <monospace>isHomologTo</monospace>. The models using the query term in conjunction with <monospace>is</monospace> could be ranked higher. This procedure also allows for weighting URIs differently according to their associated qualifiers.</p>
<p>The result of a query on the Semantic Index is a weighted, ranked list of URIs for each query term. That list is passed on to the Model Index where it represents a sub-query result that together with other sub-query results is assembled into a similarity value.</p>
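<p>The lookup described above can be illustrated with a minimal sketch. It assumes the Semantic Index is held as a nested dictionary; the entries are abbreviated from Table <tblr tid="T3">3</tblr> with the qualifier names shortened, the weights are taken from Table <tblr tid="T4">4</tblr>, and the function name <monospace>lookup</monospace> as well as the rule of keeping the highest qualifier weight per model are illustrative assumptions, not the reference implementation.</p>

```python
# Qualifier weights copied from Table 4 (only the two needed here).
QUALIFIER_WEIGHTS = {"is": 2.0, "isVersionOf": 1.5}

# Toy Semantic Index: URI -> qualifier columns (sets of model IDs) and indexed text.
# Entries are abbreviated from Table 3; the nested-dict layout is an assumption.
SEMANTIC_INDEX = {
    "urn:miriam:kegg.compound:C07481": {
        "qualifiers": {"is": {"BIOMD0000000241"}},
        "content": "kegg compound c07481 name caffeine 1,3,7-trimethylxanthine",
    },
    "urn:miriam:kegg.compound:C00385": {
        "qualifiers": {"is": {"BIOMD0000000015"}},
        "content": "kegg compound c00385 name xanthine caffeine metabolism",
    },
    "urn:miriam:kegg.compound:C00048": {
        "qualifiers": {"isVersionOf": {"BIOMD0000000218", "BIOMD0000000219"}},
        "content": "kegg compound c00048 glyoxylate caffeine metabolism",
    },
}


def lookup(keyword, allowed_qualifiers=None):
    """Map a keyword to {model_id: weight} via the URIs whose text contains it."""
    hits = {}
    for entry in SEMANTIC_INDEX.values():
        # Whole-token match against the normalised textual representation.
        if keyword.lower() not in entry["content"].split():
            continue
        for qualifier, models in entry["qualifiers"].items():
            if allowed_qualifiers is not None and qualifier not in allowed_qualifiers:
                continue
            weight = QUALIFIER_WEIGHTS[qualifier]
            for model_id in models:
                # Keep the strongest link if a model is reached more than once.
                hits[model_id] = max(hits.get(model_id, 0.0), weight)
    return hits


restricted = lookup("caffeine", allowed_qualifiers={"is"})
unrestricted = lookup("caffeine")
```

<p>In this sketch, restricting the query to the qualifier <monospace>is</monospace> drops the models that are only linked through <monospace>isVersionOf</monospace>, while the unrestricted query keeps them with a lower weight.</p>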
</sec>
<sec>
<st>
<p>Biology ontology: incorporating similar constituents</p>
</st>
<p>Sometimes it might be useful to also include models with constituents that are <it>similar</it>, though not identical, to the one described by the original search terms, for example, if a search resulted in only a few models containing a particular constituent. <smcaps>BIOLOGY ONTOLOGIES</smcaps> expand a query by deriving similar constituents. A user searching for models encoding the constituent <monospace>caffeine</monospace> may also be interested in models containing the constituent <monospace>xanthine</monospace>, which is structurally related to caffeine.</p>
<p>To compare the relevance of a search term with terms in a particular ontology, we use a solution proposed by Schulz and Liebermeister (discussed in personal communication), who suggest mapping different ontology Web resources onto one common ontology. Using that ontology, the similarities of ontology terms are measured. The approach also takes into account different relations between the terms. In our work, we used that approach to compute weights for ontology entries within a certain range of a given term. Apart from that method, other works from IR research exist which might be incorporated in later studies, e. g. <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>.</p>
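<p>Query expansion within a certain range of a term could look like the following sketch. The text does not specify how the weights are computed, so the scheme used here (a weight of <monospace>decay</monospace> raised to the graph distance) is purely an assumption, and the small relation graph <monospace>RELATED</monospace> is a hypothetical fragment rather than data from the common ontology.</p>

```python
from collections import deque

# Hypothetical fragment of a common ontology: term -> directly related terms.
RELATED = {
    "caffeine": ["xanthine", "theophylline"],
    "xanthine": ["caffeine", "purine"],
    "theophylline": ["caffeine"],
    "purine": ["xanthine"],
}


def expand(term, max_distance=1, decay=0.5):
    """Breadth-first expansion of a query term.

    Each term reached at graph distance d receives the weight decay ** d.
    This weighting is an assumed placeholder for the similarity measure.
    """
    weights = {term: 1.0}
    queue = deque([(term, 0)])
    while queue:
        current, distance = queue.popleft()
        if distance == max_distance:
            continue  # do not look beyond the configured range
        for neighbour in RELATED.get(current, []):
            if neighbour not in weights:
                weights[neighbour] = decay ** (distance + 1)
                queue.append((neighbour, distance + 1))
    return weights


near = expand("caffeine")
far = expand("caffeine", max_distance=2)
```

<p>With a range of one step, the query term <monospace>caffeine</monospace> is expanded by its direct neighbours only; widening the range adds more distant terms with correspondingly smaller weights, so that models found through them can be ranked lower.</p>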
</sec>
<sec>
<st>
<p>Incorporating weights</p>
</st>
<p>After retrieval, the relevant bio-models are ranked. The ranking function comprises weights derived from different sources. (1) The <smcaps>MODEL INDEX</smcaps> itself is used to incorporate weights derived from IR techniques such as term frequency - inverse document frequency <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp>. (2) The importance of each feature is expressed by its weight (see Table <tblr tid="T2">2</tblr>). (3) A user may in addition assign a weight to a term in the query in order to increase that term's importance. (4) Preliminary results of the single sub-queries assigned to particular data resources are evaluated. (5) Weights derived from ontologies (see <smcaps>BIOLOGY ONTOLOGIES</smcaps>) may change the result ranking, e. g. models retrieved by ontologically derived terms can be ranked lower than others.</p>
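<p>How several of these weight sources might enter a single score is sketched below. The text does not state how the weights are combined, so the multiplicative combination is an assumption, as are the function names; the feature weights are taken from Table <tblr tid="T2">2</tblr>. Source (4), the evaluation of preliminary sub-query results, is not modelled.</p>

```python
import math

# Feature weights taken from Table 2.
FEATURE_WEIGHTS = {"modelName": 4.0, "species": 3.0, "publicationText": 2.5}


def tf_idf(term_count, docs_with_term, total_docs):
    """Plain tf-idf: raw term frequency times log inverse document frequency."""
    return term_count * math.log(total_docs / docs_with_term)


def match_weight(term_count, docs_with_term, total_docs, feature,
                 user_boost=1.0, ontology_weight=1.0):
    """Weight of one term match in one feature.

    Assumed combination: the product of (1) tf-idf, (2) the feature weight,
    (3) a user-assigned term boost and (5) an ontology-derived weight.
    """
    return (tf_idf(term_count, docs_with_term, total_docs)
            * FEATURE_WEIGHTS[feature] * user_boost * ontology_weight)


# A term occurring once in a model, found in 10 of 454 indexed models.
in_species = match_weight(1, 10, 454, "species")
in_publication = match_weight(1, 10, 454, "publicationText")
via_ontology = match_weight(1, 10, 454, "species", ontology_weight=0.5)
```

<p>Under this assumption, the same term counts more when it is found in the <monospace>species</monospace> feature than in <monospace>publicationText</monospace>, and a match that was only reached through an ontologically derived term contributes less than a direct match.</p>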
</sec>
<sec>
<st>
<p>Ranking the results</p>
</st>
<p>All weights assigned to a model are used to determine the model's position in the vector space that is spanned by the Vector Space Model. Once all model positions have been identified, the similarity can be computed and the ranking inferred from the models' positions.</p>
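<p>As an illustration of ranking in a vector space, the following sketch uses the cosine of the angle between the query vector and each model vector as similarity value. The cosine measure is the textbook choice for the Vector Space Model and is assumed here; the vectors, the model names and the keying of dimensions by (feature, term) pairs are hypothetical.</p>

```python
import math


def cosine(a, b):
    """Cosine of the angle between two sparse vectors given as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def rank(query, models):
    """Return model IDs ordered by decreasing similarity to the query."""
    return sorted(models, key=lambda model_id: cosine(query, models[model_id]),
                  reverse=True)


# Hypothetical weighted positions; each dimension is a (feature, term) pair.
query = {("species", "caffeine"): 3.0, ("compartment", "gut"): 3.0}
models = {
    "model_A": {("species", "caffeine"): 3.0, ("compartment", "gut"): 3.0},
    "model_B": {("species", "caffeine"): 3.0, ("species", "glucose"): 3.0},
    "model_C": {("species", "glucose"): 3.0},
}
ranking = rank(query, models)
```

<p>Here <monospace>model_A</monospace> points in the same direction as the query, <monospace>model_B</monospace> shares one of its two dimensions, and <monospace>model_C</monospace> shares none, which yields the ranking A, B, C.</p>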
</sec>
</sec>
<sec>
<st>
<p>Implementation: enabling model retrieval in BioModels Database</p>
</st>
<p>The introduced implementation is based on prior work on a general framework for testing different ranking functions on a given model base, called Sombi <url>http://sourceforge.net/projects/sombi</url>.</p>
<p>Here we present an implementation for BioModels Database. We assume that the model source code <it>m<sub>S </sub>
</it>is provided in the open, standardised model representation format SBML. Furthermore, annotations <it>m<sub>A </sub>
</it>should be encoded using the MIRIAM standard annotation, i. e. MIRIAM URIs. The implementation is based on the architecture presented in the previous section. All source code is freely available from the Biomodels.net SVN Sourceforge repository <url>https://biomodels.svn.sourceforge.net</url>. The retrieval and ranking system is available online at <url>http://www.ebi.ac.uk/biomodels-demo/</url>.</p>
<p>The advantage of using BioModels Database as a proof of concept lies in the number of stored models - currently 241 curated, i. e. verified, models and an additional 213 non-curated models (as of 2010-04-01). All models are encoded in SBML. All models in the curated branch are annotated, and as a consequence provide sufficient meta-information for a thorough testing of the ranking and retrieval system.</p>
<p>Furthermore, analysing the stored information together with the BioModels.net team led to tentative weights for the different features (see Table <tblr tid="T2">2</tblr>), and helped to pinpoint the importance of different qualifiers (shown in Table <tblr tid="T4">4</tblr>).</p>
<tbl id="T4"><title><p>Table 4</p></title><caption><p>Qualifiers and their assigned importance</p></caption><tblbdy cols="2">
      <r>
         <c ca="left">
            <p>
               <b>Qualifier </b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Weight</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>is</p>
         </c>
         <c ca="left">
            <p>2.0</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>isHomologTo</p>
         </c>
         <c ca="left">
            <p>1.7</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>hasPart</p>
         </c>
         <c ca="left">
            <p>1.5</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>isPartOf</p>
         </c>
         <c ca="left">
            <p>1.5</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>isVersionOf</p>
         </c>
         <c ca="left">
            <p>1.5</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>hasVersion</p>
         </c>
         <c ca="left">
            <p>1.5</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>isEncodedBy</p>
         </c>
         <c ca="left">
            <p>1.3</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>isDerivedFrom</p>
         </c>
         <c ca="left">
            <p>1.3</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>encodes</p>
         </c>
         <c ca="left">
            <p>1.3</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>isDescribedBy</p>
         </c>
         <c ca="left">
            <p>1.0</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>occursIn</p>
         </c>
         <c ca="left">
            <p>1.0</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>hasProperty</p>
         </c>
         <c ca="left">
            <p>1.0</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>isPropertyOf</p>
         </c>
         <c ca="left">
            <p>1.0</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>The table shows the different qualifiers available from MIRIAM resources. The qualifiers are used in the S<smcaps>EMANTIC</smcaps> I<smcaps>NDEX</smcaps>. Each qualifier has a particular weight assigned to it which reflects the strength of connection between a URI and a constituent.</p>
   </tblfn></tbl>
<p>We extend the current BioModels Database search engine by including a greater number of features in the search process, by weighting different information, and by ranking the results according to the user query. Both types of queries, QBV and QBME, are supported. The model index contains 454 models with 140977 terms separated into 25 features. The <smcaps>SEMANTIC INDEX</smcaps> contains 2261 URIs with 409124 terms. The <smcaps>BIOLOGY ONTOLOGIES</smcaps> used are NCBI Taxonomy, GO, ChEBI, KEGG Compound and KEGG Reaction <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp> (as of 2010-04-14). We anticipate including more formal (biological) semantics in future versions, and turning them into additional features for the similarity measure. Candidates for information relevant for preserving a bio-model's semantics have been suggested in <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp>.</p>
<p>The <it>Lucene Framework </it>
<abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp> is integrated in the search system to create, maintain and search both the Model and Semantic Index. It provides retrieval functionality based on the Extended Boolean Model; its ranking possibilities are based on the Vector Space Model. To implement the retrieval and ranking process described above, Lucene has been extended by the different indices and sources, e. g. the Semantic Index. While the implementation makes use of an adapted Lucene built-in similarity function, it will be useful in the future to provide advanced users of the ranking system with a collection of different similarity functions to choose from.</p>
<sec>
<st>
<p>Search engine possibilities</p>
</st>
<p>
<b>Query by value </b>Query by value allows the user to either perform a free text search querying all features, or a more sophisticated search selecting features of the different dimensions to be searched (refer to Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr>). For instance, a user is able to search for models having a certain author or for models including a particular "species". Furthermore, the different parts of a user's query can be weighted using the specific <it>feature matrix </it>shown in Table <tblr tid="T2">2</tblr>.</p>
<p>Depending on the dimension selected, the query might be enriched or limited. This is especially important for the constituent dimension. For example, different terms describing a model constituent are used to query the <smcaps>SEMANTIC INDEX</smcaps>. The result is a list of weighted URIs, which is then used to identify a model in <it>C<sub>M </sub>
</it>in case the model itself does not provide the search terms the user queried. When searching a model by URI, the importance of a URI within the model is reflected through a qualifier; i. e. models encoding a URI with the qualifier <monospace>is</monospace> are more important than models encoding the same URI with the qualifier <monospace>isVersionOf</monospace>. The weighting is done using the qualifier matrix shown in Table <tblr tid="T4">4</tblr>.</p>
<p>Additionally, the user is able to vary the importance of individual search terms; i. e. one term describing a constituent can be more important than another. This <it>weight </it>is taken into account when computing the ranking. Besides the sophisticated ranking and retrieval system, the search engine supports common IR techniques like fuzzy search, range or proximity search, as well as wild-cards or phrase search <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp>.</p>
<p>
<b>Query by model example </b>When querying by model example, the model used as a bait is analysed, and the values of extracted features are queried against the bio-model collection <it>C<sub>M</sub>
</it>. A ranked list of best matching models is retrieved. Enriched queries are switched off, as the example model itself provides sufficient contextual information.</p>
</sec>
</sec>
<sec>
<st>
<p>An example for model retrieval and ranking</p>
</st>
<p>The following example illustrates the functioning of the reference implementation. We want to search for <it>recent models by non-bogus authors describing the effect of caffeine in the human digestive tract when drinking a cup of coffee</it>. The resulting models should fulfil the following characteristics:</p>
<p indent="1">1. the model <it>should have </it>the compartment <monospace>gut</monospace> encoded</p>
<p indent="1">2. at least one species <it>must be </it>exactly <monospace>caffeine</monospace> (qualified using <monospace>is</monospace>)</p>
<p indent="1">3. the model <it>should </it>have been submitted later than 2008</p>
<p indent="1">4. the author of the reference publication <it>must not </it>be <monospace>John Doe</monospace>
</p>
<p>This query can be submitted easily through the proposed advanced search interface of BioModels Database. The query is shown in Figure <figr fid="F3">3</figr>. Specifying different levels of requirements (should, must, must not) makes it possible to restrict the search more precisely.</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Sample query on the new BioModels Database search interface</p></caption><text>
   <p><b>Sample query on the new BioModels Database search interface</b>. Screenshot of a part of the new search interface of BioModels Database. The interface allows searching for <it>Persons, SBML elements, Resources</it>, and allows the search terms to be restricted to particular features using a single qualifier. Models may be considered only for a certain range of <it>dates</it>. The sample search corresponds to <it>recent models by non-bogus authors describing the effect of caffeine in the human digestive tract when drinking coffee</it>.</p>
</text><graphic file="1471-2105-11-423-3" hint_layout="double"/></fig>
<p>To answer the query, the system first resolves the constituent <monospace>caffeine</monospace> into a set of URIs (<smcaps>SEMANTIC INDEX</smcaps>). Since the search for <monospace>caffeine</monospace> is restricted to the qualifier <monospace>is</monospace> (<it>must be exactly </it>
<monospace>caffeine</monospace>), only the retrieved URIs that are linked to a model using the <monospace>is</monospace> qualifier are kept. From those, a weighted list of URIs is built and then used for the feature <monospace>speciesURI</monospace> to query the <smcaps>MODEL INDEX</smcaps>. For our example, the three best matching URIs are (a) <monospace>urn:miriam:obo.chebi:CHEBI%3A27732</monospace>, (b) <monospace>urn:miriam:kegg.compound:C07481</monospace> and (c) <monospace>urn:miriam:kegg.compound:C00385</monospace>. The URIs (a) and (b) both define caffeine, one in ChEBI <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp> and one in KEGG <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>. The URI (c) describes xanthine, a chemical structurally related to caffeine.</p>
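<p>The filtering step described above can be illustrated with a minimal sketch. It is not the reference implementation: the <monospace>UriHit</monospace> record, the <monospace>weighted_uris</monospace> function and the fourth, excluded URI are hypothetical names introduced for illustration only, while the three remaining URIs and their weights are taken from the example query.</p>

```python
from typing import NamedTuple


class UriHit(NamedTuple):
    """One hypothetical semantic-index hit: URI, relevance, qualifiers in use."""
    uri: str
    score: float
    qualifiers: frozenset


def weighted_uris(hits, required_qualifier, limit=3):
    """Keep hits linked to a model with the required qualifier, best first."""
    kept = [h for h in hits if required_qualifier in h.qualifiers]
    kept.sort(key=lambda h: h.score, reverse=True)
    return [(h.uri, h.score) for h in kept[:limit]]


hits = [
    UriHit("urn:miriam:kegg.compound:C00385", 0.55, frozenset({"is"})),
    UriHit("urn:miriam:obo.chebi:CHEBI%3A27732", 0.82,
           frozenset({"is", "isVersionOf"})),
    UriHit("urn:miriam:kegg.compound:C07481", 0.67, frozenset({"is"})),
    # Hypothetical URI that is never used with the "is" qualifier.
    UriHit("urn:example:not-used-with-is", 0.90, frozenset({"isVersionOf"})),
]

# Only URIs qualified with "is" survive, ordered by descending weight.
top = weighted_uris(hits, "is")
```

<p>In this sketch the highest-scoring hit is discarded because it is never linked with <monospace>is</monospace>, which mirrors the restriction imposed by the <it>must be exactly </it>requirement.</p>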
<p>Together with the queries for <monospace>gut</monospace> in the compartment feature and <it>not </it>
<monospace>John Doe</monospace> in the author feature, the <smcaps>MODEL INDEX</smcaps> query is assembled internally as:</p>
<p>
<monospace>+speciesURI:( urn:miriam:obo.chebi:chebi%3A27732 ^0.82</monospace>
</p>
<p>
<monospace>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;urn:miriam:kegg.compound:C07481 ^0.67</monospace>
</p>
<p>
<monospace>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;urn:miriam:kegg.compound:C00385 ^0.55)</monospace>
</p>
<p>
<monospace>compartment:(gut)</monospace>
</p>
<p>
<monospace>-author:(John Doe)</monospace>
</p>
<p>
<monospace>date:([01/01/2009 - *])</monospace>
</p>
<p>The prefixes + and - denote whether a feature <it>must </it>or <it>must not </it>occur; no prefix implies that the feature <it>should </it>occur. The ^ denotes the weight assigned to the sub-query results retrieved from the semantic index. We use the Extended Boolean Model to query the index for each feature independently (speciesURI, compartment, date and author). The preliminary results are four sets of matching internal model identifiers. These sets are then combined using Boolean algebra, taking into account whether a feature <it>should</it>, <it>must </it>or <it>must not </it>occur.</p>
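<p>As an illustration of this notation, the following sketch assembles a query string of the same shape. The function names (<monospace>boosted_group</monospace>, <monospace>clause</monospace>, <monospace>assemble_query</monospace>) are hypothetical and do not claim to reproduce how the reference implementation builds its queries; only the feature names, URIs and weights come from the example.</p>

```python
def boosted_group(weighted_uris):
    """Join URIs into one group, attaching each weight with the ^ operator."""
    return " ".join(f"{uri} ^{weight:.2f}" for uri, weight in weighted_uris)


def clause(feature, value, occur="should"):
    """Render one feature clause; occur is 'must', 'must_not' or 'should'."""
    prefix = {"must": "+", "must_not": "-", "should": ""}[occur]
    return f"{prefix}{feature}:({value})"


def assemble_query(uris, compartment, excluded_author, earliest_date):
    """Build the four clauses of the example, one clause per line."""
    return "\n".join([
        clause("speciesURI", boosted_group(uris), "must"),
        clause("compartment", compartment),
        clause("author", excluded_author, "must_not"),
        clause("date", f"[{earliest_date} - *]"),
    ])


query = assemble_query(
    [
        ("urn:miriam:obo.chebi:chebi%3A27732", 0.82),
        ("urn:miriam:kegg.compound:C07481", 0.67),
        ("urn:miriam:kegg.compound:C00385", 0.55),
    ],
    compartment="gut",
    excluded_author="John Doe",
    earliest_date="01/01/2009",
)
print(query)
```

<p>Keeping the occurrence level as a parameter of each clause means that the same rendering code serves mandatory, optional and excluded features alike.</p>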
<p>In a second step, the results are ranked using the Vector Space Model, according to the different types of weights. The predefined feature weights (Table <tblr tid="T2">2</tblr>) put particular importance on the speciesURI feature. Thus, all models that matched the speciesURI feature are ranked high, incorporating the weight created by the sub-query to the semantic index. If a retrieved model, besides the mandatory features (<it>must</it>), matches additional optional features (<it>should</it>), the scores are summed up, resulting in a higher rank. In this case, the feature "date" carries little weight, so a match on it only slightly increases a model's score. The ranked results for the sample query performed on BioModels Database are shown in Figure <figr fid="F4">4</figr>.</p>
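<p>The two steps, combining the per-feature result sets and then summing weighted feature scores, can be sketched as follows. This is a simplified illustration under stated assumptions: a plain weighted sum stands in for the Vector Space Model scoring, and the feature weights, model identifiers and match strengths are hypothetical values, not those of Table 2 or of BioModels Database.</p>

```python
# Hypothetical feature weights; speciesURI dominates, date counts little.
FEATURE_WEIGHTS = {"speciesURI": 2.0, "compartment": 1.0, "date": 0.2}


def combine(must, should, must_not):
    """Combine per-feature sets of model identifiers.

    Every 'must' set is required and every 'must_not' set is excluded.
    'should' sets only define the candidates when no 'must' set exists.
    """
    candidates = set.intersection(*must) if must else set().union(*should)
    for excluded in must_not:
        candidates = candidates - excluded
    return candidates


def rank(candidates, matches):
    """Sum weighted match strengths per model and sort by descending score.

    matches maps a feature name to {model id: match strength in [0, 1]}.
    """
    scores = {
        model: sum(FEATURE_WEIGHTS[feature] * per_model.get(model, 0.0)
                   for feature, per_model in matches.items())
        for model in candidates
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical match strengths for four models M1 to M4.
matches = {
    "speciesURI": {"M1": 0.82, "M2": 0.55, "M3": 0.10, "M4": 0.67},
    "compartment": {"M1": 1.0},
    "date": {"M2": 1.0},
}
excluded_by_author = {"M4"}  # models whose author matched the must-not clause

candidates = combine(
    must=[set(matches["speciesURI"])],
    should=[set(matches["compartment"]), set(matches["date"])],
    must_not=[excluded_by_author],
)
ranking = rank(candidates, matches)
```

<p>In this toy data the model matching both the mandatory species and the optional compartment ranks first, whereas a match on the date feature alone adds only a small amount to a model's score.</p>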
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Ranked results</p></caption><text>
   <p><b>Ranked results</b>. Search result obtained on BioModels Database with the given sample query (see Figure 3). The upper panel shows the enriched query. Due to the precise formulation of the query, and the requirement that <monospace>caffeine</monospace> must occur and additionally must be qualified with <monospace>is</monospace>, the result contains only three hits. (1) This model matches the top two constituents resolved by the semantic index, and additionally the term <it>gut </it>in the compartment feature. (2) The model matches the constituent ranked third by the semantic index. (3) The lowest ranked model only matches one constituent ranked eighth by the semantic index; this is a very weak relation resulting in a very low rank.</p>
</text><graphic file="1471-2105-11-423-4" hint_layout="double"/></fig>
</sec>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>This paper presents, to our knowledge for the first time, the application of Information Retrieval techniques to Computational Biology models. The theoretical method relies on knowledge extracted from model annotations, but also incorporates context information. The BioModels Database implementation presents a practical example of this method. It significantly enhances the search possibilities available to BioModels Database users. Thorough evaluation, for instance using F-measures, is needed, but is currently difficult due to the lack of a reference to compare against. The generality of the concept makes it easy to apply to other model bases.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>The application of ranking and retrieval methods on bio-models based on model annotations was suggested by DW. AP, RH and DW discussed different similarity functions and set up the architecture for the Sombi system. RH implemented the approach in BioModels Database during his research stay at the EBI, supervised by NLN. LE and RH discussed and determined the different weights for features and qualifiers used for the similarity function. LE provided detailed examples for the evaluation of the approach. All authors contributed to the manuscript and all authors have read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>RH and DW were supported by the German Research Foundation (DFG) research training group dIEM oSiRiS (DFG grant 1387). Implementation work at the EBI was funded by the Leonardo da Vinci - European Commission's Lifelong Learning Programme. AP acknowledges support through the DFG research training group MUSAMA (DFG grant 1424).</p>
<p>The development of BioModels Database is funded by the European Molecular Biology Laboratory, the Biotechnology and Biological Sciences Research Council (grant BB/F010516/1), and the National Institute of General Medical Sciences (grant R01 GM070923).</p>
<p>The authors are grateful to Camille Laibe and the BioModels.net Team for their help with the implementation in BioModels Database.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><aug><au><snm>Klipp</snm><fnm>E</fnm></au><au><snm>Liebermeister</snm><fnm>W</fnm></au><au><snm>Wierling</snm><fnm>C</fnm></au><au><snm>Kowald</snm><fnm>A</fnm></au><au><snm>Lehrach</snm><fnm>H</fnm></au><au><snm>Herwig</snm><fnm>R</fnm></au></aug><source>Systems biology: a textbook</source><publisher>Wiley-VCH</publisher><pubdate>2009</pubdate></bibl><bibl id="B2"><title><p>BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models</p></title><aug><au><snm>Li</snm><fnm>C</fnm></au><au><snm>Donizelli</snm><fnm>M</fnm></au><au><snm>Rodriguez</snm><fnm>N</fnm></au><au><snm>Dharuri</snm><fnm>H</fnm></au><au><snm>Endler</snm><fnm>L</fnm></au><au><snm>Chelliah</snm><fnm>V</fnm></au><au><snm>Li</snm><fnm>L</fnm></au><au><snm>He</snm><fnm>E</fnm></au><au><snm>Henry</snm><fnm>A</fnm></au><au><snm>Stefan</snm><fnm>MI</fnm></au><au><snm>Snoep</snm><fnm>JL</fnm></au><au><snm>Hucka</snm><fnm>M</fnm></au><au><snm>Nov&#233;re</snm><fnm>NL</fnm></au><au><snm>Laibe</snm><fnm>C</fnm></au></aug><source>BMC Syst Biol</source><pubdate>2010</pubdate><volume>4</volume><fpage>92</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1752-0509-4-92</pubid><pubid idtype="pmcid">2909940</pubid><pubid idtype="pmpid" link="fulltext">20587024</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Validity and combination of biochemical models</p></title><aug><au><snm>Liebermeister</snm><fnm>W</fnm></au></aug><source>Proceedings of 3rd International ESCEC Workshop on Experimental Standard Conditions on Enzyme Characterizations</source><pubdate>2008</pubdate></bibl><bibl id="B4"><title><p>Designing and encoding models for synthetic biology</p></title><aug><au><snm>Endler</snm><fnm>L</fnm></au><au><snm>Rodriguez</snm><fnm>N</fnm></au><au><snm>Juty</snm><fnm>N</fnm></au><au><snm>Chelliah</snm><fnm>V</fnm></au><au><snm>Laibe</snm><fnm>C</fnm></au><au><snm>Li</snm><fnm>C</fnm></au><au><snm>Le 
Nov&#232;re</snm><fnm>N</fnm></au></aug><source>Journal of The Royal Society Interface</source><pubdate>2009</pubdate><volume>6</volume><issue>Suppl 4</issue><fpage>S405</fpage><lpage>S417</lpage><xrefbib><pubid idtype="doi">10.1098/rsif.2009.0035.focus</pubid></xrefbib></bibl><bibl id="B5"><title><p>Systems Biology Markup Language (SBML) Level 2: Structures and Facilities for Model Definitions</p></title><aug><au><snm>Finney</snm><fnm>A</fnm></au><au><snm>Hucka</snm><fnm>M</fnm></au><au><snm>Le Nov&#232;re</snm><fnm>N</fnm></au></aug><source>Systems Biology Workbench Group</source><pubdate>2003</pubdate></bibl><bibl id="B6"><title><p>An Overview of CellML 1.1, a Biological Model Description Language</p></title><aug><au><snm>Cuellar</snm><fnm>AA</fnm></au><au><snm>Lloyd</snm><fnm>CM</fnm></au><au><snm>Nielsen</snm><fnm>PF</fnm></au><au><snm>Bullivant</snm><fnm>DP</fnm></au><au><snm>Nickerson</snm><fnm>DP</fnm></au><au><snm>Hunter</snm><fnm>PJ</fnm></au></aug><source>SIMULATION</source><pubdate>2003</pubdate><volume>79</volume><issue>12</issue><fpage>740</fpage><lpage>747</lpage><xrefbib><pubid idtype="doi">10.1177/0037549703040939</pubid></xrefbib></bibl><bibl id="B7"><aug><au><snm>Bader</snm><fnm>GD</fnm></au><au><snm>Cary</snm><fnm>MP</fnm></au></aug><source>BioPAX - Biological Pathways Exchange Language Level 2, Version 1.0 Documentation</source><publisher>BioPAX workgroup</publisher><pubdate>2005</pubdate></bibl><bibl id="B8"><title><p>Web-based kinetic modelling using JWS Online</p></title><aug><au><snm>Olivier</snm><fnm>BG</fnm></au><au><snm>Snoep</snm><fnm>JL</fnm></au></aug><source>Bioinformatics</source><pubdate>2004</pubdate><volume>20</volume><issue>13</issue><fpage>2143</fpage><lpage>2144</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bth200</pubid><pubid idtype="pmpid" link="fulltext">15072998</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>The CellML Model 
Repository</p></title><aug><au><snm>Lloyd</snm><fnm>CMM</fnm></au><au><snm>Lawson</snm><fnm>JRR</fnm></au><au><snm>Hunter</snm><fnm>PJJ</fnm></au><au><snm>Nielsen</snm><fnm>PFF</fnm></au></aug><source>Bioinformatics</source><pubdate>2008</pubdate><volume>24</volume><issue>18</issue><fpage>2122</fpage><lpage>2123</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btn390</pubid><pubid idtype="pmpid" link="fulltext">18658182</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Software reuse myths</p></title><aug><au><snm>Tracz</snm><fnm>W</fnm></au></aug><source>ACM SIGSOFT Software Engineering Notes</source><pubdate>1988</pubdate><volume>13</volume><fpage>17</fpage><lpage>21</lpage><xrefbib><pubid idtype="doi">10.1145/43857.43859</pubid></xrefbib></bibl><bibl id="B11"><title><p>Minimum Information Requested In the Annotation of biochemical Models (MIRIAM)</p></title><aug><au><snm>Le Nov&#232;re</snm><fnm>N</fnm></au><au><snm>Finney</snm><fnm>A</fnm></au><au><snm>Hucka</snm><fnm>M</fnm></au><au><snm>Bhalla</snm><fnm>US</fnm></au><au><snm>Campagne</snm><fnm>F</fnm></au><etal/></aug><source>Nature Biotechnology</source><pubdate>2005</pubdate><volume>23</volume><issue>12</issue><fpage>1509</fpage><lpage>1515</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt1156</pubid><pubid idtype="pmpid" link="fulltext">16333295</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project</p></title><aug><au><snm>Taylor</snm><fnm>C</fnm></au><au><snm>Field</snm><fnm>D</fnm></au><au><snm>Sansone</snm><fnm>S</fnm></au><au><snm>Aerts</snm><fnm>J</fnm></au><au><snm>Apweiler</snm><fnm>R</fnm></au><au><snm>Ashburner</snm><fnm>M</fnm></au><au><snm>Ball</snm><fnm>C</fnm></au><au><snm>Binz</snm><fnm>P</fnm></au><au><snm>Bogue</snm><fnm>M</fnm></au><au><snm>Booth</snm><fnm>T</fnm></au><etal/></aug><source>Nature 
biotechnology</source><pubdate>2008</pubdate><volume>26</volume><issue>8</issue><fpage>889</fpage><lpage>896</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt.1411</pubid><pubid idtype="pmcid">2771753</pubid><pubid idtype="pmpid" link="fulltext">18688244</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>MIRIAM Resources: tools to generate and resolve robust cross-references in Systems Biology</p></title><aug><au><snm>Laibe</snm><fnm>C</fnm></au><au><snm>Le Nov&#232;re</snm><fnm>N</fnm></au></aug><source>BMC Systems Biology</source><pubdate>2007</pubdate><volume>1</volume><fpage>58</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1752-0509-1-58</pubid><pubid idtype="pmcid">2259379</pubid><pubid idtype="pmpid" link="fulltext">18078503</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Adding semantics in kinetics models of biochemical pathways</p></title><aug><au><snm>Le Nov&#232;re</snm><fnm>N</fnm></au><au><snm>Courtot</snm><fnm>M</fnm></au><au><snm>Laibe</snm><fnm>C</fnm></au></aug><source>Proceedings of the 2nd International Symposium on experimental standard conditions of enzyme characterizations</source><pubdate>2006</pubdate></bibl><bibl id="B15"><title><p>Gene ontology: tool for the unification of biology. 
The Gene Ontology Consortium</p></title><aug><au><snm>Ashburner</snm><fnm>M</fnm></au><au><snm>Ball</snm><fnm>CA</fnm></au><au><snm>Blake</snm><fnm>JA</fnm></au><au><snm>Botstein</snm><fnm>D</fnm></au><au><snm>Butler</snm><fnm>H</fnm></au><etal/></aug><source>Nat Genet</source><pubdate>2000</pubdate><volume>25</volume><fpage>25</fpage><lpage>29</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/75556</pubid><pubid idtype="pmpid" link="fulltext">10802651</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>A Translation Approach to Portable Ontology Specifications</p></title><aug><au><snm>Gruber</snm><fnm>TR</fnm></au></aug><source>Knowledge Acquisition</source><pubdate>1993</pubdate><volume>5</volume><issue>2</issue><fpage>199</fpage><lpage>220</lpage><xrefbib><pubid idtype="doi">10.1006/knac.1993.1008</pubid></xrefbib></bibl><bibl id="B17"><title><p>ChEBI: a database and ontology for chemical entities of biological interest</p></title><aug><au><snm>Degtyarenko</snm><fnm>K</fnm></au><au><snm>de Matos</snm><fnm>P</fnm></au><au><snm>Ennis</snm><fnm>M</fnm></au><au><snm>Hastings</snm><fnm>J</fnm></au><au><snm>Zbinden</snm><fnm>M</fnm></au><au><snm>McNaught</snm><fnm>A</fnm></au><au><snm>Alcantara</snm><fnm>R</fnm></au><au><snm>Darsow</snm><fnm>M</fnm></au><au><snm>Guedj</snm><fnm>M</fnm></au><au><snm>Ashburner</snm><fnm>M</fnm></au></aug><source>Nucl. 
Acids Res</source><pubdate>2008</pubdate><volume>36</volume><issue>suppl_1</issue><fpage>D344</fpage><lpage>350</lpage></bibl><bibl id="B18"><title><p>Towards Enhanced Retrieval of Biological Models through Annotation-Based Ranking</p></title><aug><au><snm>K&#246;hn</snm><fnm>D</fnm></au><au><snm>Maus</snm><fnm>C</fnm></au><au><snm>Henkel</snm><fnm>R</fnm></au><au><snm>Kolbe</snm><fnm>M</fnm></au></aug><source>Data Integration in the Life Sciences</source><pubdate>2009</pubdate><fpage>204</fpage><lpage>219</lpage><xrefbib><pubid idtype="doi">10.1007/978-3-642-02879-3_17</pubid></xrefbib></bibl><bibl id="B19"><title><p>Towards a Semantic Description of Bio-Models: Meaning Facets - A Case Study</p></title><aug><au><snm>Kn&#252;pfer</snm><fnm>C</fnm></au><au><snm>Beckstein</snm><fnm>C</fnm></au><au><snm>Dittrich</snm><fnm>P</fnm></au></aug><source>Proceedings of the Second International Symposium on Semantic Mining in Biomedicine</source><pubdate>2006</pubdate><fpage>97</fpage><lpage>100</lpage></bibl><bibl id="B20"><aug><au><snm>Ferber</snm><fnm>R</fnm></au></aug><source>Information Retrieval: Suchmodelle und Data-Mining-Verfahren f&#252;r Textsammlungen und das Web. 
dpunkt Verlag</source><pubdate>2003</pubdate></bibl><bibl id="B21"><title><p>CompositeMap: a novel framework for music similarity measure</p></title><aug><au><snm>Zhang</snm><fnm>B</fnm></au><au><snm>Shen</snm><fnm>J</fnm></au><au><snm>Xiang</snm><fnm>Q</fnm></au><au><snm>Wang</snm><fnm>Y</fnm></au></aug><source>SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval</source><publisher>New York, NY, USA: ACM</publisher><pubdate>2009</pubdate><fpage>403</fpage><lpage>410</lpage><xrefbib><pubid idtype="doi">10.1145/1571941.1572011</pubid></xrefbib></bibl><bibl id="B22"><title><p>The LAILAPS Search Engine: Relevance Ranking in Life Science Databases</p></title><aug><au><snm>Lange</snm><fnm>M</fnm></au><au><snm>Spies</snm><fnm>K</fnm></au><au><snm>Colmsee</snm><fnm>C</fnm></au><au><snm>Flemming</snm><fnm>S</fnm></au><au><snm>Klapperst&#252;ck</snm><fnm>M</fnm></au><au><snm>Scholz</snm><fnm>U</fnm></au></aug><source>Journal of Integrative Bioinformatics</source><pubdate>2010</pubdate><volume>7</volume><issue>3</issue><xrefbib><pubid idtype="doi">10.2390/biecoll-jib-2010-110</pubid></xrefbib></bibl><bibl id="B23"><aug><au><snm>Baeza-Yates</snm><fnm>R</fnm></au><au><snm>Ribeiro-Neto</snm><fnm>B</fnm></au></aug><source>Modern Information Retrieval</source><publisher>Addison Wesley</publisher><edition>1</edition><pubdate>1999</pubdate></bibl><bibl id="B24"><title><p>A vector space model for automatic indexing</p></title><aug><au><snm>Salton</snm><fnm>G</fnm></au><au><snm>Wong</snm><fnm>A</fnm></au><au><snm>Yang</snm><fnm>C</fnm></au></aug><source>Communications of the ACM</source><pubdate>1975</pubdate><volume>18</volume><issue>11</issue><fpage>620.</fpage><xrefbib><pubid idtype="doi">10.1145/361219.361220</pubid></xrefbib></bibl><bibl id="B25"><title><p>An Approach for Measuring Semantic Similarity between Words Using Multiple Information 
Sources</p></title><aug><au><snm>Li</snm><fnm>Y</fnm></au><au><snm>Bandar</snm><fnm>ZA</fnm></au><au><snm>McLean</snm><fnm>D</fnm></au></aug><source>IEEE Transactions on Knowledge and Data Engineering</source><pubdate>2003</pubdate><volume>15</volume><issue>4</issue><fpage>871</fpage><lpage>882</lpage><xrefbib><pubid idtype="doi">10.1109/TKDE.2003.1209005</pubid></xrefbib></bibl><bibl id="B26"><title><p>KEGG: Kyoto Encyclopedia of Genes and Genomes</p></title><aug><au><snm>Ogata</snm><fnm>H</fnm></au><au><snm>Goto</snm><fnm>S</fnm></au><au><snm>Sato</snm><fnm>K</fnm></au><au><snm>Fujibuchi</snm><fnm>W</fnm></au><au><snm>Bono</snm><fnm>H</fnm></au><au><snm>Kanehisa</snm><fnm>M</fnm></au></aug><source>Nucl Acids Res</source><pubdate>1999</pubdate><volume>27</volume><fpage>29</fpage><lpage>34</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/27.1.29</pubid><pubid idtype="pmcid">148090</pubid><pubid idtype="pmpid" link="fulltext">9847135</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Lucene in action: a guide to the Java search engine</p></title><aug><au><snm>Gospodnetic</snm><fnm>O</fnm></au><au><snm>Hatcher</snm><fnm>E</fnm></au></aug><source>Greenwich (USA): Manning</source><pubdate>2005</pubdate></bibl><bibl id="B28"><title><p>KEGG: Kyoto Encyclopedia of Genes and Genomes</p></title><aug><au><snm>Kanehisa</snm><fnm>M</fnm></au><au><snm>Goto</snm><fnm>S</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2000</pubdate><volume>28</volume><fpage>27</fpage><lpage>30</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/28.1.27</pubid><pubid idtype="pmcid">102409</pubid><pubid idtype="pmpid" link="fulltext">10592173</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>