<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-10-S1-S28</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Identification of histone modifications in biomedical text for supporting epigenomic research</p>
         </title>
         <aug>
            <au ca="yes" id="A1">
               <snm>Kol&#225;&#345;ik</snm>
               <fnm>Corinna</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>corinna.kolarik@scai.fhg.de</email>
            </au>
            <au id="A2">
               <snm>Klinger</snm>
               <fnm>Roman</fnm>
               <insr iid="I1"/>
               <email>roman.klinger@scai.fhg.de</email>
            </au>
            <au id="A3">
               <snm>Hofmann-Apitius</snm>
               <fnm>Martin</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>martin.hofmann-apitius@scai.fhg.de</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Bioinformatics, Fraunhofer Institute Algorithms and Scientific Computing (SCAI) Schlo&#223; Birlinghoven, D-53754 Sankt Augustin, Germany</p>
            </ins>
            <ins id="I2">
               <p>Department of Applied Life Science Informatics, Bonn-Aachen International Center for Information Technology (B-IT) Dahlmannstrasse 2, D-53113 Bonn, Germany</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <supplement>
            <title>
               <p>Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009)</p>
            </title>
            <editor>Michael Q Zhang, Michael S Waterman and Xuegong Zhang</editor>
            <note>Research</note>
         </supplement>
         <conference>
            <title>
               <p>The Seventh Asia Pacific Bioinformatics Conference (APBC 2009)</p>
            </title>
            <location>Beijing, China</location>
            <date-range>13&#8211;16 January 2009</date-range>
            <url>http://bioinfo.au.tsinghua.edu.cn/apbc2009/</url>
         </conference>
         <issn>1471-2105</issn>
         <pubdate>2009</pubdate>
         <volume>10</volume>
         <issue>Suppl 1</issue>
         <fpage>S28</fpage>
         <url>http://www.biomedcentral.com/1471-2105/10/S1/S28</url>
         <xrefbib>
            
         <pubidlist><pubid idtype="pmpid">19208128</pubid><pubid idtype="doi">10.1186/1471-2105-10-S1-S28</pubid></pubidlist></xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>30</day>
               <month>1</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Kol&#225;&#345;ik et al; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Posttranslational modifications of histones influence the structure of chromatin and thereby take part in the regulation of gene expression. Certain histone modification patterns, distributed over the genome, are connected to cell and tissue differentiation and to the adaptation of organisms to their environment. Abnormal changes, in contrast, influence the development of disease states such as cancer. The mechanisms regulating histone modifications, and the functions of these modifications, are the subject of epigenomic research and are still not completely understood. The scientific literature is a rich resource of knowledge on epigenomics and on histone modifications in particular; it contains information about experimental studies, the conditions used, and their results. To our knowledge, no approach for identifying histone modifications in text has been published so far.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We have developed an approach for identifying histone modifications in biomedical literature with Conditional Random Fields (CRF) and for resolving the recognized histone modification term variants by term standardization. For term identification, <it>F</it><sub>1</sub> measures of 0.84 in 10-fold cross-validation on the training corpus and 0.81 on an independent test corpus were obtained. The standardization correctly transformed 96% of the terms from the training corpus and 98% of the terms from the test corpus. Due to the lack of terminologies exhaustively covering specific histone modification types, we developed a histone modification term hierarchy for use in a semantic text retrieval system.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The developed approach considerably improves the retrieval of articles describing histone modifications. Since text contains context information about the studies and experiments performed, the identification of histone modifications is the basis for supporting literature-based knowledge discovery and hypothesis generation to accelerate epigenomic research.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The expression of genes is regulated by their accessibility to the transcription machinery, which is controlled by the chromatin structure. Histones, the structure-forming proteins of chromatin, play an important role in establishing a closed or open chromatin state. Key players are chemical groups, small molecules, and processes that modify amino acids in the histone tails. They change the physico-chemical properties of the amino acids and mark histone proteins for the recruitment of other proteins participating in the formation of the different chromatin states <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Known histone-modifying groups and molecules, as well as processes that transform amino acids, are listed in Table <tblr tid="T1">1</tblr>. More than 70 sites for histone post-translational modifications (PTMs) have been reported <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Histone modifications. Histone modifying groups, molecules, and processes <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>.</p>
            </caption>
            <tblbdy cols="2">
               <r>
                  <c ca="left">
                     <p>Modification types</p>
                  </c>
                  <c ca="left">
                     <p>Modification examples</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Groups</p>
                  </c>
                  <c ca="left">
                     <p>acetyl, methyl, phosphate, ADP ribosyl, carbonyl, sumoyl</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Molecules</p>
                  </c>
                  <c ca="left">
                     <p>biotin, ubiquitin</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Processes</p>
                  </c>
                  <c ca="left">
                     <p>proline isomerization, arginine deimination (citrulline generation)</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
         <p>Combinations of modifications on one or several histones form a kind of histone code and carry different functions <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. To date, the histone code and its functional relations are not fully understood, and not all factors involved in regulating the modifications themselves are known.</p>
         <p>One factor influencing histone modifications is the environment of an organism, such as nutritional deprivation, chemical toxins, xenobiotics, and drugs, as well as psychosocial exposure during early developmental stages. External and intrinsic substances affect histone-modifying enzymes, which can lead to heritable changes in histone modification patterns. These changes alter the expression of a number of genes; in general, this allows organisms to adapt to varying environmental conditions by forming environment-dependent phenotypes without changing the genetic code <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. However, studies show that abnormal changes in modification patterns could be the basis of, or an additional factor in, the development of cancer, mental disorders, or late-onset diseases, e.g. type II diabetes and cardiovascular disease <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. In addition to DNA methylation, PTMs of histone proteins are one of the major components studied in epigenomics. To discover the functions of the modifications and their influence on diseases, information is needed about histone binding proteins, the genes concerned, the cell and tissue types studied in different organisms, and the chemical substances related to certain histone modification patterns.</p>
         <p>Information about histone modifications is contained in scientific articles and databases. The UCSC Genome Browser <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> is a common resource for genomic data that can be used for annotation. ChromatinDB <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> is a database of genome-wide histone modification patterns of <it>Saccharomyces cerevisiae</it>. The Histone Database <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> contains histone sequences of many organisms and their alignments, as well as amino acid-specific histone modifications; however, these are not linked to phenotypes, cell or tissue types, or experimental conditions. In their current versions, ChromatinDB and the Genome Browser do not support the analysis of histone modifications across species or in relation to various cell or tissue types or certain diseases. A resource providing context information related to histone modifications is the collection of abstracts of scientific publications in public text repositories such as PubMed <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. These abstracts describe the conditions of the experiments and studies performed, such as cell types or diseases, which are not the focus of the existing repositories and hence are not represented there. Automated support for exploring epigenomic texts would help to find new hypotheses on the effects of histone modifications, their genomic positions, altering chemical substances, the resulting phenotypes or diseases, cell and tissue types, the affected expression state of a gene, and the organism studied, especially if researchers are working in distinct fields. Smalheiser <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and Hristovski <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> have shown that the automated analysis of terms or concepts linking two disparate text corpora can support the discovery of implicit relations between two subjects.</p>
         <p>In July 2008, PubMed contained over 24,600 abstracts dealing with epigenomics (PubMed search using the term '<it>epigenetics</it>'). About half of them concern histone modifications. Since high-throughput experimental methods (e.g. ChIP-chip) have become available and the influence of epigenomic factors on disease states, cell differentiation, etc. has received increased attention, the number of articles has been growing and is expected to continue to grow (cf. Figure <figr fid="F1">1</figr>). On average, over 1000 articles have been published every month in the last two years. Therefore, automated methods are needed to support semantic text retrieval as well as the extraction of information related to histone modifications. A fundamental step towards meeting this demand is the identification of histone modification mentions in text.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Statistics on articles published about histone modifications in PubMed</p>
            </caption>
            <text>
               <p><b>Statistics on articles published about histone modifications in PubMed</b>. Number of published articles about histone modifications in M<smcaps>EDLINE</smcaps> obtained by a co-occurrence search of histone terms and modification terms with ProMiner.</p>
            </text>
            <graphic file="1471-2105-10-S1-S28-1"/>
         </fig>
         <p>Several approaches have been developed for recognizing biomedical named entities, such as proteins, genes, single nucleotide polymorphisms, chemical substances, and diseases <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. They are based on machine learning methods, rules, dictionaries, or combinations of these. To support epigenomic research, the literature database PubMeth <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> has recently been set up; it is based on text mining and focuses on DNA methylation and cancer.</p>
         <p>To the best of our knowledge, no approaches have been published so far that deal with the identification of histone modification mentions in text and its application.</p>
         <sec>
            <st>
               <p>Nomenclature and terminology used for histone modifications</p>
            </st>
            <p>A nomenclature for histone modifications was devised at the first meeting of the Epigenome Network of Excellence <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> in 2004 <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and was first published as the Brno nomenclature by Turner <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> in 2005. Nevertheless, histone modifications are not described uniformly in text, and the official nomenclature is not commonly used, which makes their identification difficult. The same habit can be observed for the nomenclatures of other biomedical entities, such as single nucleotide polymorphisms (SNPs), or for the HUGO nomenclature for genes <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. The following examples are typical mentions of histone modifications as they can be found in scientific text:</p>
            <p>&#8226; H3K9me3 (41),</p>
            <p>&#8226; Me3-K9 H3 (1),</p>
            <p>&#8226; Me(3)-K9 H3 (78),</p>
            <p>&#8226; H3K9 tri-methylation (7),</p>
            <p>&#8226; H3-K9 trimethylation (28),</p>
            <p>&#8226; H3 Lys9 trimethylation (11),</p>
            <p>&#8226; H3 tri-methylated at lysine 9 (14),</p>
            <p>&#8226; histone H3 trimethylated at lysine (K) 9 (3),</p>
            <p>&#8226; K9 trimethylation at histone H3 (36),</p>
            <p>&#8226; K9-trimethylated histone H3 (15),</p>
            <p>&#8226; tri-methylation of H3 at lysine residues K9 (0),</p>
            <p>&#8226; trimethylated H3K9 (18).</p>
            <p>The numbers in parentheses give the number of abstracts retrieved by a PubMed search for the respective term. Only the term '<it>H3K9me3</it>' corresponds to the Brno nomenclature. H3 stands for the protein '<it>histone H3</it>', the letter '<it>K</it>' specifies the amino acid lysine, and '<it>9</it>' its position within the protein sequence. Furthermore, words starting with '<it>trimethyl</it>' and the suffix '<it>me3</it>' indicate that the lysine carries three methyl groups as a chemical modification.</p>
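            <p>The decomposition of '<it>H3K9me3</it>' described above can be made explicit with a small parser. The following Python sketch is illustrative only: the function <it>parse_brno</it>, its regular expression, and the limited sets of histones, residues, and modification abbreviations it accepts are assumptions made for this example and are not taken from the Brno specification or from the method described here.</p>

```python
import re
from typing import NamedTuple, Optional

# Assumed subset of the Brno notation: histone, one-letter residue, position,
# modification abbreviation and an optional degree (1-3), e.g. "H3K9me3".
BRNO_RE = re.compile(r"^(H2A|H2B|H3|H4)([KRST])(\d+)(me|ac|ph|ub)([123])?$")
RESIDUES = {"K": "lysine", "R": "arginine", "S": "serine", "T": "threonine"}
MODIFICATIONS = {"me": "methylation", "ac": "acetylation",
                 "ph": "phosphorylation", "ub": "ubiquitination"}


class HistoneMod(NamedTuple):
    histone: str
    residue: str
    position: int
    modification: str
    degree: Optional[int]  # number of added groups, e.g. 3 for trimethylation


def parse_brno(term: str) -> Optional[HistoneMod]:
    """Decompose a Brno-style term; return None if the term does not conform."""
    m = BRNO_RE.match(term)
    if m is None:
        return None
    histone, residue, position, mod, degree = m.groups()
    return HistoneMod(histone, RESIDUES[residue], int(position),
                      MODIFICATIONS[mod], int(degree) if degree else None)


print(parse_brno("H3K9me3"))
print(parse_brno("H3 Lys9 trimethylation"))
```

            <p>Of the twelve variants listed above, only '<it>H3K9me3</it>' is accepted by such a strict pattern; all others yield <it>None</it>, which illustrates why exact matching alone cannot find the term variants.</p>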
            <p>The variants listed above show that a simple search strategy or a dictionary-based approach is not able to find all description variants related to a certain histone and modification type. We apply Conditional Random Fields (CRFs) to recognize histone modification descriptions in text.</p>
            <p>To resolve the term variant problem and to improve text retrieval, we developed rules for mapping the identified terms to a standard form corresponding to the Brno nomenclature. Furthermore, a histone modification term hierarchy was built to organize the standardized terms and to enable semantic text search.</p>
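            <p>Rule-based standardization of this kind can be sketched in Python as follows. The actual rules are not reproduced here; the regular expressions, the helper <it>standardize</it>, and the restriction to methylation and acetylation of lysine or arginine are hypothetical simplifications chosen for illustration. Each rule extracts one component (histone, residue and position, modification, degree) from a free-text variant, and the components are then reassembled into a Brno-style name.</p>

```python
import re
from typing import Optional

# Hypothetical, simplified rules for illustration only (not the authors' rule set).
HISTONE_RE = re.compile(r"(?:^|[^A-Za-z])H(2A|2B|3|4)")
RESIDUE_RE = re.compile(r"(lysine|arginine|lys|arg|K|R)\s*(?:\([KR]\)\s*)?(\d+)", re.I)
DEGREE_WORD_RE = re.compile(r"\b(mono|di|tri)-?methyl", re.I)  # "tri-methylation"
DEGREE_ABBR_RE = re.compile(r"me\(?([123])\)?", re.I)  # "me3", "Me(3)"
RESIDUE_CODE = {"lysine": "K", "lys": "K", "k": "K",
                "arginine": "R", "arg": "R", "r": "R"}
DEGREE_CODE = {"mono": "1", "di": "2", "tri": "3"}


def standardize(term: str) -> Optional[str]:
    """Map a methylation/acetylation term variant to a Brno-style name."""
    histone = HISTONE_RE.search(term)
    residue = RESIDUE_RE.search(term)
    if histone is None or residue is None:
        return None  # not enough information to build a standard name
    site = ("H" + histone.group(1).upper()
            + RESIDUE_CODE[residue.group(1).lower()] + residue.group(2))
    if re.search(r"acetyl", term, re.I):
        return site + "ac"
    word = DEGREE_WORD_RE.search(term)
    if word:
        return site + "me" + DEGREE_CODE[word.group(1).lower()]
    abbr = DEGREE_ABBR_RE.search(term)
    if abbr:
        return site + "me" + abbr.group(1)
    if re.search(r"methyl", term, re.I):
        return site + "me"  # degree of methylation not stated
    return None


for variant in ["Me(3)-K9 H3", "H3 Lys9 trimethylation",
                "histone H3 trimethylated at lysine (K) 9", "H4K16 acetylation"]:
    print(variant, "->", standardize(variant))
```

            <p>With these rules, each of the twelve variants listed earlier maps to '<it>H3K9me3</it>', whereas a term without a residue position, such as '<it>H3 methylation</it>', yields <it>None</it>. Extracting the components independently makes the rules tolerant of word order, at the cost of possibly combining components that belong to different mentions in longer phrases.</p>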
         </sec>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>We developed an approach that aims at identifying histone modifications in text with CRFs and at resolving term variants by transforming them into terms corresponding to the Brno nomenclature. In a subsequent step, the standardized terms are normalized and mapped to a histone modification term hierarchy that we generated.</p>
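         <p>The structure of the term hierarchy is not specified at this point. Purely as an illustration, the following Python sketch assumes a simple chain of levels (site with degree, site, histone-specific modification type, general modification type, and a root) and shows how such a hierarchy can support semantic retrieval: a document mentioning '<it>H3K9me3</it>' is also found by a query for '<it>H3 methylation</it>'. The level names and the helpers <it>ancestors</it> and <it>build_index</it> are hypothetical.</p>

```python
import re
from typing import Dict, List, Set

# Hypothetical hierarchy levels, assumed for illustration:
# H3K9me3 -> H3K9me -> H3 methylation -> histone methylation -> histone modification
MOD_NAMES = {"me": "methylation", "ac": "acetylation",
             "ph": "phosphorylation", "ub": "ubiquitination"}
BRNO_RE = re.compile(r"^(H2A|H2B|H3|H4)([KRST]\d+)(me|ac|ph|ub)([123]?)$")


def ancestors(term: str) -> List[str]:
    """Return the term and all of its (assumed) broader terms, most specific first."""
    m = BRNO_RE.match(term)
    if m is None:
        return []
    histone, site, mod, degree = m.groups()
    chain = [histone + site + mod + degree] if degree else []
    chain += [histone + site + mod,
              histone + " " + MOD_NAMES[mod],
              "histone " + MOD_NAMES[mod],
              "histone modification"]
    return chain


def build_index(docs: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Index each document under every hierarchy node its standardized terms fall under."""
    index: Dict[str, Set[str]] = {}
    for doc_id, terms in docs.items():
        for term in terms:
            for node in ancestors(term):
                index.setdefault(node, set()).add(doc_id)
    return index


index = build_index({"doc1": ["H3K9me3"], "doc2": ["H3K4me2"], "doc3": ["H4K16ac"]})
print(sorted(index["H3 methylation"]))  # ['doc1', 'doc2']
```

         <p>Indexing each document under all ancestors of its terms enlarges the index but lets a query at any level of the hierarchy be answered by a single lookup.</p>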
         <sec>
            <st>
               <p>Conditional Random Fields</p>
            </st>
            <p>Conditional Random Fields <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp> are a probabilistic model for computing the probability P (<inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S28-i1"><m:semantics><m:mover accent="true"><m:mi>y</m:mi><m:mo>&#8594;</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafmyEaKNbaSaaaaa@2D62@</m:annotation></m:semantics></m:math></inline-formula>|<inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S28-i2"><m:semantics><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafmiEaGNbaSaaaaa@2D60@</m:annotation></m:semantics></m:math></inline-formula>) of a possible label sequence <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S28-i1"><m:semantics><m:mover accent="true"><m:mi>y</m:mi><m:mo>&#8594;</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafmyEaKNbaSaaaaa@2D62@</m:annotation></m:semantics></m:math></inline-formula> = (<it>y</it><sub>1</sub>,..., <it>y</it><sub><it>n</it></sub>) given the input sequence <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S28-i2"><m:semantics><m:mover accent="true"><m:mi>x</m:mi><m:mo>&#8594;</m:mo></m:mover><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafmiEaGNbaSaaaaa@2D60@</m:annotation></m:semantics></m:math></inline-formula> = (<it>x</it><sub>1</sub>,..., <it>x</it><sub><it>n</it></sub>), which is also called the observation sequence. In named entity recognition, the observation sequence is the tokenized text, i.e. the sequence of tokens obtained by splitting the text at white space, punctuation marks, and parentheses.</p>
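            <p>The tokenization step, and the way an annotated mention becomes a label sequence, can be sketched in Python. The regular expression is an assumption (the exact tokenizer is not specified); it keeps runs of letters and digits together and emits each punctuation mark or parenthesis as a separate token. The labels use the alphabet {<it>B-Hmod</it>, <it>I-Hmod</it>, <it>O</it>} defined in the following paragraph; the helpers <it>tokenize</it> and <it>bio_labels</it> and the example sentence are hypothetical.</p>

```python
import re
from typing import List, Tuple

# Assumed tokenizer: runs of letters/digits form one token; every other
# non-whitespace character (punctuation, parentheses) is a token of its own.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")

Token = Tuple[str, int, int]  # (text, start offset, end offset)


def tokenize(text: str) -> List[Token]:
    """Split text at white space, punctuation marks and parentheses."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def bio_labels(tokens: List[Token], spans: List[Tuple[int, int]]) -> List[str]:
    """Assign B-Hmod / I-Hmod / O to tokens given annotated (start, end) spans."""
    labels = []
    for _, start, end in tokens:
        label = "O"
        for s, e in spans:
            if start >= s and e >= end:  # token lies inside the annotated mention
                label = "B-Hmod" if start == s else "I-Hmod"
                break
        labels.append(label)
    return labels


sentence = "Loss of H3 Lys9 trimethylation was observed."
mention = "H3 Lys9 trimethylation"
begin = sentence.index(mention)
tokens = tokenize(sentence)
labels = bio_labels(tokens, [(begin, begin + len(mention))])
print(list(zip([t[0] for t in tokens], labels)))
```

            <p>For example, '<it>Me(3)-K9 H3</it>' is split into the seven tokens Me, (, 3, ), -, K9, and H3, so a mention written with parentheses spans several tokens that the CRF has to label consistently.</p>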
            <p>The label sequence is coded using the label alphabet &#8466; = {<it>I-Hmod</it>, <it>O</it>, <it>B-Hmod</it>} where <it>y</it><sub><it>i </it></sub>= <it>O </it>means that <it>x</it><sub><it>i </it></sub>is not an entity of interest, <it>y</it><sub><it>i </it></sub>= <it>B-Hmod </it>means that <it>x</it><sub><it>i </it></sub>is the beginning of a histone modification mention and <it>y</it><sub><it>i </it></sub>= <it>I-Hmod </it>means that <it>x</it><sub><it>i </it></sub>is the continuation of it. A linear-chain CRF is an undirected probabilistic graphical model</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S28-i4">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>P</m:mi>
                              <m:mover accent="true">
                                 <m:mi>&#955;</m:mi>
                                 <m:mo>&#8594;</m:mo>
                              </m:mover>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mover accent="true">
                              <m:mi>y</m:mi>
                              <m:mo>&#8594;</m:mo>
                           </m:mover>
                           <m:mo>|</m:mo>
                           <m:mover accent="true">
                              <m:mi>x</m:mi>
                              <m:mo>&#8594;</m:mo>
                           </m:mover>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mn>1</m:mn>
                              <m:mrow>
                                 <m:mi>Z</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mover accent="true">
                                    <m:mi>x</m:mi>
                                    <m:mo>&#8594;</m:mo>
                                 </m:mover>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>&#8901;</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munderover>
                                 <m:mo>&#8719;</m:mo>
                                 <m:mrow>
                                    <m:mi>j</m:mi>
                                    <m:mo>=</m:mo>
                                    <m:mn>1</m:mn>
                                 </m:mrow>
                                 <m:mi>n</m:mi>
                              </m:munderover>
                              <m:mrow>
                                 <m:msup>
                                    <m:mtext>e</m:mtext>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mo>(</m:mo>
                                          <m:mrow>
                                             <m:mstyle displaystyle="true">
                                                <m:mrow>
                                                   <m:mstyle displaystyle="true">
                                                      <m:msubsup>
                                                         <m:mo>&#8721;</m:mo>
                                                         <m:mrow>
                                                            <m:mi>i</m:mi>
                                                            <m:mo>=</m:mo>
                                                            <m:mn>1</m:mn>
                                                         </m:mrow>
                                                         <m:mi>m</m:mi>
                                                      </m:msubsup>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>&#955;</m:mi>
                                                            <m:mi>i</m:mi>
                                                         </m:msub>
                                                         <m:msub>
                                                            <m:mi>f</m:mi>
                                                            <m:mi>i</m:mi>
                                                         </m:msub>
                                                      </m:mrow>
                                                   </m:mstyle>
                                                   <m:mrow>
                                                      <m:mo>(</m:mo>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>y</m:mi>
                                                            <m:mrow>
                                                               <m:mi>j</m:mi>
                                                               <m:mo>&#8722;</m:mo>
                                                               <m:mn>1</m:mn>
                                                            </m:mrow>
                                                         </m:msub>
                                                         <m:mo>,</m:mo>
                                                         <m:msub>
                                                            <m:mi>y</m:mi>
                                                            <m:mi>j</m:mi>
                                                         </m:msub>
                                                         <m:mo>,</m:mo>
                                                         <m:mover accent="true">
                                                            <m:mi>x</m:mi>
                                                            <m:mo>&#8594;</m:mo>
                                                         </m:mover>
                                                         <m:mo>,</m:mo>
                                                         <m:mi>j</m:mi>
                                                      </m:mrow>
                                                      <m:mo>)</m:mo>
                                                   </m:mrow>
                                                </m:mrow>
                                             </m:mstyle>
                                          </m:mrow>
                                          <m:mo>)</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:msup>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemiuaa1aaSbaaSqaaiqbeU7aSzaalaaabeaakiabcIcaOiqbdMha5zaalaGaeiiFaWNafmiEaGNbaSaacqGGPaqkcqGH9aqpjuaGdaWcaaqaaiabigdaXaqaaiabdQfaAjabcIcaOiqbdIha4zaalaGaeiykaKcaaOGaeyyXIC9aaebCaeaacqqGLbqzdaahaaWcbeqaamaabmaabaWaaabmaeaadaaeWaqaaiabeU7aSnaaBaaameaacqWGPbqAaeqaaSGaemOzay2aaSbaaWqaaiabdMgaPbqabaaabaGaemyAaKMaeyypa0JaeGymaedabaGaemyBa0gaoiabggHiLdWcdaqadaqaaiabdMha5naaBaaameaacqWGQbGAcqGHsislcqaIXaqmaeqaaSGaeiilaWIaemyEaK3aaSbaaWqaaiabdQgaQbqabaWccqGGSaalcuWG4baEgaWcaiabcYcaSiabdQgaQbGaayjkaiaawMcaaaadbaGaemOAaOMaeyypa0JaeGymaedabaGaemOBa4gaoiabggHiLdaaliaawIcacaGLPaaaaaaabaGaemOAaOMaeyypa0JaeGymaedabaGaemOBa4ganiabg+GivdGccqGGUaGlaaa@6C77@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>with the normalization factor, which ensures that the probabilities of all label sequences sum to one, given by</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S28-i5">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>Z</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mover accent="true">
                              <m:mi>x</m:mi>
                              <m:mo>&#8594;</m:mo>
                           </m:mover>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mover accent="true">
                                       <m:mi>y</m:mi>
                                       <m:mo>&#8594;</m:mo>
                                    </m:mover>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mi mathvariant="script">Y</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:msup>
                                    <m:mtext>e</m:mtext>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mo>(</m:mo>
                                          <m:mrow>
                                             <m:mstyle displaystyle="true">
                                                <m:msubsup>
                                                   <m:mo>&#8721;</m:mo>
                                                   <m:mrow>
                                                      <m:mi>j</m:mi>
                                                      <m:mo>=</m:mo>
                                                      <m:mn>1</m:mn>
                                                   </m:mrow>
                                                   <m:mi>n</m:mi>
                                                </m:msubsup>
                                                <m:mrow>
                                                   <m:mstyle displaystyle="true">
                                                      <m:msubsup>
                                                         <m:mo>&#8721;</m:mo>
                                                         <m:mrow>
                                                            <m:mi>i</m:mi>
                                                            <m:mo>=</m:mo>
                                                            <m:mn>1</m:mn>
                                                         </m:mrow>
                                                         <m:mi>m</m:mi>
                                                      </m:msubsup>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>&#955;</m:mi>
                                                            <m:mi>i</m:mi>
                                                         </m:msub>
                                                         <m:msub>
                                                            <m:mi>f</m:mi>
                                                            <m:mi>i</m:mi>
                                                         </m:msub>
                                                      </m:mrow>
                                                   </m:mstyle>
                                                   <m:mrow>
                                                      <m:mo>(</m:mo>
                                                      <m:mrow>
                                                         <m:msub>
                                                            <m:mi>y</m:mi>
                                                            <m:mrow>
                                                               <m:mi>j</m:mi>
                                                               <m:mo>&#8722;</m:mo>
                                                               <m:mn>1</m:mn>
                                                            </m:mrow>
                                                         </m:msub>
                                                         <m:mo>,</m:mo>
                                                         <m:msub>
                                                            <m:mi>y</m:mi>
                                                            <m:mi>j</m:mi>
                                                         </m:msub>
                                                         <m:mo>,</m:mo>
                                                         <m:mover accent="true">
                                                            <m:mi>x</m:mi>
                                                            <m:mo>&#8594;</m:mo>
                                                         </m:mover>
                                                         <m:mo>,</m:mo>
                                                         <m:mi>j</m:mi>
                                                      </m:mrow>
                                                      <m:mo>)</m:mo>
                                                   </m:mrow>
                                                </m:mrow>
                                             </m:mstyle>
                                          </m:mrow>
                                          <m:mo>)</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:msup>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOwaOLaeiikaGIafmiEaGNbaSaacqGGPaqkcqGH9aqpdaaeqbqaaiabbwgaLnaaCaaaleqabaWaaeWaaeaadaaeWaqaamaaqadabaGaeq4UdW2aaSbaaWqaaiabdMgaPbqabaWccqWGMbGzdaWgaaadbaGaemyAaKgabeaaaeaacqWGPbqAcqGH9aqpcqaIXaqmaeaacqWGTbqBa4GaeyyeIuoalmaabmaabaGaemyEaK3aaSbaaWqaaiabdQgaQjabgkHiTiabigdaXaqabaWccqGGSaalcqWG5bqEdaWgaaadbaGaemOAaOgabeaaliabcYcaSiqbdIha4zaalaGaeiilaWIaemOAaOgacaGLOaGaayzkaaaameaacqWGQbGAcqGH9aqpcqaIXaqmaeaacqWGUbGBa4GaeyyeIuoaaSGaayjkaiaawMcaaaaaaeaacuWG5bqEgaWcaiabgIGioprtHrhAL1wy0L2yHvtyaeHbnfgDOvwBHrxAJfwnaGabaiab=Hr8zbqab0GaeyyeIuoakiabc6caUaaa@6915@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
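            <p>To illustrate the normalizing sum above, the following Python sketch computes it twice: once by brute-force enumeration of every label sequence, and once with the standard forward recursion, which yields the same value in time linear in the sequence length. This is not the authors' implementation, which builds on Mallet. The label set O, B-Hmod and I-Hmod, the start symbol START, the two toy feature templates and all weights are assumptions chosen for illustration.</p>
            <p>
```python
import itertools
import math

# Assumed IOB label set for the single entity class Hmod
LABELS = ["O", "B-Hmod", "I-Hmod"]


def local_score(prev_label, label, tokens, j, weights):
    """Sum of lambda_i * f_i(y_{j-1}, y_j, x, j) at position j.

    Two hypothetical feature templates: a label-transition indicator and a
    'token starts with a capital letter' indicator tied to the current label.
    """
    s = weights.get(("trans", prev_label, label), 0.0)
    if tokens[j][:1].isupper():
        s += weights.get(("cap", label), 0.0)
    return s


def sequence_score(labels, tokens, weights):
    """Exponent of the equation (double sum over j and i) for one label sequence."""
    total, prev = 0.0, "START"  # assumed start symbol standing in for y_0
    for j, label in enumerate(labels):
        total += local_score(prev, label, tokens, j, weights)
        prev = label
    return total


def partition_brute_force(tokens, weights):
    """Sum of exp(score) over every sequence in Y = LABELS ** n."""
    return sum(
        math.exp(sequence_score(ys, tokens, weights))
        for ys in itertools.product(LABELS, repeat=len(tokens))
    )


def partition_forward(tokens, weights):
    """Same sum via the forward recursion, O(n * |LABELS| ** 2)."""
    alpha = {y: math.exp(local_score("START", y, tokens, 0, weights)) for y in LABELS}
    for j in range(1, len(tokens)):
        alpha = {
            y: sum(alpha[p] * math.exp(local_score(p, y, tokens, j, weights)) for p in LABELS)
            for y in LABELS
        }
    return sum(alpha.values())


tokens = ["Histone", "H3", "acetylation"]
weights = {
    ("trans", "START", "B-Hmod"): 1.0,
    ("trans", "B-Hmod", "I-Hmod"): 2.0,
    ("cap", "B-Hmod"): 0.5,
}
z = partition_forward(tokens, weights)
p = math.exp(sequence_score(["B-Hmod", "I-Hmod", "I-Hmod"], tokens, weights)) / z
```
            </p>
            <p>The probability of one labeling is its exponentiated score divided by the normalizing sum, as in the last line; summed over all 27 label sequences of the three tokens, these probabilities add up to 1.</p>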
            <p>Here, <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S28-i6"><m:semantics><m:mi mathvariant="script">Y</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWenfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae8hgXNfaaa@3779@</m:annotation></m:semantics></m:math></inline-formula> is the set of all possible label sequences over which the sum runs; this normalization ensures that a valid probability is obtained. The factor functions combine different features <it>f</it><sub><it>i </it></sub>of the considered part of the text and of the label sequence, each weighted by a parameter <it>&#955;</it><sub><it>i</it></sub>. We mainly use morphological features of the text tokens, combined with every possible label transition. They usually have a form similar to</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S28-i7">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>f</m:mi>
                              <m:mi>i</m:mi>
                           </m:msub>
                           <m:mrow>
                              <m:mo>(</m:mo>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>y</m:mi>
                                    <m:mrow>
                                       <m:mi>j</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                 </m:msub>
                                 <m:mo>,</m:mo>
                                 <m:msub>
                                    <m:mi>y</m:mi>
                                    <m:mi>j</m:mi>
                                 </m:msub>
                                 <m:mo>,</m:mo>
                                 <m:mover accent="true">
                                    <m:mi>x</m:mi>
                                    <m:mo>&#8594;</m:mo>
                                 </m:mover>
                                 <m:mo>,</m:mo>
                                 <m:mi>j</m:mi>
                              </m:mrow>
                              <m:mo>)</m:mo>
                           </m:mrow>
                           <m:mo>=</m:mo>
                           <m:mrow>
                              <m:mo>{</m:mo>
                              <m:mrow>
                                 <m:mtable columnalign="left">
                                    <m:mtr columnalign="left">
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:mn>1</m:mn>
                                             <m:mtext>&#160;if</m:mtext>
                                          </m:mrow>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>y</m:mi>
                                                <m:mrow>
                                                   <m:mi>j</m:mi>
                                                   <m:mo>&#8722;</m:mo>
                                                   <m:mn>1</m:mn>
                                                </m:mrow>
                                             </m:msub>
                                             <m:mo>=</m:mo>
                                             <m:mi>B</m:mi>
                                             <m:mtext>-</m:mtext>
                                             <m:mi>H</m:mi>
                                             <m:mi>m</m:mi>
                                             <m:mi>o</m:mi>
                                             <m:mi>d</m:mi>
                                             <m:mtext>&#160;and</m:mtext>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr columnalign="left">
                                       <m:mtd columnalign="left">
                                          <m:mrow/>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>y</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                             <m:mo>=</m:mo>
                                             <m:mi>I</m:mi>
                                             <m:mtext>-</m:mtext>
                                             <m:mi>H</m:mi>
                                             <m:mi>m</m:mi>
                                             <m:mi>o</m:mi>
                                             <m:mi>d</m:mi>
                                             <m:mtext>&#160;and</m:mtext>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr columnalign="left">
                                       <m:mtd columnalign="left">
                                          <m:mrow/>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>x</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:msub>
                                             <m:mtext>&#160;starts&#160;with&#160;a&#160;capital&#160;letter</m:mtext>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr columnalign="left">
                                       <m:mtd columnalign="left">
                                          <m:mn>0</m:mn>
                                       </m:mtd>
                                       <m:mtd columnalign="left">
                                          <m:mtext>otherwise</m:mtext>
                                       </m:mtd>
                                    </m:mtr>
                                 </m:mtable>
                              </m:mrow>
                           </m:mrow>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOzay2aaSbaaSqaaiabdMgaPbqabaGcdaqadaqaaiabdMha5naaBaaaleaacqWGQbGAcqGHsislcqaIXaqmaeqaaOGaeiilaWIaemyEaK3aaSbaaSqaaiabdQgaQbqabaGccqGGSaalcuWG4baEgaWcaiabcYcaSiabdQgaQbGaayjkaiaawMcaaiabg2da9maaceaabaqbaeaabuGaaaaabaGaeGymaeJaeeiiaaIaeeyAaKMaeeOzaygabaGaemyEaK3aaSbaaSqaaiabdQgaQjabgkHiTiabigdaXaqabaGccqGH9aqpcqWGcbGqcqqGTaqlcqWGibascqWGTbqBcqWGVbWBcqWGKbazcqqGGaaicqqGHbqycqqGUbGBcqqGKbazaeaaaeaacqWG5bqEdaWgaaWcbaGaemOAaOgabeaakiabg2da9iabdMeajjabb2caTiabdIeaijabd2gaTjabd+gaVjabdsgaKjabbccaGiabbggaHjabb6gaUjabbsgaKbqaaaqaaiabdIha4naaBaaaleaacqWGQbGAaeqaaOGaeeiiaaIaee4CamNaeeiDaqNaeeyyaeMaeeOCaiNaeeiDaqNaee4CamNaeeiiaaIaee4DaCNaeeyAaKMaeeiDaqNaeeiAaGMaeeiiaaIaeeyyaeMaeeiiaaIaee4yamMaeeyyaeMaeeiCaaNaeeyAaKMaeeiDaqNaeeyyaeMaeeiBaWgabaaabaGaeeiBaWMaeeyzauMaeeiDaqNaeeiDaqNaeeyzauMaeeOCaihabaGaeGimaadabaaaaaGaay5Eaaaaaa@8E9F@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
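            <p>Read as code, the feature function above is a binary indicator over the previous label, the current label and the current token. The Python sketch below shows it together with a hypothetical helper, make_feature, that pairs a token predicate with one label transition; applying the helper to every transition and every predicate mirrors how morphological properties are combined with all possible label transitions. The helper, the label list and the second predicate are assumptions for illustration and are not part of Mallet's API.</p>
            <p>
```python
import re
from typing import Callable, Dict, List, Tuple

TokenPredicate = Callable[[str], bool]
Feature = Callable[[str, str, List[str], int], int]

LABELS = ["O", "B-Hmod", "I-Hmod"]  # assumed IOB label set


def make_feature(prev_req: str, cur_req: str, predicate: TokenPredicate) -> Feature:
    """Hypothetical helper: f(y_{j-1}, y_j, x, j) is 1 if the label transition
    matches and the predicate holds for token x_j, and 0 otherwise."""

    def feature(prev_label: str, label: str, tokens: List[str], j: int) -> int:
        return int(prev_label == prev_req and label == cur_req and predicate(tokens[j]))

    return feature


def starts_with_capital(token: str) -> bool:
    return token[:1].isupper()


def all_caps(token: str) -> bool:
    return re.fullmatch(r"[A-Z]+", token) is not None


# The feature shown in the equation above
f_example = make_feature("B-Hmod", "I-Hmod", starts_with_capital)

# One feature per (previous label, current label, predicate) combination
PREDICATES: Dict[str, TokenPredicate] = {
    "starts_capital": starts_with_capital,
    "all_caps": all_caps,
}
FEATURES: Dict[Tuple[str, str, str], Feature] = {
    (prev, cur, name): make_feature(prev, cur, pred)
    for prev in LABELS
    for cur in LABELS
    for name, pred in PREDICATES.items()
}
```
            </p>
            <p>For instance, f_example("B-Hmod", "I-Hmod", ["histone", "H3"], 1) returns 1 because "H3" starts with a capital letter, while the same call with the token "acetylation" returns 0.</p>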
            <p>We use a standard feature set similar to the one in <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Table <tblr tid="T2">2</tblr> gives an overview of the feature classes used, together with examples. Many of the features, especially the morphological ones, are extracted by standard methods.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Features applied as parameters of the CRF. The features are grouped by class; for each feature, an example and an explanation are given.</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>Name</p>
                     </c>
                     <c ca="left">
                        <p>Explanation</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <ul>Static morphological features</ul>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <ul>Regular expression</ul>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>All Caps</p>
                     </c>
                     <c ca="left">
                        <p>[A-Z]+</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Natural Number</p>
                     </c>
                     <c ca="left">
                        <p>[0-9]+</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Alpha-Num</p>
                     </c>
                     <c ca="left">
                        <p>[A-Za-z0-9]+</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <ul>Automatically generated morphological features</ul>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Automatic Prefixes/Suffixes</p>
                     </c>
                     <c ca="left">
                        <p>Automatically generated for every token: matches that token's prefix or suffix</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>WordsAsClass</p>
                     </c>
                     <c ca="left">
                        <p>Automatically generated for every token: matches that token</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <ul>Context</ul>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Spaces</p>
                     </c>
                     <c ca="left">
                        <p>Whether a token is preceded or followed by white space</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>In Brackets</p>
                     </c>
                     <c ca="left">
                        <p>Whether a token is preceded or followed by a bracket</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
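            <p>As an illustration of Table <tblr tid="T2">2</tblr>, the Python sketch below extracts the listed feature classes for one token of a sentence: the three static regular expressions, the automatically generated prefix, suffix and word features, and the two context features. The tokenizer, the feature names and the affix lengths of 2 and 3 characters are assumptions; they are not specified in the text.</p>
            <p>
```python
import re

# Static morphological features (regular expressions from Table 2)
STATIC_FEATURES = {
    "ALL_CAPS": re.compile(r"[A-Z]+"),
    "NATURAL_NUMBER": re.compile(r"[0-9]+"),
    "ALPHA_NUM": re.compile(r"[A-Za-z0-9]+"),
}
AFFIX_LENGTHS = (2, 3)  # assumption: affix lengths are not stated in the paper
BRACKETS = "()[]{}"


def tokenize(text):
    """Hypothetical tokenizer: word runs and single punctuation marks,
    returned as (token, start, end) character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def token_features(text, tokens, j):
    """Return the set of active feature names for token j."""
    tok, start, end = tokens[j]
    feats = {name for name, rx in STATIC_FEATURES.items() if rx.fullmatch(tok)}
    # Automatically generated morphological features
    for k in AFFIX_LENGTHS:
        if len(tok) >= k:
            feats.add(f"PREFIX={tok[:k]}")
            feats.add(f"SUFFIX={tok[-k:]}")
    feats.add(f"WORD={tok}")  # WordsAsClass
    # Context features: the characters directly before and after the token
    before = text[max(start - 1, 0):start]
    after = text[end:end + 1]
    if before.isspace():
        feats.add("SPACE_BEFORE")
    if after.isspace():
        feats.add("SPACE_AFTER")
    if before and before in BRACKETS:
        feats.add("BRACKET_BEFORE")
    if after and after in BRACKETS:
        feats.add("BRACKET_AFTER")
    return feats


text = "acetylated (H3K9me3) histone"
tokens = tokenize(text)
feats = token_features(text, tokens, 2)  # features of the token "H3K9me3"
```
            </p>
            <p>Here "H3K9me3" activates ALPHA_NUM, its prefixes and suffixes, WORD=H3K9me3 and both bracket features, but neither ALL_CAPS nor a white-space feature.</p>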
            <p>Our named entity recognizer for histone modification terms is our own implementation based on Mallet <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, a widely used toolkit for linear-chain CRFs.</p>
            <p>To assess the quality of the obtained model, the <it>F</it><sub>1 </sub>measure has been calculated, which is defined by</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S28-i8">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>F</m:mi>
                              <m:mi>&#946;</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo>+</m:mo>
                                 <m:msup>
                                    <m:mi>&#946;</m:mi>
                                    <m:mn>2</m:mn>
                                 </m:msup>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo>&#8901;</m:mo>
                                 <m:mi>p</m:mi>
                                 <m:mi>r</m:mi>
                                 <m:mi>e</m:mi>
                                 <m:mi>c</m:mi>
                                 <m:mi>i</m:mi>
                                 <m:mi>s</m:mi>
                                 <m:mi>i</m:mi>
                                 <m:mi>o</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mo>&#8901;</m:mo>
                                 <m:mi>r</m:mi>
                                 <m:mi>e</m:mi>
                                 <m:mi>c</m:mi>
                                 <m:mi>a</m:mi>
                                 <m:mi>l</m:mi>
                                 <m:mi>l</m:mi>
                              </m:mrow>
                              <m:mrow>
                                 <m:msup>
                                    <m:mi>&#946;</m:mi>
                                    <m:mn>2</m:mn>
                                 </m:msup>
                                 <m:mo>&#8901;</m:mo>
                                 <m:mi>p</m:mi>
                                 <m:mi>r</m:mi>
                                 <m:mi>e</m:mi>
                                 <m:mi>c</m:mi>
                                 <m:mi>i</m:mi>
                                 <m:mi>s</m:mi>
                                 <m:mi>i</m:mi>
                                 <m:mi>o</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mo>+</m:mo>
                                 <m:mi>r</m:mi>
                                 <m:mi>e</m:mi>
                                 <m:mi>c</m:mi>
                                 <m:mi>a</m:mi>
                                 <m:mi>l</m:mi>
                                 <m:mi>l</m:mi>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOray0aaSbaaSqaaiabek7aIbqabaGccqGH9aqpjuaGdaWcaaqaaiabcIcaOiabigdaXiabgUcaRiabek7aInaaCaaabeqaaiabikdaYaaacqGGPaqkcqGHflY1cqWGWbaCcqWGYbGCcqWGLbqzcqWGJbWycqWGPbqAcqWGZbWCcqWGPbqAcqWGVbWBcqWGUbGBcqGHflY1cqWGYbGCcqWGLbqzcqWGJbWycqWGHbqycqWGSbaBcqWGSbaBaeaacqaHYoGydaahaaqabeaacqaIYaGmaaGaeyyXICTaemiCaaNaemOCaiNaemyzauMaem4yamMaemyAaKMaem4CamNaemyAaKMaem4Ba8MaemOBa4Maey4kaSIaemOCaiNaemyzauMaem4yamMaemyyaeMaemiBaWMaemiBaWgaaaaa@6A6B@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>with <it>&#946; </it>= 1.</p>
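            <p>The measure can be computed from counts of correctly and incorrectly recognized entities. The Python sketch below implements the formula and applies it to sets of entity spans; representing entities as hypothetical (document, start, end) tuples and counting only exact span matches as correct are illustrative assumptions, not necessarily the matching criterion behind the reported results.</p>
            <p>
```python
def f_beta(precision, recall, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); 0.0 if undefined."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


def entity_scores(gold, predicted, beta=1.0):
    """Precision, recall and F_beta for sets of entity spans.

    Exact span matching is an illustrative assumption.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold.intersection(predicted))
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_beta(precision, recall, beta)


# Hypothetical spans: (document id, start offset, end offset)
gold = {(1, 0, 5), (1, 10, 20), (2, 3, 9)}
predicted = {(1, 0, 5), (2, 3, 9), (2, 15, 18), (3, 0, 4)}
precision, recall, f1 = entity_scores(gold, predicted)
```
            </p>
            <p>With two of four predictions correct and three gold entities, precision is 0.5 and recall is 2/3, so <it>F</it><sub>1</sub> = 4/7, about 0.571.</p>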
         </sec>
         <sec>
            <st>
               <p>Corpus generation and annotation</p>
            </st>
            <p>Since CRFs are a supervised machine learning method, annotated corpora are required for training and testing. Two corpora have therefore been annotated.</p>
            <sec>
               <st>
                  <p>Corpus generation</p>
               </st>
               <p>To train a model, an initial corpus (referred to as E<smcaps>PI-TRAIN</smcaps>) of 187 M<smcaps>EDLINE</smcaps> titles and abstracts has been selected manually from a larger corpus in which histone and modification terms co-occur. This larger corpus was obtained by a co-occurrence search of M<smcaps>EDLINE</smcaps> with ProMiner <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> using two separate dictionaries: one containing histone terms and one containing 75 modification terms and spelling variants, like '<it>di-methylation</it>' and '<it>dimethylation</it>'. This search yielded 10,576 articles. During the manual selection of the 187 titles and abstracts, it was ensured that every modification type is covered. E<smcaps>PI-TRAIN</smcaps> has been annotated with the entity class <b>Hmod</b> described below and comprises 1,605 sentences, 44,876 tokens, and 601 annotated entities. For validation of the trained CRF model and for parameter selection, a 10-fold cross-validation was performed. To test the trained model, the corpus E<smcaps>PI-TEST</smcaps> has been generated from a PubMed search using the MeSH term 'epigenetics'. From the 24,653 articles retrieved, 1,000 titles and abstracts have been chosen randomly and annotated; none of them is contained in the E<smcaps>PI-TRAIN</smcaps> corpus. E<smcaps>PI-TEST</smcaps> contains 8,880 sentences, 236,160 tokens, and 221 annotated entities.</p>
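               <p>A 10-fold cross-validation of this kind can be sketched in Python as follows. Splitting at the level of whole documents, the fixed random seed, and the train_fn and eval_fn callables (stand-ins for CRF training and the F1 computation) are assumptions for illustration, not details given in the text.</p>
               <p>
```python
import random


def k_fold_splits(doc_ids, k=10, seed=0):
    """Yield (train_ids, test_ids) for each of k folds.

    Documents are shuffled once and assigned round-robin, so fold sizes
    differ by at most one document.
    """
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        yield train, test


def cross_validate(doc_ids, train_fn, eval_fn, k=10):
    """Average eval_fn(model, test_ids) over k folds (hypothetical callables)."""
    scores = [eval_fn(train_fn(train), test) for train, test in k_fold_splits(doc_ids, k)]
    return sum(scores) / len(scores)


splits = list(k_fold_splits(range(187)))
fold_sizes = sorted(len(test) for _, test in splits)
```
               </p>
               <p>For 187 documents, the size of E<smcaps>PI-TRAIN</smcaps>, the round-robin assignment gives seven folds of 19 documents and three folds of 18.</p>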
            </sec>
            <sec>
               <st>
                  <p>Corpus annotation</p>
               </st>
               <p>WordFreak <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> has been used for the annotation of the corpora. The histone modifications occurring in the selected corpora have been annotated as shown in Figure <figr fid="F2">2</figr>. For the entity type <b>Hmod</b>, a term had to contain at least one histone type and one modification term, e.g. '<it>histone acetylation</it>' or '<it>histone 3 dimethylation</it>'. The removal of a modification, like '<it>H3K9 demethylation</it>', has also been annotated, because it changes an existing modification. However, if a histone modification is part of an enzyme name, e.g. in '<it>H3K9 methyltransferase</it>', the term is not annotated. Enumerations are handled as follows: if modification terms written in a form similar to the official nomenclature occur in an enumeration, like '<it>H3K36me3, H3K79me3 and H3K9ac</it>', they have been annotated as single terms. By contrast, long forms, like '<it>H3K36-mono- or dimethylation</it>', have been annotated as a whole phrase. The two annotated corpora E<smcaps>PI-TRAIN</smcaps> and E<smcaps>PI-TEST</smcaps> are available in IOB format from the download webpage <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
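               <p>In the IOB format, each token is tagged B-Hmod (first token of an entity), I-Hmod (inside an entity) or O (outside any entity). The Python sketch below converts character-offset annotations into such tags and reproduces the two enumeration conventions described above. The tokenizer and the span_of helper are assumptions; the tokenization of the released corpus files may differ.</p>
               <p>
```python
import re


def to_iob(text, spans, label="Hmod"):
    """Tag each token B-label, I-label or O from (start, end) character spans.

    Assumes spans are aligned with token boundaries of the tokenizer below.
    """
    tagged = []
    for m in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for start, end in spans:
            if m.start() >= start and end >= m.end():
                tag = ("B-" if m.start() == start else "I-") + label
                break
        tagged.append((m.group(), tag))
    return tagged


def span_of(text, phrase):
    """Character span of the first occurrence of phrase (hypothetical helper)."""
    start = text.index(phrase)
    return start, start + len(phrase)


# Nomenclature-like enumeration: each term is a separate entity
enum = "H3K36me3, H3K79me3 and H3K9ac"
enum_tags = to_iob(enum, [span_of(enum, t) for t in ("H3K36me3", "H3K79me3", "H3K9ac")])

# Long form: the whole phrase is one entity
long_form = "H3K36-mono- or dimethylation"
long_tags = to_iob(long_form, [span_of(long_form, long_form)])
```
               </p>
               <p>The enumeration yields three separate B-Hmod entities, whereas the long form becomes a single entity of one B-Hmod token followed by five I-Hmod tokens.</p>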
               <fig id="F2">
                  <title>
                     <p>Figure 2</p>
                  </title>
                  <caption>
                     <p>Annotated example text</p>
                  </caption>
                  <text>
                     <p><b>Annotated example text</b>. Example title and abstract (PMID:18157086, Edmunds et al. 2008) with histone modifications annotated as entity type <b>Hmod</b>.</p>
                  </text>
                  <graphic file="1471-2105-10-S1-S28-2"/>
               </fig>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Standardization and normalization of histone modification term variants</p>
            </st>
            <sec>
               <st>
                  <p>Term standardization</p>
               </st>
               <p>The recognition of histone modification descriptions alone is not sufficient. A necessary next step is to map the different description variants onto standard terms. We developed a procedure for transforming the identified terms into standard terms corresponding to the Brno nomenclature. It consists of a set of rules that is applied to every term. First, a validity check filters out false positive terms found by the CRF; one basic rule, for instance, rejects terms that lack a histone type. If a term passes this filter, additional rules check the histone type, the presence and kind of modifications, the mentioned amino acid, and the position, if provided. For example, '<it>dimethylated lysine 20 of histone H4</it>' is translated into '<it>H4 K 20 me 2</it>'. Terms containing enumerations of histone types, modified amino acids, modifications, or positions are resolved by a further set of rules, which define, for instance, the positional dependencies between the parts of a modification description. From the term '<it>di- and trimethylation of lysine 4 at histone 3</it>', the two terms '<it>H3 K 4 me 2</it>' and '<it>H3 K 4 me 3</it>' are generated.</p>
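               <p>A few rules of this kind can be sketched in Python with regular expressions. The sketch below is a small illustrative subset, not the authors' rule set: the patterns, the dictionaries of amino acids and modification types, and the decision to reject terms without a specific histone type are assumptions. It does, however, reproduce the two examples given in the text.</p>
               <p>
```python
import re

AMINO_ACIDS = {"lysine": "K", "arginine": "R", "serine": "S", "threonine": "T"}
MODIFICATIONS = {"methyl": "me", "acetyl": "ac", "phosphoryl": "ph"}
DEGREES = {"mono": "1", "di": "2", "tri": "3"}

# 'histone 3', 'histone H4' or 'H2A'; a bare number is not a histone type
HISTONE_RX = re.compile(r"\b(?:histone\s+H?|H)(1|2A|2B|3|4)\b", re.I)
SITE_RX = re.compile(r"\b(lysine|arginine|serine|threonine)\s+(\d+)", re.I)
MOD_RX = re.compile(r"(methyl|acetyl|phosphoryl)", re.I)
# Methylation degrees, also when shared across an enumeration ('di- and tri...')
DEGREE_RX = re.compile(r"\b(mono|di|tri)(?=-|methyl)", re.I)


def standardize(term):
    """Return standard terms such as 'H4 K 20 me 2', or [] if the term is rejected."""
    histone = HISTONE_RX.search(term)
    modification = MOD_RX.search(term)
    if histone is None or modification is None:
        return []  # validity check: histone type and modification are required
    parts = ["H" + histone.group(1).upper()]
    site = SITE_RX.search(term)
    if site:
        parts += [AMINO_ACIDS[site.group(1).lower()], site.group(2)]
    parts.append(MODIFICATIONS[modification.group(1).lower()])
    degrees = [DEGREES[d.lower()] for d in DEGREE_RX.findall(term)]
    base = " ".join(parts)
    # Enumerated degrees are resolved into one standard term each
    return [f"{base} {d}" for d in degrees] or [base]
```
               </p>
               <p>Applied to '<it>di- and trimethylation of lysine 4 at histone 3</it>', the sketch returns H3 K 4 me 2 and H3 K 4 me 3; a term such as '<it>DNA methylation</it>' fails the validity check and yields an empty list.</p>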
            </sec>
            <sec>
               <st>
                  <p>Term normalization</p>
               </st>
               <p>Mapping terms to unique database or ontology identifiers is common practice, e.g. for proteins and genes, as a way to attach further context information to them. We use the same approach to add information to the identified histone modification terms and call this step normalization in this paper. We investigated the following data resources and ontologies for their usability in normalizing histone modification standard terms: <it>Gene Ontology (GO) </it><abbrgrp><abbr bid="B31">31</abbr></abbrgrp> (version 1.2 (09-07-2008)), <it>PSI-Mod </it><abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, and <it>HistOn </it><abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. The analysis of the resources is given below. Terms from these resources that did not correspond to the standard term form were first transformed by the standardization procedure described above and subsequently used to normalize the terms identified in text.</p>
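               <p>Once resource labels and identified terms share the same standard form, normalization reduces to a dictionary lookup. The Python sketch below builds such an index. The resource names and identifiers are placeholders, not real <it>GO</it>, <it>PSI-Mod</it> or <it>HistOn</it> entries, and the standardization function is passed in; the example uses a trivial stand-in that assumes the labels are already standardized.</p>
               <p>
```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Entry = Tuple[str, str, str]  # (resource, identifier, label)
Index = Dict[str, Set[Tuple[str, str]]]


def build_index(entries: Iterable[Entry], standardize: Callable[[str], List[str]]) -> Index:
    """Map each standard term to the (resource, identifier) pairs whose
    label standardizes to it."""
    index: Index = defaultdict(set)
    for resource, identifier, label in entries:
        for std in standardize(label):
            index[std].add((resource, identifier))
    return index


def normalize(terms: Iterable[str], index: Index) -> Dict[str, List[Tuple[str, str]]]:
    """Attach identifiers to standardized terms found in text (empty list if none)."""
    return {t: sorted(index.get(t, ())) for t in terms}


# Placeholder resources and identifiers, for illustration only
entries = [
    ("ResourceA", "A:0001", "H3 K 4 me 3"),
    ("ResourceB", "B:0042", "H3 K 4 me 3"),
    ("ResourceA", "A:0002", "H4 K 20 me 2"),
]
index = build_index(entries, lambda label: [label])
result = normalize(["H3 K 4 me 3", "H2B K 5 ac"], index)
```
               </p>
               <p>A term found in text may thus receive identifiers from several resources, or none if no resource covers it.</p>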
            </sec>
         </sec>
         <sec>
            <st>
               <p>Histone modification hierarchy generation</p>
            </st>
            <p>Scientists working in epigenomic research have different information needs concerning histone modifications. They would like to obtain scientific articles with different focuses to get an overview of the research in their own or related fields. Some might ask a text retrieval system general questions, like: 'Give me all documents that contain modifications of histone 3'. Others might like to perform a more specific text search, like: 'Give me all documents dealing with trimethylated lysine at position 9 of histone 3'. In this case, the first question implicitly includes the second one. Semantic text retrieval systems, like Textpresso <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> and SCAIView <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, can cope with this demand: the recognized named entities are mapped to concepts of a hierarchy, which is used for organizing the texts and allows for semantic search.</p>
            <p>We analysed the existing hierarchically structured terminologies and ontologies <it>MeSH </it><abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, <it>GO</it>, <it>PSI-Mod</it>, and <it>HistOn</it> for their applicability as a histone modification concept hierarchy in such a system. We found that no resource covers histone modifications exhaustively. Therefore, we created an organism-independent hierarchy of standardized histone modification concepts.</p>
            <p>In general, the hierarchy could be organized from two different points of view: modification-centric or histone-centric. We chose a histone-centric organization of the standardized terms, which function as concepts. This provides a fast overview of all modification types of a certain histone type. Furthermore, applied in a semantic text retrieval system, it allows for organizing scientific texts related to one histone type at different levels of granularity. A section of the generated hierarchy is given below for histone 3 as an example. It shows five possible methylation states (mono-methylation: me 1, di-methylation: me 2, asymmetric di-methylation: me 2a, symmetric di-methylation: me 2s, and tri-methylation: me 3) as well as the unspecified modification type (me) at two amino acids (K: lysine and R: arginine) and two positions (2 and 4):</p>
            <p>&#160;&#160;&#160;...</p>
            <p>&#160;&#160;&#160;0.3.0 H3</p>
            <p>&#160;&#160;&#160;0.3.0.1 H3 me</p>
            <p>&#160;&#160;&#160;0.3.0.1.0.1 H3 R 2 me</p>
            <p>&#160;&#160;&#160;0.3.0.1.0.2 H3 K 4 me</p>
            <p>&#160;&#160;&#160;0.3.0.1.1 H3 me 1</p>
            <p>&#160;&#160;&#160;0.3.0.1.1.1 H3 R 2 me 1</p>
            <p>&#160;&#160;&#160;0.3.0.1.1.2 H3 K 4 me 1</p>
            <p>&#160;&#160;&#160;0.3.0.1.2 H3 me 2</p>
            <p>&#160;&#160;&#160;0.3.0.1.2.1 H3 R 2 me 2</p>
            <p>&#160;&#160;&#160;0.3.0.1.2.2 H3 K 4 me 2</p>
            <p>&#160;&#160;&#160;0.3.0.1.2.a H3 me 2a (asymmetric)</p>
            <p>&#160;&#160;&#160;0.3.0.1.2.a.1 H3 R 2 me 2a</p>
            <p>&#160;&#160;&#160;0.3.0.1.2.s H3 me 2s (symmetric)</p>
            <p>&#160;&#160;&#160;0.3.0.1.2.s.1 H3 R 2 me 2s</p>
            <p>&#160;&#160;&#160;0.3.0.1.3 H3 me 3</p>
            <p>&#160;&#160;&#160;0.3.0.1.3.1 H3 R 2 me 3 </p>
            <p>&#160;&#160;&#160;0.3.0.1.3.2 H3 K 4 me 3</p>
            <p>&#160;&#160;&#160;...</p>
            <p>Each term in the hierarchy has been assigned a unique number, and the hierarchy has at most 7 levels. A basic set of general histone modification concepts has been assigned to every included histone type. Subsequently, the hierarchy has been populated with standardized terms from <it>GO</it>, <it>MeSH</it>, and <it>HistOn</it>, with specific histone modification terms manually collected from the online antibody catalogue of <it>Abcam </it><abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, and with terms from M<smcaps>EDLINE</smcaps> articles. The terms of the developed hierarchy have been automatically compared with the standardized terms from these resources, and terms not yet used in the hierarchy have been proposed by the system as extensions. An analysis of the contribution of the individual term resources is given below. The generated hierarchy is also available in XML format from the download webpage <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
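            <p>The dotted concept numbers encode the hierarchy directly: a concept subsumes every concept whose number extends its own. The following Python sketch assumes the excerpt above is held as a flat dictionary (an illustrative representation, not the schema of the published XML file) and shows how a query for a general concept can be expanded to all of its more specific descendants, which is what searching at different levels of granularity requires. The labels '(asymmetric)' and '(symmetric)' are dropped from the term strings. Testing for prefixes on '.' boundaries, rather than storing parent pointers, also tolerates intermediate numbers such as 0.3.0.1.0 that are not listed in the excerpt.</p>
```python
# Sketch only: the H3 excerpt above as an assumed flat dict {concept number: term}.
HIERARCHY = {
    "0.3.0": "H3",
    "0.3.0.1": "H3 me",
    "0.3.0.1.0.1": "H3 R 2 me",
    "0.3.0.1.0.2": "H3 K 4 me",
    "0.3.0.1.1": "H3 me 1",
    "0.3.0.1.1.1": "H3 R 2 me 1",
    "0.3.0.1.1.2": "H3 K 4 me 1",
    "0.3.0.1.2": "H3 me 2",
    "0.3.0.1.2.1": "H3 R 2 me 2",
    "0.3.0.1.2.2": "H3 K 4 me 2",
    "0.3.0.1.2.a": "H3 me 2a",
    "0.3.0.1.2.a.1": "H3 R 2 me 2a",
    "0.3.0.1.2.s": "H3 me 2s",
    "0.3.0.1.2.s.1": "H3 R 2 me 2s",
    "0.3.0.1.3": "H3 me 3",
    "0.3.0.1.3.1": "H3 R 2 me 3",
    "0.3.0.1.3.2": "H3 K 4 me 3",
}


def is_ancestor(anc_id: str, node_id: str) -> bool:
    """True if anc_id is a proper prefix of node_id on '.' boundaries."""
    return node_id.startswith(anc_id + ".")


def depth(node_id: str) -> int:
    """Number of levels encoded in a concept number."""
    return len(node_id.split("."))


def expand_query(term: str) -> list[str]:
    """Return the queried concept followed by all of its descendants."""
    ids = [i for i, t in HIERARCHY.items() if t == term]
    if not ids:
        return []
    root = ids[0]
    hits = [root] + [i for i in HIERARCHY if is_ancestor(root, i)]
    return [HIERARCHY[i] for i in hits]


print(expand_query("H3 me 2"))
# ['H3 me 2', 'H3 R 2 me 2', 'H3 K 4 me 2', 'H3 me 2a', 'H3 R 2 me 2a', 'H3 me 2s', 'H3 R 2 me 2s']
print(max(depth(i) for i in HIERARCHY))  # 7
```
            <p>A query for 'H3 me 2' therefore also retrieves texts annotated only with the asymmetric or symmetric di-methylation concepts, because these are placed below it.</p>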
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>In the following, the entity recognition and normalization approach is evaluated and the hierarchy generation is discussed. Additionally, we show the application of the developed approach to epigenomic research.</p>
         <sec>
            <st>
               <p>Evaluation of the histone modification recognition</p>
            </st>
            <p>Parameter selection was performed by 10-fold cross-validation. The optimal parameter set was used to train a model on the complete E<smcaps>PI-TRAIN</smcaps> corpus, which was then applied to the test corpus E<smcaps>PI-TEST</smcaps>.</p>
            <p>To assess the impact of the features used by the linear-chain CRF, single features or combinations of them were systematically left out. For every modified feature set, a model was trained on E<smcaps>PI-TRAIN</smcaps> and validated by 10-fold cross-validation. The results are shown in Figure <figr fid="F3">3</figr>.</p>
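            <p>The leave-out experiments can be sketched in Python as below. The callback <it>score</it> is hypothetical: it stands in for training the linear-chain CRF on the training folds with the given features and returning a measure such as <it>F</it><sub>1</sub> on the held-out fold. The fold assignment by random shuffling, the fixed seed, and the limit on how many features are removed at once are likewise assumptions for illustration, not details of the original workflow.</p>
```python
import random
from itertools import combinations
from typing import Callable

# Feature names as listed in the text.
FEATURES = ["Prefix", "Suffix", "InBrackets", "Words as Class",
            "Spaces", "wordClass", "doBriefWordClass"]

# Hypothetical callback: (train doc indices, test doc indices, features) -> score.
Scorer = Callable[[list[int], list[int], list[str]], float]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle document indices once and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(n_docs: int, features: list[str], score: Scorer, k: int = 10) -> float:
    """Average score over k runs, each holding out a different fold."""
    results = []
    for held_out in k_fold_indices(n_docs, k):
        test = set(held_out)
        train = [i for i in range(n_docs) if i not in test]
        results.append(score(train, held_out, features))
    return sum(results) / len(results)


def ablation(n_docs: int, score: Scorer, max_removed: int = 2, k: int = 10) -> dict[str, float]:
    """Cross-validate the full set and every set with 1..max_removed features left out."""
    out = {"all features": cross_validate(n_docs, FEATURES, score, k)}
    for r in range(1, max_removed + 1):
        for removed in combinations(FEATURES, r):
            kept = [f for f in FEATURES if f not in removed]
            out["without " + " + ".join(removed)] = cross_validate(n_docs, kept, score, k)
    return out
```
            <p>With a real scorer, each entry of the returned dictionary corresponds to one bar group in a plot like Figure 3; with max_removed=2 the sketch runs 1 + 7 + 21 = 29 configurations.</p>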
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Results of the feature analysis</p>
               </caption>
               <text>
                  <p><b>Results of the feature analysis</b>. Recall, precision, and <it>F</it><sub>1 </sub>measure are given for every single feature analysis experiment. The omitted features, or combinations of them, from the two classes <it>Automatically generated morphological features (AM) </it>and <it>Context (C) </it>are indicated.</p>
               </text>
               <graphic file="1471-2105-10-S1-S28-3"/>
            </fig>
            <p>The best feature set achieves high recall (0.81), precision (0.87), and <it>F</it><sub>1 </sub>measure (0.84). It includes the following features and feature-generating methods: <it>Prefix</it>, <it>Suffix</it>, <it>InBrackets</it>, <it>Words as Class</it>, <it>Spaces</it>, <it>wordClass</it>, and <it>doBriefWordClass</it>. The features from the class <it>Static morphological </it>have no impact on the result (data not shown); hence they have been omitted altogether. In contrast, leaving out <it>Spaces </it>and <it>Words as Class </it>affects histone modification term recognition and leads to a considerable decrease in precision, recall, and <it>F</it><sub>1 </sub>measure. The features <it>Prefix </it>and <it>Suffix </it>have a negative impact on recall. This shows that it is relevant whether a token is preceded or followed by white space and whether the words occurring in histone modification descriptions are learned by the system. The white-space feature is especially important in enumerations or abbreviations of terms, where it separates them from each other. This feature has previously been shown to have a high impact on the identification of IUPAC and IUPAC-like names <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>.</p>
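            <p>The text names the features but does not define them. The Python sketch below gives one plausible reading; the tokenizer, the affix length of 3, the word-class symbols (A for upper case, a for lower case, 0 for digits, x otherwise), and the small word list standing in for <it>Words as Class</it> are all assumptions. <it>doBriefWordClass</it> is interpreted here as the word class with runs of identical symbols collapsed to one.</p>
```python
import re

# Hypothetical stand-in for the 'Words as Class' vocabulary.
HMOD_WORDS = {"histone", "lysine", "arginine", "serine", "methylation",
              "acetylation", "phosphorylation", "ubiquitylation"}


def word_class(token: str) -> str:
    """Map upper case to 'A', lower case to 'a', digits to '0', anything else to 'x'."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append("x")
    return "".join(out)


def brief_word_class(token: str) -> str:
    """Word class with runs of the same symbol collapsed, e.g. 'A0A0aa0' -> 'A0A0a0'."""
    return re.sub(r"(.)\1+", r"\1", word_class(token))


def extract_features(text: str, affix_len: int = 3) -> list[tuple[str, dict]]:
    """Tokenize into letter runs, digit runs and single symbols; compute features per token."""
    feats = []
    for m in re.finditer(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text):
        tok, start, end = m.group(), m.start(), m.end()
        before = text[:start]
        feats.append((tok, {
            "prefix": tok[:affix_len],
            "suffix": tok[-affix_len:],
            "in_brackets": before.count("(") > before.count(")"),
            "hmod_word": tok.lower() in HMOD_WORDS,
            # Slices are empty at the text boundaries, so isspace() is False there.
            "space_before": text[start - 1:start].isspace(),
            "space_after": text[end:end + 1].isspace(),
            "word_class": word_class(tok),
            "brief_word_class": brief_word_class(tok),
        }))
    return feats
```
            <p>Because this tokenizer splits 'H3K9me3' into 'H', '3', 'K', '9', 'me', '3' without intervening white space, the space features are what distinguish such a compact term from an enumeration like 'H3, K9'.</p>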
            <p>The optimal feature set was applied to tag the test corpus E<smcaps>PI-TEST</smcaps>. The results for recall, precision, and <it>F</it><sub>1 </sub>measure obtained on E<smcaps>PI-TRAIN</smcaps> and E<smcaps>PI-TEST</smcaps> are provided in Table <tblr tid="T3">3</tblr>. Compared to the results on the training corpus, tagging E<smcaps>PI-TEST</smcaps> led to lower recall (0.76), the same precision (0.87), and a lower <it>F</it><sub>1 </sub>measure (0.81).</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Results of the histone modification recognition approach. Results of the 10-fold cross-validation on the corpus E<smcaps>PI-TRAIN</smcaps> and testing of the model on the independent corpus E<smcaps>PI-TEST</smcaps>. Recall, precision, <it>F</it><sub>1 </sub>measure, and the standard deviation for the cross-validation are given.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>E<smcaps>PI-TRAIN</smcaps></p>
                     </c>
                     <c ca="center">
                        <p>E<smcaps>PI-TEST</smcaps></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Recall</p>
                     </c>
                     <c ca="center">
                        <p>0.81 (&#177; 0.05)</p>
                     </c>
                     <c ca="center">
                        <p>0.76</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Precision</p>
                     </c>
                     <c ca="center">
                        <p>0.87 (&#177; 0.05)</p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                         <p><it>F</it><sub>1 </sub>measure</p>
                     </c>
                     <c ca="center">
                        <p>0.84 (&#177; 0.05)</p>
                     </c>
                     <c ca="center">
                        <p>0.81</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
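            <p>The <it>F</it><sub>1</sub> values in Table 3 are consistent with the harmonic mean of the reported precision and recall, as the short Python check below shows. Because the cross-validation figures are averages over folds, the mean <it>F</it><sub>1</sub> need not equal the <it>F</it><sub>1</sub> of the mean precision and recall exactly; here they agree to two decimals. The helper prf, which derives all three measures from entity counts, is included for illustration only.</p>
```python
def f1(precision: float, recall: float) -> float:
    """F1 measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Entity-level precision, recall and F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1(precision, recall)


# Precision and recall as reported in Table 3.
print(round(f1(0.87, 0.81), 2))  # EPI-TRAIN, 10-fold cross-validation: 0.84
print(round(f1(0.87, 0.76), 2))  # EPI-TEST: 0.81
```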
         </sec>
         <sec>
            <st>
               <p>Evaluation of the term standardization and normalization</p>
            </st>
            <sec>
               <st>
                  <p>Term standardization evaluation</p>
               </st>
                <p>Entities from the two annotated corpora E<smcaps>PI-TRAIN</smcaps> and E<smcaps>PI-TEST</smcaps> have been used to establish rules for transforming term variants into standard terms according to the nomenclature. Standard terms have been manually assigned to every <b>Hmod </b>entity of E<smcaps>PI-TRAIN</smcaps> and E<smcaps>PI-TEST</smcaps> and used for the automatic evaluation of the standardization results. In the first step, rules have been developed using all entities from E<smcaps>PI-TRAIN</smcaps>. Subsequently, they have been tested on entities from E<smcaps>PI-TEST</smcaps>. To reduce the number of false positives, further rules have been incorporated into the system after testing on the E<smcaps>PI-TEST</smcaps> entities. The results of standardizing the unique entities from E<smcaps>PI-TRAIN</smcaps> and E<smcaps>PI-TEST</smcaps>, given in Table <tblr tid="T4">4</tblr>, show that the system performs very well.</p>
               <tbl id="T4">
                  <title>
                     <p>Table 4</p>
                  </title>
                  <caption>
                     <p>Results of the term standardization process. Given is the number of annotated histone modification terms (Ann. terms) and the fraction (in %) of correctly standardized terms (Std. terms).</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p>E<smcaps>PI-TRAIN</smcaps></p>
                        </c>
                        <c ca="center">
                           <p>E<smcaps>PI-TEST</smcaps></p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Ann. terms</p>
                        </c>
                        <c ca="center">
                           <p>414</p>
                        </c>
                        <c ca="center">
                           <p>123</p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Std. terms</p>
                        </c>
                        <c ca="center">
                           <p>397 (95.89%)</p>
                        </c>
                        <c ca="center">
                           <p>121 (98.37%)</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <p>Over 95% of the entities from the E<smcaps>PI-TRAIN</smcaps> corpus and over 98% of the entities from the E<smcaps>PI-TEST</smcaps> corpus have been transformed correctly. Problems occurred for terms like '<it>H3K9me2S10p</it>' resulting in '<it>H3 K 9 S 10 me 2s</it>' instead of '<it>H3 K 9 me 2</it>' and '<it>H3 S 10 ph</it>'. Here, the fragmentation and correct assignment of modification states to the amino acids need to be improved.</p>
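               <p>The standardization rules themselves are not listed here. As an illustration only, the Python sketch below handles the compact notation with a regular expression that splits a mention into one standard term per modified residue. The histone names, residue codes, and modification abbreviations are a hypothetical subset. Matching the methylation state case-sensitively, so that 'me2s' means symmetric di-methylation while 'me2S10p' is read as 'me2' followed by 'S10p', is one way to avoid the error described above.</p>
```python
import re

# Hypothetical, deliberately small rule set for compact notation such as 'H3K9me3'.
HISTONE = re.compile(r"H2AX|H2AZ|H2A|H2B|H1|H3|H4")
SITE = re.compile(r"([KRST])(\d+)(me|ac|ph|ub|p)(2a|2s|1|2|3)?")  # case-sensitive state
MOD_ALIASES = {"p": "ph"}


def standardize(term: str) -> list[str]:
    """Split a compact mention into one standard term per modified residue.

    Returns an empty list if any part of the mention cannot be parsed,
    rather than guessing a standard term.
    """
    m = HISTONE.match(term)
    if m is None:
        return []
    histone, rest = m.group(), term[m.end():]
    out, pos = [], 0
    for site in SITE.finditer(rest):
        if site.start() != pos:  # unparsed characters between sites
            return []
        residue, number, mod, state = site.groups()
        parts = [histone, residue, number, MOD_ALIASES.get(mod, mod)]
        if state:
            parts.append(state)
        out.append(" ".join(parts))
        pos = site.end()
    return out if pos == len(rest) and out else []


print(standardize("H3K9me3"))      # ['H3 K 9 me 3']
print(standardize("H3K9me2S10p"))  # ['H3 K 9 me 2', 'H3 S 10 ph']
```
               <p>Spelled-out variants such as 'trimethylated lysine 9 of histone H3' would require further rules on top of this sketch.</p>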
                <p>To test the system on histone modification terms recognized by the CRF, the entities tagged in the E<smcaps>PI-TEST</smcaps> corpus have been extracted, standardized, and evaluated. The tagged entities have been checked for false positives, which have been divided into four classes. The following list gives a brief description of each class, some examples, and the fraction of all false positives that falls into each class (in %):</p>
               <p>1. Modification descriptions without histone mentions (3.7%): '<it>acetylation and methylation</it>'</p>
               <p>2. Enzymes introducing or removing histone modifications (7.4%): '<it>H3K9 methyltransferase</it>'</p>
                <p>3. Boundary problems (48%): '<it>H3 &#8211; K9) with no sign of histone H2AX phosphorylation</it>', '<it>H3K9me3 at pericentric heterochromatin. H3K27me3 and H4K20me</it>'</p>
               <p>4. Terms with other meaning (25%): '<it>phosphorylation of IRS</it>', '<it>eradication of H7N1, H7N3 and H5</it>'</p>
                <p>All false positive terms from classes 1, 2, and 4 have been identified (100%) and rejected by the system. In contrast, false positive terms of class 3, which do contain histone modification mentions, have been passed to the standardization process; 76.5% of them have been standardized correctly. In particular, long terms enumerating many modification types, like '<it>Methylation of H3 at lysine residues K4 and K79 depends on ubiquitylation of histone H2B</it>', caused errors in the standard term generation. This demonstrates that there is still room for improving the standardization process.</p>
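                <p>The text reports that false positives of classes 1, 2, and 4 were rejected, but not by which rules. A simple Python filter that has this effect on the examples listed above can be sketched as follows. Requiring a histone name and rejecting spans that end in an enzyme name are assumptions, the lists of histone names and enzyme suffixes are illustrative, and the lookahead that excludes influenza subtypes such as 'H1N1' is ad hoc.</p>
```python
import re

# Hypothetical filter patterns; not the rules of the original system.
HISTONE = re.compile(r"\bH(?:2AX|2AZ|2A|2B|1|3|4)(?!\d|N\d)")
ENZYME_HEAD = re.compile(
    r"(?:transferase|deacetylase|demethylase|kinase|phosphatase)s?$", re.IGNORECASE)


def keep_candidate(span: str) -> bool:
    """Keep a tagged span for standardization only if it names a histone
    and does not end in an enzyme name."""
    span = span.strip()
    if ENZYME_HEAD.search(span):  # class 2: enzymes such as 'H3K9 methyltransferase'
        return False
    # Classes 1 and 4 contain no histone name and are rejected here.
    return HISTONE.search(span) is not None
```
                <p>Boundary problems (class 3) pass this filter because they do contain histone modification mentions, which matches the behaviour described above.</p>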
            </sec>
            <sec>
               <st>
                  <p>Term normalization evaluation</p>
               </st>
                <p>For the normalization of histone modification terms, the <it>Gene Ontology </it>was considered to be the best resource, as it is used as a standard for annotating proteins and genes. Unfortunately, it contains only 17 histone modification concepts of different granularity. Only one concept describes a specific histone modification type at a defined protein position; the remaining ones contain general information on histone modifications, like '<it>histone acetylation</it>' (GO:0016573). Furthermore, several modification types are missing, e.g. biotinylation or glycosylation. The analysis result for <it>HistOn </it>is similar: it contains 17 histone modification descriptions <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. The ontology <it>PSI-Mod </it>defines general and specific amino acid modification types, like 'omega-N, omega-N-dimethyl-L-arginine', which are protein-independent. Neither <it>HistOn </it>nor <it>PSI-Mod </it>is applicable as a normalization resource. <it>MeSH </it>provides 13 concepts for histones and one histone modification description. Therefore, the concepts from <it>GO </it>and <it>MeSH </it>have been transformed into standard terms and used for normalization. The standardization of the terms from both resources was 100% correct. Normalizing histone modification terms to <it>GO </it>allows additional information linked to the ontology, like genes or proteins, to be annotated.</p>
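                <p>When a standard term has no exact counterpart in <it>GO</it> or <it>MeSH</it>, one option is to back off to progressively coarser concepts. The Python sketch below illustrates this idea; the back-off order and the mapping table are hypothetical, and the only identifier taken from the text is GO:0016573 ('histone acetylation').</p>
```python
# Hypothetical mapping tables; only GO:0016573 is taken from the text.
MOD_NAMES = {"ac": "histone acetylation",
             "me": "histone methylation",
             "ph": "histone phosphorylation"}
CONCEPT_IDS = {"histone acetylation": "GO:0016573"}


def generalizations(std_term: str) -> list[str]:
    """Return the standard term followed by progressively coarser concepts."""
    parts = std_term.split()
    i = next((k for k, p in enumerate(parts) if p in MOD_NAMES), None)
    if i is None:
        return [std_term]
    chain = [std_term]
    if len(parts) > i + 1:            # drop the state, e.g. 'H3 K 9 me 3' -> 'H3 K 9 me'
        chain.append(" ".join(parts[: i + 1]))
    if i > 1:                          # drop residue and position, e.g. -> 'H3 me'
        chain.append(f"{parts[0]} {parts[i]}")
    chain.append(MOD_NAMES[parts[i]])  # histone-independent process concept
    return chain


def normalize(std_term: str):
    """Map a standard term to the most specific concept that has an identifier."""
    for concept in generalizations(std_term):
        if concept in CONCEPT_IDS:
            return concept, CONCEPT_IDS[concept]
    return None


print(normalize("H3 K 9 ac"))  # ('histone acetylation', 'GO:0016573')
```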
            </sec>
         </sec>
         <sec>
            <st>
               <p>Discussion of the histone modification hierarchy generation</p>
            </st>
            <p>Since no comprehensive, ready-to-use hierarchy existed, we developed our own, comprising 462 concepts. It was created manually as a text file and then converted to XML. The term resources used contribute to the hierarchy concepts as follows:</p>
            <p>&#8226; 13 histone types: 13 histone types connected to <it>GO </it>obtained with Gene product search using AmiGO <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, 13 in <it>MeSH</it>, 7 in the online catalogue of Abcam, 8 in <it>HistOn</it>, 10 in M<smcaps>EDLINE</smcaps> articles</p>
            <p>&#8226; 262 general histone modification types: 16 in <it>GO</it>, 47 in M<smcaps>EDLINE</smcaps> articles, 1 in <it>MeSH</it></p>
            <p>&#8226; 156 specific modification types from different resources: 148 from the online catalogue of Abcam, 52 in M<smcaps>EDLINE</smcaps> articles, 1 in <it>GO </it>and <it>HistOn</it>.</p>
            <p>The terms from the different resources overlap in content. <it>GO </it>and <it>MeSH </it>are the best of the considered resources for histone types, whereas M<smcaps>EDLINE</smcaps> articles are the most useful resource for general histone modification types and, together with the Abcam catalogue, for specific ones. The current version of the hierarchy covers the most important histone types. As new histone modification findings are published in the literature and deposited in the analysed resources, the histone modification concept hierarchy is expected to grow.</p>
         </sec>
         <sec>
            <st>
               <p>Application of the developed approach for epigenomic research</p>
            </st>
            <p>The CRF model trained with the best feature set was applied to tag the complete M<smcaps>EDLINE</smcaps>. The number of articles obtained for the most frequently occurring histone modification types is provided in Table <tblr tid="T5">5</tblr>.</p>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Application of the developed approach: number of M<smcaps>EDLINE</smcaps> articles for the most frequently occurring histone modifications. Number of M<smcaps>EDLINE</smcaps> abstracts obtained for the most frequently occurring histone modifications after term identification and standardization. (The histone modification example '<it>H3K9me3</it>' introduced in the Background section is marked in bold.)</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>Modification type</p>
                     </c>
                     <c ca="center">
                        <p>Number of articles</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>H3 K 9 me</p>
                     </c>
                     <c ca="center">
                        <p>231</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>H3 K 4 me</p>
                     </c>
                     <c ca="center">
                        <p>173</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>H3 K 4 me 3</p>
                     </c>
                     <c ca="center">
                        <p>104</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>H3 K 9 me 3</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>90</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>H3 K 9 me 2</p>
                     </c>
                     <c ca="center">
                        <p>80</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>H3 S 10 ph</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>H3 K 27 me 3</p>
                     </c>
                     <c ca="center">
                        <p>71</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>H3 K 9 ac</p>
                     </c>
                     <c ca="center">
                        <p>62</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>H3 K 27 me</p>
                     </c>
                     <c ca="center">
                        <p>60</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>H3 K 4 me 2</p>
                     </c>
                     <c ca="center">
                        <p>58</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>This demonstrates that the developed approach is able to link a large number of M<smcaps>EDLINE</smcaps> abstracts (version of 23 June 2008) to one histone modification concept at once, and it shows the importance of mapping the different term variants onto one standard concept. In comparison, a simple search strategy requires all term variants to be entered one by one into the query engine to find all related texts (cf. PubMed search results in the Background section).</p>
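            <p>Counts like those in Table 5 amount to counting distinct abstracts per standard term, so that an abstract mentioning one modification in several spellings is counted once. The Python sketch below uses invented document identifiers and mentions; it does not roll counts up the hierarchy, and whether the counts in Table 5 do so is not stated here.</p>
```python
from collections import defaultdict

# Hypothetical tagger output: (document id, surface form, standard term).
MENTIONS = [
    ("doc1", "H3K9me3", "H3 K 9 me 3"),
    ("doc1", "trimethylated H3K9", "H3 K 9 me 3"),
    ("doc2", "H3-K9 trimethylation", "H3 K 9 me 3"),
    ("doc3", "H3K9me3", "H3 K 9 me 3"),
    ("doc3", "H3K4me3", "H3 K 4 me 3"),
]


def count_documents(mentions, field: int) -> list[tuple[str, int]]:
    """Count distinct documents per value of the given tuple field, most frequent first."""
    docs = defaultdict(set)
    for mention in mentions:
        docs[mention[field]].add(mention[0])
    return sorted(((v, len(d)) for v, d in docs.items()), key=lambda x: (-x[1], x[0]))


by_concept = count_documents(MENTIONS, field=2)
by_surface = dict(count_documents(MENTIONS, field=1))
print(by_concept)             # [('H3 K 9 me 3', 3), ('H3 K 4 me 3', 1)]
print(by_surface["H3K9me3"])  # 2: a search for this single variant misses doc2
```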
            <p>The integration of the histone modification concept hierarchy into the semantic text retrieval and analysis system SCAIView, together with the mapping of the recognized terms to the hierarchy concepts, enables the retrieval of articles containing histone modifications at different levels of granularity. With SCAIView, which was developed in our group, a combined search with various other biomedical concepts, such as proteins, genes, diseases, and chemical substances/drugs, can be performed, supporting easy navigation through large text corpora such as M<smcaps>EDLINE</smcaps>. These entities are detected by our dictionary-based Named Entity Recognition (NER) tool ProMiner <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> and by two other CRF-based NER approaches optimized for the identification of SNPs <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> and IUPAC-like names <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. This allows further studies based on a combined analysis of additional biomedical concepts relevant for epigenomic research, which could lead to new hypotheses directing the design of further experiments.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We have introduced an approach for the automated identification of histone modification mentions in text with CRFs, which reaches high <it>F</it><sub>1 </sub>measures on the training (0.84) and test corpus (0.81). The standardization of the identified terms, which performs very well (95&#8211;98% correct), enables the mapping of different spelling variants of one histone modification type onto each other. We showed that our newly developed approach is superior to a PubMed search for retrieving a high number of abstracts related to a histone modification type in M<smcaps>EDLINE</smcaps>. The integration of the developed histone modification hierarchy into a semantic text retrieval system and its mapping to standardized terms enables semantic search. Combining this with a search for other biomedical named entities allows more complex questions to be asked in a single step, which has not been possible up to now. Furthermore, thanks to the normalization of histone modification terms to <it>GO </it>and <it>MeSH</it>, additional epigenomically relevant information, like influenced genes or proteins, can be annotated.</p>
         <p>Future work will focus on extracting information related to histone modifications. Finding, for example, related expression states of certain genes, DNA methylation states, cell/tissue types, chemical substances, phenotypes, and disease states will improve literature-based knowledge discovery and thus support hypothesis generation in epigenomics.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>CK selected and annotated the corpora, developed the term standardization and term normalization process, generated the hierarchy, performed the CRF studies and analyzed the features, as well as evaluated the data and drafted the manuscript. RK implemented the workflow from annotating and tokenizing text to training, tagging and evaluation of Conditional Random Fields. He was involved in the parameter selection of the models. MH-A critically revised the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We want to thank Juliane Fluck for her support as well as Christoph M. Friedrich and Philippe Thomas for fruitful discussions. The work of Corinna Kol&#225;&#345;ik is funded by the Bonn-Aachen International Center for Information Technologies <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> and the work of Roman Klinger by the Fraunhofer-Max-Planck Machine Learning Cooperation <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>.</p>
            <p>This article has been published as part of <it>BMC Bioinformatics </it>Volume 10 Supplement 1, 2009: Proceedings of The Seventh Asia Pacific Bioinformatics Conference (APBC) 2009. The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1471-2105/10?issue=S1</url></p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The key to development: interpreting the histone code?</p>
            </title>
            <aug>
               <au>
                  <snm>Margueron</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Trojer</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Reinberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>2</issue>
            <fpage>163</fpage>
            <lpage>176</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gde.2005.01.005</pubid>
                  <pubid idtype="pmpid" link="fulltext">15797199</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>How chromatin-binding modules interpret histone modifications: lessons from professional pocket pickers</p>
            </title>
            <aug>
               <au>
                  <snm>Taverna</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ruthenburg</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Allis</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Patel</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nat Struct Mol Biol</source>
            <pubdate>2007</pubdate>
            <volume>14</volume>
            <issue>11</issue>
            <fpage>1025</fpage>
            <lpage>1040</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nsmb1338</pubid>
                  <pubid idtype="pmpid" link="fulltext">17984965</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Translating the histone code</p>
            </title>
            <aug>
               <au>
                  <snm>Jenuwein</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Allis</snm>
                  <fnm>CD</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>293</volume>
            <issue>5532</issue>
            <fpage>1074</fpage>
            <lpage>1080</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1063127</pubid>
                  <pubid idtype="pmpid" link="fulltext">11498575</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The dynamic epigenome and its implications in toxicology</p>
            </title>
            <aug>
               <au>
                  <snm>Szyf</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Toxicol Sci</source>
            <pubdate>2007</pubdate>
            <volume>100</volume>
            <fpage>7</fpage>
            <lpage>23</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/toxsci/kfm177</pubid>
                  <pubid idtype="pmpid" link="fulltext">17675334</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Epigenetic reprogramming and imprinting in origins of disease</p>
            </title>
            <aug>
               <au>
                  <snm>Tang</snm>
                  <fnm>WY</fnm>
               </au>
               <au>
                  <snm>Ho</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>Rev Endocr Metab Disord</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <issue>2</issue>
            <fpage>173</fpage>
            <lpage>182</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s11154-007-9042-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">17638084</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Epigenetics and cancer: towards an evaluation of the impact of environmental and dietary factors</p>
            </title>
            <aug>
               <au>
                  <snm>Herceg</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Mutagenesis</source>
            <pubdate>2007</pubdate>
            <volume>22</volume>
            <issue>2</issue>
            <fpage>91</fpage>
            <lpage>103</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/mutage/gel068</pubid>
                  <pubid idtype="pmpid" link="fulltext">17284773</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Implication of abnormal epigenetic patterns for human diseases</p>
            </title>
            <aug>
               <au>
                  <snm>Santos-Rebou&#231;as</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Pimentel</snm>
                  <fnm>MMG</fnm>
               </au>
            </aug>
            <source>Eur J Hum Genet</source>
            <pubdate>2007</pubdate>
            <volume>15</volume>
            <fpage>10</fpage>
            <lpage>17</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.ejhg.5201727</pubid>
                  <pubid idtype="pmpid" link="fulltext">17047674</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Phenotypic plasticity and the epigenetics of human disease</p>
            </title>
            <aug>
               <au>
                  <snm>Feinberg</snm>
                  <fnm>AP</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2007</pubdate>
            <volume>447</volume>
            <issue>7143</issue>
            <fpage>433</fpage>
            <lpage>440</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature05919</pubid>
                  <pubid idtype="pmpid" link="fulltext">17522677</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The UCSC Genome Browser</p>
            </title>
            <aug>
               <au>
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Curr Protoc Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>Chapter 1</volume>
            <note>Unit 1.4.</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">18428780</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>ChromatinDB: a database of genome-wide histone modification patterns for Saccharomyces cerevisiae</p>
            </title>
            <aug>
               <au>
                  <snm>O'Connor</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Wyrick</snm>
                  <fnm>JJ</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>14</issue>
            <fpage>1828</fpage>
            <lpage>1830</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btm236</pubid>
                  <pubid idtype="pmpid" link="fulltext">17485428</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>The Histone Database: a comprehensive resource for histones and histone fold-containing proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Marino-Ram&#237;rez</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hsu</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Baxevanis</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Landsman</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2006</pubdate>
            <volume>62</volume>
            <issue>4</issue>
            <fpage>838</fpage>
            <lpage>842</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1800941</pubid>
                  <pubid idtype="pmpid" link="fulltext">16345076</pubid>
                  <pubid idtype="doi">10.1002/prot.20814</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>PubMed</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed</url>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Using ARROWSMITH: a computer-assisted approach to formulating and assessing scientific hypotheses</p>
            </title>
            <aug>
               <au>
                  <snm>Smalheiser</snm>
                  <fnm>NR</fnm>
               </au>
               <au>
                  <snm>Swanson</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Comput Methods Programs Biomed</source>
            <pubdate>1998</pubdate>
            <volume>57</volume>
            <issue>3</issue>
            <fpage>149</fpage>
            <lpage>153</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0169-2607(98)00033-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">9822851</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Using literature-based discovery to identify disease candidate genes</p>
            </title>
            <aug>
               <au>
                  <snm>Hristovski</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Peterlin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Mitchell</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Humphrey</snm>
                  <fnm>SM</fnm>
               </au>
            </aug>
            <source>Int J Med Inform</source>
            <pubdate>2005</pubdate>
            <volume>74</volume>
            <issue>2&#8211;4</issue>
            <fpage>289</fpage>
            <lpage>298</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ijmedinf.2004.04.024</pubid>
                  <pubid idtype="pmpid" link="fulltext">15694635</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Detection of IUPAC and IUPAC-like Chemical Names</p>
            </title>
            <aug>
               <au>
                  <snm>Klinger</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kol&#225;&#345;ik</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fluck</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hofmann-Apitius</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Friedrich</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <issue>13</issue>
            <fpage>i268</fpage>
            <lpage>i276</lpage>
            <note>[Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB)].</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btn181</pubid>
                  <pubid idtype="pmpid" link="fulltext">18586724</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Tagging gene and protein names in biomedical text</p>
            </title>
            <aug>
               <au>
                  <snm>Tanabe</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wilbur</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>8</issue>
            <fpage>1124</fpage>
            <lpage>1132</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/18.8.1124</pubid>
                  <pubid idtype="pmpid" link="fulltext">12176836</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>ProMiner: rule-based protein and gene entity recognition</p>
            </title>
            <aug>
               <au>
                  <snm>Hanisch</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Fundel</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mevissen</snm>
                  <fnm>HT</fnm>
               </au>
               <au>
                  <snm>Zimmer</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Fluck</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <issue>Suppl 1</issue>
            <fpage>S14</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1869006</pubid>
                  <pubid idtype="pmpid" link="fulltext">15960826</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-S1-S14</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>MutationFinder: a high-performance system for extracting point mutation mentions from text</p>
            </title>
            <aug>
               <au>
                  <snm>Caporaso</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Baumgartner</snm>
                  <fnm>WA</fnm>
               </au>
               <au>
                  <snm>Randolph</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Cohen</snm>
                  <fnm>KB</fnm>
               </au>
               <au>
                  <snm>Hunter</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>14</issue>
            <fpage>1862</fpage>
            <lpage>1865</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2516306</pubid>
                  <pubid idtype="pmpid" link="fulltext">17495998</pubid>
                  <pubid idtype="doi">10.1093/bioinformatics/btm235</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>PubMeth: a cancer methylation database combining text-mining and expert annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Ongenaert</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Van Neste</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>De Meyer</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Menschaert</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bekaert</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Van Criekinge</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <issue>Database issue</issue>
            <fpage>D842</fpage>
            <lpage>D846</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2238841</pubid>
                  <pubid idtype="pmpid" link="fulltext">17932060</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Epigenome Network of Excellence</p>
            </title>
            <url>http://www.epigenome-noe.net/</url>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Histone modifications: signalling receptors and potential elements of a heritable epigenetic code</p>
            </title>
            <aug>
               <au>
                  <snm>Nightingale</snm>
                  <fnm>KP</fnm>
               </au>
               <au>
                  <snm>O'Neill</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Turner</snm>
                  <fnm>BM</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <issue>2</issue>
            <fpage>125</fpage>
            <lpage>136</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gde.2006.02.015</pubid>
                  <pubid idtype="pmpid" link="fulltext">16503131</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Reading signals on the nucleosome with a new nomenclature for modified histones</p>
            </title>
            <aug>
               <au>
                  <snm>Turner</snm>
                  <fnm>BM</fnm>
               </au>
            </aug>
            <source>Nat Struct Mol Biol</source>
            <pubdate>2005</pubdate>
            <volume>12</volume>
            <issue>2</issue>
            <fpage>110</fpage>
            <lpage>112</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nsmb0205-110</pubid>
                  <pubid idtype="pmpid" link="fulltext">15702071</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Identifying Gene Specific Variants in Biomedical Text</p>
            </title>
            <aug>
               <au>
                  <snm>Klinger</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Furlong</snm>
                  <fnm>LI</fnm>
               </au>
               <au>
                  <snm>Friedrich</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Mevissen</snm>
                  <fnm>HT</fnm>
               </au>
               <au>
                  <snm>Fluck</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sanz</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hofmann-Apitius</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Journal of Bioinformatics and Computational Biology</source>
            <pubdate>2007</pubdate>
            <volume>5</volume>
            <issue>6</issue>
            <fpage>1277</fpage>
            <lpage>1296</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1142/S0219720007003156</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>The success (or not) of HUGO nomenclature</p>
            </title>
            <aug>
               <au>
                  <snm>Tamames</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Valencia</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>5</issue>
            <fpage>402</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1779514</pubid>
                  <pubid idtype="pmpid" link="fulltext">16707004</pubid>
                  <pubid idtype="doi">10.1186/gb-2006-7-5-402</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data</p>
            </title>
            <aug>
               <au>
                  <snm>Lafferty</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>McCallum</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pereira</snm>
                  <fnm>FCN</fnm>
               </au>
            </aug>
            <source>Proceedings of the Eighteenth International Conference on Machine Learning</source>
            <publisher>Morgan Kaufmann Publishers Inc.</publisher>
            <pubdate>2001</pubdate>
            <fpage>282</fpage>
            <lpage>289</lpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Identifying Gene and Protein Mentions in Text Using Conditional Random Fields</p>
            </title>
            <aug>
               <au>
                  <snm>McDonald</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Pereira</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <issue>Suppl 1</issue>
            <fpage>S6</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1869020</pubid>
                  <pubid idtype="pmpid" link="fulltext">15960840</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-S1-S6</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Classical Probabilistic Models and Conditional Random Fields</p>
            </title>
            <aug>
               <au>
                  <snm>Klinger</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Tomanek</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Tech Rep TR07-2-013</source>
            <publisher>Department of Computer Science, Dortmund University of Technology</publisher>
            <pubdate>2007</pubdate>
            <note>[ISSN 1864-4503].</note>
         </bibl>
         <bibl id="B28">
            <title>
               <p>MALLET: A Machine Learning for Language Toolkit</p>
            </title>
            <aug>
               <au>
                  <snm>McCallum</snm>
                  <fnm>AK</fnm>
               </au>
            </aug>
            <pubdate>2002</pubdate>
            <url>http://mallet.cs.umass.edu</url>
         </bibl>
         <bibl id="B29">
            <title>
               <p>WordFreak: an Open Tool for Linguistic Annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Morton</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>LaCivita</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>HLT/NAACL 2003: demonstrations</source>
            <pubdate>2003</pubdate>
            <fpage>17</fpage>
            <lpage>18</lpage>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Corpus and Histone Modification Hierarchy Download</p>
            </title>
            <url>http://www.scai.fraunhofer.de/histone-corpora.html</url>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Gene Ontology</p>
            </title>
            <url>http://www.geneontology.org/</url>
         </bibl>
         <bibl id="B32">
            <title>
               <p>PSI-Mod</p>
            </title>
            <url>http://psidev.sourceforge.net/mod/data/PSI-MOD.obo</url>
         </bibl>
         <bibl id="B33">
            <title>
               <p>A semantic web approach applied to integrative bioinformatics experimentation: a biological use case with genomics data</p>
            </title>
            <aug>
               <au>
                  <snm>Post</snm>
                  <fnm>LJG</fnm>
               </au>
               <au>
                  <snm>Roos</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>van Driel</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Breit</snm>
                  <fnm>TM</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>22</issue>
            <fpage>3080</fpage>
            <lpage>3087</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btm461</pubid>
                  <pubid idtype="pmpid" link="fulltext">17881406</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Textpresso: an ontology-based information retrieval and extraction system for biological literature</p>
            </title>
            <aug>
               <au>
                  <snm>M&#252;ller</snm>
                  <fnm>HM</fnm>
               </au>
               <au>
                  <snm>Kenny</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Sternberg</snm>
                  <fnm>PW</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <issue>11</issue>
            <fpage>e309</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">517822</pubid>
                  <pubid idtype="pmpid" link="fulltext">15383839</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0020309</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Knowledge Environments Representing Molecular Entities for the Virtual Physiological Human</p>
            </title>
            <aug>
               <au>
                  <snm>Hofmann-Apitius</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fluck</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Furlong</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Fornes</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Kol&#225;&#345;ik</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hanser</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Boecker</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schultz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sanz</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Klinger</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Mevissen</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gattermayer</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Oliva</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Friedrich</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Philosophical Transactions of the Royal Society A</source>
            <pubdate>2008</pubdate>
            <volume>366</volume>
            <issue>1878</issue>
            <fpage>3091</fpage>
            <lpage>3110</lpage>
         </bibl>
         <bibl id="B36">
            <title>
               <p>MeSH</p>
            </title>
            <url>http://www.nlm.nih.gov/mesh/</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Abcam</p>
            </title>
            <url>http://www.abcam.com/</url>
         </bibl>
         <bibl id="B38">
            <title>
               <p>AmiGO</p>
            </title>
            <url>http://amigo.geneontology.org/cgi-bin/amigo/go.cgi</url>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Bonn-Aachen International Center for Information Technology</p>
            </title>
            <url>http://www.b-it-center.de</url>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Fraunhofer-Max-Planck Machine Learning Cooperation</p>
            </title>
            <url>http://lip.fml.tuebingen.mpg.de</url>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Cross-regulation of histone modifications</p>
            </title>
            <aug>
               <au>
                  <snm>Latham</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Dent</snm>
                  <fnm>SYR</fnm>
               </au>
            </aug>
            <source>Nat Struct Mol Biol</source>
            <pubdate>2007</pubdate>
            <volume>14</volume>
            <issue>11</issue>
            <fpage>1017</fpage>
            <lpage>1024</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nsmb1307</pubid>
                  <pubid idtype="pmpid" link="fulltext">17984964</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Histone deimination antagonizes arginine methylation</p>
            </title>
            <aug>
               <au>
                  <snm>Cuthbert</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Daujat</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Snowden</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Erdjument-Bromage</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hagiwara</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yamada</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gregory</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Tempst</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bannister</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Kouzarides</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2004</pubdate>
            <volume>118</volume>
            <issue>5</issue>
            <fpage>545</fpage>
            <lpage>553</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2004.08.020</pubid>
                  <pubid idtype="pmpid" link="fulltext">15339660</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Proline isomerization of histone H3 regulates lysine methylation and gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Nelson</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Santos-Rosa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kouzarides</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2006</pubdate>
            <volume>126</volume>
            <issue>5</issue>
            <fpage>905</fpage>
            <lpage>916</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2006.07.026</pubid>
                  <pubid idtype="pmpid" link="fulltext">16959570</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>