<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-5-118</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>Estimating mutual information using B-spline functions &#8211; an improved similarity measure for analysing gene expression data</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Daub</snm>
               <mi>O</mi>
               <fnm>Carsten</fnm>
               <insr iid="I1"/>
               <insr iid="I4"/>
               <email>carsten.daub@cgb.ki.se</email>
            </au>
            <au id="A2">
               <snm>Steuer</snm>
               <fnm>Ralf</fnm>
               <insr iid="I2"/>
               <email>steuer@agnld.uni-potsdam.de</email>
            </au>
            <au id="A3">
               <snm>Selbig</snm>
               <fnm>Joachim</fnm>
               <insr iid="I1"/>
               <email>selbig@mpimp-golm.mpg.de</email>
            </au>
            <au id="A4">
               <snm>Kloska</snm>
               <fnm>Sebastian</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>kloska@scienion.de</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Max Planck Institute of Molecular Plant Physiology, Potsdam, 14424, Germany</p>
            </ins>
            <ins id="I2">
               <p>Nonlinear Dynamics Group, Institute of Physics, University of Potsdam, Potsdam, 14415, Germany</p>
            </ins>
            <ins id="I3">
               <p>Scienion AG, Volmerstrasse 7a, Berlin, 12489, Germany</p>
            </ins>
            <ins id="I4">
               <p>Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, 17177, Sweden</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2004</pubdate>
         <volume>5</volume>
         <issue>1</issue>
         <fpage>118</fpage>
         <url>http://www.biomedcentral.com/1471-2105/5/118</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15339346</pubid>
               <pubid idtype="doi">10.1186/1471-2105-5-118</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>15</day>
               <month>12</month>
               <year>2003</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>31</day>
               <month>8</month>
               <year>2004</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>31</day>
               <month>8</month>
               <year>2004</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2004</year>
         <collab>Daub et al; licensee BioMed Central Ltd.</collab>
         <note>This is an open-access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The information theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: The significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets and the results are compared to those obtained using other similarity measures.</p>
               <p>C++ source code for our algorithm is available for non-commercial use from <email>kloska@scienion.de</email> upon request.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The utilisation of mutual information as a similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification, are thereby extended.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The evaluation of the complex regulatory networks underlying molecular processes poses a major challenge to current research. With modern experimental methods in the field of gene expression, it is possible to monitor mRNA abundance for whole genomes <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. To elucidate the functional relationships inherent in this data, a commonly used approach is the clustering of co-expressed genes <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. In this context, the choice of the similarity measure used for clustering, as well as the clustering method itself, is crucial for the results obtained. Often, linear similarity measures such as the Euclidean distance or Pearson correlation are used in an ad-hoc manner. By doing so, it is possible that subsets of non-linear correlations contained in a given dataset are missed.</p>
         <p>Therefore, information theoretic concepts, such as mutual information, are being used to extend more conventional methods in various contexts ranging from expression <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp> and DNA sequence analysis <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, to reverse engineering <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and independent component analysis <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. Outside the bioinformatics field, mutual information is also widely utilised in diverse disciplines, such as physics <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, image recognition <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, speech recognition <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, and various others. Extending other similarity measures, mutual information provides a general measure of statistical dependence between variables. It is thereby able to detect any type of functional relationship, going beyond the capabilities of linear measures, as illustrated in Figure <figr fid="F1">1</figr>.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Two datasets <it>X </it>and <it>Y </it>(100 data points) show a hypothetical dependency <it>f</it>(<it>x</it>) = 4<it>x</it>(1 - <it>x</it>) (top)</p>
            </caption>
            <text>
               <p>Two datasets <it>X </it>and <it>Y </it>(100 data points) show a hypothetical dependency <it>f</it>(<it>x</it>) = 4<it>x</it>(1 - <it>x</it>) (top). The Pearson correlation coefficient is not able to detect a significant correlation as shown in the histogram plot of the dataset compared to 300 realisations of shuffled data (left). Mutual information clearly shows that the two datasets are not statistically independent (right).</p>
            </text>
            <graphic file="1471-2105-5-118-1"/>
         </fig>
         <p>In this work, we discuss mutual information as a measure of similarity between variables. In the first section, we give a short introduction into the basic concepts including a brief description of the commonly used approaches for numerical estimation from continuous data. In the following section, we then present an algorithm for estimating mutual information from finite data.</p>
         <p>The properties arising from this approach are compared to previously existing algorithms. In subsequent sections, we then apply our concept to large-scale cDNA abundance datasets and determine whether these datasets can be sufficiently described using linear measures or whether a significant number of non-linear correlations are missed.</p>
         <sec>
            <st>
               <p>Mutual information</p>
            </st>
            <p>Mutual information represents a general information theoretic approach to determine the statistical dependence between variables. The concept was initially developed for discrete data. For a system, <it>A</it>, with a finite set of <it>M </it>possible states {<it>a</it><sub>1</sub>, <it>a</it><sub>2</sub>, ... , <graphic file="1471-2105-5-118-i1.gif"/>}, the Shannon entropy <it>H</it>(<it>A</it>) is defined as <abbrgrp><abbr bid="B17">17</abbr></abbrgrp></p>
            <p>
               <graphic file="1471-2105-5-118-i2.gif"/>
            </p>
            <p>where <it>p</it>(<it>a<sub>i</sub></it>) denotes the probability of the state <it>a<sub>i</sub></it>. The Shannon entropy is a measure for how evenly the states of <it>A </it>are distributed. The entropy of system <it>A </it>becomes zero if the outcome of a measurement of <it>A </it>is completely determined to be <it>a<sub>j</sub></it>, thus if <it>p</it>(<it>a<sub>j</sub></it>) = 1 and <it>p</it>(<it>a<sub>i</sub></it>) = 0 for all <it>i </it>&#8800; <it>j</it>, whereas the entropy becomes maximal if all probabilities are equal. The joint entropy <it>H</it>(<it>A, B</it>) of two systems <it>A </it>and <it>B </it>is defined analogously</p>
            <p>
               <graphic file="1471-2105-5-118-i3.gif"/>
            </p>
            <p>This leads to the relation</p>
            <p><it>H</it>(<it>A, B</it>) &#8804; <it>H</it>(<it>A</it>) + <it>H</it>(<it>B</it>) &#160;&#160;&#160; (3)</p>
            <p>which fulfils equality only in the case of statistical independence of <it>A </it>and <it>B</it>. Mutual information <it>MI</it>(<it>A, B</it>) can be defined as <abbrgrp><abbr bid="B17">17</abbr></abbrgrp></p>
            <p><it>MI</it>(<it>A, B</it>) = <it>H</it>(<it>A</it>) + <it>H</it>(<it>B</it>) - <it>H</it>(<it>A, B</it>) &#8805; 0 &#160;&#160;&#160; (4)</p>
            <p>It is zero if <it>A </it>and <it>B </it>are statistically independent and increases with the strength of the statistical dependence between <it>A </it>and <it>B</it>.</p>
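            <p>As a minimal sketch, Eqs. (1), (2) and (4) can be evaluated directly once the probabilities are known. The function names below are illustrative and are not taken from the authors' C++ implementation; the logarithm to base two is an assumed choice of unit.</p>
            <p><![CDATA[
```python
import math


def entropy(probs):
    """Shannon entropy in bits, Eq. (1); states with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def mutual_information(joint):
    """MI(A, B) = H(A) + H(B) - H(A, B), Eq. (4), from a joint probability table."""
    p_a = [sum(row) for row in joint]          # marginal of A: row sums
    p_b = [sum(col) for col in zip(*joint)]    # marginal of B: column sums
    h_ab = entropy([p for row in joint for p in row])   # joint entropy, Eq. (2)
    return entropy(p_a) + entropy(p_b) - h_ab


independent = [[0.25, 0.25], [0.25, 0.25]]   # p(a, b) = p(a) p(b)
dependent = [[0.5, 0.0], [0.0, 0.5]]         # B is fully determined by A

print(mutual_information(independent))
print(mutual_information(dependent))
```
]]></p>
            <p>For the independent table the sketch yields a mutual information of zero; for the fully dependent table it yields one bit, which equals <it>H</it>(<it>A</it>).</p>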
            <p>If mutual information is indeed to be used for the analysis of gene-expression data, the continuous experimental data need to be partitioned into discrete intervals, or bins. In the following section, we briefly review the established procedures; a description of how we have extended the basic approach will be provided in the subsequent section.</p>
         </sec>
         <sec>
            <st>
               <p>Estimates from continuous data</p>
            </st>
            <p>In the case of discrete data the estimation of the probabilities <it>p</it>(<it>a<sub>i</sub></it>) is straightforward. Many practical applications, however, supply continuous data for which the probability distributions are unknown and have to be estimated. In a widely used approach <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, the calculation of mutual information is based on the binning of data into <it>M<sub>A</sub> </it>discrete intervals <it>a<sub>i</sub></it>, <it>i </it>= 1... <it>M<sub>A</sub></it>. For experimental data consisting of <it>N </it>measurements of a variable <it>x<sub>u</sub></it>, <it>u </it>= 1... <it>N</it>, an indicator function &#920;<sub><it>i </it></sub>counts the number of data points within each bin. The probabilities are then estimated based on the relative frequencies of occurrence</p>
            <p>
               <graphic file="1471-2105-5-118-i4.gif"/>
            </p>
            <p>with</p>
            <p>
               <graphic file="1471-2105-5-118-i5.gif"/>
            </p>
            <p>For two variables the joint probabilities <graphic file="1471-2105-5-118-i6.gif"/> are calculated analogously from a multivariate histogram. Additionally, it has been suggested <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> to choose the sizes of the bins adaptively, so that each bin contains a nearly uniform distribution of points. In a different approach, kernel methods are used for the estimation of the probability density of Eq. (5) <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. Entropies are then calculated by integration of the estimated densities. Recently, an entropy estimator <graphic file="1471-2105-5-118-i7.gif"/> was suggested <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and was shown to be superior in an extensive comparison with other commonly used estimators.</p>
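            <p>The simple binning estimate can be sketched as follows. This is an illustration only: equal-width bins spanning the observed range of each variable and the logarithm to base two are assumptions, and the function names are hypothetical.</p>
            <p><![CDATA[
```python
import math


def bin_indices(values, n_bins):
    """Assign each value to one of n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0.0:                 # constant data: everything falls into the first bin
        return [0] * len(values)
    # The maximum would land in bin n_bins, so it is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy_from_counts(counts, n):
    """Entropy in bits, with relative frequencies used as probability estimates."""
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def mi_simple_binning(x, y, n_bins):
    """Mutual information of paired samples x, y from a two-dimensional histogram."""
    n = len(x)
    counts_x = [0] * n_bins
    counts_y = [0] * n_bins
    counts_xy = [[0] * n_bins for _ in range(n_bins)]
    for i, j in zip(bin_indices(x, n_bins), bin_indices(y, n_bins)):
        counts_x[i] += 1
        counts_y[j] += 1
        counts_xy[i][j] += 1
    h_xy = entropy_from_counts([c for row in counts_xy for c in row], n)
    return entropy_from_counts(counts_x, n) + entropy_from_counts(counts_y, n) - h_xy


# A non-linear dependency of the kind shown in Figure 1, sampled on a regular grid.
x = [u / 99.0 for u in range(100)]
y = [4.0 * v * (1.0 - v) for v in x]
print(mi_simple_binning(x, y, 6))
```
]]></p>
            <p>Because every data point contributes to exactly one bin, moving a point across a bin border changes the counts, and hence the estimate, abruptly.</p>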
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Fuzzy mutual information</p>
            </st>
            <p>In the classical binning approach, described above, each data point is assigned to one, and only one, bin. For data points near to the border of a bin, small fluctuations due to biological or measurement noise might shift these points to neighbouring bins. Especially for datasets of moderate size, the positions of the borders of the bins can thereby strongly affect the resulting mutual information <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. In a manner analogous to kernel density estimators (KDE), we now present a generalisation to the classical binning in which we aim to overcome some of the drawbacks associated with the simple approach. Within our algorithm, we allow the data points to be assigned to several bins simultaneously. For this, we extended the indicator function &#920;(<it>x</it>) to the set of polynomial B-spline functions. Here, we do not provide the mathematical details for these functions since they have been discussed extensively in the literature <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>, but rather focus on the practical applicability. Within the B-spline approach, each measurement is assigned to more than one bin, <it>i</it>, with weights given by the B-spline functions <it>B</it><sub><it>i,k</it></sub>. The spline order <it>k </it>determines the shape of the weight functions and thereby the number of bins each of the data points is assigned to. A spline order <it>k </it>= 1 corresponds to the simple binning, as described in the previous section: Each data point is assigned to exactly one bin (Figure <figr fid="F2">2</figr>, left). For <it>k </it>= 3, each data point is assigned to three bins, with the respective weights given by the values of the B-spline functions at the data point (Figure <figr fid="F2">2</figr>, right).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>The continuous experimental data for the variable <it>x </it>needs to be binned for the calculation of mutual information</p>
               </caption>
               <text>
                  <p>The continuous experimental data for the variable <it>x </it>needs to be binned for the calculation of mutual information. The indicator function of Eq. (5) counts the number of data points within each bin (example with <it>M</it><sub><it>x </it></sub>= 5 bins, left). The generalised indicator function based on B-spline functions of Eq. (8) extends the bins to polynomial functions (example with <it>M</it><sub><it>x </it></sub>= 5 bins and spline order <it>k </it>= 3, right). The bins now overlap and the weight of each data point to each of the bins is given by the value of the respective B-spline functions at the data point. By definition, all weights contributing to one data point sum up to unity.</p>
               </text>
               <graphic file="1471-2105-5-118-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>B-spline functions</p>
            </st>
            <p>The first step in the definition of the B-spline functions is the definition of a knot vector <it>t</it><sub><it>i </it></sub>for a number of bins <it>i </it>= 1... <it>M </it>and one given spline order <it>k </it>= 1... <it>M </it>- 1 <abbrgrp><abbr bid="B22">22</abbr></abbrgrp></p>
            <p>
               <graphic file="1471-2105-5-118-i8.gif"/>
            </p>
            <p>where the spline order determines the degree of the polynomial functions. The domain of the B-spline functions lies in the interval <it>z </it>&#8712; [0, <it>M </it>- <it>k </it>+ 1]. To cover the range of the variables, the new indicator function based on the B-spline functions needs to be linearly transformed to map their range. The recursive definition of the B-spline functions is as follows <abbrgrp><abbr bid="B22">22</abbr></abbrgrp></p>
            <p>
               <graphic file="1471-2105-5-118-i9.gif"/>
            </p>
            <p>An important property of B-spline functions is the implicit standardisation of coefficients: All weights belonging to one data point sum up to unity.</p>
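            <p>The knot vector and the recursion are available here only as equation images, so the following sketch rests on assumptions: it uses the standard open uniform knot vector (<it>k </it>zeros, interior integer knots, and <it>k </it>copies of <it>M </it>- <it>k </it>+ 1) together with the Cox-de Boor recursion, drops terms with a vanishing denominator, and counts bins from zero. All function names are illustrative.</p>
            <p><![CDATA[
```python
def knot_vector(n_bins, order):
    """Assumed open uniform knot vector t_0 .. t_{M+k-1} for M bins and spline order k."""
    knots = []
    for i in range(n_bins + order):
        if i < order:
            knots.append(0.0)
        elif i < n_bins:
            knots.append(float(i - order + 1))
        else:
            knots.append(float(n_bins - order + 1))
    return knots


def bspline(i, order, z, knots):
    """Value of the basis function B_{i,order} at z (Cox-de Boor recursion)."""
    if order == 1:
        if knots[i] <= z < knots[i + 1]:
            return 1.0
        # The right end of the domain is assigned to the last non-empty interval.
        if z == knots[-1] and knots[i] < z and z == knots[i + 1]:
            return 1.0
        return 0.0
    value = 0.0
    left = knots[i + order - 1] - knots[i]
    if left > 0.0:
        value += (z - knots[i]) / left * bspline(i, order - 1, z, knots)
    right = knots[i + order] - knots[i + 1]
    if right > 0.0:
        value += (knots[i + order] - z) / right * bspline(i + 1, order - 1, z, knots)
    return value


def weights(x, lo, hi, n_bins, order):
    """Weights of a value x in [lo, hi] for each of the n_bins bins."""
    knots = knot_vector(n_bins, order)
    z = (x - lo) / (hi - lo) * (n_bins - order + 1)   # linear map onto [0, M - k + 1]
    return [bspline(i, order, z, knots) for i in range(n_bins)]


print(weights(0.2, 0.0, 1.0, 3, 2))
print(sum(weights(0.37, 0.0, 1.0, 5, 3)))
```
]]></p>
            <p>With <it>M </it>= 3 and <it>k </it>= 2 on the range [0, 1], the value 0.2 receives the weights 0.6, 0.4 and 0, up to floating-point rounding. For <it>k </it>= 1 the same code reduces to the simple binning.</p>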
         </sec>
         <sec>
            <st>
               <p>Algorithm</p>
            </st>
            <sec>
               <st>
                  <p>Input</p>
               </st>
               <p>&#8226; Variables <it>x </it>and <it>y </it>with values <it>x</it><sub><it>u </it></sub>and <it>y</it><sub><it>u</it></sub>, <it>u </it>= 1... <it>N</it></p>
               <p>&#8226; Bins <it>a</it><sub><it>i</it></sub>, <it>i </it>= 1... <it>M</it><sub><it>x </it></sub>and <it>b</it><sub><it>j</it></sub>, <it>j </it>= 1... <it>M</it><sub><it>y</it></sub></p>
               <p>&#8226; Spline order <it>k</it></p>
            </sec>
            <sec>
               <st>
                  <p>Output</p>
               </st>
               <p>&#8226; Mutual information between variable <it>x </it>and <it>y</it></p>
            </sec>
            <sec>
               <st>
                  <p>Algorithm</p>
               </st>
               <p>1. Calculation of marginal entropy for variable <it>x</it></p>
               <p>(a) Determine <graphic file="1471-2105-5-118-i10.gif"/> with</p>
               <p>
                  <graphic file="1471-2105-5-118-i11.gif"/>
               </p>
               <p>(b) Determine <it>M</it><sub><it>x </it></sub>weighting coefficients for each <it>x</it><sub><it>u </it></sub>from <graphic file="1471-2105-5-118-i12.gif"/></p>
               <p>(c) Sum over all <it>x</it><sub><it>u </it></sub>and determine <it>p</it>(<it>a</it><sub><it>i</it></sub>) for each bin <it>a</it><sub><it>i </it></sub>from</p>
               <p>
                  <graphic file="1471-2105-5-118-i13.gif"/>
               </p>
               <p>(d) Determine entropy <it>H</it>(<it>x</it>) according to Eq. (1)</p>
               <p>2. Calculation of joint entropy of two variables <it>x </it>and <it>y</it></p>
               <p>(a) Apply steps 1 (a) and (b) to both variables <it>x </it>and <it>y</it>, independently</p>
               <p>(b) Calculate joint probabilities <it>p</it>(<it>a</it><sub><it>i</it></sub>, <it>b</it><sub><it>j</it></sub>) for all <it>M</it><sub><it>x </it></sub>&#215; <it>M</it><sub><it>y </it></sub>bins according to</p>
               <p>
                  <graphic file="1471-2105-5-118-i14.gif"/>
               </p>
               <p>(c) Calculate the joint entropy <it>H</it>(<it>x,y</it>) according to Eq. (2)</p>
               <p>3. Calculate the mutual information <it>MI</it>(<it>x,y</it>) according to Eq. (4)</p>
            </sec>
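            <p>The steps above can be sketched compactly as follows. The sketch is not the authors' C++ code: the open uniform knot vector, the mapping of each variable's observed range onto the spline domain, the zero-based bin indices and all function names are assumptions made for illustration.</p>
            <p><![CDATA[
```python
import math


def knots_for(n_bins, order):
    # Assumed open uniform knot vector: k zeros, interior integers, k copies of M - k + 1.
    top = n_bins - order + 1
    return [float(min(max(i - order + 1, 0), top)) for i in range(n_bins + order)]


def basis(i, order, z, t):
    """Cox-de Boor recursion; terms with a zero denominator are dropped."""
    if order == 1:
        inside = t[i] <= z < t[i + 1]
        at_right_end = z == t[-1] and t[i] < z and z == t[i + 1]
        return 1.0 if inside or at_right_end else 0.0
    value = 0.0
    if t[i + order - 1] > t[i]:
        value += (z - t[i]) / (t[i + order - 1] - t[i]) * basis(i, order - 1, z, t)
    if t[i + order] > t[i + 1]:
        value += (t[i + order] - z) / (t[i + order] - t[i + 1]) * basis(i + 1, order - 1, z, t)
    return value


def weight_matrix(values, n_bins, order):
    """Steps 1(a) and 1(b): one row of M weighting coefficients per data point."""
    lo, hi = min(values), max(values)
    t = knots_for(n_bins, order)
    top = n_bins - order + 1
    return [[basis(i, order, min((v - lo) / (hi - lo) * top, top), t)
             for i in range(n_bins)] for v in values]


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def bspline_mi(x, y, n_bins, order):
    n = len(x)
    wx = weight_matrix(x, n_bins, order)
    wy = weight_matrix(y, n_bins, order)
    p_x = [sum(row[i] for row in wx) / n for i in range(n_bins)]       # step 1(c)
    p_y = [sum(row[j] for row in wy) / n for j in range(n_bins)]
    p_xy = [sum(wx[u][i] * wy[u][j] for u in range(n)) / n             # step 2(b)
            for i in range(n_bins) for j in range(n_bins)]
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)                 # steps 1(d), 2(c), 3


x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [0.8, 1.0, 0.6, 0.4, 0.0, 0.2]
print(round(bspline_mi(x, y, 3, 2), 2))   # B-spline approach with M = 3, k = 2
print(round(bspline_mi(x, y, 3, 1), 2))   # k = 1 reduces to the simple binning
```
]]></p>
            <p>Since the weights of each data point sum to unity, the estimated marginal and joint probabilities are normalised without any further rescaling.</p>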
         </sec>
         <sec>
            <st>
               <p>Example</p>
            </st>
            <p>We illustrate the estimation with the standard binning and with our approach on two artificial variables <it>x </it>= 0.0, 0.2, 0.4, 0.6, 0.8, 1.0 and <it>y </it>= 0.8, 1.0, 0.6, 0.4, 0.0, 0.2 for <it>M </it>= 3 bins, spline order <it>k </it>= 2, and the logarithm to base two.</p>
            <sec>
               <st>
                  <p>Simple binning</p>
               </st>
               <p>For both variables, each of the three histogram bins contains two values, so that <it>p</it>(<it>a</it><sub>1</sub>) = <it>p</it>(<it>a</it><sub>2</sub>) = <it>p</it>(<it>a</it><sub>3</sub>) = <graphic file="1471-2105-5-118-i15.gif"/> and, owing to the symmetry of the data, analogously for <it>p</it>(<it>b<sub>i</sub></it>). Hence <it>H</it>(<it>x</it>) = <it>H</it>(<it>y</it>) = <graphic file="1471-2105-5-118-i16.gif"/> = log<sub>2 </sub>3 &#8776; 1.58. For the calculation of the joint probability, three of the nine two-dimensional bins contain two values each, <it>p</it>(<it>a</it><sub>1</sub>, <it>b</it><sub>3</sub>) = <it>p</it>(<it>a</it><sub>2</sub>, <it>b</it><sub>2</sub>) = <it>p</it>(<it>a</it><sub>3</sub>, <it>b</it><sub>1</sub>) = <graphic file="1471-2105-5-118-i15.gif"/>, resulting in <it>H</it>(<it>x, y</it>) = log<sub>2 </sub>3 and <it>MI</it>(<it>x, y</it>) = log<sub>2 </sub>3.</p>
            </sec>
            <sec>
               <st>
                  <p>B-spline approach</p>
               </st>
               <p>The modified indicator function <graphic file="1471-2105-5-118-i17.gif"/> is given by <it>B</it><sub><it>i,k</it></sub>(2<it>x</it>) according to Eq. (9) (rule 1(a)). For each value <it>x</it><sub><it>u </it></sub>three weighting coefficients are determined (rule 1(b)) and probabilities are calculated (rule 1(c)) (Table <tblr tid="T1">1</tblr>). The analogous procedure is applied to variable <it>y </it>and the marginal entropies evaluate to <it>H</it>(<it>x</it>) = <it>H</it>(<it>y</it>) = log<sub>2</sub>(10) - 0.6 log<sub>2</sub>(3) - 0.4 log<sub>2</sub>(4) &#8776; 1.57. Both <it>H</it>(<it>x</it>) and <it>H</it>(<it>y</it>) are slightly smaller than the entropies calculated from the simple binning. The joint probabilities are <it>p</it>(<it>a</it><sub>1</sub>, <it>b</it><sub>1</sub>) = <it>p</it>(<it>a</it><sub>3</sub>, <it>b</it><sub>3</sub>) = 0, <it>p</it>(<it>a</it><sub>1</sub>, <it>b</it><sub>2</sub>) = <it>p</it>(<it>a</it><sub>2</sub>, <it>b</it><sub>1</sub>) = <it>p</it>(<it>a</it><sub>2</sub>, <it>b</it><sub>3</sub>) = <it>p</it>(<it>a</it><sub>3</sub>, <it>b</it><sub>2</sub>) = 0.56/6, <it>p</it>(<it>a</it><sub>1</sub>, <it>b</it><sub>3</sub>) = <it>p</it>(<it>a</it><sub>3</sub>, <it>b</it><sub>1</sub>) = 1.24/6, <it>p</it>(<it>a</it><sub>2</sub>, <it>b</it><sub>2</sub>) = 1.28/6 (rule 2 (b)), resulting in <it>H</it>(<it>x,y</it>) &#8776; 2.69 and <it>MI</it>(<it>x,y</it>) &#8776; 0.45.</p>
               <tbl id="T1">
                  <title>
                     <p>Table 1</p>
                  </title>
                  <caption>
                     <p>For the calculation of probabilities <it>p</it>(<it>a<sub>i</sub></it>) according to the B-spline approach, <it>M</it><sub><it>x </it></sub>weighting coefficients are determined for each value <it>x</it><sub><it>u </it></sub>of variable <it>x</it>.</p>
                  </caption>
                  <tblbdy cols="4">
                     <r>
                        <c>
                           <p/>
                        </c>
                        <c ca="center">
                           <p><it>B</it><sub><it>i</it>=1,<it>k</it>=2</sub>(<it>x</it><sub><it>u</it></sub>)</p>
                        </c>
                        <c ca="center">
                           <p><it>B</it><sub><it>i</it>=2,<it>k</it>=2</sub>(<it>x</it><sub><it>u</it></sub>)</p>
                        </c>
                        <c ca="center">
                           <p><it>B</it><sub><it>i</it>=3,<it>k</it>=2</sub>(<it>x</it><sub><it>u</it></sub>)</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="4">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <it>x</it>
                              <sub>1</sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>1.0</p>
                        </c>
                        <c ca="center">
                           <p>0.0</p>
                        </c>
                        <c ca="center">
                           <p>0.0</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <it>x</it>
                              <sub>2</sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.6</p>
                        </c>
                        <c ca="center">
                           <p>0.4</p>
                        </c>
                        <c ca="center">
                           <p>0.0</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <it>x</it>
                              <sub>3</sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.2</p>
                        </c>
                        <c ca="center">
                           <p>0.8</p>
                        </c>
                        <c ca="center">
                           <p>0.0</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <it>x</it>
                              <sub>4</sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.0</p>
                        </c>
                        <c ca="center">
                           <p>0.8</p>
                        </c>
                        <c ca="center">
                           <p>0.2</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <it>x</it>
                              <sub>5</sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.0</p>
                        </c>
                        <c ca="center">
                           <p>0.4</p>
                        </c>
                        <c ca="center">
                           <p>0.6</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>
                              <it>x</it>
                              <sub>6</sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>0.0</p>
                        </c>
                        <c ca="center">
                           <p>0.0</p>
                        </c>
                        <c ca="center">
                           <p>1.0</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p><it>p</it>(<it>a</it><sub><it>i</it></sub>)</p>
                        </c>
                        <c ca="center">
                           <p>1.8/6</p>
                        </c>
                        <c ca="center">
                           <p>2.4/6</p>
                        </c>
                        <c ca="center">
                           <p>1.8/6</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <p>In the next sections, we discuss some of the properties arising from the utilisation of B-spline functions for the estimation of mutual information and compare our approach to other commonly used estimators. We support this discussion using examples for which the underlying distributions, and thereby the true mutual information, are known.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Size of data</p>
            </st>
            <p>It has been discussed elsewhere <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B20">20</abbr></abbrgrp> that mutual information is systematically overestimated when it is estimated from a finite number <it>N </it>of data points. For the simple binning approach, the mean observed mutual information can be calculated explicitly as the deviation from the true mutual information</p>
            <p>
               <graphic file="1471-2105-5-118-i18.gif"/>
            </p>
            <p>As can be seen for an example of artificially generated equidistributed random numbers (Figure <figr fid="F3">3</figr>, left), mutual information calculated from the simple binning scales linearly with 1/<it>N</it>, with the slope depending on the number of bins <it>M </it>in accordance with Eq. (12). Figure <figr fid="F3">3</figr> shows that this scaling is preserved for the extension to B-spline functions, while the slope is significantly decreased for <it>k </it>= 3, compared to the estimation with the simple binning (<it>k </it>= 1). Mutual information calculated from KDE does not show a linear behaviour but rather an asymptotic one with a linear tail for large datasets. The values are slightly increased compared to the ones from the B-spline approach. The entropy estimator <graphic file="1471-2105-5-118-i7.gif"/> gives values comparable to the ones obtained from the B-spline approach.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Mutual information is estimated for artificially generated equidistributed random numbers from the simple binning (<it>k </it>= 1), the B-spline approach (<it>k </it>= 3), and the entropy estimator <graphic file="1471-2105-5-118-i7.gif"/> using <it>M </it>= 6 bins, and additionally from the kernel density estimator</p>
               </caption>
               <text>
                  <p>Mutual information is estimated for artificially generated equidistributed random numbers from the simple binning (<it>k </it>= 1), the B-spline approach (<it>k </it>= 3), and the entropy estimator <graphic file="1471-2105-5-118-i7.gif"/> using <it>M </it>= 6 bins, and additionally from the kernel density estimator. The average over an ensemble of 600 trials is shown as a function of the size of the dataset (left) together with the standard deviation (right).</p>
               </text>
               <graphic file="1471-2105-5-118-3"/>
            </fig>
            <p>More importantly, a similar result also holds for the standard deviation of mutual information. As shown in Figure <figr fid="F3">3</figr> (right), the standard deviation of the mutual information estimated with the simple binning (<it>k </it>= 1) scales with 1/<it>N </it>for statistically independent events <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B29">29</abbr></abbrgrp>. For the B-spline approach (<it>k </it>= 3), this scaling still holds, but the values are significantly lower. For the KDE approach, the values follow an asymptotic curve above those of the B-spline approach, again with a linear tail for large datasets. The estimator <graphic file="1471-2105-5-118-i7.gif"/> shows a linear scaling slightly below the simple binning.</p>
         </sec>
         <sec>
            <st>
               <p>The spline order</p>
            </st>
            <p>The interpretation of any results obtained from the application of mutual information to experimental data is based on testing whether the calculated results are consistent with a previously chosen null hypothesis. Following the intuitive approach that the null hypothesis assumes the statistical independence of variables, mutual information is tested against a surrogate dataset that is consistent with this null hypothesis. As discussed previously in more detail <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, one way of generating such a surrogate dataset is by random permutations of the original data. From the mutual information of the original dataset <it>MI</it>(<it>X,Y</it>)<sup>data</sup>, the average value obtained from surrogate data &lt;<it>MI</it>(<it>X</it><sup>surr</sup>, <it>Y</it><sup>surr</sup>)&gt;, and its standard deviation &#963;<sup>surr</sup>, the significance <it>S </it>can be formulated as</p>
            <p>
               <graphic file="1471-2105-5-118-i19.gif"/>
            </p>
            <p>For each <it>S </it>the null hypothesis can be rejected at a certain level &#945; depending on the underlying distribution. With increasing significance the probability of false positive associations drops.</p>
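            <p>The significance test described above can be illustrated with a short numerical sketch. The Python code below is not the implementation used in this work. It assumes that <it>S </it>is the difference between the mutual information of the data and the surrogate mean, divided by the surrogate standard deviation, uses the simple binning (<it>k </it>= 1) as a stand-in estimator, and builds surrogates by permuting one of the two variables. The function names <it>binned_mutual_information </it>and <it>significance </it>are hypothetical.</p>
            <p>
```python
import numpy as np


def binned_mutual_information(x, y, bins=6):
    """Simple-binning (k = 1) estimate of MI(X, Y) in bits."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = counts / counts.sum()                # joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)       # marginal of X, shape (M, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)       # marginal of Y, shape (1, M)
    nonzero = p_xy > 0                          # empty cells contribute nothing
    ratio = p_xy[nonzero] / (p_x @ p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log2(ratio)))


def significance(x, y, n_surrogates=300, bins=6, seed=0):
    """Distance of MI(data) from the surrogate mean, in surrogate std units."""
    rng = np.random.default_rng(seed)
    mi_data = binned_mutual_information(x, y, bins)
    # Permuting y destroys any association with x but keeps both marginals.
    mi_surr = np.array([
        binned_mutual_information(x, rng.permutation(y), bins)
        for _ in range(n_surrogates)
    ])
    # Assumption: sample standard deviation (ddof=1) of the surrogate ensemble.
    return (mi_data - mi_surr.mean()) / mi_surr.std(ddof=1)
```
            </p>
            <p>Any other estimator of mutual information can be substituted for the binning function without changing the surrogate procedure itself.</p>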
            <p>In the following, we address the influence of the spline order and the number of bins on the estimation of mutual information. Based on 300 data points of an artificially-generated dataset drawn from the distribution shown in Figure <figr fid="F1">1</figr>, we calculate the mutual information for <it>M </it>= 6 bins and different spline orders <it>k </it>= 1... 5 (Figure <figr fid="F4">4</figr>, left).</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Mutual information calculated for a dataset of 300 data points drawn from the distribution shown in Figure 1 (crosses)</p>
               </caption>
               <text>
                  <p>Mutual information calculated for a dataset of 300 data points drawn from the distribution shown in Figure 1 (crosses). The number of bins was fixed to <it>M </it>= 6. The average mutual information for 300 shuffled realisations of the dataset is shown (circles) together with the standard deviation as error-bars. The largest value found within the ensemble of shuffled data is drawn as a dotted line (left). The significance was calculated from Eq. (13) (right).</p>
               </text>
               <graphic file="1471-2105-5-118-4"/>
            </fig>
            <p>From 300 shuffled realisations of this dataset, the mean and maximum mutual information are shown with the standard deviation as error-bars. For all spline orders the null hypothesis can be rejected, in accordance with the dataset shown in Figure <figr fid="F1">1</figr>. To estimate the strength of the rejection, we calculate the significance according to Eq. (13) (Figure <figr fid="F4">4</figr>, right). The largest change in the significance of the mutual information occurs in the transition from <it>k </it>= 1 (simple boxes) to <it>k </it>= 2, with a roughly two-fold increase. Using more sophisticated functions (<it>k </it>&#8805; 3) does not further improve the significance. Similar findings have been reported in the context of kernel density estimators <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. The major contribution to this increase in significance comes from the distribution of the surrogate data, which becomes narrower for <it>k </it>&gt; 1 and thus yields smaller standard deviations &#963;<sup>surr</sup>.</p>
            <p>The same dataset is used to show the dependency of mutual information on the number of bins for two spline orders <it>k </it>= 1 and <it>k </it>= 3 (Figure <figr fid="F5">5</figr>). Mutual information estimated from data as well as from surrogate data varies smoothly, without strong fluctuations, within the range of bins shown. From this we conclude that the choice of the number of bins does not notably affect the resulting mutual information as long as it is chosen within a reasonable range.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Based on the distribution of Figure 1, the mutual information for 300 data points and two spline orders <it>k </it>= 1 and <it>k </it>= 3 is shown as a function of the number of bins <it>M </it>(crosses) together with mean (circles) and standard deviations (error-bars) of 300 surrogates</p>
               </caption>
               <text>
                  <p>Based on the distribution of Figure 1, the mutual information for 300 data points and two spline orders <it>k </it>= 1 and <it>k </it>= 3 is shown as a function of the number of bins <it>M </it>(crosses) together with mean (circles) and standard deviations (error-bars) of 300 surrogates. The dotted lines indicate the largest mutual information found within the ensemble of surrogate data.</p>
               </text>
               <graphic file="1471-2105-5-118-5"/>
            </fig>
            <p>Again, the significance is calculated (Figure <figr fid="F6">6</figr>) and compared to the significances obtained from the KDE approach and the <graphic file="1471-2105-5-118-i7.gif"/> estimator. The significance of the mutual information calculated with B-spline functions is roughly two-fold higher than that obtained with the simple binning. The significance obtained from KDE does not depend on <it>M </it>and was found to be similar to the significance estimated from the B-spline approach. The numerically expensive integration of KDE, however, limits the size of utilisable datasets. The run time of KDE was <graphic file="1471-2105-5-118-i20.gif"/>(10<sup>4</sup>) times higher than that of the B-spline approach. Strategies to simplify the integration step have been proposed <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> but have to be used with caution, since they assume particular properties of the distribution of experimental data that are in general not fulfilled. The recently introduced entropy estimator <graphic file="1471-2105-5-118-i7.gif"/> produces significances between those of the binning and the B-spline approach for higher bin numbers. For low bin numbers, its significances are relatively poor.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>The significance, <it>S</it>, as a function of the number of bins, <it>M</it>, for the two examples of Figure 5, and for the entropy estimator <graphic file="1471-2105-5-118-i7.gif"/></p>
               </caption>
               <text>
                  <p>The significance, <it>S</it>, as a function of the number of bins, <it>M</it>, for the two examples of Figure 5, and for the entropy estimator <graphic file="1471-2105-5-118-i7.gif"/>. For kernel density estimators (KDE), the significance, which does not depend on <it>M</it>, is <it>S </it>= 92.</p>
               </text>
               <graphic file="1471-2105-5-118-6"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Application on data</p>
            </st>
            <p>We now turn to the analysis of experimentally measured gene expression data. As shown previously, the application of mutual information to large-scale expression data reveals biologically-relevant clusters of genes <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B30">30</abbr></abbrgrp>. In this section, we will not repeat these analyses, but determine if the correlations detected using mutual information are missed using the established linear measures.</p>
            <p>Among the most frequently used measures of similarity for clustering co-expressed genes are the Euclidean distance and the Pearson correlation coefficient <it>R </it><abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. If correlations are well described by the Pearson correlation and the distribution of data is approximately Gaussian, the relationship between the mutual information and the Pearson correlation given by <abbrgrp><abbr bid="B32">32</abbr></abbrgrp></p>
            <p>
               <graphic file="1471-2105-5-118-i21.gif"/>
            </p>
            <p>is expected to be fulfilled. Therefore, we calculated both the mutual information and the Pearson correlation for two large-scale gene expression datasets (Figure <figr fid="F7">7</figr>). For each pair of genes <it>X </it>and <it>Y </it>we plot the tuple (<it>MI</it>(<it>X,Y</it>), <it>R</it>(<it>X,Y</it>)). In order to address significance, we additionally calculate all tuples from shuffled data.</p>
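            <p>As a rough illustration of how such an expected curve can be evaluated, the Python sketch below assumes that Eq. (14) takes the familiar form for bivariate Gaussian data, <it>MI </it>= -1/2 log(1 - <it>R</it><sup>2</sup>), and that logarithms are taken to base 2, so that the result is in bits. Both points are assumptions of this sketch, and the function names are hypothetical.</p>
            <p>
```python
import numpy as np


def expected_mi_from_pearson(r, log_base=2.0):
    """Mutual information implied by a Pearson coefficient r for Gaussian data.

    Assumes MI = -1/2 * log(1 - r**2); log_base=2 gives bits, np.e gives nats.
    The value diverges as |r| approaches 1.
    """
    r = np.asarray(r, dtype=float)
    return -0.5 * np.log(1.0 - r ** 2) / np.log(log_base)


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equally long vectors."""
    return float(np.corrcoef(x, y)[0, 1])
```
            </p>
            <p>Gene pairs whose estimated mutual information lies well away from this curve are the candidates for outliers or non-linear associations discussed in the text.</p>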
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>The Pearson correlation coefficient and the mutual information for all pairwise comparisons of genes for two large-scale gene expression datasets are shown (black points) overlaid by the same measures obtained from shuffled data (blue points)</p>
               </caption>
               <text>
                  <p>The Pearson correlation coefficient and the mutual information for all pairwise comparisons of genes for two large-scale gene expression datasets are shown (black points) overlaid by the same measures obtained from shuffled data (blue points). The expected mutual information calculated from Eq. (14) is shown as a red curve. For the first dataset (left) genes containing undefined values were omitted, resulting in 5345 genes measured under 300 experimental conditions [31]. For the second dataset (right), containing 22608 genes measured under 102 experimental conditions [33], a representative fraction is shown.</p>
               </text>
               <graphic file="1471-2105-5-118-7"/>
            </fig>
            <p>The first dataset contains cDNA measurements for <it>S. cerevisiae </it>for up to <it>E</it><sub>1 </sub>= 300 experiments <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. To avoid numerical effects arising from different numbers of defined expression values (missing data points) for each gene, we exclusively utilised genes that are fully defined for all experimental conditions, resulting in <it>G</it><sub>1 </sub>= 5345 genes. This dataset has been analysed using mutual information before <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B32">32</abbr></abbrgrp>, on rank-ordered data. The rank-ordering led to homogeneously distributed data and thereby enabled the application of a simplified algorithm for the numerical estimation from kernel density estimators. Our B-spline approach allows us to extend this analysis to non rank-ordered data, thereby keeping the original distribution of the experimental data. In contrast to the previous studies, we find for non rank-ordered data that the theoretical prediction of Eq. (14) is no longer a lower bound for the comparison. Many tuples with high Pearson correlation but low mutual information can be detected, arising from outlying expression values (Figure <figr fid="F8">8A</figr>). However, pairs of genes with high mutual information and low Pearson correlation, which would indicate a non-linear correlation, are not observed. The only remarkable tuple (marked with an arrow in Figure <figr fid="F7">7</figr> and shown in Figure <figr fid="F8">8B</figr>) also arises from outlying values.</p>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Examples of gene-gene plots for genes <it>X </it>and <it>Y </it>are shown for characteristic tuples (<it>MI</it>(<it>X,Y</it>), <it>R</it>(<it>X,Y</it>)) detected in Figure 7</p>
               </caption>
               <text>
                  <p>Examples of gene-gene plots for genes <it>X </it>and <it>Y </it>are shown for characteristic tuples (<it>MI</it>(<it>X,Y</it>), <it>R</it>(<it>X,Y</it>)) detected in Figure 7. For the first gene expression dataset under consideration [31], no non-linear correlations are detected. Instead, tuples with high Pearson correlation and low mutual information, examples A and B, resulting from outlying values are detected. For the second dataset [33], however, tuples with low Pearson correlation and high mutual information are observed, see examples C and D. Such non-linear correlations are missed when only linear correlation measures are used.</p>
               </text>
               <graphic file="1471-2105-5-118-8"/>
            </fig>
            <p>The second dataset contains cDNA measurements for <it>E</it><sub>2 </sub>= 102 experiments on <it>G</it><sub>2 </sub>= 22608 genes derived from 20 different human tissues <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. In contrast to the first dataset, tuples with low Pearson correlation but high mutual information are indeed detected. For two tuples chosen as examples (Figure <figr fid="F8">8C</figr> and <figr fid="F8">8D</figr>), clusters of experimental conditions can be clearly detected by eye. Correlations of this type are missed by analyses based exclusively on linear measures, such as the analysis done in the original publication of this dataset.</p>
            <p>For both datasets, tuples calculated from shuffled data (Figure <figr fid="F7">7</figr>, blue data points) result in small values for both similarity measures. This indicates a high significance of the original associations. Peaks with high Pearson correlation in the first dataset arise from gene-gene associations with outlying values. Significance values for the example pairs of genes of the second dataset (Figure <figr fid="F8">8C</figr> and <figr fid="F8">8D</figr>) were explicitly calculated (Figure <figr fid="F9">9</figr>). On the basis of the mutual information, they show high significance values for the two examples of observed non-linear correlations. Compared to this, the significances calculated from the Pearson correlation are poor. In summary, our analysis confirms for the first dataset that the Pearson correlation does not miss any non-linear correlations. As a side effect we are able to detect gene-gene pairs containing outlying values. For the second dataset, however, a substantial number of non-linear correlations were detected. Example gene-gene pairs chosen from this fraction show a clustering of data points (experiments) with a high significance. Even though such patterns can be easily found by eye, computational methods need to be applied for the inspection of several hundred million comparisons.</p>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>Significance values for the two gene-gene comparisons shown in Figure 8, C and D (top and bottom, respectively) are calculated from 300 shuffled realisations based on the Pearson correlation coefficient (left) and the mutual information (right) as distance measures</p>
               </caption>
               <text>
                  <p>Significance values for the two gene-gene comparisons shown in Figure 8, C and D (top and bottom, respectively) are calculated from 300 shuffled realisations based on the Pearson correlation coefficient (left) and the mutual information (right) as distance measures.</p>
               </text>
               <graphic file="1471-2105-5-118-9"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion and conclusion</p>
         </st>
         <p>After a brief introduction to the information theoretic concept of mutual information, we proposed a method for its estimation from continuous data. Within our approach, we extend the bins of the classical algorithm to polynomial B-spline functions: data points are no longer assigned to exactly one bin but to several bins simultaneously, with weights given by the B-spline functions. By definition, the weighting coefficients for each data point automatically sum up to unity. Though our algorithm is reminiscent of kernel density estimators <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, it keeps the basic idea of associating data points with discrete bins. In this way, we are able to avoid the time-consuming numerical integration steps usually intrinsic to estimates of mutual information using kernel density estimators <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
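         <p>The weighting scheme summarised above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the C++ program used in this work. It assumes a clamped uniform knot vector with <it>M </it>+ <it>k </it>knots, a linear mapping of the data onto the spline domain, and the Cox-de Boor recursion for the basis functions. The function names <it>bspline_weights </it>and <it>bspline_mutual_information </it>are hypothetical.</p>
         <p>
```python
import numpy as np


def bspline_weights(x, n_bins=6, order=3):
    """Return an (N, n_bins) array of B-spline weights; each row sums to 1."""
    x = np.asarray(x, dtype=float)
    n_inner = n_bins - order + 1                 # length of the spline domain
    # Assumed clamped uniform knot vector: first and last knot repeated.
    knots = np.concatenate([
        np.zeros(order - 1),
        np.arange(n_inner + 1, dtype=float),
        np.full(order - 1, float(n_inner)),
    ])
    # Map the data linearly onto [0, n_inner); assumes x is not constant.
    z = (x - x.min()) / (x.max() - x.min()) * n_inner
    z = np.minimum(z, np.nextafter(float(n_inner), 0.0))
    n_knots = len(knots)
    # Order 1: indicator functions of the half-open knot intervals.
    basis = np.zeros((x.size, n_knots - 1))
    for i in range(n_knots - 1):
        basis[:, i] = np.logical_and(z >= knots[i], knots[i + 1] > z)
    # Cox-de Boor recursion; terms with a zero denominator are dropped.
    for d in range(2, order + 1):
        new_basis = np.zeros((x.size, n_knots - d))
        for i in range(n_knots - d):
            left = knots[i + d - 1] - knots[i]
            right = knots[i + d] - knots[i + 1]
            if left > 0:
                new_basis[:, i] += (z - knots[i]) / left * basis[:, i]
            if right > 0:
                new_basis[:, i] += (knots[i + d] - z) / right * basis[:, i + 1]
        basis = new_basis
    return basis


def bspline_mutual_information(x, y, n_bins=6, order=3):
    """MI in bits from weighted bin memberships instead of hard counts."""
    wx = bspline_weights(x, n_bins, order)
    wy = bspline_weights(y, n_bins, order)
    p_x = wx.mean(axis=0)                        # marginal bin probabilities
    p_y = wy.mean(axis=0)
    p_xy = wx.T @ wy / len(wx)                   # joint bin probabilities

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    return entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())
```
         </p>
         <p>With <it>order </it>= 1 each data point receives a single weight of one, so the sketch reduces to the simple binning. No numerical integration is involved for any order.</p>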
         <p>To show that our approach improves the simple binning method and to compare it to KDE and the recently reported estimator <graphic file="1471-2105-5-118-i7.gif"/>, we provided a systematic comparison between all these algorithms for artificially generated datasets, drawn from a known distribution. We found that mutual information, as well as its standard deviation, scales linearly with the inverse size of a dataset for the standard binning method, for the B-spline approach, and for <graphic file="1471-2105-5-118-i7.gif"/>. For the KDE approach we find an asymptotic behaviour with a linear tail for large datasets. Moreover, the discrimination of correlations from the hypothesis of statistical independence is significantly improved by extending the standard binning method to B-spline functions, as shown by a two-fold increase of the significance. Compared to KDE, the B-spline functions produce similar significances. However, due to the numerical expense of KDE, the application of that algorithm is limited to datasets of moderate size. The application of <graphic file="1471-2105-5-118-i7.gif"/> leads to significances between those of the standard binning and the B-spline approach for reasonable bin numbers.</p>
         <p>Linear correlation measures are among the most widely applied measures of similarity in the literature. Often, they are used on an ad-hoc basis and it is unclear whether a considerable number of non-linear correlations are missed. Here, we asked whether previous analyses, based on linear correlations, sufficiently described the correlations within gene expression datasets or whether mutual information detects additional correlations that are not detected by linear measures, such as the Pearson correlation. For data that is well described by the Pearson correlation, we can give the relation of the Pearson correlation to the mutual information explicitly <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Both measures were then applied to publicly available large-scale gene expression datasets <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B33">33</abbr></abbrgrp>. We aimed to verify whether non-linear correlations, visible as deviations from this relation, can be detected.</p>
         <p>Our findings show that the first dataset is fairly well described by the given relation of the Pearson correlation to the mutual information. No data points with high mutual information and low Pearson correlation are detected. Comparisons of genes containing outlying values, however, result in deviations with low mutual information and high Pearson correlation. From this, it follows that previous analyses of this dataset, based on Pearson correlation, did not miss any non-linear correlations. This is an important finding, since it is entirely conceivable that the regulation inherent in the genetic network under consideration shows more complex behaviour than the observed linear one. Even for one of the largest expression datasets at hand, insufficient data might complicate the detection of such complex patterns of regulation. Alternatively, the biological mechanisms which underlie the regulatory networks might not lead to non-linear correlations. It also has to be considered that the experimental methods applied for the generation of this dataset may make non-linear correlations difficult to detect. The second dataset, in contrast, reveals highly significant tuples with high mutual information and low Pearson correlation. Detailed gene-gene plots of such tuples show that the expression values of the contributing genes fall into groups of experimental conditions. We do not attempt to draw conclusions about the biological context of such clusters here, but they might reflect interesting situations worth analysing in detail.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>Most of the manuscript text was written by CD and edited by all authors. CD carried out the calculations and produced the figures. RS strongly contributed to the theoretical background of entropy and mutual information.</p>
         <p>The implementation of the C++ program was carried out by SK. JS and SK supervised this work. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank Joachim Kopka and Janko Weise for stimulating discussions and Megan McKenzie for editing the manuscript (all of the MPI-MP). RS acknowledges financial support by the HSP-N grant of the state of Brandenburg.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray</p>
            </title>
            <aug>
               <au>
                  <snm>Schena</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shalon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>467</fpage>
            <lpage>470</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7569999</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Serial Analysis of Gene Expression</p>
            </title>
            <aug>
               <au>
                  <snm>Velculescu</snm>
                  <fnm>VE</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Vogelstein</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kinzler</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>484</fpage>
            <lpage>487</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7570003</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Cluster analysis and display of genome-wide expression patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Eisen</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Spellman</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>PO</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>14863</fpage>
            <lpage>14868</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">24541</pubid>
                  <pubid idtype="pmpid" link="fulltext">9843981</pubid>
                  <pubid idtype="doi">10.1073/pnas.95.25.14863</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Information processing in cells and tissues</p>
            </title>
            <aug>
               <au>
                  <snm>D'haeseleer</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Wen</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Fuhrman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Somogyi</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Plenum Publishing</source>
            <pubdate>1997</pubdate>
            <fpage>203</fpage>
            <lpage>212</lpage>
            <url>http://www.cs.unm.edu/~patrik/networks/IPCAT/ipcat.html</url>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Genetic network inference: from co-expression clustering to reverse engineering</p>
            </title>
            <aug>
               <au>
                  <snm>D'haeseleer</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Liang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Somogyi</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>707</fpage>
            <lpage>726</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.8.707</pubid>
                  <pubid idtype="pmpid" link="fulltext">11099257</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Cluster analysis and data visualization of large-scale gene expression data</p>
            </title>
            <aug>
               <au>
                  <snm>Michaels</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Carr</snm>
                  <fnm>DB</fnm>
               </au>
               <au>
                  <snm>Askenazi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fuhrman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wen</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Somogyi</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>1998</pubdate>
            <fpage>42</fpage>
            <lpage>53</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9697170</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements</p>
            </title>
            <aug>
               <au>
                  <snm>Butte</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Kohane</snm>
                  <fnm>IS</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>2000</pubdate>
            <volume>5</volume>
            <fpage>427</fpage>
            <lpage>439</lpage>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Large-scale clustering of cDNA-fingerprinting data</p>
            </title>
            <aug>
               <au>
                  <snm>Herwig</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Poustka</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Muller</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bull</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lehrach</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>O'Brien</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>1093</fpage>
            <lpage>1105</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.9.11.1093</pubid>
                  <pubid idtype="pmpid" link="fulltext">10568749</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Covariation of mutations in the V3 loop of human immunodeficiency virus type 1 envelope protein: An information theoretic analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Korber</snm>
                  <fnm>BT</fnm>
               </au>
               <au>
                  <snm>Farber</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Wolpert</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Lapedes</snm>
                  <fnm>AS</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1993</pubdate>
            <volume>90</volume>
            <fpage>7176</fpage>
            <lpage>7180</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">47099</pubid>
                  <pubid idtype="pmpid" link="fulltext">8346232</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Display the information contents of structural RNA alignments: the structure logos</p>
            </title>
            <aug>
               <au>
                  <snm>Gorodkin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Heyer</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1997</pubdate>
            <volume>13</volume>
            <fpage>583</fpage>
            <lpage>586</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9475985</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Reveal, a general reverse engineering algorithm for inference of genetic network architectures</p>
            </title>
            <aug>
               <au>
                  <snm>Liang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fuhrman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Somogyi</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>1998</pubdate>
            <fpage>18</fpage>
            <lpage>29</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9697168</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Independent component analysis: Principles and Practice</p>
            </title>
            <aug>
               <au>
                  <snm>Roberts</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Everson</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <publisher>Cambridge: Cambridge University Press</publisher>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Independent component analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Hyv&#228;rinen</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Karhunen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Oja</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <publisher>New York: Wiley</publisher>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Independent coordinates for strange attractors from mutual information</p>
            </title>
            <aug>
               <au>
                  <snm>Fraser</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Swinney</snm>
                  <fnm>HL</fnm>
               </au>
            </aug>
            <source>Phys Rev A</source>
            <pubdate>1986</pubdate>
            <volume>33</volume>
            <fpage>1134</fpage>
            <lpage>1140</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1103/PhysRevA.33.1134</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Optimization of mutual information for multiresolution image registration</p>
            </title>
            <aug>
               <au>
                  <snm>Th&#233;venaz</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Unser</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>IEEE Trans Image Processing</source>
            <pubdate>2000</pubdate>
            <volume>9</volume>
            <fpage>2083</fpage>
            <lpage>2099</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/83.887976</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Using mutual information to design feature combinations</p>
            </title>
            <aug>
               <au>
                  <snm>Ellis</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Bilmes</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>In Proceedings of the International Conference on Spoken Language Processing: Beijing</source>
            <url>http://www.icsi.berkeley.edu/ftp/global/pub/speech/papers/icslp00-cmi.pdf</url>
            <note>16&#8211;20 October 2000</note>
         </bibl>
         <bibl id="B17">
            <title>
               <p>A mathematical theory of communication</p>
            </title>
            <aug>
               <au>
                  <snm>Shannon</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>The Bell System Technical Journal</source>
            <pubdate>1948</pubdate>
            <volume>27</volume>
            <fpage>623</fpage>
            <lpage>656</lpage>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Estimation of mutual information using kernel density estimators</p>
            </title>
            <aug>
               <au>
                  <snm>Moon</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Rajagopalan</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lall</snm>
                  <fnm>U</fnm>
               </au>
            </aug>
            <source>Phys Rev E</source>
            <pubdate>1995</pubdate>
            <volume>52</volume>
            <fpage>2318</fpage>
            <lpage>2321</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1103/PhysRevE.52.2318</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Density estimation for statistics and data analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Silverman</snm>
                  <fnm>BW</fnm>
               </au>
            </aug>
            <publisher>London: Chapman and Hall</publisher>
            <pubdate>1986</pubdate>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The mutual information: detecting and evaluating dependencies between variables</p>
            </title>
            <aug>
               <au>
                  <snm>Steuer</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kurths</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Daub</snm>
                  <fnm>CO</fnm>
               </au>
               <au>
                  <snm>Weise</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Selbig</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>Suppl.2</issue>
            <fpage>S231</fpage>
            <lpage>S240</lpage>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Estimation of Entropy and Mutual Information</p>
            </title>
            <aug>
               <au>
                  <snm>Paninski</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Neural Computation</source>
            <pubdate>2003</pubdate>
            <volume>15</volume>
            <fpage>1191</fpage>
            <lpage>1253</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1162/089976603321780272</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>A practical guide to splines</p>
            </title>
            <aug>
               <au>
                  <snm>de Boor</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <publisher>New York: Springer</publisher>
            <pubdate>1978</pubdate>
         </bibl>
         <bibl id="B23">
            <title>
               <p>B-spline signal processing: Part 1 &#8211; Theory</p>
            </title>
            <aug>
               <au>
                  <snm>Unser</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Aldroubi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Eden</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>IEEE Trans Signal Processing</source>
            <pubdate>1993</pubdate>
            <volume>41</volume>
            <fpage>821</fpage>
            <lpage>832</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/78.193220</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>B-spline signal processing: Part 2 &#8211; Efficient design and applications</p>
            </title>
            <aug>
               <au>
                  <snm>Unser</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Aldroubi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Eden</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>IEEE Trans Signal Processing</source>
            <pubdate>1993</pubdate>
            <volume>41</volume>
            <fpage>834</fpage>
            <lpage>848</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/78.193221</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Finite sample effects in sequence analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Herzel</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Schmidt</snm>
                  <fnm>AO</fnm>
               </au>
               <au>
                  <snm>Ebeling</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Chaos, Solitons &amp; Fractals</source>
            <pubdate>1994</pubdate>
            <volume>4</volume>
            <fpage>97</fpage>
            <lpage>113</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0960-0779(94)90020-5</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Measuring correlations in symbol sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Herzel</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Grosse</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Physica A</source>
            <pubdate>1995</pubdate>
            <volume>216</volume>
            <fpage>518</fpage>
            <lpage>542</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/0378-4371(95)00104-F</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Estimating entropies from finite samples</p>
            </title>
            <aug>
               <au>
                  <snm>Grosse</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>In Dynamik, Evolution, Strukturen</source>
            <publisher>Berlin: Dr. K&#246;ster</publisher>
            <editor>Freund JA</editor>
            <pubdate>1996</pubdate>
            <fpage>181</fpage>
            <lpage>190</lpage>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Estimating the error on measured entropy and mutual information</p>
            </title>
            <aug>
               <au>
                  <snm>Roulston</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>Physica D</source>
            <pubdate>1999</pubdate>
            <volume>125</volume>
            <fpage>285</fpage>
            <lpage>294</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0167-2789(98)00269-3</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Correlations in DNA sequences: The role of protein coding segments</p>
            </title>
            <aug>
               <au>
                  <snm>Herzel</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Grosse</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Phys Rev E</source>
            <pubdate>1997</pubdate>
            <volume>55</volume>
            <fpage>800</fpage>
            <lpage>810</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1103/PhysRevE.55.800</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Mutual Information Analysis as a Tool to Assess the Role of Aneuploidy in the Generation of Cancer-Associated Differential Gene Expression Patterns</p>
            </title>
            <aug>
               <au>
                  <snm>Klus</snm>
                  <fnm>GT</fnm>
               </au>
               <au>
                  <snm>Song</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schick</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wahde</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Szallasi</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Pac Symp Biocomput</source>
            <pubdate>2001</pubdate>
            <fpage>42</fpage>
            <lpage>51</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11262960</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Functional Discovery via a Compendium of Expression Profiles</p>
            </title>
            <aug>
               <au>
                  <snm>Hughes</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Marton</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Roberts</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Stoughton</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Armour</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Bennett</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Coffey</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dai</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>Kidd</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Slade</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lum</snm>
                  <fnm>PY</fnm>
               </au>
               <au>
                  <snm>Stepaniants</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>DD</fnm>
               </au>
               <au>
                  <snm>Gachotte</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Chakraburtty</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bard</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Friend</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2000</pubdate>
            <volume>102</volume>
            <fpage>109</fpage>
            <lpage>126</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(00)00015-5</pubid>
                  <pubid idtype="pmpid">10929718</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Measuring distances between variables by mutual information</p>
            </title>
            <aug>
               <au>
                  <snm>Steuer</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Daub</snm>
                  <fnm>CO</fnm>
               </au>
               <au>
                  <snm>Selbig</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kurths</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>In Proceedings of the 27th Annual Conference of the Gesellschaft f&#252;r Klassifikation: Cottbus</source>
            <inpress/>
            <note>12&#8211;14 March 2003</note>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Microarray standard data set and figures of merit for comparing data processing methods and experiment design</p>
            </title>
            <aug>
               <au>
                  <snm>He</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>Dai</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Schadt</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Cavet</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>SW</fnm>
               </au>
               <au>
                  <snm>Stepaniants</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Duenwald</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kleinhanz</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>DD</fnm>
               </au>
               <au>
                  <snm>Stoughton</snm>
                  <fnm>RB</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>956</fpage>
            <lpage>965</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg126</pubid>
                  <pubid idtype="pmpid" link="fulltext">12761058</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
