<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-5-201</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>A novel Mixture Model Method for identification of differentially expressed genes from DNA microarray data</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Najarian</snm>
               <fnm>Kayvan</fnm>
               <insr iid="I1"/>
               <email>knajaria@uncc.edu</email>
            </au>
            <au id="A2">
               <snm>Zaheri</snm>
               <fnm>Maryam</fnm>
               <insr iid="I1"/>
               <email>mzaheri@uncc.edu</email>
            </au>
            <au id="A3">
               <snm>A Rad</snm>
               <fnm>Ali</fnm>
               <insr iid="I2"/>
               <email>ali@itsi.ws</email>
            </au>
            <au id="A4">
               <snm>Najarian</snm>
               <fnm>Siamak</fnm>
               <insr iid="I3"/>
               <email>snajaria@me.concordia.ca</email>
            </au>
            <au id="A5">
               <snm>Dargahi</snm>
               <fnm>Javad</fnm>
               <insr iid="I3"/>
               <email>jdargahi@alcor.concordia.ca</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Computer Science Department, University of North Carolina Charlotte, University City Blvd, Charlotte, NC, USA</p>
            </ins>
            <ins id="I2">
               <p>Computer Engineering and IT Department, Amirkabir University of Technology, Tehran, Iran</p>
            </ins>
            <ins id="I3">
               <p>Mechanical and Industrial Engineering Department, Concordia University, CONCAVE Research Centre, CR-200, Concordia University, Quebec, Canada</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2004</pubdate>
         <volume>5</volume>
         <issue>1</issue>
         <fpage>201</fpage>
         <url>http://www.biomedcentral.com/1471-2105/5/201</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15603585</pubid>
               <pubid idtype="doi">10.1186/1471-2105-5-201</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>24</day>
               <month>3</month>
               <year>2004</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>16</day>
               <month>12</month>
               <year>2004</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>16</day>
               <month>12</month>
               <year>2004</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2004</year>
         <collab>Najarian et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The main goal in analyzing microarray data is to determine the genes that are differentially expressed across two types of tissue samples or samples obtained under two experimental conditions. The mixture model method (MMM hereafter) is a nonparametric statistical method often used for microarray processing applications, but it is known to over-fit the data if the number of replicates is small. In addition, the results of the MMM may not be repeatable when dealing with a small number of replicates. In this paper, we propose a new version of the MMM that ensures the repeatability of the results across different runs and reduces the sensitivity of the results to the parameters.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The proposed technique is applied to two different data sets: the Leukaemia data set and a data set that examines the effects of a low phosphate diet on regular and <it>Hyp </it>mice. In each study, the proposed algorithm successfully selects genes closely related to the disease state, as verified by biological information.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The results indicate 100% repeatability in all runs, and exhibit very little sensitivity to the choice of parameters. In addition, the evaluation of the applied method on the Leukaemia data set shows a 12% improvement over the MMM in detecting the 50 biologically identified expressed genes reported by Thomas et al. The results attest to the successful performance of the proposed algorithm in quantitative pathogenesis of diseases and comparative evaluation of treatment methods.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Recently, microarray technology has provided the means for simultaneous screening and analysis of thousands of genes. Although an enormous volume of data is being produced by microarray technologies, the full potential of such technologies cannot be realized without the ability to sift through the noisy signals to obtain useful information. The large data sets produced by microarray technology have resulted in the need for reliable, accurate, and robust methods for microarray data analysis. A major challenge is to detect genes with differential expression profiles across two experimental conditions. In many studies, the two sample sets are drawn from two types of tissues, tumours or cell lines, or at two time points during the course of a biological process.</p>
         <p>The computationally simple methods used for such analysis, including the methods of identifying genes with fold changes (such as the popular Log-ratio graphs) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, are known to be unreliable because they do not properly address the statistical variability of the data. While various parametric methods and tests such as the two-sample t-test <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> have been applied to microarray data analysis, the strong parametric assumptions made in these methods, as well as their strong dependency on large sample sets, restrict the reliability of such techniques in microarray problems. Nonparametric statistical methods, including the Empirical Bayes (EB) method <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, significance analysis specialized for microarray data (such as SAM <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>) and the mixture model method (MMM) <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, have been specialized and applied to microarray data analysis. It is claimed and argued that the new extensions of the MMM are among the best methods producing biologically-meaningful results <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. In this paper, without ignoring the potential applicability of other nonparametric methods in microarray processing applications, and due to the claims made in <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, we have restricted the comparison of our method to the MMM-based methods.</p>
         <p>The major disadvantages of the above-mentioned methods, especially the MMM, include the lack of repeatability of the results under different runs of the algorithm, and the sensitivity of the algorithm to parameter initialization. A reliable microarray analysis method must be reproducible and applicable to different data sets under different experimental conditions. More specifically, an accurate microarray processing method is expected to pinpoint, with a relatively high degree of accuracy and robustness, genes with elevated expression levels that are related to the experimental condition in all runs. The main objective of this paper is to design and test an extension of the MMM whose results are reproducible, more biologically meaningful, and significantly less sensitive to the model parameters.</p>
         <p>The paper is organized as follows. In the Algorithms section, a review of the MMM and its recent extension, Mod2MMM, is given together with a detailed description of the proposed method. In the Results and Discussion section, the K5M algorithm is first applied to the well-studied Leukaemia data set <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> that is often treated as a benchmark problem to compare different algorithms with each other. Once the desirable performance of the proposed algorithm is verified against the Leukaemia data set, the algorithm is applied to a new data set [<abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp> and <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>] that deals with the pathogenesis of Hypophosphatemia, which is a common X-linked metabolic bone disorder in humans and mice. Finally, the Conclusion section closes the paper.</p>
      </sec>
      <sec>
         <st>
            <p>Algorithms</p>
         </st>
         <sec>
            <st>
               <p>MMM &amp; its recent extensions</p>
            </st>
            <p>We start this section with a brief review of the existing MMM based techniques. Consider <it>Y</it><sub><it>ij </it></sub>as the expression level of gene <it>i </it>in array <it>j </it>(<it>i </it>= 1, ..., <it>n</it>; <it>j </it>= 1, ..., <it>j</it><sub>1</sub>, <it>j</it><sub>1 </sub>+ 1, ..., <it>j</it><sub>1 </sub>+ <it>j</it><sub>2</sub>), where the first <it>j</it><sub>1 </sub>and last <it>j</it><sub>2 </sub>arrays are obtained under two conditions. A general statistical model for the resulting data is:</p>
            <p><it>Y</it><sub><it>ij </it></sub>= <it>a</it><sub><it>i </it></sub>+ <it>b</it><sub><it>i</it></sub><it>x</it><sub><it>j </it></sub>+ <it>&#949;</it><sub><it>ij </it></sub>&#160;&#160;&#160; (1)</p>
            <p>where <it>x</it><sub><it>j </it></sub>= 1 for 1 &#8804; <it>j </it>&#8804; <it>j</it><sub>1 </sub>and <it>x</it><sub><it>j </it></sub>= 0 for <it>j</it><sub>1 </sub>+ 1 &#8804; <it>j </it>&#8804; <it>j</it><sub>1 </sub>+ <it>j</it><sub>2</sub>. In addition, <it>&#949;</it><sub><it>ij </it></sub>is a random error with mean 0. From the above formulation, it can be seen that <it>a</it><sub><it>i </it></sub>+ <it>b</it><sub><it>i </it></sub>is the mean expression level of gene <it>i </it>in the first condition, and <it>a</it><sub><it>i </it></sub>is the mean expression level of gene <it>i </it>in the second condition. The method requires that both <it>j</it><sub>1 </sub>and <it>j</it><sub>2</sub>, the numbers of arrays under each experimental condition, be even.</p>
            <p>The t-test statistic type scores (2) and (3) are calculated on the pre-processed data. Here, <it>a</it><sub><it>i </it></sub>is a random permutation of a column vector that contains <it>j</it><sub>1</sub>/2 1's and <it>j</it><sub>1</sub>/2 -1's and <it>b</it><sub><it>i </it></sub>contains <it>j</it><sub>2</sub>/2 1's and <it>j</it><sub>2</sub>/2 -1's.</p>
            <p>
               <graphic file="1471-2105-5-201-i1.gif"/>
            </p>
            <p>
               <graphic file="1471-2105-5-201-i2.gif"/>
            </p>
            <p>Since the data are not assumed to be normally distributed, the distribution functions <it>f</it><sub>0 </sub>and <it>f </it>are estimated as in (4) and (5), respectively. Both the null distribution <it>f</it><sub>0 </sub>and the distribution <it>f </it>are estimated directly in a nonparametric model for gene expression data.</p>
            <p>
               <graphic file="1471-2105-5-201-i3.gif"/>
            </p>
            <p>
               <graphic file="1471-2105-5-201-i4.gif"/>
            </p>
            <p>where <it>&#966;</it>(<it>z</it>; <it>&#956;</it><sub><it>i</it></sub>, <it>V</it><sub><it>i</it></sub>) symbolizes the normal density function with mean <it>&#956;</it><sub><it>i </it></sub>and variance <it>V</it><sub><it>i</it></sub>, and the mixing proportions <it>&#960;</it><sub><it>i </it></sub>define the linear combination of the normal basis functions. We use &#934;<sub><it>g</it>0 </sub>to represent all unknown parameters {(<it>&#960;</it><sub><it>i</it></sub>, <it>&#956;</it><sub><it>i</it></sub>, <it>V</it><sub><it>i</it></sub>): <it>i </it>= 1, ..., <it>g</it><sub>0 </sub>} in a <it>g</it><sub>0</sub>-component mixture model. For a given number of normal basis functions <it>g</it><sub>0</sub>, these parameters can be estimated by the EM algorithm, which maximizes the log-likelihood function of (6) to obtain the maximum likelihood estimate <graphic file="1471-2105-5-201-i5.gif"/>.</p>
            <p>
               <graphic file="1471-2105-5-201-i6.gif"/>
            </p>
            <p>Within <it>K </it>iterations, the EM algorithm is expected to find a local maximum for all unknown parameters. It is recommended to run the EM algorithm several times with various random starting parameters and choose the final estimate as the one resulting in the largest log-likelihood <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. As mentioned above, using random starting points makes the results of the MMM unstable and prevents reproducibility of the results. More specifically, in each run the MMM algorithm may give a different number of expressed genes, which is not desirable in biological studies. This issue will be addressed in our proposed method.</p>
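            <p>The restart strategy described above can be sketched as follows. This is a minimal illustration in plain Python and not the authors' implementation: the function names em_fit and best_of_restarts, the choice of starting means drawn at random from the data, the starting variances, the variance floor and the synthetic data are all assumptions made for the example.</p>

```python
import math
import random

def normal_pdf(z, mu, var):
    """Normal density phi(z; mu, var) with mean mu and variance var."""
    return math.exp(-(z - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under a finite normal mixture."""
    total = 0.0
    for z in data:
        dens = sum(w * normal_pdf(z, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(dens, 1e-300))  # guard against log(0)
    return total

def em_fit(data, g, rng, iters=100):
    """Full EM for a g-component normal mixture from a random start (assumed scheme)."""
    n = len(data)
    means = [rng.choice(data) for _ in range(g)]          # random starting means
    variances = [((max(data) - min(data)) / g) ** 2] * g  # broad starting variances
    weights = [1.0 / g] * g
    for _ in range(iters):
        # E-step: responsibility of each component for each observation
        resp = []
        for z in data:
            row = [w * normal_pdf(z, m, v) for w, m, v in zip(weights, means, variances)]
            total = max(sum(row), 1e-300)
            resp.append([r / total for r in row])
        # M-step: update mixing proportions, means and variances
        for c in range(g):
            mass = max(sum(row[c] for row in resp), 1e-12)
            weights[c] = mass / n
            means[c] = sum(row[c] * z for row, z in zip(resp, data)) / mass
            var = sum(row[c] * (z - means[c]) ** 2 for row, z in zip(resp, data)) / mass
            variances[c] = max(var, 1e-6)  # floor avoids degenerate components
    return log_likelihood(data, weights, means, variances), (weights, means, variances)

def best_of_restarts(data, g, seeds):
    """Run EM once per seed and keep the fit with the largest log-likelihood."""
    fits = [em_fit(data, g, random.Random(seed)) for seed in seeds]
    return max(fits, key=lambda fit: fit[0])

# Synthetic scores, for illustration only
demo_rng = random.Random(0)
demo_data = [demo_rng.gauss(0.0, 1.0) for _ in range(300)]
demo_data += [demo_rng.gauss(5.0, 1.0) for _ in range(100)]
best_loglik, best_params = best_of_restarts(demo_data, 2, range(5))
print(best_loglik, best_params)
```

            <p>Because every start is drawn at random, different seeds may converge to different local maxima, which is the source of the run-to-run variation discussed above.</p>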
            <p>After finding the optimized <graphic file="1471-2105-5-201-i5.gif"/> for different <it>g</it><sub>0 </sub>'s, the algorithm selects the sub-optimal <it>g</it><sub>0 </sub>corresponding to the first local minimum of BIC or AIC <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
            <p>
               <graphic file="1471-2105-5-201-i7.gif"/>
            </p>
            <p>
               <graphic file="1471-2105-5-201-i8.gif"/>
            </p>
            <p>where <it>v</it><sub><it>g</it>0 </sub>is the number of independent parameters in &#934;<sub><it>g</it>0</sub>. Then, the algorithm uses the resulting <it>g</it><sub>0 </sub>as the number of normal functions to fit <it>f</it><sub>0</sub>. The same procedure is performed to estimate the sub-optimal number of normal functions to estimate <it>f</it>. As mentioned above, with the fixed number of normal functions, the parameters of functions <it>f </it>and <it>f</it><sub>0 </sub>are iteratively updated for a number of iterations. When the iterations are terminated, the likelihood ratio is estimated based on the final estimations of <it>f</it><sub>0 </sub>and <it>f</it>:</p>
            <p><it>LR</it>(<it>Z</it>) = <it>f</it><sub>0</sub>(<it>Z</it>) / <it>f</it>(<it>Z</it>) &#160;&#160;&#160; (9)</p>
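            <p>As a small illustration of (9), the sketch below evaluates two fitted normal mixtures and their ratio at a given score. The parameter values and the function names are hypothetical and chosen only for the example; they are not estimates from the paper.</p>

```python
import math

def normal_pdf(z, mu, var):
    """Normal density phi(z; mu, var) with mean mu and variance var."""
    return math.exp(-(z - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(z, weights, means, variances):
    """Finite normal mixture: sum over i of pi_i * phi(z; mu_i, V_i)."""
    return sum(w * normal_pdf(z, m, v) for w, m, v in zip(weights, means, variances))

def likelihood_ratio(z, null_params, alt_params):
    """LR(z) = f0(z) / f(z); small values point to altered expression."""
    return mixture_pdf(z, *null_params) / mixture_pdf(z, *alt_params)

# Hypothetical fitted parameters as (weights, means, variances)
f0_params = ([1.0], [0.0], [1.0])
f_params = ([0.8, 0.2], [0.0, 3.0], [1.0, 1.0])

for score in (0.0, 3.0):
    print(score, likelihood_ratio(score, f0_params, f_params))
```

            <p>In this toy setting the ratio is much smaller at a score of 3 than at a score of 0, because the second component of <it>f </it>places mass where <it>f</it><sub>0 </sub>has little.</p>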
            <p>A bisection method <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> with a Bonferroni adjustment is used to determine the cut-off points <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> for decision-making. This means that for a threshold value <it>s</it>, if <it>LR</it>(<it>Z</it>) &lt;<it>s</it>, then the gene is identified to have significantly altered expression in two experiments. It is possible to determine the rejection region numerically, i.e. for any false positive rate <it>&#945;</it>, the threshold value <it>s </it>= <it>s</it>(<it>&#945;</it>) can be calculated from the following integral:</p>
            <p>
               <graphic file="1471-2105-5-201-i9.gif"/>
            </p>
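            <p>The numerical search for the cut-off can be sketched as below. The sketch assumes that the integral above is the probability, under <it>f</it><sub>0</sub>, of the region where the likelihood ratio falls below <it>s</it>, and it approximates that probability by a Riemann sum on a grid. The function names, the grid and the mixture parameters are hypothetical, and no multiplicity adjustment is applied inside the search; the false positive rate is simply passed in.</p>

```python
import math

def normal_pdf(z, mu, var):
    """Normal density phi(z; mu, var) with mean mu and variance var."""
    return math.exp(-(z - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(z, params):
    """Finite normal mixture given params = (weights, means, variances)."""
    weights, means, variances = params
    return sum(w * normal_pdf(z, m, v) for w, m, v in zip(weights, means, variances))

def null_mass_below(s, f0_params, f_params, grid):
    """Riemann-sum estimate of the f0-probability of the region where LR(z) is below s."""
    step = grid[1] - grid[0]
    total = 0.0
    for z in grid:
        f0 = mixture_pdf(z, f0_params)
        if s * mixture_pdf(z, f_params) > f0:  # same as f0 / f being below s
            total += f0 * step
    return total

def threshold_by_bisection(alpha, f0_params, f_params, grid, iters=60):
    """Largest s found by bisection whose rejection region has null mass at most alpha."""
    lo, hi = 0.0, 1.0
    while alpha > null_mass_below(hi, f0_params, f_params, grid):
        hi *= 2.0  # widen the bracket until the region is large enough
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if null_mass_below(mid, f0_params, f_params, grid) > alpha:
            hi = mid
        else:
            lo = mid
    return lo

# Hypothetical fitted mixtures and an evenly spaced grid over the scores
f0_params = ([1.0], [0.0], [1.0])
f_params = ([0.8, 0.2], [0.0, 3.0], [1.0, 1.0])
grid = [-10.0 + 0.005 * k for k in range(4001)]

s = threshold_by_bisection(0.01, f0_params, f_params, grid)
print(s, null_mass_below(s, f0_params, f_params, grid))
```

            <p>The bisection relies on the null mass of the rejection region being non-decreasing in <it>s</it>. Very small false positive rates would need a finer grid or a more accurate quadrature than the one used here.</p>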
            <p>In the microarray processing literature, <it>&#945; </it>= 0.01 is often used as the genome-wide significance level, so the gene-specific significance level is <it>&#945;</it>* = <it>&#945;</it>/(2<it>n</it>).</p>
            <p>Recently a new modification of the MMM algorithm, Mod2MMM hereafter, was introduced <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. This method points out a problem in constructing the test and null statistics and indicates that the true distribution of <it>z </it>may be different from the null distribution of <it>Z</it>, which can lead to invalid inference. The modified algorithm starts with the assumption that <it>j</it><sub>1 </sub>&#8805; 2 <it>j</it><sub>2 </sub><abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, and constructs the new <it>z </it>and <it>Z </it>as described in Appendix 1.</p>
            <p>For the cases where <it>j</it><sub>1 </sub>&#8805; <it>j</it><sub>2 </sub>but <it>j</it><sub>1 </sub>&lt; 2 <it>j</it><sub>2</sub>, <it>j</it><sub>1 </sub>observations drawn under condition one are split into two equally-sized parts to calculate <graphic file="1471-2105-5-201-i10.gif"/>, <it>v</it><sub><it>i</it>(1<it>a</it>) </sub>and <graphic file="1471-2105-5-201-i11.gif"/>, <it>v</it><sub><it>i</it>(1<it>b</it>) </sub>respectively. To calculate <graphic file="1471-2105-5-201-i12.gif"/> and <it>v</it><sub><it>i</it>(2) </sub>about <it>j</it><sub>1</sub>/2 observations are drawn under condition two. While this modification can address the differences in the distributions of <it>f </it>and <it>f</it><sub>0</sub>, the stability of the parameter estimation step still remains a major problem.</p>
            <p>The main difference between the conventional MMM and its recent extensions is that the conventional MMM disregards the fact that the true distribution of <it>z </it>(the statistical variable under study) may be different from the null distribution of the statistic <it>Z </it>(as defined below). Disregarding this difference can potentially lead to invalid inference. The modified version of the MMM (Mod2MMM), introduced in <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, assumes that the denominator and the numerator of one of the t-statistic-type scores, <it>z</it><sub><it>i</it></sub>, may not be independent. This method addresses the issue by constructing new <it>z</it><sub><it>i </it></sub>and <it>Z</it><sub><it>i </it></sub>variables as will be discussed later.</p>
            <p>A concern over all existing MMM based methods (including Mod2MMM) that greatly affects the results is associated with the way mixed distributions are estimated. In the MMM, the Expectation Maximization (EM) algorithm <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> is often used to optimize the parameters of the fitted mixture distribution functions of two t-statistic-type scores related to gene expression levels. Starting the EM algorithm with random values as the parameters of the normal basis functions makes the results depend highly on the exact initialization, and introduces variation in the results. On the other hand, if all parameters of the normal functions in the mixture model distributions are set without iterative optimization, the set values may never result in an accurate model of the data set at hand. We propose a modified version of the MMM to address this problem. Our modified MMM (K5M hereafter) combines K-means clustering and the EM estimation: the EM iteratively optimizes most of the parameters, while K-means optimizes the other, more sensitive parameters to ensure complete reproducibility of the algorithm. The experimental results indicate superior robustness of the proposed algorithm compared to the conventional MMM and other recently introduced extensions of the MMM <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Proposed method (K5M)</p>
            </st>
            <p>In order to address the stability and reproducibility of the MMM, we propose a new modified approach for the MMM that estimates the distribution function of <it>z </it>by using mixture of normal distributions in a stable and reliable way. The following observations made in the experimental study of the MMM for gene expression analysis were the main motivations for the proposed changes to the MMM:</p>
            <p>1 The observed variations in the parameter estimation process in some versions of the MMM can be attributed to the algorithm's attempt to iteratively update the means and variances of the normal distributions using often noisy data. In experimental studies, often the direct observation of the data reveals specific points where centers (means) can be positioned and the scattering patterns that can give reliable estimates on the variance of each cluster. However, the iterative updating of model parameters with noisy data and based on some random starting points often misses the true optimal points and even creates variations and fluctuations in parameter estimation in many runs.</p>
            <p>2 Even when variations do not occur, two runs of the algorithm can result in significantly different estimations of <it>f </it>and <it>f</it><sub>0</sub>. This in turn results in different lists of differentially expressed genes in different runs. More specifically, two typical runs of the algorithm on the same data set can result in two lists that differ greatly both in the number of genes and in the exact genes picked up by the algorithm. In our study of the conventional MMM and Mod2MMM, two runs of the same algorithm (on the same data) resulted in lists whose sizes varied between 50 and 200.</p>
            <p>3 The literature of other areas of research utilizing normal basis functions for estimation, including neural networks, indicates that in order to have more robustness in different runs and to obtain reproducible results, the means and variances of the basis functions must be estimated and fixed during the iteration on the coefficients <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. This is due to the fact that updating means and variances makes the estimation process a nonlinear one that is highly sensitive and very likely to become unstable. However, when updating the values of coefficients only, the problem is reduced to a reliable linear estimation that is much more robust and stable.</p>
            <p>4 Based on the observations mentioned above, in our proposed method, finding the distribution of <it>z </it>is regarded partially as a clustering problem, i.e. the means and variances of the normal distributions are estimated as the prototypes of a clustering step. Specifically, if <it>z </it>is distributed in a one-dimensional space, wherever there is a mass of <it>z</it>, there is a cluster with mean <it>&#956;</it><sub><it>i </it></sub>and variance <it>V</it><sub><it>i</it></sub>, which are identified by the members of that cluster.</p>
            <p>Hence, applying a clustering method makes it possible to estimate the means and variances of each normal distribution. The key is to use a simple clustering technique such as K-means to estimate the mixture distributions <it>f</it><sub>0 </sub>and <it>f </it>based on <it>K </it>normal distributions. While the algorithm can use K-means to find the optimal values of means and variances, the coefficients <it>&#960;</it><sub><it>i </it></sub>'s need to be optimized using an optimization process such as the EM.</p>
            <p>Based on the above discussion, the proposed algorithm can be described in the following three steps:</p>
            <p><b>Step 1</b>: Using BIC, find the sub-optimal number of normal distributions for both <it>f </it>and <it>f</it><sub>0 </sub>(as described above). These optimal numbers are then used as the number of clusters in K-means technique.</p>
            <p><b>Step 2</b>: Using K-means clustering technique, for both <it>f </it>and <it>f</it><sub>0 </sub>find the best mean <it>&#956;</it><sub><it>i </it></sub>and variance <it>V</it><sub><it>i </it></sub>for all clusters.</p>
            <p><b>Step 3</b>: With the obtained values of <it>&#956;</it><sub><it>i</it></sub>, <it>V</it><sub><it>i </it></sub>and using the EM algorithm, iteratively update the values of the optimized <it>&#960;</it><sub><it>i </it></sub>for all clusters (both <it>f </it>and <it>f</it><sub>0</sub>), i.e.</p>
            <p>
               <graphic file="1471-2105-5-201-i13.gif"/>
            </p>
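            <p>Steps 2 and 3 can be sketched as follows, taking the number of clusters <it>K </it>from Step 1 as given. This is a minimal sketch under stated assumptions rather than the authors' code: the function names (kmeans_1d, em_mixing_proportions, k5m_fit), the quantile-based starting centers used to keep K-means deterministic, the variance floor and the synthetic data are all hypothetical. The update of the mixing proportions is the standard EM update for a mixture whose means and variances are held fixed.</p>

```python
import math
import random

def normal_pdf(z, mu, var):
    """Normal density phi(z; mu, var) with mean mu and variance var."""
    return math.exp(-(z - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def kmeans_1d(data, k, iters=100):
    """Step 2: 1-D K-means with deterministic quantile-based starts (an assumption)."""
    ordered = sorted(data)
    n = len(ordered)
    centers = [ordered[(2 * c + 1) * n // (2 * k)] for c in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for z in data:
            nearest = min(range(k), key=lambda c: abs(z - centers[c]))
            clusters[nearest].append(z)
        updated = [sum(m) / len(m) if m else centers[i] for i, m in enumerate(clusters)]
        if updated == centers:
            break  # assignments no longer change
        centers = updated
    variances = []
    for center, members in zip(centers, clusters):
        var = sum((z - center) ** 2 for z in members) / max(len(members), 1)
        variances.append(max(var, 1e-6))  # floor avoids zero-variance components
    return centers, variances

def em_mixing_proportions(data, means, variances, iters=200):
    """Step 3: EM updates of the mixing proportions only; means and variances stay fixed."""
    k = len(means)
    weights = [1.0 / k] * k
    # Component densities never change, so they are computed once
    dens = [[normal_pdf(z, m, v) for m, v in zip(means, variances)] for z in data]
    for _ in range(iters):
        totals = [0.0] * k
        for row in dens:
            mix = max(sum(w * d for w, d in zip(weights, row)), 1e-300)
            for c in range(k):
                totals[c] += weights[c] * row[c] / mix  # responsibility of component c
        weights = [t / len(data) for t in totals]
    return weights

def k5m_fit(data, k):
    """Combine the clustering step and the restricted EM step for one distribution."""
    means, variances = kmeans_1d(data, k)
    weights = em_mixing_proportions(data, means, variances)
    return weights, means, variances

# Synthetic scores, for illustration only
demo_rng = random.Random(0)
demo_scores = [demo_rng.gauss(0.0, 1.0) for _ in range(300)]
demo_scores += [demo_rng.gauss(5.0, 1.0) for _ in range(100)]
print(k5m_fit(demo_scores, 2))
```

            <p>Nothing in this sketch is drawn at random once the data are fixed, so repeated calls on the same scores return identical parameters. The same routine would be applied separately to the scores used for <it>f </it>and for <it>f</it><sub>0</sub>.</p>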
            <p>A reasonable number of clusters is expected to be obtained from the first step of the algorithm, and the estimation results for the two data sets below, given in Tables <tblr tid="T1">1</tblr> and <tblr tid="T4">4</tblr>, show that the value of K used (calculated based on AIC) is satisfactory. Table <tblr tid="T3">3</tblr> shows the results of the MMM and K5M methods for the run with an unequal variance and four normal distributions for both <it>f </it>and <it>f</it><sub>0</sub>. The MMM creates the likelihood ratio (LR) statistics plotted in Figure <figr fid="F1">1</figr>, the K5M with <it>K </it>= 4 forms the LR statistics plotted in Figure <figr fid="F2">2</figr>, and the K5M with <it>K </it>= 2 results in the LR plot of Figure <figr fid="F3">3</figr>.</p>
            <tbl id="T1" hint_layout="double">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Comparison of the result of the K5M with the MMM and the Mod2MMM based on the Leukaemia data.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Method</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Total detected genes</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>ALL</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>AML</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Total accepted genes out of 50 genes [22]</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>MMM</p>
                     </c>
                     <c ca="center">
                        <p>187</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>39</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Mod2MMM</p>
                     </c>
                     <c ca="center">
                        <p>58</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>K5M, K = 3</p>
                     </c>
                     <c ca="center">
                        <p>185</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>45</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>K5M, K = 4</p>
                     </c>
                     <c ca="center">
                        <p>58</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T3" hint_layout="double">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Estimation of fitted <it>f</it><sub>0 </sub>and <it>f </it>by MMM (in the optimum run) and K5M.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>f</it>
                           <sub>
                              <it>0</it>
                           </sub>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>f</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MMM</p>
                     </c>
                     <c ca="left">
                        <p><graphic file="1471-2105-5-201-i14.gif"/> = (0.1859, -0.2231, 0.0322, 0.0638)</p>
                     </c>
                     <c ca="left">
                        <p><graphic file="1471-2105-5-201-i14.gif"/> = (-0.0387, 0.4381, 0.1600, -0.1933)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p><graphic file="1471-2105-5-201-i15.gif"/> = (0.3215, 0.3522, 0.7692, 0.337)</p>
                     </c>
                     <c ca="left">
                        <p><graphic file="1471-2105-5-201-i15.gif"/> = (3.2288, 3.397, 2.6393, 4.6982)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p><graphic file="1471-2105-5-201-i16.gif"/> = (0.1672, 0.2353, 0.4048, 0.1925)</p>
                     </c>
                     <c ca="left">
                        <p><graphic file="1471-2105-5-201-i16.gif"/> = (0.0687, 0.0509, 0.0263, 0.0725)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>K5M</p>
                     </c>
                     <c ca="left">
                        <p><graphic file="1471-2105-5-201-i14.gif"/> = (1.1111, 1.1264, 0.3115, -0.3329)</p>
                     </c>
                     <c ca="left">
                        <p><graphic file="1471-2105-5-201-i14.gif"/> = (1.7867, -0.6817, -2.354, 0.3324)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p><graphic file="1471-2105-5-201-i15.gif"/> = (0.4589, 0.4640, 0.1879, 0.1807)</p>
                     </c>
                     <c ca="left">
                        <p><graphic file="1471-2105-5-201-i15.gif"/> = (2.9432, 0.5583, 4.24, 0.5027)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p><graphic file="1471-2105-5-201-i16.gif"/> = (0.1914, 0.1963, 0.3120, 0.3001)</p>
                     </c>
                     <c ca="left">
                        <p><graphic file="1471-2105-5-201-i16.gif"/> = (0.0583, 0.1018, 0.0294, 0.0442)</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <tbl id="T4" hint_layout="double">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>The top ten most significant genes provided by K5M and MMM.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>
                           <b>GenBank Accession IDs</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Gene/ Protein Description</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Rank based on MMM</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Rank based on K5M</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>D00073</p>
                     </c>
                     <c ca="left">
                        <p>Kidney/ carrier activity</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AA815845</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AF085696</p>
                     </c>
                     <c ca="left">
                        <p>ion transportation/ K+ channel, inward rectifier/renal salt flow</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AW047688</p>
                     </c>
                     <c ca="left">
                        <p>Brain/Hypothalamus</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>M12660</p>
                     </c>
                     <c ca="left">
                        <p>Kidney/ Complement protein H gene</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AI847513</p>
                     </c>
                     <c ca="left">
                        <p>Brain/ Hypothalamus</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AA919924</p>
                     </c>
                     <c ca="left">
                        <p>Phosphate metabolism/inositol-1(or 4)-monophosphatase activity</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>X69966</p>
                     </c>
                     <c ca="left">
                        <p>Dilation of the proximal renal tubules and extensive vacuolization of tubule epithelium</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AF103809</p>
                     </c>
                     <c ca="left">
                        <p>Elevated kidney levels of lysosomal enzymes</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AA711516</p>
                     </c>
                     <c ca="left">
                        <p>Barstead mouse myotubes MPLRB5</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Likelihood ratio statistics as a function of Z value based on the MMM method</p>
               </caption>
               <text>
                  <p>Likelihood ratio statistics as a function of Z value based on the MMM method.</p>
               </text>
               <graphic file="1471-2105-5-201-1" hint_layout="single"/>
            </fig>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Likelihood ratio statistics as a function of <it>z </it>based on the K5M with K = 4</p>
               </caption>
               <text>
                  <p>Likelihood ratio statistics as a function of <it>z </it>based on the K5M with K = 4.</p>
               </text>
               <graphic file="1471-2105-5-201-2" hint_layout="single"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Likelihood ratio statistics as a function of <it>z </it>based on the K5M with K = 2</p>
               </caption>
               <text>
                  <p>Likelihood ratio statistics as a function of <it>z </it>based on the K5M with K = 2.</p>
               </text>
               <graphic file="1471-2105-5-201-3" hint_layout="single"/>
            </fig>
            <p>It is worth mentioning that, because of the random initialization in K-means and the random initialization of the coefficients <it>&#960;</it><sub><it>i </it></sub>'s, the number of identified differentially expressed genes is expected to fluctuate slightly from run to run. However, as indicated above, since the K-means clustering algorithm is known to be a robust method, and since only a linear estimation is performed in the EM estimation process, the proposed algorithm is expected to be much more robust than the other versions of the MMM-based algorithms. This expectation is supported by our experimental results. In addition, our experiments indicate that the most strongly expressed genes are identified in all runs of the algorithm, and that in each run one or two new genes with a lower expression ratio are added to this set.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>In this section, the two applications and their corresponding data sets are first described, and then the results produced by the proposed method (i.e. K5M) are compared with those of the other MMM-based methods on the two data sets. A detailed description of the methods is given in the section "MMM &amp; its recent extensions".</p>
         <sec>
            <st>
               <p>Leukaemia dataset</p>
            </st>
            <p>In this section, we apply the nonparametric MMM method with and without the proposed modifications to the Leukaemia data presented in <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. The objective of this application is to identify the most important genes involved in the development of different types of Leukaemia. The dataset used for this analysis includes 27 acute lymphoblastic leukaemia (ALL) samples and 11 acute myeloid leukaemia (AML) samples for 7129 genes. The main goal is to find genes with differential expression between ALL and AML cases. A second goal is to compare the results of MMM and Mod2MMM (as introduced in the section "MMM &amp; its recent extensions") with those of K5M and to test the robustness of K5M. The genome-wide significance level is chosen as <it>&#945; </it>= 0.01 (according to the Bonferroni adjustment used in the MMM-based methods). Each sample in the dataset is pre-processed as in <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, by subtracting its median and dividing the resulting variable by its interquartile range (i.e. the difference between the first and the third quartile).</p>
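            <p>The per-sample scaling described above can be sketched as follows. This is only an illustration under stated assumptions and not the code used in the study: the function name scale_sample, the toy values, and the quartile convention (the "inclusive" method of the Python standard library) are all assumptions, since the text does not state how the quartiles were computed.</p>
            <p>
```python
from statistics import median, quantiles


def scale_sample(values):
    """Centre one sample at its median and scale it by its interquartile range."""
    # Quartile convention is an assumption; other conventions give slightly
    # different results on small samples.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; sample cannot be scaled")
    centre = median(values)
    return [(v - centre) / iqr for v in values]


# Toy expression values for one hypothetical sample (not data from the paper).
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = scale_sample(sample)
print(scaled)
```
            </p>
            <p>Because both the median and the interquartile range are insensitive to a few extreme values, this kind of scaling is less affected by outlying intensities than mean and standard deviation scaling would be.</p>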
         </sec>
         <sec>
            <st>
               <p>Results of Leukaemia study</p>
            </st>
            <p>Thomas et al <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> used known biological information to identify the most important genes in Leukaemia and provided biological justifications for the identified genes. They singled out 50 of the identified genes as the most strongly expressed and most relevant to the disease, comprising the 25 most expressed genes for AML and the 25 most expressed genes for ALL. We treat the list of Thomas et al as the biological knowledge base and compare how well the computational techniques, by processing the dataset, identify the genes discussed in <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>.</p>
            <p>The comparison of the results obtained by the K5M with those of the MMM and the Mod2MMM is summarized in Table <tblr tid="T1">1</tblr>. As can be seen in Table <tblr tid="T1">1</tblr>, the MMM identified 187 differentially expressed genes <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, of which a total of 39 are in the list of genes obtained by Thomas et al <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. The Mod2MMM method found 30 genes of the Thomas list. The K5M algorithm identifies 45 genes of the Thomas list, i.e. the proposed algorithm successfully recovers 90% of the biological result. Since the MMM recovers 78% and the Mod2MMM 60% of the same list, K5M improves the detection of expressed genes by 12 percentage points compared to the MMM and by 30 percentage points compared to the Mod2MMM for the Leukaemia data, i.e. our method identified more genes from the list of the 50 truly expressed genes identified by Thomas et al <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>.</p>
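            <p>The percentages quoted above follow directly from the counts in the text (39, 30 and 45 genes recovered out of the 50 genes in the reference list). The short sketch below reproduces this arithmetic; the variable and function names are illustrative assumptions.</p>
            <p>
```python
REFERENCE_LIST_SIZE = 50  # genes in the list of Thomas et al, as stated in the text

# Genes of the reference list recovered by each method, as stated in the text.
recovered = {"MMM": 39, "Mod2MMM": 30, "K5M": 45}


def recovery_rate(n_recovered, n_reference=REFERENCE_LIST_SIZE):
    """Percentage of the reference list recovered by a method."""
    return 100.0 * n_recovered / n_reference


rates = {method: recovery_rate(n) for method, n in recovered.items()}
gain_over_mmm = rates["K5M"] - rates["MMM"]          # in percentage points
gain_over_mod2mmm = rates["K5M"] - rates["Mod2MMM"]  # in percentage points
print(rates, gain_over_mmm, gain_over_mod2mmm)
```
            </p>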
            <p>As the BIC suggested K = 4 as the optimum number of clusters for the MMM, the K5M is also applied with K = 4. Running K5M with different numbers of clusters leads to different but reasonably similar results. As the number of clusters increases, the number of identified genes decreases. Table <tblr tid="T1">1</tblr> shows that the K5M with <it>K </it>= 3 identifies a total of 185 differentially expressed genes, while with <it>K </it>= 4 a total of 58 genes are identified; however, the 58 genes found with <it>K </it>= 4 are the most strongly expressed among the 185 genes found with <it>K </it>= 3. This result shows the consistency of the K5M method.</p>
            <p>In order to further compare the performance of the MMM and K5M on the leukaemia data, the ROC curve is plotted based on the false positive rate and true positive rate of the data set, calculated as in <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The area under each curve is a measure of test accuracy. As can be seen in Figure <figr fid="F5">5</figr>, the area under the K5M curve is larger than the area under the MMM curve; therefore the K5M provides a more accurate classification than the MMM.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>ROC curves for the MMM and K5M based on the leukaemia data set</p>
               </caption>
               <text>
                  <p>ROC curves for the MMM and K5M based on the leukaemia data set. The area under the K5M curve is larger than the area under the MMM curve, which indicates that the K5M method is more accurate than the MMM.</p>
               </text>
               <graphic file="1471-2105-5-201-5" hint_layout="single"/>
            </fig>
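            <p>For readers unfamiliar with the area under the ROC curve, the sketch below shows one common way to compute it from per-gene scores and reference labels. It is a generic illustration under stated assumptions, not the exact procedure of the cited reference: the functions roc_points and auc, the threshold sweep, the trapezoidal rule and the toy scores are all assumptions.</p>
            <p>
```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores.

    labels are 1 for truly expressed genes and 0 otherwise (toy convention).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i != len(ranked):
        threshold = ranked[i][0]
        # Consume every gene tied at this score before emitting a point.
        while i != len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve, by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores and labels, for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
toy_auc = auc(roc_points(toy_scores, toy_labels))
print(toy_auc)
```
            </p>
            <p>An area of 1 corresponds to a ranking that places every truly expressed gene ahead of every other gene, while an area of 0.5 corresponds to a random ranking.</p>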
         </sec>
         <sec>
            <st>
               <p>Hypophosphatemia dataset</p>
            </st>
            <p>The following study is the main application for which the proposed method was specialized and is therefore described in more detail. Hypophosphatemia is a common X-linked metabolic bone disorder in humans. Hypophosphatemia results from phosphate wasting in the renal tubules. Phosphate that is normally reabsorbed from the urine is excreted. It appears that elevated levels of FGF-23 activate the excretion of phosphorus by the kidneys. Previous studies have demonstrated an impairment of the high-affinity, low-capacity, Na+-dependent phosphate co-transport system <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. The main animal model used to study this disease is the <it>Hyp </it>mouse. <it>Hyp </it>mice have a mutation of the <it>Phex </it>gene <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B9">9</abbr></abbrgrp>. The disease is characterized by low reabsorption of phosphate, bone disease, and bone abnormalities in the lower extremities. The genes active in the regulation of phosphate re-absorption in the kidney are not well understood. It is also not clear whether mutations of the <it>Phex </it>gene block renal adaptation to a low phosphate diet. <it>Hyp </it>mice have a primary osteoblast defect and defects in vitamin D metabolism. Parabiosis experiments on normal and <it>Hyp </it>mice have revealed that there is an intrinsic osteoblast defect in <it>Hyp </it>mice rather than an intrinsic renal abnormality. <it>Hyp </it>kidneys transplanted into normal mice reabsorbed phosphorus at normal levels. Kidneys transplanted from normal mice into <it>Hyp </it>mice began phosphate wasting in the <it>Hyp </it>mice.</p>
            <p>The mechanism that leads to the excessive excretion of phosphorus is unknown. On a low phosphate diet, a normal mouse will activate systems to conserve phosphate by increasing re-absorption. The genes activated in the normal mouse on the low phosphate diet, and the genes with differential expression between normal and <it>Hyp </it>mice, should indicate the systems involved in phosphorus homeostasis. In an attempt to identify these genes, nutritional experiments were performed on normal and <it>Hyp </it>mice [<abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B8">8</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp> and <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>]. Normal and <it>Hyp </it>mice were placed on low phosphate diets for 3 &#8211; 5 days. Tissue samples from the kidneys of test and control mice were collected. 16 samples were analyzed using Affymetrix GeneChip mouse U74A arrays, with 4 samples for each experimental state. The mRNA of 12,488 genes was analyzed. Two GeneChip microarrays were done for each diet for normal mice and three microarrays for each diet for the <it>Hyp </it>mice, for a total of 10 arrays.</p>
            <p>To investigate this, 5-week-old normal and <it>Hyp </it>mice were fed a control (1.0% P) or low phosphate (0.03% P) diet for five days. The four experimental groups are shown in Table <tblr tid="T2">2</tblr>.</p>
            <tbl id="T2" hint_layout="single">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Four experimental groups in the Hyp mice data sets. In this paper, the comparisons are done between group 1 and group 2, and between group 3 and group 4.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Diet</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Control</p>
                     </c>
                     <c ca="center">
                        <p>Low Phosphate</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Genotype</p>
                     </c>
                     <c ca="center">
                        <p>Normal</p>
                     </c>
                     <c ca="center">
                        <p>Group1</p>
                     </c>
                     <c ca="center">
                        <p>Group2</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <it>Hyp</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>Group3</p>
                     </c>
                     <c ca="center">
                        <p>Group4</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>In this study, we consider gene expression signals of less than 100 to be noise caused by the microarray machine, and in the pre-processing step we ignore the genes whose expression signals are less than 100 in both conditions. The following two specific goals are considered in this study:</p>
            <p>1. To identify the genes whose mRNA expression is altered by a low phosphate diet in normal mice.</p>
            <p>2. To determine the effect of the <it>Hyp </it>mutation on this response, i.e. identifying the genes in the <it>Hyp </it>condition that are differentially expressed across the normal and low phosphate diet experiments.</p>
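            <p>The noise filter used in the pre-processing step can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes one summary signal per gene and per condition, and the gene identifiers, the signal values and the function name keep_gene are hypothetical.</p>
            <p>
```python
NOISE_FLOOR = 100.0  # signals below this value are treated as noise (from the text)


def keep_gene(signal_a, signal_b, floor=NOISE_FLOOR):
    """Keep a gene unless its signal is below the floor in both conditions."""
    return max(signal_a, signal_b) >= floor


# Hypothetical (condition 1, condition 2) signals, for illustration only.
signals = {
    "gene_1": (35.0, 80.0),    # below the floor in both conditions: ignored
    "gene_2": (90.0, 240.0),   # above the floor in one condition: kept
    "gene_3": (510.0, 480.0),  # above the floor in both conditions: kept
}
kept = [gene for gene, (a, b) in signals.items() if keep_gene(a, b)]
print(kept)
```
            </p>
            <p>Keeping a gene that exceeds the threshold in only one condition matters here, because a gene that is switched on or off by the diet would otherwise be discarded.</p>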
         </sec>
         <sec>
            <st>
               <p>Results of Hypophosphatemia study</p>
            </st>
            <p>The <it>Hyp </it>dataset includes five samples for each group. In order to make the number of data samples even, we used four samples of each group. For this data set, since j1 = j2, the Mod2MMM cannot be applied. In the MMM method, five mixture models are used to estimate f<sub>0 </sub>and f (the distributions under the two different experimental conditions), with the number of normal basis functions ranging from 1 to 5. The MMM algorithm was run several times and the run with the maximum log-likelihood was chosen as the final model. The Bayesian Information Criterion (BIC) <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> was used to determine the number of components. To find the rejection region for a given model, the bisection method is used. In this paper we assume <it>&#945; </it>= 0.01, and therefore the gene-specific significance level used here is calculated as:</p>
            <p><it>&#945;</it>* = 0.01/(9544 &#215; 2) &#8776; 5 &#215; 10<sup>-7</sup></p>
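            <p>The Bonferroni-type adjustment behind this gene-specific level divides the genome-wide level by the number of tested genes and by 2 for the two-sided rejection region. A minimal sketch is given below; the function name gene_specific_alpha and the gene count of 10,000 are illustrative assumptions and do not come from the data set.</p>
            <p>
```python
def gene_specific_alpha(alpha, n_genes, sides=2):
    """Bonferroni-type gene-specific level: alpha / (n_genes * sides)."""
    if n_genes == 0:
        raise ValueError("at least one gene is required")
    return alpha / (n_genes * sides)


# Illustrative values only: 10,000 genes is a round hypothetical count.
alpha_star = gene_specific_alpha(alpha=0.01, n_genes=10_000)
print(alpha_star)
```
            </p>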
            <p>Using the bisection method <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, as discussed in Section 4, the value of <it>s </it>is obtained as <it>s </it>= 3 &#215; 10<sup>-6</sup>. Both the MMM and K5M were run 100 times. Figure <figr fid="F4">4</figr> presents the number of genes identified as differentially expressed in each run of the MMM. The difference between the runs with the minimum and the maximum number of identified genes amounts to 150 genes. This clearly indicates the high degree of inconsistency and irreproducibility of the results obtained by the MMM. In contrast, the K5M identified the same set of genes in every run, which indicates 100% repeatability and robustness of the proposed method.</p>
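            <p>The bisection step can be illustrated with a generic sketch. The example below is not the rejection-region computation of the paper: as a stand-in for the fitted null distribution it uses a standard normal two-sided tail probability, and it searches for the cutoff at which this tail probability equals a target level. The functions two_sided_tail and bisect_cutoff, the search interval and the tolerance are all assumptions.</p>
            <p>
```python
import math


def two_sided_tail(c):
    """P(|Z| > c) for a standard normal Z; a stand-in for the fitted null model."""
    return math.erfc(c / math.sqrt(2.0))


def bisect_cutoff(tail, target, lo=0.0, hi=20.0, tol=1e-10):
    """Find c in [lo, hi] with tail(c) close to target, for a decreasing tail."""
    if not (tail(lo) >= target >= tail(hi)):
        raise ValueError("target is not bracketed by the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tail(mid) > target:
            lo = mid  # tail still too large: the cutoff lies to the right
        else:
            hi = mid  # tail small enough: the cutoff lies to the left
    return (lo + hi) / 2.0


# Familiar check: the two-sided 5% cutoff of the standard normal.
cutoff = bisect_cutoff(two_sided_tail, 0.05)
print(round(cutoff, 4))
```
            </p>
            <p>Bisection only requires that the function be monotone on the search interval, which makes it a convenient choice when the tail probability is available numerically but not in closed form.</p>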
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Histogram of the number of genes expressed in each run by the MMM method which shows the strong variability (x-axis shows number of runs)</p>
               </caption>
               <text>
                  <p>Histogram of the number of genes expressed in each run by the MMM method which shows the strong variability (x-axis shows number of runs).</p>
               </text>
               <graphic file="1471-2105-5-201-4" hint_layout="single"/>
            </fig>
            <p>The ten most significant genes whose expression is altered by the low phosphate diet in the normal mouse, as identified by the MMM and by K5M, are presented in Table <tblr tid="T4">4</tblr>. As can be seen in Table <tblr tid="T4">4</tblr>, the most differentially expressed genes are the same for the MMM and K5M. Out of these 10 genes, six are directly related to the kidney's functions. For this data set, the main advantage of the K5M is its consistency and robustness, as discussed above. A similar procedure is conducted to accomplish the second goal of this study, i.e. identifying the role of the <it>Hyp </it>condition in the most differentially expressed genes in the normal and low phosphate diet microarrays. The ten most significant genes that are differentially expressed across the two experimental conditions, i.e. Normal Low Phosphate and <it>Hyp </it>Low Phosphate, are listed in Table <tblr tid="T5">5</tblr>. As shown in Table <tblr tid="T5">5</tblr>, eight genes are again directly related to the kidney's function. This further attests to the capability of the proposed technique to discover the genes that are truly involved in the biological study.</p>
            <tbl id="T5" hint_layout="double">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>The top ten significant genes, by comparing group 3 and group 4 in table 2, provided by K5M and MMM.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Accession IDs</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Gene/ Protein Description</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Rank based on MMM</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Rank based on K5M</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AF028071</p>
                     </c>
                     <c ca="left">
                        <p>Kidney/ apical plasma membrane, Basolateral plasma membrane</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>D26352</p>
                     </c>
                     <c ca="left">
                        <p>Kidney/calcium ion binding</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AA815845</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>D00073</p>
                     </c>
                     <c ca="left">
                        <p>Kidney/ carrier activity</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AB00603</p>
                     </c>
                     <c ca="left">
                        <p>Monooxygenase activity, oxidoreductase activity</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>U97079</p>
                     </c>
                     <c ca="left">
                        <p>GTP binding, protein binding, phosphate binding</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>AI315650</p>
                     </c>
                     <c ca="left">
                        <p>Detected in Kidney</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>X71922</p>
                     </c>
                     <c ca="left">
                        <p>Kidney/ growth factor activity, hormone activity</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>D43797</p>
                     </c>
                     <c ca="left">
                        <p>Kidney/carrier activity, sodium, excitatory glutamate symporter activity</p>
                     </c>
                     <c ca="left">
                        <p>Identified as a non expressed gene</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>X81059</p>
                     </c>
                     <c ca="left">
                        <p>Protein phosphate 2</p>
                     </c>
                     <c ca="left">
                        <p>Identified as a non expressed gene</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>In this paper, we proposed a technique to improve the repeatability and robustness of the mixture model method by using the K-means clustering method in estimating the distributions. Our proposed method finds the distribution of the variables partially based on a clustering procedure and an EM optimization process. The method is applied to analyze two microarray data sets: the Leukaemia data set and a data set reflecting the effect of a low phosphate diet on normal and <it>Hyp </it>mice <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. The experimental results indicate 100% robustness and repeatability of the results in different runs and provide 12% improvement (compared to the mixture model method) in detecting the relevant genes in both studies.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p><b>Maryam Zaheri and Ali A. Rad </b>were in charge of writing the code and the programming aspects of the paper.</p>
         <p><b>Siamak Najarian and Javad Dargahi's </b>primary role was to perform a literature review on mixture model techniques, identify the aspects of the method that need to be improved, and provide suggestions to address these shortcomings.</p>
         <p><b>Kayvan Najarian's </b>primary roles were to design improvements to the algorithm (based on the literature review and overall modifications suggested by Siamak Najarian and Javad Dargahi), prepare and pre-process the data (for both datasets), participate in preparation of the <it>Hyp </it>dataset, define the <it>Hyp </it>problem, interpret the results, and finally write and edit the manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Appendix 1</p>
         </st>
         <p>The Mod2MMM computes a new z and Z using the following formulas:</p>
         <p>
            <graphic file="1471-2105-5-201-i17.gif"/>
         </p>
         <p>
            <graphic file="1471-2105-5-201-i18.gif"/>
         </p>
         <p>Where:</p>
         <p>
            <graphic file="1471-2105-5-201-i19.gif"/>
         </p>
         <p>
            <graphic file="1471-2105-5-201-i20.gif"/>
         </p>
         <p>And:</p>
         <p>
            <graphic file="1471-2105-5-201-i21.gif"/>
         </p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank Belma Ford (University of North Carolina at Charlotte) for her valuable help in the interpretation of the biology data and results. The authors also thank R. Meyer and M. Meyer of the Cannon Research Centre of Carolina Healthcare System for providing us with the <it>Hyp </it>dataset, as well as for their assessment and interpretation of our results from a biological standpoint.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Ratio-based decisions and the quantitative analysis of cDNA microarray images</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Dougherty</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Bittner</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Biomedical Optics</source>
            <pubdate>1997</pubdate>
            <volume>2</volume>
            <fpage>364</fpage>
            <lpage>367</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1117/1.429838</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <aug>
               <au>
                  <snm>Devore</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Peck</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Statistics: the Exploration and Analysis of Data</source>
            <publisher>Pacific Grove, CA: Duxbury Press</publisher>
            <edition>3</edition>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Empirical Bayes analysis of a microarray experiment</p>
            </title>
            <aug>
               <au>
                  <snm>Efron</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Storey</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tusher</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Journal of the American Statistical Association</source>
            <pubdate>2001</pubdate>
            <volume>96</volume>
            <fpage>1151</fpage>
            <lpage>1160</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1198/016214501753382129</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Significance analysis of microarrays applied to the ionizing radiation response</p>
            </title>
            <aug>
               <au>
                  <snm>Tusher</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Tibshirani</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Chu</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proc Nat Acad Sci</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>5116</fpage>
            <lpage>5121</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">33173</pubid>
                  <pubid idtype="pmpid" link="fulltext">11309499</pubid>
                  <pubid idtype="doi">10.1073/pnas.091062498</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>A Mixture Model Approach to Detecting Differentially Expressed Genes with Microarray Data</p>
            </title>
            <aug>
               <au>
                  <snm>Pan</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Le</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Functional &amp; Integrative Genomics</source>
            <pubdate>2001</pubdate>
            <volume>3</volume>
            <fpage>117</fpage>
            <lpage>124</lpage>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Modified nonparametric approaches to detecting differentially expressed genes in replicated microarray experiments</p>
            </title>
            <aug>
               <au>
                  <snm>Zhao</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>1046</fpage>
            <lpage>1054</lpage>
            <note>(Also Report 2002-018, Division of Biostatistics, University of Minnesota, 2002)</note>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btf879</pubid>
                  <pubid idtype="pmpid" link="fulltext">12801864</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Molecular classification of cancer: class discovery and class prediction by gene expression monitoring</p>
            </title>
            <aug>
               <au>
                  <snm>Golub</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Slonim</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Tamayo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huard</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gaasenbeek</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mesirov</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Coller</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Loh</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Downing</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Caligiuri</snm>
                  <fnm>MA</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>286</volume>
            <fpage>531</fpage>
            <lpage>537</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.286.5439.531</pubid>
                  <pubid idtype="pmpid" link="fulltext">10447482</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>mRNA expression of <it>Phex </it>in mice and rats: The effect of low phosphate diet</p>
            </title>
            <aug>
               <au>
                  <snm>Meyer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Endocrine</source>
            <pubdate>2000</pubdate>
            <volume>13</volume>
            <fpage>81</fpage>
            <lpage>87</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1385/ENDO:13:1:81</pubid>
                  <pubid idtype="pmpid" link="fulltext">11051050</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Abnormal vitamin D metabolism in the X-linked hypophosphatemic mouse</p>
            </title>
            <aug>
               <au>
                  <snm>Meyer</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gray</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Endocrinology</source>
            <pubdate>1980</pubdate>
            <volume>107</volume>
            <fpage>1577</fpage>
            <lpage>1581</lpage>
            <xrefbib>
               <pubid idtype="pmpid">6893581</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>X-linked hypophosphatemic <it>Gy </it>mice: renal tubular maximum for phosphate vs. brush-border transport after low-P diet</p>
            </title>
            <aug>
               <au>
                  <snm>Thornton</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tenenhouse</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Martel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bockian</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Am J Physiol</source>
            <pubdate>1994</pubdate>
            <volume>266</volume>
            <fpage>F309</fpage>
            <lpage>315</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8141332</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Renal phosphate transport and vitamin D metabolism in X-linked hypophosphatemic <it>Gy </it>mice: Response to phosphate deprivation</p>
            </title>
            <aug>
               <au>
                  <snm>Tenenhouse</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Mandla</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gray</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Endocrinology</source>
            <pubdate>1992</pubdate>
            <volume>131</volume>
            <fpage>51</fpage>
            <lpage>56</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1210/en.131.1.51</pubid>
                  <pubid idtype="pmpid" link="fulltext">1612032</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Response of tissue phosphate content to acute dietary phosphate deprivation in the X-linked hypophosphatemic mouse</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>CE</fnm>
               </au>
               <au>
                  <snm>Wilkie</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>RA</fnm>
                  <suf>Jr</suf>
               </au>
            </aug>
            <source>Calcif Tissue Int</source>
            <pubdate>1985</pubdate>
            <volume>37</volume>
            <fpage>423</fpage>
            <lpage>430</lpage>
            <xrefbib>
               <pubid idtype="pmpid">3930041</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Abnormal regulation of plasma 1,25-dihydroxyvitamin D in gyro (<it>Gy</it>, X-linked hypophosphatemic) mice</p>
            </title>
            <aug>
               <au>
                  <snm>Meyer</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gray</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>In: Vitamin D: Gene Regulation, Structure-Function Analysis and Clinical Application</source>
            <publisher>Walter de Gruyter, New York</publisher>
            <editor>Norman A, Bouillon R, Thomasset M</editor>
            <pubdate>1991</pubdate>
            <fpage>903</fpage>
            <lpage>904</lpage>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Effect of altered diet on serum levels of 1,25-dihydroxyvitamin D and parathyroid hormone in X-linked hypophosphatemic (Hyp and Gy) mice</p>
            </title>
            <aug>
               <au>
                  <snm>Meyer</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Morgan</snm>
                  <fnm>PL</fnm>
               </au>
            </aug>
            <source>Bone</source>
            <pubdate>1996</pubdate>
            <volume>18</volume>
            <issue>1</issue>
            <fpage>23</fpage>
            <lpage>28</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/8756-3282(95)00420-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">8717533</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Renal expression of Na+-phosphate cotransporter mRNA and protein: Effect of the Gy mutation and low phosphate diet</p>
            </title>
            <aug>
               <au>
                  <snm>Beck</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Tenenhouse</snm>
                  <fnm>HS</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Biber</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Murer</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Pflugers Arch</source>
            <pubdate>1996</pubdate>
            <volume>431</volume>
            <fpage>936</fpage>
            <lpage>941</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s004240050088</pubid>
                  <pubid idtype="pmpid" link="fulltext">8927512</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>How many clusters? Which clustering methods? Answer via model-based cluster analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Fraley</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Raftery</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>The Computer Journal</source>
            <pubdate>1998</pubdate>
            <volume>41</volume>
            <fpage>578</fpage>
            <lpage>588</lpage>
         </bibl>
         <bibl id="B17">
            <aug>
               <au>
                  <snm>Press</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Teukolsky</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Vetterling</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Flannery</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Numerical Recipes in C, The Art of Scientific Computing</source>
            <publisher>New York: Cambridge University Press</publisher>
            <edition>2</edition>
            <pubdate>1992</pubdate>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Controlling the false discovery rate: a practical and powerful approach to multiple testing</p>
            </title>
            <aug>
               <au>
                  <snm>Benjamini</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hochberg</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Journal of the Royal Statistical Society, Series B</source>
            <pubdate>1995</pubdate>
            <volume>57</volume>
            <fpage>289</fpage>
            <lpage>300</lpage>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion)</p>
            </title>
            <aug>
               <au>
                  <snm>Dempster</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Laird</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J R Statist Soc</source>
            <pubdate>1977</pubdate>
            <volume>39</volume>
            <fpage>1</fpage>
            <lpage>38</lpage>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Learning-Based Complexity Evaluation of Radial Basis Function Networks</p>
            </title>
            <aug>
               <au>
                  <snm>Najarian</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Neural Processing Letters</source>
            <pubdate>2002</pubdate>
            <volume>16</volume>
            <issue>2</issue>
            <fpage>137</fpage>
            <lpage>150</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1023/A:1019999408474</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments</p>
            </title>
            <aug>
               <au>
                  <snm>Pan</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <issue>4</issue>
            <fpage>546</fpage>
            <lpage>554</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/18.4.546</pubid>
                  <pubid idtype="pmpid" link="fulltext">12016052</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles</p>
            </title>
            <aug>
               <au>
                  <snm>Thomas</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Olson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tapscott</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>1227</fpage>
            <lpage>1236</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.165101</pubid>
                  <pubid idtype="pmpid" link="fulltext">11435405</pubid>
                  <pubid idtype="pmcid">311075</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Renal adaptation to phosphate deprivation in the <it>Hyp </it>mouse with X-linked hypophosphatemia</p>
            </title>
            <aug>
               <au>
                  <snm>Tenenhouse</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Scriver</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Can J Biochem</source>
            <pubdate>1979</pubdate>
            <volume>57</volume>
            <fpage>938</fpage>
            <lpage>944</lpage>
            <xrefbib>
               <pubid idtype="pmpid">476527</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Renal Na(+)-phosphate cotransport in murine X-linked hypophosphatemic rickets. Molecular characterization</p>
            </title>
            <aug>
               <au>
                  <snm>Tenenhouse</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Werner</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Biber</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ma</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Martel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Roy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Murer</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Clin Invest</source>
            <pubdate>1994</pubdate>
            <volume>93</volume>
            <fpage>671</fpage>
            <lpage>676</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">293897</pubid>
                  <pubid idtype="pmpid">8113402</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Osteomalacia and altered magnesium metabolism in the X-linked hypophosphatemic mouse</p>
            </title>
            <aug>
               <au>
                  <snm>Meyer</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Jowsey</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Calcif Tissue Int</source>
            <pubdate>1979</pubdate>
            <volume>27</volume>
            <fpage>19</fpage>
            <lpage>26</lpage>
            <xrefbib>
               <pubid idtype="pmpid">111782</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Estimating the dimension of a model</p>
            </title>
            <aug>
               <au>
                  <snm>Schwarz</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Annals of Statistics</source>
            <pubdate>1978</pubdate>
            <volume>6</volume>
            <fpage>461</fpage>
            <lpage>464</lpage>
         </bibl>
         <bibl id="B27">
            <title>
               <p>National Center for Biotechnology Information</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi</url>
         </bibl>
      </refgrp>
   </bm>
</art>
