<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1753-6561-5-S3-S11</ui>
   <ji>1753-6561</ji>
   <fm>
      <dochead>Proceedings</dochead>
      <bibl>
         <title>
            <p>A comparison of random forests, boosting and support vector machines for genomic selection</p>
         </title>
         <aug>
            <au ca="yes" id="A1"><snm>Ogutu</snm><mi>O</mi><fnm>Joseph</fnm><insr iid="I1"/><email>jogutu2007@gmail.com</email></au>
            <au id="A2"><snm>Piepho</snm><fnm>Hans-Peter</fnm><insr iid="I1"/><email>Hans-Peter.Piepho@uni-hohenheim.de</email></au>
            <au id="A3"><snm>Schulz-Streeck</snm><fnm>Torben</fnm><insr iid="I1"/><email>torben.schulz-streeck@uni-hohenheim.de</email></au>
         </aug>
         <insg>
            <ins id="I1"><p>Bioinformatics Unit, Institute of Crop Science, University of Hohenheim, Fruwirthstrasse 23, 70599 Stuttgart, Germany</p></ins>
         </insg>
         <source>BMC Proceedings</source>
         
         
         <supplement><title><p>Proceedings of the 14th European workshop on QTL mapping and marker assisted selection (QTL-MAS)</p></title><editor>Maciej Szydlowski</editor><sponsor><note>Publication of this supplement was supported by Animal Breeding and Inseminating Centre Ltd, Bydgoszcz, Poland.</note></sponsor><note>Proceedings</note><url>http://www.biomedcentral.com/content/pdf/1753-6561-5-S3-info.pdf</url></supplement><conference><title><p>14th QTL-MAS Workshop</p></title><location>Poznan, Poland</location><date-range>17-18 May 2010</date-range><url>http://jay.up.poznan.pl/qtlmas2010/index.html</url></conference><issn>1753-6561</issn>
         <pubdate>2011</pubdate>
         <volume>5</volume>
         <issue>Suppl 3</issue>
         <fpage>S11</fpage>
         <url>http://www.biomedcentral.com/1753-6561/5/S3/S11</url>
         <xrefbib><pubid idtype="doi">10.1186/1753-6561-5-S3-S11</pubid></xrefbib>
      </bibl>
      <history><pub><date><day>27</day><month>5</month><year>2011</year></date></pub></history>
      <cpyrt><year>2011</year><collab>Ogutu et al; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Genomic selection (GS) involves estimating breeding values using molecular markers spanning the entire genome. Accurate prediction of genomic breeding values (GEBVs) presents a central challenge to contemporary plant and animal breeders. The existence of a wide array of marker-based approaches for predicting breeding values makes it essential to evaluate and compare their relative predictive performances to identify approaches able to accurately predict breeding values. We evaluated the predictive accuracy of random forests (RF), stochastic gradient boosting (boosting) and support vector machines (SVMs) for predicting genomic breeding values using dense SNP markers and explored the utility of RF for ranking the predictive importance of markers for pre-screening markers or discovering chromosomal locations of QTLs.</p>
            </sec>
            <sec>
               <st>
                  <p>Methods</p>
               </st>
               <p>We predicted GEBVs for one quantitative trait in a dataset simulated for the QTLMAS 2010 workshop. Predictive accuracy was measured as the Pearson correlation between GEBVs and observed values using 5-fold cross-validation and between predicted and true breeding values. The importance of each marker was ranked using RF and plotted against the position of the marker and associated QTLs on one of five simulated chromosomes.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The correlations between the predicted and true breeding values were 0.547 for boosting, 0.497 for SVMs, and 0.483 for RF, indicating better performance for boosting than for SVMs and RF.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Accuracy was highest for boosting, intermediate for SVMs and lowest for RF but differed little among the three methods and relative to ridge regression BLUP (RR-BLUP).</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Genomic selection is a method for estimating GEBVs using dense molecular markers spanning the entire genome <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Given the wide range of approaches for predicting GEBVs, it is important to evaluate their performance, strengths and weaknesses to identify those that predict GEBVs accurately. Here, we compare the predictive performance of three of the most powerful machine learning methods, each with demonstrated high predictive accuracy in many application domains, namely RF <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>, boosting <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and SVMs <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>, with one another and with RR-BLUP <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> for predicting breeding values for quantitative traits.</p>
         <p>RF has several appealing properties that make it potentially attractive for GS <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B4">4</abbr></abbrgrp>: (<it>i</it>) the number of markers can far exceed the number of observations, (<it>ii</it>) all markers, including those with weak effects and those that are highly correlated or interacting, have a chance to contribute to the model fit, (<it>iii</it>) complex interactions between markers can be easily accommodated, (<it>iv</it>) it can perform both simple and complex classification and regression accurately, (<it>v</it>) it often requires only modest fine-tuning of parameters, and the default parameterization often performs well <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>, and (<it>vi</it>) it makes no distributional assumptions about the predictor variables. Boosting is a stagewise additive model fitting procedure that can enhance the predictive performance of weak learning algorithms <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. SVMs perform robustified regression using kernel functions of inner products of predictors <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
         <p>We comparatively evaluated the predictive performance of the three machine learning methods and RR-BLUP for estimating GEBVs using the common dataset simulated for the QTLMAS 2010 workshop. RF regression was used to rank the SNPs in terms of their predictive importance.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Data</p>
            </st>
            <p>The simulated data set contained 3226 individuals spanning five generations, of which 2326, constituting the first four generations, were phenotyped and genotyped for 10031 biallelic SNPs arrayed on a genome encompassing five chromosomes. The remaining 900 individuals, representing the fifth generation, had genomic records but lacked phenotypic records on the single quantitative trait. The covariate for each genotype with alleles <it>A</it><sub>1</sub> and <it>A</it><sub>2</sub> was set to 1 for <it>A</it><sub>1</sub><it>A</it><sub>1</sub>, -1 for <it>A</it><sub>2</sub><it>A</it><sub>2</sub> and 0 for <it>A</it><sub>1</sub><it>A</it><sub>2</sub> or <it>A</it><sub>2</sub><it>A</it><sub>1</sub>.</p>
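            <p>The genotype coding described above can be illustrated with a minimal sketch. The helper names and the allele labels "A1" and "A2" are hypothetical and are used only for illustration; they are not part of the workshop data files.</p>

```python
# Hypothetical helper: maps a biallelic SNP genotype to the covariate
# described above (1 for A1A1, 0 for heterozygotes, -1 for A2A2).
GENOTYPE_CODES = {
    ("A1", "A1"): 1,
    ("A1", "A2"): 0,
    ("A2", "A1"): 0,
    ("A2", "A2"): -1,
}


def code_genotype(allele_a, allele_b):
    """Return the covariate for one genotype given its two alleles."""
    return GENOTYPE_CODES[(allele_a, allele_b)]


def code_individual(genotypes):
    """Code a list of (allele, allele) pairs, one pair per SNP."""
    return [code_genotype(a, b) for a, b in genotypes]


example = [("A1", "A1"), ("A1", "A2"), ("A2", "A1"), ("A2", "A2")]
coded = code_individual(example)
print(coded)  # [1, 0, 0, -1]
```

            <p>Both heterozygote orderings map to the same value, so the coding does not depend on the order in which the two alleles are recorded.</p>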
         </sec>
         <sec>
            <st>
               <p>Random forests</p>
            </st>
            <p>RF regression uses an ensemble of unpruned decision trees, each grown using a bootstrap sample of the training data, and randomly selected subsets of predictor variables as candidates for splitting tree nodes. The RF regression prediction for a new observation <it>x</it> (<inline-formula><graphic file="1753-6561-5-S3-S11-i1.gif"/></inline-formula>) is made by averaging the output of the ensemble of <it>B</it> trees <inline-formula><graphic file="1753-6561-5-S3-S11-i2.gif"/></inline-formula> as <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>:</p>
            <p>
               <display-formula id="M1">
                  <graphic file="1753-6561-5-S3-S11-i3.gif"/>
               </display-formula>
            </p>
            <p>where &#936;<it><sub>b</sub></it> characterizes the <it>b</it>th RF tree in terms of split variables, cutpoints at each node, and terminal node values.</p>
            <p>We implemented RF in the R package <it>randomForest</it> with decision trees as base learners <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Following various recommendations <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>, we evaluated different combinations of the number of trees to grow, <it>ntree</it> = {500, 1000, 2000}, the number of SNPs randomly selected at each tree node, <it>mtry</it> = {0.5, 1, 2} &#215; the default value of <it>mtry</it> (the number of predictors/3 for regression), and the minimum size of terminal nodes of trees, below which no split is attempted, <it>nodesize</it> = 1. The parameter configuration with the highest prediction accuracy was <it>ntree</it> = 1000, <it>mtry</it> = 3000 and <it>nodesize</it> = 1. We ranked SNPs by the relative importance of their contributions to predictive accuracy, quantified by how much the prediction error on the out-of-bag data (the observations left out of the bootstrap samples) increased when the values of a SNP were randomly permuted while the data for all the other SNPs were left unchanged <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>.</p>
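            <p>The permutation-based importance measure can be sketched as follows. This is a simplified illustration, not the <it>randomForest</it> implementation: it assumes a single, already fitted prediction function and one held-out data set as a stand-in for the per-tree out-of-bag data, and all function names and the toy data are hypothetical.</p>

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, column, rng):
    """Increase in MSE after shuffling one predictor column.

    predict: any fitted prediction function taking one row (assumed).
    X, y:    held-out rows and responses (stand-in for out-of-bag data).
    """
    baseline = mse(y, [predict(row) for row in X])
    shuffled = [row[column] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
    permuted = mse(y, [predict(row) for row in X_perm])
    return permuted - baseline


# Toy data: the response depends on SNP 0 only; SNP 1 is irrelevant.
rng = random.Random(1)
X = [[rng.choice([-1, 0, 1]), rng.choice([-1, 0, 1])] for _ in range(200)]
y = [2.0 * row[0] for row in X]


def toy_predict(row):
    """Stand-in for a fitted model; it recovers the true signal exactly."""
    return 2.0 * row[0]


importance = [permutation_importance(toy_predict, X, y, j, rng) for j in range(2)]
print(importance)
```

            <p>In this toy setting, permuting the irrelevant SNP leaves the error unchanged, whereas permuting the causal SNP increases it, which is the behaviour the ranking relies on.</p>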
         </sec>
         <sec>
            <st>
               <p>Stochastic Gradient Boosting</p>
            </st>
            <p>Boosting is an ensemble learning method for improving the predictive performance of classification or regression procedures, such as decision trees <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Gradient-boosted models can also handle interactions and automatically select variables, are robust to outliers, missing data and numerous correlated and irrelevant variables, and can construct variable importance in exactly the same way as RF <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Boosting iteratively adds basis functions in a greedy fashion such that each additional basis function further reduces the selected loss (error) function <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B9">9</abbr></abbrgrp>:</p>
            <p>
               <display-formula id="M2">
                  <graphic file="1753-6561-5-S3-S11-i4.gif"/>
               </display-formula>
            </p>
            <p>where <it>&#946;<sub>m</sub></it>, <it>m</it> =1,2,&#8230;, <it>M</it> are the basis expansion coefficients, and <it>b</it>(<it>x</it>, <it>&#947;</it>) are simple functions of the multivariate argument <it>x</it>, with a set of parameters <it>&#947;</it>=(<it>&#947;</it><sub>1</sub>,<it>&#947;</it><sub>2</sub>,&#8230;,<it>&#947;<sub>M</sub></it>).</p>
            <p>We used regression trees as basis functions. Boosting regression trees involves generating a sequence of trees, each grown on the residuals of the previous tree <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B9">9</abbr></abbrgrp>. Prediction is accomplished by weighting the ensemble outputs of all the regression trees. We used stochastic gradient boosting, assuming the Gaussian distribution for minimizing squared-error loss, in the R package <it>gbm</it> <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. We determined the main tuning parameter, the optimal number of iterations (or trees), using an out-of-bag estimate of the improvement in predictive performance. This evaluates the reduction in deviance based on observations not used in selecting the next regression tree. The minimum number of observations in the trees' terminal nodes was set to 1, the shrinkage factor applied to each tree in the expansion to 0.001 and the fraction of the training set observations randomly selected to propose the next tree in the expansion to 0.5. With these settings, boosting regression trees with at most 8-way interactions between SNPs required 3656 iterations for the training dataset, based on inspecting graphical plots of the out-of-bag change in squared error loss against the number of iterations <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
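            <p>The idea of growing each tree on the residuals of the current fit can be sketched with single-split trees (stumps) on one predictor. This is a minimal illustration under squared-error loss, not the <it>gbm</it> algorithm: the random subsampling of observations at each iteration is omitted, and the function names, toy data and settings are hypothetical.</p>

```python
def fit_stump(x, residuals):
    """Return (cut, left_mean, right_mean) minimising squared error."""
    best = None
    for cut in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if cut >= xi]
        right = [r for xi, r in zip(x, residuals) if xi > cut]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or best[0] > sse:
            best = (sse, cut, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, xi):
    """Predict with one stump: left mean at or below the cut, else right mean."""
    cut, left_mean, right_mean = stump
    return left_mean if cut >= xi else right_mean


def boost(x, y, n_trees, shrinkage):
    """Stagewise fit: each stump is grown on the current residuals."""
    intercept = sum(y) / len(y)
    fitted = [intercept] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution before adding it to the fit.
        fitted = [fi + shrinkage * stump_predict(stump, xi)
                  for fi, xi in zip(fitted, x)]
    return intercept, stumps, fitted


# Toy data: one SNP coded -1, 0, 1 with a roughly additive effect.
x = [-1, -1, 0, 0, 1, 1]
y = [-2.1, -1.9, 0.2, -0.2, 2.0, 2.0]
intercept, stumps, fitted = boost(x, y, n_trees=200, shrinkage=0.1)
train_mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / len(y)
print(round(train_mse, 4))
```

            <p>A smaller shrinkage factor slows the fit, so that more iterations are needed to reach the same training error, which is why the number of iterations is the main tuning parameter.</p>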
         </sec>
         <sec>
            <st>
               <p>Support Vector Machines (SVMs)</p>
            </st>
            <p>SVMs perform robustified regression for quantitative responses by exploiting the relationships between observations by arraying predictors in observation space using a set of inner products. For regression with a quantitative response, SVM uses the model</p>
            <p>
               <display-formula id="M3">
                  <graphic file="1753-6561-5-S3-S11-i5.gif"/>
               </display-formula>
            </p>
            <p>where the basis functions, <it>h</it>(<it>x</it>)<it><sup>T</sup></it>, which can be linear (or nonlinear) transformations of one (or more) predictors (<it>x</it>), are additively combined with the vector of weights (<it>&#946;</it>). We used the &#8220;<it>&#949;</it>-insensitive&#8221; SVM regression, which ignores residuals smaller in absolute value than some constant (<it>&#949;</it>) and applies a linear loss function to larger residuals. This is a robustified regression for which the minimization exercise can be written in regularized sum of squares form <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp> as:</p>
            <p>
               <display-formula id="M4">
                  <graphic file="1753-6561-5-S3-S11-i6.gif"/>
               </display-formula>
            </p>
            <p>where</p>
            <p>
               <display-formula id="M5">
                  <graphic file="1753-6561-5-S3-S11-i7.gif"/>
               </display-formula>
            </p>
            <p>is an &#8220;<it>&#949;</it>-insensitive&#8221; error measure, ignoring errors less than <it>&#949;</it>, <it>&#955;</it> is a positive constant that controls the trade-off between the approximation error and the amount up to which deviations larger than <it>&#949;</it> are tolerated to get solutions for the SVM regression problem, <it>y</it> is a quantitative response and <inline-formula><graphic file="1753-6561-5-S3-S11-i8.gif"/></inline-formula> denotes the norm under a Hilbert space. The SVM optimization procedure produces solution functions of the form <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>:</p>
            <p>
               <display-formula id="M6">
                  <graphic file="1753-6561-5-S3-S11-i9.gif"/>
               </display-formula>
            </p>
            <p>
               <display-formula>
                  <graphic file="1753-6561-5-S3-S11-i10.gif"/>
               </display-formula>
            </p>
            <p>where <inline-formula><graphic file="1753-6561-5-S3-S11-i11.gif"/></inline-formula> are positive weights given to each observation and estimated from the data and the inner product kernel <it>K</it>(<it>x<sub>i</sub></it>,<it>x<sub>j</sub></it>) is an <it>N</it> &#215; <it>N</it> symmetric and positive definite matrix <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Typically only a subset of <inline-formula><graphic file="1753-6561-5-S3-S11-i12.gif"/></inline-formula> are nonzero, and the associated observations are called support vectors, hence the name support vector machines. Since the solution depends on the input values only through the inner products <it>K</it>(<it>x<sub>i</sub></it>,<it>x<sub>j</sub></it>), a flexible fitting is achieved by transforming the cross-products using the kernel function (<it>K</it>(<it>x<sub>i</sub></it>,<it>x<sub>j</sub></it>)) that alters how two observations are related to each other.</p>
            <p>We used the <it>&#949;</it>-insensitive SVM regression with a linear kernel to predict GEBVs in the R package <it>e1071</it> <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> with an insensitivity zone of <it>&#949;</it> = 10 and a regularization (cost) parameter (<it>&#955;</it> &gt; 0) of <it>&#955;</it> = 0.001 determined by grid search.</p>
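            <p>The epsilon-insensitive error measure and the regularized criterion can be sketched numerically. Because the formulas above are reproduced as images, the sketch assumes the standard textbook form, in which the summed losses are added to half the regularization constant times the squared norm of the weights; the function names and toy values are hypothetical.</p>

```python
def eps_insensitive_loss(residual, eps):
    """Zero inside the insensitivity zone, linear outside it."""
    return max(0.0, abs(residual) - eps)


def svm_objective(y, fitted, beta, eps, lam):
    """Summed losses plus (lam / 2) times the squared norm of beta.

    Assumed standard form; the exact scaling used by a given software
    package may differ.
    """
    loss = sum(eps_insensitive_loss(yi - fi, eps) for yi, fi in zip(y, fitted))
    penalty = 0.5 * lam * sum(b * b for b in beta)
    return loss + penalty


# Toy values: only the third residual lies outside the zone of width 10.
y = [0.5, -3.0, 12.0]
fitted = [0.0, 0.0, 0.0]
losses = [eps_insensitive_loss(yi - fi, 10.0) for yi, fi in zip(y, fitted)]
objective = svm_objective(y, fitted, beta=[1.0, 2.0], eps=10.0, lam=0.001)
print(losses)  # [0.0, 0.0, 2.0]
```

            <p>Observations whose residuals fall inside the insensitivity zone contribute nothing to the criterion, which is why typically only a subset of the observations end up as support vectors.</p>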
         </sec>
         <sec>
            <st>
               <p>Assessing prediction performance</p>
            </st>
            <p>We used 5-fold cross-validation and the Pearson correlation between the simulated values and predicted GEBVs from the validation set and between the predicted and true breeding values (TBVs) for the non-phenotyped individuals constituting the fifth generation to quantify the predictive accuracy of each method. The training and validation sets respectively contained 60 and 15 crosses and encompassed all phenotyped individuals except the 20 founders.</p>
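            <p>The accuracy measure can be sketched as a Pearson correlation computed within each validation fold. The sketch below is an assumption-laden illustration: it assigns individuals to folds at random, whereas the folds in this study were formed from crosses, and the learner, function names and toy data are hypothetical.</p>

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def k_fold_indices(n, k, seed):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validate(fit, X, y, k=5, seed=0):
    """Return one validation-set correlation per fold.

    fit(X_train, y_train) must return a function that predicts one row.
    """
    accuracies = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        predicted = [predict(X[i]) for i in fold]
        accuracies.append(pearson(predicted, [y[i] for i in fold]))
    return accuracies


def fit_single_snp(X_train, y_train):
    """Toy learner: least-squares regression on the first SNP only."""
    x = [row[0] for row in X_train]
    mx, my = sum(x) / len(x), sum(y_train) / len(y_train)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y_train))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return lambda row: my + slope * (row[0] - mx)


# Toy data: one SNP with an additive effect plus Gaussian noise.
rng = random.Random(42)
X = [[rng.choice([-1, 0, 1])] for _ in range(100)]
y = [1.5 * row[0] + rng.gauss(0, 1) for row in X]
accuracies = cross_validate(fit_single_snp, X, y)
print(len(accuracies), sum(accuracies) / len(accuracies))
```

            <p>Reporting the mean and range of the fold-wise correlations, as in Table 1, summarizes both the typical accuracy and its variability across validation sets.</p>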
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>The correlations between the simulated values and predicted GEBVs indicated better performance for boosting and SVMs than for RF (Table <tblr tid="T1">1</tblr>). The correlations between the predicted and true breeding values (TBVs) for the non-phenotyped individuals were also highest for boosting. These accuracies were comparable with that for RR-BLUP (Table <tblr tid="T1">1</tblr>). Although boosting and SVMs apparently outperformed RF, SVMs were computationally intensive, especially the grid search for tuning their parameters.</p>
         <tbl id="T1"><title><p>Table 1</p></title><caption><p/></caption><tblbdy cols="11">
      <r>
         <c ca="left">
            <p>CV/TBV</p>
         </c>
         <c ca="left" cspan="2">
            <p>Sample size</p>
         </c>
         <c ca="left" cspan="2">
            <p>Random Forests</p>
         </c>
         <c ca="left" cspan="2">
            <p>Boosting</p>
         </c>
         <c ca="left" cspan="2">
            <p>Support Vector Machines</p>
         </c>
         <c ca="left" cspan="2">
            <p>Ridge Regression BLUP</p>
         </c>
      </r>
      <r>
         <c cspan="11">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p/>
         </c>
         <c ca="left">
            <p>Mean</p>
         </c>
         <c ca="left">
            <p>Range</p>
         </c>
         <c ca="left">
            <p>Mean</p>
         </c>
         <c ca="left">
            <p>Range</p>
         </c>
         <c ca="left">
            <p>Mean</p>
         </c>
         <c ca="left">
            <p>Range</p>
         </c>
         <c ca="left">
            <p>Mean</p>
         </c>
         <c ca="left">
            <p>Range</p>
         </c>
         <c ca="left">
            <p>Mean</p>
         </c>
         <c ca="left">
            <p>Range</p>
         </c>
      </r>
      <r>
         <c cspan="11">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>CV</p>
         </c>
         <c ca="left">
            <p>439</p>
         </c>
         <c ca="left">
            <p>416-514</p>
         </c>
         <c ca="left">
            <p>0.466</p>
         </c>
         <c ca="left">
            <p>0.392-0.534</p>
         </c>
         <c ca="left">
            <p>0.503</p>
         </c>
         <c ca="left">
            <p>0.431-0.567</p>
         </c>
         <c ca="left">
            <p>0.503</p>
         </c>
         <c ca="left">
            <p>0.432-0.567</p>
         </c>
         <c ca="left">
            <p>0.530</p>
         </c>
         <c ca="left">
            <p>0.451-0.620</p>
         </c>
      </r>
      <r>
         <c cspan="11">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>TBV</p>
         </c>
         <c ca="left">
            <p>900</p>
         </c>
         <c ca="left">
            <p/>
         </c>
         <c ca="left">
            <p>0.483</p>
         </c>
         <c ca="left">
            <p/>
         </c>
         <c ca="left">
            <p>0.547</p>
         </c>
         <c ca="left">
            <p/>
         </c>
         <c ca="left">
            <p>0.497</p>
         </c>
         <c ca="left">
            <p/>
         </c>
         <c ca="left">
            <p>0.607</p>
         </c>
         <c ca="left">
            <p/>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>Predictive accuracies of random forests, boosted regression trees, epsilon support vector machines and RR-BLUP, expressed as the Pearson correlation between GEBVs and observed values from the 5-fold cross-validation (CV) and between GEBVs and TBV for non-phenotyped individuals (TBV).</p>
   </tblfn></tbl>
         <p>RF produced reasonable importance rankings of the SNPs (Figure <figr fid="F1">1</figr> and Figure <figr fid="F2">2</figr>), which can be used to pre-screen promising markers for further testing.</p>
         <fig id="F1"><title><p>Figure 1</p></title><caption><p><b>Importance ranking of the 10031 SNP markers by random forest using percent increase in mean squared error.</b> Positions of the simulated additive (triangle), epistatic (circle) and imprinted (diamond) QTLs are indicated on each chromosome.</p></caption><text>
   <p><b>Importance ranking of the 10031 SNP markers by random forest using percent increase in mean squared error.</b> Positions of the simulated additive (triangle), epistatic (circle) and imprinted (diamond) QTLs are indicated on each chromosome.</p>
</text><graphic file="1753-6561-5-S3-S11-1"/></fig>
         <fig id="F2"><title><p>Figure 2</p></title><caption><p><b>Importance ranking of the 10031 SNP markers by random forest using tree node impurity.</b> Positions of the simulated additive (triangle), epistatic (circle) and imprinted (diamond) QTLs are indicated on each chromosome.</p></caption><text>
   <p><b>Importance ranking of the 10031 SNP markers by random forest using tree node impurity.</b> Positions of the simulated additive (triangle), epistatic (circle) and imprinted (diamond) QTLs are indicated on each chromosome.</p>
</text><graphic file="1753-6561-5-S3-S11-2"/></fig>
         <p>The two ensemble methods can accommodate complex relationships and interactions (epistasis), which is a potential advantage, but the simulated data did not display many such interactions. A few simulated interacting SNPs with large effects were ranked highly but not top-ranked by RF, possibly because RF and boosting had to randomly subsample the 10031 predictors. Thus, it may happen that the SNPs closest to a QTL are not sampled sufficiently frequently, so that the signal of the QTL is captured by other, more distant SNPs. Consequently, the signal of a QTL gets blurred relative to classical QTL mapping approaches, which always scan all the markers. This may be one reason why these methods may not perform as well as some much simpler competitors (e.g., RR-BLUP, BayesB). Nevertheless, for data with complex traits controlled by many genes that show epistatic interactions, the machine learning methods hold much promise and may even outperform BLUP. Not surprisingly, Moser et al. <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> found the accuracy of SVMs to be the highest among five methods (including BLUP) used to predict GEBVs of dairy bulls from empirical data.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>Predictive accuracies of all three methods were remarkably similar, but boosting and SVMs performed somewhat better than RF. Although boosting was only slightly better than the other methods, it holds perhaps the greatest promise for GS because of its wide versatility, allowing it to assume simpler, faster and more interpretable forms, such as componentwise boosting, able to incorporate automatic predictor selection.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors&#8217; contributions</p>
         </st>
         <p>JOO conceived the study, conducted the statistical analysis and drafted the manuscript. HPP participated in discussions, helped refine the manuscript and oversaw the project. TSS participated in discussions, data preparation and analysis, and writing of the manuscript. All the authors read and approved the manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The German Federal Ministry of Education and Research (BMBF) funded this research within the AgroClustEr &#8220;Synbreed &#8211; Synergistic plant and animal breeding&#8221; (Grant ID: 0315526).</p>
            <p>This article has been published as part of <it>BMC Proceedings</it> Volume 5 Supplement 3, 2011: Proceedings of the 14th QTL-MAS Workshop. The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1753-6561/5?issue=S3</url>.</p>
         </sec>
      </ack>
      <refgrp><bibl id="B1"><title><p>Prediction of total genetic value using genome-wide dense marker maps</p></title><aug><au><snm>Meuwissen</snm><fnm>THE</fnm></au><au><snm>Hayes</snm><fnm>BJ</fnm></au><au><snm>Goddard</snm><fnm>ME</fnm></au></aug><source>Genetics</source><pubdate>2001</pubdate><volume>157</volume><fpage>1819</fpage><lpage>1829</lpage><xrefbib><pubidlist><pubid idtype="pmcid">1461589</pubid><pubid idtype="pmpid" link="fulltext">11290733</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Random forests</p></title><aug><au><snm>Breiman</snm><fnm>L</fnm></au></aug><source>Machine Learning</source><pubdate>2001</pubdate><volume>45</volume><fpage>5</fpage><lpage>32</lpage><xrefbib><pubid idtype="doi">10.1023/A:1010933404324</pubid></xrefbib></bibl><bibl id="B3"><title><p>Classification and regression by randomForest</p></title><aug><au><snm>Liaw</snm><fnm>A</fnm></au><au><snm>Wiener</snm><fnm>M</fnm></au></aug><source>R News</source><pubdate>2002</pubdate><volume>2</volume><fpage>18</fpage><lpage>22</lpage></bibl><bibl id="B4"><title><p>A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification</p></title><aug><au><snm>Statnikov</snm><fnm>A</fnm></au><au><snm>Wang</snm><fnm>L</fnm></au><au><snm>Aliferis</snm><fnm>CF</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2008</pubdate><volume>9</volume><fpage>319</fpage><lpage>324</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-9-319</pubid><pubid idtype="pmcid">2492881</pubid><pubid idtype="pmpid" link="fulltext">18647401</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>The elements of statistical learning</p></title><aug><au><snm>Hastie</snm><fnm>TJ</fnm></au><au><snm>Tibshirani</snm><fnm>R</fnm></au><au><snm>Friedman</snm><fnm>J</fnm></au></aug><publisher>New York: Springer</publisher><edition>Second</edition><pubdate>2009</pubdate></bibl><bibl id="B6"><title><p>Pattern recognition and machine learning</p></title><aug><au><snm>Bishop</snm><fnm>CM</fnm></au></aug><publisher>New York: Springer</publisher><pubdate>2006</pubdate></bibl><bibl id="B7"><title><p>Ridge regression and extensions for genome-wide selection in maize</p></title><aug><au><snm>Piepho</snm><fnm>HP</fnm></au></aug><source>Crop Science</source><pubdate>2009</pubdate><volume>49</volume><fpage>1165</fpage><lpage>1176</lpage><xrefbib><pubid idtype="doi">10.2135/cropsci2008.10.0595</pubid></xrefbib></bibl><bibl id="B8"><title><p>Misc Functions of the Department of Statistics (e1071), TU Wien</p></title><aug><au><snm>Dimitriadou</snm><fnm>E</fnm></au><au><snm>Hornik</snm><fnm>K</fnm></au><au><snm>Leisch</snm><fnm>F</fnm></au><au><snm>Meyer</snm><fnm>D</fnm></au><au><snm>Weingessel</snm><fnm>A</fnm></au></aug><source>R package</source><note>version 1.5-24. Available at <url>http://cran.r-project.org/web/packages/e1071/</url></note></bibl><bibl id="B9"><title><p>Gbm: Generalized boosted regression models</p></title><aug><au><snm>Ridgeway</snm><fnm>G</fnm></au></aug><source>R package</source><note>version 1.6-3.1. Available at <url>http://cran.r-project.org/web/packages/gbm/</url></note></bibl><bibl id="B10"><title><p>A comparison of five methods to predict genomic breeding values of dairy bulls from genome-wide SNP markers</p></title><aug><au><snm>Moser</snm><fnm>G</fnm></au><au><snm>Tier</snm><fnm>B</fnm></au><au><snm>Crump</snm><fnm>RE</fnm></au><au><snm>Khatkar</snm><fnm>MS</fnm></au><au><snm>Raadsma</snm><fnm>HW</fnm></au></aug><source>Genet Sel Evol.</source><pubdate>2009</pubdate><volume>41</volume><fpage>56</fpage><xrefbib><pubid idtype="doi">10.1186/1297-9686-41-56</pubid></xrefbib></bibl></refgrp>
   </bm>
</art>