<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
<ui>1755-8794-4-47</ui>
<ji>1755-8794</ji>
<fm>
<dochead>Software</dochead>
<bibl>
<title><p>CNVassoc: Association analysis of CNV data using R</p></title>
<aug><au id="A1"><snm>Subirana</snm><fnm>Isaac</fnm><insr iid="I1"/><insr iid="I2"/><insr iid="I3"/><email>isubirana@imim.es</email></au>
<au id="A2"><snm>Diaz-Uriarte</snm><fnm>Ramon</fnm><insr iid="I4"/><email>rdiaz02@gmail.com</email></au>
<au id="A3"><snm>Lucas</snm><fnm>Gavin</fnm><insr iid="I2"/><email>glucas@imim.es</email></au>
<au ca="yes" id="A4"><snm>Gonzalez</snm><mi>R</mi><fnm>Juan</fnm><insr iid="I5"/><insr iid="I1"/><email>jrgonzalez@creal.cat</email></au>
</aug>
<insg>
<ins id="I1"><p>CIBER Epidemiology and Public Health (CIBERESP), Barcelona, Spain</p></ins>
<ins id="I2"><p>Cardiovascular Epidemiology &amp; Genetics group, Inflammatory and Cardiovascular Disease Programme, Institut Municipal d'Investigaci&#243; M&#232;dica (IMIM), Barcelona, Spain</p></ins>
<ins id="I3"><p>Statistics Department, University of Barcelona (UB), Barcelona, Spain</p></ins>
<ins id="I4"><p>Structural Biology and Biocomputing Programme, Spanish National Cancer Centre (CNIO), Madrid, Spain</p></ins>
<ins id="I5"><p>Center for Research in Environmental Epidemiology (CREAL), Barcelona, Spain</p></ins>
</insg>
<source>BMC Medical Genomics</source>
<issn>1755-8794</issn>
<pubdate>2011</pubdate>
<volume>4</volume>
<issue>1</issue>
<fpage>47</fpage>
<url>http://www.biomedcentral.com/1755-8794/4/47</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/1755-8794-4-47</pubid><pubid idtype="pmpid">21609482</pubid></pubidlist></xrefbib></bibl>
<history><rec><date><day>23</day><month>12</month><year>2010</year></date></rec><acc><date><day>24</day><month>5</month><year>2011</year></date></acc><pub><date><day>24</day><month>5</month><year>2011</year></date></pub></history><cpyrt><year>2011</year><collab>Subirana et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec><st><p>Abstract</p></st>
<sec><st><p>Background</p></st>
<p>Copy number variants (CNV) are a potentially important component of the genetic contribution to risk of common complex diseases. Analysis of the association between CNVs and disease requires that uncertainty in CNV copy-number calls, which can be substantial, be taken into account; failure to consider this uncertainty can lead to biased results. Therefore, there is a need to develop and use appropriate statistical tools. To address this issue, we have developed <monospace>CNVassoc</monospace>, an R package for carrying out association analysis of common copy number variants in population-based studies. This package includes functions for testing for association with different classes of response variables (e.g. class status, censored data, counts) under a series of study designs (case-control, cohort, etc) and inheritance models, adjusting for covariates. The package includes functions for inferring copy number (CNV genotype calling), but can also accept copy number data generated by other algorithms (e.g. CANARY, CGHcall, IMPUTE).</p>
</sec>
<sec><st><p>Results</p></st>
<p>Here we present a new R package, CNVassoc, that can deal with different types of CNV data arising from different platforms, such as MLPA or aCGH. Through a real data example we illustrate that our method is able to incorporate uncertainty in the association process. We also show that the package is useful for analysing imputed data, such as imputed SNPs. Through a simulation study we show that CNVassoc outperforms CNVtools in terms of computing time as well as convergence failure rate.</p>
</sec>
<sec><st><p>Conclusions</p></st>
<p>We provide a package that outperforms the existing ones in terms of modelling flexibility, power, convergence rate, ease of covariate adjustment, and requirements for sample size and signal quality. Therefore, we offer CNVassoc as a method for routine use in CNV association studies.</p>
</sec>
</sec>
</abs>
</fm>
<bdy>
<sec><st><p>Background</p></st>
<p>The proportion of variation in risk of complex diseases explained by the single nucleotide polymorphisms (SNPs) that have been discovered in recent years using the genome-wide association approach appears to be limited. This has led to the suggestion that other, possibly more complex, genetic variants could partly explain the remaining disease susceptibility. Technological advances now allow a class of genetic variants known as copy number variants (CNV) to be genotyped with increasing levels of accuracy, and several studies have recently explored the relationship between these variants and risk of complex disease <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Genotyping these kinds of complex genetic markers is still a challenge, and current laboratory techniques and platforms often produce a non-negligible percentage of errors. In order to minimise bias in the results of association studies involving CNVs, uncertainty in these copy number calls must be taken into account in the analysis. In addition, large-scale CNV genotyping projects need a tool to automate the analysis of thousands of CNVs. Here, we present <monospace>CNVassoc</monospace>, an R package <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> designed to analyze CNV data. Methodological details of the algorithms and applications implemented in <monospace>CNVassoc</monospace> are described in <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. In addition to these, other techniques, such as accounting for batch effects in inferring copy number status, or modelling other response distributions (Poisson, or Weibull for censored data), have now been incorporated into <monospace>CNVassoc</monospace>. In this application note we present an overview of the package. Additional file <supplr sid="S1">1</supplr> contains a tutorial (the vignette for the package) together with technical notes on the derivation of the likelihoods for the different models.</p>
<suppl id="S1">
<title><p>Additional file 1</p></title>
<text><p><b>User's manual</b>. <monospace>CNVassoc_manual.pdf</monospace> is the user's guide to the <monospace>CNVassoc</monospace> package. It contains detailed examples with real and simulated data that illustrate how to use the <monospace>CNVassoc</monospace> package functions.</p></text>
<file name="1755-8794-4-47-S1.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec><st><p>Implementation</p></st>
<p>We developed a set of functions to analyse copy number variants and integrated them as an R package called <monospace>CNVassoc</monospace>. We also wrote an extensive manual (the package vignette) with several examples based on real and simulated data that explain how to use the package functions and their capabilities.</p>
<p>R is a general-purpose, open source program commonly used in all types of statistical analysis. Providing the functions as an R package allows users to take advantage of R's flexibility in manipulating the input and the results when analysing CNVs with <monospace>CNVassoc</monospace>. In addition, we structured the <monospace>CNVassoc</monospace> functions and results in methods and classes to make the package easier and more intuitive to use.</p>
<sec><st><p>Software main features</p></st>
<p>To date, only one other R package, <monospace>CNVtools</monospace> <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, has been developed that can appropriately incorporate CNV copy number call uncertainty in the test for association between CNVs and disease. However, <monospace>CNVtools</monospace> has some limitations, mainly related to the fact that the copy number calling and association testing steps are combined in a single procedure. The current version of <monospace>CNVtools</monospace> <url>http://bioconductor.org</url> uses complex and computationally intensive algorithms, cannot adjust for covariates, and can only model binary and normally distributed responses. By separating these two steps, <monospace>CNVassoc</monospace> offers significant advances in terms of analytical flexibility and computational speed.</p>
</sec>
<sec><st><p>Inferring copy number status</p></st>
<p>By separating the CNV calling and association testing steps, <monospace>CNVassoc</monospace> allows the user to test for association between CNVs and disease using copy number probabilities from any source. While the use of probability data from more powerful calling algorithms such as CGHcall <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, IMPUTE <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp> or CANARY <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> is recommended, <monospace>CNVassoc</monospace> provides several tools for inferring copy number status, where necessary. For example, <monospace>CNVassoc</monospace> can fit a mixture of normal distributions to CNV signal intensity data <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, or assign copy number status by defining a set of signal intensity cut points, which might be useful when analysing probe intensity data from MLPA <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> or qPCR <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. In addition, there is an option to take batch effects into account, in order to reduce false positives and provide robust estimates, as discussed in <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
<sec><st><p>Considering batch effect</p></st>
<p>In <monospace>CNVassoc</monospace>, the batch effect has been handled in the following way:</p>
<p>Formally, the intensity signal, <it>y</it>, is assumed to follow a mixture of Gaussian distributions,</p>
<p><display-formula><graphic file="1755-8794-4-47-i1.gif"/></display-formula></p>
<p>where <it>&#981; </it>is the Gaussian density function, <it>&#956;<sub>cb </sub></it>and <it>&#963;<sub>cb </sub></it>are the mean and standard deviation, respectively, of the intensity signal for individuals with <it>c </it>copies in the <it>b</it>-th batch, and <it>w<sub>c </sub></it>is the proportion of individuals with <it>c </it>copies in the population. Note that the mean and standard deviation can vary not only between copy number statuses but also between batches, whereas the copy number status prevalences (<it>w<sub>c</sub></it>) cannot. If <it>&#956;<sub>cb </sub></it>and <it>&#963;<sub>cb </sub></it>vary between batches and the batches are associated with the disease or response, then a batch effect exists by definition, and it can lead to false associations if it is not taken into account <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
<p>In <monospace>CNVassoc</monospace>, batch-specific means, standard deviations and prevalence estimates are calculated separately using the data from each batch. The overall prevalence estimates are then obtained by averaging the batch-specific prevalences:</p>
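<p>As an illustration, a mixture of this form can be evaluated directly in base R. The following is a minimal sketch for a single batch, not the code used inside <monospace>CNVassoc</monospace>: the prevalences, means, standard deviations and the intensity value are invented for the example, and the posterior probability of each copy number is obtained with Bayes' rule.</p>
<p><monospace>&gt; <it>w &lt;- c(0.25, 0.50, 0.25)  # assumed prevalences for 0, 1 and 2 copies</it></monospace></p>
<p><monospace>&gt; <it>mu &lt;- c(0, 1, 2)  # assumed component means in this batch</it></monospace></p>
<p><monospace>&gt; <it>sigma &lt;- rep(0.4, 3)  # assumed component standard deviations</it></monospace></p>
<p><monospace>&gt; <it>y &lt;- 0.8  # hypothetical observed intensity</it></monospace></p>
<p><monospace>&gt; <it>comp &lt;- w * dnorm(y, mean = mu, sd = sigma)  # weighted Gaussian densities, one per copy number</it></monospace></p>
<p><monospace>&gt; <it>f.y &lt;- sum(comp)  # mixture density at y</it></monospace></p>
<p><monospace>&gt; <it>post &lt;- comp / f.y  # posterior probability of each copy number</it></monospace></p>
<p>With these made-up values, <monospace>post</monospace> sums to 1 and its largest entry corresponds to one copy, because 0.8 lies closest to the mean of that component.</p>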
<p><display-formula><graphic file="1755-8794-4-47-i2.gif"/></display-formula></p>
<p>where <it>n<sub>b </sub></it>is the number of sample individuals in the <it>b</it>-th batch, <it>B </it>is the total number of batches in the sample, and <it>n </it>is the total number of individuals in the sample.</p>
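<p>The averaging step can be sketched in base R as follows. This assumes that the average is weighted by batch size, as the definitions of <it>n<sub>b </sub></it>and <it>n </it>suggest; the batch sizes and batch-specific prevalences below are invented for illustration and are not taken from any data set.</p>
<p><monospace>&gt; <it>n.b &lt;- c(200, 300, 151)  # hypothetical batch sizes</it></monospace></p>
<p><monospace>&gt; <it>w.cb &lt;- rbind(c(0.12, 0.42, 0.46), c(0.14, 0.41, 0.45), c(0.13, 0.43, 0.44))  # rows: batches; columns: 0, 1, 2 copies</it></monospace></p>
<p><monospace>&gt; <it>w.c &lt;- colSums(n.b * w.cb) / sum(n.b)  # batch-size-weighted average per copy number</it></monospace></p>
<p>Because each row of <monospace>w.cb</monospace> sums to 1, the pooled prevalences in <monospace>w.c</monospace> also sum to 1.</p>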
</sec>
</sec>
<sec><st><p>Improved association test</p></st>
<p>To incorporate CNV copy number uncertainty in the association test, <monospace>CNVassoc</monospace> uses a simpler model formulation than that of <monospace>CNVtools</monospace>. This allows us to use the faster Newton-Raphson procedure, which yields not only the effect estimate for the CNV, but also its confidence interval.</p>
</sec>
<sec><st><p>Adjustment for covariates</p></st>
<p><monospace>CNVassoc</monospace> can fit association models adjusted for covariates (age, gender, smoking, etc.), which may be particularly important where it is necessary to adjust for population stratification <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
</sec>
<sec><st><p>Response phenotypes</p></st>
<p><monospace>CNVassoc</monospace> can be used to analyse dichotomous (Binomial), count (Poisson), or continuous (Gaussian) response phenotypes, as well as data from cohort studies (Weibull).</p>
</sec>
<sec><st><p>Inheritance models</p></st>
<p><monospace>CNVassoc</monospace> can perform association analysis under a codominant (additive) model, which assumes a constant effect on phenotype per unit change in copy number, or under a model-free design, which treats each copy number as an independent category.</p>
</sec>
<sec><st><p>Analysis of multiple CNVs</p></st>
<p>To perform association testing of multiple CNVs with greater computational efficiency, a function called <monospace>multiCNVassoc</monospace> has been implemented. When multiple processors are available, it can parallelize association tests using the Snow package <url>http://www.sfu.ca/~sblay/R/snow.html</url>. An example of association tests involving several CNVs is shown in Section 3 of Additional file <supplr sid="S1">1</supplr>, where data from a CGH array are analysed.</p>
</sec>
<sec><st><p>Computational Efficiency</p></st>
<p>Using the same sample sizes and probe signal intensity distributions as used in <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, we performed a simulation study in order to compare the performance of the methods implemented in <monospace>CNVassoc</monospace> and <monospace>CNVtools</monospace>. We observed that both methods performed well, but we note that <monospace>CNVassoc</monospace> has a number of important advantages over <monospace>CNVtools</monospace> in terms of computational speed and robustness in situations of limited sample sizes.</p>
</sec>
<sec><st><p>Performing association tests</p></st>
<p>First, an object of class <monospace>cnv</monospace> must be created, either from signal intensities using <monospace>CNVassoc</monospace> or from copy number probabilities produced by other algorithms. Then, an association test between the CNV and disease can be performed using the <monospace>CNVassoc</monospace> function, which returns an object of class '<monospace>CNVassoc</monospace>'. The associated <monospace>print</monospace> and <monospace>summary</monospace> functions give detailed output. The <monospace>CNVtest</monospace> function computes an overall p-value to test whether a CNV is associated with the disease.</p>
</sec>
<sec><st><p>Functions to simulate CNV data</p></st>
<p>The <monospace>CNVassoc</monospace> package also implements functions to simulate CNV data. It is possible to simulate data for different types of response and study design: case-control (<monospace>simCNVdataCaseCon</monospace>), cohort with binary response (<monospace>simCNVdataBinary</monospace>), counting process with Poisson-distributed response (<monospace>simCNVdataPois</monospace>), quantitative normally distributed response (<monospace>simCNVdataNorm</monospace>) and time-to-event with right-censored Weibull-distributed response (<monospace>simCNVdataWeibull</monospace>).</p>
</sec>
<sec><st><p>Association analysis on imputed SNPs</p></st>
<p>It is also possible to analyse the association between imputed SNPs and a response. Taking the genotype probabilities obtained from any software capable of imputing SNPs, such as IMPUTE <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>, association analysis for case-control, cohort, quantitative or count responses can be performed with <monospace>CNVassoc</monospace>. In section 5 of Additional file <supplr sid="S1">1</supplr> we show in detail how to analyse a data set, downloadable from the SNPTEST website, that contains the probabilities of the imputed genotypes of several SNPs in a set of cases and controls.</p>
</sec>
</sec>
<sec><st><p>Results and Discussion</p></st>
<p>In this section we show the results obtained when inferring copy number status and performing association analysis on a real data set including 360 cases and 291 controls (data described in <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>). The data contain peak intensities for two genes arising from an MLPA assay. Using this example, we present the main <monospace>CNVassoc</monospace> functions and illustrate how to use them to infer copy number status and estimate its association with case-control status.</p>
<p>A more detailed description of all these analyses and others (imputed SNPs, aCGH data, and other phenotype distributions: Poisson, Weibull and normal) can be found in Additional file <supplr sid="S1">1</supplr>.</p>
<sec><st><p>Inferring copy number status</p></st>
<p>Before association analysis, copy number status must be inferred. To do so, the function <monospace>cnv</monospace> is used. In this subsection, gene 2 from the MLPA data example is used. This data set can be loaded from the <monospace>CNVassoc</monospace> package.</p>
<p><monospace>&gt; <it>library(CNVassoc)</it></monospace></p>
<p><monospace>&gt; <it>data(dataMLPA)</it></monospace></p>
<p><monospace>&gt; <it>CNV &lt;- cnv(x = dataMLPA$Gene2, threshold.0 = 0.01, mix.method = "mixdist")</it></monospace></p>
<p>The peak intensities of gene 2 are assumed to follow a mixture of normal distributions, and the method used to estimate this distribution is specified by the <monospace>mix.method</monospace> argument. When threshold.0 = 0.01, all individuals with peak intensities lower than 0.01 are assumed to carry 0 copies. The CNV object is of class cnv, which can be printed and plotted (Figure <figr fid="F1">1</figr>).</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Plot of a cnv object generated from CNV signal intensity data</p></caption><text>
   <p><b>Plot of a cnv object generated from CNV signal intensity data</b>.</p>
</text><graphic file="1755-8794-4-47-1" hint_layout="single"/></fig>
<p><monospace>&gt; <it>CNV</it></monospace></p>
<p><monospace>Inferred copy number variant by a quantitative signal</monospace></p>
<p>&#160;&#160;&#160;<monospace>Method: function mix {package: mixdist}</monospace></p>
<p><monospace>-. Number of individuals: 651</monospace></p>
<p><monospace>-. Copies 0, 1, 2</monospace></p>
<p><monospace>-. Estimated means: 0, 0.2435, 0.4469</monospace></p>
<p><monospace>-. Estimated variances: 0, 0.0041, 0.0095</monospace></p>
<p><monospace>-. Estimated proportions: 0.1306, 0.4187, 0.4507</monospace></p>
<p><monospace>-. Goodness-of-fit test: p-value = 0.4887659</monospace></p>
<p><monospace>-. Note: number of classes has been selected using the best BIC</monospace></p>
<p><monospace>&gt; <it>plot(CNV)</it></monospace></p>
<p>A measure that quantifies the amount of uncertainty in the CNV calling estimation can be computed using the function getQualityScore. Various measures are available; the following is an example of how to obtain the quality score (uncertainty measure) described in the <monospace>CNVtools</monospace> paper <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>:</p>
<p><monospace>&gt; <it>getQualityScore(CNV, type = "CNVtools")</it></monospace></p>
<p><monospace>--CNVtools Quality Score: 3.057171</monospace></p>
<p>In some cases, it may be preferable to infer copy number status using another algorithm that is not implemented in <monospace>CNVassoc</monospace>, e.g. if the probe signal intensities do not follow a mixture of normal distributions. A matrix of copy number probabilities obtained from other algorithms can be used as input for the cnv function to create a cnv class object, which can then be used to perform association analysis. It is also possible to take suspected batch effects in the signal intensity distributions into account by specifying the batch variable using the batch argument in the cnv function. This is important in order to avoid false positives in the subsequent association model estimation, as suggested in <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. A more detailed explanation and example of this issue can be found in section 4.2 of Additional file <supplr sid="S1">1</supplr>.</p>
</sec>
<sec><st><p>Performing association models</p></st>
<p>To carry out association analysis between CNV and disease, the function <monospace>CNVassoc</monospace> is used. This function incorporates copy number call uncertainty by using a latent class model as described in <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. The response variable (disease) can be binary, quantitative (normally distributed), a count, or a time to event (Weibull distributed). Either an additive or a model-free pattern of inheritance can be analysed. The result returned by the <monospace>CNVassoc</monospace> function is an object that can be printed and summarized, and its structure is very similar to that returned by other well-known R functions such as <monospace>glm</monospace>.</p>
<p>Here, we continue with the same MLPA data, taking the CNV object for gene 2 from the previous section. To fit a logistic regression model with case-control status as the response and CNV copy number as the predictor, treating each copy number status as a separate category (the model-free option), we type</p>
<p><monospace>&gt; <it>mod &lt;- CNVassoc(casco ~ CNV, data = dataMLPA)</it></monospace></p>
<p><monospace>&gt; <it>summary(mod)</it></monospace></p>
<p><monospace>Call:</monospace></p>
<p><monospace>CNVassoc(formula = casco ~ CNV, data = dataMLPA)</monospace></p>
<p><monospace>Deviance: 876.396</monospace></p>
<p><monospace>Number of parameters: 3</monospace></p>
<p><monospace>Number of individuals: 651</monospace></p>
<p><monospace>Coefficients:</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;OR lower.lim upper.lim&#160;&#160;&#160;&#160;&#160;SE&#160;&#160;&#160;&#160;&#160;&#160;Stat&#160;&#160;&#160;&#160;pvalue</monospace></p>
<p><monospace>CNV0&#160;&#160;&#160;1.0000</monospace></p>
<p><monospace>CNV1&#160;&#160;&#160;0.4772&#160;&#160;&#160;0.2742&#160;&#160;&#160;0.8304&#160;&#160;&#160;0.2827&#160;&#160;&#160;-2.6172&#160;&#160;&#160;&#160;&#160;&#160;0.009</monospace></p>
<p><monospace>CNV2&#160;&#160;&#160;0.3169&#160;&#160;&#160;0.1834&#160;&#160;&#160;0.5477&#160;&#160;&#160;0.2791&#160;&#160;&#160;-4.1169&#160;&#160;&#160;3.84e-05</monospace></p>
<p><monospace>(Dispersion parameter for binomial family taken to be 1)</monospace></p>
<p><monospace>Covariance between coefficients:</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;CNV0&#160;&#160;&#160;&#160;&#160;CNV1&#160;&#160;&#160;&#160;&#160;CNV2</monospace></p>
<p><monospace>CNV0&#160;&#160;&#160;0.0613&#160;&#160;0.0000&#160;&#160;&#160;0.0000</monospace></p>
<p><monospace>CNV1&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;0.0186&#160;&#160;-0.0032</monospace></p>
<p><monospace>CNV2&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;0.0166</monospace></p>
<p>By applying the summary function to the result, we obtain odds ratios, confidence intervals, and p-values for every copy number status with respect to the reference copy number category.</p>
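<p>The columns of the coefficient table are linked in a simple way. As a rough check in base R, assuming Wald-type intervals computed on the log odds scale (an assumption that agrees with the printed values up to rounding), the figures for one copy can be recovered from the odds ratio and its standard error alone:</p>
<p><monospace>&gt; <it>or &lt;- 0.4772  # odds ratio copied from the CNV1 row</it></monospace></p>
<p><monospace>&gt; <it>se &lt;- 0.2827  # standard error copied from the CNV1 row</it></monospace></p>
<p><monospace>&gt; <it>beta &lt;- log(or)  # log odds ratio</it></monospace></p>
<p><monospace>&gt; <it>z &lt;- beta / se  # Wald statistic, close to the Stat column</it></monospace></p>
<p><monospace>&gt; <it>ci &lt;- exp(beta + c(-1, 1) * qnorm(0.975) * se)  # 95% confidence limits for the odds ratio</it></monospace></p>
<p><monospace>&gt; <it>p &lt;- 2 * pnorm(-abs(z))  # two-sided p-value</it></monospace></p>
<p>Small differences from the printed table are expected, because the inputs used here are already rounded to four decimal places.</p>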
<p>To compute the global CNV significance p-value, the CNVtest function can be used as follows:</p>
<p><monospace>&gt; <it>CNVtest(mod, "LRT")</it></monospace></p>
<p><monospace>----CNV Likelihood Ratio Test----</monospace></p>
<p><monospace>Chi = 18.75453 (df = 2), pvalue = 8.462633e-05</monospace></p>
<p>In this example, a Likelihood Ratio Test (LRT) is computed, comparing a model containing CNV to a model lacking CNV (i.e. a model without predictors or the null model).</p>
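<p>The reported p-value is the upper tail of a chi-square distribution evaluated at the LRT statistic. A quick base R check, using only the statistic and the degrees of freedom printed in the output:</p>
<p><monospace>&gt; <it>lrt &lt;- 18.75453  # LRT statistic copied from the output</it></monospace></p>
<p><monospace>&gt; <it>pchisq(lrt, df = 2, lower.tail = FALSE)  # upper-tail probability</it></monospace></p>
<p><monospace>&gt; <it>exp(-lrt / 2)  # closed form, valid only for 2 degrees of freedom</it></monospace></p>
<p>Both expressions give approximately 8.46e-05, in agreement with the <monospace>CNVtest</monospace> output.</p>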
<p>Using the <monospace>CNVassoc</monospace> function it is possible to change the inheritance model to additive (changing the model argument), or adjust for other covariates (such as age, sex, or principal components) in the formula argument in the usual way. Also, other types of response can be analysed changing the family argument. More detailed examples are in the Additional file <supplr sid="S1">1</supplr>.</p>
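<p>A hedged sketch of these options is given below. It uses only arguments that appear elsewhere in this note (<monospace>model = "add"</monospace> is the value used in the Weibull example). The covariate name <monospace>cov</monospace> is an assumption: it stands for whatever adjustment variable is available in the data frame and should be replaced accordingly.</p>
<p><monospace>&gt; <it>library(CNVassoc)</it></monospace></p>
<p><monospace>&gt; <it>data(dataMLPA)</it></monospace></p>
<p><monospace>&gt; <it>CNV &lt;- cnv(x = dataMLPA$Gene2, threshold.0 = 0.01, mix.method = "mixdist")  # same call as in the previous subsection</it></monospace></p>
<p><monospace>&gt; <it>mod.add &lt;- CNVassoc(casco ~ CNV, data = dataMLPA, model = "add")  # additive effect per copy</it></monospace></p>
<p><monospace>&gt; <it>mod.adj &lt;- CNVassoc(casco ~ CNV + cov, data = dataMLPA, model = "add")  # cov: assumed covariate column</it></monospace></p>
<p>Under the additive model a single trend coefficient per copy is expected instead of one odds ratio per copy number status.</p>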
</sec>
<sec><st><p>Response phenotypes: Weibull</p></st>
<p>In this section, we illustrate how to analyse a time-to-event response variable (Weibull distributed) using simulated data generated with the function simCNVdataWeibull. In the following example, a CNV has been generated with 0, 1 and 2 possible copies with probabilities of 25%, 50% and 25% respectively, with an intensity signal standard deviation of 0.4 for each copy number status, and means of 0, 1 and 2 respectively. The response variable has been simulated under a Weibull distribution with shape parameter equal to 1 and disease incidence equal to 0.05 (per person-year) among the population with zero copies (reference). The proportion of observed events (non-censored) was set to 10%. Finally, these data have been generated assuming an additive CNV effect with a hazard ratio of 1.5 per copy.</p>
<p><monospace>&gt; <it>set.seed(123456)</it></monospace></p>
<p><monospace>&gt; <it>n &lt;- 5000</it></monospace></p>
<p><monospace>&gt; <it>w &lt;- c(0.25, 0.5, 0.25)</it></monospace></p>
<p><monospace>&gt; <it>mu.surrog &lt;- 0:2</it></monospace></p>
<p><monospace>&gt; <it>sd.surrog &lt;- rep(0.4, 3)</it></monospace></p>
<p><monospace>&gt; <it>hr &lt;- 1.5</it></monospace></p>
<p><monospace>&gt; <it>incid0 &lt;- 0.05</it></monospace></p>
<p><monospace>&gt; <it>lambda &lt;- c(incid0, incid0 * hr, incid0 * hr^2)</it></monospace></p>
<p><monospace>&gt; <it>shape &lt;- 1</it></monospace></p>
<p><monospace>&gt; <it>scale &lt;- lambda^(-1/shape)</it></monospace></p>
<p><monospace>&gt; <it>perc.obs &lt;- 0.1</it></monospace></p>
<p><monospace>&gt; <it>time.cens &lt;- qweibull(perc.obs, mean(shape), mean(scale))</it></monospace></p>
<p><monospace>&gt; <it>dsim &lt;- simCNVdataWeibull(n, mu.surrog, sd.surrog, w, lambda,</it></monospace></p>
<p><monospace>+&#160;&#160;&#160;<it>shape, time.cens)</it></monospace></p>
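<p>The line <monospace>scale &lt;- lambda^(-1/shape)</monospace> converts a rate into the scale parameter used by the Weibull functions in R, so that the survival function at time t equals exp(-lambda * t^shape). This can be verified in base R with the short sketch below; the names ending in <monospace>.demo</monospace> and their values are introduced here only for illustration.</p>
<p><monospace>&gt; <it>shape.demo &lt;- 1.5  # illustrative shape</it></monospace></p>
<p><monospace>&gt; <it>lambda.demo &lt;- 0.05  # illustrative rate</it></monospace></p>
<p><monospace>&gt; <it>t.demo &lt;- 3  # illustrative time point</it></monospace></p>
<p><monospace>&gt; <it>scale.demo &lt;- lambda.demo^(-1/shape.demo)  # same conversion as in the simulation code</it></monospace></p>
<p><monospace>&gt; <it>pweibull(t.demo, shape.demo, scale.demo, lower.tail = FALSE)  # survival probability at t.demo</it></monospace></p>
<p><monospace>&gt; <it>exp(-lambda.demo * t.demo^shape.demo)  # same value from the closed form</it></monospace></p>
<p>With shape equal to 1, as in the simulation above, the Weibull distribution reduces to the exponential distribution and lambda is simply the event rate per unit of time.</p>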
<p>Once the CNV data and phenotype have been generated, inferring copy number status and fitting the association model are performed in the following two steps:</p>
<p indent="1">(1) Inferring copy number status, as for case-control studies:</p>
<p indent="1"><monospace>&gt; <it>CNV &lt;- cnv(dsim$surrog, mix = "mclust")</it></monospace></p>
<p indent="1"><monospace>&gt; <it>attr(CNV, "num.copies") &lt;- 0:2</it></monospace></p>
<p indent="1">Note that 3 copy number statuses have been selected by the BIC criterion. By default, 1, 2 and 3 copies are assigned. The number of copies for each status can be changed to 0, 1 and 2 respectively by modifying the num.copies attribute.</p>
<p indent="1">(2) Testing for association between CNV and time-to-event, specifying the family argument as "weibull":</p>
<p indent="1"><monospace>&gt; <it>fit &lt;- CNVassoc(Surv(resp, cens) ~ CNV, data = dsim, family = "weibull",</it></monospace></p>
<p indent="1"><monospace>+&#160;&#160;&#160;<it>model = "add")</it></monospace></p>
<p indent="1"><monospace>&gt; <it>coef(summary(fit))</it></monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;HR lower.lim&#160;&#160;upper.lim&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;SE&#160;&#160;&#160;&#160;&#160;&#160;stat&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;pvalue</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;trend&#160;&#160;&#160;1.385556&#160;&#160;&#160;1.205619&#160;&#160;&#160;1.592348&#160;&#160;&#160;0.07097498&#160;&#160;&#160;4.594595&#160;&#160;&#160;4.335896e-06</monospace></p>
<p indent="1">Note that hazard ratios (HR) are displayed instead of odds ratios. In this case, an additive CNV effect has been assumed when fitting the association model.</p>
</sec>
<sec><st><p>Computational Efficiency</p></st>
<p>In this section, we compare the performance of <monospace>CNVassoc</monospace> in terms of speed and convergence rate to that of <monospace>CNVtools</monospace>, which is the only other tool currently available for performing CNV association analysis while correctly taking copy number uncertainty into account. Simulated case-control data were generated for different sample sizes (500 cases and 500 controls; 2,000 cases and 2,000 controls) and different degrees of call uncertainty, from very little uncertainty (<it>Q </it>= 6) to a moderate-high degree of uncertainty (<it>Q </it>= 3). A single CNV marker was simulated using 1,000 iterations (simulations) under each of the described scenarios. In each simulation, univariate probe signal intensities (similar to MLPA) were generated from a Gaussian mixture distribution, and copy number status was inferred from them. An association model was then fitted using the proposed method (latent class model). The uncertainty measure, <it>Q</it>, was proposed by <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> (see page 3); values of <it>Q </it>below 3 indicate moderate-high uncertainty, which must be taken into account in the association analysis, while values of <it>Q </it>greater than 4.5 or 5 indicate that uncertainty is almost negligible. Table <tblr tid="T1">1</tblr> shows the number of times model estimation fails using <monospace>CNVassoc</monospace> and <monospace>CNVtools</monospace> under these various scenarios. <monospace>CNVassoc</monospace> converges in all simulations, except when sample size is small and uncertainty is high. When sample size is large (2,000 cases and 2,000 controls), <monospace>CNVassoc</monospace> converges in all situations, while <monospace>CNVtools</monospace> fails in some simulations when uncertainty is high. When sample size is moderate-low (500 cases and 500 controls), <monospace>CNVassoc</monospace> converges almost always, except when uncertainty is high (<it>Q </it>&lt; 3.5), while <monospace>CNVtools</monospace> fails in some simulations even when the degree of uncertainty is low (<it>Q </it>= 6), starts to fail in the majority of simulations when uncertainty is moderate (<it>Q </it>&lt; 4), and performs even worse when uncertainty is high.</p>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Number of simulations (out of 500) that failed to converge using CNVassoc and CNVtools, by copy number calling uncertainty <it>Q </it>and number of cases <it>N</it>.</p></caption><tblbdy cols="5">
      <r>
         <c>
            <p/>
         </c>
         <c cspan="2" ca="center">
            <p>
               <b><it>N </it>= 2000</b>
            </p>
         </c>
         <c cspan="2" ca="center">
            <p>
               <b><it>N </it>= 500</b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c cspan="2">
            <hr/>
         </c>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>
                  <it>Q</it>
               </b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>CNVassoc</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>CNVtools</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>CNVassoc</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>CNVtools</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>6.0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>15</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>5.5</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>20</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>5.0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>65</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>4.5</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>92</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>4.2</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>187</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>4.0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>246</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>3.7</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>294</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>3.5</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>1</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>299</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>3.2</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>13</p>
         </c>
         <c ca="center">
            <p>212</p>
         </c>
         <c ca="center">
            <p>389</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>3.0</p>
         </c>
         <c ca="center">
            <p>0</p>
         </c>
         <c ca="center">
            <p>65</p>
         </c>
         <c ca="center">
            <p>331</p>
         </c>
         <c ca="center">
            <p>400</p>
         </c>
      </r>
   </tblbdy></tbl>
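<p>As a reading aid for Table 1, the following Python sketch converts the failure counts into proportions and finds the largest <it>Q </it>at which more than half of the 500 simulations failed. The counts are transcribed from the table; the function names are hypothetical and are not part of either package.</p>

```python
# Failure counts transcribed from Table 1 (each out of 500 simulations).
Q_VALUES = [6.0, 5.5, 5.0, 4.5, 4.2, 4.0, 3.7, 3.5, 3.2, 3.0]
N_SIMULATIONS = 500
FAILURES = {
    ("CNVassoc", 2000): [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    ("CNVtools", 2000): [0, 0, 0, 0, 0, 0, 0, 1, 13, 65],
    ("CNVassoc", 500): [0, 0, 0, 0, 0, 0, 0, 0, 212, 331],
    ("CNVtools", 500): [15, 20, 65, 92, 187, 246, 294, 299, 389, 400],
}


def failure_rates(counts, n_sim=N_SIMULATIONS):
    """Convert failure counts into proportions of simulations."""
    return [c / n_sim for c in counts]


def first_majority_failure(q_values, counts, n_sim=N_SIMULATIONS):
    """Return the largest Q at which more than half of the runs failed, else None.

    Assumes q_values is sorted from high Q (low uncertainty) to low Q.
    """
    for q, c in zip(q_values, counts):
        if 2 * c > n_sim:
            return q
    return None


for (method, n_cases), counts in FAILURES.items():
    q_star = first_majority_failure(Q_VALUES, counts)
    worst = max(failure_rates(counts))
    print(f"{method}, N = {n_cases}: majority failure from Q = {q_star}, "
          f"worst failure rate = {worst:.2f}")
```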
<p>We also observed a marked difference in speed between the two procedures: when analyzing 10,000 CNVs in 2,000 cases and 2,000 controls with <it>Q </it>= 4, <monospace>CNVtools</monospace> took 1 day and 17 hours to complete the analysis, whereas <monospace>CNVassoc</monospace> took just 90 minutes; with <it>Q </it>= 3, <monospace>CNVtools</monospace> took 6 days and 16 hours, whereas <monospace>CNVassoc</monospace> took only 2 hours. More comparisons between <monospace>CNVassoc</monospace> and <monospace>CNVtools</monospace> are shown in section 4.3.1 of Additional file <supplr sid="S1">1</supplr>.</p>
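<p>The reported running times can be expressed as runtime ratios with a few lines of arithmetic. The Python sketch below uses only the times quoted in the text; the helper names are hypothetical.</p>

```python
def hours(days=0, hrs=0, minutes=0):
    """Convert a duration given in days, hours and minutes to hours."""
    return days * 24 + hrs + minutes / 60


# Running times quoted in the text (10,000 CNVs, 2,000 cases, 2,000 controls).
RUNTIMES = {
    4.0: {"CNVtools": hours(days=1, hrs=17), "CNVassoc": hours(minutes=90)},
    3.0: {"CNVtools": hours(days=6, hrs=16), "CNVassoc": hours(hrs=2)},
}


def speedup(q):
    """Ratio of CNVtools to CNVassoc running time at uncertainty level q."""
    times = RUNTIMES[q]
    return times["CNVtools"] / times["CNVassoc"]


for q in sorted(RUNTIMES, reverse=True):
    print(f"Q = {q}: CNVtools/CNVassoc runtime ratio = {speedup(q):.1f}")
```

<p>With these figures the ratio is about 27 for <it>Q </it>= 4 and 80 for <it>Q </it>= 3.</p>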
</sec>
</sec>
<sec><st><p>Conclusions</p></st>
<p>We present a new package for analyzing the association between copy number variants and disease that appropriately takes uncertainty in CNV copy number calls into account. The numerical procedure for fitting the model is simple and computationally efficient, handling thousands of CNVs in reasonable time. In addition, it is possible to adjust for covariates, which may be necessary to control for population stratification. A central feature of <monospace>CNVassoc</monospace> is that input data can come from any CNV calling algorithm that produces copy number probabilities. Note that the <monospace>CNVassoc</monospace> package can also be applied to SNPs. For instance, in the context of imputed SNPs (e.g., IMPUTE <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>, BIMBAM <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, MACH1 <url>http://www.sph.umich.edu/csg/abecasis/MACH/</url>, etc.), the genotype probability estimates produced by these programs can easily be incorporated into our functions. We intend to continue developing the package, and expect to incorporate CNV * non-genetic predictor interactions, and CNV * CNV interactions, in the near future.</p>
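<p>To illustrate how per-genotype probabilities from an imputation program could enter an association model, the following Python sketch evaluates a log-likelihood in which each individual's contribution is averaged over the possible genotypes, weighted by their probabilities. It assumes a logistic model with an additive effect per allele copy; the function names and toy data are hypothetical, and the sketch neither calls nor reproduces the <monospace>CNVassoc</monospace> R functions.</p>

```python
import math


def logistic(eta):
    """Inverse logit function."""
    return 1.0 / (1.0 + math.exp(-eta))


def marginal_loglik(y, probs, intercept, beta):
    """Log-likelihood of binary outcomes when genotypes are uncertain.

    y     : list of 0/1 outcomes (1 = case)
    probs : one list per individual, [P(g=0), P(g=1), P(g=2)]
    Each individual's likelihood is a probability-weighted average over
    genotypes (assumed additive effect: linear predictor intercept + beta * g).
    """
    total = 0.0
    for yi, weights in zip(y, probs):
        lik = 0.0
        for g, w in enumerate(weights):
            p_case = logistic(intercept + beta * g)
            lik += w * (p_case if yi == 1 else 1.0 - p_case)
        total += math.log(lik)
    return total


# Toy data, made up for illustration only.
y = [1, 0, 1, 0]
probs = [
    [0.05, 0.15, 0.80],
    [0.90, 0.08, 0.02],
    [0.10, 0.60, 0.30],
    [0.70, 0.25, 0.05],
]
print(marginal_loglik(y, probs, intercept=-1.0, beta=0.5))
```

<p>When every probability vector puts all its weight on a single genotype, this expression reduces to the ordinary logistic regression log-likelihood.</p>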
<p>In conclusion, considering the advantages in terms of modelling flexibility, power, convergence rate, ease of covariate adjustment, and requirements for sample size and signal quality, we offer <monospace>CNVassoc</monospace> as a method for routine use in CNV association studies.</p>
</sec>
<sec><st><p>Availability and requirements</p></st>
<p>1. Project name: <monospace>CNVassoc</monospace></p>
<p>2. Project home page: <url>http://www.creal.cat/jrgonzalez/software.htm</url> and <url>http://www.cran.r-project.org</url></p>
<p>3. Operating system(s): Platform independent</p>
<p>4. Programming language: R</p>
<p>5. R Dependencies: mixdist, mclust, survival</p>
<p>6. R Suggested: CGHcall, CGHregions, snow, <monospace>CNVtools</monospace></p>
<p>7. License: GPL or newer</p>
</sec>
<sec><st><p>Competing interests</p></st>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec><st><p>Authors' contributions</p></st>
<p>JRG conceived the idea of incorporating probabilities to address uncertainty in CNV association studies. IS and JRG created the R functions and the package. IS implemented some R functions to simulate CNV data. GL drafted the manuscript. IS, GL, RD-U and JRG designed, performed and interpreted the simulation studies to compare <monospace>CNVtools</monospace> and <monospace>CNVassoc</monospace>. IS, RD-U and JRG helped to draft the manuscript. All authors read and approved the final manuscript.</p>
</sec>
</bdy>
<bm>
<ack><sec><st><p>Acknowledgements</p></st>
<p>The authors would like to express their gratitude to Dave MacFarlane and Alejandro Caceres for their helpful comments and for reviewing the manuscript. This work has been supported by the Spanish Ministry of Science and Innovation (MTM2008-02457 to JRG, BIO2009-12458 to RD-U and statistical genetics network MTM2010-09526-E (subprograma MTM) to JRG, IS, GL and RD-U). GL is supported by the Juan de la Cierva Program of the Spanish Ministry of Science and Innovation.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility</p></title><aug><au><snm>Gonzalez</snm><fnm>E</fnm></au><au><snm>Kulkarni</snm><fnm>H</fnm></au><au><snm>Bolivar</snm><fnm>H</fnm></au><au><snm>Mangano</snm><fnm>A</fnm></au><au><snm>Sanchez</snm><fnm>R</fnm></au><au><snm>Catano</snm><fnm>G</fnm></au><au><snm>Nibbs</snm><fnm>RJ</fnm></au><au><snm>Freedman</snm><fnm>BI</fnm></au><au><snm>Quinones</snm><fnm>MP</fnm></au><au><snm>Bamshad</snm><fnm>MJ</fnm></au><au><snm>Murthy</snm><fnm>KK</fnm></au><au><snm>Rovin</snm><fnm>BH</fnm></au><au><snm>Bradley</snm><fnm>W</fnm></au><au><snm>Clark</snm><fnm>RA</fnm></au><au><snm>Anderson</snm><fnm>SA</fnm></au><au><snm>O&apos;Connell</snm><fnm>RJ</fnm></au><au><snm>Agan</snm><fnm>BK</fnm></au><au><snm>Ahuja</snm><fnm>SS</fnm></au><au><snm>Bologna</snm><fnm>R</fnm></au><au><snm>Sen</snm><fnm>L</fnm></au><au><snm>Dolan</snm><fnm>MJ</fnm></au><au><snm>Ahuja</snm><fnm>SK</fnm></au></aug><source>Science</source><pubdate>2005</pubdate><volume>307</volume><issue>5714</issue><fpage>1434</fpage><lpage>40</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1101160</pubid><pubid idtype="pmpid" link="fulltext">15637236</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Hereditary pancreatitis caused by triplication of the trypsinogen locus</p></title><aug><au><snm>Le Marechal</snm><fnm>C</fnm></au><au><snm>Masson</snm><fnm>E</fnm></au><au><snm>Chen</snm><fnm>JM</fnm></au><au><snm>Morel</snm><fnm>F</fnm></au><au><snm>Ruszniewski</snm><fnm>P</fnm></au><au><snm>Levy</snm><fnm>P</fnm></au><au><snm>Ferec</snm><fnm>C</fnm></au></aug><source>Nat Genet</source><pubdate>2006</pubdate><volume>38</volume><issue>12</issue><fpage>1372</fpage><lpage>4</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng1904</pubid><pubid idtype="pmpid" link="fulltext">17072318</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><aug><au><cnm>R Development Core 
Team</cnm></au></aug><source>R: A Language and Environment for Statistical Computing</source><publisher>R Foundation for Statistical Computing, Vienna, Austria</publisher><pubdate>2008</pubdate><url>http://www.R-project.org</url><note>[ISBN 3-900051-07-0]</note></bibl><bibl id="B4"><title><p>Accounting for uncertainty when assessing association between copy number and disease: a latent class model</p></title><aug><au><snm>Gonzalez</snm><fnm>JR</fnm></au><au><snm>Subirana</snm><fnm>I</fnm></au><au><snm>Escaramis</snm><fnm>G</fnm></au><au><snm>Peraza</snm><fnm>S</fnm></au><au><snm>Caceres</snm><fnm>A</fnm></au><au><snm>Estivill</snm><fnm>X</fnm></au><au><snm>Armengol</snm><fnm>L</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2009</pubdate><volume>10</volume><fpage>172</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-10-172</pubid><pubid idtype="pmcid">2707368</pubid><pubid idtype="pmpid" link="fulltext">19500389</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>A robust statistical method for case-control association testing with Copy Number Variation</p></title><aug><au><snm>Barnes</snm><fnm>C</fnm></au><au><snm>Plagnol</snm><fnm>V</fnm></au><au><snm>Fitzgerald</snm><fnm>T</fnm></au><au><snm>Redon</snm><fnm>R</fnm></au><au><snm>Marchini</snm><fnm>J</fnm></au><au><snm>Clayton</snm><fnm>D</fnm></au><au><snm>Hurles</snm><fnm>ME</fnm></au></aug><source>Nature Genetics</source><pubdate>2008</pubdate><volume>40</volume><issue>10</issue><fpage>1245</fpage><lpage>52</lpage><url>http://cnv-tools.sourceforge.net/CNVtools.html</url><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.206</pubid><pubid idtype="pmcid">2784596</pubid><pubid idtype="pmpid" link="fulltext">18776912</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>CGHcall: calling aberrations for array CGH tumor profiles</p></title><aug><au><snm>van de Wiel</snm><fnm>MA</fnm></au><au><snm>Kim</snm><fnm>KI</fnm></au><au><snm>Vosse</snm><fnm>SJ</fnm></au><au><snm>van 
Wieringen</snm><fnm>WN</fnm></au><au><snm>Wilting</snm><fnm>SM</fnm></au><au><snm>Ylstra</snm><fnm>B</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><issue>7</issue><fpage>892</fpage><lpage>894</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btm030</pubid><pubid idtype="pmpid" link="fulltext">17267432</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>A new multipoint method for genome-wide association studies via imputation of genotypes</p></title><aug><au><snm>Marchini</snm><fnm>J</fnm></au><au><snm>Howie</snm><fnm>B</fnm></au><au><snm>Myers</snm><fnm>S</fnm></au><au><snm>McVean</snm><fnm>G</fnm></au><au><snm>Donnelly</snm><fnm>P</fnm></au></aug><source>Nature Genetics</source><pubdate>2007</pubdate><volume>39</volume><fpage>906</fpage><lpage>913</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng2088</pubid><pubid idtype="pmpid" link="fulltext">17572673</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>A flexible and accurate genotype imputation method for the next generation of genome-wide association studies</p></title><aug><au><snm>Howie</snm><fnm>BN</fnm></au><au><snm>Donnelly</snm><fnm>P</fnm></au><au><snm>Marchini</snm><fnm>J</fnm></au></aug><source>PLoS Genetics</source><pubdate>2009</pubdate><volume>6</volume></bibl><bibl id="B9"><title><p>Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare 
CNVs</p></title><aug><au><snm>Korn</snm><fnm>JM</fnm></au><au><snm>Kuruvilla</snm><fnm>FG</fnm></au><au><snm>McCarroll</snm><fnm>SA</fnm></au><au><snm>Wysoker</snm><fnm>A</fnm></au><au><snm>Nemesh</snm><fnm>J</fnm></au><au><snm>Cawley</snm><fnm>S</fnm></au><au><snm>Hubbell</snm><fnm>E</fnm></au><au><snm>Veitch</snm><fnm>J</fnm></au><au><snm>Collins</snm><fnm>PJ</fnm></au><au><snm>Darvishi</snm><fnm>K</fnm></au><au><snm>Lee</snm><fnm>C</fnm></au><au><snm>Nizzari</snm><fnm>MM</fnm></au><au><snm>Gabriel</snm><fnm>SB</fnm></au><au><snm>Purcell</snm><fnm>S</fnm></au><au><snm>Daly</snm><fnm>MJ</fnm></au><au><snm>Altshuler</snm><fnm>D</fnm></au></aug><source>Nat Genet</source><pubdate>2008</pubdate><volume>40</volume><issue>10</issue><fpage>1253</fpage><lpage>60</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/ng.237</pubid><pubid idtype="pmcid">2756534</pubid><pubid idtype="pmpid" link="fulltext">18776909</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Combined Algorithms for Fitting Finite Mixture Distributions</p></title><aug><au><snm>Du</snm><fnm>J</fnm></au></aug><source>PhD thesis</source><publisher>McMaster University, Ontario, Canada</publisher><pubdate>2002</pubdate></bibl><bibl id="B11"><title><p>Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification</p></title><aug><au><snm>Schouten</snm><fnm>JP</fnm></au><au><snm>McElgunn</snm><fnm>CJ</fnm></au><au><snm>Waaijer</snm><fnm>R</fnm></au><au><snm>Zwijnenburg</snm><fnm>D</fnm></au><au><snm>Diepvens</snm><fnm>F</fnm></au><au><snm>G</snm><fnm>P</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2002</pubdate><volume>30</volume><issue>12</issue><fpage>e57</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gnf056</pubid><pubid idtype="pmcid">117299</pubid><pubid idtype="pmpid" link="fulltext">12060695</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>qBase relative quantification framework and software for management and 
automated analysis of real-time quantitative PCR data</p></title><aug><au><snm>Hellemans</snm><fnm>J</fnm></au><au><snm>Mortier</snm><fnm>G</fnm></au><au><snm>De Paepe</snm><fnm>A</fnm></au><au><snm>Speleman</snm><fnm>F</fnm></au><au><snm>Vandesompele</snm><fnm>J</fnm></au></aug><source>Genome Biol</source><pubdate>2007</pubdate><volume>8</volume><issue>2</issue><fpage>R19</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2007-8-2-r19</pubid><pubid idtype="pmcid">1852402</pubid><pubid idtype="pmpid" link="fulltext">17291332</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Multiple correspondence discriminant analysis: An application to detect stratification in copy number variation</p></title><aug><au><snm>Caceres</snm><fnm>A</fnm></au><au><snm>Basaga&#241;a</snm><fnm>X</fnm></au><au><snm>Gonzalez</snm><fnm>J</fnm></au></aug><source>Stat Med</source><pubdate>2010</pubdate><volume>29</volume><issue>10</issue><fpage>3284</fpage><lpage>93</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">21170921</pubid></xrefbib></bibl><bibl id="B14"><title><p>Imputation-Based Analysis of Association Studies: Candidate Regions and Quantitative Traits</p></title><aug><au><snm>Servin</snm><fnm>B</fnm></au><au><snm>Stephens</snm><fnm>M</fnm></au></aug><pubdate>2007</pubdate><volume>3</volume><issue>7</issue><fpage>e114</fpage></bibl></refgrp>
<sec><st><p>Pre-publication history</p></st><p>The pre-publication history for this paper can be accessed here:</p><p><url>http://www.biomedcentral.com/1755-8794/4/47/prepub</url></p></sec></bm>
</art>