Additional file 1.

Number of sequences to correct erroneous positions. 1a: this file illustrates the number of sequences necessary to obtain a majority of correct sequences. The x-axis shows the error rate and the y-axis shows the number of sequences needed, according to three possible probabilities: 0.001 0.01 and 0.05. 1b the x-axis shows the error rate for a given position (ranging from 0 to 0.5); the y-axis shows the cumulative proportion of erroneous sequences sampled (ranging from 0 to 0.5) in the total sample. Sample size varies from 10 to 100, 500 and 1,000 sequences. For a given error rate and a cumulative proportion of erroneous sequences in the sample of size N, the probability of observing this combination is indicated in color: green: 1 to 0.95, blue: 0.95 to 0.8, yellow: 0.8 to 0.6, orange: 0.6 to 0.5, red: 0.5 to 0.4, gray: 0.4 to 0.2 and white: below 0.2. For example, if the error rate is 0.2, the probability of observing a cumulative proportion of erroneous sequences in the sample of between 0 and 0.2 ranges between 0.4 and 0.5 (red envelope). In this case, the probability of there being 20% erroneous sequences in the sample is between 0.4 and 0.5. If we consider the same error rate (0.2) with 40% erroneous sequences, then the probability ranges from 0.8 to 0.95 (blue envelope). If N increases, the variance of the probability envelopes decreases.

Format: PDF Size: 5.3MB Download file

This file can be viewed with: Adobe Acrobat Reader

Gilles et al. BMC Genomics 2011 12:245   doi:10.1186/1471-2164-12-245