Next generation targeted resequencing is replacing Sanger sequencing at a high pace in routine genetic diagnosis. There is a strong need for well validated, high quality enrichment platforms to complement the bench-top next generation sequencing devices.
We used the WaferGen Smartchip platform to perform highly parallelized PCR based target enrichment for a set of known cancer genes in a well characterized set of cancer cell lines from the NCI60 panel. Optimization of PCR assay design and cycling conditions resulted in a high enrichment efficiency. We provide proof of a high mutation rediscovery rate and have included technical replicates to enable SNP calling validation demonstrating the high reproducibility of our enrichment platform.
Here we present our custom developed quantitative PCR based target enrichment platform. Using highly parallel nanoliter singleplex PCR reactions makes this a flexible and efficient platform. The high mutation validation rate shows this platform’s promise as a targeted resequencing method for multi-gene routine sequencing diagnostics.
Keywords: Next generation sequencing; Target enrichment; Sequence capture; Quantitative PCR; NCI60; Mutation detection
The advent of next generation sequencing technology has unleashed a wealth of targeted resequencing experiments in all fields of genomics. The field of multi-gene disease diagnostic sequencing is changing rapidly with a shift from conventional Sanger sequencing to targeted next generation sequencing. In addition, many researchers face the daunting task of validating large sets of genomic variants resulting from large scale resequencing studies that investigate the human exome or whole genome [2-5]. Sanger sequencing has long been the gold standard sequencing technology and remains an important method for small scale sequencing experiments and routine genetic diagnostics. Compared to next generation sequencing, Sanger sequencing is a labor intensive and relatively expensive technology. Both the PCR sequencing reaction and the interpretation of the sequencing trace files are time consuming processes, making Sanger sequencing less than ideal for multi-gene studies or large scale variant confirmation. Even for a diagnostic target for which validated sequencing assays are available, interpretation of the Sanger trace file is a semi-automatic process at best, often requiring human review (see Mitchelson et al. for review) [6,7]. Moreover, in many genetic studies sample heterogeneity or exceptions to the classical bi-allelic state of the genome make this analysis even more challenging, if not impossible.
Next generation sequencing can tackle most of these challenges. The release of bench-top scale sequencing machines has paved the way to multi-gene targeted next generation sequencing diagnostics. The challenge of the upfront target enrichment has now become the bottleneck for many sequencing applications. Many probe or PCR based single tube sequence capture techniques currently exist. These methods typically require extensive optimization to reach the quality standards set in many Sanger sequencing diagnostic facilities. Most diagnostic labs have already invested in the optimization of PCR assays for the genomic regions of interest; it is therefore problematic to perform this optimization again in order to switch sequencing platforms.
Here, we present a new platform for detecting genetic variants directed at multi-gene disease diagnostics. By optimizing several steps in a custom PCR based sequence enrichment strategy and upscaling this strategy using a highly parallel nanoliter quantitative PCR instrument, we developed a highly flexible enrichment protocol that has a high efficiency and a near perfect target specificity, and that scales to address the challenges discussed earlier. Our workflow allows the selective resequencing of hundreds to a few thousand targets in a single analysis, greatly reducing the overall validation cost and effort. It even allows the researcher to re-use previously optimized assays in a highly parallel fashion. In a proof of concept study, we have rediscovered known mutations in well characterized cancer cell lines. In addition, we have used objective quality parameters that enable transparent and robust inter-platform comparisons.
For this technical proof of concept study we aimed at resequencing a set of genes known to be mutated in cancer samples. We selected 15 cell lines from the NCI60 panel for which high quality mutation data are made available through the Cosmic database for a large list of known cancer genes.
A selection of 15 cancer cell lines and 2 normal control samples were included in this study (Table 1). In addition, the enrichment was repeated on the two normal control samples and one of the cell lines (MCF7) to evaluate the technical reproducibility of the platform. A total of 360 pg of input DNA (~112 gene copies) was used per nanoliter PCR reaction.
Table 1. Samples and known mutations
The SmartChip nanowell platform (WaferGen Biosystems) is an ultra-high throughput quantitative PCR (qPCR) platform used for large scale gene expression studies or digital PCR. To address the problem of PCR product collection, WaferGen Biosystems specifically developed a capture system for the nanowell SmartChip. By reverse centrifuging of the capture chips in custom capture devices, PCR products were collected from the nanowell chips. A prototype capture device for the 5184 reaction well chips has been used in some preliminary testing, but for this study we have used a 4 quadrant chip layout (841 wells per quadrant, see Figure 1) and the matching disposable extraction fixtures to perform target capture of up to 4 samples on a single chip. Evidently, any combination of samples and amplicons is possible; ranging from 4 times a maximum of 841 amplicons, 2 times up to 1682 amplicons or 1 time 3364 amplicons, allowing for a maximum in experimental flexibility. The MyDesign dispenser provides exact control over the dispensing of primers and samples in the reaction wells.
Figure 1. Nanowell chip. 4 quadrant nanowell chip used in this experiment.
A list of 16 known cancer genes with diagnostic relevance was selected for resequencing (Table 2). The genes were selected to harbor mutations in the selected NCI60 cell lines. Primers were designed to amplify all exons of the genes using primerXL, our quantitative PCR primer design tool adapted for resequencing primer design (http://www.primerxl.org) (Lefever et al., in preparation). A total of 376 amplicons were designed using tiling settings, taking into account known SNP positions and with a target annealing temperature of 60°C. The average amplicon length is 441 base pairs (bp) with a range of 319 to 745 bp. Primer and amplicon information is listed in Additional file 1. The total target region comprised 165 811 bp. Primers were ordered from Integrated DNA Technologies.
Table 2. Genes and target regions
Dispensing of the reaction components in the SmartChip reaction wells is a 2 step process performed on the SmartChip Multisample Nanodispenser. The first step is the dispensing of 50 nl of a primer combination. For this, a solution of 500 nM primer in 2X Bio-Rad SsoAdvanced SYBR PCR mastermix is presented in a 384 well plate format to the Nanodispenser and dispensed into the SmartChip wells. The second step consists of dispensing another 50 nl of a 7.2 ng/μl sample DNA solution into the nanowells containing the primer mastermix combination. The final reaction volume of 100 nl contains 360 pg of template DNA and 250 nM of forward and reverse primer in a 1X mastermix solution. A rough visual quality control of the dispensing steps is carried out by inspecting the nanowell chips with a magnifying glass.
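The reaction composition stated above follows directly from the two dispensing volumes. The following minimal sketch reproduces that arithmetic; the concentrations and volumes are taken from the text, whereas the haploid genome mass of about 3.2 pg and all function and constant names are assumptions introduced for illustration.

```python
# Values from the text; HAPLOID_GENOME_PG is an assumed approximation.
PRIMER_STOCK_NM = 500.0    # primer concentration in the 2X mastermix solution
SAMPLE_NG_PER_UL = 7.2     # sample DNA concentration
DISPENSE_NL = 50.0         # volume of each of the two dispensing steps
HAPLOID_GENOME_PG = 3.2    # assumed mass of one haploid human genome copy


def reaction_composition():
    """Return (total volume in nl, template in pg, primer in nM, genome copies)."""
    total_nl = 2 * DISPENSE_NL
    # 1 ng/ul equals 1 pg/nl, so mass in pg = concentration * volume in nl.
    template_pg = SAMPLE_NG_PER_UL * DISPENSE_NL
    # The primer solution is diluted 1:1 by the sample volume.
    primer_final_nm = PRIMER_STOCK_NM * DISPENSE_NL / total_nl
    genome_copies = template_pg / HAPLOID_GENOME_PG
    return total_nl, template_pg, primer_final_nm, genome_copies


if __name__ == "__main__":
    print(reaction_composition())
```

Under the assumed genome mass, the sketch gives a copy number close to the approximate figure quoted in the sample description.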
With 376 targets in this experiment we were able to use the 4 quadrant nanowell chips to capture 4 samples per chip. The primers were spotted in duplicate for most of the samples. Some samples were repeated with singleton PCR reactions per chip to evaluate both amplification and DNA extraction efficiency and reproducibility; unused wells were left empty (see Table 3 for experimental replicate configuration).
Table 3. Experimental duplicate layout, read and coverage statistics
SmartChips containing the assays and samples were cycled in the SmartChip Cycler (WaferGen Biosystems) using the following thermal parameters: 3 minutes at 95°C, followed by 40 cycles of 30 seconds at 95°C and 60 seconds at 60°C. This amplification protocol was optimized for sequence enrichment of long PCR fragments and deviates from the default qPCR protocol in that it has a significantly longer annealing/extension phase. Immediately following amplification, melt curve analysis was performed from 60°C to 97°C (0.4°C/step). After cycling, 4 disposable extraction fixtures were attached to each SmartChip, one fixture per quadrant, and PCR products were collected in 0.2 ml PCR tubes by means of centrifugation (1 sample per tube) at 3500 rpm for 15 minutes.
Library preparation and sequencing
PCR pools were purified using AMPure beads XT (Beckman Coulter). The concentration of each pool was measured using the dsDNA assay kit on the Qubit fluorometer (Invitrogen) and fragment analysis was performed on a BioAnalyzer 2100 using the high sensitivity chip (Agilent). Library preparation and sequencing were carried out by the Nucleomics core facility at the Flemish Institute for Biotechnology (VIB). Nextera XT (Illumina) library preparation on all 24 PCR pools followed the manufacturer's recommendations, using 1 ng of each sample pool as input. In short, the Nextera transposase ensures random fragmentation and adaptor ligation to the amplicons, after which dual barcoding occurs during amplification, followed by purification on AMPure beads. The molarity of each library was determined using the concentration (measured by Qubit) and fragment length (BioAnalyzer). Libraries were diluted to equal molarity and finally pooled by using equal volumes of each library. This sequencing pool was diluted to 10 nM, and finally 95% of sequencing pool (6 pM) and 5% of PhiX control (8 pM) were mixed and loaded into the flowcell of a MiSeq (Illumina) instrument. Sequencing was performed for 150 bp in paired end mode.
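The conversion from a measured concentration and fragment length to molarity can be sketched as follows. This is an illustrative calculation only: it assumes the commonly used average mass of 660 g/mol per double-stranded base pair, which the text does not state, and the function names are hypothetical.

```python
# Assumed average molar mass of one double-stranded DNA base pair (g/mol).
AVG_BP_MASS_G_PER_MOL = 660.0


def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration (ng/ul) and mean fragment length (bp) to nM."""
    if mean_fragment_bp <= 0:
        raise ValueError("fragment length must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def dilution_volumes(stock_nm, target_nm, final_ul):
    """Return (library volume, diluent volume) in ul to reach target_nm in final_ul."""
    if target_nm > stock_nm:
        raise ValueError("cannot dilute to a higher concentration")
    stock_ul = final_ul * target_nm / stock_nm
    return stock_ul, final_ul - stock_ul


if __name__ == "__main__":
    molarity = library_molarity_nm(3.3, 500)       # hypothetical library
    print(molarity, dilution_volumes(molarity, 5.0, 20.0))
```

Bringing every library to the same molarity before combining equal volumes is what makes each sample contribute a comparable number of molecules to the sequencing pool.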
Raw sequencing data was demultiplexed on the MiSeq instrument using the manufacturer’s software. Of the 23 076 775 reads obtained, 1.6% were lost due to an unrecognizable index. The sequencing resulted in an average read count per sample of 945 432 (range 472 237–1 608 833, see Table 3 for reads per sample). Mapping was performed to build 37 of the human reference genome (Genome Reference Consortium GRCh37) using BWA (v. 0.5.9). Reads were quality recalibrated using the Genome Analysis Toolkit (v. 1.6-13-g91f02df) and duplicate reads were removed using Picard tools (v. 1.59).
Variants were called using the Genome Analysis Toolkit Unified Genotyper (v. 1.6-13-g91f02df). Variants were annotated and sample calls were compared using our custom cloud based analysis platform seqplorer (http://www.seqplorer.org) (De Wilde et al., in preparation).
Coverage data was extracted for each sample at each genomic position using the samtools depth option. Both the amplicon locations and the coding exon locations of genes captured in this experiment were used as the target locations for calculating the coverage statistics, as indicated in the results section. As our goal was to evaluate the capture platform and not the subsequent library preparation and sample pooling, we eliminated inter-sample differences in sequencing depth by normalizing the coverage at each position by the mean coverage of that sample.
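The normalization described above can be sketched as follows. The sketch assumes the three-column, tab-separated output of samtools depth for a single BAM file (reference name, position, depth) and treats target positions missing from that output as having zero coverage; the function names and the representation of target positions as (chromosome, position) tuples are assumptions made for illustration, not the authors' actual pipeline.

```python
def parse_depth(lines):
    """Parse samtools depth lines (chrom, pos, depth) into a dictionary."""
    depths = {}
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        depths[(chrom, int(pos))] = int(depth)
    return depths


def mean_normalize(depths, target_positions):
    """Divide the depth at each target position by the sample's mean target depth."""
    # Assumption: positions absent from the depth output have zero coverage.
    values = [depths.get(position, 0) for position in target_positions]
    mean_depth = sum(values) / len(values)
    if mean_depth == 0:
        raise ValueError("sample has no coverage on the target region")
    return {position: depths.get(position, 0) / mean_depth
            for position in target_positions}


if __name__ == "__main__":
    example = ["chr1\t100\t10\n", "chr1\t101\t30\n"]
    targets = [("chr1", 100), ("chr1", 101), ("chr1", 102)]
    print(mean_normalize(parse_depth(example), targets))
```

After this step a value of 1 means a position is covered exactly at the sample mean, so samples with very different read counts can be compared on the same scale.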
Results and discussion
To evaluate the overall technical performance, efficiency and reproducibility of this novel PCR based sequence enrichment platform we included several levels of quality control. First, through the resequencing of known cancer genes on a set of cancer cell lines with known mutation status, we were able to evaluate the rediscovery rate. Second, SNP calls were compared to public datasets to assess accuracy or compared between replicated samples to evaluate the reproducibility of the variant calling. Third, an objective coverage evaluation among technical replicates as well as different samples was performed.
qPCR performance metrics
One of the advantages of using a massively parallel quantitative PCR platform for target capture is the upfront quality check that can be performed on the amplification curves. All chip amplification curve profiles looked satisfactory with only 1.16% reaction dropout (defined as a Cq > 29). Although a weak correlation exists between the Cq value of the amplification reaction and the amplicon coverage (R² = 0.216), no clear correlation was observed between either amplicon length or end-point fluorescence and sequence coverage. Overlapping or tiling assays were excluded from this analysis as the coverage cannot be unambiguously attributed to one of the overlapping assays.
We obtained roughly 0.95 (stdev 0.3) million reads per sample. Almost all the reads were mappable to the reference genome and 78.6% (stdev 3%) mapped back to the target region. On average, 15.2% (stdev 3%) of PCR duplicates were present. The success rate in the assay design and target capture was 93.6% as defined by the fraction of assays with a mean coverage > 20-fold in more than 80% of the samples. 96.4% of the assays resulted in coverage in at least one of the samples. As the coverage is a function of the number of reads obtained for a given sample (see Table 3 for detailed per sample read and coverage statistics), data were normalized by dividing the coverage of each base by the mean coverage for that sample. In this way, we can compare the coverage statistics over the samples and evaluate the variation in coverage attributable to the capture platform. Figure 2A and B show the normalized coverage over the amplicons and targeted exons, respectively, and demonstrate that the uniformity and reproducibility of the capture platform is extremely good. 88.7% of the exonic bases receive a coverage between 0.2 and 5 times the mean coverage and 78.1% of the exonic bases fall within a two fold coverage range around the mean.
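The two summary metrics used above, the assay success rate and the fraction of bases within a given range of the mean, can be computed along the following lines. This is a minimal sketch with hypothetical function names and input layout; reading the "two fold coverage range around the mean" as 0.5 to 2 times the mean is an interpretation, not something the text specifies.

```python
def fraction_within(normalized_coverage, low, high):
    """Fraction of positions whose mean-normalized coverage lies in [low, high]."""
    values = list(normalized_coverage)
    inside = sum(1 for value in values if low <= value <= high)
    return inside / len(values)


def assay_success_rate(mean_cov_by_assay, min_cov=20.0, min_sample_fraction=0.8):
    """Fraction of assays with mean coverage > min_cov in more than
    min_sample_fraction of the samples.

    mean_cov_by_assay maps an assay name to a list of per-sample mean coverages.
    """
    successful = 0
    for coverages in mean_cov_by_assay.values():
        passing = sum(1 for coverage in coverages if coverage > min_cov)
        if passing / len(coverages) > min_sample_fraction:
            successful += 1
    return successful / len(mean_cov_by_assay)


if __name__ == "__main__":
    normalized = [0.1, 0.5, 1.0, 2.0, 6.0]            # hypothetical values
    print(fraction_within(normalized, 0.2, 5.0))      # 0.2 to 5 times the mean
    print(fraction_within(normalized, 0.5, 2.0))      # assumed two-fold range
```

Both thresholds are left as parameters because a diagnostic setting may demand a stricter minimum coverage than the 20-fold used here.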
Figure 2. Mean normalized coverage distribution. Cumulative distribution plot of mean normalized coverage for the capture amplicons (A) and the exons of the genes targeted (B).
The high reproducibility of the target capture across the samples is demonstrated by the high average Spearman rank correlation of 0.892 (standard deviation of 0.038) between the coverage values for any two different samples (see Table 4). Figure 3 shows the correlation between the technical replicates. No difference in coverage correlation is apparent between the duplicate and singleton PCR capture reactions.
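For readers who want to reproduce this type of comparison, a Spearman rank correlation between two per-base coverage vectors can be sketched as below. The implementation assigns average ranks to ties and then computes a Pearson correlation on the ranks; it is a plain-Python illustration with hypothetical names, not the code used in this study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    sample_a = [120, 85, 300, 40, 210]   # hypothetical per-base coverage
    sample_b = [100, 90, 280, 35, 190]
    print(spearman(sample_a, sample_b))
```

A rank-based measure is a natural choice here because it is insensitive to the overall sequencing depth of each sample and to a few extremely high coverage positions.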
Table 4. Coverage correlation
Figure 3. Technical replicate coverage correlation. Per base coverage correlation plot and Spearman rank correlation values (red) for technical capture replicates.
The NCI-60 cell lines analyzed in this study contain in total 25 well documented mutations in the targeted genes from our resequencing experiment (Cosmic database, December 2012). Two of the variants in the PTEN gene, in samples PC3 and CCRF-CEM, are large homozygous deletions. A lack of coverage at these positions is expected. For 22 of the 23 remaining variant positions, we obtained sufficient coverage (> 20 fold) to perform reliable variant calling; 21 of these variants are clearly present in the raw sequencing data. One variant was present in the sequencing data but with a coverage of only 9 fold, precluding reliable variant calling at this position. An overview of mutations and their validation status is available in Table 5. The 2 large homozygously deleted positions in the PTEN gene can be confirmed from the sequencing as well as the qPCR amplification data. Table 6 shows adequate end point fluorescence and Cq values as well as good coverage for the PTEN-4 amplicon comprising the deletion in all but the 2 deleted samples (CCRF-CEM and PC3). In summary, the mutation validation rate for this experiment is 23 out of 25 (92%). The only mutation that was missed is designated as complex in the Cosmic database and probably comprises an inter-chromosomal rearrangement; PCR amplification of the variant allele is impossible with our targeted primer pair. As this variant is heterozygous, the reference allele is amplified, so no deletion is detected by qPCR and the sequencing data only shows the reference allele. Of note, the 2 deletions in the PTEN gene detected in this dataset are homozygous deletions. A PCR based enrichment technique is probably unable to detect a large heterozygous deletion.
As a measure of enrichment platform reproducibility and accuracy, the number of correctly called known SNPs is used. The SNP status of the NCI-60 samples is available from the Developmental Therapeutics Program (DTP) website (http://dtp.nci.nih.gov/index.html) as published by . Unfortunately, only 6 SNPs on the Affymetrix 125 K platform fall within the target region of our capture. For these 6 SNPs, the genotype call in the publicly available data was compared to the genotype calls in the 16 NCI-60 samples in our dataset and concurred in 95.6% of the cases. Of the non-concordant SNP calls, 3 occurred in one and the same sample (SN12C), suggesting that a sample naming mix-up might be the cause of this mismatch. We were able to confirm the NF2 mutation in this sample as described by the Cosmic database, indicating that our sample ID agrees with that used in the Cosmic database and leading us to doubt the sample ID mentioned in the DTP dataset.
A second set of SNP calls is available from an exome sequencing study. We downloaded the variants in the genes included in our target enrichment experiment from the CellMiner tool (http://discover.nci.nih.gov/cellminer/). A total of 473 variants were found in the exome dataset with a genomic position in the target region enriched in our experiment. Of these variants, 431 (91.1% with a standard deviation of 5.85% over the 15 cell lines included) were called from our targeted resequencing experiment. In the downloaded data, Abaan et al. do not provide genotype calls but rather variant to reference read ratios. Notably, the average difference in the variant to reference ratio at each variant position between these two datasets is only about half a percentage point (0.544%, SD 0.11%), leading us to conclude that genotype calls from our platform would be highly congruent with those from the exome sequencing dataset if similar genotype calling algorithms were used.
To increase the number of SNPs in this analysis, we examined the SNP calls for the technical replicates. This analysis not only depends on the accuracy of the capture and the correct co-amplification of both alleles of heterozygous positions, but also on the algorithms used to perform the variant calling. To evaluate the capture platform as objectively as possible (with as little interference as possible from the variant calling algorithm), we looked at the raw variant coverage data for all known polymorphisms (known to dbSNP build 132) in the target region with a coverage of at least 20 and a variant allele called by the Genome Analysis Toolkit in at least one of the technical replicates. For each of these positions, the genotype call was compared in all technical replicates and the raw coverage data was examined (see overview in Table 7 and detailed figures in Additional file 2). For 3 of the samples, technical replicates of the capture and sequencing were performed. These technical replicates deliberately consist of either singleton or duplicate capture PCR reactions on the capture chip to examine whether capture efficiency and allelic ratio depend on the number of PCR replicates on the same chip.
Table 7. SNP detection in technical replicates
Additional file 2. Technical replicate SNP failures. Excel file with a list of inconsistently called SNPs in the technical replicates. The reason for the inconsistency as well as the allelic ratio per replicate are given.
Four capture reactions (2 singleton and 2 duplicate) were carried out on the normal1 reference DNA sample. 168 variants were detected in at least one of the replicates, of which 15 were indels. 120 (71.4%) of these variants (including 10 indels) were detected in all 4 of the replicates, of which only 2 showed a discordance in genotype call (heterozygous vs homozygous) in one of the 4 replicates. For the 48 variant positions with an inconsistent call in the 4 replicates, we closely examined the reason for this discordance in the raw sequencing data. For 5 of the variant positions, the coverage obtained in one or more of the samples was insufficient to perform a variant call. Another five of the 48 variant positions reside in an oligonucleotide repeat (poly N or dinucleotide repeat stretch), traditionally associated with false positive and negative variant calls. For the remaining 38 variant positions, there was a loss or gain of variant information due to changes in the allelic ratio between the reference and variant allele in the replicates. For the MCF7 cell line sample, 3 capture reactions were carried out, of which 1 was a singleton and 2 were duplicate PCR captures. For this sample 86 SNP positions (16 indels) were called variant in at least one of the replicates, of which 72 (83.7%) (including 10 indels) are common amongst all three replicates, 3 of which showed a discordant genotype call in one of the samples. Again we carefully examined the allele ratio in all 3 replicates of the 14 SNP positions not called consistently in the replicates. Five miscalls could be attributed to genomic repeats at the SNP position. Four could not be called in one or more of the samples due to coverage issues. The remaining 5 were either missed or false positive calls in one of the samples due to a low or borderline coverage for the variant allele in the sample.
Excluding the miscalled positions in the repeat regions, we can conclude that of the 243 allele calls made for SNPs in the 3 technical replicates, 6 calls (2.4%) could not be made due to coverage issues and 7 (2.9%) appeared dubious due to allelic ratio issues. Performing the same analysis for the normal2 reference DNA sample, we found a total of 149 SNP (14 indel) positions, of which 121 (81.2%) (11 indels) were in common between the 2 replicates, with only 2 of these not having a concordant genotype. We found 3 SNPs to be in a repeat region, 25 allelic ratio problems and no coverage problems. Allelic ratio issues thus occurred in 25 out of 292 allele calls (8.5%) in these replicates, where 292 is the total number of SNP calls in both replicates outside repeat regions.
Concluding the SNP calling data on the technical replicates, we have a total of 1228 SNP positions (the sum of the number of SNP positions per sample across the replicates times the number of replicates for that sample) in all the technical replicates. Only 14 SNP calls (1.1%) could not be made due to coverage issues (this is independent from the capture being performed by single or duplicate PCR reactions). 9.2% of the SNP calls could not be made due to a difference in the allelic ratio (for ease of counting we assume the SNP call not being made due to an allelic loss as opposed to a false positive call at the variant position having occurred). Upon separate analysis of singleton vs duplicate PCR replicate capture reactions, we observed a slightly higher (non-significant) number of these types of errors occurring in the singleton, namely 8.1% or 33 out of 305 of the SNPs, versus 5.2% or 21 of 232 of the SNPs for the duplicate PCR reaction captures (Chi squared test p > 0.05). We can conclude that only a minor fraction of SNP positions could not be called due to coverage issues and there is no influence on SNP calling based on the number of PCR enrichment reactions. As we have no arguments to state that the enrichment should be performed in duplicate, we conclude that the nanowell PCR capture is highly reproducible and reliable for singleton PCR reactions. We do see some issues with technical reproducibility of allele calling on our platform due to variable allelic ratio of heterozygous SNP positions. Unfortunately, we cannot compare our data with that of other capture platforms as no detailed analysis like this has been published for any of these platforms.
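The comparison of error counts between singleton and duplicate captures can be illustrated with a Pearson chi-squared test on a 2 x 2 table. The sketch below uses the counts as stated in the text (33 of 305 and 21 of 232); whether the original analysis applied a continuity correction is not stated, so the uncorrected statistic used here is an assumption, and the function name is hypothetical.

```python
import math


def chi_squared_2x2(errors_a, total_a, errors_b, total_b):
    """Pearson chi-squared test (1 degree of freedom, no continuity correction).

    Returns the test statistic and the corresponding p-value.
    """
    table = [[errors_a, total_a - errors_a],
             [errors_b, total_b - errors_b]]
    n = total_a + total_b
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Counts as reported: singleton 33 of 305, duplicate 21 of 232.
    print(chi_squared_2x2(33, 305, 21, 232))
```

With these counts the p-value is well above 0.05, in line with the non-significant difference reported above.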
Today, three different sequence enrichment methodologies are available: the PCR based (Access Array by Fluidigm, Directseq by Raindance Technologies and Ampliseq by Life Technologies), hybridization based (Sureselect by Agilent and Nimblegen by Roche) and hybridization-extension based (Haloplex by Agilent and Truseq by Illumina) methods. Each of these technologies has its strengths and limitations. PCR based methods are generally accepted to have the best overall sensitivity and specificity but are limited in target size, mainly due to the primer cost. The PCR based platforms thus are targeted towards the diagnostic resequencing market where primers can be reused. The pure hybridization based approaches are unable to capture some types of regions, mainly due to probe design issues around genomic repeats and pseudogenes, but scale easily and thus are currently geared towards whole exome sequencing applications.
The hybridization-extension based technologies from Agilent (Haloplex) and Illumina (Truseq) are both available for small targeted resequencing experiments as well as for whole exome resequencing. At the time of writing, experience with these novel platforms is limited, so no solid data on their real life performance is available in the literature.
In this study we demonstrate a novel quantitative PCR based sequence capture platform that has distinctive advantages over the currently existing capture platforms. With up to 5000 reaction wells per chip and the possibility to efficiently amplify amplicons with a large range in length, the maximum capture target size is similar to that of both the Ampliseq and Raindance platforms and surpasses that of the Fluidigm array based platform, which is limited to 96 individual assays. However, similar to the Fluidigm platform and in contrast to the Ampliseq and Raindance platforms, the new platform also offers a large flexibility in experimental design. In contrast to capture reactions that are carried out in a single invariable multiplexed reaction, the chip based amplification allows a flexible combination of samples and amplicons without interference. Other similar chip based platforms have already been described and have proven their reliability [17,18], but none have been used to perform sequence capture.
Only a scarce amount of literature is available comparing the performance statistics of the above referenced platforms. Jones et al. report a targeted resequencing of 24 genes involved in congenital disorders of glycosylation in 12 positive control patient samples on both the Raindance and Fluidigm platform. A perfect mutation detection rate is reported for both. A direct comparison of the performance of these platforms with our dataset is not possible due to the differences in target region, sequencing and analysis strategy. The reported exon failure rates (low or no coverage) of 15 out of 225 (6.9%) for Raindance and 13 out of 215 (6.0%) for Fluidigm are quite similar to the failure rate observed in our experiments. The Fluidigm platform is also extensively evaluated in a study of mutation discovery in patients with nephronophthisis-associated ciliopathy (11 genes in 192 patients). The authors report a mutation validation rate of 90% and a 93.2% exon capture success rate as defined by a coverage > 30 fold. Other real life but smaller gene set coverage statistics for the Fluidigm platform are reported by Hollants et al., with 37 of 38 (97%) amplicons captured successfully, and Schlipf et al., with 15 of 17 (88%) amplicons captured successfully. None of the above referenced studies evaluate the coverage uniformity as a metric. The Raindance capture platform is evaluated in several studies. Hu et al. have selectively resequenced 86 genes implicated in X linked mental retardation. They report a 91% amplicon design success rate and an average of 88.5% of the target bases receiving adequate coverage across the samples. 90% of the bases have a coverage of at least 29% of the mean coverage. The reproducibility of the capture is indicated by the reported 0.84 and 0.90 average pairwise correlation for the per base and per amplicon coverage rates respectively across all the samples. In their whole chromosome X exome resequencing experiment Mondal et al. report an amplicon design success rate of 98% and a subsequent capture success rate of 97.3%, described as the percentage of the target bases covered by at least one read.
In a diagnostic resequencing effort for congenital deafness genes, carried out with the Raindance platform, the authors indicate a primer design success rate for 99.9% of the target bases, with 95% of these bases reaching adequate coverage. A diagnostic test for congenital muscular dystrophy, including some genes with high GC nucleotide content, reports 84 to 95% capture success on the target region for the samples included in the analysis. Highly consistent in all these Raindance capture based publications is a large number of off-target or unmappable reads, ranging between 40 and 70% of the total reads, indicating some issues with off-target amplification and the downstream library preparation or data analysis procedure [23-26].
Literature on the capture success rate of both the Ampliseq and Haloplex platform is limited due to the novelty of the platforms. No publications exist elaborating on the performance characteristics of the platforms. The underlying technology for the Haloplex platform is described in a publication by Johansson et al., who mention a high capture success rate of over 98% and an inter sample coverage correlation of 0.98 on a limited set of samples in controlled circumstances.
The performance characteristics of our platform are similar to, if not better than, some of the best statistics currently published on targeted resequencing. Considering that our PCR assays were not optimized, the 93.6% reproducible assay enrichment success rate can be considered high compared to competing platforms. By performing some assay optimization, assay replacement, or by including multiple assays for the same genomic target, our capture success rate can be increased, which is important for diagnostic applications. One major advantage over competing platforms is the flexibility in the assay dispensing in the capture chips; we can easily exchange badly performing assays with new designs, or include additional targets of interest. The use of individual qPCR reactions clearly resulted in a very high coverage uniformity. This, together with the high amplification specificity and a high degree of successful read mappings, results in a highly efficient capture platform, reducing the amount of oversequencing needed to achieve adequate coverage. A potential downside of these individual PCR reactions is the amount of DNA required to perform a capture reaction on a sample, which scales linearly with the number of targets a user wants to capture. Although the amount of input DNA required for a single reaction is low (360 pg), the capture of the maximum number of 5184 targets on this platform requires roughly 1800 ng, which might not be available for all diagnostic samples. However, one may consider introducing a sample pre-amplification step, as successfully done in Sjöblom et al.
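The linear scaling of input DNA with the number of targets can be made explicit with a short calculation. The sketch below uses the 360 pg per reaction stated in the text; the function name and the optional replicate parameter are introduced here for illustration only.

```python
PG_PER_REACTION = 360.0  # template DNA per nanowell reaction, from the text


def input_dna_ng(n_targets, pcr_replicates=1):
    """Total template DNA in ng needed to capture n_targets for one sample."""
    return n_targets * pcr_replicates * PG_PER_REACTION / 1000.0


if __name__ == "__main__":
    print(input_dna_ng(5184))      # full chip, singleton reactions
    print(input_dna_ng(376, 2))    # this study's panel spotted in duplicate
```

For a full chip of 5184 singleton reactions this gives about 1866 ng, consistent with the rough figure quoted above, while the 376-amplicon panel of this study spotted in duplicate needs about 271 ng per sample.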
The high mutation and SNP calling validation rates show the potential of this platform to be integrated into diagnostic workflows. Based on the platform we describe here, WaferGen Biosystems has meanwhile developed a low-cost target enrichment platform consisting of a single-sample nanodispenser and a PCR system that is able to run 2 chips at the same time. This platform is compatible with different types of chips containing between 1296 and 5184 PCR reactions, making it possible to run more than 50,000 single PCR capture reactions per day. The discordance in the SNP calling for the technical replicates warrants a note of caution in applying captured resequencing platforms in routine diagnostics without proper validation. No adequate sensitivity or specificity assessment for any next generation capture and sequencing platform currently exists, and thus we have no means of comparing our statistics to those of competing platforms. We would like to urge other researchers to include technical replicates in their evaluation of any next generation sequencing platform, especially when aiming to design workflows with potential diagnostic applications.
qPCR: Quantitative polymerase chain reaction; Cq: Quantitation cycle; SNP: Single nucleotide polymorphism.
SD, WD, JD and SH are employees of WaferGen Biosystems, which has launched a target enrichment platform as a commercial product, in part based on the results of this study.
BDW optimized the target enrichment platform, performed the sequencing data analysis, the statistical analysis and drafted the manuscript. SD carried out the target enrichments. SL and JH designed the PCR assays. SD, WD, JD and SH were involved in optimizing the Smart Chip platform for target enrichment. JV and SD conceived of the study, and participated in its design and coordination. All authors read and approved the final manuscript.
The authors would like to acknowledge Anthony Van Driessche for assistance in performing the target enrichment.
The data set supporting the results of this article is available in the EMBL repository, PRJEB4408 http://www.ebi.ac.uk/ena/data/view/PRJEB4408.
Mardis E, Ding L, Dooling D, Larson D, McLellan M, Chen K, Koboldt D, Fulton R, Delehaunty K, McGrath S, Fulton L, Locke D, Magrini V, Abbott R, Vickery T, Reed J, Robinson J, Wylie T, Smith S, Carmichael L, Eldred J, Harris C, Walker J, Peck J, Du F, Dukes A, Sanderson G, Brummett A, Clark E, McMichael J, et al.: Recurring mutations found by sequencing an acute myeloid leukemia genome.
Pleasance ED, Keira Cheetham R, Stephens PJ, Mcbride DJ, Humphray SJ, Greenman CD, Varela I, Lin M-L, Ordóñez GR, Bignell GR, Ye K, Alipaz J, Bauer MJ, Beare D, Butler A, Carter RJ, Chen L, Cox AJ, Edkins S, Kokko-Gonzales PI, Gormley NA, Grocock RJ, Haudenschild CD, Hims MM, James T, Jia M, Kingsbury Z, Leroy C, Marshall J, Menzies A, et al.: A comprehensive catalogue of somatic mutations from a human cancer genome.
Pleasance ED, Stephens PJ, O'meara S, Mcbride DJ, Meynert A, Jones D, Lin M-L, Beare D, Lau KW, Greenman C, Varela I, Nik-Zainal S, Davies HR, Ordoñez GR, Mudie LJ, Latimer C, Edkins S, Stebbings L, Chen L, Jia M, Leroy C, Marshall J, Menzies A, Butler A, Teague JW, Mangion J, Sun YA, McLaughlin SF, Peckham HE, Tsung EF, et al.: A small-cell lung cancer genome with complex signatures of tobacco exposure.
Bell D, Berchuck A, Birrer M, Chien J, Cramer DW, Dao F, Dhir R, DiSaia P, Gabra H, Glenn P, Godwin AK, Gross J, Hartmann L, Huang M, Huntsman DG, Iacocca M, Imielinski M, Kalloger S, Karlan BY, Levine DA, Mills GB, Morrison C, Mutch D, Olvera N, Orsulic S, Park K, Petrelli N, Rabeno B, Rader JS, Sikic BI, et al.: Integrated genomic analyses of ovarian carcinoma.
Nat Genet 2006, 6:813-823.
Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, Teague JW, Campbell PJ, Stratton MR, Futreal PA: COSMIC: mining complete cancer genomes in the catalogue of somatic mutations in cancer.
Nucleic Acids Res 2010, 39:D945-D950.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.
Ikediobi ON, Davies H, Bignell G, Edkins S, Stevens C, O'Meara S, Santarius T, Avis T, Barthorpe S, Brackenbury L, Buck G, Butler A, Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Hunter C, Jenkinson A, Jones D, Kosmidou V, Lugg R, Menzies A, Mironenko T, Parker A, Perry J, et al.: Mutation analysis of 24 known cancer genes in the NCI-60 cell line set.
Garraway LA, Widlund HR, Rubin MA, Getz G, Berger AJ, Ramaswamy S, Beroukhim R, Milner DA, Granter SR, Du J, Lee C, Wagner SN, Li C, Golub TR, Rimm DL, Meyerson ML, Fisher DE, Sellers WR: Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma.
Abaan OD, Polley EC, Davis SR, Zhu YJ, Bilke S, Walker RL, Pineda M, Gindin Y, Jiang Y, Reinhold WC, Holbeck SL, Simon RM, Doroshow JH, Pommier Y, Meltzer PS: The exomes of the NCI-60 panel: a genomic resource for cancer biology and systems pharmacology.
Nat Meth 2010, 7:111-118.
Jones MA, Bhide S, Chin E, Ng BG, Rhodenizer D, Zhang VW, Sun JJ, Tanner A, Freeze HH, Hegde MR: Targeted polymerase chain reaction-based enrichment and next generation sequencing for diagnostic testing of congenital disorders of glycosylation.
Halbritter J, Diaz K, Chaki M, Porath JD, Tarrier B, Fu C, Innis JL, Allen SJ, Lyons RH, Stefanidis CJ, Omran H, Soliman NA, Otto EA: High-throughput mutation analysis in patients with a nephronophthisis-associated ciliopathy applying multiplexed barcoded array-based PCR amplification and next-generation sequencing.
Schlipf NA, Schüle R, Klimpe S, Karle KN, Synofzik M, Schicks J, Riess O, Schöls L, Bauer P: Amplicon-based high-throughput pooled sequencing identifies mutations in CYP7B1 and SPG7 in sporadic spastic paraplegia patients.
Hu H, Wrogemann K, Kalscheuer V, Tzschach A, Richard H, Haas SA, Menzel C, Bienek M, Froyen G, Raynaud M, Van Bokhoven H, Chelly J, Ropers H, Chen W: Mutation screening in 86 known X-linked mental retardation genes by droplet-based multiplex PCR and massive parallel sequencing.
Schrauwen I, Sommen M, Corneveaux JJ, Reiman RA, Hackett NJ, Claes C, Claes K, Bitner-Glindzicz M, Coucke P, Van Camp G, Huentelman MJ: A sensitive and specific diagnostic test for hearing loss using a microdroplet PCR-based approach and next generation sequencing.
Valencia CA, Ankala A, Rhodenizer D, Bhide S, Littlejohn MR, Keong LM, Rutkowski A, Sparks S, Bonnemann C, Hegde M: Comprehensive mutation analysis for congenital muscular dystrophy: a clinical PCR-based enrichment and next-generation sequencing panel.
Johansson H, Isaksson M, Sorqvist EF, Roos F, Stenberg J, Sjoblom T, Botling J, Micke P, Edlund K, Fredriksson S, Kultima HG, Ericsson O, Nilsson M: Targeted resequencing of candidate genes using selector probes.
Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ, Ptak J, Silliman N, Szabo S, Buckhaults P, Farrell C, Meeh P, Markowitz SD, Willis J, Dawson D, Willson JKV, Gazdar AF, Hartigan J, Wu L, Liu C, Parmigiani G, Park BH, Bachman KE, Papadopoulos N, Vogelstein B, Kinzler KW, Velculescu VE: The consensus coding sequences of human breast and colorectal cancers.