T7 based linear amplification of RNA is used to obtain sufficient antisense RNA for microarray expression profiling. We optimized and systematically evaluated the fidelity and reproducibility of different amplification protocols using total RNA obtained from primary human breast carcinomas and high-density cDNA microarrays.
Using an optimized protocol, the average correlation coefficient of gene expression of 11,123 cDNA clones between amplified and unamplified samples is 0.82 (0.85 when a virtual array was created using repeatedly amplified samples to minimize experimental variation). Less than 4% of genes show changes in expression level of 2-fold or greater after amplification compared to unamplified samples. Most changes due to amplification are not systematic, either within one tumor sample or between different tumors. Amplification appears to dampen the variation of gene expression for some genes when compared to unamplified poly(A)+ RNA. The reproducibility (correlation coefficient) between repeatedly amplified samples is 0.97 when amplification is performed on the same day, but drops to 0.90 when performed weeks apart. The fidelity and reproducibility of amplification are not affected by decreasing the amount of input total RNA within the 0.3–3 microgram range. Adding a template-switching primer, adding DNA ligase, or column-purifying the double-stranded cDNA does not improve the fidelity of amplification. The correlation coefficient between amplified and unamplified samples is higher when total RNA is used as template for both experimental and reference RNA amplification.
T7 based linear amplification reproducibly generates amplified RNA that closely approximates original sample for gene expression profiling using cDNA microarrays.
Gene expression profiling using complementary DNA (cDNA) microarrays is being applied for multiple purposes such as defining the taxonomy of different molecular subtypes of human breast and other cancers [1-10] and discovering biomarkers and therapeutic targets [11,12]. A limitation of this technology is that small specimens of human tissue, such as those obtained by core needle or fine needle aspiration (FNA) biopsies, may not be sufficient for microarray hybridization using direct labelling protocols. Typical microarray labelling procedures require 2–4 μg poly(A)+ RNA or 25–50 μg total RNA per cDNA microarray. This amount of poly(A)+ RNA or total RNA can be obtained from samples of human tissue that weigh more than 50–100 mg. However, core needle biopsies of breast cancers, for example, weigh in the 10–25 mg range and yield only 3–15 μg of total RNA. Small tumors identified using early detection strategies may thus be too small to yield a specimen with enough RNA for microarray analysis. A pilot study by Assersohn et al.  showed that only 15% of FNA samples from human breast cancers produced sufficient mRNA for expression array analysis. One approach to low specimen RNA input has been to use indirect labelling techniques, such as those using aminoallyl nucleotides, to increase fluorescence signal intensity. Although these techniques are less expensive, we and other colleagues have found that they are not always as reliable as direct labelling methods. For valuable tumor specimens, reliability is paramount. A very recent report used amino C6dT-modified random hexamers to prime cDNA synthesis in conjunction with aminoallyl-dUTP and increased fluorescence intensity enough that as little as 1 μg of total RNA from cell lines gave sufficient signal for cDNA microarray hybridization . The reliability of this method with human tumor specimens warrants further testing.
RNA amplification techniques have been developed to address the need for sufficient RNA from tiny specimens for microarray hybridization. Other examples of specimens requiring amplification for genome-wide characterization of gene expression include purified populations of cells obtained by flow cytometry, laser capture microdissection, breast ductal or bronchial lavage, or microendoscopy. Although one group has used unamplified total RNA extracted from ~2 × 10⁴ microdissected cells for hybridization on 5000 clone membrane-based arrays , most groups perform RNA amplification for this purpose [16-18], especially when using high-density slide-based arrays.
The most commonly used mechanism for RNA amplification is a T7 based linear amplification method first developed by Van Gelder, Eberwine and coworkers [19-21]. This method utilizes a synthetic oligo(dT) primer containing the phage T7 RNA polymerase promoter to prime synthesis of first strand cDNA by reverse transcription of the poly(A)+ RNA component of total RNA. Second strand cDNA is synthesized by degrading the poly(A)+ RNA strand with RNase H, followed by second strand synthesis with E. coli DNA polymerase I. Amplified antisense RNA (aRNA) is obtained from in vitro transcription of the double-stranded cDNA (ds cDNA) template using T7 RNA polymerase. Several protocols based on this mechanism have been developed and used in microarray analyses [16,22-28].
In spite of the increasing use of T7 based linear amplification techniques in the study of human disease, systematic evaluation of the fidelity and the reproducibility of amplification mechanisms has been limited. Such information is important to determine how well the amplified sample resembles the unamplified sample and the validity of applying this technique to the study of human tissues. A study by Wang and Miller et al.  described a T7 based amplification protocol modified with a template-switching (TS) primer used to theoretically generate a full-length ds cDNA. The gene expression of amplified total RNA from melanoma cell lines hybridized to 2000 gene microarrays was compared to that of unamplified total or poly(A)+ RNA using cluster analysis and determining the number of outlier genes between single experiments. Approximately 3–6% of genes were discordant outliers when analyzed using 3-fold or greater expression ratios in at least one hybridization and compared between total RNA, mRNA, and different amounts of input aRNA. Hu and co-workers  compared amplified and unamplified samples using total RNA obtained from human glioma cell lines and 2300 clone microarrays (printed with duplicates on the same array) to evaluate a similar T7 based protocol with a TS mechanism adopted from Wang and Miller et al. . Their results were based on nine microarray experiments and showed concordance between amplified and unamplified samples, verifying four expressed and two differentially expressed genes using Northern and Western blotting and immunohistochemical assay.
Since there are multiple T7 based amplification protocols in use, questions remain regarding the effects of differences between these protocols and how these differences translate when applied to solid tumors rather than cell lines on a genome-wide scale. A study from Incyte Genomics  examined gene expression of kidney vs. placenta RNA and RNA from matched normal and tumor renal tissue using their own T7 based amplification kit (not employing a TS primer) and 9700 clone cDNA microarrays. They found that a differential expression ratio cut-off of greater than or equal to 2-fold produced excellent correlation between samples amplified with different amounts of input poly(A)+ RNA but that a 3-fold differential expression ratio threshold should be set for comparing ratios between amplified and unamplified mRNA. Decreasing the input of tissue lysate increased gene expression discordance between amplified and unamplified samples. As more human tissues were tested, single round amplification produced a 200- to 500-fold yield, lower than the 700-fold yield originally found in their study and lower than yields reported in amplification studies using cell cultures.
The differences between several T7 based linear amplification protocols are mainly the following: 1. whether a template-switching mechanism is used in the synthesis of second strand cDNA, 2. what enzymes are used in the synthesis of second strand cDNA, 3. how ds cDNA is purified ("cleaned up") prior to in vitro transcription, and 4. how in vitro transcription is performed. Information regarding the effects of these differences on the fidelity or reproducibility of amplification should help eliminate both unnecessary procedures and those actually detrimental to amplification. Determination of the range of input total RNA necessary to achieve reasonable fidelity and reproducibility is crucial for researchers dealing with very small specimens of human tissue.
To answer these questions and to define an optimal protocol for T7 based linear amplification, we carried out a series of amplification reactions under different conditions using total RNA isolated from primary human breast carcinomas. The amplified samples were compared to unamplified samples on high-density cDNA microarrays containing 41,931–42,602 clones. We evaluated the effects of TS primer in second strand cDNA synthesis, DNA ligase activity in second strand cDNA synthesis, column purification of ds cDNA, and in vitro transcription time on the fidelity and reproducibility of amplification and the yield of aRNA. The effect of diminishing amounts of input total RNA was also tested.
Results and Discussion
Variation in cDNA microarray analysis of gene expression using unamplified poly(A)+ RNA
In order to assess the reproducibility of microarray hybridization using standard methods, poly(A)+ RNA was isolated from both primary breast carcinoma BC2 and Universal Human Reference total RNA (Stratagene®). The BC2 poly(A)+ RNA labelled with Cy5 and the reference poly(A)+ RNA labelled with Cy3 were hybridized on five 42,000 clone cDNA microarrays. 16,333 clones had a signal greater than 50% above background on all five arrays. Three hybridizations were done on the same day using arrays from the same print batch and the average correlation coefficient between any two hybridizations was 0.97 ± 0.01, demonstrating a high reproducibility between parallel hybridizations done on the same day. Another two hybridizations were performed using poly(A)+ RNA isolated from BC2 total RNA three months later and a different print batch of microarrays. The correlation coefficient between these two arrays was 0.95, similar to the average correlation coefficient of the first three arrays. However, when the unamplified poly(A)+ RNA arrays performed weeks apart from different print batches were compared, the average correlation coefficient dropped to 0.89 ± 0.01, indicating that experimental variations due to differences in microarray printing, poly(A)+ RNA isolation, and RNA labelling and hybridization contribute a small but detectable change in results.
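The clone selection step described above (signal more than 50% above background on every array) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data layout (one dict per array mapping a clone ID to a signal/background pair) and the function names are assumptions, and the text does not say whether the filter was applied per channel.

```python
def passes_filter(signal, background, min_excess=0.5):
    """True if the spot signal is more than 50% above its background."""
    return signal > background * (1.0 + min_excess)


def clones_passing_on_all_arrays(arrays, min_excess=0.5):
    """Return sorted clone IDs that pass the filter on every array.

    arrays: list of dicts mapping clone_id -> (signal, background).
    This layout is hypothetical and chosen only for illustration.
    """
    if not arrays:
        return []
    kept = set(arrays[0])
    for array in arrays:
        passing = {
            clone_id
            for clone_id, (signal, background) in array.items()
            if passes_filter(signal, background, min_excess)
        }
        kept &= passing  # a clone must pass (and be present) on each array
    return sorted(kept)


example_arrays = [
    {"clone_A": (300.0, 100.0), "clone_B": (140.0, 100.0)},
    {"clone_A": (200.0, 100.0), "clone_B": (400.0, 100.0)},
]
selected = clones_passing_on_all_arrays(example_arrays)
```

Here `clone_B` is dropped because its signal on the first array is only 40% above background, even though it passes on the second.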
In order to minimize experimental variations, we created a virtual poly(A)+ RNA expression array that idealizes the gene expression of sample BC2 by using the average expression level of each clone over multiple hybridizations. By "expression level" we mean the normalized log (base2) ratio of signal intensities of Cy5 (experimental sample) to Cy3 (reference) fluorescence. The idealized expression profile from the poly(A)+ RNA virtual array is used as our unamplified "gold standard" for data analyses involving BC2. The correlation coefficient between each individual poly(A)+ RNA array and the gold standard ranges from 0.95–0.97, similar to that observed between hybridizations performed on the same day. The gold standard virtual array therefore represents well-measured gene expression in the primary tumor and minimizes individual experimental variations.
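As a rough sketch of the virtual array idea, the expression level of a clone is the log (base 2) ratio of Cy5 to Cy3 intensity, and the virtual array is the per-clone mean of that value across replicate hybridizations. The function names and dict-based layout below are illustrative assumptions, and normalization is assumed to have been applied already.

```python
import math


def log2_ratio(cy5, cy3):
    """Expression level as log2(Cy5 / Cy3); inputs are assumed normalized."""
    return math.log2(cy5 / cy3)


def virtual_array(replicates):
    """Average the expression level of each clone over replicate arrays.

    replicates: list of dicts mapping clone_id -> log2 ratio.
    Only clones measured on every replicate are kept (an assumption).
    """
    shared = set(replicates[0]).intersection(*replicates[1:])
    n = len(replicates)
    return {cid: sum(r[cid] for r in replicates) / n for cid in shared}


gold = virtual_array([
    {"clone_A": 1.0, "clone_B": 2.0},
    {"clone_A": 3.0, "clone_B": 0.0},
])
```

Averaging over replicates reduces the contribution of array-to-array noise, which is why the virtual array serves as the unamplified reference in the analyses that follow.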
Template-switching does not affect the fidelity of amplification
As previously mentioned, a T7 based linear amplification protocol published by Wang, Miller and coworkers  incorporated a TS mechanism  in the synthesis of second strand cDNA at the 5' end in order to generate full-length ds cDNA. This was speculated to be advantageous for hybridization to unmapped sequences spotted on arrays and to enable higher temperature cDNA synthesis that would enhance sequence specificity. However, no experimental evidence was provided to support the idea that the TS mechanism increases the fidelity of amplification.
To determine whether the addition of TS primer improves the fidelity of amplification, we compared gene expression profiles of aRNA amplified in the presence or absence of TS primer with expression profiles of unamplified poly(A)+ RNA. Total RNA isolated from primary breast carcinoma BC91 and reference total RNA were amplified with or without TS primer using the Wang-Miller protocol , except that aRNA was purified using an RNeasy® kit (Qiagen®). A virtual "gold standard" poly(A)+ RNA array was created for BC91 using the average expression level of four hybridization replicates of unamplified poly(A)+ RNA. A "virtual correlation coefficient" for a given amplification protocol was obtained by comparing the virtual amplified array (averaged expression level for each clone from multiple amplified samples) to the virtual gold standard unamplified array for BC91. To determine the correlation between individual amplified samples and the gold standard, an "average correlation coefficient" for a given amplification protocol was also calculated (the sum of correlation coefficients of individual amplified samples with the gold standard divided by the number of amplified samples tested for each condition).
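The two summary statistics defined above differ only in where the averaging happens. A minimal sketch, assuming each sample is a list of expression levels aligned by clone (the function names are illustrative, not from the original analysis):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_correlation(amplified, gold):
    """Mean of the correlations of each amplified sample with the gold standard."""
    return sum(pearson(a, gold) for a in amplified) / len(amplified)


def virtual_correlation(amplified, gold):
    """Correlation of the per-clone average of amplified samples with the gold standard."""
    virtual = [sum(values) / len(values) for values in zip(*amplified)]
    return pearson(virtual, gold)


gold_standard = [0.0, 1.0, 2.0, 3.0]
amplified_samples = [
    [0.2, 0.8, 2.3, 2.7],
    [-0.2, 1.2, 1.7, 3.3],
]
avg_r = average_correlation(amplified_samples, gold_standard)
virt_r = virtual_correlation(amplified_samples, gold_standard)
```

In this toy data the noise in the two amplified samples cancels when they are averaged, so the virtual correlation exceeds the average correlation, which mirrors the direction of the difference reported in the study.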
As shown in Table 1, using aRNA amplified from total RNA as reference, the expression profiles obtained in the absence of TS primer correlated with the gold standard slightly better than in the presence of TS primer, although the difference was not statistically significant. When poly(A)+ RNA rather than total RNA was amplified as reference, the correlation with the gold standard was slightly, but not statistically significantly, better with TS primer (Table 1). The correlation coefficient using aRNA amplified from total RNA as reference is higher than using aRNA amplified from poly(A)+ RNA as reference regardless of whether TS primer is used. This suggests that when total RNA from a tumor sample is amplified for microarray analysis, the reference RNA should also be amplified from total RNA.
Table 1. Correlation coefficients of amplified and unamplified expression levels of 14,044 genes selected according to the described criteria. Amplifications with or without TS primer and with two different ds cDNA cleanup protocols were performed on BC91 total RNA.
The yield of aRNA amplified from BC91 total RNA using different protocols is shown in Table 2. Assuming that 1% of the total RNA is poly(A)+ RNA, 253-fold and 370-fold amplification was observed in the presence and absence of TS primer, respectively. The yield of aRNA amplified from total RNA of cultured cell lines was 2- to 3-fold higher than that from primary tumor total RNA (data not shown).
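The fold amplification arithmetic can be made explicit with a small sketch. It assumes, as stated above, that poly(A)+ RNA is 1% of total RNA; the yield values in the example are back-calculated from the reported fold values for illustration and are not taken from Table 2.

```python
def fold_amplification(arna_yield_ug, input_total_rna_ug, polya_fraction=0.01):
    """Fold amplification relative to the estimated poly(A)+ RNA template.

    polya_fraction=0.01 reflects the 1% assumption in the text.
    """
    template_ug = input_total_rna_ug * polya_fraction
    return arna_yield_ug / template_ug


# 3 ug total RNA corresponds to an estimated 0.03 ug poly(A)+ RNA template.
# Illustrative yields (back-calculated, not measured values):
with_ts = fold_amplification(7.59, 3.0)      # about 253-fold
without_ts = fold_amplification(11.1, 3.0)   # about 370-fold
```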
Table 2. Efficacy of amplification using 3 μg total RNA from BC91 and different ds cDNA cleanup methods, with or without TS primer.
These experiments demonstrate that the TS mechanism does not increase the fidelity of amplification and can therefore be eliminated from the protocol. The limited effect of the TS mechanism on the correlation coefficients probably has two causes: 1) second strand cDNA synthesis primed by the TS primer represents only a small fraction of the total, while the majority of the synthesis is self-primed or primed by small pieces of RNA generated by RNase H; and 2) adding a few base pairs to the ds cDNA prior to in vitro transcription does not change the aRNA enough to affect the array hybridization.
DNA ligase activity is not required for amplification
Amplification protocols that do not include DNA ligase in second strand cDNA synthesis generate the same length aRNA (ranging from 0.2 kb to 6 kb, data not shown) as generated from a widely used T7 based amplification protocol developed by Affymetrix® which uses E. coli DNA ligase. The correlation coefficient between amplified and unamplified sample and the yield of aRNA amplified without DNA ligase are high enough to suggest that DNA ligase activity is not necessary for RNA amplification in microarray analysis. For confirmation, we omitted DNA ligase from the protocol developed by Affymetrix® and compared the expression profiles of the resulting aRNA to aRNA obtained using the standard Affymetrix® protocol that includes DNA ligase. The correlation coefficient between amplified and unamplified samples is slightly higher in the absence of ligase (Table 3), supporting our previous conclusion that ligase is not required for total RNA amplification in cDNA microarray analysis. However, the yield of aRNA is higher when ligase is used, suggesting that ligase may play a role in improving the efficiency of amplification.
Table 3. Effect of DNA ligase on the fidelity of amplification.
Column cleanup of ds cDNA does not improve the fidelity of amplification, but decreases the yield of aRNA
In the Wang-Miller protocol , ds cDNA is purified using a Bio-6 column (Bio-Rad). A drawback to this method is that the cDNA is eluted with a large volume and needs to be concentrated into a much smaller volume by lyophilization prior to in vitro transcription. This is a time-consuming step, especially when large numbers of samples are processed. To eliminate the lyophilization step from the protocol, we used an alternative column–the Sephadex™ G-50 column–to filter out free nucleotides from the ds cDNA after completion of the second strand synthesis reaction. The ds cDNA is then precipitated following phenol-chloroform extraction and re-suspended in proper volume for in vitro transcription. The correlation coefficient between amplified and unamplified samples and the yield of aRNA using this less time-consuming modification are similar to that using the Wang-Miller protocol (Tables 1 and 2).
We further explored the question of what effects the ds cDNA column cleanup step itself had on amplification. We amplified total RNA from tumor BC2, either with or without the cleanup step of Sephadex™ G-50. Seven amplifications were done on different dates with the Sephadex™ G-50 column and five amplifications were done without this cleanup step. Both the virtual and the average correlation coefficients using the column are slightly lower than without it (Table 4), suggesting that the column cleanup does not improve the fidelity of amplification. Moreover, the yield of aRNA is significantly higher without the column purification of ds cDNA, suggesting some loss of ds cDNA on the column. Since the column had a negative effect on amplification by decreasing the yield of aRNA without improving the fidelity of amplification, we eliminated this step from our protocol.
Table 4. Effect of column cleanup on the fidelity and yield of amplification.
Effect of in vitro transcription time on the fidelity of RNA amplification
To determine the effect of in vitro transcription time on amplification, duplicate reactions were performed at 37°C for 2, 3, 4, 5 and 6 hours. Two additional 5-hour incubation reactions were stored at 4°C overnight to determine the effect of low temperature incubation on amplification. The virtual correlation coefficient is slightly higher for the 5-hour incubation at 37°C (Figure 1A). However, in vitro transcription for 5 hours at 37°C plus overnight incubation at 4°C gives the highest yield of aRNA (Figure 1B). Since the yield of aRNA at any time point is sufficient for multiple hybridizations, we decided to use 5-hour incubation at 37°C for all subsequent amplifications.
Figure 1. Effects of in vitro transcription time on the fidelity of T7 based amplification and the yield of aRNA amplified from BC2 total RNA. Average correlation coefficients between amplified samples vs. unamplified poly(A)+ RNA at each time point are shown in A and average yields of aRNA from each time point in B.
Evaluation of the fidelity of T7 based linear amplification protocols
To systematically evaluate the fidelity of T7 based amplification, we compared the correlation coefficients obtained from four different protocols. The correlation coefficients between individual samples amplified using different protocols and the gold standard range from 0.74–0.86 (Figure 2). The scatter plots comparing the gene expression of the virtual amplified samples for each protocol with the unamplified gold standard are shown in Figure 3, with the virtual correlation coefficients ranging from 0.83–0.86. The differences in correlations obtained using different protocols are not statistically significant by Student's t-test, demonstrating that differences in gene expression for samples amplified using different protocols are minor.
Figure 2. Box graph of correlation coefficients of the gene expression levels for 11,123 clones, comparing individual amplified samples to the gold standard of BC2 (idealizing unamplified poly(A)+ RNA). Each closed circle represents the correlation coefficient for each individual sample amplified with a particular protocol to the gold standard. The average and virtual correlation coefficients of the replicate samples for each protocol are shown below the graph.
Figure 3. Scatterplot matrix using average expression ratios of multiple replicate amplifications for each protocol and the gold standard. The X-axis and Y-axis show virtual gene expression level [normalized log(base2) fluorescence intensity ratio of sample to reference averaged over multiple arrays] measured using aRNA amplified by different protocols or unamplified poly(A)+ RNA as labelled. The last column of plots shows each amplification protocol (Y-axis) vs. gold standard (X-axis). The correlation coefficient for each pair is listed in each plot. The orange and blue shaded regions indicate more than a two-fold difference between the virtual expression values for each protocol being compared.
Our results also suggest that the level of bias introduced into gene expression profiling by amplification is relatively low; expression profiles obtained using aRNA provide a close approximation of the true expression profile of the original sample. To assess the biases of amplification quantitatively, we calculated the number and percentage of genes whose expression level changed by 2-fold or 4-fold or greater after amplification. The biases of amplification by different protocols are similar. Specifically, less than 0.2% of 11,123 genes (12–15 genes) changed their expression level by 4-fold or greater and less than 6% (306–594 genes) changed expression by 2-fold or greater after amplification. With the Jeffrey lab protocol, less than 4% of genes showed changes in expression level of 2-fold or greater. Of the genes that changed, 7 changed by more than 4-fold and 139 by more than 2-fold in the same direction in all four protocols. Also, the virtual correlation coefficients between different protocols are high (average 0.95) (Figure 3), suggesting that slight differences between protocols based on the T7 linear amplification mechanism do not affect the correlation of amplified samples to unamplified samples. Together, these results support the conclusion that aRNA closely approximates the expression profile of the original sample.
We present here a components-of-variance model to explain the different sources of variation in the amplification protocols (see Statistical Appendix, 1). The expression measurement for a gene for a specific array/protocol/sample can be broken down as
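Because expression levels are log (base 2) ratios, a 2-fold change corresponds to an absolute difference of 1 and a 4-fold change to a difference of 2. The count of changed genes can be sketched as below; the function name and list-based layout are assumptions made for illustration.

```python
import math


def count_fold_changes(amplified, unamplified, fold=2.0):
    """Count genes whose log2 expression differs by at least log2(fold).

    amplified, unamplified: lists of log2 expression levels aligned by gene.
    Returns (number of genes, percentage of genes).
    """
    threshold = math.log2(fold)
    n_changed = sum(
        1 for a, u in zip(amplified, unamplified) if abs(a - u) >= threshold
    )
    return n_changed, 100.0 * n_changed / len(amplified)


amplified_levels = [0.0, 1.5, -2.5, 0.3]
unamplified_levels = [0.0, 0.2, 0.0, 0.0]
two_fold = count_fold_changes(amplified_levels, unamplified_levels, fold=2.0)
four_fold = count_fold_changes(amplified_levels, unamplified_levels, fold=4.0)
```

As a check on the reported figures, 594 of 11,123 genes is about 5.3% and 15 of 11,123 is about 0.13%, consistent with the stated bounds of 6% and 0.2%.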
X = Z + e, where
X is the measured expression value
Z is the "true" expression that does not change under replication, and
e is the measurement error that does change under replication.
While we cannot estimate Z and e directly, we can estimate their variances from the data. For the four different amplified protocols and the unamplified arrays, the relevant variances are estimated in Table 5. The variance of the true expression (Var Z) ranges from 0.623–0.661 for the amplified protocols and is 0.726 for the unamplified arrays. The variance of the measurement error (Var e) ranges from 0.055–0.102 for the amplified protocols and is 0.059 for the unamplified arrays. The estimates of Var Z were obtained by averaging the pairwise covariances of the replicates within each protocol, and the estimates of Var e by using the within-protocol variance. (While the variance of a collection of numbers measures how much they vary about their mean, the covariance of two sets of numbers measures how much they vary with respect to each other. See Statistical Appendix for more details). We notice that Var Z for unamplified poly(A)+ RNA is larger than all of the others, indicating a dampening effect on gene expression by amplification. Measurement error variance is lowest for the Jeffrey lab protocol.
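The estimators described above follow from the model: if two replicates share the same Z and have independent errors, their covariance estimates Var Z, while the spread of replicates around their per-gene mean estimates Var e. The sketch below is one plausible reading of that description, not the computation from the Statistical Appendix; in particular, taking "within-protocol variance" to mean the per-gene sample variance across replicates, averaged over genes, is an assumption.

```python
from itertools import combinations


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def estimate_var_z(replicates):
    """Average pairwise covariance between replicate arrays (needs >= 2)."""
    pairs = list(combinations(replicates, 2))
    return sum(covariance(a, b) for a, b in pairs) / len(pairs)


def estimate_var_e(replicates):
    """Mean over genes of the sample variance across replicates (assumed reading)."""
    per_gene = []
    for values in zip(*replicates):
        mean = sum(values) / len(values)
        per_gene.append(
            sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        )
    return sum(per_gene) / len(per_gene)


# Two replicate arrays over three genes (toy values).
noisy_replicates = [[0.1, 0.9, 2.1], [-0.1, 1.1, 1.9]]
var_z = estimate_var_z(noisy_replicates)
var_e = estimate_var_e(noisy_replicates)
```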
Table 5. Variance of true gene expression (Var Z) and measurement error (Var e) for each of the different amplified protocols and the unamplified arrays.
The covariances between the different Zs for the different methods (estimated from the virtual arrays) are shown in Table 6. The off-diagonal elements of Table 6 are the covariances; the diagonal elements are the variances of the virtual arrays. We notice that covariances among the amplified protocols, which range from 0.62 to 0.66, are higher than their covariances with the unamplified arrays, which range between 0.57 and 0.61. Furthermore, the variances of the amplified protocols (0.63 to 0.68) are lower than that of the unamplified arrays (0.74).
Table 6. Covariances between the "true" gene expression for the different amplification protocols, estimated from the virtual arrays. The diagonal of the table contains the variances for the techniques.
This suggests the following further breakdown:
Z = Zc + Zs, where
Zc is a common expression component, with variance about 0.6, and
Zs is a specific expression component, with variance about 0.04 for the amplified arrays and 0.14 for the unamplified arrays.
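One way to see where the quoted component variances could come from is the short calculation below. It assumes that Zc and Zs are uncorrelated, which the text does not state, and it uses 0.64 as an illustrative value within the reported 0.63–0.68 range for the amplified protocols.

```latex
\begin{align*}
\operatorname{Var}(Z) &= \operatorname{Var}(Z_c) + \operatorname{Var}(Z_s)
  && \text{(assuming } \operatorname{Cov}(Z_c, Z_s) = 0\text{)} \\
\operatorname{Var}(Z_s)_{\text{unamplified}} &\approx 0.74 - 0.60 = 0.14 \\
\operatorname{Var}(Z_s)_{\text{amplified}} &\approx 0.64 - 0.60 = 0.04
\end{align*}
```

Under this reading, the common component accounts for most of the variance in both cases, and the specific component is several times larger for the unamplified arrays.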
Therefore, the amplified expression values for genes on an array are largely the same as on the unamplified array. The component of variation in which they differ appears to be common for all amplified protocols, and shows a much higher variance in the unamplified arrays. The effect of amplification can be summarized by saying that it has a dampening effect on the true expression of some genes (decreased variance in gene expression – see Figure 4).
Figure 4. Scatterplot of the t-statistics (the numerical score underlying a t-test) comparing the differences in gene expression between amplified and unamplified RNA for BC2 (X-axis) and BC91 (Y-axis). The tests were based on 7 amplified total RNA and 5 unamplified poly(A)+ RNA samples for BC2, and 2 amplified total RNA and 4 unamplified poly(A)+ RNA samples for BC91.
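A per-gene t-statistic of the kind plotted in Figure 4 can be sketched as follows. The caption does not say which form of the t-test was used, so the unequal-variance (Welch) form here is an assumption, and the function names are illustrative.

```python
import math
from statistics import mean, variance


def welch_t(amplified, unamplified):
    """Welch t-statistic for one gene (amplified minus unamplified).

    Each argument is a list of replicate expression levels (>= 2 values).
    Raises ZeroDivisionError if both groups have zero variance.
    """
    standard_error = math.sqrt(
        variance(amplified) / len(amplified)
        + variance(unamplified) / len(unamplified)
    )
    return (mean(amplified) - mean(unamplified)) / standard_error


def t_statistics_by_gene(amplified_by_gene, unamplified_by_gene):
    """Apply welch_t to every gene present in both dicts (gene -> replicates)."""
    return {
        gene: welch_t(amplified_by_gene[gene], unamplified_by_gene[gene])
        for gene in amplified_by_gene
        if gene in unamplified_by_gene
    }


t_example = welch_t([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```

Computing these statistics separately for each tumor and plotting one against the other gives a scatterplot of the kind described; a weak relationship between the two sets indicates that amplification-related changes are not shared across tumors.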
A recent report compared amplified expression profiles of different primary breast tumors using Affymetrix® arrays . Unger et al. found that correlation coefficients between gene expressions in different tumors revealed by aRNA ranged between 0.71–0.89. However, gene expression was not measured using unamplified poly(A)+ RNA from these different tumors, raising the question of whether the observed fairly high correlation between diverse tumors was due to amplification bias. To answer this question, we compared the correlation coefficients between gene expression profiles of different tumors determined either with unamplified poly(A)+ RNA or aRNA amplified using our protocol (Table 7). The correlation coefficients between BC2 and BC91 measured using poly(A)+ RNA or aRNA are the same, 0.55. Moreover, the correlation between differences in gene expression between amplified and unamplified samples for different tumors is weak (Figure 4), suggesting that genes that differ through amplification depend on the sample rather than on systematic changes from amplification. Our results demonstrate that different primary breast tumors are not closely related to each other in gene expression profiles, and amplification does not affect the correlation of gene expression between different tumors. Amplification is therefore suitable for comparison of gene expression profiles among large sample sets.
Table 7. Correlation between expression levels of different tumors (BC2 and BC91) determined with both poly(A)+ RNA and aRNA for each tumor.
Evaluation of reproducibility of T7 based linear amplification
Another important aspect of RNA amplification is the degree of reproducibility. To evaluate this, we calculated the correlation coefficients between individual amplified samples. The correlation coefficients between individual hybridizations done on the same day averaged 0.97 for poly(A)+ RNA and 0.91–0.98 for aRNA amplified on the same day (Table 8). The correlation coefficients between individual hybridizations done on different days averaged 0.89 for poly(A)+ RNA and 0.85–0.90 for aRNA amplified from total RNA on different days. The reproducibility of our protocol is slightly better than that of the other protocols. Notably, when samples are amplified on the same day, the correlations are significantly higher than when samples are amplified on different days. In addition, samples amplified with protocols omitting ligase activity give higher reproducibility regardless of whether they are amplified on the same day or not.
Table 8. Evaluation of the reproducibility of T7 based amplification.
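Reproducibility as used here is the average correlation over all pairs of replicate samples, rather than the correlation of each sample with a reference. A minimal sketch, with illustrative function names and samples given as lists aligned by clone:

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient; undefined for constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(samples):
    """Average Pearson correlation over every pair of samples (needs >= 2)."""
    pairs = list(combinations(samples, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


same_day = [[0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.1, 2.9], [0.0, 1.1, 1.9, 3.0]]
reproducibility = mean_pairwise_correlation(same_day)
```

Grouping samples by amplification date before calling `mean_pairwise_correlation` would give the same-day value, while pairing samples across dates would give the different-day value.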
The effect of the amount of input total RNA on amplification
To determine the effect of the amount of input total RNA on single round amplification, we amplified different amounts of total RNA (3 μg, 1 μg, 300 ng, 100 ng, 30 ng and 10 ng), adjusting the amount of T7 primer according to the quantity of input total RNA (Table 9). When the input total RNA is lower than 300 ng, the yield of aRNA is lower than the standard quantity required for one hybridization (3 μg). At amounts greater than or equal to 300 ng, the correlation coefficients between amplified and unamplified samples and among amplified samples remain about the same. The fold amplification increases with smaller quantities of template RNA, but the absolute yield of aRNA decreases. Therefore, within the range of 0.3–3 μg total RNA, decreasing the input RNA does not affect the fidelity and reproducibility of amplification.
Table 9. The effect of the amount of template BC2 total RNA on the fidelity, reproducibility and yield of amplification.
All original microarray data may be accessed at the RNA Amplification for Microarrays website http://genome-www.stanford.edu/breast_cancer/amplification/.
In conclusion, T7 based linear amplification generates high fidelity aRNA for gene expression profiling using high-density cDNA microarrays. The average correlation coefficient between amplified and unamplified samples is 0.82 with less than 4% of genes showing changes in expression level by 2-fold or greater using the optimized (Jeffrey lab) protocol. The correlation to unamplified poly(A)+ RNA increases to 0.85 when experimental variability is minimized by configuring multiple amplified samples into a virtual array. Reproducibility between samples amplified with this technique is high, especially when performed on the same day rather than weeks apart. Amplification produces a dampening effect on gene expression variation.
Two primary breast carcinomas, BC2 and BC91, in which more than 90% of the breast epithelial cells were cancer cells, were chosen for experiments in this study. The specimens were frozen in either liquid nitrogen (BC2) or on dry ice (BC91) within 30 minutes following devascularization and stored at -80°C. Frozen sections were cut from primary breast carcinoma specimens and stained with hematoxylin and eosin to confirm tumor content.
Total RNA was isolated from primary tumor tissue using TRIzol® solution (Invitrogen™) following homogenization using a PowerGen Model 125 (Fisher Scientific). Poly(A)+ RNA was isolated from total RNA with the FastTrack 2.0 kit (Invitrogen™). The concentration of total RNA and poly(A)+ RNA was determined using a GeneSpec I spectrophotometer (Hitachi) and the integrity of total RNA and poly(A)+ RNA was assessed using a 2100 Bioanalyzer (Agilent).
The amplification of total RNA or poly(A)+ RNA was performed based on a previously described protocol  with our modifications.
For first strand cDNA synthesis, 3 μg (unless otherwise specified) tumor total RNA, Universal Human Reference total RNA (Stratagene®), or 150 ng poly(A)+ RNA was mixed with 1 μg Eberwine primer (Operon®) in RNase-free water to a total volume of 9 μl. The RNA/primer mixture was denatured at 70°C for 3 min and cooled on ice for 2 min. The following reagents were then added: 4 μl of 5X first strand buffer (Invitrogen™), 2 μl 0.1 M DTT, 1 μl RNasin® (40 U/μl, Promega™), 2 μl 10 mM dNTP, and 2 μl Superscript™ II (200 U/μl, Invitrogen™). The reaction was incubated at 42°C for 1.5 hours.
Second strand cDNA synthesis was performed by mixing the first strand synthesis reaction with 106 μl RNase-free water, 15 μl 10X Advantage™ PCR buffer (Clontech), 3 μl 10 mM dNTP mix, 3 μl Advantage™ cDNA polymerase mix (Clontech), and 1 μl RNase H (2 U/μl, Invitrogen™). The reaction was incubated at 37°C for 5 min to digest RNA, followed by 94°C for 2 min to activate the Advantage™ cDNA polymerase, 65°C for 1 min to prime and 75°C for 30 min to extend the second strand cDNA. The reaction was stopped by the addition of 7.5 μl 1 M NaOH/2 mM EDTA and incubated at 65°C for 10 min.
ds cDNA was extracted with an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1), transferred to a Phase Lock Gel™ tube (Eppendorf) and centrifuged at 16,000 g for 5 min. The ds cDNA (aqueous layer) was transferred to a new tube and precipitated by adding 1 μl linear acrylamide (0.1 μg/μl), 70 μl 7.5 M NH4Ac and 1 ml 200 proof ethanol, and centrifuged at 16,000 g for 20 min at room temperature. The pellet was washed in 500 μl 75% ethanol, centrifuged at 16,000 g for 5 min, air dried and resuspended in 16 μl RNase-free water.
In vitro transcription of ds cDNA was performed using a T7 MEGAscript™ kit (Ambion®). Four microliters of each 75 mM NTP, 4 μl of 10X reaction buffer and 4 μl of T7 polymerase mix were added to the 16 μl of ds cDNA. The reaction was then carried out at 37°C for 5 hours. aRNA was cleaned up using an RNeasy® mini kit (Qiagen®) as described by the manufacturer.
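The reagent volumes listed for the first strand, second strand and in vitro transcription reactions can be tallied to check the total reaction volumes and final reagent concentrations. The sketch below uses only the volumes and stock concentrations stated above. The dictionary layout and helper names are illustrative, and the in vitro transcription tally assumes that "each NTP" means the four ribonucleotides ATP, CTP, GTP and UTP.

```python
# Volumes in microliters, as stated in the protocol text.
FIRST_STRAND_UL = {
    "RNA/primer mix": 9.0,
    "5X first strand buffer": 4.0,
    "0.1 M DTT": 2.0,
    "RNasin": 1.0,
    "10 mM dNTP": 2.0,
    "Superscript II": 2.0,
}
# Reagents added to the completed first strand reaction.
SECOND_STRAND_ADDED_UL = {
    "RNase-free water": 106.0,
    "10X Advantage PCR buffer": 15.0,
    "10 mM dNTP mix": 3.0,
    "Advantage cDNA polymerase mix": 3.0,
    "RNase H": 1.0,
}
# Assumption: "each 75 mM NTP" refers to ATP, CTP, GTP and UTP.
IVT_UL = {
    "ds cDNA": 16.0,
    "75 mM ATP": 4.0,
    "75 mM CTP": 4.0,
    "75 mM GTP": 4.0,
    "75 mM UTP": 4.0,
    "10X reaction buffer": 4.0,
    "T7 polymerase mix": 4.0,
}


def total_volume(components):
    """Sum the component volumes of a reaction, in microliters."""
    return sum(components.values())


def diluted(stock_conc, volume_added_ul, total_ul):
    """Final concentration after dilution, in the units of stock_conc."""
    return stock_conc * volume_added_ul / total_ul


first_strand_total = total_volume(FIRST_STRAND_UL)
second_strand_total = first_strand_total + total_volume(SECOND_STRAND_ADDED_UL)
ivt_total = total_volume(IVT_UL)

# Final concentration of each NTP in the transcription reaction (mM).
ntp_final_mm = diluted(75.0, 4.0, ivt_total)
```

Under these assumptions the first strand buffer and the transcription buffer both dilute to 1X in their respective reactions, and the second strand buffer to approximately 1X.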
RNA labelling and hybridization
Three micrograms of aRNA (unless otherwise specified) or 2 μg poly(A)+ RNA were labelled with either Cy5-dUTP (experimental sample) or Cy3-dUTP (reference) in a 30.4 μl reaction. RNA was first mixed with either 8 μg of random primer for aRNA or 5 μg of oligo(dT) primer for poly(A)+ RNA in 16 μl of RNase-free water. The RNA/primer mix was incubated at 70°C for 10 min and cooled on ice for 2 min. The following reagents were added: 6 μl of 5X first strand buffer, 3 μl 0.1 M DTT, 0.7 μl 50X dNTP (25 mM dATP, dCTP, dGTP and 10 mM dTTP), 3 μl 1 mM Cy3-dUTP or Cy5-dUTP and 1.7 μl Superscript™ II (200 U/μl). The labelling reaction was carried out at 42°C for 2 hours; an additional 1 μl Superscript™ II was added to the reaction at the end of the first hour. The input RNA was hydrolyzed by adding 15 μl 0.1 M NaOH/2 mM EDTA and incubating at 65°C for 8 min, followed by neutralization with 15 μl 0.1 M HCl.
The Cy5 and Cy3 labelled probes were combined and purified in a Microcon® YM-30 column (Millipore) by washing three times with Tris-EDTA buffer. Human Cot-1 DNA (15 μg) was added to the probe before the first wash. The purified probe was adjusted to a total volume of 26 μl and mixed with 5.3 μl 20X saline-sodium citrate (SSC), 1 μl yeast tRNA (10 μg/μl), 2 μl poly(A) DNA (10 μg/μl), and 0.6 μl 10% sodium dodecyl sulfate (SDS). The resulting 35 μl probe solution was denatured at 95°C for 2 min and then incubated at 42°C for 25 min.
The probe was then hybridized to cDNA arrays at 65°C for 14–18 hours. Depending on the print batch, the arrays contained from 42,772 to 43,915 spots (41,931–42,602 distinct clones representing 16,907–18,417 named genes, 3946–4145 ESTs with known functions and 19,369–21,384 ESTs with unknown functions), and were manufactured as previously described [31-33]. Following hybridization, the arrays were washed once with 2X SSC containing 0.05% SDS for 2 min at room temperature, once with 1X SSC for 2 min at room temperature, and three times with 0.2X SSC for 1 min at 45–50°C.
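The stated 30.4 μl labelling volume and 35 μl probe volume can be checked against the listed components. The sketch below uses only the volumes given above; the variable and function names are illustrative. The labelling total excludes the extra 1 μl Superscript™ II added after the first hour, and the final SSC and SDS concentrations are derived by simple dilution rather than reported in the text.

```python
# Labelling reaction components, in microliters (initial setup only).
LABELLING_UL = {
    "RNA/primer mix": 16.0,
    "5X first strand buffer": 6.0,
    "0.1 M DTT": 3.0,
    "50X dNTP": 0.7,
    "1 mM Cy-dUTP": 3.0,
    "Superscript II": 1.7,
}
# Hybridization probe solution components, in microliters.
PROBE_UL = {
    "purified probe": 26.0,
    "20X SSC": 5.3,
    "yeast tRNA": 1.0,
    "poly(A) DNA": 2.0,
    "10% SDS": 0.6,
}


def final_conc(stock, volume_ul, total_ul):
    """Concentration after dilution, in the same units as `stock`."""
    return stock * volume_ul / total_ul


labelling_total = sum(LABELLING_UL.values())   # stated as 30.4 ul
probe_total = sum(PROBE_UL.values())           # stated as about 35 ul

ssc_final_x = final_conc(20.0, PROBE_UL["20X SSC"], probe_total)
sds_final_pct = final_conc(10.0, PROBE_UL["10% SDS"], probe_total)
```

The component volumes sum to 34.9 μl for the probe solution, which the text rounds to 35 μl.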
Imaging and data analysis
The arrays with hybridized probes were scanned using an Axon scanner. The scanned images were first analyzed using GenePix® Pro 3.0 software (Axon Instruments), and spots judged to be of poor quality on visual inspection were removed from further analysis. The resulting data from each array were submitted to the Stanford Microarray Database (SMD, http://genome-www5.stanford.edu/microarray/SMD). A total of 97 arrays were submitted (60 experiments done with BC2 and 37 experiments performed with BC91). Only features with a signal intensity >50% above background in both Cy5 and Cy3 channels for all of the samples included in a particular analysis were retrieved from SMD. Pearson's correlation coefficient was calculated using Microsoft® Excel 2000. A components-of-variance model was used to explain different sources of variation in the amplification protocols.
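The feature filter and the correlation calculation described above can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the data layout (a dict mapping feature ID to a tuple of Cy5 intensity, Cy5 background, Cy3 intensity and Cy3 background), the function names, and the reading of ">50% above background" as intensity greater than 1.5 times background are all assumptions. The `pearson` function implements the standard product-moment formula.

```python
import math


def passing_ids(array, threshold=1.5):
    """IDs of features whose signal exceeds threshold x background
    in both channels. `array` maps id -> (cy5, cy5_bg, cy3, cy3_bg)."""
    return {
        fid
        for fid, (cy5, cy5_bg, cy3, cy3_bg) in array.items()
        if cy5 > threshold * cy5_bg and cy3 > threshold * cy3_bg
    }


def shared_passing_ids(arrays, threshold=1.5):
    """IDs that pass the filter on every array in the analysis."""
    if not arrays:
        return set()
    ids = passing_ids(arrays[0], threshold)
    for array in arrays[1:]:
        ids &= passing_ids(array, threshold)
    return ids


def pearson(x, y):
    """Pearson's product-moment correlation coefficient of two sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

In use, the expression values for the IDs returned by `shared_passing_ids` would be collected in the same order for the two samples being compared and passed to `pearson`.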
HZ participated in study design, isolated the RNA, performed all the amplification and hybridization experiments, participated in data analysis, and drafted the manuscript. TH performed the statistical analyses and drafted the statistical portions of the manuscript. MW participated in study design and editing of the manuscript. ALBD participated in data analysis and editing of the manuscript. SSJ conceived of the study, guided its design and coordination, participated in data analysis, drafted portions of the manuscript, and supervised editing.
All authors read and approved the final manuscript.
We are grateful to Drs. David Botstein and Patrick O. Brown for helpful discussions and Susan Overholser for her invaluable assistance in the preparation of this manuscript. This work was supported by NIH/NCI Grant U01 CA85129 and California Breast Cancer Research Program Grant 5JB-0126. M.L.W. is supported by a National Research Service award from the National Human Genome Research Institute and by funds from the Scleroderma Research Foundation. S.S.J.'s website is Stefanie Jeffrey Lab http://www.stanford.edu/group/sjeffreylab/.
Perou CM, Jeffrey SS, van de Rijn M, Rees CA, Eisen MB, Ross DT, Pergamenschikov A, Williams CF, Zhu SX, Lee JC, et al.: Distinctive gene expression patterns in human mammary epithelial cells and breast cancers.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.
Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS, et al.: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications.
Hegde P, Qi R, Gaspard R, Abernathy K, Dharap S, Earlem-Hughes J, Gay C, Nwokekeh NU, Chen T, Saeed AI, et al.: Identification of tumor markers in models of human colorectal cancer using a 19,200-element complementary DNA microarray.
Mol Interv 2002, 2:101-109.
Biotechniques 2000, 29:530-536.
Nat Biotechnol 1997, 15:1359-1367.
Biotechniques 2001, 31:874-879.
Affymetrix GeneChip® Expression Analysis Technical Manual [http://www.affymetrix.com/Download/manuals/expression_manual.pdf]
Sotiriou C, Powles TJ, Dowsett M, Jazaeri AA, Feldman AL, Assersohn L, Gadisetti C, Libutti SK, Liu ET: Gene expression profiles derived from fine needle aspiration correlate with response to systemic chemotherapy in breast cancer.
Science 1995, 270:467-70.
Genome Res 1996, 6:639-645.