Several high-throughput technologies have been employed to identify differentially regulated genes that may be molecular targets for drug discovery. Here we compared the sets of differentially regulated genes discovered using two experimental approaches: a suppression subtractive hybridization (SSH) cDNA library methodology and Affymetrix GeneChip® technology. In this "case study" we explored the transcriptional pattern changes during the in vitro differentiation of human monocytes to myeloid dendritic cells (DC), and evaluated the potential for novel gene discovery using the SSH methodology.
The same RNA samples isolated from peripheral blood monocyte precursors and immature DC (iDC) were used for GeneChip microarray probing and SSH cDNA library construction. 10,000 clones from each of the two-way SSH libraries (iDC-monocytes and monocytes-iDC) were picked for sequencing. About 2000 transcripts were identified for each library from 8000 successful sequences. Only 70% to 75% of these transcripts were represented on the U95 series GeneChip microarrays, implying that 25% to 30% of these transcripts might not have been identified in a study based only on GeneChip microarrays. In addition, about 10% of these transcripts appeared to be "novel", although these have not yet been closely examined. Among the transcripts that are also represented on the chips, about a third were concordantly discovered as differentially regulated between iDC and monocytes by GeneChip microarray transcript profiling. The remaining two thirds were either not inferred as differentially regulated from GeneChip microarray data, or were called differentially regulated but in the opposite direction. This underscores the importance both of generating reciprocal pairs of SSH libraries, and of real-time RT-PCR confirmation of the results.
This study suggests that SSH could be used as an alternative and complementary transcript profiling tool to GeneChip microarrays, especially in identifying novel genes and transcripts of low abundance.
Gene expression profiling has become an invaluable tool in functional genomics. Since the mid-1990s, DNA microarrays [1-3], cDNA subtraction [4-7] and Serial Analysis of Gene Expression (SAGE) have emerged as the leading transcript profiling technologies in the global analysis of biological systems. One of the high-throughput technologies, high-density oligonucleotide GeneChip® microarrays, manufactured by Affymetrix [1,3,9], makes it possible to simultaneously measure the relative abundance of thousands of mRNAs in a cell. However, DNA microarray technology is limited by its insensitivity to transcripts of low abundance. A similar low sensitivity was also seen with SAGE. More recently, a PCR-select cDNA subtraction method (called suppression subtractive hybridization, or SSH) was developed by Clontech, which, due to a normalization step to equalize the abundance of cDNAs within the target population, makes it possible to detect some low abundance transcripts [4,7]. Although custom DNA microarrays have been used in combination with the cDNA subtraction technology in identifying differentially expressed genes [11-15], no direct comparison of the sensitivity and bias of the SSH and GeneChip technologies has been done so far.
To compare the SSH and GeneChip technologies, we explored the similarities and differences in regulated genes discovered using SSH and GeneChip microarrays. We compared the regulated genes identified through SSH with the genes found to be differentially regulated using the GeneChip microarrays in a human dendritic cell (DC) differentiation paradigm. We regard this as a "case study" of the potential for novel gene discovery using SSH methodology that would not be accessible using Affymetrix profiling alone. The same RNA samples isolated from immature DC (iDC) and from monocytes were used for GeneChip microarray probing and SSH library construction. Overall, about two thirds of the transcripts identified using SSH methodology were not identified using GeneChip microarrays alone. These results suggest that SSH could be used as an alternative and complementary transcript profiling tool to Affymetrix GeneChip microarrays, especially in identifying novel genes or transcripts of low abundance.
Genes not represented on Affymetrix GeneChip microarrays can be identified through SSH
Reciprocally subtracted cDNA libraries between immature dendritic cells (iDC) and monocytes were generated using the SSH technology developed by Clontech (H56 stands for iDC minus monocytes, and H57 stands for monocytes minus iDC). Deep sequencing of the SSH cDNA libraries was carried out by randomly picking 10000 clones from each library for DNA sequence analysis. After accumulating more than 8000 lanes of successful sequences for each library, a total of 1940 transcripts were identified for H56. These transcripts were further extended in silico as described in Methods (Figure 1), and the extended transcripts were mapped to Affymetrix GeneChip microarray qualifiers. About 73% of the transcripts identified in H56 were represented on the U95 series GeneChip microarrays (Figure 2). However, about 17% of the transcripts identified through deep sequencing of the cDNA subtractive library were not represented on those chips and hence could not have been identified in a study based purely on GeneChip microarray analysis. This number did not change significantly when the newer U133 series GeneChip microarrays were used (data not shown). In addition, about 10% of the transcripts identified appeared to be novel sequences without any match in the three cDNA databases we searched, and it is unlikely that these transcripts were represented on GeneChip microarrays.
Figure 1. Annotation and mapping of SSH sequences to GeneChip® microarray qualifiers (see Methods).
Figure 2. Genes not represented on Affymetrix GeneChip microarrays identified through SSH: Out of the 1940 transcripts in H56 SSH library, 1409 are represented on the U95 series GeneChip microarrays. 333 of the transcripts have matches in the DNA sequence databases searched, but are not represented on the U95 series GeneChip microarray. 198 of the transcripts have no match in any of the DNA sequence databases searched.
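The percentages quoted in the text can be checked directly against the counts in the Figure 2 caption. The short Python sketch below is only an arithmetic check; the dictionary keys are illustrative labels introduced here, not terms from the study.

```python
# Counts taken from the Figure 2 caption for the H56 (iDC minus monocytes) library.
total_transcripts = 1940
counts = {
    "on_U95_chips": 1409,               # represented on U95 GeneChip microarrays
    "in_databases_not_on_chips": 333,   # database match, but no U95 qualifier
    "no_database_match": 198,           # apparently novel sequences
}

# The three categories should account for every transcript in the library.
assert sum(counts.values()) == total_transcripts

# Express each category as a percentage of the library, to one decimal place.
percentages = {
    name: round(100 * n / total_transcripts, 1) for name, n in counts.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The three values round to the "about 73%", "about 17%" and "about 10%" figures given in the text.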
Certain genes present in the SSH libraries and represented on GeneChip microarrays were not detected through GeneChip microarray analysis
To find out how many of the SSH-detected genes with probes on the GeneChip microarrays can actually be detected by the Affymetrix technology, we used the subtracted cDNA to synthesize labeled complementary RNA (cRNA) targets for the GeneChip microarrays. A T7 promoter in the SSH PCR primers allows us to perform in vitro transcription with the SSH cDNA. Since the SSH library is derived from cDNA primed from the 3' end, we assume that any transcripts detected by sequencing the library are potentially represented by 3' fragments, whether or not the sequenced fragment is localized to the 3' end of the transcript. This is critical because Affymetrix probes are designed to interact with the 3' regions of the targeted transcripts. When the GeneChip microarrays were screened with targets made from the SSH cDNA, 571 out of the 1409 transcripts were given "absent" calls, suggesting that no positive signals can be detected on the GeneChip microarrays for these transcripts, even though the presence of these 571 transcripts had been confirmed by sequencing (Figure 3a). Next we asked whether the transcripts undetectable by the GeneChip microarrays were limited to transcripts of low abundance. Although the abundance of genes had been normalized in the SSH, the frequency of each gene in the SSH cDNA library can still be used as a relative indicator of its abundance because the same SSH cDNA samples were used to label the cRNA targets for the GeneChip microarrays. Here we used the number of sequenced cDNA clones belonging to each transcript as a measure of the copy number of this transcript in the SSH cDNA library. As shown in Figure 3b, the transcripts scored as "absent" using GeneChip microarrays include genes with high and low copy numbers. The distribution pattern of the copy numbers of this group was very similar to that of the group of transcripts scored as "present" by the GeneChip microarrays.
This suggests that there are some inefficiencies in the GeneChip microarray technology that are independent of transcript abundance.
Figure 3. Comparing the SSH data with GeneChip microarray data using subtracted samples as targets. The GeneChip microarrays were screened with cRNA targets made from the same subtracted cDNA used for SSH. (a) Number of transcripts in the H56 SSH library identified as "present" or "absent" on the GeneChip microarrays. (b) The copy number of each transcript in the SSH library plotted against its detectability on the GeneChip microarrays. Each dot represents a distinct transcript identified in the H56 SSH cDNA library. The transcripts that can be detected by the GeneChip microarrays were given "present" calls, while the transcripts that cannot be detected by the GeneChip microarrays were given "absent" calls.
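The copy-number comparison summarized in Figure 3b can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the transcript IDs, clone assignments and detection calls are hypothetical toy data, and only the counts 571 and 1409 come from the text.

```python
from collections import Counter
from statistics import median

# Hypothetical toy input: the transcript each sequenced clone was assigned to.
clone_to_transcript = [
    "T1", "T1", "T1", "T2", "T3", "T3", "T4", "T4", "T4", "T4", "T5",
]

# Hypothetical detection calls obtained with cRNA made from the subtracted cDNA.
detection_call = {
    "T1": "present", "T2": "absent", "T3": "present",
    "T4": "absent", "T5": "present",
}

# Copy number = number of sequenced clones belonging to each transcript.
copy_number = Counter(clone_to_transcript)

# Group copy numbers by detection call so the two distributions can be compared.
by_call = {"present": [], "absent": []}
for transcript, n_clones in copy_number.items():
    by_call[detection_call[transcript]].append(n_clones)

for call, copies in by_call.items():
    print(call, sorted(copies), "median copy number:", median(copies))

# Fraction of chip-represented H56 transcripts called "absent" (counts from the text).
absent_fraction = 571 / 1409
print(f"absent fraction: {absent_fraction:.1%}")
```

With real data, a rank-based test or a plot of the two copy-number distributions would be a natural next step; the sketch stops at grouping and summarizing.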
Discrepancy between genes identified through SSH and genes identified through GeneChip microarray analysis
Among the transcripts that were found in the H56 library (iDC minus monocytes) and were also detectable on the GeneChip microarrays, about a third were concordantly discovered as up-regulated in iDC based on GeneChip microarray profiling using non-subtracted RNA samples (labeled as "GeneChip Concordant" in Figure 4a). The remaining two thirds were either not inferred as differentially regulated (labeled as "GeneChip no Change" in Figure 4a), or were called down-regulated in iDC according to GeneChip microarray profiling data, contradictory to their presence in the iDC minus monocytes SSH cDNA library (labeled as "GeneChip Contradictory" in Figure 4a). As expected, the fraction appearing to contradict the results of GeneChip microarray profiling is diminished somewhat by subtracting out the 408 genes that appeared in both of the two reciprocally subtracted libraries H56 and H57 (Figure 4b). This underscores the importance of generating reciprocal pairs of SSH libraries.
Figure 4. Comparing the SSH data with GeneChip® data using non-subtracted RNA as targets. The GeneChip microarrays were screened with cRNA targets made from unmodified iDC and monocyte RNA samples, and the concordance of the SSH data with GeneChip data is shown when (a) all the transcripts in the H56 library were considered, or (b) only the transcripts unique to H56 were considered, after the transcripts that appeared in both H56 and the reciprocally subtracted H57 library were excluded.
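The classification behind Figure 4 amounts to a set difference followed by a tally. The sketch below assumes hypothetical transcript IDs and GeneChip calls; the three category labels are the ones used in the figure, while the function name `classify` and the call strings "up", "no_change" and "down" are introduced here for illustration only.

```python
# Hypothetical transcript IDs found in each reciprocally subtracted library.
h56 = {"A", "B", "C", "D", "E", "F"}  # iDC minus monocytes
h57 = {"E", "F", "G"}                 # monocytes minus iDC

# Hypothetical GeneChip comparison calls for iDC versus monocytes.
chip_call = {
    "A": "up", "B": "no_change", "C": "down",
    "D": "up", "E": "down", "F": "no_change",
}

# For a transcript found in H56, an "up" call in iDC agrees with SSH.
LABELS = {
    "up": "GeneChip Concordant",
    "no_change": "GeneChip no Change",
    "down": "GeneChip Contradictory",
}


def classify(transcripts, calls):
    """Tally H56 transcripts by how the GeneChip call relates to the SSH result."""
    tally = {label: 0 for label in LABELS.values()}
    for transcript in transcripts:
        tally[LABELS[calls[transcript]]] += 1
    return tally


# (a) All H56 transcripts; (b) only those unique to H56.
all_h56 = classify(h56, chip_call)
unique_h56 = classify(h56 - h57, chip_call)

print("all H56:   ", all_h56)
print("unique H56:", unique_h56)
```

In this toy example, removing the transcripts shared with H57 lowers the contradictory count from 2 to 1, which mirrors the direction of the effect described for Figure 4b.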
Real-time RT-PCR analysis of selected genes identified through SSH
To find out how genes with conflicting SSH data and GeneChip microarray data are differentially expressed, we used the more sensitive real-time RT-PCR (TaqMan® analysis) to quantitate the RNA levels of selected genes in the iDC and monocyte samples used for both the SSH and GeneChip microarray analysis. As shown in Table 2, among the 4 genes that appeared only in the H56 library (iDC minus monocyte), but were suggested by GeneChip microarray profiling to be upregulated in monocytes, three have higher levels of expression in iDC, while one has a higher level of expression in monocytes. These data suggest that there are false positives in both the SSH data and the GeneChip® profiling data. By using SSH in addition to GeneChip microarray profiling, we can identify some differentially expressed genes with false GeneChip microarray profiling results. However, more sensitive RNA quantitative measures, such as real-time RT-PCR analysis, are needed for more reliable verification of these differentially expressed genes. Since all RNA quantitation methods, including real-time RT-PCR, have their limitations, further validation of the differential gene expression pattern might need to be carried out. For example, Northern blot may not be as sensitive as real-time RT-PCR, but the size of the bands on the blot may be used as an indication of the specificity of the signals. If antibodies are available for the gene products under study, Western blot, flow cytometry and other protein analysis tools may also be used to verify the differential gene expression pattern.
In this study, we evaluated the similarities and differences in genes discovered using SSH and GeneChip microarrays by comparing the genes found to be differentially expressed during DC differentiation from monocytes using these two technical approaches. Our results showed that among the genes identified in the SSH libraries, more than half of those genes would not have been identified as differentially expressed by using GeneChip microarrays alone. Some of these genes were either novel or not represented on the GeneChip microarrays. However, a significant number of genes were missed by GeneChip microarray analysis despite the presence of probe sets for these genes on the microarrays; whether this number could be lower if the new and improved U133 series GeneChip microarrays were used remains untested.
DNA microarrays are powerful tools that enable the global analysis of a variety of complex biological systems. The expression levels of thousands of genes can be monitored simultaneously by using this high-throughput, cost-effective technology. However, this technology is also limited by its insensitivity to transcripts of low abundance, i.e. genes expressed at low levels or in a small fraction of the cells studied. Even some transcripts of high abundance could be missed by DNA microarrays due to poor hybridization between the probes and the labeled cRNA targets. One factor that could affect the hybridization step is the sequence targeted by the GeneChip probes. Since the GeneChip probes are 3'-biased to match the target generation characteristics of the sample amplification method, the sensitivity of some probes could be compromised either by their positioning toward the 5' region, or by poor in vitro transcription efficiency caused by the complexity of their sequences. The complexity of these targeted sequences may also affect the hybridization efficiency between the labeled cRNA targets and the GeneChip probes. On the other hand, the normalization step in the SSH protocol equalizes the abundance of cDNAs within the target population and the subtraction step excludes the common sequences between the target and driver populations. A comprehensive analysis of at least 5000 to 10000 clones isolated from the SSH cDNA libraries may therefore enable the detection of some transcripts of low abundance that would not be revealed by other transcript profiling protocols. Genes not represented on the DNA microarrays, including some genes with novel identities, may also be identified through sequencing the SSH cDNA libraries. However, the construction and sequencing of subtractive cDNA libraries is time consuming and labor intensive. These restrictions will limit the number of samples that can be surveyed by this technology in each study.
In practice, we suggest DNA microarrays as the preferred approach for transcript profiling of a large number of samples. This is especially true when the RNA is derived from homogeneous cell populations. However, in a number of cases, such as clinical tissues, the relevant cell type may be difficult to purify or in low abundance. In these cases, normalized subtractive cDNA libraries are preferable. Our results indicate that even though DNA microarrays and SSH may each be preferred in distinct situations, neither technique can adequately identify all regulated genes. Thus, even when homogeneous cell populations were examined, as in this study, more than half of the genes discovered through sequencing the SSH libraries would not have been identified by using GeneChip® technology alone. In conclusion, using normalized cDNA subtraction as an alternative and complementary transcript profiling tool to DNA microarrays will help identify novel genes and low abundance transcripts, thereby achieving a more comprehensive global view of the transcriptome in the biological system studied.
iDC generation and RNA preparation
CD14+ monocytes were isolated from the peripheral blood samples of healthy donors by negative selection using magnetic cell-sorting (Miltenyi, Auburn, CA) and differentiated into immature dendritic cells (iDC) in RPMI/10% FBS containing 1000 U/ml GM-CSF and 1000 U/ml IL-4 (Peprotech, Rocky Hill, NJ) [16-18]. Total RNA of monocytes and iDC was isolated using the RNeasy Mini Kit (Qiagen, Valencia, CA).
Affymetrix GeneChip® Microarray studies
The cRNA labeling and hybridizations were performed according to protocols from Affymetrix Inc. (Santa Clara, CA). Briefly, the mRNA in 5 μg of total cellular RNA was converted to double-stranded cDNA using Superscript (Gibco-Invitrogen) with a T7-(dT)24 primer containing a T7 RNA polymerase promoter. The cDNA was in vitro transcribed to biotinylated complementary RNA (cRNA) by incorporating biotin-CTP and biotin-UTP using the Enzo BioArray High Yield RNA labeling kit (Enzo Diagnostics, New York, NY). Biotinylated cRNA from each sample was fragmented to approximately 40–100 bases and 10 μg of the fragmented cRNA was hybridized to the Affymetrix human U95 probe array series (A, B, C, D, and E) for 16 h at 45°C with constant rotation at 60 rpm. Following washes, the hybridized chips were sequentially stained with streptavidin-phycoerythrin (Molecular Probes, Eugene, OR), biotinylated goat anti-streptavidin (Vector Laboratories, Burlingame, CA) and another round of streptavidin-phycoerythrin for signal amplification. After a series of washes, chips were scanned with an argon-ion laser confocal microscope (Hewlett-Packard, Palo Alto, CA) for fluorescence signal detection. All washes and staining procedures were performed on an Affymetrix Fluidics station. The raw expression data derived from Affymetrix Microarray Suite 4.0.1 software gave each transcript an absolute expression level (signal intensity) and a "present" or "absent" call based on the signal/noise ratio. The data were analyzed on two levels. At the detection level, a call of "present" indicates that a positive signal was detected for a probe, while a call of "absent" indicates that no positive signal was detected for a probe. The gene expression ratio of different samples for each donor was inferred using the PFOLD algorithm, which employs a Bayesian estimation scheme for estimating the fold-change of gene expression and also the significance of the change (P-value).
The comparison level analysis of the iDC and monocytes defines a gene as up-regulated if the signal log ratio between the iDC and monocyte samples is larger than 1 (equals a 2-fold increase) and the target sample is present. RNA samples from 3 individuals were analyzed.
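The up-regulation rule stated above can be written as a small predicate. This is a minimal sketch under stated assumptions: the function name `is_upregulated` and its arguments are hypothetical, "the target sample is present" is read here as the iDC sample carrying a "present" call, and the Bayesian PFOLD estimate and its P-value are not reproduced.

```python
import math


def is_upregulated(idc_signal, monocyte_signal, idc_call, threshold=1.0):
    """Apply the comparison-level rule: signal log ratio > 1 and a "present" call.

    The signal log ratio is log2(iDC signal / monocyte signal), so a value
    above 1 corresponds to more than a 2-fold increase in iDC.
    """
    signal_log_ratio = math.log2(idc_signal / monocyte_signal)
    return signal_log_ratio > threshold and idc_call == "present"


# Hypothetical signal intensities for illustration only.
print(is_upregulated(400.0, 100.0, "present"))  # 4-fold increase, present call
print(is_upregulated(200.0, 100.0, "present"))  # exactly 2-fold, not above threshold
print(is_upregulated(400.0, 100.0, "absent"))   # large ratio but absent call
```

Because the rule uses a strict inequality, a transcript at exactly 2-fold is not called up-regulated in this sketch.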
The construction and sequencing of suppression subtractive hybridization (SSH) cDNA libraries
SSH libraries were generated using the reagents and protocols provided by Clontech (Clontech, Palo Alto, CA). In one SSH library (H56), the RNA from iDC was used as "tester" and the RNA from monocytes was used as "driver". In another SSH library (H57), the RNA from iDC was used as "driver" and the RNA from monocytes was used as "tester". In both cases, the starting RNA material was a pool of the RNA samples from 3 individuals used in the microarray experiment. RT-PCR analysis of the SSH products showed that the level of the house-keeping gene GAPDH decreased more than 1000 fold in both H56 and H57 cDNA when compared with unsubtracted cDNA (data not shown), suggesting that the subtraction procedure was very effective. 10000 clones from each SSH library were sequenced with M13 primers using the ABI BigDye Terminator v2.0 Cycle Sequencing Kit (Applied Biosystems, Foster City, California) and ABI 3700 DNA Analyzers (Applied Biosystems), according to the manufacturers' protocols and manuals. The SSH cDNA was also used to prepare the cRNA for GeneChip microarrays. In vitro transcription was carried out from the T7 promoter in the PCR primers for SSH. The cRNA generated was used for GeneChip microarray hybridization as described above.
Annotation of the sequence results
Sequences generated through the deep sequencing were clustered into contigs before being submitted to BLAST searches of various online databases to elucidate the identity of clones. These included the National Center for Biotechnology Information (NCBI) nr (nonredundant GenBank, EMBL, DDBJ, and PDB), EST (nonredundant GenBank, EMBL, and DDBJ EST divisions), Incyte LifeSeq® database (Incyte, Palo Alto, CA) and Celera database (Celera, Rockville, MD). The sequences extended in silico were used to search for correspondent qualifiers on the U95 series of GeneChip microarrays (Figure 1).
Real-time RT-PCR analysis
The probes and primers used in the TaqMan® (Applied Biosystems, Foster City, CA) analysis are listed in Table 1. Isolated RNA was treated with DNase I at 37°C for 1 hour to remove any genomic DNA contamination, and first-strand cDNA was then synthesized from the DNase-treated RNA using the ABI Reverse Transcriptase Kit (PE Applied Biosystems, Foster City, CA) in a reaction with RNA at 20 ng/μl. cDNA samples were diluted in TE buffer at 1:20 and plated in triplicate in adjacent wells at 10 μl in a 96-well MicroAmp Optical plate (PE Applied Biosystems, Foster City, CA). Three wells without any template were also included on each plate as negative controls. Real-time TaqMan PCR was performed using the ABI PRISM 7700 Sequence Detection System (PE Applied Biosystems, Foster City, CA). GAPDH was amplified along with the target gene as an endogenous control in each well with a VIC-labeled probe to normalize expression between different samples. The probes and primers for the target gene and GAPDH were diluted in the TaqMan Universal PCR Master Mix (PE Applied Biosystems, Foster City, CA) and 15 μl of the reaction mix was added to each well. The probe and primer concentrations in the final 25 μl reaction mix were 250 nM and 900 nM for the target gene, and 100 nM and 50 nM for GAPDH. Reactions were performed by an initial incubation at 50°C for 2 min and at 95°C for 10 min, and then cycled at 95°C for 15 sec and 60°C for 1 min for 40 cycles. Output data generated by the instrument on-board software Sequence Detector Version 1.6.3 (PE Applied Biosystems, Foster City, CA) were transferred to a custom designed Microsoft Excel macro for analysis. The differential mRNA expression of each studied gene was calculated with the comparative Ct method using the formula 2^ΔΔCt.
Here, ΔCt is the Ct difference between the target gene and the endogenous control, GAPDH, adjusted by the Ct difference between these two genes in negative controls, and ΔΔCt is the difference between the ΔCt values of the target gene in the iDC and monocyte samples.
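The comparative Ct calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Excel macro used in the study: the function names and Ct values are hypothetical, the negative-control adjustment is omitted, and ΔΔCt is taken as ΔCt(monocytes) minus ΔCt(iDC) so that 2^ΔΔCt gives the iDC-to-monocyte expression ratio. With the opposite sign convention, the same quantity is usually written 2^(-ΔΔCt).

```python
from statistics import mean


def delta_ct(target_cts, gapdh_cts):
    """Mean Ct of the target gene minus mean Ct of GAPDH over triplicate wells."""
    return mean(target_cts) - mean(gapdh_cts)


def fold_change_idc_vs_monocyte(idc_target, idc_gapdh, mono_target, mono_gapdh):
    """Relative expression (iDC / monocytes) from the comparative Ct method.

    Assumed sign convention: ddCt = dCt(monocytes) - dCt(iDC). A lower Ct
    means more template, so a positive ddCt means higher expression in iDC.
    """
    ddct = delta_ct(mono_target, mono_gapdh) - delta_ct(idc_target, idc_gapdh)
    return 2 ** ddct


# Hypothetical triplicate Ct values for one target gene.
idc_target = [24.0, 24.5, 23.5]
idc_gapdh = [18.0, 18.0, 18.0]
mono_target = [27.0, 27.5, 26.5]
mono_gapdh = [18.0, 18.0, 18.0]

fold = fold_change_idc_vs_monocyte(idc_target, idc_gapdh, mono_target, mono_gapdh)
print(f"iDC / monocyte expression ratio: {fold:.1f}")
```

In this toy example the target gene crosses threshold three cycles earlier in iDC than in monocytes at equal GAPDH levels, which corresponds to an 8-fold higher expression in iDC. Since GAPDH was amplified in the same well as the target, a per-well ΔCt followed by averaging would be an equally reasonable variant.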
WC and AZ conceived of the study. WC was responsible for the study design, construction of the SSH cDNA libraries, data analysis and manuscript preparation. CE participated in the design of the study and the coordination of the research. HL, FL, JL, HC, NG and LL were responsible for the sequence annotation, microarray data analysis and statistical analysis. CD, MW and KW carried out the sequencing of the libraries. RD carried out the real-time RT-PCR analysis. XZ, YC, PSW and SB carried out the microarray studies. AGS and ZT provided the RNA samples for all the studies.
The authors are grateful to Drs. Robert A Lewis and Michael Tocci for their support and insights to this project, and to Drs. Chang S. Hahn and Timothy Connolly for their helpful discussion and suggestions.
Biotechniques 1995, 19:442-447.
Science 1995, 270:467-470.
Diatchenko L, Lau YF, Campbell AP, Chenchik A, Moqadam F, Huang B, Lukyanov S, Lukyanov K, Gurskaya N, Sverdlov ED, Siebert PD: Suppression subtractive hybridization: a method for generating differentially regulated or tissue-specific cDNA probes and libraries.
Science 1993, 259:946-951.
Methods Enzymol 1999, 303:349-380.
Science 1995, 270:484-487.
Evans SJ, Datson NA, Kabbaj M, Thompson RC, Vreugdenhil E, De Kloet ER, Watson SJ, Akil H: Evaluation of Affymetrix Gene Chip sensitivity in rat hippocampal tissue using SAGE analysis. Serial Analysis of Gene Expression.
Maekawa T, Bernier F, Sato M, Nomura S, Singh M, Inoue Y, Tokunaga T, Imai H, Yokoyama M, Reimold A, Glimcher LH, Ishii S: Mouse ATF-2 null mutants display features of a severe type of meconium aspiration syndrome.
Villaret DB, Wang T, Dillon D, Xu J, Sivam D, Cheever MA, Reed SG: Identification of genes overexpressed in head and neck squamous cell carcinoma using a combination of complementary DNA subtraction and microarray analysis.
Wang T, Hopkins D, Schmidt C, Silva S, Houghton R, Takita H, Repasky E, Reed SG: Identification of genes differentially over-expressed in lung squamous cell carcinoma using combination of cDNA subtraction and microarray analysis.
Xu J, Stolk JA, Zhang X, Silva SJ, Houghton RL, Matsumura M, Vedvick TS, Leslie KB, Badaro R, Reed SG: Identification of differentially expressed genes in human prostate cancer using subtraction and microarray.
Biotechniques 2001, 31:782-4, 786.
Eur J Immunol 1997, 27:431-441.