Abstract
Background
Biological tissues consist of various cell types that differentially contribute to physiological and pathophysiological processes. Determining and analyzing cell typespecific gene expression under diverse conditions is therefore a central aim of biomedical research. The present study compares gene expression profiles in whole tissues and isolated cell fractions purified from these tissues in patients with rheumatoid arthritis and osteoarthritis.
Results
The expression profiles of the whole tissues were compared to computationally reconstituted expression profiles that combine the expression profiles of the isolated cell fractions (macrophages, fibroblasts, and nonadherent cells) according to their relative mRNA proportions in the tissue. The mRNA proportions were determined by trimmed robust regression using only the most robustlyexpressed genes (1/3 to 1/2 of all measured genes), i.e. those showing the most similar expression in tissue and isolated cell fractions. The relative mRNA proportions were determined using several different chip evaluation methods, among which the MAS 5.0 signal algorithm appeared to be most robust. The computed mRNA proportions agreed well with the cell proportions determined by immunohistochemistry except for a minor number of outliers. Genes that were either regulated (i.e. differentiallyexpressed in tissue and isolated cell fractions) or robustlyexpressed in all patients were identified using different test statistics.
Conclusion
Robust Computational Reconstitution uses an intermediate number of robustlyexpressed genes to estimate the relative mRNA proportions. This avoids both the exclusive dependence on the robust expression of individual, highly cell typespecific marker genes and the bias towards an equal distribution upon inclusion of all genes for computation.
Background
The comparative analysis of gene expression in diseased tissues and its isolated cell fractions can be used to identify genes with potential pathophysiological relevance, including those involved in interactions among different cell types. In the present study 'isolated cell fractions' are defined as cultivated cell populations of individual cell types purified from the respective tissue samples. A direct approach to the gene expression of specific cell types in the tissue is their microdissection from the tissue. Isolation and amplification of mRNA from microdissected single cells or pure cell type subpopulations has recently been established and described [1,2]. However, this method is just emerging, still having technical problems with reliable cell type markers, exact dissection, and representative mRNA extraction and amplification [3,4]. Therefore, instead of comparing gene expression profiles of individual cell types between tissue and isolated cell fractions, the present study compared the gene expression profiles of whole tissues and computationally reconstituted expression profiles that combine the expression profiles of the isolated cell fractions according to their relative mRNA proportions in the tissue. These relative mRNA proportions were determined using trimmed robust regression.
Methods for the reconstruction of cell typespecific expression profiles and relative proportions have already been proposed in the literature. The marker gene approach [5,6] determines the relative mRNA proportions from the expression of highly cell typespecific marker genes. A drawback of this method is its dependence on the robust expression of single genes. Venet et al. [7], Stuart et al. [8], and Lähdesmaeki et al. [9] identified cell typespecific expression profiles from tissue samples differing in their cell type composition. Venet et al. [7] and Lähdesmaeki et al. [9] computed the cell typespecific expression profiles and their corresponding relative proportions simultaneously (matrix factorization of the tissue gene expression matrix), whereas Stuart et al. [8] determined the cell proportions experimentally and then calculated the respective expression values (genewise regression). The method of Lu et al. [10] and the present study are different from the three previous approaches in that they use actually measured, cell typespecific expression profiles and determine the relative mRNA proportions computationally (tissuewise regression). Whereas Lu et al. [10] compared desynchronized yeast cell 'tissues' and five isolated cell fractions consisting of synchronized yeast cells in the G_{1}, S, G_{2}, M, and M/G_{1 }cell cycle phases, the present study compares synovial tissues with the isolated cell fractions of adherent macrophages, adherent fibroblasts. and nonadherent cells. The study of Lu et al. [10], however, did not address the relative importance of regulated gene expression resulting from the induction of cell cycle arrest. In contrast, the present study demonstrates for the first time that in order to avoid a bias towards an equal distribution of the computed mRNA proportions, as many regulated genes as possible must be excluded from the analysis. Here, 'regulated genes' are defined as showing a differential expression between tissue and isolated cell fractions, whereas 'robustlyexpressed genes' show a similar expression under both conditions. Differential gene expression was also investigated in the study of Ghosh [11], in which the gene expression of tumor tissue samples was compared to that of normal tissue controls (probabilistic mixture model). This study accounted for the varying relative proportions of stromal tissue within the tumor samples and assumed the gene expression in tumor and normal tissue to be perfectly robust, i.e. independent of their relative proportions in the sample (as is also the case in the studies of Venet et al. [7]. Stuart et al. [8], Lähdesmaeki et al. [9], and Lu et al. [10]). Thus, differential expression in the study of Ghosh [11] refers to expression differences between tumor and normal tissue, whereas regulated expression in the present study refers to expression differences between two conditions (tissue versus isolated cell fractions).
Synovial membranes (inner aspect of the joint capsule) consist of three main cell types: macrophages and fibroblasts, both adhering to the culture vessel, and mixed nonadherent cells. These cell types are expected to contribute differentially to the pathogenesis of rheumatic diseases by expressing proinflammatory and prodestructive genes. Therefore, gene expression profiles of the whole synovial tissue and the respective isolated cell fractions of patients with rheumatoid arthritis (RA) and osteoarthritis (OA) were analyzed using Affymetrix GeneChip technology. The performance of the newly developed reconstitution algorithm was validated using two mRNA mixing experiments performed by the authors and the mixing part of the GeneLogic dilution study [12]. The computed mRNA proportions were compared to the cell proportions determined by immunohistochemistry. Regulated and robustlyexpressed genes selected according to statistical evidence and pathophysiological relevance are listed in the supplementary material.
Results
The relative mRNA proportions of macrophages (p_{M}), fibroblasts (p_{F}), and nonadherent cells (p_{N}) were determined for each tissue sample by matching the measured synovial tissue expression profile S and the computationally reconstituted expression profile S* = p_{M }M + p_{F }F + p_{N }N, which in turn combines the measured expression profiles of the isolated cell fractions of macrophages (M), fibroblasts (F), and nonadherent cells (N) according to their relative mRNA proportions (Figure 1).
Figure 1. Computational reconstitution. Schematic view of the gene expression in the synovial tissue (left) and the computationally reconstituted tissue profile (right). The expression of the synovial tissue is composed of the expression of macrophages
, fibroblasts , and nonadherent cells . The expression of the whole tissue is measured as the overall expression profile S. The computationally reconstituted tissue profile S* is composed of the measured expression profiles of the isolated cell fractions, i.e. of macrophages M, fibroblasts F, and nonadherent cells N. The relative mRNA proportions p_{M}, p_{F}, and p_{N }are determined by matching the computationally reconstituted profile S* with the measured tissue profile S, The measured expression profiles are enframed using solid lines.The matching of S and S* as a function of the relative mRNA proportions was performed using trimmed robust regression (Methods section, subsection Mathematical model). Trimmed regression only uses part of the data in order to exclude outliers that otherwise would bias the result. Here, the result is given by the set of relative mRNA proportions that minimize the differences between S and S* as quantified by the regression objective function. Solutions to trimmed regression problems are in general not unique due to the data subset choice, however, the solutions generally become more alike with increasing subset size. In the present study, the trimmed regression problem was solved for an increasing number of included genes. For each such number the optimization routine was initialized from several different starting values. The standard deviation of the results obtained from these random initializations was used to assess the similarity of the solutions. The relative mRNA proportions were determined from the respective ensemble means at the minimum number of included genes, for which the ensemble standard deviations approached zero, i.e. for which the solution was almost unique. This 'educated guess' (heuristic) approach accounts for the fact that the solution is completely indeterminate towards zero included genes and increasingly biased towards an equal distribution upon inclusion of all genes (Methods section, subsection Mathematical model). The general performance of this methodology is demonstrated in this first subsection. In order to assess the effect of data preprocessing on the present results the influence of different chip evaluation methods was investigated (Different chip evaluation methods subsection). The preliminary knowledge of the true relative mRNA proportions in mRNA mixing experiments was used to validate the present methodology as well as the chip evaluation methods (Mixing experiments subsection). In addition, the computed mRNA proportions were compared to the cell proportions determined by immunohistochemistry and those obtained using the marker gene approach (Immunohistochemistry and marker genes subsection). Finally, genes that were regulated and robustlyexpressed in all patients were identified using different test statistics (Regulated and robustlyexpressed genes subsection).
General performance
The proposed method for the determination of the relative mRNA proportions is demonstrated using the data of patient 2. Figure 2 shows how the means and standard deviations of the computed relative mRNA proportions of macrophages (p_{M}), fibroblasts (p_{F}), and nonadherent cells (p_{N}) depend on the number k of genes included in the trimmed regression approach (Methods section, subsection Mathematical model). The mean is variable for small k. settles to a constant value for intermediate k and shows a slow and incomplete convergence towards an equal distribution (p_{M }= p_{F }= p_{N }= 1/3) for large k. The respective standard deviations approach zero at an intermediate number of included genes. This number was used to determine the mRNA proportions from the respective mean values. The solid curves correspond to data, for which an additional local regression normalization was performed for the measured and reconstituted tissue profiles in each algorithmic step (Methods section, subsection Data preparation). The decrease of the standard deviation is faster for additional local regression normalization in this case, but the resulting mRNA proportions are similar.
Figure 2. Results for patient 2. Mean and standard deviation of the computed relative mRNA proportions for macrophages (macro), fibroblasts (fibro) and nonadherent cells (nadc) for patient 2 as a function of the number k of genes included in the trimmed regression approach (Methods section, subsection Mathematical model). The measured expression profiles are MASS probe set summaries normalized by a symmetric trimmed mean (t) (Methods section, subsection Data preparation). The reconstituted tissue profile is either not normalized (dashed) or normalized by stepwise local regression (lr, solid) (Methods section, Data preparation). The proportions are determined from the respective means at the number of genes for which the standard deviations approach zero. The chosen numbers are indicated by the dashed and solid (lr) vertical lines, respectively. The convergence is faster for additional local regression (lr), but the resulting proportions are similar. The mRNA proportions are estimated to be p_{M }= 0.82/0.79 (lr) for macrophages, p_{F }= 0.09/0.06 (lr) for fibroblasts and p_{N }= 0.10/0.15 (lr) for nonadherent cells. These values were determined at 6000 and 5000 (lr) included genes, respectively.
The general curve progression in Figure 2 is similar to that obtained from a probabilistic model described in Supplementary Section A (Supplementary Figure A.2, see 1). However, the best estimate of the 'true' tissue mRNA proportions assumed by the model was obtained for the smallest number of included genes (in contrast to an intermediate number in Figure 2). This difference can be attributed to the striking roughness and overall flatness of the empirical objective function (3) used for trimmed regression (Methods section, subsection Mathematical model; Supplementary Figure A.3, see 1), if only a small number of genes is included. Hence, the means of the computed relative proportions converge to an equal distribution towards both ends. For increasing k, this is due to the inclusion of more and more regulated genes with expression levels noncorrelated or negatively correlated between tissue and isolated cell fractions (Supplementary Section A.2, see 1). For decreasing k, this is due to the increasing roughness (increasing number of local minima) and overall flatness of the objective function (Methods section, subsection Mathematical model; Supplementary Section A.3, see 1). The relative mRNA proportions of the tissue can be estimated from the respective means of the computed mRNA proportions, as soon as the respective standard deviations become sufficiently small. The distribution of the individual mRNA proportions can then be assumed to be symmetric, implying the equality of mean and global minimum [13]. The drop of the standard deviation was indeed the most important criterion, but the curve progression and the agreement between the curves for the two different chip normalization methods (with and without local regression) were also taken into account.
Additional File 1. Supplementary Material. Selfcontained pdffile providing supplementary information. Supplementary sections A – H.
Format: PDF Size: 3.7MB Download file
This file can be viewed with: Adobe Acrobat Reader
Different chip evaluation methods
There is an ongoing discussion about the question which probe set summary and which chip normalization method proves optimal for evaluating oligonucleotide microarrays. For assessing the effect of different chip evaluation methods on the results obtained by Robust Computational Reconstitution, four different probe set summaries were applied: MASS, MASC [14,15], RMA [16], and MBEI [17] (Methods section, subsection Data preparation). In addition, four different normalization methods were tested for the MASS and MASC summaries: trimmed mean only (t) [14,15], trimmed mean plus cyclic local regression (clr) [18], quantile normalization (q) [18,19], or centralization (c) [20] (Methods section, subsection Data preparation). The results are summarized in Table 1.
Table 1. Computed relative mRNA proportions. Computed relative mRNA proportions p_{M }(macrophages) and p_{F }(fibroblasts) for patients 1–6 calculated for different chip evaluation methods. The proportion of nonadherent cells is p_{N }= 1  p_{M } p_{F}. The primed 1 in 'patient 1' (RA)' indicates the use of the second tissue chip replicate for patient 1. Additional stepwise local regression normalization is indicated by lr. The proportions are given in percent. Probe set summaries: MASS: Microarray Suite 5.0 signal algorithm, RMA: Robust Multiarray Analysis, MBEI: Model Based Expression Index, GP: GenePublisher (not available for HGU95A chips, patient 1), the R implementation of MBEI was used with the same settings as in GP. Normalization methods: t: trimmed mean, clr: cyclic local regression, q: quantile normalization, c: centralization.
The computed mRNA fractions vary considerably among different probe set summaries and, for MASS and MASC, among different normalization methods. The methodological scatter differed among patients. It was quantified in terms of the pooled Mean Absolute Deviation (MAD) of the relative proportions across methods calculated with regard to the respective mean values, i.e. MAD = , in which denotes the mean across methods (m) for a given cell type (C) (Table 2). The hybridization for patient 1 was repeated two times. Using the original tissue chip (HGU95A), an MAD of 15% (in absolute value) was calculated. It was reduced to 5% and 9% using the second and third tissue chip replicate (HGU95Av2), respectively (Table 2). Considering only the four different chip normalization methods applied to the MASS summaries resulted in a lower MAD for all patients, except for patient 2. Additional stepwise local regression normalization (lr) had little effect when all methods were considered, however, the MAD was greatly reduced by lr for the MASS normalization methods (all within 1–3%, Table 2).
Table 2. MAD of the computed mRNA proportions. Pooled mean absolute deviation (MAD) of the relative mRNA proportions across different chip evaluation methods, calculated with regard to the respective mean values. 'MAD all' refers to all seven methods: the four different normalizations of MASS, as well as RMA, MBEI, and MBEIGP, whereas 'MAD MAS' only refers to the four different normalization methods of MASS. The additional application of stepwise local regression normalization is indicated by lr. The MAD refers to the proportions of Table 1 (given in percent).
The MASS summaries were preferred to the MASC, RMA, and MBEI summaries in this study because the agreement between the curves with and without stepwise local regression normalization (lr) was generally better for the MASS summaries (Figure 2 and Supplementary Figures C.1 and C.2, see 1). In addition. RMA, MBEI, and MASC tended to more extreme proportions close to 100% for patients 2, 5, and 6 (Table 1 and Supplementary Table C.1, see 1). In view of the relative cell proportions determined by immunohistochemistry (see subsection Immunohistochemistry and marker genes), the less extreme values obtained from MASS are more plausible. Moreover, the use of MASC resulted in irregular curves for patient 1 (original tissue chip, Supplementary Figure C.2, see 1).
The results obtained by excluding weaklyexpressed genes (either < 100 or < 200 signal intensity) were within the range of the values listed in Table 1. This was also true for data filtering by Affymetrix present calls and transformation of the data according to the variance stabilization method of Huber et al. [21,22] (this is presumably due to the fact that the data are already corrected by MAS 5.0 in order to avoid negative intensities). Masking of outliers as performed by the MAS 5.0 software also did not alter the results. This suggests that measurement errors associated with weaklyexpressed genes, outliers, or saturation of certain probe sets play a minor role for the estimation of the relative mRNA proportions using trimmed robust regression.
The effect of the experimental chip quality was assessed by correlating the MAD of the computed mRNA fractions (Table 2) with several relevant chip operating figures provided by the Affymetrix 'Expression Report' (Supplementary Section D, Supplementary Figures D.1 and D.2, see 1). Generally, the MAD was positively correlated with the noise and background level of the chips. However, due to the small number of patients the results strongly depended on individual patients (inhomogeneity correlation), in particular on patient 1 (original tissue chip).
Mixing experiments
In mixing experiments, mRNA samples from different sources are mixed together in defined proportions and subsequently hybridized to chips. These experiments were used to validate the present computational approach. Two mixing experiments were performed by the authors with material from patient 2. The other experimental data were derived from the mixing part of the GeneLogic dilution study [12].
The target values for the mRNA proportions of the first two mixing experiments were set according to the range of values computed for patient 2 (Table 1). Figure 3 shows the results for the second mixing experiment using trimmed mean normalized MASS probe set summaries. The proportions are virtually fixed from the beginning and the standard deviations drop at only a few thousand included genes. This picture was typical for all mixing experiments, probe set summaries, and chip normalization methods of the present study. The number of included genes, at which the computed mRNA proportions were determined, was between 2000 and 4000 in all cases.
Figure 3. Results for mixing experiment 2. Mean and standard deviation of the computed mRNA proportions of macrophages (macro), fibroblasts (fibro) and nonadherent cells (nadc) for the second mixing experiment as a function of the number of included genes. This experiment corresponds to Figure 2, except that the tissue expression profile was not obtained by preparing mRNA from the whole synovial tissue, but by mixing mRNA samples from the isolated cell fractions according to the relative proportions p_{M }= 0.86, p_{F }= 0.11, and p_{N }= 0.03. These proportions were almost perfectly determined when no further normalization of the reconstituted tissue profile was performed (dashed curves): p_{M }= 0.86, p_{F }= 0.10, p_{N }= 0.04. Applying additional stepwise normalization (lr) (solid curves) resulted in p_{M }= 0.85, p_{F }= 0.13, p_{N }= 0.02. All proportions were determined at 4000 included genes.
Table 3 summarizes the results in terms of the pooled MAD of the relative mRNA proportions, calculated with regard to the respective target values. All methods gave better results for the second mixing experiment. This difference cannot be explained by the experimental quality, because both chips had very similar operating figures. MASC tended to perform somewhat better than MASS, and both MASS and MASC gave better results than RMA, MBEI, and MBEIGP. In contrast to MASS and MASC, the results for RMA, MBEI, and MBEIGP were improved (with one exception) by additional stepwise local regression normalization (lr).
Table 3. MAD for mixing experiments 1 and 2. Pooled MAD of the computed relative mRNA proportions in mixing experiments 1 and 2 for different chip evaluation methods. The MAD is calculated with respect to the target values p_{M }= 0.75, p_{F }= 0.11, p_{N }= 0.14 (mix 1) and p_{M }= 0.86, p_{F }= 0.11, p_{N }= 0.03 (mix 2). 'MASS mean' and 'MASC mean' denote the means with respect to the four preceeding normalization methods. Notation according to Table 1.
The GeneLogic dilution study contains 25 chips, in which mRNA samples from liver and SNB19 cells were mixed for SNBproportions of 0. 0.25, 0.50. 0.75. and 1. and subsequently hybridized to HGU95Av2 GeneChips. Replicas of these 5 mixes were processed by 5 different scanners resulting in a total of 25 experiments.
Figure 4 shows mean, standard deviation, and MAD of the computed mRNA proportions for SNBproportions of 0.25, 0.5, and 0.75. The MADs were calculated with respect to these nominal target values. The standard deviations drop very early and the proportions were generally determined at 2000 included genes, with very few exceptions. The MAD of mixes 25 and 50 is quite high (roughly 4–10%) for all chip evaluation methods possibly indicating some imprecision in sample weighing. The results summarized in Table 4 show that, as in the mixing experiments of the authors, MASC tended to perform somewhat better than MASS (except for some cases in mix 75). The results for RMA and MBEI are similar to those for the MAS summaries.
Figure 4. Results for the GeneLogic Dilution Study. Mean, standard deviation, and MAD of the computed mRNA proportions with respect to the target SNBproportions of 0.25, 0.5 and 0.75. The displayed values are averages across 5 replicas. The results were calculated using trimmed mean (t) normalized MASS probe set summaries. Additional normalization by stepwise local regression is indicated by lr (solid curves). The proportions were determined at 3000 and 2000 (lr) included genes.
Table 4. MAD for the GeneLogic Dilution Study. Pooled MAD of the computed relative mRNA proportions in mixing experiments 25, 50, and 75 for different chip evaluation methods. The respective MAD is calculated with regard to the target values p_{S }= 0.25, 0.50, and 0.75. Notation according to Table 3.
Immunohistochemistry and marker genes
Immunohistochemical staining was used to assess the cellular composition in the synovial tissue samples. The obtained cell type proportions of macrophages (), fibroblasts (), and nonadherent cells () (Table 5) were compared to the respective mRNA proportions (p_{M}, p_{F }and p_{N}) as determined by Robust Computational Reconstitution. For the comparison, each cell type C was assumed to contain a specific amount of mRNA r_{C}. The mRNA proportions of the model thus read
Table 5. Cell proportions. Relative cell proportions of adherent macrophages M, adherent fibroblasts F, and nonadherent cells N for peitients 1–6 obtained from immunohistochemistry by analyzing tissue samples in close vicinity to the sample from which the mRNA was isolated. The fraction of nonadherent cells consists of Tcells (T), Bcells (B), plasma cells (P), and endothelial cells (E). If standard deviations are given, three samples were analyzed. The proportions are given in percent.
The model effectively contains two independent parameters that were fitted by matching the mRNA proportions of the model _{C }(p^{c}) to the 18 mRNA proportions (6 patients, 3 proportions each) computed by Robust Computational Reconstitution using least squares regression (Table 1, MASS t; Figure 5). The computed mRNA and cell proportions of nonadherent cells agreed almost onetoone. Hence, the mRNA content of macrophages r_{M }and fibroblasts r_{F }were assessed relative to r_{N}. Macrophages appeared to contain four times as much mRNA as nonadherent cells (r_{M }= 3.98 r_{N}), whereas fibroblasts seemed to contain only a quarter of it (r_{F }= 0.25 r_{N}). Excluding patient 1 from the analysis somewhat improved the fit for patients 2 to 6 (data not shown) but the resulting ratios for macrophages (r_{M }= 4.56 r_{N}) and fibroblasts (r_{F }= 0.57 r_{N}) were qualitatively similar. Part of the large apparent differences in mRNA content between cell types can possibly be attributed to differential mRNA yield due to experimental methodology, like mRNA extraction and amplification [3,23]. For example, de Bruin et al. [3] observed a 3.5fold higher mRNA yield for tumor compared to stroma cells. In addition, differential specificities of the cell type markers used in immunohistochemistry may result in over or underestimation of the relative cell proportions. The assumption that the mRNA content r_{C }is independent of composition results from the usage of robustlyexpressed genes for calculating the relative mRNA proportions. Nevertheless, the fit to the computed mRNA proportions was somewhat improved by assuming the mRNA content to depend on composition (linear ansatz; data not shown). Although this result suggests a possible dependence of the cellular mRNA content on the cell type composition, data overfitting by inclusion of too many adjustable parameters has to be avoided.
Figure 5. Comparison of relative cell and mRNA proportions. Relative cell proportions determined by immunohistochemistry (open circles; Table 5), relative mRNA proportions calculated by Robust Computational Reconstitution (solid squares; Table 1, MASS t), and relative mRNA proportions according to model equation (1) (solid triangles; fit to the computed relative mRNA proportions). The square root of the coefficient of determination is R = 0.90. The regression pvalue is p = 10^{6 }(least squares regression of 18 values using 2 parameters). The fitted model mRNA proportions are similar to the cell proportions multiplied by a constant factor reflecting the respective cellular mRNA content. Part of the large apparent differences in cellular mRNA content can possibly be attributed to differential mRNA yield due to experimental methodology, like mRNA extraction and amplification, or differential specificities of the cell type markers used in immunohistochemistry.
Cell typespecific marker genes allow to determine the mRNA proportions from the ratio of the expression in the synovial tissue (s_{i}) and the respective isolated cell fraction (e.g., p_{M }= s_{i}/m_{i }for macrophages) provided the marker gene (i) is very specific (no expression in fibroblasts and nonadherent cells) and robust (same expression in tissue and isolated cell fractions). In the present study, the relative mRNA proportions calculated using Robust Computational Reconstitution and marker genes showed a markedly higher than random consistency only for fibroblasts (Supplementary Table F.1, see 1). Obviously, the two premises required by the marker gene approach, specificity and robustness, are not fulfilled simultaneously, at least not for macrophages and nonadherent cells (the former being known to be quite responsive to stimulation and the latter being subject to a confounding heterogeneity of their cell type composition).
Regulated and robustlyexpressed genes
Regulated and robustlyexpressed genes were identified using six different test statistics and logtransformed expression values of the measured (S) and reconstituted (S*) tissues of patients 1 to 6. The first four methods are well established techniques in the postgenomic era: Significance Analysis of Microarrays (SAM) [24], onesample paired ttest (ttest1), homoscedastic twosample ttest (ttest2) (being equivalent to oneway ANOVA for two groups), and VERAandSAM (V&S) [25]. In addition, two new criteria, μtest and MADtest, were introduced in the present study owing to the fact that SAM, ttest1, and ttest2 tended to select robustlyexpressed genes with low expression strength (Supplementary Figure C.2, see 1). This bias is a result of the fact that the standard error appears in the denominator of these test statistics and that weaklyexpressed genes generally show a higher standard error than stronglyexpressed genes (logtransformed data). As a consequence, the newlydefined μ and MADtest statistics, t_{μ }= μ_{1 } μ_{2 }= mean(s  s*) and t_{MAD }= mean(s  s*), respectively. (s = log S, s* = log S*), do not contain the standard error. The μtest, and also V&S, appeared to be almost unbiased with respect to expression strength. The MADtest (most strongly related to the regression objective function (3) in the Methods section, subsection Mathematical model) showed a clear preference for highlyexpressed genes when applied to identify robustlyexpressed genes (Supplementary Figure G.2, see 1). This may reflect the fact that in general small differences in gene expression are observed for highlyexpressed genes (well known from so called MAplots for logtransformed expression data). Also, highlyexpressed genes serving fundamental roles in the cell are commonly used as housekeeping genes [23]. The pvalues of ttest1 and ttest2 were calculated using both the Student distribution and the permutation method described in Storey and Tibshirani [26]. For the paired ttest1 n_{P }= 1000 permutations were used. For the unpaired ttest2 all n_{P }= (6 out of 12)/2 = 462 possibilities for obtaining different absolute values of the test statistic were enumerated. The Pearson correlation coefficient between the respective pvalues was c_{1 }= 0.999979 for ttest1 and c_{2 }= 0.999986 for ttest2 suggesting that the gene expression data of the present study are roughly lognormally distributed. Consequently, the Student distribution derived pvalues were used for ttest1 and ttest2. The pvalues of SAM and MADtest were calculated using the permutation method and n_{p }= 1000 permutations, those of μtest by enumerating all n_{p }= 462 possibilities. The λvalues of V&S are based on an expression leveldependent error model [25]. The pairwise correlations between the six statistical methods are shown in Supplementary Figure G.1 (see 1). The best correspondence was observed among SAM, ttest1, ttest2, and μtest (Pearson correlation between 0.844 and 0.977). The MADtest differed most from all other methods (Pearson correlation between 0.240 and 0.677). Also, this test statistic was the only one showing a twomodal pvalue distribution in contrast to the unimodal distributions assumed by the method of Storey and Tibshirani [26] for calculating False Discovery Rate (FDR)related qvalues. The other methods allowed to estimate the proportion of nullhypothesis genes quite consistently (between 0.5 and 0.6). Regulated and robustlyexpressed genes were identified by 1.) applying each of the six individual test statistics independently and 2.) selecting topranking genes with respect to pathophysiological relevance (Supplementary Section G, Supplementary Tables G.2 and G.3, see 1). A total of 62 regulated and 48 robustlyexpressed genes were selected. Each method contributed some selected genes and the approach of combining different complementary test statistics for gene selection, as proposed in the present study, may prove useful in future investigations.
Discussion
Determining and analyzing the gene expression profiles of different tissue cell types under diverse physiological and pathophysiological conditions is a central aim of biomedical research. However, microdissection of single cells or pure cell types, as well as mRNA isolation and amplification [1,2] still bears basic technical problems [3,4]. Therefore, gene expression profiles of whole tissue samples and purified cell types were compared in the present study. Using Robust Computational Reconstitution, the required mRNA proportions and the set of robustlyexpressed genes that allow for their determination were simultaneously identified.
The present results suggest that MASS provided the most stable probe set summaries. This is in contrast to the study of Irizarry et al. [16], but is in line with the results of Choe et al. [27]. This discrepancy may be due to the data sets used. Whereas in the study of Irizarry et al. [16] only few spikedin genes were present, Choe et al. [27] spikedin more than 1300 genes. Real world applications like the present study possibly contain more biological and technical noise and the subtraction of the mismatch probes from the perfectmatch probes (MAS 5.0) may be appropriate in this case. Recent detailed sequence level studies of Binder and Preibisch [28,29] resulted in similar conclusions. However, when Shedden et al. [30] applied their FDRbased pairwise comparison method to ovary and colon tumor data, MAS 5.0 performed inferior to most other methods.
Chipwise (nonlinear) averaging of the probes before comparison to other chips (MASS) may also be less susceptible to the effects of outliers than the initial comparison of probes between different chips and subsequent averaging of the differences (MASC). There were only few differences in the present study among the different chip normalization methods applied to the MASS and MASC summaries. However, the application of quantile normalization and also centralization resulted in more outliers compared to the remaining normalization methods. Quantile normalization also tended to perform suboptimally without additional local regression normalization in the study of Choe et al. [27].
The accuracy of Robust Computational Reconstitution depends on the fraction of robustlyexpressed genes, the quality of experimentation, and the appropriateness of the chip evaluation method. The variance is estimated to be in the range of 5–10% in absolute value based on the variability of the computed mRNA proportions among methods (Table 2) and the expected bias introduced by the imperfect robustness of even the most robustlyexpressed genes (Supplementary Figures A.1 and A.2, see 1). A sensitivity analysis (resampling of relative proportions) showed that the rank order of differentiallyexpressed genes is only moderately affected by this variance. However, the maximum shift in rank order was always large indicating that some differentiallyexpressed genes may be lost even from a moderately extended list of topranking genes. This maximum rank shift was close to that of random permutations but the third quartile of the rank shift distribution was already much smaller than that observed for random permutations (rank shift of 60 compared to 9300 for the 200 topranking genes and a resampling standard deviation of 5%).
The results for patient 1 largely differed from all other patients. On the one hand this may be due to the fact that HGU95A chips were used instead of HGU95Av2 chips. On the other hand patient 1 is the only type I RA patient (patients 2 and 3 are type II RA) [31,32]. This was confirmed by immunohistochemistry and comparison with synovial tissue gene expression profiles of 12 other classified RA patients (using the distance to the respective expression means and Affymetrix' bestmatches probe set list for matching HGU133A and HGU95Av2 chips). Hence, the differences between RA patient 1 and RA patients 2 and 3 are substantial.
Remarkably, the computed mRNA proportions considerably differ from the cell proportions determined by immunohistochemistry (Figure 5). Most striking is the prevalence of macrophage mRNA over fibroblast mRNA in contrast to the dominance of fibroblasts over macrophages in the cell composition, implying that macrophages produce 8 to 16 times as much mRNA as do fibroblasts. The results for the mRNA proportions may be biased because the isolated fractions of macrophages and nonadherent cells were not perfectly pure with respect to fibroblasts (resulting in apparently higher mRNA proportions for macrophages). Also, the mRNA yield of different cell types may be severalfold different due to experimental methodology, e.g. mRNA extraction and amplification [3,23]. Furthermore, the cell proportions as determined by immunohistochemistry may be somewhat imprecise due to the limited specificity of this semiquantitative method. Nevertheless, the augmented mRNA production of macrophages compared to fibroblasts is expected to be still valid under idealized experimental conditions. The central aim addressed by the present study was the cell typespecific gene expression in synovial tissue and its dependence on cell type composition (purified cell fractions representing an extreme case of composition). Cell typespecific gene expression was also addressed by Venet et al. [7], who assumed the individual expression profiles to be largely uncorrelated. According to our ownexperience (the correlation coefficients among the isolated cell fractions showed values as high as 0.7 and 0.9), this is an arguable assumption, as also admitted by Venet et al. [7] themselves. Lähdesmaeki et al. [9] used different constraints in their least squares approach. Similar to Venet et al. [7] and Lu et al. [10] and in contrast to the present study, they included all genes in the sum of squares and showed this to be appropriate in the case of mixing experiments (as also demonstrated in the present study). However, their method, as well as the methods of Venet et al. and Lu et al. (which is otherwise equivalent to the present approach), will give biased results in the presence of regulated (i.e. compositiondependent) gene expression. Using an intermediate number of robustlyexpressed genes (present study) avoids, on the one hand, the exclusive dependence on the robust expression of individual highly cell typespecific marker genes and, on the other hand, the bias towards an equal distribution when including all genes in matrix factorization or regression. This statement remains valid despite the inability of all current in silico microdissection methods (including our own; Methods section, subsection Mathematical model and Supplementary Section H, see 1) to directly assign compositiondependent changes in gene expression to specific cell types.
Conclusion
The proposed method of Robust Computational Reconstitution is applicable if there is a sufficient number of robustlyexpressed genes whose expression is highly correlated between tissue and isolated cell fractions. The existence of such housekeeping genes is plausible and wellknown [23] since many genes code for basic cell functions that must be maintained across different environmental conditions. The present approach improves previously published methods for the determination of the relative mRNA proportions of different cell types by the exclusion of regulated genes that bias the result towards an equal distribution. In addition, it can identify robustlyexpressed genes (representing basic metabolism or persistent pathological changes) and responsive genes that change their expression between tissue and isolated cell fractions (possibly reflecting physiological or pathological cell communication processes). Both are of biomedical interest and can be further screened for pathophysiological relevance.
Evidently, microdissection of single cells or pure cell types [1,2] is a promising experimental technique. However, it is still under steady development and, for the time being, the preferential use of macrodissection in combination with in silico microdissection techniques has been recently recommended [3]. The method of the present study can thus be readily used as an effective research tool and a supporting reference method for further studies.
Note added in proof
During the review process of the present manuscript Wang et al. [33] also published a related paper dealing with the cellular composition of complex tissues and the relative contribution of the individual cell types to tissue gene expression.
Methods
Mathematical model
The proposed method for determining the cell typespecific mRNA composition of synovial tissue samples (applicable to any other tissue) is based on a comparison between the measured expression profile S of the whole synovial tissue and the computationally reconstituted tissue profile
S* = p_{M }M + p_{F }F + p_{N }N, (2)
which is composed of the measured expression profiles of the isolated cell fractions, i.e. isolated adherent macrophages M, adherent fibroblasts F, and nonadherent cells N, according to their respective computational mRNA proportions p_{M}, p_{F}, and p_{N }(Figure 1).
The proposed computational approach aims at the simultaneous identification of the cell typespecific mRNA proportions and the particular set of robustlyexpressed genes, which allows to reliably determine these proportions. This is achieved by minimizing the trimmed sum of absolute differences
between the gene expression in the measured and the reconstituted tissue with respect to p_{M }and p_{F}, treating p_{N }= 1  p_{M } p_{F }as a dependent variable (equivalently, an optimization routine able of handling constraints could be used with all three variables). In Equation (3) s_{i }= log S_{i }and = log are logtransformed expression values in S and S*, respectively, and k denotes the number of genes included in the sum. For given values of p_{M }and p_{F}, the set of gene indices I_{k }is determined in order to minimize the objective function . More specifically, the genes are sorted with respect to their absolute differences s_{i } , in which = (p_{M}, p_{F}) depends on the current values of p_{M }and p_{F}. Subsequently, the first k differences are summed up. Clearly, the proportions uniquely determine the rank order of the genes, whereas k determines how many of the sorted differences (genes) are used to calculate the objective function value. The described method is a trimmed least modulus (L_{1}) regression. It is similar to the trimmed least squares (L_{2}) regression described in [22] and [34], but is considered to be more robust with respect to youtliers (however, graphs of the L_{1 }and L_{2 }objective functions as a function of p_{M }and p_{F }looked alike and L_{1} and L_{2}regression gave similar results; Figure 2 and Supplementary Figures A.3 and A.4. see 1). If only a small number of genes is included in the sum, the optimization routine (simplex algorithm) is likely to end up in a local minimum that depends on the starting values for p_{M }and p_{F}. This is because the trimmed sum is very similar for different proportions, if the included genes can be chosen from a much larger set of genes providing a great variety of suitable expression values (resulting in a vast number of local minima for the present data; Supplementary Figure A.3, see 1). Increasing the number of included genes results in more unique values for the computed proportions (reduced number of local minima, Supplementary Figure A.3, see 1). However, the more regulated genes are included in the sum, the more the proportions will be biased towards an equal distribution (p_{M }= p_{F }= p_{N }= 1/3) (Supplementary Section A, see 1). For this reason, the present study suggests to determine the mRNA proportions as soon as the standard deviations of the computed proportions approach zero, indicating the emergence of an unique global minimum and thus an unequivocal solution to the regression problem. The following simple algorithmic protocol was developed for the computation of the mRNA proportions (Table 6):
Table 6. Algorithmic Protocol
The results obtained from a statistical model for the expectation value [] of the objective function in Equation (3) compare well with those for the patient data (Figure 2 and Supplementary Figure A.2, see 1). The statistical model assumes normal distributed expression values. This is consistent with the fact that the data appear to be roughly lognormally distributed as evidenced by the high Pearson correlation (> 0.99997) between the ttest pvalues determined by the Student distribution on the one hand and those pvalues calculated according to the permutational method of Storey and Tibshirani [26] on the other hand (Results section, subsection Regulated and robustlyexpressed genes). The good general correspondence between the patient data and the statistical model corroborates the suitability of the proposed method for estimating the relative mRNA proportions.
The differences s_{i } in gene expression between the whole tissue and the isolated cell fractions presumably reflect transcriptional changes due to different environmental conditions and it is desirable to know how each individual cell type is actually responding to this change. However, the cell typespecific changes in gene expression can only be determined by either actually measuring the gene expression of each cell type in the tissue (using e.g. microdissection) or by applying appropriate statistical estimates. Again, by using a normal distribution model (Supplementary Section H, see 1) and results obtained from the present study (Supplementary Table H.1, see 1) it is demonstrated that the distributional parameters of the model cannot be reliably estimated based on the present gene expression data. The inability to assign cell typespecific changes in gene expression was also recognized in the study of Stuart et al. [8], in which nonlinear regression analysis revealed interactions between different cell types. The suggestion of Stuart et al. [8] to attribute the observed expression change to the cell type that expresses the respective gene most strongly is, however, not necessarily appropriate in all cases, e.g. for robustlyexpressed marker genes [5].
Experiments
Synovial tissue samples were obtained from 3 RA [35] and 3 OA patients [36] and prepared as previously described [31]. During the primary cell culture, nonadherent cells were removed by medium exchange on day 1. After 7 days, synovial fibroblasts were purified by removal of macrophages using antiCD 14 magnetobeads. Total RNA was isolated and labeled according to the supplier's (Qiagen) instructions. Gene expression was analyzed using Affymetrix HGU95A (patient 1) and HGU95Av2 (patient 2–6) chips. The synovial tissue chip of patient 1 showed about twice as much noise as the other chips. Therefore, the hybridization was repeated twice (using HGU95Av2 chips). Finally, the first repeat was selected for analysis. Two mixing experiments were performed, in which mRNA preparations of the isolated cell fractions of patient 2 were mixed according to two representative sets of relative mRNA proportions and subsequently hybridized to HGU95Av2 chips. Immunohistochemical analysis was performed on cryostat sections of synovial membranes using marker antibodies for specific cell types. For each patient, the cellular composition was assessed for one to three different synovial tissue sections from samples adjacent to those, from which the mRNA was prepared (Table 5). Further details of the experimental materials and methods are given in the Supplementary Sections B and E (see 1).
Data preparation
Microarrays were evaluated using four different probe set summaries: 1) single chip (MASS); and 2) chip comparison (MASC) algorithm of Affymetrix' Microarray Suite 5.0 [14,15]; 3) Robust MultiArray Analysis (RMA) developed by Irizarry et al. [16]; and 4) Model Based Expression Index (MBEI) of Li and Wong [17] as implemented in R and the GenePublisher webinterface [37]. In addition, the MASS and MASC summaries were normalized using four different globalization methods: i) 2%symmetric trimmed mean (t) [14,15]; ii) cyclic local regression (clr) [18]; iii) quantile normalization (q) [18,19]; and iv) centralization (c) [20]. The computationally reconstituted tissue profile S* was either not further normalized or normalized by local regression. This method is based on so called MAplots and normalizes two expression profiles, in this case S and S*, at the same time. The normalization has to be done in each algorithmic step, i.e. for every pair of relative proportions that is explored by the optimization algorithm. It was chosen to be a 200genes symmetric moving average instead of the widely used loess procedure [18,38] in order to achieve reasonable computation times. The symmetric moving average was also used for the cyclic local regression normalization (clr). Centralization (c) requires the selection of a set of robustlyexpressed genes that is used for normalization. This set was chosen to consist of those 1000 genes, for which at least one pairwise logratio was among the best conserved (i.e. showed the lowest mean absolute deviation) across the four chips per patient. The additive nature of Equation (2) requires absolute expression values. However, the chip comparison algorithm (MASC) computes logratios that compare a baseline experiment to different other experiments in a pairwise manner. The absolute expression values were reconstructed from the baseline experiment (S) and the respective logratios. The number of random starting values for the relative mRNA proportions (cf. algorithmic protocol. Table 6) was l = 25 throughout this study. The missing Nprofile of patient 3 was substituted from patient 2 (both Type II RA) [32]. When HGU95A and HGU95Av2 chips were investigated simultaneously, the analysis was restricted to the 12533 probe sets common to both chips (excluding controls). For the RMA and MBEI summaries of patient 1 (tissue repeats) the mixture CDF environment provided by Bolstad [39] was used.
Authors' contributions
MH developed mathematical methods, performed computations, and wrote the manuscript; DP conducted the experimental work and contributed to the manuscript; DK hybridized chips and contributed to the discussion, HJT and SW contributed to the design of the study and the discussion; RWK contributed to experimental work, discussion, and writing of the manuscript. All authors read and approved the final manuscript.
Acknowledgements
The authors cordially thank Reinhard Guthke for valuable discussion. Ernst Günter SchukatTalamazzini is acknowledged for his proofreading of the mathematical methods. Torsten Kroll contributed to the early stage of the project and Andreas Beyer contributed by critical discussion. Rafael A. Irizarry and James W. MacDonald are acknowledged for their comments on the Genelogic dilution study. The two anonymous referees contributed valuable comments and suggestions that considerably helped to improve the manuscript.
This work was supported by the German Federal Ministry of Education and Research (BMBF) through the Jena Centre for Bioinformatics (JCB) (FKZ 0312704D to R. Guthke, which supported MH, and 0312704B to RWK, which supported DP), the Interdisciplinary Center for Clinical Research (IZKF) (FKZ 01ZZ0105 and 01ZZ0405 to RWK), and the National Genome Research Network 2 (NGFN2) (FKZ 01GS0413 to RWK).
References

Kamme F, Salunga R, Yu J, Tran DT, Zhu J, Luo L, Bittner A, Guo HQ, Miller N, Wan J, Erlander M: SingleCell Microarray Analysis in Hippocampus CA1: Demonstration and Validation of Cellular Heterogeneity.
J Neurosci 2003, 23(9):36073615. PubMed Abstract  Publisher Full Text

Taylor T, Nambiar P, Raja R, Cheung E, Rosenberg D, Anderegg B: Microgenomics: Identification of New Expression Profiles Via Small and SingleCell Sample Analyses.
Cytometry A 2004, 59A:254261. Publisher Full Text

de Bruin E, van de Pas S, Lips E, van Eijk R, van der Zee M, Lombaerts M, van Wezel T, Marijnen C, van Krieken J, Medema J, van de Velde C, Eilers P, Peltenburg L: Macrodissection versus microdissection of rectal carcinoma: minor influence of stroma cells to tumor cell gene expression profiles.
BMC Genomics 2005, 6:142. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Wang H, Owens J, Shih J, Li M, Bonner R, Mushinski J: Histological staining methods preparatory to laser capture microdissection significantly affect the integrity of the cellular RNA.
BMC Genomics 2006, 7:97. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Schmid H, Henger A, Cohen C, Frach K, Gröne HJ, Schlöndorff D, Kretzler M: Gene Expression Profiles of PodocyteAssociated Molecules as Diagnostic Markers in Acquired Proteinuric Diseases.
J Am Soc Nephrol 2003, 14:29582966. PubMed Abstract  Publisher Full Text

Häupl T, Grützkau A, Grün J, Kinne R, Berek C, Stuhlmüller B, Rohrlach T, Kaps C, Rudwaleit M, Morawietz L, Gursche A, Zacher J, MüllerLadner U, Krenn V, Burmester GR, Radbruch A: Dominant Role of Bcells and Monocytes in Rheumatoid Arthritis Based on Synovial Expression Profiles.
American College of Rheumatology, 69th Annual Meeting, San Diego, CA, USA 2005.

Venet D, Pecasse F, Maenhaut C, Bersini H: Separation of samples into their constituents using gene expression data.
Bioinformatics 2001, 17(Suppl 1):S279S287. PubMed Abstract  Publisher Full Text

Stuart R, Wachsman W, Berry C, WangRodriguez J, Wasserman L, Klacansky I, Masys D, Arden K, Goodison S, McClelland M, Wang Y, Sawyers A, Kalcheva I, Tarin D, Mercola D: In silico dissection of celltypeassociated patterns of gene expression in prostate cancer.
Proc Natl Acad Sci USA 2004, 101(2):615620. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Lähdesmäki H, Shmulevich I, Dunmire V, YliHarja O, Zhang W: In silico microdissection of microarray data from heterogeneous cell populations.
BMC Bioinformatics 2005, 6:54. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Lu P, Nakorchevskiy A, Marcotte E: Expression deconvolution: A reinterpretation of DNA microarray data reveals dynamic changes in cell populations.
Proc Natl Acad Sci USA 2003, 100(18):1037010375. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Ghosh D: Mixture models for assessing differential expression in complex tissues using microarray data.
Bioinformatics 2004, 20(11):16631669. PubMed Abstract  Publisher Full Text

GeneLogic: Dilution Study. [http://www.genelogic.com] webcite
2003.

Delyon B, Juditsky A, Benveniste A: On the relationship between identification and local tests. [http://www.irisa.fr/sigma2/byname/delyon.html] webcite
Tech. Rep. IRISA Rennes Cedex, France; 1997.

Liu WM, Mei R, Di X, Ryder T, Hubbell E, Dee S, Webster T, Harrington C, Ho MH, Baid J, Smeekens S: Analysis of high density expression microarrays with signedrank call algorithms.
Bioinformatics 2002, 18(12):15931599. PubMed Abstract  Publisher Full Text

Affymetrix: Statistical Algorithms Reference Guide. [http://www.affymetrix.com] webcite
2002.

Irizarry R, Bolstad B, Collin F, Cope L, Hobbs B, Speed T: Summaries of Affymetrix GeneChip probe level data.
Nucleic Acids Res 2003, 31(4):el5. Publisher Full Text

Li C, Wong W: Modelbased analysis of oligonucleotide arrays: Expression index computation and outlier detection.
Proc Natl Acad Sci USA 2001, 98:3136. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Bolstad B, Irizarry R, Astrand M, Speed T: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias.
Bioinformatics 2003, 19(2):185193. PubMed Abstract  Publisher Full Text

Kroll T, Wölfl S: Ranking: a closer look on globalisation methods for normalisation of gene expression arrays.
Nucleic Acids Res 2002, 30(11):e50. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Zien A, Aigner T, Zimmer R, Lengauer T: Centralization: a new method for the normalization of gene expression data.
Bioinformatics 2001, 17(Suppl 1):S323S331. PubMed Abstract  Publisher Full Text

Huber W, von Heydebreck A, Sultmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression.
Bioinformatics 2002, 18(Suppl 1):S96S104. PubMed Abstract  Publisher Full Text

Huber W, von Heydebreck A, Sueltmann H, Poustka A, Vingron M: Parameter estimation for the calibration and variance stabilization of microarray data.

Szabo A, Perou C, Karaca M, Perreard L, Quackenbush J, Bernard P: Statistical modeling for selecting housekeeper genes.
Genome Biol 2004, 5(8):R59. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Tusher V, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response.
Proc Natl Acad Sci USA 2001, 98(9):51165121. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Ideker T, Thorsson V, Siegel A, Hood L: Testing for DifferentiallyExpressed Genes by MaximumLikelihood Analysis of Microarray Data.
J Comput Biol 2000, 7(6):805817. PubMed Abstract  Publisher Full Text

Storey J, Tibshirani R: Statistical significance for genomewide studies.
Proc Natl Acad Sci USA 2003, 100(16):94405. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Choe S, Boutros M, Michelson A, Church G, Halfon M: Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset.
Genome Biol 2005, 6:R16. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Binder H: Probing gene expression – sequence specific hybridization on microarrays. In Bioinformatics of Genome Regulation and Structure II chap. Springer Science+Business Media, Inc., New York; 2006.

Binder H, Preibisch S: GeneChip microarrays – signal intensities, RNA concentrations and probe sequences.
J Phys: Condens Matter 2006, 18:53766. Publisher Full Text

Shedden K, Chen W, Kuick R, Ghosh D, Macdonald J, Cho K, Giordano T, Gruber S, Fearon E, Taylor J, Hanash S: Comparison of seven methods for producing Affymetrix expression scores based on False Discovery Rates in disease profiling data.
BMC Bioinformatics 2005, 6:26. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Zimmermann T, Kunisch E, Pfeiffer R, Hirth A, Stahl H, Sack U, Laube A, Liesaus E, Roth A, PalomboKinne E, Emmrich F, Kinne R: Isolation and characterization of rheumatoid arthritis synovial fibroblasts from primary culture – primary culture cells markedly differ from fourthpassage cells.
Arthritis Res 2001, 3:7276. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Ruschpler P, Stiehl P: Shift in Th1 (IL2 and IFNgamma) and Th2 (IL10 and IL4) cytokine mRNA balance within two new histological maintypes of rheumatoidarthritis (RA).
Cell Mol Biol 2002, 48(3):285293. PubMed Abstract

Wang M, Master SR, Chodosh LA: Computational expression deconvolution in a complex mammalian organ.

Rousseeuw P, Leroy A: Robust Regression and Outlier Detection. John Wiley and Sons, New York; 1987.

Altman R, Asch E, Block D, Bole G, Borenstein D, Brandt K, Christy W, Cooke TD, Greenwald R, Hochberg Mea: Development of criteria for the classification and reporting of osteoarthritis. Classification of osteoarthritis of the knee. Diagnostic and Therapeutic Criteria Committee of the American Rheumatism Association.
Arthritis Rheum 1986, 29(8):10391049. PubMed Abstract

Arnett F, Edworthy S, Bloch D, McShane D, Fries J, Cooper N, Healey L, Kaplan S, Liang M, Luthra H, et al.: The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis.
Arthritis Rheum 1988, 31(3):315324. PubMed Abstract

Knudsen S, Workman C, SicheritzPonten T, Friis C: GenePublisher: Automated analysis of DNA microarray data.
Nucleic Acids Res 2003, 31(13):34713476. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Dudoit S, Yang Y, Callow M, Speed T: Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. [http://citeseer.ist.psu.edu/dudoit00statistical.html] webcite

Bolstad B: Mixture CDF environments. [http://bmbolstad.com/misc/mixtureCDF/MixtureCDF.html] webcite
2004.