Abstract
Background
Although DNA microarray technologies are very powerful for the simultaneous quantitative characterization of thousands of genes, the quality of the obtained experimental data is often far from ideal. The measured microarrays images represent a regular collection of spots, and the intensity of light at each spot is proportional to the DNA copy number or to the expression level of the gene whose DNA clone is spotted. Spot quality control is an essential part of microarray image analysis, which must be carried out at the level of individual spot identification. The problem is difficult to formalize due to the diversity of instrumental and biological factors that can influence the result.
Results
For each spot we estimate the ratio of measured fluorescence intensities revealing differential gene expression or change in DNA copy numbers between the test and control samples. We also define a set of quality characteristics and a model for combining these characteristics into an overall spot quality value. We have developed a training procedure to evaluate the contribution of each individual characteristic in the overall quality. This procedure uses information available from replicated spots, located in the same array or over a set of replicated arrays. It is assumed that unspoiled replicated spots must have very close ratios, whereas poor spots yield greater diversity in the obtained ratio estimates.
Conclusion
The developed procedure provides an automatic tool to quantify spot quality and to identify different types of spot deficiency occurring in DNA microarray technology. Quality values assigned to each spot can be used either to eliminate spots or to weight contribution of each ratio estimate in followup analysis procedures.
Background
In comparative DNA microarray experiments compared test and control samples are labeled with different fluorescent dyes (typically the redfluorescent Cy5 and the greenfluorescent Cy3), mixed up and cohybridized with the DNA clones regularly spotted on the microarray. The array is scanned at a high spatial resolution at the corresponding fluorescent wavelengths, and the fluorescence intensities are recorded in two color channels (Cy5 and Cy3) for each pixel. The ratio of the measured intensities (Cy5/Cy3) for each microarray spot reveals either differential gene expression (cDNA technology [1]) or change in DNA copy numbers (comparative genome hybridization (CGH) technology [2]) between the test and control samples for the corresponding gene. Each ratio estimate should be accompanied by some measure of quality demonstrating the level confidence in the obtained ratios.
The main components of the microarray image analysis pipeline for spots include localization, quantification and quality control. Among these, quality control is the least formalized and least developed. To determine spot quality we need to have a clear definition of a good spot, or a list of all possible distortions that may spoil the spot. The diversity of instrumental platforms and instrumental and biological factors that may influence the result makes formalization difficult and unlikely to be universal.
In this paper, we consider the problem of quantifying spot quality in comparative DNA microarray experiments. Several attempts have been made to approach the problem [37]. Generally a number of parameters characterizing the spot, such as signaltonoise ratio, size, circularity, etc., are introduced. These parameters have to be combined into an overall quality value to be used as a confidence level in the followup analysis. There are different methods for deriving such a parameter. For example, in two studies [5,6], it was assumed that individual quality scores contribute equivalently to the composite quality score. This may not be true, depending on the instrumental setup and experimental design. Therefore we need an approach that allows us to evaluate the weights that control the input of each of the marginal quality characteristics into the overall score. For that, different training procedures, in which the user classified a set of representative spots into three (accepted, rejected or intermediate spots) [3] or four (bad, close to bad, close to good or good spots) [7] groups, were proposed. This requires an expert to evaluate at least a couple of hundred spots to achieve a good approximation, which is a difficult and timeconsuming task.
Here, we develop an automatic training procedure to evaluate the contribution (or weight) of each marginal quality characteristic into the overall quality score, together with an original set of quality characteristics and a model that maps this set into an overall quality value. This procedure is based on information from replicated spots, located on the same array or over a set of replicated arrays, and assumes that unspoiled replicated spots must have very close intensity ratios, whereas poor spots yield greater diversity in the obtained ratio estimates. The obtained weights can then be used to establish a critical limit for each quality characteristic, such that if a spot's characteristic exceeds its critical limit, the spot can be declared a "bad" spot.
We demonstrate the applicability of the developed algorithms using simulated artificial images and experimental images of different array designs used within our Institute and CGH images obtained from the UCSF Cancer Center.
Results
We assume that the spots are identified and well localized [8], (Novikov E, Barillot E: unpublished data)] in squares (called spot cells). This involves: (i) identifying the position of each spot on the array to associate it with the spotted clone; and (ii) establishing the borders between the neighboring spots to allow further independent data processing (extracting quantitative information) for each spot. Visually this results in the generation of a grid covering the microarray image.
Ratio estimation
We have implemented [8,9] two approaches for calculating the Cy5/Cy3 ratio for the spot: (i) based on spot segmentation and (ii) based on linear regression.
Spot segmentation
In the spot segmentation approach a direct arithmetic ratio of the backgroundcorrected fluorescence intensity estimates in the two color channels is evaluated. This approach requires the identification of both the foreground – the measured spot – and the background – typically the level of nonspecific hybridization. The segmentation procedure, in our implementation, is based on the kmeans adaptive pixelclustering algorithm [10] modified to explicitly take into consideration the geometrical constraints on the spots and to improve identification of the background areas for the spots with smooth edges.
Linear regression estimation
In the linear regression approach, the ratio is represented as the slope of the linear regression fit of the pixel intensities in two color channels (Figs. 1 and 2). As measured fluorescence intensities are statistically distorted in both color channels, orthogonal regression [11,12] is used to estimate the slope. In this method spot segmentation is unnecessary, as background pixels are concentrated at the origin of the linear regression plot and do not influence the slope of the regression line (Fig. 1). However, outlier or aberrant pixels within the spot cells, even in small numbers, can strongly influence the regression line, thus biasing the ratio. We have improved [8,9] the linear regression approach by developing a statistical filtering procedure that systematically searches and removes aberrant or outlier pixels.
Figure 1. Example of a spot with a low DW parameter (0.45). Although the coefficient of determination of the linear regression plot (red line) is relatively high (0.92), it is obvious that the linear regression model is not appropriate in this case. It is possible that there are contributions (blue lines) from two different species occurring within the given spot, leading to two different Cy5/Cy3 ratios. The background pixels are grouped near the origin of the linear regression plot (cyan circle).
Figure 2. Example of a spot with contamination. The filtering procedure removes aberrant pixels (dots with the red contours), improving the Cy5/Cy3 ratio estimation and increasing the coefficient of determination of the linear regression plot. Before filtering (red line): CD = 0.86; linear regression ratio is 1.41; segmentation ratio is 0.712. After filtering (black line): CD = 0.91; linear regression ratio is 0.275; segmentation ratio is 0.273. However, larger amounts of aberrant pixels may result in a less reliable estimation.
Briefly, this procedure can be outlined as follows. Suspicious pixels are examined by evaluating the quality of the linear regression fit with and without the suspicious pixel. We quantify the fit quality by the residual variance, s^{2 }[13]. The smaller s^{2 }is, the closer the linear regression line is to the experimental data. The ratio of the s^{2 }values is calculated for the fit with the tested pixel and for the fit without. If this ratio is larger than a critical value of the Fdistribution at a userdefined confidence level, the pixel will be marked as aberrant. We select pixels with the highest intensity in either of two channels first and then select pixels having the largest deviation from the fitted regression line. To take into account the fact that the distortions caused by pixels from the top of the intensity scale and by pixels lying off of the linear regression line, may be different, we apply different confidence levels for the Fstatistics for these pixels.
For the highintensity pixels we also perform another test to determine how far their intensities are from the averaged intensity of the other pixels within the spot cell. This detects pixels, far away from the other pixels, that do not distort the linear regression line. Although these pixels may not change the Cy5/Cy3 ratio, they could be considered as aberrant pixels, as we expect to see an almost continuous distribution of pixels intensity.
The procedure performs iteratively until no more aberrant pixels are detected. An example of the outlier detection is presented in Fig. 2. Note that the regression approach is capable of detecting contamination pixels that are geometrically inseparable from the spot. Therefore, the developed procedure can be considered not only as a procedure for correcting ratio recovery, but also as a procedure to repair the spot and to improve the quality of experimental material. It requires, however, that the contamination clearly deviates from the straight regression line, which is defined by the majority of "good" pixels from the spot.
Although this procedure gives a high level of confidence in the linear regression ratio estimates, we still apply spot segmentation, because linear regression estimates may be biased when there is a high level of statistical noise (low correlation between the Cy3 and Cy5 color channels). However, after removing aberrant pixels, the segmentation algorithm also gives more robust estimates, and there is a greater agreement in the ratio values obtained with both methods.
As well as the Cy5/Cy3 ratio estimates, each of these approaches generates a series of parameters that can be used to evaluate the spot quality.
Quality characteristics
We define a set of quality parameters, characterizing different features of the spot. These parameters are scaled between 0 (bad spot) and 1 (good spot) to facilitate further quality analysis.
First, we use the characteristics from the linear regression approach. The coefficient of determination (CD = r^{2}, where r is the correlation coefficient) of linear regression signifies the degree of linear relationship between the intensities in the Cy3 and Cy5 channels. High values of CD (approaching 1) are expected for good spots. Low values suggest either relatively bright but noncorrelated contamination, or strong statistical noise normally characterizing lowlevel (or missing) spots. Although the removal of aberrant pixels increases the CD of linear regression, it can still be low for noisy spots. These spots must be either flagged out or assigned a lower quality value. This parameter takes the range [0;1]: q_{1}(CD) = CD.
The DurbinWatson statistic (DWS) [14] evaluates the presence of the firstorder autocorrelation in the residuals of the linear regression fit. It ranges from 0 to 4, 0 being a positive correlation and 4 being a negative correlation. A DWS value close to two indicates that the residuals are uncorrelated and the model is appropriate. Large deviations from two, resulting from systematic patterns in the residuals plot suggest that the spot cannot be modeled in terms of a simple linear regression. Low DWS values typically imply strong contamination that was not removed by filtering (Fig. 1). The DurbinWatson quality parameter (q_{2 }∈ [0;1]) is obtained from the DWS value by the following transformation: q_{2}(DWS) = 1  DWS  2/2.
One more indicator of the quality directly available from the linear regression is the number (N) of the aberrant pixels flagged out by the filtering procedure. Although aberrant pixels can be found everywhere within the spot cell, we count only those pixels within the spot contours. This value can be used as a quantitative measure of spot contamination. Small numbers of aberrant pixels do not influence intensity ratio estimates, whereas the removal of large numbers of pixels from the spot may indicate inconsistency with the linear regression model (Fig. 2). We scale N to fit the range [0;1] as: q_{3}(N) = 1  N/S, where S is the number of pixels within the spot contour.
We now define three quality parameters from the analysis of the contoured spot. The diameter of the spot is calculated as D = 2(S/π)^{1/2}. As the true value for the spot diameter may be difficult to establish, we use a typical value taken as the median diameter over all spots on the array. Spots with exceptionally small diameters should normally be penalized. We define the diameter quality parameter as q_{4}(D) = exp{DT}, if D <T and q_{4}(D) = 1, if D > T, where T is the typical diameter.
The geometrical symmetry parameter measures deviation of the contoured spot from the ideal circle. The center and the diameter of the ideal circle correspond to the center and the diameter of the real spot. We divide both the real spot and the ideal circle into eight segments (pie slices defined as [kπ/4;(k + 1)π/4], k = 0,...,7) and we count the number of pixels belonging to the spot (N_{si}, i = 1,...,8) and to the circle (N_{ci}, i = 1,...,8) for each segment. The sum of the absolute relative differences is then taken as an indicator of quality. For ideal circular spots GS should approach 0, whereas highly deformed (uncircular) spots can be recognized by high GS values (Fig. 3A). We transform the GS values as q_{5}(GS) = exp(GS) to fit the range [0;1].
Figure 3. Examples of spots with different types of distortions. A) Large deviation from the circular shape (GS = 1.6); B) Bright piece of contamination within a larger circular spot, resulting in a low value of the IS parameter (IS = 1.71); C) Merged spots (UB = 2.98).
Using the same partition of the spot into eight segments, we can also calculate the mean intensities for each of the segments (I_{i}, i = 1,...,8). The intensity symmetry of the spot is defined as , where I is the mean intensity within the spot. We scale the IS values in the range [0;1] as q_{6}(IS) = exp(IS). Although a spot may have perfect circular shape, it may contain very bright (or dark) and highly concentrated groups of pixels originating from pieces of dust or other contamination (Fig. 3B). IS is calculated for each of two channels (Cy3 and Cy5) and the maximal value is taken as a final estimate.
We estimate two Cy5/Cy3 ratios; one by the linear regression approach (RR), and the other by the segmentation algorithm (RS). Despite the different methods of estimation, the variation between the two obtained ratios should be as small as possible. Consistent results should be expected, as most of the contaminating pixels have been removed by the filtering procedure. Large variations between the two estimates may indicate a problematic spot. Therefore, we use the coefficient of variation of two ratio estimates CVR = 2^{1/2}RRRS/(RR+RS) as a characteristic of quality. The CVR quality parameter is defined as q_{7}(CVR) = exp(CVR) fitting the range [0;1]. This measure of quality is the least explicit of all the quality characteristics: we can only determine that there is big discrepancy between the estimates but not why there is a big discrepancy, unless it is accompanied by lower values in any of the other quality parameters.
Finally, we introduce two parameters evaluating the quality of the background estimates. The first is the uniformity of the background (UB) around the spot, more precisely, along the grid lines separating neighborhood spots. We divide the grid line surrounding the spot into eight segments and calculate the mean intensity in each segment (B_{i}, i = 1,...,8). The UB parameter is defined as: , where B is the mean intensity for the whole grid line around the spot. Normally we would not expect to observe a big variability in the fluorescence intensity in the background areas. Large UB values may discover presence of relatively bright contamination around the spot, large variability in the background or merged neighboring spots (Fig. 3C). UB is rescaled to the range [0;1] by the exponential transformation: q_{8}(UB) = exp(UB).
The second background quality characteristic is the absolute level of the background (AB). We calculated this from the local area around the spot. As for the spot diameter D, there is no predefined ideal value for the absolute background. Therefore, as a benchmark for comparisons, we take the typical value as the median background level over all spots on the array. Spots with exceptionally high AB values should be treated with care or discarded. Of two background estimates obtained in twocolor microarray experiments, we use one that gives the highest value as the most indicative of possible problems. We define the AB quality parameter as q_{9}(AB) = exp{(ABB)/B}, if AB > B and q_{9}(AB) = 1, if AB <B, where B is the typical background level.
We cannot claim that the developed quality parameters are the optimal. However, they have led to reasonable results for most of the experimental and simulated situations we tested. Of course, there may be a possibility to formalize some of these parameters more precisely and/or to develop new parameters accounting for other types of distortions.
Spot quality analysis
We consider two aims of spot quality analysis. The first is to combine the nine marginal quality parameters into an overall quality value. This value can be used either to flag out directly spots with a quality lower than a userdefined threshold, or, in the followup image analysis procedures (normalization, classification, clustering, etc.) as a parameter characterizing the level of confidence in the obtained Cy5/Cy3 ratios. The second aim is to identify a critical range (a sort of a confidence interval) for each quality characteristic. If a certain quality characteristic of the spot falls in this range, the corresponding spot could be classified as a "bad" spot.
Overall quality
We used the following definition for the overall quality value:
where q_{i }= q_{i}(x_{i}) ∈ [0;1], i = 1,...,9 are the marginal quality parameters for x_{1 }= CD, x_{2 }= DWS, x_{3 }= N, x_{4 }= D, x_{5 }= GS, x_{6 }= IS, x_{7 }= CVR, x_{8 }= UB, x_{9 }= AB, and w_{i }are the weights that control the input of the corresponding quality components into the overall quality value. A link between the weight w_{i }and the critical value can be established for each quality characteristic i = 1,...,9:
where Q^{lim }∈ [0;1] is the userdefined overall quality threshold, and q_{i}() is the quality parameter calculated for . The critical value sets up the limit such that if a certain characteristic i exceeds this limit, the corresponding quality parameter q_{i}() will become lower than Q^{lim}. The correspondence between x_{i}, , q_{i}(x_{i}), q_{i}(), w_{i}, Q and Q^{lim }is demonstrated in Fig. 4.
Figure 4. The correspondence between the quality characteristics, quality parameters and overall quality value.
Quality weights w_{i}
The experimental quality parameters q_{i}, i = 1,...,9 are directly available from the quantification procedure, whereas the weights w_{i }(or the critical values ) are unknown and are not easily guessed or derived from theory. Therefore, the problem of spot quality analysis becomes a problem of weights (w_{i}) estimation. This can only be solved if additional information is available. Here we consider three possibilities:
1. The additional information may come, for example, from the user expertise. The user has to classify the spots manually [3,7] and assign a quality value to each spot from a representative subset. These values are then used for training the model (1) leading to a combination of the weights (w_{i}) such that the overall quality values reproduce the user classification reasonably well.
2. We can manually apply different combinations of the weights w_{i }and visually appreciate, which spots have been flagged out. The trials must be continued until most of the user classified "bad" spots are eliminated by the chosen combinations of the weights.
3. The weights can be estimated automatically using information available from replicated spots on the same array or over a set of replicated arrays. Unspoiled replicate spots should have very similar estimated Cy5/Cy3 ratio values. Large differences between the observed ratios in the replicate spots would signal that some spots from this replicate were irregular. We formalize this approach by first defining the quality value for the replicate:
where k indicates the replicates, n is the number of spots in a replicate and Q_{kj }is a spot quality value given by Eq. (1). Substituting Eq. (1) into (3) yields
where q_{kji }is the ith quality parameter of the jth replicated spot in the kth replicate. The weights w_{i}, i = 1,...,9 are the parameters that ensure the best fit of the experimental quality values (Q_{k }versus V_{k}) to a userdefined ideal quality curve f(V_{k}), where V_{k }is the intensity ratio variation coefficient in the kth replicate. The best fit of Q_{k}(V_{k}) to f(V_{k}) can be achieved by minimizing the sum of squared differences between Q_{k}(V_{k}) and f(V_{k}) (leastsquares fit):
where ψ_{k }are the weights of the contribution of each replicate into the sum (5). The quality weights w_{i }can be estimated by minimization of Eq (5) using one of the algorithms for nonlinear fitting [15].
Ideal quality curve f(V_{k})
f(V_{k}) defines how fast the overall quality of the replicates should decrease with increasing ratio variation. The shape of the ideal quality curve f(V_{k}) is somewhat arbitrary; the only requirement is that it should demonstrate monotonic decay. Further formalization for f(V_{k}) is hardly possible until more information regarding the ideal quality behavior becomes available. Therefore we have to look for empirical approaches to define f(V_{k}). For example, f(V_{k}) can be implemented as a nonparametric function, which can be constructed by the user through the properly designed user interface. f(V_{k}) can also be represented as a function (such as f(V_{k}) = exp{βV_{k}} with two adjustable parameters α and β), which can yield different shapes depending on the parameters values. Finally, a library of different functional shapes for f(V_{k}) can be created. In our work we follow the third approach: three shapes were implemented for testing:
and for each shape only the characteristic ratio variation coefficient must be predefined. A typical example of the quality plot with the exponential f(V_{k}) (Eq. (6),a) is shown in Fig. 5.
Figure 5. Quality plots (Q_{k }versus V_{k}) for image 7A. Green dots – using only the CD quality parameter; red dots – using only the CRV quality parameter; blue dots – using overall quality parameter Q. The black solid line is the exponential ideal quality curve f(V_{k}) (Eq. (6),a). Three triplicates showing poor quality (outlined by circles) are given in the insets. The main characteristics of the spots from the selected triplicates are given in Table 3.
In our quality analysis algorithm, user participation is limited to the definition of the ideal quality curve shape f(V_{k}). This is somewhat simpler than deciding on the quality of several hundred spots, which is used to teach the algorithm in the manual approach. However, as with other solutions, this algorithm requires representative images to train the model. It is impossible to evaluate confidently the weight of the contribution of the diameter quality parameter, for example, if all spots in the array have the same diameter. Therefore, a careful selection of training images containing a realistic diversity of all possible distortions and artifacts is needed.
Fitting weights ψ_{k}
An important issue of the minimization of Eq (5) is reasonable selection of the fitting weights ψ_{k}. The easiest way is to set all ψ_{k }equal to one. However, most of the replicates are often observed in the initial part of the quality plot (highquality spots), whereas there may be only a few replicates in the tail of the plot (poorquality spots). In this case, equalized weights would give a very accurate fit for the initial part of the quality curve while ignoring the tail. However, mainly the tail contains the relevant information (spots with different distortions and artifacts) for identifying the quality weights. Therefore, we try to increase the input from regions with a smaller number of replicates by defining the weights ψ_{k }in the following way:
1. All replicates are sorted according to their ratio variation V_{k}.
2. The set of sorted replicates is divided into bins with ten replicates in each bin. The difference λ_{j }between the smallest and the largest ratio variation coefficients in each bin j is calculated. The larger this difference, the smaller the concentration of the replicates in the given region of the quality plot.
3. The weights ψ_{k }for Eq. (5) are then calculated as ψ_{k }= ψ_{j*10+i }= , i = 0,...,9. That is, they are equalized for ten replicates from the same bin j.
Followup image analysis
As it was mentioned earlier, the overall quality value, Q (Eq. (1)), can be used as a parameter characterizing the level of confidence in the obtained Cy5/Cy3 ratios. If, for example, n ratios should be averaged, the weighted mean would ensure a more robust estimate for the average:
where R_{l }is the Cy5/Cy3 ratio and Q_{l}, is the corresponding overall quality value (l = 1,...,n). The weighted coefficient of variation is defined as
Note that the ratio variation coefficient V_{k }from Eq. (5) can be determined from Eq. (8), if we set Q_{l }= 1, l = 1,...,n, with n being the number of spots in the kth replicate.
Image simulation
In [9] we have described the MonteCarlo simulation model for generating artificial microarray images. The advantage of using artificial images is that we always know the exact Cy5/Cy3 ratio values. This allows us to test and compare objectively different image analysis algorithms. The general model for the twocolor (Cy3, Cy5) microarray image is given by [9]:
where N is the number of spots and M is the number of dust clusters, and are the coordinates of the center of a spot, and are the coordinates of the center of a dust cluster, r^{s }and r^{d }are the approximate radiuses of the spot and dust cluster, respectively, I^{s }and I^{d }are the fluorescence intensity in the center of the spot in the Cy3 color channel and in the center of the dust cluster, respectively, and R is the ratio of the test and control samples for each spot. Dust is represented by the random distribution over the array of clusters of pixels of varying brightness. We consider that these pixel clusters have an identical shape to the spots and therefore the same analytical representation is used for an ideal spot shape and dust cluster:
The parameters characterizing the spots (, , r^{s}, I^{s }and R) are userdefined. For example, the coordinates and , the radius r^{s }and the ranges for x and y for each spot are defined from a userdefined array design. The user should also specify the number of dust clusters M on the array. The other parameters characterizing the dust are random variables, and the probability laws for their generation is a matter of choice. We use uniform distributions for r^{d }(in the interval 0 to r_{m}) and I^{d }(in the interval 0 to I_{m}), where r_{m }and I_{m }are a userdefined maximal dust cluster radius and maximal dust intensity, respectively. We also assume that and are uniformly distributed over the array. Statistical laws of the dust characteristics can generally be different in the two (Cy3, Cy5) channels.
In the developed simulation model we also account for the nonspecific hybridization and statistical noise:
where i represents either Cy3 or Cy5, and η_{Bi }are the userdefined average and noisetosignal ratio of nonspecific fluorescence intensity in the color channel i, σ(x,y) is the standard deviation of the pixel statistical noise, and G_{B }and G_{S }are independent Gaussian random variables with zero mean and unit standard deviation. The exact representation for σ(x,y) is defined by the experimental setup. There are currently three possibilities: σ(x,y) can be (i) constant, (ii) proportional to the signal, or (iii) proportional to the square root of signal. The type and quantitative characteristics of the statistical noise are defined by the user.
Discussion
Software
The developed algorithm for quality control is included in the software package MAIA, which offers a complete solution for DNA microarray image analysis, including automatic spot localization and spot quantification procedures. A demo version of the software can be downloaded from [8].
Artificial images
All artificial images were generated using the same array design: 4 × 4 blocks and 21 × 21 spots within each block with the interspot distance of 15 pixels. For all spots in the generated arrays the spot radius, r, was about 4 pixels, the intensity, I, in the Cy3 color channel was 5000 and the ratio, R, of the Cy5 and Cy3 channels was 3. Nonspecific hybridization was generated using = 1000 and η_{Bi }= 0.5. The standard deviation of the statistical noise, σ(x,y), at each pixel was proportional to the signal at the corresponding pixel with the noisetosignal ratio of 0.1. The three generated images (Fig. 6 (insets)) differed in the number of dust clusters. The percentages of dust clusters with respect to the number of good spots were: 0% (image 6A), 5% (image 6B) and 25% (image 6C). In all cases the maximal dust cluster radius, r_{m}, was set to 8, and maximal dust intensity, I_{m}, to 64000.
Figure 6. Quality plots (Q_{k }versus V_{k}) for artificial images. Three generated images differed in the percentage of dust clusters with respect to the number of good spots: A) 0%, B) 5% and C) 25%. For the further details see the text. The solid lines represent the userdefined ideal quality curves f(V_{k}): Red – exponential (Eq. (6),a); Blue – Gaussianlike (Eq. (6),b); Black – inverse (Eq. (6),c).
As all spots have the same theoretical ratio (R = 3), they can be considered as replicates and therefore we can use the quantitative characteristics defined in Eqs. (7) and (8) to characterize the performance of the algorithm. We expect the best estimators to provide the averaged Cy5/Cy3 ratio closer to the true ratio (R = 3) with the least spread around this value.
For each artificial array we compared the weighted statistical characteristics for the average (Eq. (7)) and for the ratio variation coefficient V (Eq. (8)) with the unweighted ones. The unweighted characteristics were obtained from Eqs. (7) and (8) by setting all Q_{l}, l = 1,...,n to 1. The weighted characteristics were calculated with the overall quality values Q_{l }available from the quality analysis algorithm. As all spots from the simulated image can be considered as replicates, we artificially split up the total number of spots into the groups of three closely placed spots. These groups, regarded as independent triplicates, can be used to calculate the experimental quality values Q_{k }(Eq. (4)) and to build up the corresponding quality plot (Q_{k }versus V_{k}). Three functional shapes (Eq. (6)) for the ideal quality curve f(V_{k}) were tested.
The results are collected in Table 1. The weighted statistical characteristics with the developed quality control eliminated bias in the average Cy5/Cy3 ratio estimates and reduced the ratio variation coefficient. Note that for "ideal" image 6A (no dust clusters) the applied quality measures retained the estimates unchanged.
Table 1. The average
(Eq. (7)) and the coefficient of variation V (Eq. (8)) of the Cy5/Cy3 ratios over all spots for three artificial images A, B, and C from Fig 6The three compared ideal quality shapes f(V_{k}) (Eq. (6)) demonstrated similar performance. A small advantage (slightly smaller V) of the Gaussianlike function (Eq. (6),b) for image 6B (5% of dust clusters) was compensated by the lowest performance for image 6C (25% of dust clusters). The difference between the three shapes can be seen from Fig 6, where the corresponding quality plots are drawn for the three generated images. The inverse shape (Eq. (6),c) is the least stringent whereas the Gaussianlike shape is the most stringent with respect to the replicates variability. This means that the inverse shape does not require the replicates with higher variability to have lower quality values (and correspondingly the replicates with smaller variability to have higher quality values) as strictly as the Gaussianlike one. Although it seems that the Gaussianlike shape is the best choice, there is still a drawback. Due to its relatively abrupt decrease it is difficult to keep the balance in fitting weights ψ_{k }between the head and the tail of the shape. Despite the adaptive selection of the fitting weights ψ_{k }(see section Spot quality analysis), the fitting procedure, trying to ensure the highest quality for the replicates with the lowest variability, may still overlook the replicates with the higher variability. This depends on the image quality (replicate variability) and can explain why the Gaussianlike shape yielded the greatest ratio variation for image 6C. This also indicates that more work is needed to find out an optimal combination of the ideal quality curve and the fitting weights ψ_{k }to be used in the training procedure.
The results of simulation study and our experience with the experimental images suggest that the exponential shape (Eq. (6),a) offers a reasonable compromise between the fitting weights used and stringency with respect to the replicates variability. Therefore in the further quality analysis we will apply exponential f(V_{k}).
Experimental images
We performed quality analysis of two experimental images with different array designs and signaltonoise levels.
One image (Fig. 7A) (image 7A) was provided as a demonstration example for UCSF Spot 2.0 (downloadable from [16]). It contains 4 × 4 blocks with 21 × 21 spots in each block, with a spot cell size of about 10 pixels. Cy3 and Cy5 color channels are strongly correlated, with the average correlation coefficient for the spots being about 0.97. Bright contamination spots can be seen irregularly scattered over the array. The magnified image of one such spot is shown in Fig. 2 (inset). Each clone was spotted in triplicate. The replicated spots are placed as neighbors in a row (see Fig. 7A).
Figure 7. Experimental images. A) 4 × 4 blocks with 21 × 21 spots per block, spot cell size is about 10 pixels; B) 12 × 4 blocks with 15 × 15 spots per block, spot cell size is about 30 pixels. The locations of triplicates are indicated.
The second image (from the Institute Curie) (image 7B) contains 12 × 4 blocks with 12 × 12 spots in each block (Fig. 7B), with a spot cell size of about 30 pixels. The average correlation between the channels for the spots is about 0.85. This is lower than for the first image, although this image has no obvious contamination spots. Each clone was prepared in triplicate, with the replicated spots in three vertically distributed subarrays (see Fig. 7B).
We expect the Cy5/Cy3 ratios from the replicates to be similar. Therefore it is reasonable to take the coefficient of variation (Eq. (8)) of the replicates as a quantitative measure of the ratio estimation consistency. However, this measure may not be totally objective: (i) the estimates may be consistent, but systematically biased (the true values of the ratios are unknown); (ii) three replicated spots of very poor quality may give very similar ratio values just by chance (the number of replicates is low).
The average over all replicates at the given array coefficient of variation is taken as a global indicator of the Cy5/Cy3 ratio consistency of the array.
We compared two quality characteristics (the coefficient of determination CD (q_{1}) and the coefficient of variation of two ratio estimates CVR (q_{7}), applied separately) and the overall quality parameter Q (Eq. (1)) to the case when no quality control was applied. The weights of the marginal quality parameters for Q were identified using the exponential ideal quality curve (Eq. (6),a) with ≈ 0.08 for image 7A, and with ≈ 0.2 for image 7B. The results are summarized in Table 2.
Table 2. The average over all replicates coefficients of variation V (Eq. (8)) for two experimental images A and B from Fig. 7
We found a greater improvement for image 7A than for image 7B after applying the quality measures. This was not a surprise, as image 7B is characterized by a reasonably high signaltonoise level, and it does not contain any obvious contaminated spots. However, even in this case the quality measures cannot be ignored, as there are still a few lowintensity spots that need to be specially treated (probably rejected). By contrast, image 7A has obvious randomly distributed pieces of dust, and the developed quality measures proved to be powerful enough to disregard the contaminated spots, thus increasing the consistency of the Cy5/Cy3 ratio estimates.
The comparison of the different quality measures has shown that they are arrayspecific. For image 7B, CD, CVR and Q gave almost equivalent performance, whereas for image 7A, CD was much better than CVR, and Q gave a slightly better ratio consistency than both of these. However, the gain from Q was not much greater, so that it would be reasonable to assume that only a single quality measure (such as CD or CVR) could always give good results. Unfortunately, this is not the case. Although we have found that CD and CVR are indeed the most powerful quality measures they cannot cover every type of distortion.
We demonstrate this with image 7A. There remained, after applying the individual quality measures, several triplicates with high coefficients of variation that were not correctly suppressed. Three such replicates are shown in Fig. 5, and the main quantitative characteristics of the corresponding spots are listed in Table 3. Note that Table 3 contains the quality characteristics from section Quality characteristics, which do not need to reside in the interval [0;1].
Table 3. The main characteristics of the spots from three triplicates (see Fig. 5 (insets)), demonstrating excessive variability in the Cy5/Cy3 ratio estimates
In triplicate A, one spot is clearly contaminated; however, this contamination is highly correlated in the two channels (CD = 0.97) and the spots cannot be filtered out using CD. A more powerful parameter for this type of problem is CVR (0.11), perhaps in combination with IS (1.02). Triplicate B includes lowintensity spots. Although CVR quality characteristic does not detect any deficiency in this case, the difference between the estimated intensity ratios (within the triplicate) is large, meaning that CVR alone is not sufficient to eliminate the outlier spots. CD seems to be more indicative in this case. Triplicate C cannot be confidently recognized either by CD or CVR quality measures. This problem (contamination penetrating into the spot from the outside area) can be figured out by applying either the uniformity of the background quality parameter (UB = 0.99) or the DurbinWatson quality parameter (DWS = 0.58).
This type of analysis leads to the refined critical values for each of the quality characteristics. These values are then recalculated into the corresponding weights w_{i }for the userselected overall quality threshold Q^{lim}, using Eq. (2) (see Fig. 4). In general, however, the weights are derived automatically from Eq. (5) by the nonlinear fitting procedure. If certain quality factors do not influence the shape of the experimental quality curve Q_{E }(Eq. (4)), the corresponding weights will be set close to 0. If a certain effect shows up in only a small number of spots, it may be neglected by the optimization procedure, and the corresponding weight will be erroneously small. In this case, manual correction of the weights would be necessary.
The quality value of three bad replicates from Fig. 5 (insets), as well as of many others demonstrating larger deviations in the obtained Cy5/Cy3 ratio estimates (Fig. 5), were decreased using the combined criteria Q, whereas the quality of replicates with smaller variations remained almost unaffected.
Quality measures can signal some of the replicates to be bad despite there being no big difference in the ratios. With a small number of replicated spots all spots from a replicate may indeed have very close ratios, but several of them may be really deficient. Therefore, it is more important to consider those replicates demonstrating unusually high diversity in the obtained ratio estimates. This clearly suggests problems in the spots. Algorithmically, this can be achieved by assigning larger weights ψ_{k }to these replicates in the nonlinear fitting procedure (Eq. (5)). The fitted parameters w_{i }are then used to flag out all deficient spots, even those that belong to the replicates with the consistent ratio estimates. However, if all spots from a consistent replicate demonstrate poor quality, this can also be an indication that this replicate is actually good, but it was erroneously flagged out by the quality analysis procedure. This would require user intervention to correct for the selected set of quality characteristics and/or for the corresponding quality weight w_{i}. Additional procedures for automatic analysis of the replicates with consistent ratios but low quality values can be envisaged.
The fact that overall Q does not show up much better performance (Table 2) is due to rather good general quality of the images, and a few problematic triplicates cannot influence very much the averaged coefficients of variation. For example, in image 7A, we have less than 9% of triplicates with the ratio variation coefficients larger that the selected (~0.08), and 7% for image 7B ( ≈ 0.2).
Replicated arrays
Quality analysis can also be performed with replicated spots from different arrays. Three replicated images (Fig. 8 (insets)) were used for quality analysis. Although each image contains three replicated spots placed as neighbors in a row (similar to image A from Fig. 7), we pretended that there were no replicates within each array. Therefore we made available only replicated spots from different arrays for quality analysis. In Fig. 8 we present three quality plots. Black dots correspond to the case when the three replicated images were combined without normalization and the default quality weights were used. Relatively large coefficients of variation are due to the bias in the obtained Cy5/Cy3 ratio estimates between the three images. To apply our algorithm of quality analysis in this case the ideal quality curve should account for this bias, i.e. it should implicitly incorporate image normalization model. A simpler way would be to separate the normalization and quality analysis procedures: the image normalization should precede the quality analysis, or we can also envisage an iterative procedure where the steps of normalization and quality analysis are performed in turn. Taking into account that a variety of normalization methods [17] are currently available, we leave the detailed development of the quality analysis strategy in this case for the future.
Figure 8. Quality plots (Q_{k }versus V_{k}) using replicated arrays. Black dots – three replicated images (insets) are combined without normalization and the default quality weights are used; red dots – three replicated images (insets) are combined after the global normalization; blue dots – triplicate spots from the first array are used. The solid lines represent the exponential ideal quality curves f(V_{k}) (Eq. (6),a): Red line (for the red dots) –
≈ 0.125; Blue line (for the blue dots) – ≈ 0.08.As an example, we applied a global normalization algorithm [18]. It is assumed that most genes are not differentially expressed and therefore the expected averaged over all spots of the array Cy5/Cy3 ratio should be close to one. The normalization constant a for each array can be calculated as , where n is the number of spots and R_{i }is the ratio estimate for the ith spot. Red dots in Fig 8 represent the quality curve using the replicated arrays after the global normalization. For comparison, blue dots show the quality curve using the triplicates from the first array. The reddot cloud spreads wider, because, even after normalization, the variability of replicates from different arrays is higher than from the same array (where replicates were closely placed). Despite this replicated arrays can be used in the quality analysis since the required tendency – the decrease of the overall quality with the increase of the replicate variability – can be ensured. To account for higher levels of interarray statistical variability in replicates, we have to apply less stringent ideal quality curve. This can be achieved by selecting larger values for the parameter in Eq. (6).
Quality weight extrapolation
We demonstrate here a possibility to apply quality weights obtained from the analysis of one training image, which should contain replicated spots, to other arrays, which may not contain replicates. We used two images (Figs. 9A and 9B) of the same design as before: each image contains three replicated spots placed as neighbors in a row. Image 9A has obvious deficiencies (large amount of absent spots, scratches, contamination), whereas image 9B is less problematic.
Figure 9. Quality analysis of poorquality (A) and goodquality (B) images. A: (a) Poorquality image; (b) Image A with the "bad" spots identified by the quality analysis based on its own triplicates. B: (a) Image B with the bad spots identified by the quality analysis based on the triplicates from image A; (b) Image B with the bad spots identified by the quality analysis based on its own triplicates. "Bad" spots are the spots with the overall quality below 0.3. "Bad" spots are indicated by the white crosses.
The quality weights were estimated from the triplicates of image 9A, as it contains a wider diversity of possible artifacts and distortions. Using the obtained weights the "bad" spots were identified in image 9A. A spot was classified as a bad spot if its overall quality was below 0.3. Fig. 9A(b) shows the same image with the "bad" spots indicated by the white crosses.
To eliminate the "bad" spots (Q < 0.3) from image 9B, we first applied the quality weights obtained for image 9A. The result is shown in Fig. 9B(a). Then we performed quality analysis for image 9B using its own triplicates. The spots flagged out by this approach are indicated in Fig. 9B(b). Comparing Figs. 9B(a) and 9B(b), one can conclude that the results are very similar, although not identical, of course.
This example attempts to reproduce an important possibility of designing microarray experiments. A small number of training arrays with replicated spots and representative diversity of possible artifacts can be measured and analyzed. The obtained results can then be used to evaluate the quality of other arrays of similar design, which may not contain replicated spots.
Conclusion
We have described an algorithm for quantitative spot quality evaluation in DNA microarray image analysis that allows the automatic identification of the weights for the marginal quality parameters within the model combining these parameters into an overall spot quality value. The algorithm relies on the assumption that unspoiled replicated spots should have higher levels of consistency in the obtained Cy5/Cy3 ratio estimates than "bad" spots. The user is only required to define an ideal quality curve f(V_{k}) establishing how fast the overall quality of the replicates must decrease with increasing ratio variation in the corresponding replicates. For simple models from Eq. (6) only one parameter – the characteristic ratio variation coefficient – must be specified. In this paper the functional shape for f(V_{k}) was empirically selected through numerous experiments with artificial and experimental images. This, however, may not be an optimal choice and further improvements can be expected, preferably using more theoretical approaches. A complementary perspective is to further elaborate the algorithm for estimating the fitting weights ψ_{k}, which may be implicitly dependent on the ideal quality curve.
We use nine marginal quality characteristics, which cover a broad range of different deviations from a normal (good) spot. Therefore, it is possible that some of these parameters will not be relevant for a certain image. For example, if there are no clearly uncircular spots, the corresponding parameter (GS) can be omitted. The optimization procedure will however report this, assigning a very low weight to the corresponding quality characteristic. These weights are then converted into the critical levels using Eq. (2), which will tolerate a big diversity in the possible values of the corresponding parameter. Thus, the developed procedure both searches for the weights and implicitly reports on the relevance of the quality parameters to each particular image. However, if the images used to train the model (that is to identify the weights) are not representative enough and do not contain enough spots with certain types of artifacts, some important sources of spot deficiency may be overlooked. In this case, manual adjustment of the weights may be necessary.
Another problem is that generally a very small number of replicated spots (rarely more than 3) are available for the analysis. Therefore it is possible that all spots from a replicate, being defective, demonstrate consistent Cy5/Cy3 ratio values. However, we expect that is less probable than to observe unusually high diversity in the ratio estimates for these replicates. If all spots from a consistent replicate are actually good, but they are erroneously assigned low quality values, this is an indication of some problems in the quality analysis itself. This situation would possibly require user participation to correct for the selected set of quality characteristics and/or for the corresponding quality weights w_{i}.
We have demonstrated a possibility to carry out the quality analysis using replicated spots from different arrays. An additional procedure of the image normalization should precede the quality analysis in this case.
Authors' contributions
EN conceived the algorithm, performed software implementation and drafted the manuscript. EB conceived of the study and participated in coordination. All authors read and approved the final manuscript.
Acknowledgements
We would like to thank our colleagues from the different laboratories of the Institute Curie: (F. Radvanyi, UMR 144 CNRS/IC: Compartimentation et dynamique cellulaires; O. Delattre, INSERM 509: Pathologie moléculaire des cancers; M. Dutreix, UMR 2027 CNRS/IC: Génotoxicologie et cycle cellulaire) and Prof. D. Pinkel (UCSF Comprehensive Cancer Center) who have provided numerous microarray images allowing considerable improvement of the algorithms.
References

Hegde P, Qi R, Abernathy K, Gay C, Dharap S, Gaspard R, Hughes JE, Snesrud E, Lee N, Quackenbush J: A concise guide to cDNA microarray analysis.
Biotechniques 2000, 29:548562. PubMed Abstract

Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, Zhai Y, Dairkee SH, Ljung BM, Gray JW, Albertson DG: High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays.
Nature Genetics 1998, 20:207211. PubMed Abstract  Publisher Full Text

Buhler J, Ideker T, Haynor D: Dapple: improved techniques for finding spots on DNA microarrays.

Brown CS, Goodwin PC, Sorger PK: Image metrics in the statistical analysis of DNA microarray data.
PNAS 2001, 98:89448949. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Wang X, Ghosh S, Guo SW: Quantitative quality control in microarray image processing and data acquisition.
Nucleic Acids Res 2001, 29:e75. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Chen Y, Kamat V, Dougherty ER, Bittner ML, Meltzer PS, Trent JM: Ratio statistics of gene expression levels and applications to microarray data analysis.
Bioinformatics 2002, 18:12071215. PubMed Abstract  Publisher Full Text

Hautaniemi S, Edgren H, Vesanen P, Wolf M, Järvinen AK, YliHarja O, Astola J, Kallioniemi O, Monni O: A novel strategy for microarray quality control using Bayesian networks.
Bioinformatics 2003, 19:20312038. PubMed Abstract  Publisher Full Text

Software package for automatic microarray image analysis (MAIA) [http://bioinfo.curie.fr/projects/maia] webcite

Novikov E, Barillot E: A robust algorithm for ratio estimation in twocolor microarray experiments.
Journal of Bioinformatics and Computational Biology 2005, 3:14111428. Publisher Full Text

Bozinov D, Rahnenführer J: Unsupervised technique for robust target separation and analysis of DNA microarray spots through adaptive pixel clustering.
Bioinformatics 2002, 18:747756. PubMed Abstract  Publisher Full Text

Kendall MG, Stuart A: The Advanced Theory Of Statistics. Volume 2. McMillan: New York; 1979.

Dissanaike G, Wang S: A critical examination of orthogonal regression. [http://ssrn.com/abstract=407560] webcite
2003.

Fisher LD, van Belle G: Biostatistics. A Methodology for the Heath Sciences. John Willey & Sons: New York; 1993.

Draper NR, Smith H: Applied regression analysis. John Wiley & Sons: New York; 1998.

Bevington PR, Robinson DK: Data reduction and error analysis for the physical sciences. McGrawHill: New York; 1991.

Jain Lab Downloads [http://jainlab.ucsf.edu/Downloads.html] webcite

Wu W, Xing EP, Myers C, Mian IS, Bissell MJ: Evaluation of normalization methods for cDNA microarray data by kNN classification.
BMC Bioinformatics 2005, 6:191. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Axon Instruments Inc: [http://www.moleculardevices.com] webcite