Abstract
Background
There has been much recent interest in the quantification of visually evident heterogeneity within functional grayscale medical images, such as those obtained via magnetic resonance or positron emission tomography. In the case of images of cancerous tumors, variations in grayscale intensity imply variations in crucial tumor biology. Despite these considerable clinical implications, there is as yet no standardized method for measuring the heterogeneity observed via these imaging modalities.
Methods
In this work, we motivate and derive a statistical measure of image heterogeneity. This statistic measures the distancedependent average deviation from the smoothest intensity gradation feasible. We show how this statistic may be used to automatically rank images of in vivo human tumors in order of increasing heterogeneity. We test this method against the current practice of ranking images via expert visual inspection.
Results
We find that this statistic provides a means of heterogeneity quantification beyond that given by other statistics traditionally used for the same purpose. We demonstrate the effect of tumor shape upon our ranking method and find the method applicable to a wide variety of clinically relevant tumor images. We find that the automated heterogeneity rankings agree very closely with those performed visually by experts.
Conclusions
These results indicate that our automated method may be used reliably to rank, in order of increasing heterogeneity, tumor images whether or not object shape is considered to contribute to that heterogeneity. Automated heterogeneity ranking yields objective results which are more consistent than visual rankings. Reducing variability in image interpretation will enable more researchers to better study potential clinical implications of observed tumor heterogeneity.
Background
There are hundreds of papers in the medical literature about the importance of heterogeneity within various types and degrees of cancerous tumors. In oncology parlance, “tumor heterogeneity” generally means that a whole tumor comprises distinct cellular subpopulations [1]. These cell groups vary in morphology, histology and growth rate, for example. The interactions of different tumor cells with each other and with the microenvironment are complex and not well understood [2]. In order to understand intratumor biology, potentially subtle differences near and within tumors must be quantified. The ultimate goal of tumor heterogeneity studies is to determine the implications and prognostic value of observed variations in a host of clinically relevant tumor properties such as physical size, shape, cellular density, cellular metabolism, hypoxia and vascularization. Each of these biologically interesting properties generally are assayed via an imaging modality—such as magnetic resonance (MR), computed tomography (CT) or positron emission tomography (PET)—which outputs grayscale images. Thus, the challenge of measuring biological heterogeneity manifests as the quantification of visually evident intensity variations in grayscale images.
Consider, for example, the PET image of a cancerous tumor shown in Figure 1. Within the bulk of the tumor, variation in the grayscale pixel intensities is seen. Toward the bottomleft, a region of the brightest pixels is seen to smoothly gradate to darker regions. Similar distinct darker regions then are seen at the center and toward the top of the bulk, although neither follow an easily described shape or pattern. Similarly amorphous bright regions punctuate the periphery of the tumor bulk. It is precisely these variations which interest the researchers because variation in PET intensity implies variation in biological properties of the tumor. When one attempts to perform a clinical study of those biological properties via PET image data, two significant obstacles arise. First, the variation in observed pixel intensity is difficult to describe verbally [3]. As such, there is no convention for describing the properties seen in tumor images. Thus, even experienced clinicians are unlikely to describe the same tumor in the same manner or, much less, even agree upon which image features are salient. This complicates the development of statistics used to quantify “obvious” intensity variations. Second, the implementation of any statistic will have to contend with relative paucities of pixel data. Consider, for example, that a typical PET scanner renders a 4×4×5 mm^{3} physical region as a single pixel in the final image [4]. If one assumes that a clinically typical tumor volume of 40 cm^{3} is a sphere, then the largest crosssectional image contains only ≈90 tumor pixels. In fact, it is common that oncology studies include a large fraction of PET images where fewer than 20 pixels comprise the entire tumor region. Therefore measuring the variation within these small regions can be difficult since the use of many statistics—including those common to texture analysis and correlation techniques [5]—requires large sample sizes.
Figure 1. Variations in the grayscale intensity of this tumor are evident. For example, contrast the bright region at the bottomleft to the darker region in the center. The regions of varying intensity are amorphous an diffusely bounded. In the present case, these regions represent the heterogeneous uptake of a radiolabeled glucose analogue (FDG). The vertical edge of this image corresponds to 64 mm. This image is also seen in Figure 4 as panel F1.
Several attempts to objectively quantify tumor heterogeneity have been published previously [610]. These attempts largely have yielded mixed results. The more rudimentary measures are nonspatial. That is, they depend only upon the distribution of intensities, not the spatial arrangement of those intensities within the tumor. For example, although the standard deviation, skewness and kurtosis clearly quantify variation of an intensity distribution, skewness and kurtosis have been shown not to vary sufficiently to separate some cervical cancers into groups of differing disease outcome [10]. Another nonspatial measure of heterogeneity which was shown to distinguish patient groups [6] was later demonstrated to actually be a surrogate for tumor volume [10]. It has been claimed that the image texture metrics introduced in the seminal work of Haralick et al. [11] can be used as spatial measures of heterogeneity [8,9]. However, many of those more sophisticated metrics have been shown later to be statistically irreproducible on tumor data [12]. In general, even in cases where enough image data exists to yield consistent interimage comparison, it remains unclear which texture metric corresponds to which subjectively perceived image quality [11]. In short, there is no standard for measuring tumor heterogeneity in a manner consistent with expert visual inspection.
In this study, we develop and test a novel statistic which may be used to quantify spatial variations in tumors imaged in vivo which were identified and segmented via independent analysis. In other words, we quantify the spatial heterogeneity within definitively bounded objects against a uniform background. Here, we do not seek to attach any clinical or prognostic meaning whatsoever to a given image or statistical value. Instead, we seek to give the clinician a quantitative mechanism of declaring one image to be x times more heterogeneous than another, such that a set of similarly imaged objects may be ranked and compared objectively. We compute our statistic on real PET images of tumors exhibiting visually manifest variations in size, shape and intratumoral intensity and find that our numerical heterogeneity comparisons agree well with the subjective comparisons made by experienced experts.
Methods
Motivation
Heterogeneity has applicationspecific connotation. Common to all applications, however, is that heterogeneity is necessarily defined by the scale of interest. For example, at scales much smaller than a single square, a chessboard image could be interpreted as homogeneous. This is due to the fact that at very small scales, neighboring pixels are much more likely to be in the same square than they are in one of differing color. At a scale on the same order of magnitude as the squares, the chessboard is more heterogenous because any pixel about one square size away from an arbitrary pixel has approximately equal chance of being either color. This is greater average variation than in the previous case of usually being confined to a single square. If the scale of interest becomes much larger than the size of a single square, the image may again be considered homogeneous since, on average, anywhere in the image, the pattern is precisely the same. When attempting to quantify the heterogeneity in an image, one must make clear the intended scale of interest.
Contrast the scaledependence of heterogeneity to the definitiveness of homogeneity. Ultimate homogeneity is unambiguously defined as the state being the same at all scales, e.g., an image of only a single color. We therefore assume that heterogeneity may be clearly defined as difference from homogeneity. Of course, a perfectly homogeneous image contains no information and is therefore of little practical use. The real images we study will contain pixels of varying grayscale intensity. We thus seek to first find the most homogeneous intensity transition between given pixels with differing intensities. Consider two distinct image pixels where the grayscale intensity is I_{1} at pixel one and I_{2} at pixel two. Assume that I_{1}>I_{2}. Let r be the distance measured from pixel one to an arbitrary point along the straight line segment between pixels one and two. The gradation between pixels one and two which is as locally homogeneous as possible, for every distance r, occurs when the whole intensity change is spread evenly across the entire line, i.e, when the derivative dI/dr is constant. We consider any deviation from that smoothest possible transition to be heterogeneity.
Mathematical development
We are interested in measuring the variation within grayscale image objects. We assume these objects have already been identified and isolated via relevant standards. We therefore begin with a definitively bounded object against a uniform background of zero intensity. We take as input a pair of distinct object pixels, which we arbitrarily label as m and n. The m^{th} pixel at image coordinate (x_{m},y_{m}) has a grayscale intensity I_{m}. The n^{th} pixel at image coordinate (x_{n},y_{n}) has a grayscale intensity I_{n}. The pair separation distance r_{mn} is simply the Euclidean distance between the m^{th} and n^{th} pixel. The interpixel intensity change is I_{mn}=I_{n}−I_{m}. The minimal discrete set of pixels connecting the m^{th} and n^{th} pixels is taken to be the Bresenham line [13] between those pixels, to which we collectively refer as . Thus, is an ordered set of pixels comprising the straightest line that can be drawn from the m^{th} pixel to the n^{th} pixel while still being constrained to a discrete lattice of nonfractional pixels. The smoothest possible gradation between these endpoints occurs when I_{n} monotonically decreases (or increases) to I_{m} over exhaustively. That is, when the intensity change is spread evenly across the entire line connecting the endpoints. We compute the grayscale value yielding the most
 homo
where r_{ml} is the Euclidean distance between the m^{th} pixel and l^{th} pixel in . The absolute difference between the homogenous grayscale value I(r_{ml}) and the actual value at the l^{th} pixel I_{l} is computed for each pixel in . These differences are summed over and then divided by the discrete pair separation L, i.e., by the number of elements in . We thus have computed
the average absolute intensity difference along a line connecting a pair of object pixels.
For each nonrepeating pair of object pixels, we compute the discrete distance L and . The result is a pool of values where any one value of L has associated with it many values of . Those multiple values are then ensembleaveraged for each L such that each discrete pair separation possible then has associated with it only a single value, . The discrete distances are normalized to the largest discrete distance ( ) observed amongst any object pixel pair. The are then plotted versus .
The resulting plot may be interpreted as follows. In essence, the plot measures how the average deviations from homogeneity scale with percent object size. As that percent distance approaches unity, the opportunity to accrue larger deviations from homogeneity increases. The simplest assumption is that object heterogeneity will then accrue proportionally as more of the object is considered. Thus, a curve observed to be well below the proportionality line implies that progressively greater spans across the object tend not to accumulate differences from a smooth gradation. In other words, that object tends to be more homogeneous. Conversely, an object with curve wellabove the proportionality line tends to be more heterogenous since deviations from smoothness accrue even across smaller spans. One simple means of quantifying the qualitative observation of a curve being above or below the proportionality line is via the area under that curve. We therefore define our heterogeneity quantification statistic to be
Because the distances plotted are relative to the size of the object, objects of greatly varying size may be compared to one another. This enables the use of ζ as a ranking statistic where increased ζ implies increased heterogeneity.
Implementation
Computer code to automatically compute ζ on grayscale images was written in Python v2.6.1 ( http://www.python.org webcite). Input images consist of an 8bit grayscale object, where each object pixel has intensity I>0, against a uniformly black background (I=0). The images were read into the program using the Python Imaging Library v1.17 ( http://www.pythonware.com/products/pil/ webcite). The xycoordinate at each nonzero pixel (an object pixel) along with the intensity at that pixel was recorded. For every unique pixel pair, the discrete line between the pair was calculated using Bresenham’s algorithm [13]. Here, we do not consider the pairs directionally. Thus, once a particular line from pixel m to pixel n has been considered, we do not consider again the line from n to m. We take adjacent pixels to have zero separation and thus there can be no intensity difference from the gradation between them. For line endpoints separated by more than one pixel, the smoothest gradient possible was computed using Equation 1. For each pixel between the endpoints, the average absolute difference between the measured intensity and that given by Equation 1 was computed via Equation 2. These averages were binned together for each integer value of the discrete line length. These binned averages then were themselves averaged, resulting in a single value at each discrete line length ( ). Each integer line length was divided by the maximum integer pairseparation observed for the image object. Using Equation 3, ζ was computed via the trapezoidal rule as implemented in Numpy v1.4 ( http://www.numpy.org webcite).
Test image database
Images used for numeric tests
Images were obtained as described in Ref. [10]. We briefly recapitulate the process here. Patients with cancers of the uterine cervix underwent a pretreatment hybrid PET/CT scan using the ^{18}Ffluorodeoxyglucose (FDG) radiotracer assay of glucose uptake by cells. As is common throughout the field of nuclear medicine, the raw FDGPET data are scatter and attenuation corrected via the proprietary software native to the PET machine. Images were reconstructed using ordered subset expectation maximization. A Gaussian smoothing filter with 4mm full width at half maximum was applied postreconstruction. No additional processing was implemented. In order to objectively distinguish tumor from background, we employed the ruleofthumb that, for a visually selected region of interest (ROI), any pixel brighter than 40% of the maximum ROI pixel brightness is to be considered part of the tumor [14]. Any remaining objects that are obviously (for sound anatomical reasons) not tumors are removed and the final ROI is exported as an 8bit grayscale region against a uniformly black background. Here, variations in grayscale intensity ostensibly imply variations in intratumoral metabolic activity [4]. To quantify these variations, we apply the computer code earlier described to the largest crosssectional tumor image for each patient.
Images used for visual tests
The wholebody images created by our PET scanner are only 168 ×168 pixels in size. The segmented tumor region within these images ranges from only 16 to 291 total pixels with a median of 63 pixels. These regions are generally too small to be inspected visually. Computer code using the Python Imaging Library (PIL) was used to automatically extract the predefined tumor region and paste that at the center of a new image such that the tumor is centered within a border which is half of the diameter of the tumor. This new image was expanded to 162 ×162 pixels using the resize function of the PIL with the filter option set to nearestneighbor. At 72 dots per inch, this corresponds to squares of side length 2.25 inches. Each image was printed and then pasted onto a separate piece of stiff card stock such that a group of images could be readily seen at once. The expert can then manipulate the cards into order of increasing image heterogeneity, i.e., visually evident intensity variations within an isolated tumor region.
In the rescaled images, very small tumor regions will appear manifestly “blockier” than very large ones. This is the typical pixelation seen whenever lowinformation images are expanded to greater than the original image size. The result is that a highly pixelated image may appear innately more homogeneous than a lesspixelated image simply due the relative sizes of the original pixels in the rescaled images. Furthermore, the blocky appearance of the rescaled tumor regions could give a clever physician some indication of tumor size, even without anatomical reference. Since we do not want the physicians to be biased by innate clinical knowledge regarding tumor size, we binned tumors of similar size into distinct groups which were independently inspected by the experts.
Comparison of test results
The test image sets earlier described were given to: an experienced clinical oncologist, a senior medical resident in the oncology department and a professional image processing expert with no medical background. Each expert was asked to rank each image set individually in order of increasing heterogeneity. Only the experienced oncologist, however, considered tumor shape as a heterogeneitydefining quality. The other two experts specifically ignored tumor shape and focused only upon intratumoral pixel variations. The computer code earlier described was used to compute an objective ranking of each image set based solely upon increasing values of the ζ statistic. The ζranking for each image set was compared to each expert ranking via the Spearman rank test of statistical correlation. As described in many textbooks, a pvalue associated with an uppertailed test of significance may be obtained from the Spearman correlation coefficient. Our null hypothesis is that the computer/expert rankings are totally uncorrelated. Our alternative hypothesis is that larger ζ values tend to pair with higher expertrankedheterogeneity.
Test of shape dependence
To explore the dependence of ζ upon object shape, we performed the following test. An 8bit grayscale image of a circle with radius 16 pixels and origin at the image center was created. The circle was shaded via a twodimensional Gaussian distribution such that the brightest intensity (255) is at the circle center and the dimmest intensity (128) occurs at the circle circumference. The result is a smoothly varying, symmetric “tumor” object. The object was then decimated by setting all pixels to zero within circular holes of radius 4 pixels centered at random locations within the object. As decimation is performed repeatedly upon the same object, the shape of the object becomes increasingly irregular. To test the effect of shape upon the ζ statistic, we modified the calculation given in Equation 3 to consider contributions from only the nonzero pixels along the connecting Bresenham lines and replacing the pairseparation L in Equation 2 with the number of pixels considered. This way, if the line between a pair of object pixels crosses the background, the differences of the background pixels from the smoothest gradation are ignored. This effectively renders any two backgroundseparated pixels to be neighbors, thus closing the shape while maintaining the relative orientation of one endpoint to the other.
Results and discussion
Because only some oncologists consider shape to contribute to overall tumor heterogeneity, we first describe the effect of object shape upon ζ. As described in the Methods section, a smoothly shaded circle was randomly decimated with circular holes. For a given number of decimations, an ensemble of 36 images was created. Some typical examples are shown in Figure 2. The ζ statistic was calculated as given in Equation 3 for each image in the ensemble and the results averaged to yield a single “shapeaware” value for a given number of decimations. This process was then repeated for each ensemble using modification earlier described to yield a single “shapeunaware” value, ζ_{u}, for a given number of decimations. The result is shown in Figure 3 where the contrast between ζ (triangles) and ζ_{u} (circles) is stark. We note that ζ_{u} varies only ≈5% from the average value and thus shows that ζ_{u} is effectively blind to the profound changes in object shape. In contrast, as the object becomes more irregular in shape, ζ increases by a factor of two. Additionally, because the greatly decimated objects have much less nonzero object area than do the slightly decimated objects, the lack of variation in ζ_{u} demonstrates independence from object size.
Figure 2. Repeated random decimation of a symmetrically shaded circle was used to test the shape dependence of the ζ statistic. Here, typical images of 2, 4, 8 and 16 decimations are shown. The images shown have been resized for display purposes.
Figure 3. The ζ statistic was calculated in shapeaware (triangles) and shapeunaware (circles) modes. As the object becomes more irregularly shaped with repeated decimation, the shapeaware heterogeneity measure increases by a factor of two while the shapeunaware metric exhibits little variation.
The ζ statistic was computed for each of the images shown in Figure 4. Within each image subset (labeled A–G), the images were ranked automatically in order increasing ζ value. These are the rankings compared to that of the experts via the Spearman rank test. Table 1 shows the associated pvalues which are the likelihood that a positive automated/expert rank correlation is due entirely to chance alone. As indicated by the low pvalues in the third column, the automated rankings via ζ value agree very well with those given by the veteran oncologist. Such close agreement is expected to occur by chance in only 1 out of every 25 attempts. Contrast this to the rankings given by either the oncology resident or professional image processor, where similar agreement with the automated ranking is expected to occur in 1 out of every 5 attempts. It is thus seen that ζranking agrees very well with those who consider object shape contributions to heterogeneity and does not agree with those who ignore object shape.
Figure 4. Tumors of various size, shape, mean intensity and heterogeneity are shown as assayed via FDGPET. The letternumber at the topleft indicates the set and ζrank within that set. The number at the topright is the rounded ζ value. The number at the bottomleft is the area of the tumor given in pixels. The sequence of numbers below is the ranks (within the single set) given to the image by the oncologist, the resident and the professional, respectively. The images above were rescaled via nearestneighbor filtering to a convenient display size; thus, the sizes shown should not be compared directly.
Table 1. Automated, shapeaware ranking versus ranking by human experts
This result was confirmed employing the modified, shapeunaware calculation described earlier. Our prediction is that ζ_{u} will agree with the two experts who ignored shape in their rankings while disagreeing with the expert who considered shape to be important. As seen in Table 2, this is precisely what occurred. The expected chance agreement between the automated and expert rankings is almost perfectly reversed from those seen in Table 1.
Table 2. Automated, shapeunware ranking versus ranking by human experts
We further investigated this sensitivity to shape as follows. Inspection of Figure 4 shows that images D5, D6, E5 and F6 each have the distinctive shape feature of a hole at the center. For these images of ringlike objects, 29≤ζ≤37. If one scans the entire image set, one finds five other images with a ζ value in this range. These images—such as B7, E6 and F5—are clearly not shaped similarly to the ringlike objects. It is thus seen that although the ringlike objects differ in ζ value by only 10% of their mean value, all ζ within this range do not correspond to similar objects. This demonstrates that while ζ is sensitive to object shape, it is not slaved to it. That is, one particular tumor shape does not uniquely correspond to a particular ζ value. This may be seen concisely in images D4–D6 where ζ_{D4}=ζ_{D5}, but have totally different shapes, while D5 and D6 have very similar shapes but ζ values which differ by about 10%.
Despite the ordered appearance of the rescaled images shown in Figure 4, clinically relevant tumor images vary greatly in size. This may be also seen in Tables 1 and 2 where the range of crosssectional tumor sizes varies tenfold. Since it is the relative arrangement of pixel variations that ultimately defines heterogeneity, a robust quantifier of heterogeneity should not scale with image size. For example, doubling the size of a chessboard does not change the size of the squares, instead, there simply are more of them. Thus, the heterogeneity is the same for both sizes of chessboard. With this in mind, we explored the correlation between ζ and tumor size. Since the ζ values for the pool of images shown in Figure 4 do not follow a normal distribution, the traditional Pearson product moment is not applicable. We therefore employ the Spearman rank test of correlation and find ρ_{S}=0.044 (p=0.39), which implies no appreciable correlation between ζ and object size. This is expected because the effective diameter of the object ( ) is divided out of Equation 3.
As mentioned in the Background, there have been mixed results from previously proposed heterogeneity metrics. We therefore tested ζ against the measures shown to be more reproducible and clinically applicable. Specifically, these metrics are the variance of the intensity distribution, the local entropy, image energy, image contrast and local homogeneity [810,12]. The latter four metrics are calculated from intensity cooccurrence matrices as described in Ref. [11]. In brief, a cooccurrence matrix describes the probability that pixels of differing shades occur as fixeddistance neighbors. As these matrices are innately directional, we computed the matrices first in the horizontal (angle=0) and then in the vertical (angle= π/2). A given heterogeneity metric was then computed from each matrix and rootmeansquare averaged into a single measure. In this manner, we computed each of the proposed metrics on each of our test images. These values are given in Table 3. Ranking of each image set via each of the metrics (including ζ) was done and compared to those done independently by the veteran oncologist. As seen in Table 4, ζranking is by far the most consistent with the human expert. In fact, the variance, local entropy and image energy rankings, on average, do not agree with expert analysis any better than random chance. The next best correlation is given by the image contrast which, on average, clearly did not perform as well as ζ in ranking our test images consistent with the veteran oncologist.
Table 4. The correlation between the heterogeneity rankings computed via various statistics and that done independently by the veteran oncologist was quantified via the Spearman rank test
Critique of the method
Handling large images
As discussed in earlier sections, while the tumor images we study vary greatly in size relative to one another, even the largest tumor yields a fairly small number of pixel pair combinations. Thus, the isolated tumor objects are readily processed on a desktop computer. However, the computational order of ζ is where is the average distance between two pixels and N is the number of pixels. Therefore, the exhaustive pairing we employ on the relatively small tumor regions is unlikely to be feasible on images containing a large number of object pixels. For such cases, we suggest that one take as large a random subset of all possible pixel pairings as is computationally accessible and proceed to calculate ζ as before.
Dependence upon bit depth
During computation of ζ, one must take the difference between a measured intensity and that predicted by the smoothest gradient. However, the smoothest gradient possible is determined by the number of shades available to transition between pixel pairs. Thus, ζ must depend upon image bit depth. This could be problematic when comparing image sets from different institutions which likely employ very different imaging and data storage protocols. One solution to this is the use of a common bit depth. For example, in our test images, a threshold of 40% of the maximum observed intensity was used to define the tumorous regions. Thus, only the top 60% of the grayscale range is employed to shade all intratumoral variations. On average, the brightest pixels corresponded to 80 kBq/mL of radioactivity (data not shown). This means that only 48 kBq/mL separate the brightest tumor regions from the dimmest. The noise associated with the FDGPET process may be estimated from Ref. [15] to be ∼1 kBq/mL. Thus, only ≈48 shades of gray are required to shade our example objects (the tumors) in such a manner that differing shades correspond to genuinely differing amounts of radioactivity. A similar downscaling of image bit depth could be employed on a patientbypatient basis. Statistics derived from these images (such as ζ) could then be meaningfully compared across patients since these variations are those common to the physics of the imaging modality, not those distinct to an arbitrary choice of bit depth.
Vectorization of ζ within one image
A directional heterogeneity quantifier is desirable for cases where the grain of the image has physical meaning. This could occur, for example, for stained histology slides [16] or for magnetic resonance images of tumors [17]. We suggest that the vectorization of ζ may be accomplished as follows. Again, we begin with a pair of arbitrarily labeled object pixels at coordinates (x_{m},y_{m}) and (x_{n},y_{n}). Instead of following the Bresenham line directly from pixel m to pixel n, one first could compute ΔI_{x} along a purely horizontal direction—say from (x_{m},y_{m}) to (x_{n},y_{m})—then compute a purely vertical contribution ΔI_{y} from (x_{n},y_{m}) to (x_{n},y_{n}). We note that does not equal the ΔI computed directly between (x_{m},y_{m}) and (x_{n},y_{n}), as was described previously. For both ΔI_{x} and ΔI_{y}, the ensembleaveraged, average absolute difference versus relative length curves may be constructed and corresponding ζ computed independently for each direction.
Consider, for example, an image consisting of uniquely shaded horizontal stripes. The directional ζ_{x}=0 since, for any pixel m, the intensity at all horizontal distances from m is identical to that of m. That same image, however, will have a ζ_{y}>0 since any vertical line must cross a stripe boundary at some distance away from m.
Two important caveats accompany the computation of directional heterogeneity quantifiers within an image. First, the choice of moving horizontally then vertically is completely arbitrary. As one proceeds horizontally from a given pixel, one encounters different intensity values than would be encountered had one first proceeded vertically. This ambiguity also arises when considering the order in which pixels are paired. Proceeding from the topleft of the image, the horizontal direction from m to n is defined by the upper pixel while pairing from the bottomright of the image, it is defined by the lower pixel. Thus, even decreeing that one always proceeds horizontally first does not guarantee consistent directional ζ. It is for this reason that we chose to employ a purely scalar ζ; one which is unambiguously defined by the pixel pairs. We suggest that anyone reporting directional ζ also clearly report and motivate their particular procedural choices.
The second concern lies in the fact that there is no reason to presume that directional components within the image have physical meaning within the image. Therefore, what is readily computed as horizontal in the image data may not be horizontal in the real object. For example, there is no guarantee that the stripes of some striped object align with the vertical or horizontal directions of the image. Therefore, before directional quantifiers may be compared across images, some common reference frame must be implemented. We suggest that one such frame is that determined by the quadrupole moment of the image object. The origin of such a frame is given by the object dipole. The object may then be translated such that the dipole is at the center of the image then rotated such that the strongest quadrupole eigenvector aligns with (arbitrarily) the xaxis. This way, directional variations are measured relative to objectively determined reference frames.
Extension to three dimensions
We present two dimensional (2D) images test images because we were interested specifically in measuring correspondence with human experts. Two dimensional tumor images pasted onto equallysized cards served as a ready means of presenting and manipulating the images into subjective order. Of course, an extension of ζ into three dimensions (3D) may be desired for a given clinical application. The Bresenham algorithm is extensible into 3D [18] and is available in many commercial software packages currently in common use. Using those 3D line data, ζ may be computed as described for the 2D case. A second means of extension into 3D might be as follows. Instead of drawing the Bresenham line along discrete pixels, a Euclidean line may be drawn directly to pixel pairs. The intensity value anywhere along that line may be determined by averaging and/or interpolation of the neighboring pixels. This gives I_{m} and I_{n} in Equation 1. The absolute intensity difference seen in Equation 2 may then be averaged along the Euclidean line using arbitrary intervals. That average may then be used in Equation 3 to compute ζ as was done in the 2D case. While we expect that a 3D ζ thus computed can be used to objectively rank 3D objects in order of increasing heterogeneity, the merits of one computational method over another are not clear and are a subject for further study. Additionally, if a comparison to expert opinion is to be done for 3D virtual objects, the rendering technique will also have to be scrutinized. For example, perception of 3D virtual objects varies greatly from persontoperson. Also, the construction technique—e.g., brightest intensity versus average intensity along a given line of sight—adds variation to the humancomputer comparison that likely requires many repeated trials to overcome. In the present work, we sought only an introduction to our automated ranking technique and purposely avoided the complications involved with extension into three dimensions.
Applications & future work
There is increasing interest in the role of heterogeneity within the tumor microenvironment as an indicator of disease prognosis [19]. Therefore, one application is to investigate whether the ζ measure has any prognostic value. Cancer patients where the treatment response and/or longterm survival is known a priori could be ranked objectively via ζ heterogeneity. This could be done for some subset of twodimensional images (such as the largest tumor crosssections) or for threedimensional, whole tumor, virtual objects as described above. The correspondence between heterogeneity and disease outcome could then be checked via rank testing, survival analysis or a Bayesian test of predictiveness.
Another interesting application of ζ lies in the association of a heterogeneity score with intratumoral diffusion. Diffusion tensor imaging (DTI) is a magnetic resonance modality which can measure the directional diffusion of water within a cancerous tumor [19,20]. A three dimensional tumor is imaged via twodimensional cross sections. Each image pixel then corresponds to a threedimensional vector representing the flow of water at the coordinate of the pixel. Unlike FDGPET imaging—where the relative contributions of intracellular metabolism and intercellular density and arrangement are not known—the images resulting from DTI result directly from the tumor microenvironment [20]. Thus, anisotropies in the diffusion imply variations in the initiation and maintenance of tumor growth. With this in mind, it would be very interesting to investigate the relation between the bulk intratumoral heterogeneity and diffusion anisotropy. One means of doing this is to parse the DT images into three sets: one for each vector component of the diffusion. For interpatient comparisons, these directions could be embedded directions relative to the tumor itself, for example, the same principal eigenvectors from which the fractional anisotropy is commonly computed [20]. The result is three sets of threedimensional coordinates with a grayscale value at each coordinate which represents the magnitude of diffusion in a given direction at that coordinate. A threedimensional ζ may be computed via Equation 3 for each set of coordinates. Those ζ values, each between zero and unity, may then be combined to form a single vector, . This is very different from the possible vectorization discussed earlier. There, arbitrary directions were imposed upon an image to create a vector quantity within that image. Here, each image set represents diffusion measured in a distinct physical direction. Thus, is a vector where each component measures the heterogeneity of an entire threedimensional image set, and that set measures diffusion in only one physical direction. If one then maps onto an RGB colorspace, the bulk heterogeneity (which could be indicative of directional diffusion) throughout the tumor bulk may then be reported as a single color. For example, it could be the case that tumors with predominately transaxial diffusion are less treatable than those with predominately planar diffusion. These differing diffusion scenarios will yield different colors since two of the similar magnitude can still point in different colorspace directions. The color is one objective means of quantifying visually perceived image heterogeneity and relating it to a directional, physical quantity which itself feasibly relates to tumor growth. Thus, is another means of studying the relation between image heterogeneity and clinical prognosis.
Conclusions
We have demonstrated a method for automatically ranking grayscale medical images in order of increasing heterogeneity. We have done this not only in a fashion where shape is considered to contribute to overall object heterogeneity, but also under conditions where shape is ignored. In both cases, the automatic ranking is found to agree very well with the rankings done visually by experts. The example images we analyze, specifically, those of a grayscale object against a uniformly black background, are precisely the type of images available after a clinician delineates tumors via standard image segmentation techniques. For these example cases, our shapesensitive ranking statistic was shown to yield heterogeneity rankings which almost perfectly parallel those given by a veteran radiation oncologist. Automated ranking via heterogeneity offers a new avenue for objectively studying the clinically crucial relation between disease outcome and some tumor properties observable in the images obtained at diagnosis.
Consent
Written informed consent for use of the data in this report was waived by the Washington University Institutional Review Board.
Endnotes
^{a}We note for clarity that the “image energy” and “local homogeneity” monikers employed by some authors actually refer to the “angular second moment” and “inverse difference moment”, respectively, as originally given in Ref. [11].^{b}We note again that this is the transaxial direction relative to the tumor—as determined by the threedimensional principal components of the image set—which is not necessarily the transaxial direction of the clinical imaging process.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
FJB conceived and drafted the manuscript as well as performed all mathematical analyses. PWG designed the protocol for the interpretation the FDGPET images, provided expert, independent ranking of those images, as well as created and maintained the database of clinical data and images employed herein. Both FJB and PWG read and approved the final manuscript.
Acknowledgements
We thank Bruce E. Davis for illuminating discussions and careful review of the manuscript. We thank Richard Laforest for providing technical advice about the PET process. We thank Jeffery Olsen, M.D. and Shalini Priti, M.Sc. for their expert assistance in ranking the test images. We also thank the reviewers for providing some specific, extraordinarily helpful comments. This work was supported by the National Institutes of Health under Grant 1R01CA13693101A2.
References

Heppner GH: Tumor heterogeneity.
Cancer Res 1984, 44(6):22592265. PubMed Abstract  Publisher Full Text

Torquato S: Toward an Ising model of cancer and beyond.
Phys Biol 2011, 8:015017. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Miller TR, Pinkus E, Dehdashti F, Grigsby PW: Improved prognostic value of 18FFDG PET using a simple visual analysis of tumor characteristics in patients with cervical cancer.
J Nucl Med 2003, 44(2):192197. PubMed Abstract  Publisher Full Text

Wahl RL: Principles and Practice of PET and PET/CT. China: Lippencott, Williams and Wilkins; 2009.
ISBN: 0781779995

Sonka M, Hlavac V, Boyle R: Image Processing, Analysis, and Machine Vision, 3rd Edition. Toronto: Thompson Learning; 2008.
718–745, ISBN 049508252X

Kidd EA, Grigsby PW: Intratumoral metabolic heterogeneity of cervical cancer.
Clin Cancer Res 2008, 14(16):523641. PubMed Abstract  Publisher Full Text

Eary JF, O’Sullivan F, O’Sullivan J, Conrad EU: Spatial heterogeneity in sarcoma 18FFDG uptake as a predictor of patient outcome.
J Nucl Med 2008, 49(12):19739. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

El Naqa I, Grigsby P, Apte A, Kidd E, Donnelly E, Khullar D, Chaudhari S, Yang D, Schmitt M, Laforest R, Thorstad W, Deasy JO: Exploring featurebased approaches in PET images for predicting cancer treatment outcomes.
Pattern Recognit 2009, 42(6):11621171. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Tixier F, Le Rest CC, Hatt M, Albarghach N, Pradier O, Metges JP, Corcos L, Visvikis D: Intratumor heterogeneity characterized by textural features on baseline 18FFDG PET images predicts response to concomitant radiochemotherapy in esophageal cancer.
J Nucl Med 2011, 52(3):36978. PubMed Abstract  Publisher Full Text

Brooks FJ, Grigsby PW: Current measures of metabolic heterogeneity within cervical cancer do not predict disease outcome.
Radiat Oncol 2011, 6:69. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Haralick R, Shanmugam K, Dinstein I: Textural features for image classification.

Tixier F, Hatt M, Le Rest CC, Le Pogam A, Corcos L, Visvikis D: Reproducibility of tumor uptake heterogeneity characterization through textural feature analysis in 18FFDG PET.
J Nucl Med 2012, 53(5):693700. PubMed Abstract  Publisher Full Text

Bresenham J: Algorithm for computer control of a digital plotter.

Miller TR, Grigsby PW: Measurement of tumor volume by PET to evaluate prognosis in patients with advanced cervical cancer treated by radiation therapy.
Int J Radiat Oncol Biol Phys 2002, 53(2):3539. PubMed Abstract  Publisher Full Text

Schmidtlein CR, Beattie BJ, Bailey DL, Akhurst TJ, Wang W, Gönen M, Kirov AS, Humm JL: Using an external gating signal to estimate noise in PET with an emphasis on tracer avid tumors.
Phys Med Biol 2010, 55(20):62996326. PubMed Abstract  Publisher Full Text

Loukas CG, Linney A: A survey on histological image analysisbased assessment of three major biological factors influencing radiotherapy: proliferation, hypoxia and vasculature.
Comput Methods Programs Biomed 2004, 74(3):183199. PubMed Abstract  Publisher Full Text

Castellano G, Bonilha L, Li LM, Cendes F: Texture analysis of medical images.
Clin Radiol 2004, 59(12):10619. PubMed Abstract  Publisher Full Text

Agoston MK: Computer Graphics and Geometric Modeling: Implementation and Algorithms. USA: Springer; 2005.
ISBN 1852338172

Galbán S, Brisset JC, Rehemtulla A, Chenevert TL, Ross BD, Galbán CJ: Diffusionweighted MRI for assessment of early cancer treatment response.
Curr Pharm Biotechnol 2010, 11(6):701708. PubMed Abstract  Publisher Full Text

JohansenBerg H, Behrens TE (Eds): Diffusion MRI: From Quantitative Measurement to Invivo Neuroanatomy. China: Elsevier Science; 2009.
ISBN: 9780123747099
Prepublication history
The prepublication history for this paper can be accessed here: