Abstract
Background
Computational analysis of tissue structure reveals subvisual differences in tissue functional states by extracting quantitative signature features that establish a diagnostic profile. Incomplete and/or inaccurate profiles contribute to misdiagnosis.
Methods
To create more complete tissue structure profiles, we adapted our cellgraph method for extracting quantitative features from histopathology images to capture temporospatial traits of three-dimensional collagen hydrogel cell cultures. Cellgraphs were proposed to characterize the spatial organization of cells in tissues using graph theory: the cell nuclei constitute the nodes, and the approximate adjacency of cells is represented by edges. We chose 11 different cell types representing nontumorigenic, precancerous, and malignant states from multiple tissue origins.
Results
We built cellgraphs from the cellular hydrogel images and computed a large set of features describing the structural characteristics captured by the graphs over time. Using three-mode tensor analysis, we identified the five most significant features (metrics) that capture the compactness, clustering, and spatial uniformity of the 3D architectural changes for each cell type throughout the time course. Importantly, four of these metrics are also the discriminative features for our histopathology data from our previous studies.
Conclusions
Together, these descriptive metrics provide rigorous quantitative representations of image information that other image analysis methods do not. Examining the changes in these five metrics allowed us to easily discriminate between all 11 cell types, whereas differences from visual examination of the images are not as apparent. These results demonstrate that application of the cellgraph technique to 3D image data yields discriminative metrics that have the potential to improve the accuracy of image-based tissue profiles, and thus improve the detection and diagnosis of disease.
Background
Errors in the structural organization and function of tissues are a major cause of many devastating human diseases, including cancer. Currently, clinicians use diagnostic profiles to distinguish between varying degrees of tissue health and disease. These profiles typically contain a combination of quantitative (e.g., expression of molecular markers, epidemiology) and qualitative (e.g., image-based assessment) data. The primary means of diagnosing most cancers is histopathological examination of a biopsy, and the resulting diagnostic profile serves as the "gold standard" in almost all cases. This examination focuses on the following traits [1]:
1. Nuclear atypia: The morphological atypicality of a cell (such as polymorphism, multinucleated cells, and gigantic cells) often but not always implies cancer.
2. Cytoplasmic changes: Higher values of the ratio of the surface area of the nucleus to that of the cytoplasm may imply cancer.
3. Other changes: The presence of other changes, such as increased vascularity and necrosis.
4. Cellularity: An increase in the number/density of cells within a tissue may indicate proliferation of a cancer, or simply an increase in inflammatory processes.
5. Cell distribution: The location and organization of cells relative to each other is used to identify cancer. For example, cancerous brain tissues have more randomly distributed cells, whereas areas of inflammation have more evenly distributed cells.
The diagnostic profile for prostate cancer includes digital rectal examination, expression levels of prostate-specific antigen (PSA), and numerous image-based approaches (e.g., magnetic resonance imaging, ultrasound, CT scan, conventional biopsy/Gleason score) (reviewed in [2]). Both the incidence and mortality of prostate cancer have declined in the US and UK since the addition of PSA levels to this profile, yet the diagnostic value of the PSA test is still debated [3]. Many other molecular markers for prostate cancer are now appearing in the literature [4], though the functional roles of many are unknown. A similar situation exists for diagnosing breast cancers, such that the rate of misdiagnosis varies widely between clinicians and is nearly 40% in some cases [5].
Many of the classification errors in diagnosing solid tumors stem from incomplete tumor profiles, i.e., an incomplete understanding of the relationship between the functional state of a tissue and its structural organization. For example, the imaging methods used to grade the severity of solid tumors rely largely on the observations of the pathologist and qualitative metrics such as the thickness of an epithelial cell layer, atypical cell morphology, and relative uptake of contrast agents [6]. Even expression of most molecular markers is measured qualitatively, e.g., by the degree of staining with an antibody [7]. While these methods can enrich diagnostic profiles, they largely fail to address the underlying structural malfunctions that form the basis for the disease. Development of quantitative tools for image analysis and predictive modeling is thus a rapidly expanding field, showing great promise for improving diagnostic accuracy [6,8,9].
We recently developed a graph-theoretical method, called cellgraphs, for capturing structural characteristics of histopathological images that enabled distinguishing healthy, damaged, and cancerous states of brain, breast, and bone tissues [10-12]. Our earlier studies relied on modeling functional state via the spatial organization of cell nuclei within standard histological biopsy images, and achieved accuracy equivalent to current diagnostic standards. For example, despite the visual resemblance between damaged and diseased brain tissues (both display a high cell density), the features extracted from the cellgraphs were able to distinguish between them with greater than 95% accuracy [10].
Cellgraphs are generalizations of Delaunay triangulations, which were previously used to model the spatial distribution of cells in a tissue by encoding a pairwise relationship between two vertices [13]. In a cellgraph, nodes (or vertices) represent the cell nuclei, and pairs of nodes are connected by a link (or edge) based on the biological, chemical, physical, or spatial relationship between them. Distance-based construction of edges was most commonly used in previous studies [10-12,14-17]. Application of graph theory to these cellgraphs provides a rich set of computational metrics that represent the structural characteristics of the underlying tissue samples. Machine learning techniques then allow us to classify different functional states of tissues. We elected to use graph theory-based methods because they have an impressive record of modeling complex relationships in numerous contexts. Real-world graphs of varying types and scales have been extensively investigated [18] in technological [19-21], social [22-28], and biological systems [29-31]. In spite of their different domains, such self-organizing structures unexpectedly exhibit common classes of descriptive spatial (topological) features [17,18,21,23,32]. These features are quantified by defining computable metrics.
The major novelties of this study include: 1) cellgraph analysis of three-dimensional temporospatial tissue samples with various origins and functional states, 2) differentiation between the tissue samples based on unique structural formations relative to functional state, 3) exploitation of multiway analysis to identify the most influential signatures that capture most of the variation in the data, and 4) establishing a correspondence between cellgraph features for in vitro and in vivo histology samples. The previous cellgraph work was confined to two-dimensional histology samples stained with haematoxylin and eosin. In this study we expanded our analysis to temporal analysis of 3D hydrogel models of the three most common types of tissues that develop solid tumors (epithelial, connective, and neural), to explore additional temporospatial information currently inaccessible in conventional histology samples. 2D and 3D cell culture models form the foundation for virtually all drug screening regimens and remain valid in vitro representations of human tissues [33]. Furthermore, 3D cell culture is widely used in the fields of biology and medicine to study the organization of cells in native extracellular matrix (ECM) constructs [34-36]. Likewise, cell lines with varying molecular mechanisms and protein characteristics are often used to represent a range of functional health states. Although there are limitations to in vitro studies, the cell lines used in this study represent a range of tissue types, allowing us to directly compare the structural profiles of various functional states through analysis of cellgraph metrics. The resulting sets of cellgraph metrics that evolved over time yielded a distinct profile for each cell/tissue type, and thus have the potential to identify changes in structure-function relationships in a three-dimensional cell culture system.
The long-term goal of this study is to further understand cancer models by interpreting changes in metrics in terms of underlying changes in the molecular mechanisms of cancer progression. To uncover these mechanisms, it is necessary to simplify the model in order to isolate specific cell-collagen-I interactions.
Methods
Cell Culture Techniques
The different cell types and their respective culture conditions are listed in Table 1. The functional categories of each cell type are listed in Table 2.
Fluorescence Imaging
Gels were fixed using 3% paraformaldehyde at six time points (10, 16, 24, 72, 120, and 168 hours). Each was washed with PBS, then stained with a nucleic acid dye (SYTOX Green). Images of cells encapsulated within collagen-I hydrogels were captured using a Zeiss LSM 510 META confocal microscope with a 10X dry objective. Representative Z-stack images of 100 μm thickness with a 900 μm × 900 μm cross-sectional area were collected for five samples at each time point.
Segmentation of Nuclei
To segment the cell nuclei, we first binarize the images. Binarization separates the image values into foreground and background classes. In our context, the foreground class represents the cell nuclei, whereas the background class represents the combination of cells and extracellular proteins. Binarization is accomplished by comparing the image values against a threshold. Considering the large number of images that need to be processed, we employ Otsu's simple but effective automatic threshold selection algorithm [37], which determines a single global threshold for the image based on the histogram of image values. Each connected component in the resulting binary image corresponds to a nucleus, and the centroids of these nuclei are calculated to give the coordinates of the node (vertex) set for cellgraph generation.
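The segmentation steps above (global Otsu threshold, connected components, centroids) can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the function names `otsu_threshold` and `nuclei_centroids`, the 8-bit intensity range, the nested-list volume layout `volume[z][y][x]`, and the choice of 6-connectivity are all assumptions made for the example.

```python
from collections import deque


def otsu_threshold(values, levels=256):
    """Otsu threshold for integer intensities in [0, levels); foreground is value > threshold."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_between = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor) for this split.
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def nuclei_centroids(volume, threshold):
    """Label 6-connected foreground voxels (assumed connectivity); return (x, y, z) centroids."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    seen, centroids = set(), []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if volume[z][y][x] <= threshold or (z, y, x) in seen:
                    continue
                # Breadth-first flood fill collects one connected component.
                queue, voxels = deque([(z, y, x)]), []
                seen.add((z, y, x))
                while queue:
                    cz, cy, cx = queue.popleft()
                    voxels.append((cx, cy, cz))
                    for dz, dy, dx in steps:
                        n = (cz + dz, cy + dy, cx + dx)
                        if (0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                                and n not in seen
                                and volume[n[0]][n[1]][n[2]] > threshold):
                            seen.add(n)
                            queue.append(n)
                count = len(voxels)
                centroids.append(tuple(sum(v[i] for v in voxels) / count for i in range(3)))
    return centroids


# Toy 2 x 3 x 6 volume (made-up intensities) containing two bright blobs.
volume = [
    [[0, 200, 200, 0, 0, 0], [0, 200, 200, 0, 0, 0], [0, 0, 0, 0, 0, 210]],
    [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 210]],
]
flat = [v for plane in volume for row in plane for v in row]
threshold = otsu_threshold(flat)
centroids = nuclei_centroids(volume, threshold)
```

The centroids are in voxel units; converting them to micrometres requires the voxel spacing of the Z-stack, which is not modelled here.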
Generation of CellGraphs
After obtaining the set of vertices in the images, we construct the cellgraphs based on the pairwise nuclei distances [10-12,14-17]. We assume that a biological relation exists between two nuclei, i.e., a link (or edge) between two nodes is established, if the Euclidean distance between the corresponding centroids is less than a threshold D. We tested three thresholds: D = 60, 75, and 90 μm. The graphs corresponding to 60 and 90 μm turned out to be too sparse and too dense, respectively. Therefore, we decided to use 75 μm as the threshold.
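The distance rule can be illustrated with a short sketch. It assumes the centroids are already expressed in micrometres; the function name `build_cell_graph` and the example coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def build_cell_graph(centroids, max_distance=75.0):
    """Return {(i, j): length} for every node pair closer than max_distance (same units as input)."""
    edges = {}
    for (i, p), (j, q) in combinations(enumerate(centroids), 2):
        length = dist(p, q)  # Euclidean distance in 3D
        if length < max_distance:
            edges[(i, j)] = length
    return edges


# Four made-up nuclei positions in micrometres.
centroids_um = [(0, 0, 0), (50, 0, 0), (120, 0, 0), (400, 400, 50)]
edges_60 = build_cell_graph(centroids_um, max_distance=60.0)
edges_75 = build_cell_graph(centroids_um, max_distance=75.0)
edges_90 = build_cell_graph(centroids_um, max_distance=90.0)
```

Comparing the edge sets produced by the three thresholds on the same centroids is one way to see how the choice of D changes graph density. The all-pairs loop is quadratic in the number of nuclei; a spatial index would be a reasonable substitute for large stacks.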
Figure 1 illustrates the steps involved in extracting cellgraph features in 3D. Figure 1a shows an example of MG63 osteosarcoma cells, one of the eleven different types of cells representing various tissue functional states (listed in Table 1), encapsulated in a collagen-I hydrogel at time 0. Figure 1b shows a two-dimensional slice of stained nuclei in a confocal fluorescence Z-stack from the hydrogel in Figure 1a. The nuclei were then identified by applying the segmentation algorithm described earlier (Figure 1c) to establish nodes within the graph. We applied our cellgraph algorithm to define edges between nodes (Figure 1d) within a distance-based threshold of 75 μm, resulting in a 3D cellgraph.
Figure 1. Cellgraphs uncover hidden tissue architecture generated from 3D in vitro collagen-I hydrogels. 1a shows a macroscopic image of an MG63 collagen-I hydrogel following fixation. 1b displays a two-dimensional slice from a 3D confocal image of the hydrogel (green = nuclei). 1c is a computer-generated representation of the confocal image after application of the nuclei segmentation algorithm to identify cell locations in 3D space. 1d shows how cellgraphs are built by applying graph theory to the computer-generated confocal image representation.
Calculating Features from CellGraph Metrics
On each cellgraph G_{i}(V_{i}(t), E_{i}(t)), where V_{i}(t) and E_{i}(t) represent the sets of vertices and edges at time point t and i is the index of the cell line, we calculated 20 metrics, listed in Table 3, based on the structural features of the graphs. We then conducted the analysis described in the following section to determine the metrics that have the most discriminative power between the different tissue types over time.
Table 3. Cellgraph metrics, interpretations, and categories.
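As a rough illustration of how such graph metrics are computed, the sketch below evaluates four quantities on a small graph. The definitions used here are common textbook ones and are assumptions: isolated points are nodes with no edges, each isolated node counts as its own connected component, and the clustering value is the mean local (Watts-Strogatz) clustering coefficient, which may differ from the "clustering coefficient D" defined in Table 3. The function name `graph_metrics` is hypothetical.

```python
from statistics import pstdev


def graph_metrics(n_nodes, edges):
    """edges maps (i, j) node-index pairs to edge lengths; returns a dict of example metrics."""
    neighbors = {i: set() for i in range(n_nodes)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)

    # Nodes without any edge.
    isolated = sum(1 for nb in neighbors.values() if not nb)

    # Connected components by depth-first search.
    seen, components = set(), 0
    for start in range(n_nodes):
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)

    # Local clustering: fraction of neighbour pairs that are themselves linked.
    local = []
    for nb in neighbors.values():
        k = len(nb)
        if k < 2:
            local.append(0.0)
            continue
        links = sum(1 for a in nb for b in nb if a < b and b in neighbors[a])
        local.append(2.0 * links / (k * (k - 1)))

    lengths = list(edges.values())
    return {
        "percent_isolated": 100.0 * isolated / n_nodes,
        "connected_components": components,
        "mean_clustering": sum(local) / n_nodes,
        "edge_length_std": pstdev(lengths) if lengths else 0.0,
    }


# Five nodes: a triangle (0, 1, 2), a pendant node 3, and an isolated node 4.
example_edges = {(0, 1): 30.0, (0, 2): 40.0, (1, 2): 50.0, (2, 3): 60.0}
metrics = graph_metrics(5, example_edges)
```

The number of central points is omitted because its definition on disconnected graphs depends on conventions given in Table 3.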
Three-Way Data Modeling and Analysis of Feature-Time-Cell Line Joint Relationships
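The data are organized into a third-order tensor T∈ℝ^{I×J×K} with features, time, and cell-line modes whose dimensions are I, J, and K, respectively. An entry t_{ijk} of T is the value of the i^{th} feature at the j^{th} time point for the k^{th} cell line. A Tucker3 model of T is
t_{ijk} = \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} g_{pqr} a_{ip} b_{jq} c_{kr} + e_{ijk}
where P, Q, and R indicate the number of components extracted from the first, second, and third modes (P ≤ I, Q ≤ J, and R ≤ K, respectively), A∈ℝ^{I×P}, B∈ℝ^{J×Q}, and C∈ℝ^{K×R} are the component matrices, G∈ℝ^{P×Q×R} is the core tensor, and E∈ℝ^{I×J×K} contains the residuals.
Parallel Factor Analysis (PARAFAC) [41] or Canonical Decomposition (CANDECOMP) [43] represents a tensor by a linear combination of rank-one tensors. An R-component PARAFAC model on a third-order tensor T∈ℝ^{I×J×K} is
T = \sum_{r=1}^{R} a_{r} \circ b_{r} \circ c_{r} + E
where a_{r}, b_{r}, and c_{r} are the r^{th} columns of the component matrices A∈ℝ^{I×R}, B∈ℝ^{J×R}, and C∈ℝ^{K×R}, respectively, \circ denotes the vector outer product, and E∈ℝ^{I×J×K} contains the residuals.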
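An R-component PARAFAC model approximates each tensor entry by a sum over components of the products a_{ir} b_{jr} c_{kr}. The sketch below shows only that reconstruction step and the residual sum of squares; it does not fit the model. Plain nested lists stand in for the component matrices, and the function names are hypothetical.

```python
def parafac_reconstruct(A, B, C):
    """A is I x R, B is J x R, C is K x R (nested lists); returns the I x J x K model tensor."""
    n_components = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(n_components))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


def residual_sum_of_squares(T, T_hat):
    """Sum of squared entries of the residual tensor T - T_hat."""
    return sum((t - m) ** 2
               for slab, slab_hat in zip(T, T_hat)
               for row, row_hat in zip(slab, slab_hat)
               for t, m in zip(row, row_hat))


# Made-up factors with I = 2 features, J = 3 time points, K = 2 cell lines, R = 2 components.
A = [[1, 0], [0, 1]]
B = [[1, 2], [3, 4], [5, 6]]
C = [[1, 1], [2, 0]]
T_hat = parafac_reconstruct(A, B, C)
```

A Tucker3 reconstruction follows the same pattern with a triple sum weighted by the core tensor entries g_{pqr}.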
Prior to model fitting, the tensor is normalized by first centering across the time and cell-line modes and then scaling within the features mode by the standard deviations [44]. In order to capture most of the variation in the data, we first unfolded the tensor in each mode and determined the number of principal components that explains at least 95% of the variation in the data. The Tucker3 model was fit with a 6 × 5 × 8 core tensor and the PARAFAC model was fit with eight components to the normalized tensor, capturing 93.7% and 89.6% of the variation in the data, respectively. The analysis then focused on the feature mode in order to identify a subset of the cellgraph metrics that are more influential than the others in explaining the variation in the three-way data. For this purpose, we used Hotelling's T^{2} statistic and the sum of squared residuals of each mode. The larger the value of these statistics, the easier it is to distinguish between the different metrics; they are therefore useful indicators of the influence of metrics as outliers in explaining the variation in the data. These statistics are built into the MATLAB PLS Toolbox 4.0 and MATLAB Tensor Toolbox 2.4 [45]. Figure 2 shows the Hotelling's T^{2} values versus the sum of squared residuals, and Figure 3 shows only the Hotelling's T^{2} values of each metric. From these figures, the most influential metrics are chosen as number of central points, clustering coefficient D, percentage of isolated points, standard deviation of edge lengths, and number of connected components.
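The normalization step can be sketched as follows. The interpretation is an assumption based on common three-way preprocessing conventions: "centering across" a mode subtracts the mean taken along that mode from each fibre, and "scaling within" the features mode divides each feature slab by its standard deviation (population form used here). The function names are hypothetical and the tensor is a nested list indexed `T[feature][time][cell_line]`.

```python
from math import sqrt


def center_across(T, mode):
    """Subtract from each entry the mean of its fibre along `mode` (0=features, 1=time, 2=cell lines)."""
    dims = (len(T), len(T[0]), len(T[0][0]))
    out = [[[T[i][j][k] for k in range(dims[2])] for j in range(dims[1])] for i in range(dims[0])]
    others = [m for m in range(3) if m != mode]
    for a in range(dims[others[0]]):
        for b in range(dims[others[1]]):
            idx = [0, 0, 0]
            idx[others[0]], idx[others[1]] = a, b
            fibre = []
            for c in range(dims[mode]):
                idx[mode] = c
                fibre.append(tuple(idx))
            mean = sum(T[i][j][k] for i, j, k in fibre) / len(fibre)
            for i, j, k in fibre:
                out[i][j][k] = T[i][j][k] - mean
    return out


def scale_within_features(T):
    """Divide every feature slab by the standard deviation of its entries."""
    out = []
    for slab in T:
        values = [v for row in slab for v in row]
        mean = sum(values) / len(values)
        sd = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        out.append([[v / sd if sd else v for v in row] for row in slab])
    return out


def normalize(T):
    centered = center_across(center_across(T, mode=1), mode=2)
    return scale_within_features(centered)


# Made-up tensor: 2 features x 2 time points x 2 cell lines.
raw = [
    [[1.0, 2.0], [3.0, 5.0]],
    [[10.0, 20.0], [30.0, 70.0]],
]
normalized = normalize(raw)
```

After this step every feature slab has zero mean and unit standard deviation, so features measured on very different scales contribute comparably to the fitted model.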
Figure 2. Influence of cellgraph metrics to explain the variation in data according to the Hotelling's T^{2 }values and sum of squared residuals of each metric. 2a and 2b show the Hotelling's T^{2 }values versus the sum of squared residuals of each metric in Tucker3 and PARAFAC model fitted data, respectively. Note that the highly influential metrics appear in the upper triangular portion of the plot.
Figure 3. Influence of cellgraph metrics to explain the variation in data according to the Hotelling's T^{2 }values of each metric. 3a and 3b show the Hotelling's T^{2 }values of each metric in Tucker3 and PARAFAC model fitted data, respectively. This figure displays the metrics with increasing importance from lower left to upper right corner to discriminate between in vitro samples.
Two-Way Data Modeling and Analysis of Feature-Tissue Joint Relationships
Our histology data set contains 329 malignant and 210 benign brain samples, 128 malignant and 195 benign breast samples, and 49 malignant and 20 benign bone samples. The average of the 20 cellgraph metrics over the samples of each tumor type is taken to construct a 6 × 20 two-way data matrix. In order to determine the influence of a metric in describing the variation in the data, a singular value decomposition (SVD) based technique is employed. First, the data are normalized by centering across the tumor-type mode and scaling within the features mode. Next, the data are decomposed into factor scores and loadings using SVD. Finally, the influence of a metric is measured by the sum of absolute factor scores corresponding to the first K factor loadings, where K is the number of principal components that explains at least 95% of the variation in the data. K is determined to be three, reflecting the number of different tissue types.
Results
We extended our previously published cellgraph method of feature extraction to three-dimensional collagen-I hydrogel cultures that remodel over time. We extracted the set of 20 quantitative features (Table 3) from the generated cellgraphs. We then applied tensor analysis to the extracted features using Tucker3 and PARAFAC models to identify the features that contribute the most to discriminating between different cell/tissue temporospatial architectures over time. Figures 2a and 2b show the influence of each metric according to Hotelling's T^{2} and sum of squared residuals scores for the Tucker3 and PARAFAC models, respectively. The most important metrics are located in the upper right triangular region of these figures. Figures 3a and 3b show Hotelling's T^{2} scores only for the Tucker3 and PARAFAC models, respectively. The metrics are displayed in increasing importance from the lower left corner to the upper right corner in this figure. Using the combination of results from Figures 2 and 3, we determined the five most important metrics for distinguishing between different hydrogel architectures over time based on their nuclear organization: number of central points, percentage of isolated points, number of connected components, clustering coefficient D, and standard deviation of edge lengths.
To validate our findings, we used the cellgraphs for the histology data that we analyzed using a similar framework in our earlier studies [10-12]. The discriminatory power of the extracted cellgraph metrics was successfully shown for the malignant and benign histology samples of brain [10], breast [11], and bone [12] tissues. Since these were surgically removed histopathology samples, no temporal information is available. Thus, our histology data have two modes: tissue samples and the features extracted from these samples. These data sets were obtained from 2D imaging of tissue samples from pathology department archives and thus contain no depth information. Figure 4 shows the influence of cellgraph metrics in describing the variation in the histology data.
Figure 4. Influence of the cellgraph metrics to describe the variations in the histology data. This figure displays the metrics with increasing importance from lower left to upper right corner to discriminate between histology samples.
Figure 5 shows the Venn diagram of the five most significant metrics for the histology and the in vitro data. We found considerable overlap between the two sets of discriminative metrics, as displayed by the Venn diagram in Figure 5c. This confirms that our 3D hydrogels maintain important structural properties found in histological samples.
Figure 5. Histology and in vitro tissue have both shared and unique metrics that can be used to distinguish between tissue types. The Venn diagram displays the most important metrics found by singular value decomposition and tensor analysis for the histology tissue and in vitro tissue images, respectively. The most discriminative metrics from the histology samples, the in vitro samples, and the shared discriminative metrics are shown in Figures 5a, 5b, and 5c, respectively. Numbers refer to feature numbers in Table 3.
We grouped the metrics into subcategories that describe particular aspects of structural organization. The percentage of isolated points and number of central points reflect the overall compactness of a cellgraph, as shown in Figure 6. The compactness metrics can quantify changes in cell density over time that we represented in the biological images in the top right of Figure 6. The change in cell density from low to high results in higher compactness and is captured by an increase in number of central points and a decrease in the percent of isolated points.
Figure 6. The most significant metrics determined from the normalized tensor analysis describe the compactness, clustering and uniformity properties of tissue structure. The diagrams on the left illustrate the metrics described in the central column. Representative images in the right column show variation for the corresponding metrics from the left column. The final row gives examples of the images analyzed in this study to show how it is difficult to quantify the important metrics by eye.
The second subcategory of descriptive metrics, number of connected components and clustering coefficient D, captures the extent of cell clustering in a sample. As seen in the representative biological images in the clustering row of Figure 6, samples with discrete clusters have a high number of connected components and a high clustering coefficient. In contrast, uniformly distributed (non-clustering) cells have a low number of connected components and a low clustering coefficient, i.e., a majority of the cells in the sample are connected within a single connected component. The standard deviation of edge lengths describes the consistency of the distance distribution between the nuclei, thus establishing the level of uniformity in the sample. A sample with uniform, dense clusters, as shown in the lower right of Figure 6, results in a low standard deviation of edge lengths and high uniformity. Alternatively, a sample with a disperse cell cluster distribution yields a high standard deviation of edge lengths and lower uniformity.
The biological images from Figure 6 represent the extremes of the three subcategories of metrics, compactness, clustering and uniformity. In reality, the hydrogel architecture of different cell types typically lies between the extremes of each metric, and changes over time as the structure develops. The last row in Figure 6 gives examples of the variety of visually complex patterns of cell nuclei, from 5 different cell types, analyzed as part of this study. In these instances, visual inspection of hydrogel architecture images does not distinguish between the cell types and time points. Therefore, we used the cellgraphs and quantified the changes in metric values over time to differentiate the cell types from each other.
Figure 7 shows that the data trends from five cellgraph metrics are sufficient to distinguish between the hydrogel architectures formed by eleven different cell types. In Figure 7a, the raw data of the five metrics (determined by tensor analysis, Figures 2 and 3) were plotted for each cell type over time. Individual plots for each metric in Figure 7a can be found in Additional files 1, 2, 3, 4 and 5. To directly compare the metric trends between cell type architectures, we generated Figure 7b as a visual representation of the same data in Figure 7a. Figure 7b shows that the metrics for each cell type exhibit a distinct pattern of value changes over time. The patterns indicate both the direction of change (i.e., up arrow, down arrow, or flat line) and relative magnitude (i.e., number of arrows). In addition, we performed two-sample Kolmogorov-Smirnov tests between pairs of cell lines to investigate whether the corresponding metrics belong to the same probability distribution. For each pair of cell lines, the test is performed on each of the five most significant features. If the two cell lines come from the same probability distribution, the result of the test is 0, and 1 otherwise. The results of the five tests are combined by a logical OR operation. Figure 7c shows the results for the 11 cell lines used in our experiment at the 10% significance level. Most of the cell lines clearly belong to different probability distributions, which shows that the influential metrics are effective in distinguishing between the different cell types. From this large set of data, only the closely related AU565 and MB231 breast cancer cell lines lack a statistically significant difference; the metrics discriminate all other pairs of cell lines from each other.
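The pairwise comparison can be sketched in plain Python as below. This is an illustration under stated assumptions, not the procedure used to produce Figure 7c: the p-value uses the standard asymptotic Kolmogorov series with a small-sample correction, which is only approximate for small samples, and the names `ks_statistic`, `ks_p_value`, and `cell_lines_differ` as well as the sample values are hypothetical.

```python
from math import exp, sqrt


def ks_statistic(x, y):
    """Maximum absolute difference between the two empirical distribution functions."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        v = min(xs[i], ys[j])
        # Step both empirical CDFs past every observation equal to v.
        while i < n1 and xs[i] <= v:
            i += 1
        while j < n2 and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    return d


def ks_p_value(d, n1, n2, terms=100):
    """Approximate two-sided p-value from the asymptotic Kolmogorov distribution."""
    en = sqrt(n1 * n2 / (n1 + n2))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the series converges poorly here, and the true value is close to 1
    total = sum((-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def cell_lines_differ(metrics_a, metrics_b, alpha=0.10):
    """Return 1 if any metric rejects 'same distribution' at level alpha (logical OR), else 0."""
    decisions = []
    for name in metrics_a:
        a, b = metrics_a[name], metrics_b[name]
        p = ks_p_value(ks_statistic(a, b), len(a), len(b))
        decisions.append(p < alpha)
    return int(any(decisions))


# Made-up metric values for two cell lines; only the first metric differs.
line_a = {"central_points": list(range(30)), "isolated_pct": list(range(30))}
line_b = {"central_points": [v + 100 for v in range(30)], "isolated_pct": list(range(30))}
differ_ab = cell_lines_differ(line_a, line_b)
differ_aa = cell_lines_differ(line_a, line_a)
```

Because the five per-metric decisions are OR-combined without a multiple-comparison correction, the overall false-positive rate for a pair can exceed the nominal level of each individual test.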
Figure 7. The most significant metrics capture structural differences to generate a unique metric profile for each cell type. 7a plots the raw data and standard deviation bars of the most important metrics from the generated cellgraphs for each cell type over time. Due to the scale of the graphs in 7a, it is difficult to see small changes in metric values; however, these changes are captured by the percent changes shown in 7b. 7b was generated by first calculating the averages of the data points in 7a at hours 10 and 16 for each sample, as well as the averages of the data points at hours 120 and 168 (the first and last two time points in the graphs, respectively). These averages were then used to determine the percent change of each metric for each cell type over time. The key to 7b shows how arrows represent varying degrees of percent change in the table. Figure 7c shows the combined results of the two-sample Kolmogorov-Smirnov tests for the five most significant metrics. The cell-line pairs that belong to similar probability distributions are shown with black squares. Note that the cell lines are in exact agreement with themselves.
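The percent-change calculation described for Figure 7b can be written out as a short sketch. It assumes that the change is expressed relative to the early average and that the metric values are non-zero at the early time points; the function name `percent_change` and the sample values are hypothetical, and the arrow thresholds from the figure key are not reproduced.

```python
from statistics import mean


def percent_change(values_by_hour, early=(10, 16), late=(120, 168)):
    """values_by_hour maps a time point (hours) to the list of sample values at that time."""
    start = mean(v for hour in early for v in values_by_hour[hour])
    end = mean(v for hour in late for v in values_by_hour[hour])
    # Relative change from the early average to the late average, in percent.
    return 100.0 * (end - start) / abs(start)


# Made-up values of one metric for one cell type, two samples per time point.
example_series = {
    10: [4.0, 4.0], 16: [6.0, 6.0], 24: [7.0, 7.0],
    72: [8.0, 8.0], 120: [9.0, 9.0], 168: [11.0, 11.0],
}
change = percent_change(example_series)
```

Averaging two time points at each end, rather than using a single reading, dampens the effect of one noisy measurement on the reported trend.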
Additional file 1. Figure S1 Raw data plots for the number of central points metric. Shows the raw data for the number of central points metric plotted for each cell type individually over time.
Additional file 2. Figure S2 Raw data plots for the clustering coefficient D metric. Shows the raw data for the clustering coefficient D metric plotted for each cell type individually over time.
Additional file 3. Figure S3 Raw data plots for the average connected component size metric. Shows the raw data for the average connected component size metric plotted for each cell type individually over time.
Additional file 4. Figure S4 Raw data plots for the percentage of isolated points metric. Shows the raw data for the percentage of isolated points metric plotted for each cell type individually over time.
Additional file 5. Figure S5 Raw data plots for the standard deviation of edge lengths metric. Shows the raw data for the standard deviation of edge lengths metric plotted for each cell type individually over time.
The first six cell types listed in Figure 7b are of epithelial origin (breast and prostate cells) representing a range of cancer grades from precancerous to metastatic. Each has a unique metric profile. The standard deviation of edge lengths metric values distinguish the MCF10A (precancerous) breast epithelial cells from the AU565 breast cancer cells because they trend in opposite directions over time. Like the AU565 cells, the MCF7 cells also show similar trends to the MCF10A cells for the percentage of isolated points and average connected component size metrics. However, in addition to the opposing standard deviation of edge lengths trend that distinguishes the AU565 from the MCF10A cells, the MCF7 cells also show an opposing decreasing trend in the number of central points. The metric trends for MB231 cells also differ from those for MCF10A. While the uniformity metric trends for MB231 cells resemble those for the MCF7 cells, the metrics that capture clustering and compactness show opposite trends. Compared to the other breast cancer cells, the percentage of isolated points and average connected component size for MB231 cells show opposite trends. Interestingly, the nontumorigenic RWPE1 prostate cells and the MCF10A breast cells have nearly identical metric trends with only a slight difference in the magnitude of the average connected component size. Likewise, the metric changes between nontumorigenic RWPE1 prostate cells and metastatic DU145 prostate cells are similar to those seen between the nontumorigenic MCF10A breast cells and the breast cancer lines.
The nontumorigenic NHA cells and the cancerous U118 cells are glial cells from brain tissue origin. The brain hydrogels exhibit a pattern of metric trends which differentiates them from the hydrogels of other cell/tissue types in this study.
Although the pattern of metric trends is similar in both of the brain hydrogels, the NHA cells are distinguishable from their cancerous counterpart (U118) due to the opposite trend in the number of central points and the magnitude of the change in the percentage of isolated points. Both of the metrics that distinguish between the nontumorigenic and cancerous brain cells are measures of compactness. Similar to the brain cells, the representative bone hydrogel architectures (NHOst and MG63) have a distinct set of metric trends, which differentiates them from the other tissue types in the study. The NHOst and MG63 cells are distinguishable from each other due to the magnitude of the average connected component size, a measure of clustering. Interestingly, the DU145 cells (metastatic prostate epithelial) show similarity to both the bone cells (NHOst and MG63) and the fibroblasts (hDFB). The only variation between the DU145 and bone cells is in the trend of the standard deviation of edge lengths. Compared with the hDFB cells, the DU145 cells differ only in the trend of the number of central points. Similarly, the MB231 cell line (metastatic breast epithelial) shows the same pattern of metric trends as the hDFB (fibroblasts), with differences in the magnitude of change and slight variation in the clustering coefficient D and standard deviation of edge lengths.
Discussion
A hallmark of all complex tissues is carefully organized cell and ECM architecture. We believe this architecture is determined, at least in part, by a set of organizational "rules" that determine how cells orient with respect to each other. According to our model, both damaged and cancerous tissues exhibit architectures that deviate significantly from the nontumorigenic state dictated by these rules, but it is very difficult to quantify changes in these rules by eye. Teasing out the characteristic differences between different functional states in a tissue thus benefits from identifying and understanding the biological foundation for these rules.
This study represents the first attempt at defining these rules, by assigning rigorous quantitative metrics to architectural properties of 3D hydrogels containing distinct cell types. 3D collagen-I hydrogels provide elements of tissue structure which are not obtainable in traditional 2D histology imaging. In this model system, cells from diverse tissue origins interact differently with the collagen-I ECM and each other, resulting in a range of tissue architectures over time. The features extracted from the cellgraphs of 3D confocal images of cell nuclei from the hydrogels are analyzed using the Tucker3 model to extract signature graph features. While it is very difficult to quantify important metrics from our images by eye, our computational approach uncovers hidden relationships in these images to discriminate between cell types in 3D, over time.
Our method improves upon histopathological image analysis using nuclear distance-based cell-graphs [13] to include more aspects of tissue structure-function relationships. Comparison of the Singular Value Decomposition analysis of our 2D histology data with the tensor analysis of our 3D in vitro feature sets revealed partial overlap of the most significant discriminating metrics. In Figure 5a, the average degree metric, which represents the connectivity and compactness of the 2D histology samples, is the most significant for distinguishing between tissue types. The most significant metric in Figure 5b, the number of connected components, characterizes the clustering of the sample. The overlapping metrics in Figure 5c show that histology and in vitro samples share metrics that characterize the compactness, clustering, and uniformity of cellular organization and thereby distinguish between tissue types.
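As an illustration of the kind of graph metrics discussed here, the sketch below builds a distance-based graph from 3D nuclear centroids and computes a few of the named quantities. It is a simplified example under stated assumptions, not the authors' implementation: nuclei are linked whenever their Euclidean distance is at most a hypothetical threshold `max_dist`, the coordinates are invented, and the metric definitions (for instance, counting isolated nuclei as components of size one) are assumptions.

```python
import math
from itertools import combinations
from statistics import pstdev

def build_cell_graph(points, max_dist):
    """Link every pair of nuclei whose Euclidean distance is at most max_dist."""
    adjacency = {i: set() for i in range(len(points))}
    edge_lengths = []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if d <= max_dist:
            adjacency[i].add(j)
            adjacency[j].add(i)
            edge_lengths.append(d)
    return adjacency, edge_lengths

def connected_components(adjacency):
    """Return the connected components as a list of node sets (depth-first search)."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components

def graph_metrics(points, max_dist):
    """Compute a few cell-graph metrics for one image (definitions are assumptions)."""
    if not points:
        raise ValueError("at least one nucleus is required")
    adjacency, edge_lengths = build_cell_graph(points, max_dist)
    n = len(points)
    components = connected_components(adjacency)
    isolated = sum(1 for neighbors in adjacency.values() if not neighbors)
    return {
        "average_degree": sum(len(v) for v in adjacency.values()) / n,
        "num_connected_components": len(components),
        "avg_component_size": n / len(components),
        "pct_isolated_points": 100.0 * isolated / n,
        "std_edge_length": pstdev(edge_lengths) if edge_lengths else 0.0,
    }

# Invented centroids: a cluster of three nuclei, a pair, and one isolated nucleus.
nuclei = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10), (11, 10, 10), (30, 30, 30)]
metrics = graph_metrics(nuclei, max_dist=1.5)
```

Computing such a dictionary for every cell type at every time point would yield the cell type by feature by time array that a three-mode analysis takes as input.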
With the metrics determined by tensor analysis, we were able to distinguish multiple functional states of tissues based solely on their nuclear organization in a 3D collagen-I hydrogel. Using the metric profiles for each cell type (Figure 7b), we are able to discriminate different grades of breast and prostate cancer through a variety of characteristic differences in trends between cell types. The profiles also successfully distinguish nontumorigenic brain and bone tissue organization from their cancerous counterparts. However, only a change in the magnitude of the average connected component size distinguishes the nontumorigenic from the cancerous bone cells. In the future, we will seek to identify new metrics that better capture the differences between mesenchymal tissues.
Our findings present an intriguing possibility: that the data in this study may be capturing features of the epithelial-to-mesenchymal transition (EMT). EMT is defined as a cellular change from an epithelial phenotype to a mesenchymal phenotype, involving a loss of adherens junctions, a change in intermediate filament expression, and an increase in cell mobility [46-49]. These cellular changes tend to result in a more aggressive, metastatic cancer. While EMT is a characteristic of epithelial tumor progression, it is difficult to quantify using structural changes or molecular markers [50].
In this study, we have included cell types that represent varying stages of EMT, based on their protein expression profiles. The breast cancer cell types (MCF10A, AU565, MCF7, and MB231) represent progressive cancer grades from precancerous to metastatic, respectively. Interestingly, our metric profiles capture differences in metric trends between these cell types. The first change, between MCF10A and AU565, represents a change in the uniformity of cell distribution. The AU565 and MCF7 cell organizations differ by a change in the trend for the number of central points metric, representing a change in the clustering of the cells within tissue architectures. MB231 cells are further discriminated from the MCF7 cells by an increase in the compactness of the tissue, as demonstrated by the change in the number of central points and the percentage of isolated points. MB231 also shows a change in the average connected component size trend compared to the other breast cancer lines. In addition, the MB231 metric profile shares similar trends with the mesenchymal fibroblast cell line rather than with its breast cancer counterpart, MCF10A. The DU145 cells show remarkably similar metrics to both the osteogenic (NHOst, MG63) and fibroblast cells, with a change in only one metric trend between them. The resemblance in trends between the MB231 and DU145 cells and the mesenchymal tissue organizations (particularly the osteogenic lines NHOst and MG63) may reflect the frequency with which breast and prostate cancers metastasize to bone.
Conclusions
Collectively, our findings demonstrate that our three-dimensional cell-graph methodology is capable of discriminating between structural patterns of cellular organization, in model tissues representing different grades of tumor progression and tissue origin, that cannot be quantified by eye. The distinguishing features are based on three-mode tensor analysis of graph-theoretical properties calculated for each cell type over time. By extending the sensitivity of image analysis and tissue modeling to uncover hidden, diagnostic, temporospatial relationships between cells in model tissues, we feel this is a significant step towards enriching diagnostic profiles for disease. Such enhanced profiles have the potential to improve diagnostic accuracy and identify hidden traits that may suggest new therapeutic interventions.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
LMP and KH generated and maintained the cellular 3D collagen-I hydrogel cultures, collected confocal fluorescent microscopy images, and participated in the design of the study, the generation of figures and tables, the analysis and interpretation of data, and the drafting and revision of the manuscript. BO carried out the generation and analysis of the 2D histology images and contributed to revising the manuscript. CB carried out the 3D fluorescent image segmentation, cell-graph generation, and metric extraction. BY carried out the tensor analysis and contributed to drafting the methods and revising the manuscript. GP participated in the design of the study, the analysis and interpretation of data, and the drafting of the manuscript. All authors read and approved the final manuscript.
Acknowledgements
This work was partially supported by National Institutes of Health Grant #R01 EB008016.
References

Burger P, Scheithauer B, Vogel FS: Surgical Pathology of the Nervous System and Its Coverings. Volume Chapter 4. Fourth edition. New York: Churchill Livingstone; 2002.

Humphrey PA, Andriole GL: Prostate cancer diagnosis.
Mo Med 2010, 107(2):107-112.

Albertsen PC: The unintended burden of increased prostate cancer detection associated with prostate cancer screening and diagnosis.
Urology 2010, 75(2):399-405.

You J, Cozzi P, Walsh B, Willcox M, Kearsley J, Russell P, Li Y: Innovative biomarkers for prostate cancer early diagnosis and progression.
Crit Rev Oncol Hematol 2010, 73(1):10-22.

Jensen AJ, Naik AM, Pommier RF, Vetto JT, Troxell ML: Factors influencing accuracy of axillary sentinel lymph node frozen section for breast cancer.
Am J Surg 2010, 199(5):629-635.

Weinmann AL, Hruska CB, O'Connor MK: Design of optimal collimation for dedicated molecular breast imaging systems.
Med Phys 2009, 36(3):845-856.

Jasani B, Douglas-Jones A, Rhodes A, Wozniak S, Barrett-Lee PJ, Gee J, Nicholson R: Measurement of estrogen receptor status by immunocytochemistry in paraffin wax sections.
Methods Mol Med 2006, 120:127-146.

Doyle S, Feldman M, Tomaszewski J, Madabhushi A: A Boosted Bayesian Multi-Resolution Classifier for Prostate Cancer Detection from Digitized Needle Biopsies.

Rosenkrantz AB, Kopec M, Kong X, Melamed J, Dakwar G, Babb JS, Taouli B: Prostate cancer vs. post-biopsy hemorrhage: diagnosis with T2- and diffusion-weighted imaging.
J Magn Reson Imaging 2010, 31(6):1387-1394.

Demir C, Gultekin SH, Yener B: Learning the topological properties of brain tumors.
IEEE/ACM Trans Comput Biol Bioinform 2005, 2(3):262-270.

Bilgin C, Demir C, Nagi C, Yener B: Cell-graph mining for breast tissue modeling and classification.
Conf Proc IEEE Eng Med Biol Soc 2007, 2007:5311-5314.

Bilgin CC, Bullough P, Plopper GE, Yener B: ECM-Aware Cell-Graph Mining for Bone Tissue Modeling and Classification.
Data Min Knowl Discov 2009, 20(3):416-438.

Gurcan M, Boucheron L, Can A, Madabhushi A, N. R, Yener B: Histopathological image analysis: A review.

Lund AW, Bilgin CC, Hasan MA, McKeen LM, Stegemann JP, Yener B, Zaki MJ, Plopper GE: Quantification of spatial parameters in 3D cellular constructs using graph theory.
J Biomed Biotechnol 2009, 2009:928286.

Gunduz C, Yener B, Gultekin SH: The cell graphs of cancer.
Bioinformatics 2004, 20(Suppl 1):i145-151.

Demir C, Gultekin SH, Yener B: Augmented cell-graphs for automated cancer diagnosis.
Bioinformatics 2005, 21(Suppl 2):ii7-12.

Bilgin C, Shayoni R, Dayley W, Baydil B, Sequeira S, Yener B, Larsen M: Cell-graph modeling of salivary gland morphology.
IEEE International Symposium on Biomedical Imaging: From Nano to Micro 2010.

Barabasi AL: The New Science of Networks. 1st edition. Perseus Books Group; 2002.

Shavitt Y, Tankel T: Big-Bang simulation for embedding network distances in Euclidean space.

Gunduz C, Yener B: Accuracy and sampling tradeoffs for inferring Internet router graph. Rensselaer Polytechnic Institute; 2003.

Faloutsos M, Faloutsos P, Faloutsos C: On power-law relationships of the Internet topology.
Comp Comm R 1999, 29(4):251-262.

Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J: Graph structure in the Web.
Comput Netw 2000, 33(1-6):309-320.

Newman MEJ: Who is the best connected scientist? A study of scientific coauthorship networks.

Wasserman S, Faust K: Social network analysis: methods and applications. Cambridge UK: Cambridge University Press; 1994.

Liljeros F, Edling CR, Amaral LA, Stanley HE, Aberg Y: The web of human sexual contacts.
Nature 2001, 411(6840):907-908.

Goldberg MPH, Magdon-Ismail M, Riposo J, Siebecker D, Wallace W, Yener B: Statistical modeling of social groups on communication networks.
Pittsburgh PA: First conference of the North American Association for Computational Social and Organizational Science 2003.

Wuchty SER, Barabasi AL: The architecture of biological networks. New York: Kluwer Academic Publishing; 2003.

Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL: The large-scale organization of metabolic networks.
Nature 2000, 407(6804):651-654.

Jeong SPM, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks.

Watts D, Strogatz S: Collective dynamics of small-world networks.

Hillisch A, Hilgenfeld R: Modern methods of drug discovery. Basel; Boston: Birkhäuser Verlag; 2003.

Justice BA, Badr NA, Felder RA: 3D cell culture opens new dimensions in cell-based assays.
Drug Discov Today 2009, 14(1-2):102-107.

Schindler M, Nur EKA, Ahmed I, Kamal J, Liu HY, Amor N, Ponery AS, Crockett DP, Grafe TH, Chung HY, et al.: Living in three dimensions: 3D nanostructured environments for cell culture and regenerative medicine.
Cell Biochem Biophys 2006, 45(2):215-227.

Yamada KM, Cukierman E: Modeling tissue morphogenesis and cancer in 3D.
Cell 2007, 130(4):601-610.

Otsu N: A thresholding selection method from gray-level histogram.
IEEE Transactions on Systems, Man, and Cybernetics 1979, 9(1):62-66.

Tucker LR: Some Mathematical Notes on 3-Mode Factor Analysis.
Psychometrika 1966, 31(3):279-311.

Tucker LR: Implications of factor analysis to three-way matrices of measurement of change. In Problems In Measuring Change. Madison: The University of Wisconsin Press; 1963:122-137.

Tucker L: The extension of factor analysis to three-dimensional matrices.

Harshman RA: Foundations of the PARAFAC procedure: Models and conditions for an explanatory multimodal factor analysis.

Golub GH, Van Loan CF: Matrix computations. 3rd edition. Baltimore: Johns Hopkins University Press; 1996.

Carroll JD, Chang JJ: Analysis of Individual Differences in Multidimensional Scaling Via an N-Way Generalization of Eckart-Young Decomposition.
Psychometrika 1970, 35(3):283.

Bro R, Smilde AK: Centering and scaling in component analysis.
J Chemometr 2003, 17(1):16-33.

PLS Toolbox 4.0 for use with MATLAB [http://software.eigenvector.com]

Iwatsuki M, Mimori K, Yokobori T, Ishi H, Beppu T, Nakamori S, Baba H, Mori M: Epithelial-mesenchymal transition in cancer development and its clinical significance.
Cancer Sci 2010, 101(2):293-299.

Weigelt B, Peterse JL, van 't Veer LJ: Breast cancer metastasis: markers and models.
Nat Rev Cancer 2005, 5(8):591-602.

Polyak K, Weinberg RA: Transitions between epithelial and mesenchymal states: acquisition of malignant and stem cell traits.
Nat Rev Cancer 2009, 9(4):265-273.

Yang J, Weinberg RA: Epithelial-mesenchymal transition: at the crossroads of development and tumor metastasis.
Dev Cell 2008, 14(6):818-829.

Cardiff RD: The pathology of EMT in mouse mammary tumorigenesis.
J Mammary Gland Biol Neoplasia 2010, 15(2):225-233.