Email updates

Keep up to date with the latest news and content from BMC Genomics and BioMed Central.

This article is part of the supplement: Twelfth International Conference on Bioinformatics (InCoB2013): Computational Biology

Open Access Research

Mining the tissue-tissue gene co-expression network for tumor microenvironment study and biomarker prediction

Yang Xiang*, Jie Zhang and Kun Huang*

Author Affiliations

Department of Biomedical Informatics, The Ohio State University, Columbus OH, USA

For all author emails, please log on.

BMC Genomics 2013, 14(Suppl 5):S4  doi:10.1186/1471-2164-14-S5-S4


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2164/14/S5/S4


Published:16 October 2013

© 2013 Xiang et al.; licensee BioMed Central Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Recent discovery in tumor development indicates that the tumor microenvironment (mostly stroma cells) plays an important role in cancer development. To understand how the tumor microenvironment (TME) interacts with the tumor, we explore the correlation of the gene expressions between tumor and stroma. The tumor and stroma gene expression data are modeled as a weighted bipartite network (tumor-stroma coexpression network) where the weight of an edge indicates the correlation between the expression profiles of the corresponding tumor gene and stroma gene. In order to efficiently mine this weighted bipartite network, we developed the Bipartite subnetwork Component Mining algorithm (BCM), and we show that the BCM algorithm can efficiently mine weighted bipartite networks for dense Bipartite sub-Networks (BiNets) with density guarantees.

Results

We applied BCM to the tumor-stroma coexpression network and find 372 BiNets that demonstrate statistical significance in survival tests. A good number of these BiNets demonstrate strong prognosis powers on at least one breast cancer patient cohort, which suggests that these BiNets are potential biomarkers for breast cancer prognosis. Further study on these 372 BiNets by the network merging approach reveals that they form 10 macro bipartite networks which show orchestrated key biological processes in both tumor and stroma. In addition, by further examining the BiNets that are significant in ER-negative breast cancer patient prognosis, we discovered a ubiquitin C (UBC) gene network that demonstrates strong prognosis power in nearly all types of breast cancer subtypes we used in this study.

Conclusions

The results support our hypothesis that the UBC gene network plays an important role in breast cancer prognosis and therapy and it is a potential prognostic biomarker for multiple breast cancer subtypes.

Introduction

The initiation, development and metastasis of tumor are complicated biological processes. The tumor microenvironment (TME), which surrounds the tumor immediately with secreted proteins, small signaling molecules, blood vessels, and normal cells, plays an essential role in each step. Tumor and its microenvironment consist of diverse cell types. For instance, for epithelial type of cancers, besides the epithelial cells, the TME includes fibroblast, endothelial cell, macrophage, and etc. All of them play critical roles in the formation and development of tumor [1]. In addition, recently it has been shown that genetic changes in the stroma (e.g., in fibroblast) can lead to the development of epithelial tumor [2]. Therefore, an important issue in cancer research is to understand how TME components interact with the tumor. It has been suggested that such interaction is mediated by extracellular molecules coded by the so-called stromal genes including signaling molecules such as cytokines/chemokines, structural molecules such as collagens (and the associated receptors such as DDR2) and extracellular proteinase such as metalloprotease (MMPs). It has been shown that some of these stromal genes may serve as important biomarkers to predicting drug responses for ER-negative breast cancer patients which are usually considered to have poor prognosis [3]. Recent research results provide further evidences that tumor-stroma interaction plays an important role in breast cancer tumor growth [4,5]. However, despite these progresses and intensive research efforts, many issues still remain unclear, including how such interactions lead to the intracellular changes in tumor and TME components.

Recently there has been a study using tissue-tissue gene co-expression network to characterize the interactions and the corresponding intracellular effects in obesity study [6]. Basically by identifying gene clusters that show high levels co-expression between different tissues, researchers discovered orchestrated biological processes between different tissues without the need to explicitly characterize the intercellular signaling mechanisms. In this paper, we adopted this approach to study the gene co-expression between tumor and its microenvironment in breast cancer. Specifically, we used a public gene expression microarray dataset consisting of 47 breast cancer biopsy samples, in which tumor and the matching surrounding stroma (TME) are isolated by laser capture microdissection (LCM) technology. The gene expression profiles in this dataset are generated for tumor and stroma separately for every sample [7]. By mining the tightly correlated gene expression profiles between the matching tumor and stroma, we identify dense networks of putative gene interactions between the two tissues (tumor and stroma). Our goals are to characterize orchestrated biological processes between the two tissues through the identified gene-gene communications/interactions, and at the same time to identify potential new biomarkers for breast cancer prognosis or treatment prediction.

From the bioinformatics point of view, our project falls into the category of gene co-expression network (GCN) analysis. GCN analysis has been shown to be very effective in discovering new gene functions [8], predicting disease biomarkers [9] and identifying disease genes [10]. However, most of GCN analysis methods focuse on a single type of sample. For tissue-tissue GCN, the problem was formulated as a bipartite graph mining problem in [6] in which a heuristic algorithm was used on a thresholded binary bipartite graph.

Mining dense components from bipartite graphs is a fundamental research problem in related fields. A simplest version of this problem is to find just one maximum clique in an unweighted bipartite graphs. Even for this simplest version, it was proved [11] to be an NP-hard problem. To tackle this problem, a few dense component mining algorithms, e.g. [12-15], having been proposed for unweighted bipartite graphs, for which many efficient pruning techniques are available. However, in biomedical research, many data are in the form of weighted bipartite graphs. Since the correlation coefficients between gene expression profiles can be used as weights of the edges in the graph, we expect a weighted bipartite graph mining approach would provide much more information on the gene-gene crosstalk between different tissue types. Given the successful cases of mining weighted network data [16,17], we want to extend our work to mine weighted bipartite networks in matching gene expression data from different tissue types. As a result, in this paper we propose a novel weighted Bipartite network Component Mining algorithm BCM which guarantees a lower bound on the densities of the identified components, i.e., Bipartite sub-Networks (BiNets). We tested and validated the prognosis power of identified BiNets on three separate breast cancer microarray studies. In addition, the results of BCM can be further summarized by our network merging approach which also guarantees a lower bound on the densities of summarized macro networks. We would like to point out that although clustering-based approaches such as [10,18,19] for gene co-expression network may be extended to handle weighted bipartite networks, our approach has clear advantages on exploring these networks for biomarker prediction. This is because BCM allows shared genes between BiNets and can find small dense BiNets that are suitable for biomarker prediction. At the same time, our approach is able to merge BiNets into Macro Bipartite Networks for understanding the general structure of the bipartite networks. Shared genes may still exist between Macro Bipartite Networks. In contrast, the clustering based approaches do not allow shared genes between two clusters and the clusters identified are often too large to find small gene networks with subtle functions.

Results

BiNets in tumor-stroma co-expression network

Using the BCM algorithm described in the Materials and Methods, we obtained 826 BiNets with a bounded density. Among them, 422 contain at least 10 distinct genes. These 422 BiNets were then subjected to survival analysis on five different breast cancer patient cohorts, i.e., the entire patients in the Netherlands Cancer Institute (NKI) dataset [20,21], the Lymph-Node-positive (LN-positive) patients in the NKI dataset, the Estrogen-Receptor-negative (ER-negative) patients in the NKI dataset, the entire patients in the GSE1456 (Stockholm) dataset [22], and the entire patients in the GSE2034 (Wang) dataset [23,24]. The results showed that 372 BiNets have significant prognostic power (p-values <0.05 from log-rank test) in at least one patient group. The percentage (372/422 ≈ 88.2%) demonstrates the effectiveness of mining tumor-stroma co-expression network using BCM. The number of BiNets with p-value less than 0.05 for each patient cohort is listed in Table 1, from which we can also observe that survival tests on three patient cohorts (NKI, NKI LN-Positive, GSE1456) yield a minimum p-value no more than 7.466e − 08.

Table 1. Log-rank test summary

To obtain a macro view on these BiNets, we further merge the identified BiNets into larger clusters. Figure 1 shows the dendrogram of merging the 372 BiNets using [25]. At a density boundary of 0.3, the merging algorithm yields 10 macro bipartite networks and we further apply Gene Ontology enrichment analysis on these clusters by Toppgene (http://toppgene.cchmc.org/enrichment.jsp webcite). Table 2 describes the most enriched GO term for each macro bipartite network.

thumbnailFigure 1. Merging bipartite networks. Merge the 372 BiNets into ten macro bipartite networks. The colors are for distinguishing different macro bipartite networks.

Table 2. GO ontology enrichment analysis for ten macro bipartite networks

BiNets as potential biomarkers for breast cancer prognosis

For each patient group or subtype, we have identified a number of BiNets with p-value less than 0.05 in log rank test as shown in Table 1. Many of them are good candidates for breast cancer prognosis in the corresponding patient group or subtype. In the past we have also successfully identified potential biomarkers for such patient groups [16,25]. However, identifying good biomarkers for prognosis on ER-negative breast cancer remains a challenge. In this work, we successfully identified several BiNets with strong ER-negative prognosis power. Among them, two BiNets (BiNet 52 and BiNet 228) have both low p-values and well-separated survival curves (Figure 2). We searched for interactions for genes from the two BiNets in IPA Knowledge Base, and found both BiNets contain genes surrounding the gene UBC, although UBC is not included in either BiNet. We wanted to find out if a combination of the two BiNets will reinforce their prognostic power in the survival test. Thus, we conducted another survival test on the combined gene list of the two BiNets plus UBC. It resulted in an even better separation of the ER-negative patient outcomes with a p-value of 4.924E − 5, as shown in Figure 2(d), whereas the breast cancer prognosis benchmark van't Veer-70 genes virtually has no prognosis power at all. By further examining the interactions among the genes in this BiNet using IPA (Figure 3), we obtain a gene interaction network centered on the UBC gene that possesses a strong prognosis power in survival tests on nearly all types of patient groups tested in this work (Figure 4).

thumbnailFigure 2. Survival tests on NKI ER-Negative. The survival test on NKI ER-Negative patients using (a) well-established 70-gene signature from [32], (b) Genes in BiNet 52 "C11orf51, DAP, EBP, HOMER2, LOC100129361, MAGT1, NDUFS6, NUDT21, PEX3, SDHA, SLC3A2", (c) Genes in BiNet 228 "C4BPB, CCR10, CKM, CPS1, CYP2F1, GPR6, GUCY1A2, HAUS6, HPD, HYAL1, PGAM2, PLA1A, PPP1R14D, PROC, REC8, SERPINA6, SFTPA2, STXBP5L, SYNPO2L, TGFB2, TPTE, VASH2", (d) the union of gene lists (b) and (c) plus gene UBC. Blue lines are the survival curves of good survival groups. Red lines are the survival curves of poor survival groups.

thumbnailFigure 3. IPA network visualization of BiNets 52 and 228. A network found by analyzing the combined network of BiNets 52 and 228 using IPA. The sub network within the red circle is the UBC network whose survival test result is shown in Figure 4.

thumbnailFigure 4. Survival tests of the UBC network. The survival results of UBC Network (containing genes "UBC, DAP, CPS1, GUCY1A2, TPTE/TPTE2, NDUFS6, SDHA, NUDT21, HAUS6, PGAM2") on (a) All patients in NKI dataset (b) LN-Positive patients in NKI dataset, (c) ER-Negative patients in NKI dataset, (d) All patients in GSE2034 datasets, (e) All patients in GSE1456 dataset, (f) ER Negative patients in GSE2034 dataset. Blue lines are the survival curves of good survival groups. Red lines are the survival curves of poor survival groups.

Discussion and conclusion

As shown in Table 2 the ten macro bipartite networks cover many key biological processes including cell cycle, immune response, cell-cell signaling, respiratory electron transport chain, and defense response to virus. Despite the size difference between the tumor side and the stroma side in these macro bipartite networks, top enriched Biological Process (BP) terms are often shared between the two sides. This indicates that the underlying biological processes are synchronized between tumor and stroma, presumably via cell-cell signaling mechanisms.

An interesting exception is the 3-rd macro bipartite network. The genes in the stroma side are enriched with the biological process of "development of primary male sexual characteristics" and "male sex differentiation". These male-gender specific GO terms seem to be inconsistent with the fact that the data were obtained from female breast cancer patients. A detailed inspection on the genes in this macro bipartite network indicates that it actually contains several key sex hormone related genes such as ESR1 (estrogen receptor α) and AR (androgen receptor) as well as ERBB4. These genes are all well known for their involvement with breast cancer prognosis [26-28]. The fact that the gene expression ESR1 shows high correlation between tumor and stroma suggests that estrogen, an important factor in breast cancer development, not only affects the tumor epithelial cells but may also affect the stroma cells in similar ways. Since ESR1 is a target for breast cancer drugs such as tamoxifen, it is thus important to study the effect of the drugs on the stroma cells such as fibroblast in addition to the cancer cells. Therefore a more comprehensive characterization of the drug effects and mechanisms can be pictured.

As shown in Table 1, there are a good number of BiNets that can separate patient cohorts from different breast cancer microarray studies into two subgroups with significant differences. Among them, some can achieve highly significant prognosis with very small p-values. In the past studies we have also successfully identified gene lists that demonstrate good prognostic power in survival tests [16,25]. But such discoveries on ER-negative patients are quite limited. Thus in this work, we are particularly interested in finding BiNets that are potential biomarkers for ER-negative patients.

The strong prognostic power of the combined BiNets on ER-Negative patients led us to hypothesize that UBC and its interacting genes play an important role in breast cancer prognosis. To test our hypothesis, we extracted a UBC network, which consists of the UBC gene and directly interacting genes (i.e., genes that have PPI with the UBC gene in BiNets 52 and 228), as shown in the red circle of Figure 3. Then we applied this gene network to survival analysis on all 5 patient cohorts (NKI ALL, NKI LN-Positive, NKI ER-Negative, GSE2034, GSE1456), and it generated p-values less than 0.05 in all of them (Figure 4(a-e)). We also tested it on the ER-negative group of GSE2034, and we also get a p-value quite close to 0.05 (Figure 4(f)). To the best of our knowledge, this is the first report of discovering a gene list that has significant prognosis results on all major subtypes of breast cancer and their mixture.

Our observation is further supported by the recent research on ubiquitin and cancers. Ubiquitin is a small regulatory protein that can be attached to proteins and label them for destruction. UBC is the gene encodes Ubiquitin C protein. It is known that many proteins studied by clinical breast cancer researchers, such as cyclins, CDK inhibitors, and the SCF in cell cycle control, are involved in ubiquitin pathways [29]. In addition, Mani and Gelmman [30] discovered that ubiquitin plays a critical role in protein degradation pathways, which are targets for cancer therapy. Our discovery provides biologists and clinicians an additional promising hypothesis that the UBC gene network is effective in the prognosis of multi types of breast cancers. Based on the previous discoveries [29,30], we conjecture that the UBC gene network is also a promising target for cancer therapy.

In summary, we developed a bipartite subnetwork component mining algorithm BCM for weighted bipartite graphs and applied it to mine the interaction networks between the breast cancer tumor and its microenvironment. Our results reveal highly coordinated biological processes such as cell cycle and immune responses between tumor and stroma. In addition, we identified potential biomakers which can perform very well on the ER-negative type of breast cancer prognosis.

Materials and methods

Datasets

GSE5847 was used to construct the tumor-stroma coexpression network. NKI dataset [20,21], GSE1456 (Stockholm) dataset [22], and GSE2034 (Wang) dataset [23,24] were used to perform survival tests.

Microarray data processing

Gene expression microarray dataset GSE5847 was obtained from the NCBI Gene Expression Omnibus. It contains 95 gene expression profiles on Affymetrix HU133 Plus 2.0 genechip from 48 patients. Out of them, 47 pairs of matched tumor and stroma samples were used in our study. Since the tumor and the stroma datasets are normalized separately, an additional linear global normalization between microarray data for the two tissues were performed. This normalization does not change the rank or linear relationship between any pair of genes except to match the median gene expression levels from all the probes.

Genes with small (<20%) variation in expression profiles were excluded, since low variation will lead to bias in correlation coefficient computing. Only probes with available matched gene names were used.

Construction of tissue-tissue gene co-expression network

Given a set of K samples with two types tissues, we compute the Pearson correlation coefficient (ρi,j) between any gene gi in tissue 1 and gene gj in tissue 2. As shown in Figure 5, a bipartite graph between the two tissues can thus be established. In this graph, nodes are the genes in both tissues. The weight of an edge is defined as the Pearson correlation coefficient (ρ) between the expression profiles for the two genes connected by the edge (for easy visualization, we did not show all the edges in Figure 5). Our goal is to identify densely-connected-bipartite components (i.e., bipartite sub networks) of the weighted bipartite graph.

thumbnailFigure 5. Bipartite graph of the tissue-tissue network. An illustration of the formulation of the tissue-tissue network as a weighted bipartite graph. For clarity of the figure, we do not show all the edges. Bipartite sub-networks in the two circles are examples of BiNets.

Bipartite sub network mining with bounded density gurantee

Let <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M1">View MathML</a> denote a bipartite graph with set <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M2">View MathML</a> of vertices on one side, and set <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M3">View MathML</a> of vertices on the other side. E is the set of edges connecting vertices between <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M2">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M3">View MathML</a>. <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M4">View MathML</a> is a BiNet of a bipartite graph <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M5','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M5">View MathML</a> if and only if <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M6">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M7">View MathML</a>, and E(B) be the set of edges induced by VX and VY on G.

Let w(e) be the weight of an edge e E. Let a = |VX|, b = |VY|. We define the density of B to be: <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M8">View MathML</a>. It is easy to see d(B) is the average weight of edges of B.

For a vertex <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M9">View MathML</a>, we define its density contribution to B by: <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M10','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M10">View MathML</a>. Similarly, for a vertex <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M11','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M11">View MathML</a>, we define its density contribution to B by: <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M12">View MathML</a>. A key idea of our algorithm is to grow a dense bipartite component by iteratively adding high contribution vertices from either side that can result in a density bound guarantee.

The pseudocode of our bipartite subnetwork component mining algorithm (BCM) is given in Algorithm 1. BCM discovers a BiNet by starting from an unselected edge with weight no less than a threshold. The density of the discovered BiNet is guaranteed to be bounded by a constant factor of the weight of the starting edge. The purpose of starting from an unselected edge is to avoid excessive numbers of highly overlapped BiNet. However, it shall be noted that unlike traditional clustering methods (such as k-means), BCM allows a vertex to be shared by multiple BiNets. It is also necessary to point out that quasi-clique (which resembles a fully connected graph) mining algorithm with bounded density known as QCM is available for weight graphs [31], which hints us to develop BCM. However, the QCM algorithm and its properties are not readily extendable to weighted bipartite graphs.

Algorithm 1 <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M13','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M13">View MathML</a>

1: Sort E such that edges in E are ranked in descending order of their weights;

2: Let wmax be the weight of the first edge in E.

3: Selected = ∅

4: for all e = (x, y) ∈ E do

5:     if w(e) < βwmax then

6:         break;

7:     end if

8:     if e Selected then

9:         continue;

10:     end if

11:     <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M14','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M14">View MathML</a>;

12:     Create an empty biclique B = (VX, VY);

13:     VX = VX ∪ {x}; VY = VY ∪ {y};

14:     while true do

15:         Pick <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M15">View MathML</a> such that <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M16','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M16">View MathML</a> is maximum;

16:         Pick <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M17','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M17">View MathML</a>such that <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M18','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M18">View MathML</a> is maximum;

17:         if <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M19','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M19">View MathML</a>then

18:             if <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M20','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M20">View MathML</a>then

19:                 VX = VX ∪ {p};

20:                 Insert into Selected any edge in E connecting p to any vertex in VY;

21:             else

22:                 break;

23:             end if

24:         else

25:             if d(q, B)Y αbd(B) then

26:                 VY = VY ∪ {q};

27:                 Insert into Selected any edge in E connecting q to any vertex in VX;

28:             else

29:                 break;

30:             end if

31:         end if

32:     end while

33:     <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M21','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M21">View MathML</a>

34: end for

36: return <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M25">View MathML</a>;

In BCM, we set <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M26','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M26">View MathML</a>, and <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M27','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M27">View MathML</a>, where <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M28','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M28">View MathML</a>. We use two parts in the following to show that every bipartite subgraph <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M22">View MathML</a>, which is outputted by Algorithm 1, has a bounded density.

In the first part, we analyze the density ratio between two consecutive steps of BCM. Let <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M29">View MathML</a> be the density of Bab, a transit bipartite subgraph in Algorithm 1, with a number of vertices in VX and b number of vertices in VY. Adding one more vertex v to Bab by Algorithm 1 will make the density of Bab be either f(a + 1, b) or f(a, b + 1). Without loss of generality, let us assume the new vertex v is added to VX and the density of Bab becomes f(a + 1, b). According to Algorithm 1, we have

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M30','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M30">View MathML</a>

which is equivalent to:

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M31','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M31">View MathML</a>

(1)

(1) can be rewritten as

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M32','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M32">View MathML</a>

(2)

From (2) we have

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M33','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M33">View MathML</a>

(3)

Thus, we have

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M34','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M34">View MathML</a>

(4)

With similar analysis, we also have

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M35','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M35">View MathML</a>

(5)

Next we show that a bipartite subgraph <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M22">View MathML</a> has a bounded density with respect to the weight of the starting edge. Assume B = (VX, VY, E(B)) where |VX| = s and |VY| = t. Thus, the density of B is f(s, t). According to Algorithm 1, the density of B evolves from f(1, 1) to f(s, t), with a vertex added to either VX or VY in each step. To show that the density of B is bounded, we only need to show that <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M36','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M36">View MathML</a> is a constant.

Theorem 1. Let f(s, t) be the density of BiNet B = (VX, VY, E(B)) where |VX| = s and |VY| = t. Let f (1, 1) be the weight of the starting edge for B in Algorithm BCM. Then F = f(s, t)/f(1, 1) is larger than <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M37','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M37">View MathML</a>, where C and τ are nonnegative integer parameters.

(See Appendix for proof.)

C and τ are used for tuning the bound. For example, if we choose C = 100 and τ = 1, according to Theorem 1, we have:

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M38','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M38">View MathML</a>

One can easy to get a large bound by setting a large C and a small τ. For example, when C = 10000 and τ = 0, we have F >0.96 by (14).

Evaluation of gene networks as potential prognostic biomarkers

Once the bipartite graph between the tumor genes and stromal genes was constructed, it was subjected to the BCM algorithm for BiNet discovery. We set the parameters C = 36, τ = 2 which guarantees F >47.4% according to Theorem 1. We also set β = 0.7 to ensure a reasonably large search space. After this step, we map the BiNets back to genes. Each BiNet is also corresponding to one combined gene set which is obtained by union the two separate gene sets in the BiNet into one.

For genes in each BiNet, their potential as breast cancer prognostic biomarker was tested using breast cancer microarray datasets. The primary dataset used for testing is the well known NKI dataset which are composed of 295 patients. To validate the survival analysis results, two more microarray datasets from GEO were used: GSE1456 containing data for 159 breast cancer patients, and GSE2034 dataset containing data for 286 breast cancer patients. The time-to-recurrence information for patients from these two datasets were used in the survival analysis.

In the survival test, genes in each BiNet (including both tumor and stroma sides) are used as features for the patients. The patients are then divided into two groups based on these feature values by K-means algorithm (K = 2, distance=cityblock, repeating 100 times). Log-rank test (publicly available at: http://www.mathworks.nl/matlabcentral/fileexchange/20388 webcite) was used to determine the statistical significance (p-value) between the survival time (or time-to-recurrence) for the two group of patients.

Summarize BiNets into macro bipartite networks by merging

In order to understand the general structure of the tumor-stroma network, BiNets were further summarized into a few macro bipartite networks by SINGEMERGE[25], a network merge algorithm that guarantees merge density. We set the density threshold to be 0.3 and we merged the 372 BiNets into 10 macro bipartite networks which were further subjected to gene ontology enrichment analysis. Figure 1 is the merging dendrogram and the parameters of the 10 macro bipartite networks including their GO analysis are listed in Table 2.

Gene Ontology (GO) analysis

GO enrichment analysis for the gene list from each macro bipartite network is carried out using ToppGene (http://toppgene.cchmc.org/enrichment.jsp webcite), a publicly available web tool. Pathway analysis on selected BiNets is further carried out using Ingenuity Pathway Analysis (https://analysis.ingenuity.com webcite).

Appendix

Proof of Theorem 1

Proof. To facilitate our discuss, we assume it takes n steps to generate B where fk denotes the density of B at step k, e.g., f1 = f(1, 1) and fn = f(s, t). Thus

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M39','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M39">View MathML</a>

(6)

For some <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M40','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M40">View MathML</a>, the change is on VX thus we apply (4); for others, the change is on VY and we apply (5). Let gi denote the fraction of the new density over the old one when it is the ith time of adding a vertex to VX. Similarly, let hj denote the fraction of the new density over the old one when it is the jth time of adding a vertex to VY. Thus (6) can be rewritten as:

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M41','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M41">View MathML</a>

(7)

To analyze (7), we first consider <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M42','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M42">View MathML</a>, which can be factorized into two parts:

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M43','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M43">View MathML</a>

(8)

Given <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M44','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M44">View MathML</a> and (4), we have:

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M45','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M45">View MathML</a>

(9)

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M46','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M46">View MathML</a>

(10)

Let Mi = (i + 1)2 − 1 and Ni = (i + 1)2, then we have <a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M47','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M47">View MathML</a>. Thus, (10) can be further extended as:

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M48','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M48">View MathML</a>

(11)

Combining (8), (9), and (11), we have

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M49','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M49">View MathML</a>

(12)

Using the similar analysis as above, we have:

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M50','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M50">View MathML</a>

(13)

Combining (12) and (13), we eventually have the bound for F:

<a onClick="popup('http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M51','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2164/14/S5/S4/mathml/M51">View MathML</a>

(14)

(To facilitate our analysis, C is set to be larger than (τ + 2)2.)

Competing interest

The authors declare that they have no competing interests.

Authors' contributions

YX led the algorithm design and network analysis. JZ carried out the initial study and led the result analysis from the view of cancer biology. KH led the project including development of the idea, data selection and preprocessing, and the development of the whole workflow. All authors edited the manuscript.

Acknowledgements

The authors would like to thank the National Science Foundation and the National Institutes of Health for their supports.

Declarations

This work was supported in part by the National Science Foundation under Grant #1019343 to the Computing Research Association for the CIFellows Project, and by the National Institutes of Health under Grant NCI R01CA141090.

This article has been published as part of BMC Genomics Volume 14 Supplement 5, 2013: Twelfth International Conference on Bioinformatics (InCoB2013): Computational biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/14/S5.

References

  1. Eng C, Leone G, Orloff M, Ostrowski M: Genomic alterations in tumor stroma.

    Cancer research 2009, 69(17):6759. PubMed Abstract | Publisher Full Text OpenURL

  2. Trimboli A, Cantemir-Stone C, Li F, Wallace J, Merchant A, Creasap N, Thompson J, Caserta E, Wang H, Chong J, et al.: Pten in stromal fibroblasts suppresses mammary epithelial tumours.

    Nature 2009, 461(7267):1084-1091. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  3. Farmer P, Bonnefoi H, Anderle P, Cameron D, Wirapati P, Becette V, André S, Piccart M, Campone M, Brain E, et al.: A stroma-related gene signature predicts resistance to neoadjuvant chemotherapy in breast cancer.

    Nature medicine 2009, 15:68-74. PubMed Abstract | Publisher Full Text OpenURL

  4. Salem AF, Howell A, Sartini M, Sotgia F, Lisanti MP: Downregulation of stromal BRCA1 drives breast cancer tumor growth via upregulation of HIF-1α, autophagy and ketone body production.

    Cell Cycle 2012, 11(22):4-3. PubMed Abstract | Publisher Full Text OpenURL

  5. Barone I, Catalano S, Gelsomino L, Marsico S, Giordano C, Panza S, Bonofiglio D, Bossi G, Covington KR, Fuqua SA, et al.: Leptin mediates tumor-stromal interactions that promote the invasive growth of breast cancer cells.

    Cancer research 2012, 72(6):1416-1427. PubMed Abstract | Publisher Full Text OpenURL

  6. Dobrin R, Zhu J, Molony C, Argman C, Parrish M, Carlson S, Allan M, Pomp D, Schadt E: Multi-tissue coexpression networks reveal unexpected subnetworks associated with disease.

    Genome biology 2009, 10(5):R55. PubMed Abstract | BioMed Central Full Text | PubMed Central Full Text OpenURL

  7. Boersma B, Reimers M, Yi M, Ludwig J, Luke B, Stephens R, Yfantis H, Lee D, Weinstein J, Ambs S: A stromal gene signature associated with inflammatory breast cancer.

    International Journal of Cancer 2008, 122(6):1324-1332. PubMed Abstract | Publisher Full Text OpenURL

  8. Pujana M, Han J, Starita L, Stevens K, Tewari M, Ahn J, Rennert G, Moreno V, Kirchhoff T, Gold B, et al.: Network modeling links breast cancer susceptibility and centrosome dysfunction.

    Nature genetics 2007, 39(11):1338-1349. PubMed Abstract | Publisher Full Text OpenURL

  9. Zhang J, Xiang Y, Ding L, et al.: Using gene co-expression network analysis to predict biomarkers for chronic lymphocytic leukemia.

    BMC bioinformatics 2010, 11(Suppl 9):S5. PubMed Abstract | BioMed Central Full Text | PubMed Central Full Text OpenURL

  10. Horvath S, Zhang B, Carlson M, Lu K, Zhu S, Felciano R, Laurance M, Zhao W, Qi S, Chen Z, et al.: Analysis of oncogenic signaling networks in glioblastoma identifies ASPM as a molecular target.

    Proceedings of the National Academy of Sciences 2006, 103(46):17402. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  11. Peeters R: The maximum edge biclique problem is NP-complete.

    Discrete Applied Mathematics 2003, 131(3):651-654. Publisher Full Text OpenURL

  12. Xiang Y, Jin R, Fuhry D, Dragan FF: Summarizing transactional databases with overlapped hyper-rectangles.

    Data Mining and Knowledge Discovery 2011, 23(2):215-251. Publisher Full Text OpenURL

  13. Gibson D, Kumar R, Tomkins A: Discovering large dense subgraphs in massive graphs.

    Proceedings of the 31st international conference on Very large data bases 2005, 721-732.

    VLDB Endowment

    OpenURL

  14. Abello J, Resende M, Sudarsky S: Massive quasi-clique detection.

    LATIN 2002: Theoretical Informatics 2002, 598-612. OpenURL

  15. Li J, Sim K, Liu G, Wong L: Maximal quasi-bicliques with balanced noise tolerance: Concepts and co-clustering applications.

    Proceedings of SDM, Atlanta, GA, USA 2008, 72-83. OpenURL

  16. Zhang J, Lu K, Xiang Y, Islam M, Kotian S, Kais Z, Lee C, Arora M, Liu Hw, Parvin JD, et al.: Weighted Frequent Gene Co-expression Network Mining to Identify Genes Involved in Genome Stability.

    PLoS Computational Biology 2012, 8(8):e1002656. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  17. Xiang Y, Zhang CQ, Huang K: Predicting glioblastoma prognosis networks using weighted gene co-expression network analysis on TCGA data.

    BMC Bioinformatics 2012, 13(Suppl 2):S12. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  18. Zhang B, Horvath S, et al.: A general framework for weighted gene co-expression network analysis.

    Statistical applications in genetics and molecular biology 2005, 4:1128. PubMed Abstract | Publisher Full Text OpenURL

  19. Langfelder P, Horvath S: WGCNA: an R package for weighted correlation network analysis.

    BMC bioinformatics 2008, 9:559. PubMed Abstract | BioMed Central Full Text | PubMed Central Full Text OpenURL

  20. van't Veer LJ, Dai H, van de Vijver MJ, et al.: Gene expression profiling predicts clinical outcome of breast cancer.

    Nature 2002, 415(6871):530-536. PubMed Abstract | Publisher Full Text OpenURL

  21. van de Vijver MJ, He YD, van't Veer LJ, et al.: A Gene-Expression Signature as a Predictor of Survival in Breast Cancer.

    The New England Journal of Medicine 2002, 347(25):1999-2009. PubMed Abstract | Publisher Full Text OpenURL

  22. Pawitan Y, Bjöhle J, Amler L, et al.: Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts.

    Breast Cancer Research 2005, 7(6):R953-R964. PubMed Abstract | BioMed Central Full Text | PubMed Central Full Text OpenURL

  23. Wang Y, Klijn PG, Zhang Y, et al.: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer.

    The Lancet 2005, 365(9460):671-679. PubMed Abstract | Publisher Full Text OpenURL

  24. Carroll JS, Meyer CA, Song J, Li W, et al.: Genome-wide analysis of estrogen receptor binding sites.

    Nature Genetics 2006, 38(11):1289-1297. PubMed Abstract | Publisher Full Text OpenURL

  25. Xiang Y, Fuhry D, Kaya K, Jin R, atalyürek ÜV, Huang K: Merging network patterns: a general framework to summarize biomedical network data.

    Network Modeling and Analysis in Health Informatics and Bioinformatics 2012, 1(3):103-116. Publisher Full Text OpenURL

  26. Brown LA, Hoog J, Chin SF, Tao Y, Zayed AA, Chin K, Teschendorff AE, Quackenbush JF, Marioni JC, Leung S, et al.: ESR1 gene amplification in breast cancer: a common phenomenon?

    Nature genetics 2008, 40(7):806-807. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  27. Ogawa Y, Hai E, Matsumoto K, Ikeda K, Tokunaga S, Nagahara H, Sakurai K, Inoue T, Nishiguchi Y: Androgen receptor expression in breast cancer: relationship with clinicopathological factors and biomarkers.

    International Journal of Clinical Oncology 2008, 13(5):431-435. PubMed Abstract | Publisher Full Text OpenURL

  28. Sundvall M, Iljin K, Kilpinen S, Sara H, Kallioniemi OP, Elenius K: Role of ErbB4 in breast cancer.

    Journal of mammary gland biology and neoplasia 2008, 13(2):259-268. PubMed Abstract | Publisher Full Text OpenURL

  29. Ohta T, Fukuda M: Ubiquitin and breast cancer.

    Oncogene 2004, 23(11):2079-2088. PubMed Abstract | Publisher Full Text OpenURL

  30. Mani A, Gelmann EP: The ubiquitin-proteasome pathway and its role in cancer.

    Journal of Clinical Oncology 2005, 23(21):4776-4789. PubMed Abstract | Publisher Full Text OpenURL

  31. Ou Y, Zhang C: A new multimembership clustering method.

    Journal of Industrial and Management Optimization 2007, 3(4):619-624. OpenURL

  32. Van De Vijver M, He Y, van't Veer L, Dai H, Hart A, Voskuil D, Schreiber G, Peterse J, Roberts C, Marton M, et al.: A gene-expression signature as a predictor of survival in breast cancer.

    New England Journal of Medicine 2002, 347(25):1999. PubMed Abstract | Publisher Full Text OpenURL