Abstract
Background
The DNA microarray technology allows the measurement of expression levels of thousands of genes under tens/hundreds of different conditions. In microarray data, genes with similar functions usually coexpress under certain conditions only [1]. Thus, biclustering which clusters genes and conditions simultaneously is preferred over the traditional clustering technique in discovering these coherent genes. Various biclustering algorithms have been developed using different bicluster formulations. Unfortunately, many useful formulations result in NPcomplete problems. In this article, we investigate an efficient method for identifying a popular type of biclusters called additive model. Furthermore, parallel coordinate (PC) plots are used for bicluster visualization and analysis.
Results
We develop a novel and efficient biclustering algorithm which can be regarded as a greedy version of an existing algorithm known as pCluster algorithm. By relaxing the constraint in homogeneity, the proposed algorithm has polynomialtime complexity in the worst case instead of exponentialtime complexity as in the pCluster algorithm. Experiments on artificial datasets verify that our algorithm can identify both additiverelated and multiplicativerelated biclusters in the presence of overlap and noise. Biologically significant biclusters have been validated on the yeast cellcycle expression dataset using Gene Ontology annotations. Comparative study shows that the proposed approach outperforms several existing biclustering algorithms. We also provide an interactive exploratory tool based on PC plot visualization for determining the parameters of our biclustering algorithm.
Conclusion
We have proposed a novel biclustering algorithm which works with PC plots for an interactive exploratory analysis of gene expression data. Experiments show that the biclustering algorithm is efficient and is capable of detecting coregulated genes. The interactive analysis enables an optimum parameter determination in the biclustering algorithm so as to achieve the best result. In future, we will modify the proposed algorithm for other bicluster models such as the coherent evolution model.
Background
Gene expression matrix
Data from microarray experiments [2,3] is frequently given as a large matrix showing expression levels of genes (rows) under different experimental conditions (columns). The socalled gene expression data can thus be written as a matrix of size m × n where m is the number of genes and n is the number of experimental conditions. Typically m is much greater than n. For example, (m, n) is (6220, 15) and (4026, 47) respectively for the time series yeast samples [4] and lymphoma specimens [5]. One of the challenges in microarray data analysis is to identify groupings of genes with similar behaviours/functions. Several clustering algorithms have been applied to DNA gene expression data to identify biologically relevant groupings based on similarity in expression profiles [610]. However, traditional clustering techniques are global in nature in which the expression patterns are grouped either along the entire row or along the entire column [1,11]. This implies that one would find the grouping of genes that would express similarly for all conditions, or the groupings of conditions in which all genes exhibit similar behaviour. However, in practice only a subset of genes is highly correlated under a subset of conditions. This requires simultaneous clustering along both the row and column directions, and is often called biclustering [1116]. A bicluster often exhibit certain kinds of homogeneity, for example constant level of expression throughout the whole bicluster (constant bicluster), constant level of expression along either rows or columns (constant rows and constant columns), and rows/columns that are related by additions or multiplications [15], as shown in Figure 1. We have recently shown that the different bicluster patterns have a simple geometric interpretation as linear objects in a high dimensional feature space [14,15]. A comprehensive survey on different biclustering algorithms was given in references [11,13,16].
Figure 1. Examples of different biclusters. (A) A constant bicluster. (B) A constant row bicluster. (C) A constant column bicluster. (D) An additiverelated bicluster. (E) A multiplicativerelated bicluster. Note that Ci denotes the ith experimental condition.
Parallel coordinate plots
The parallel coordinate (PC) technique is a powerful method for visualizing and analyzing highdimensional data under a twodimensional setting [17,18]. In this technique, each dimension is represented as a vertical axis, and then the Ndimensional axis is arranged in parallel to each other. By giving up the orthogonal representation, the number of dimensions that can be visualized is not restricted to only two [1921]. Studies have found that geometric structure can still be preserved by the PC plot despite that the orthogonal property is destroyed [1721]. In gene expression matrix, each gene is represented by a vector of conditions (i.e., row) and each condition is considered as a vector of genes (i.e., column). Since gene expression data always involves a large number of genes as well as a certain number of experimental conditions, the PC technique is well suited to their analysis. Moreover, visualization of gene expression data is an important problem for biological knowledge discovery [22]. Thus, the PC plots have been studied for gene expression data visualization [23,24]. Further details about visualization of biclusters using PC plots are provided in Additional file 1. In section "Method", a new greedy algorithm for bicluster identification is presented. Meanwhile, an interactive approach of parameter determination for the proposed biclustering algorithm based on PC visualization is discussed.
Additional File 1. Supplementary materials. This file contains the content about the parallel coordinate (PC) representation of biclusters given in Figure 1 in the article, which can serve as preliminary for readers who are unfamiliar with PC plots. It also discusses visualization of biclusters in a data matrix which may help reader to understand the interactive adjustment of noise threshold for the proposed algorithm described in the article.
Format: PDF Size: 40KB Download file
This file can be viewed with: Adobe Acrobat Reader
Methods
Identification of biclusters from difference matrix
The biclusters given in Figure 1(A)–(D) can be described by an additive model in which each pair of rows has the same difference in all the related columns or each pair of columns has the same difference in all the related rows. Thus, a difference matrix, each column of which represents the column differences between a pair of columns in a data matrix, provides useful information for identification of additiverelated biclusters. Consider the data in Figure 2, there are two biclusters: the first one (shown in blue color) is a constant bicluster while the second one (shown in yellow color) is an additiverelated bicluster. As the rows in a bicluster is supposed to correlate in a subset of columns, the column difference between every two columns is computed so as to identify this column subset. There are altogether 6(61)/2 = 15 permutations as shown in the difference matrix in Figure 3. In the difference matrix, we can find special features that are related to the biclusters. For example, consider column "C5C3". There are only three distinct difference values: 0 (5 counts), 1 (1 count), 2 (5 counts). This suggests the existence of three biclusters formed between "C5" and "C3":
Figure 2. An example of gene expression matrix with two embedded biclusters.
Figure 3. The difference matrix for the dataset shown in Figure 2.
• the first bicluster is for rows R1, R3, R5, R9 and R11 in which the difference between "C5" and "C3" is zero, i.e., a constant bicluster;
• the second bicluster is for rows R2, R4, R6, R8 and R10 in which the difference between "C5" and "C3" is two, i.e., an additive bicluster; and
• the third bicluster involves row R7 only, thus it is not considered to be a valid bicluster.
Analyzing the distribution along the column direction in the difference matrix thus helps to identify possible biclusters. In the above example, we have two valid biclusters. Thus, C3 and C5 are merged to form two groups as shown in Figure 4. The analysis can be repeated for each of these two groups to find out whether any other columns can be merged to {C3, C5}, i.e., using either C3 or C5 as a reference, we check whether C1, C2, C4 and C6 can be merged with {C3, C5}. In particular, if C3 is used as a reference, two difference matrices as shown in Figure 5 can be obtained. Note that their difference values can be read directly from the original difference matrix of Figure 3. By examining the first difference matrix in Figure 5, we see that two paired columns, "C1C3" and "C2C3", show a single bicluster with a difference value equals to zero. This suggests that columns C1 and C2 can be merged to {C3, C5} for rows R1, R3, R5, R9 and R11. The second difference matrix also has a single cluster with a difference value equal to 1 at paired column "C6C3". Therefore, C6 can be merged to {C3, C5} for rows R2, R4, R6, R8 and R10. Thus by this repeated bicluster growing process – expanding the column set and refining the row set, we can identify possible biclusters embedded in the dataset. Also, note that the difference matrix needs to be calculated only once. This greatly reduces the computational complexity of our algorithm.
Figure 4. The two different groups formed by merging columns "C5" and "C3".
Figure 5. The difference matrix for the two different groups formed by merging columns "C5" and "C3".
Proposed algorithm for additive models
Additiverelated biclusters can be found by progressively merging columns through studying the data distribution along each column in the difference matrix. If there is just one bicluster between two columns in the gene expression matrix, the distribution will have a single peak in one of the columns of the difference matrix. Related rows for this bicluster can then be identified. If there are multiple biclusters formed between two columns in the gene expression matrix, we can separate the rows into different groups by examining the distribution in the corresponding columns of the difference matrix. Therefore, by analyzing the distributions of difference values along columns of the difference matrix, peaks that correspond to different biclusters can be identified.
An overview of the procedure of our proposed biclustering algorithm is shown in the flow chart provided in Figure 6 while the details are described in the pseudocode in Figure 7. There are four parameters in our algorithm: noise threshold ε, minimum number of rows N_{r}, minimum number of columns N_{c }and maximum bicluster overlap in percentage P_{o}. The parameter ε specifies the noise tolerance as well as the homogeneity in the identified biclusters. On the other hand, N_{r }and N_{c }set the lower bounds of the number of rows and columns of the identified biclusters respectively. P_{o }determines the maximum degree of overlap between identified biclusters. More specifically, no overlap exceeding P_{o }percentage in both the row and column dimensions simultaneously is allowed. In this paper, a bicluster with a subset of rows R and a subset of columns C is denoted by (R, C). At the beginning, the firstlevel difference matrix D_{1 }is calculated for the input expression matrix E as described in line 4 in Figure 7. Supposed that E has size m rows by n columns. There is altogether n(n  1)/2 different number of permutations so the size of D_{1 }is m × n(n  1)/2. In order to derive possible biclusters, a simple clustering algorithm can be applied to identify clusters for each column (lines 6–12). Let X = {x_{1}, x_{2},...,x_{N}} be a set of N expression values. By comparing x_{i }with all values in X, a set of values S_{i }similar to x_{i }can be found as follows,
Figure 6. The flow chart of the proposed biclustering algorithm.
Figure 7. The pseudocode of the proposed biclustering algorithm.
where i = 1, 2, ..., N. Also, the set of indices Q_{i }associated with the values in S_{i }can be obtained. Q_{i }can be expressed by
As an example, given that X = {1, 2, 9, 3} and ε = 2. S_{2 }= {1, 2, 3} and Q_{2 }= {1, 2, 4}. A clustering algorithm based on equation (1) would generate N clusters but these clusters may be very close to each other and have large overlap. In order to reduce unnecessary clusters, we adopt a twostep clustering approach presented in lines 56–84. In addition to the definitions in (1) and (2), let us denote the current collections of clusters and corresponding sets of indices by S and Q, which are both set to be empty initially. In the first step (lines 58–75), for i = 1, 2, ..., N, x_{i }and its associated cluster S_{i }are tested for the following three conditions with each S_{j }∈ S:
(1) x_{i }  ≥ ε where denotes the average operation of a set.
(2) S_{i} ≥ N_{r}, where • denotes the cardinality of a set.
If the above three conditions are satisfied for all S_{j }∈ S, S_{i }and Q_{i }are added to S and Q respectively. In the second step (lines 76–82), the clusters are refined. Denote the sets of output clusters and the corresponding indices by S' and Q' respectively. First, S' and Q' are set to be empty. For each S_{j }∈ S, a new cluster S'_{j }is derived as
The corresponding set of indices Q'_{j }is given by
S'_{j }and Q'_{j }are added to S' and Q' respectively if S_{j}' ≥ N_{r}. For the firstlevel difference matrix D_{1}, each Q'_{j }contains the row indices of the cluster S'_{j}. Each column of D_{1 }consists of difference values between column i and j of the original expression matrix. Define the collection of row indices sets of the clusters to be U_{ij}. After finding all U_{ij }for all distinct column pairs (i, j), the row indices set of the clusters and their associated column pairs are collected to form a list of possible biclusters L_{1 }which can be expressed by
As one always tries to find the biggest bicluster, a sorting is performed for the possible biclusters in L_{1 }based on the number of rows in line 13 so that a bicluster with the largest number of rows can be processed first.
Starting from the biggest bicluster l_{1 }in the sorted list of possible biclusters L_{2}, the secondlevel difference matrix D_{2 }is formed as in line 24 in which one of the bicluster columns (column c_{k1 }or c_{k2}) is compared with all the remaining columns on those chosen rows (e.g. difference matrices illustrated in Figure 3). Note that the secondlevel difference matrix D_{2 }can be obtained directly from the firstlevel difference matrix D_{1}. Before D_{2 }calculation, early termination can be introduced as presented in lines 17–23 as an optional step. In the early termination, the biclusters in L_{2 }which significantly overlap with the identified biclusters are skipped as they are unlikely to derive a welldistinguishable bicluster according to the given parameter P_{o}. Similar to the clustering done for D_{1}, clustering and sorting are performed for D_{2 }as described in lines 26–33. As a result, a list of possible column segments H_{2 }for growing the current bicluster is obtained. In lines 34–41, a possible bicluster (R, C) is constructed based on the row intersection with each column segment in H_{2}. Initially, (R, C) is set to be the current bicluster l_{k }in L_{2}. If the size of the row set R does not fall below the userdefined threshold N_{r }after the row intersects with a column segment, the column is included in C and R is updated. Otherwise, the process is moved to the next column segments until the last one is examined. Finally, the bicluster is validated with respect to the given requirements in bicluster size and degree of overlap as depicted in lines 42–50. Only a valid bicluster is output (lines 51–53).
Relation to existing δpCluster approaches
The proposed algorithm identifies biclusters which are homogeneous in each column pair. In this section, we show that the biclusters can be expressed as δpClusters [25]. Hence, any submatrix in an identified bicluster has similar homogeneity to that bicluster and the problem of outliers as in Cheng and Church algorithm [12] can be avoided. Denote a bicluster with a subset of rows U and a subset of columns V by B = (U, V). The bicluster B is a δpCluster if for each 2 × 2 submatrix M, the following condition holds
where , a_{ij }denotes a value of the expression matrix at position (i, j), i, m ∈ U and j, n ∈ V. In our algorithm, the clustering (the second step) performed in the secondlevel difference matrix ensures that there exists a column k ∈ V such that
where ε is the noise threshold parameter of the proposed algorithm. Hence, for any i, m ∈ U, we have
where the last inequality follows from inequality (7). For a column n ∈ V with n ≠ k, using inequality (8), it is shown that
This means that the bicluster B is a δpCluster with δ = 4ε. Although the biclusters identified by our algorithm are δpClusters, it should be emphasized that our algorithm is not designed specially for detecting δpClusters but rather is based on the clustering results in the difference matrix. Hence, there are some differences between our biclustering strategy and the other δpClusters algorithms like pCluster algorithm [25] and S. Yoon et al. approach [13]. Specifically, our algorithm takes into account the cluster density in which cluster centroids are considered. In contrast, the other two δpCluster based algorithms rely only on the interdistances between elements in the difference matrix as defined by the inequality (6). This results in an exponentialtime complexity in the worst case. Our proposed algorithm can be regarded as a greedy version of the other two algorithms. In particular, for each columnpair bicluster, our proposed algorithm derives a possible bicluster by greedily finding a larger column set through sequential intersection with other columnpair biclusters. The large columnpair biclusters usually contain the whole or a large part of the true gene set. On the other hand, these simplifications significantly reduce the complexity from exponentialtime to polynomialtime.
Complexity estimation
In general, a biclustering problem is NPcomplete [11]. However, we have adopted a simple clustering algorithm and bicluster growing strategy to reduce the complexity. Given a matrix of size m × n, the complexity of obtaining the difference matrix is O(mn^{2}). The simple clustering algorithm applied on each column requires operations on the order of O(m^{2}) because it involves comparing the value of each element with the others and the centroids of the found clusters. In addition, the total number of clusters found would not exceed m. Therefore, the complexity in obtaining clusters in the difference matrix is O(m^{2}n^{2}) and the number of clusters is at most mn(n1)/2. The sorting of the clusters requires a complexity of O(mn^{2 }log mn^{2}). After that, each identified cluster is used as a seed to construct a bicluster. In the biclusters growing process, a seed is first checked if it has significant overlap with other identified biclusters for early termination. The overlapping in rows can be checked by sorting followed by elementwise comparison. The complexity is thus O(mlogm). For columns, as a seed has only two columns, the complexity is O(n). Note that the number of identified biclusters is bounded by the number of seeds. Thus, the complexity for checking overlaps in all identified biclusters is O(mn^{2 }(n + m log m)). If the seed is valid, a submatrix of the difference matrix is extracted as the secondlevel difference matrix. This step requires no arithmetic operations due to data reuse. Clustering and sorting procedure are then performed on this secondlevel difference matrix. As the matrix has n1 columns only, the clustering and the sorting processes need operations on the order of O(m^{2}n) and O(mn log(mn)), respectively. Note that there are at most (n1)m clusters detected in the secondlevel difference matrix. In the bicluster construction, row intersection is performed. In total, the complexity is O(m^{2}n log m). Finally, the new identified bicluster is validated (i.e. filtered) with respect to the number of columns and degree of overlap with other biclusters. The validation requires an additional complexity of O(mn^{2 }(m log m + n log n)). Among the operations for obtaining each biclusters from the firstlevel difference matrix, the validation step dominates. So the entire processing for bicluster formation from seeds is O(m^{2}n^{4 }(m log m + n log n)). Since this cost dominates all other costs in previous steps, our algorithm has a polynomialtime complexity of O(m^{2}n^{4 }(m log m + n log n)). The above estimation shows the worst case complexity, in which the validation process dominates. In practice, the number of biclusters is far less than mn(n  1)/2. Moreover, some of the validation steps can be avoided through early termination of invalid biclusters. Elimination of invalid biclusters reduces the number of potential biclusters and this in turn reduces the complexity inside the validation step.
Modification for multiplicative models
As seen in Figure 1(E), a multiplicativerelated bicluster is a bicluster in which any two rows are related by the same ratio in all the related columns or any two columns are related by the same ratio in all the related rows. In order to modify the proposed framework for multiplicative models, the difference matrix is replaced by a ratio matrix which is in the form of c_{i}/c_{j }or c_{j}/c_{i }for all the n(n  1)/2 distinct combinations between columns i and j where c_{k }represents the values in the kth column. In practice, we select the column which has the largest average magnitude as the denominator because quotient is sensitive to noise when the divisor is small. Thus, the major change for detecting multiplicativerelated biclusters is to replace the difference matrix by a ratio matrix. Note that the complexity for multiplicative models is essentially the same as that for additive models.
Interactive adjustment of noise threshold using PC plots
The setting of the noise threshold ε is important for the proposed algorithm as it balances the homogeneity requirement and the noise tolerance in the identified biclusters. The noise threshold is determined through visual inspection of the homogeneity of the detected biclusters in the PC plots [26]. The PC visualization for a data matrix embedded with biclusters can be found in Additional file 1. Consider a noisy 100 × 10 dataset which contains uniformly distributed values between 5 and 5 embedded with a 30 × 4 additiverelated bicluster shown in Figure 8. Furthermore, an additive Gaussian noise with variance of 0.2, which was chosen empirically for clear demonstration, was introduced. In this example, we varied the values of the noise threshold ε while fixing the values of the minimum number of rows N_{r}, the minimum number of columns N_{c }and the maximum overlap with other biclusters P_{o }to be 20, 4 and 20%, respectively. Figure 9 shows a bicluster found by our algorithm when ε is set to 1.2. The four columns are found correctly, however, three rows are missed. Figure 9(A) shows the four columns but with all the rows in the original data while Figure 9(B) shows the difference between the last three columns with respect to the first column. Figures 9(C) and 9(D) illustrate the inconsistency between the identified bicluster and the true bicluster using PC plots of expression values and difference values respectively in which the three missed rows are displayed in blue. We can see that these three rows are missed because the noise threshold is not large enough. In practice, since we do not know the bicluster in advance, we should adopt an exploratory approach for setting the parameter ε. Start with the current value of ε, we gradually increase ε while visualizing the bicluster using the PC plot. Initially, we would see more and more related rows being included into the bicluster. Then, at some point, unrelated rows start to creep into the bicluster. When this is observed in the PC plot, we stop increasing the noise threshold. Using this procedure, we found that when ε is set to 1.5, all the rows are correctly detected. This example shows that the PC plot can be a powerful visualization and interactive tool that allows us to examine the biclusters found.
Figure 8. The PC plot of the true additiverelated bicluster.
Figure 9. The PC plots of the bicluster identified using noise threshold of 1.2. The expression values of all rows in the four related columns and their difference between the last three columns and the first column are drawn using PC plots in Figures (A) and (B) respectively. In Figures (A) and (B), red color shows rows from the identified bicluster while blue color shows rows from the original dataset. Figures (C) and (D) illustrate the inconsistencies between the identified bicluster and the true bicluster in the four related columns and the column differences respectively. In Figures (C) and (D), the red color indicates rows of the true bicluster that are found by our algorithm while the blue color represents the three missed rows of the true bicluster.
Results and Discussion
Evaluation methods
We analyze the performance of our algorithm on both artificial datasets and a real dataset. For artificial datasets, biclusters information is known in advance. So accuracy in bicluster discovery can be measured using the overall match score [16]. The overall match score of a set of biclusters M_{1 }with respect to another set of biclusters M_{2 }is defined as,
where and are gene and condition match scores respectively. is calculated as,
where a bicluster with a subset of genes U_{i }and a subset of conditions V_{i }is denoted by (U_{i}, V_{i}). is defined similarly with U replaced by V. Let M be the set of detected biclusters and M_{t }be the set of true biclusters embedded in the artificial expression dataset. The overall match score S*(M, M_{t}) quantifies the average relevance of the detected biclusters to the true biclusters. Conversely, S*(M_{t}, M) measures the average recovery of the true biclusters in the detected biclusters. To unify the two measures into a single quantity for evaluation, their average is computed as the biclustering accuracy.
The performance of the proposed algorithm for artificial datasets has been compared with two existing algorithms with the additive model assumption, namely the Cheng and Church (C&C) algorithm [12] and the pCluster algorithm [25]. We considered the biclustering accuracy together with other measures such as number of biclusters, bicluster size and processing time. The programs for both algorithms are publicly available [27,28]. The proposed algorithm was implemented in a C MEXfile and ran in Matlab 6.5. All the experiments were conducted on the Window XP platform in a computer with 2.4 GHz Intel Pentium 4 CPU and 512 MB RAM. In identification of multiplicativerelated biclusters, since C&C algorithm and the pCluster algorithm are designed for additive models, logarithm operation was applied to the expression data so that the multiplicative models become additive models. For comparison, we also applied the proposed algorithm for additive models to the logarithm values. Henceforth, the proposed algorithm for additive models and multiplicative models will be referred to as PA and PM respectively while the proposed algorithm for additive models with the logarithm operation as preprocessing will be referred to as PAL.
The evaluation on real datasets was performed on three aspects: biological, homogeneity and statistical assessment. In the biological assessment, we used the Gene Ontology (GO) annotations [29] to determine the functional enrichment of biclusters. The measure was the percentage of overrepresented biclusters in one or more GO annotation. A bicluster is said to be overrepresented in a functional category if it gives a small pvalue. Given that a bicluster B with k genes is identified in a gene expression matrix with a gene set S of size N. For a functional category with C genes in S, the bicluster B possesses r genes. The pvalue is defined as the probability of choosing k genes from S with r genes in that category [30], i.e.,
In other words, the pvalue is the probability of including genes of a given category in a cluster by chance. Thus, the overrepresented bicluster is a cluster of genes which is very unlikely to be obtained randomly. The annotations consist of three ontologies, namely biological process, cellular component and molecular function.
For the homogeneity aspect, mean squared residue score (MSRS) [12] and average correlation value (ACV) [31] were computed. For an m × n bicluster, the MSRS is defined as
where a_{ij }is the value of the bicluster at position (i, j), is the average of the ith row, is the average of the jth column and is the overall average. ACV is defined by
where c_row_{ij }is the correlation coefficient between rows i and j and c_col_{pq }is the correlation coefficient between columns p and q. ACV is applicable to additive models as well as multiplicative models but the MSRS is valid only for additive models. In order to measure homogeneity of multiplicativerelated biclusters, logarithm was applied onto the expression values before calculating MSRS values so that a multiplicativerelated bicluster can be formulated using an additive model. In order to avoid confusion, the MSRS for the logarithm of expression values is denoted by MSRS_{l}. A bicluster with high homogeneity in expression levels should have a low MSRS/MSRS_{l }value but a high ACV value. The minimum value of MSRS/MSRS_{l }is zero while ACV has a maximum value of one.
The statistical properties of the biclustering results refer to quantities including the number of discovered biclusters and the bicluster size. Comparative studies were performed in the three aspects with several existing biclustering algorithms such as C&C, iterative signature algorithm (ISA) [32,33], orderpreserving submatrix (OPSM) approach [1] and xMotifs [34], which are available in [27]. In addition, the computational complexity of the proposed algorithm and other approaches is estimated using processing time as done for the artificial datasets. Despite the dependence of factors such as programming language and parameter settings, a rough comparison in complexity can still be achieved.
Datasets
Two types of artificial datasets were considered, one for the additive models and the other for the multiplicative models. The first type of dataset TD1 had a size of 200 rows by 40 columns. Uniformly distributed random values were first generated. Then four biclusters were embedded. Their details are as follows:
• bicluster A is a constant row bicluster of size 40 × 7;
• bicluster B is a constant row bicluster of size 25 × 10;
• bicluster C is a constant column bicluster of size 35 × 8; and
• bicluster D has coherent values related by additions of size 40 × 8.
Biclusters A and B have two columns in common but in different rows; bicluster B overlaps with bicluster C in five rows and three columns; biclusters C and D have one column in common but in different rows. Finally, Gaussian noise with different standard deviation (s.d.) was added to the dataset. At each nonzero noise level, five expression matrices were generated. Figure 10 shows the dataset TD1 with 4 embedded biclusters before noise was added.
Figure 10. The first type of dataset with four additiverelated biclusters before noise is added.
The second type of dataset TD2 consists of 60 × 15 positive values embedded with two 25 × 7 multiplicativerelated biclusters. The two biclusters overlap in two columns. A positivebiased Gaussian noise was added to the dataset so that all the values in the resultant datasets remained positive. The positivevalued dataset was essential for Cheng and Church algorithm, the pCluster algorithm and our proposed algorithm for the additive models PAL due to the use of the logarithm operation. It should be noted that the proposed algorithm for multiplicativerelated biclusters PM can be applied on datasets with negative values because no logarithm operation is needed. Figure 11 shows the dataset TD2 with two embedded multiplicative biclusters before noise was added. These two artificial datasets allowed us to test the performance of our algorithm in realistic situations as real expression data often involves various types of biclusters with overlaps (i.e. regulatory complexity) and noise.
Figure 11. The second type of dataset with two multiplicativerelated biclusters before noise is added.
The real dataset used was the yeast Saccharomyces cerevisiae cell cycle dataset as used in [12], which contains 2884 genes and 17 conditions. The nonmissing values were all nonnegative. As multiplicative models were also investigated, those zero nonmissing values were set to some small positive values. The missing values were filled with positive uniformly distributed random values to minimize the influence to our analysis.
Performance on artificial datasets
For the artificial datasets with additiverelated biclusters, biclusters with rows and columns more than or equal to 21 and 5 respectively were identified. It was further required that any detected bicluster cannot have more than 50% overlap with another bicluster simultaneously in the row and column dimensions. Since the Cheng and Church (C&C) algorithm and the pCluster algorithm cannot be directly configured to discover biclusters with all the given requirements, a postfiltering procedure was adopted to eliminate those invalid biclusters. The postfiltering parameters are provided in Table 1 together with the parameters of the biclustering algorithms. Note that parameters for noise tolerance (ε/δ) were determined for optimal performance under different noise levels. The biclustering accuracies are plotted against various noise levels in Figure 12. As can be seen, the proposed algorithm always has higher biclustering accuracy than C&C and the pCluster algorithm. For the expression dataset with noise of standard deviation at or below 0.1, we detect the four embedded biclusters perfectly. The pCluster algorithm did not attain perfect discovery even in the noisefree case because more than one maximal δpCluster (defined by equation (6)) exists for one or more column pair due to column overlap between some biclusters in the datasets [35]. In more noisy case such as when the noise s.d. is 0.5, the biclustering accuracy of our algorithm still has a high value of 0.89. In contrast, the accuracies of C&C and the pCluster algorithm are 0.70 and 0.26 respectively.
Table 1. Parameter settings for biclustering algorithms and postfiltering in the experiments on artificial datasets
Figure 12. Biclustering accuracy against noise level for additive models. The biclustering accuracies of the proposed algorithm, Cheng and Church algorithm and pCluster algorithm are represented by the curves 'Proposed', 'C&C' and 'pCluster' respectively.
Statistical properties of the biclustering results before filtering are given in Table 2. Unlike the pCluster algorithm, the number of biclusters identified by the proposed algorithm is insensitive to noise level. On average, there were 6.6 biclusters identified at the highest noise level which was close to the true number 4. For the pCluster algorithm, a large number of biclusters with high overlap were detected under noisy situation. The postfiltering procedure was therefore necessary for the pCluster algorithm to extract the significant biclusters. The number of biclusters identified by C&C was 40 which is the same as that specified in its parameter setting. In fact, this parameter setting was necessary to acquire high biclustering accuracy. With respect to the biclusters size, the proposed algorithm shows the closest agreement to those embedded in the datasets. The average numbers of rows and columns in the biclustering results are always around 34 and 7.8 respectively while the actual average numbers of rows and columns are 35 and 8.25 respectively. The pCluster algorithm also produced good results. C&C gave the worst performance as it does not allow any constraints to be imposed on the biclusters dimensions. Therefore, the postfiltering procedure is essential for C&C to find the embedded biclusters.
Table 2. Statistical properties of biclustering results for the artificial datasets embedded with additiverelated biclusters before postfiltering
For the datasets with two multiplicativerelated biclusters, a bicluster was considered to be valid if its size is no smaller than 18 and 4 in row and column dimensions respectively and the overlap with other valid biclusters is less than or equal to 25%. The settings for the biclustering algorithms and the postfiltering procedure are also included in Table 1. The biclustering accuracies of the proposed algorithms PAL and PM, together with C&C and the pCluster algorithm (applied on log values) at various noise levels is shown in Figure 13. At all the noise levels, our two proposed algorithms outperform C&C and the pCluster algorithm. Both PAL and PM can exactly detect the true biclusters in the noisefree case while the other two algorithms fail to do so. In particular, the failure of perfect discovery in the pCluster algorithm can be attributed to the column overlap in the datasets. The performance of PM is slightly better than that of PAL in general. The biclustering accuracy decreases when the noise level increases except in the case of C&C when noise level changes from 0.4 to 0.5. It was probably because outlier is less likely to be included in biclusters at high noise levels. In terms of the statistical properties given in Table 3, the two proposed algorithms exhibit closest match to the true embedded biclusters, with PM performs slightly better than PAL. Similar to the case of the additive models, the proposed algorithms can return more reasonable number of biclusters with similar dimensions to those embedded than the other two algorithms without any postfiltering procedure.
Figure 13. Biclustering accuracy against noise level for multiplicative models. The biclustering accuracy of the proposed algorithms for multiplicative models is denoted by 'PM' while the proposed algorithms for additive models, Cheng and Church algorithm and pCluster algorithm on logarithm of expression data are labelled by 'PAL', 'C&C (log)' and 'pCluster (log)' respectively.
Table 3. Statistical properties of biclustering results for the artificial datasets embedded with multiplicativerelated biclusters before postfiltering
In order to justify the efficiency of our proposed algorithms, processing time for the artificial datasets with noise s.d. of 0.3 was measured and provided in Table 4. The proposed algorithm PA required an average of 3.8 sec for the artificial datasets with additiverelated biclusters. This showed substantial improvement over the pCluster algorithm which needed 2716 sec to finish. The reduction in computational complexity is achieved by the bicluster growing strategy in which similar patterns in columnpair are combined to form biclusters through row intersection. The proposed algorithm is also more efficient than C&C by a factor of 4.5. For datasets embedded with multiplicativerelated biclusters, the matrices sizes are smaller than those used for additivemodel experiments so less processing time was obtained in all the algorithms. However, it can be seen that the proposed approach PM has the lowest computational complexity. The average processing time was 0.0232, 1 and 105.2 sec for PM, C&C and the pCluster algorithm respectively. In conclusion, the results on artificial datasets demonstrate that our proposed algorithms have high accuracy in detecting additiverelated and multiplicativerelated biclusters, even in the presence of overlap and noise contamination. The computational complexity of the proposed algorithms is lower than several biclustering algorithms with similar model assumption.
Table 4. Average processing time for the artificial datasets
Performance on a real dataset
Experiments have been conducted on the yeast cell cycle dataset using the proposed algorithms and Cheng and Church (C&C) algorithm [12], iterative signature algorithm (ISA) [32,33], orderpreserving submatrix (OPSM) approach [1] and xMotifs [34]. Postfiltering was applied to the biclustering results in order to eliminate insignificant biclusters as well as impose common constraints for comparison. The parameter settings of various algorithms and postfiltering are provided in Table 5. These values were selected based on the guideline in [16] and our experimental work. The functional enrichment was studied over a number of upper bounds on pvalue, p_{0 }and illustrated in Figure 14. Compared with C&C which possesses the same model assumption as the proposed algorithm for additive model (PA), higher percentage of functionallyenriched biclusters were identified by the proposed algorithm at p_{0 }≥ 5 × 10^{4}, 1 × 10^{2 }and 5 × 10^{3 }in the biological process, cellular component and molecular function ontologies respectively. In particular, at p_{0 }= 1 × 10^{2}, the percentage of functionallyenriched biclusters found by PA is 96.0%, 88.0% and 80.0% which correspond to an improvement of 18.6%, 13.8% and 9.0% to C&C in the biological process, cellular component and molecular function ontologies, respectively. At the lowest value of p_{0 }= 1 × 10^{5}, our proposed algorithm PA outperforms C&C in the cellular component ontology but not in the other two ontologies. However, the reduction in the percentage of functionallyenriched biclusters is less than 7.5% in the biological process ontology and 2.5% in the molecular function ontology, which is relatively small compared with the improvement at the large values of p_{0}. The homogeneous analysis provided in Table 6 shows that the biclusters identified by PA are more homogeneous than C&C with the average MSRS lower by 142.4 and the average ACV higher by 0.0199. From the statistical results in Table 7, it can be found that PA can also avoid identification of very large bicluster as is in the case of C&C. The largest bicluster size found using PA is 597 × 17 while that found using C&C is 1391 × 17.
Table 5. Parameter settings of algorithms and postprocessing investigated in experiments based on the yeast dataset
Table 6. Homogeneity comparison of biclusters identified in the yeast cellcycle dataset using various algorithms
Table 7. Statistical comparison of biclusters identified in the yeast cellcycle dataset using various algorithms
Figure 14. Percentage of additiverelated biclusters enriched with GO annotations of different ontologies. (A) Biological process ontology. (B) Cellular component ontology. (C) Molecular function ontology.
When multiplicative model is concerned, i.e. the proposed algorithm for multiplicative model (PM) and C&C applied on log value (C&C (log)), the functional enrichment drops in general. At first glance, PM gives poorer performance in term of functional enrichment. Nonetheless, if the number of identified biclusters is also considered, PM actually outperforms C&C (log) by identifying more significant biclusters. The total number of biclusters identified by PM was 59 but CC only found 5 biclusters. In addition, the biclusters identified by PM exhibit higher homogeneity. The average values of MSRS_{l }and ACV are 9.573 × 10^{3 }and 0.9219 for PM respectively. In comparison, the average values of MSRS_{l }and ACV are 6.262 × 10^{2 }and 0.5740 for the C&C (log) respectively.
In addition to C&C based algorithms, Figure 14 shows the comparative results of ISA, OPSM and xMotifs for different values of p_{0}. Although OPSM shows high percentage of functionallyenriched biclusters at large values of p_{0}, there are only two biclusters found which are far from expectation. Thus, the proposed algorithms actually identify more functionallyenriched biclusters. Also, the percentage of functionallyenriched biclusters of OPSM drops to zero at low values of p_{0}. At low values of p_{0}, the results of ISA are the best in most cases. For p_{0 }≥ 5 × 10^{4}, the performance of the proposed algorithm PA, however, is close to or even better than that of ISA. For both OPSM and ISA, the identified biclusters are less homogeneous in terms of average MSRS and ACV because their bicluster models are different from those studied in this paper. PA and PM show better performance than xMotifs in the percentage of functionallyenriched biclusters despite that our algorithms have lower average value of ACV. The reason is that xMotifs is designed to find biclusters with coherent state in each gene, which is only a subclass of additive models. The homogeneity analysis suggests that the difference in biological relevance of identified biclusters between various algorithms such as the proposed algorithm PA and ISA is not merely due to implementation architecture but also due to the model assumption.
In addition to the identification of biologicallysignificant biclusters, the efficiency of the proposed algorithm is justified by the processing time provided in Table 8. PA and PM require 0.72 and 1.35 sec respectively to finish. The results are the best and show improvement by a factor of at least 23.7 compared with the others. This implies that our algorithms have low computational complexity.
Table 8. Processing time for the yeast cellcycle expression dataset using various biclustering algorithms
Details of annotation results of the proposed algorithms PA and PM are shown in Tables 9, 10, 11 and Tables 12, 13, 14 at pvalue < 0.001 respectively. In these tables, Bonferroni correction of pvalue which adjusts the probability of random annotation for multiple tests [30] is provided. Consideration of the corrected pvalue is important when multiple terms are tested for annotation in a single bicluster. The 4th additiverelated bicluster identified by PA has the lowest pvalue in all the three ontologies. For the biological process ontology, 69 out of 201 genes are assigned to category "translation" at pvalue of 6.92 × 10^{44}. The annotation is also significant after multiple test correction as it has a low corrected pvalue of 8.51 × 10^{42}. For the cellular component ontology, 33 out of 201 genes are annotated with category "cytosolic small ribosomal subunit (sensu Eukaryota)" at pvalue of 7.45 × 10^{29 }(corrected pvalue of 5.29 × 10^{27}). For the molecular function ontology, 66 out 201 genes are associated with category "structural constituent of ribosome" at pvalue of 7.61 × 10^{47 }(corrected pvalue of 3.96 × 10^{45}). For the multiplicative model, the 24th bicluster found by PM exhibits the lowest pvalue in all the three ontologies. In fact, there are 141 genes shared between the biclusters with the lowest pvalue identified by PA and PM, which correspond to 70.15% of genes in the bicluster identified by PA. As a result, the 24th bicluster identified by PM are annotated with the similar categories as the 4th bicluster found by PA in all the three ontologies. The annotations are also overrepresented in the bicluster as found in the experiments using PA except that the cellular component category with the lowest pvalue is "cytosolic large ribosomal subunit (sensu Eukaryota)". The pvalues are 6.16 × 10^{69}, 1.11 × 10^{45 }and 3.69 × 10^{75 }(corrected pvalues of 7.51 × 10^{67}, 7.07 × 10^{44 }and 1.88 × 10^{73}) while out of 225 genes there are 91, 46 and 88 genes annotated in the biological process, cellular component and molecular function categories respectively.
Table 9. Annotations of biological process ontology for biclusters identified by the proposed algorithm for additive models at pvalue < 0.001.
Table 10. Annotations of cellular component ontology for biclusters identified by the proposed algorithm for additive models at pvalue < 0.001.
Table 11. Annotations of molecular function ontology for biclusters identified by the proposed algorithm for additive models at pvalue < 0.001.
Table 12. Annotations of biological process ontology for biclusters identified by the proposed algorithm for multiplicative models at pvalue < 0.001.
Table 13. Annotations of cellular component ontology for biclusters identified by the proposed algorithm for multiplicative models at pvalue < 0.001.
Table 14. Annotations of molecular function ontology for biclusters identified by the proposed algorithm for multiplicative models at pvalue < 0.001.
The experiments on the real dataset show that our proposed algorithms PA and PM can identify biclusters with high biological relevance efficiently. Furthermore, PA can always give a reasonable number of biclusters, and with a good degree of homogeneity. Although GO annotation only provides descriptions currently known in the biological community, the results still give a reasonable indication of performance. Furthermore, the biclusters which have no GO terms assigned should be investigated for any new biological discoveries.
Determination of biclusters homogeneity
In previous experiments, the homogeneity parameter, i.e. noise threshold ε of our algorithms is determined empirically. In fact, the aforementioned exploratory approach based on the PC plots can be employed to determine this parameter in an interactive manner for a given dataset. This exploratory approach uses an assumption that the homogeneity decreases monotonically with ε while the biclustering accuracy is a concave function of ε. To see this, we apply the proposed algorithm for additive models to artificial datasets with noise s.d. of 0.3 over a wide range of ε. Figure 15 shows the graphs of biclustering accuracy and ACV against ε. The biclustering accuracy first rises rapidly to its maximum value when ε changes from 0.5 to 1. The biclustering accuracy then decreases slightly until ε becomes 1.75. A steeper drop is found when ε is larger than 1.75. In other words, the biclustering accuracy is approximately concave with respect to ε. On the other hand, when ε increases, the average ACV of detected biclusters decreases as expected. From the graph, it can be observed that the ACV decreases faster when ε exceeds 1.25. Meanwhile, the biclustering accuracy remains high for ε between 1 and 1.25. These observations support the use of the interactive approach for parameter determination.
Figure 15. Biclustering accuracy (solid line) and the ACV (dashed line) against noise threshold ε.
Conclusion
In this paper, a novel biclustering algorithm for additive models is proposed. First, we performed analysis on the difference matrix computed from a gene expression matrix. It was shown that the columnwise differences of an additiverelated bicluster appear as clusters in each corresponding column in the difference matrix. Similarly, clusters can be found from the columnwise ratios calculated from multiplicativerelated biclusters. These observations were then explored to construct biclusters greedily from the clustering results in columnwise differences or ratios in the proposed algorithms.
The proposed algorithms have been analyzed by comparing with pCluster algorithm. The results suggest that the proposed algorithms can be regarded as a greedy version of the pCluster algorithm. The biclusters found by the proposed algorithms can be expressed as δpClusters but clustering density is utilized in pattern discovery. Although the identified δpClusters is not guaranteed to be maximal, the proposed algorithm is much more efficient. Experiments showed that the computational time of the proposed algorithms is lower than that of the pCluster algorithm by a factor of hundreds or more. Moreover, we have verified that the worst case complexity of the proposed algorithms is polynomialtime instead of exponentialtime as in the case of the pCluster algorithm or other δpCluster based approaches.
The robustness of our algorithms to noise and regulatory complexity has been verified empirically using artificial datasets. It was found that our algorithm is capable of discovering overlapping biclusters under noisy condition. Biological significance of biclustering results has been verified on the yeast cellcycle dataset using Gene Ontology annotations. Comparative study shows that the proposed algorithm is the best or close to be the best one among several existing algorithms in terms of the percentage and the number of functionallyenriched biclusters for pvalues below a range of value from 5 × 10^{3 }to 5 × 10^{2}. In particular, there are 96.0%, 88.0% and 80.0% of the biclusters annotated with pvalue below 0.01. The proposed algorithm can identify biclusters with less deviation from the additive models. The identified biclusters also have reasonable size ranged from 10 to 597 genes and 5 to 17 conditions. Comparison in processing time suggests that the proposed algorithm has the highest efficiency.
In the proposed algorithm, the noise threshold is a crucial parameter as it balances the homogeneity requirement and the noise tolerance in the identified biclusters. In order to determine an appropriate value for the noise threshold, an exploratory approach based on the PC plots is adopted. We believe that the proposed biclustering algorithm and the interactive PC plots offer an effective data analysis tool for gene expression data. In future, our research will be focused on detecting bicluster types other than additive or multiplicative models, e.g. biclusters of coherent evolution.
Availability and requirements
Project home page: http://www.eie.polyu.edu.hk/~nflaw/Biclustering/index.html webcite.
Operating system: Window XP
Programming language: Matlab 6.5 or above
License: Free for academic use. For nonacademic use, please contact the author.
Authors' contributions
KOC worked on the proposed biclustering algorithm, implementation and experimental analysis. NFL formulated the biclustering problem and designed the algorithm. WCS initiated the project. AWCL worked on the experimental analysis. All authors read and approved the final manuscript.
Acknowledgements
The authors thank the anonymous reviewers for their constructive comments. This work is supported by the Centre for Multimedia Signal Processing, Department of Electronic and Information Engineering and the Hong Kong Polytechnic University (APA2P). K.O. Cheng acknowledges the research studentships provided by the University.
References

BenDor A, Chor B, Karp R, Yakhini Z: Discovering local structure in gene expression data: the orderpreserving submatrix problem.
Journal of Computational Biology 2003, 10(3–4):373384. PubMed Abstract  Publisher Full Text

Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray.
Science 1995, 270:467470. PubMed Abstract  Publisher Full Text

Lockhart DJ, Winzeler EA: Genomics, gene expression and DNA arrays.
Nature 2000, 405:827836. PubMed Abstract  Publisher Full Text

Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture.
Nature Genetics 1999, 22:281285. PubMed Abstract  Publisher Full Text

Raychaudhuri S, Sutphin PD, Chang JT, Altman RB: Basic microarray analysis: grouping and feature reduction.
Trends in Biotechnology 2001, 19:189193. PubMed Abstract  Publisher Full Text

Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide array.
Proceedings of the National Academy of Sciences of the United States of America 1999, 96(12):67456750. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genomewide expression patterns.
Proceedings of the National Academy of Sciences of the United States of America 1998, 95:1486314868. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Shamir R, Sharan R: Click: a clustering algorithm for gene Expression analysis. In Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology. AAAIPress; 2000:307316.

Wu S, Liew AWC, Yan H: Cluster Analysis of Gene Expression Data Based on SelfSplitting and Merging Competitive Learning.
IEEE Transactions on Information Technology in Biomedicine 2004, 8(1):515. PubMed Abstract

Szeto LK, Liew AWC, Yan H, Tang SS: Gene Expression data clustering and visualization based on a binary hierarchical clustering framework.
Special issue on Biomedical Visualization for Bioinformatics, Journal of Visual Languages and Computing 2003, 14:341362.

Madeira SC, Oliveira AL: Biclustering algorithms for biological data analysis: a survey.
IEEE/ACM Trans Comput Biol Bioinform 2004, 1(1):2445. PubMed Abstract  Publisher Full Text

Cheng Y, Church GM: Biclustering of expression data.
Proceedings of 8th International Conference on Intelligent Systems for Molecular Biology 2000, 93103.

Yoon S, Nardini C, Benini L, Micheli GD: Discovering coherent biclusters from gene expression data using zerosuppressed binary decision diagrams.
IEEE/ACM Trans Comput Biol Bioinform 2005, 2(4):339354. PubMed Abstract  Publisher Full Text

Zhao H, Liew AWC, Xie X, Yan H: A new geometric biclustering algorithm based on the Hough transform for analysis of largescale microarray data.
Journal of Theoretical Biology 2008, 251(2):264274. PubMed Abstract  Publisher Full Text

Gan X, Liew AW, Yan H: Discovering biclusters in gene expression data based on highdimensional linear geometries.
BMC Bioinformatics 2008, 9:209.
accepted
PubMed Abstract  Publisher Full Text 
Prelic A, Bleuler S, Zimmermann P, Wille A, Buhlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E: A Systematic comparison and evaluation of biclustering methods for gene expression data.
Bioinformatics 2006, 22(9):11221129. PubMed Abstract  Publisher Full Text

Inselberg A, Dimsdale B: Parallel coordinates: a tool for visualizing multidimensional geometry.

Wegman EJ: Hyperdimensional data analysis using parallel coordinates.
Journal of the American Statistical Association 1990, 85(411):664675.

Peng W, Ward MO, Rundensteiner EA: Clutter reduction in multidimensional data visualization using dimension reordering.
Proceedings of IEEE Symposium on Information Visualization 2004, 8996.

Ericson D, Johansson J, Cooper M: Visual data analysis using tracked statistical measures within parallel coordinate representations.
Proceedings of the 3rd IEEE International Conference on Coordinated & Multiple Views in Exploratory Visualization 2005, 4253.

Yang J, Ward MO, Rundensteiner EA: Interactive hierarchical displays: a general framework for visualization and exploration of large multivariate data sets.

Prasad TV, Ahson SI: Visualization of Microarray Gene Expression Data.
Bioinformation 2006, 1:141145. PubMed Abstract  PubMed Central Full Text

Craig P, Kennedy J: Coordinated graph and scatterplot views for the visual exploration of microarray timeseries data.
Proceedings of IEEE Symposium on Information Visualization 2003, 173180.

Hochheiser H, Baehrecke EH, Mount SM, Shneiderman B: Dynamic querying for pattern identification in microarray and genomic data.
Proceedings of IEEE International Conference on Multimedia and Expo 2003, 3:453456.

Wang H, Wang W, Yang J, Yu PS: Clustering by pattern similarity in large data sets.
Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data 2002, 394405.

Cheng KO, Law NF, Siu WC, Lau TH: BiVisu: software tool for bicluster detection and visualization.
Bioinformatics 2007, 23(17):23422344. PubMed Abstract  Publisher Full Text

BicAT (Biclustering Analysis Toolbox) [http://www.tik.ee.ethz.ch/sop/bicat/] webcite
2006.

Clustering by Pattern Similarity: the pCluster Algorithm [http://wis.cs.ucla.edu/~hxwang/proj/delta.html] webcite
2002.

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, IsselTarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene Ontology: tool for the unification of biology.
Nature Genetics 2000, 25:2529. PubMed Abstract  Publisher Full Text

CastilloDavis CI, Hartl DL: GeneMerge – postgenomic analysis, data mining, and hypothesis testing.
Bioinformatics 2003, 19(7):891892. PubMed Abstract  Publisher Full Text

Teng L, Chan LW: Biclustering gene expression profiles by alternately sorting with weighted correlated coefficient.
Proceedings of IEEE International Workshop on Machine Learning for Signal Processing 2006, 289294.

Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N: Revealing modular organization in the yeast transcriptional network.
Nature Genetics 2002, 31:370377. PubMed Abstract  Publisher Full Text

Ihmels J, Bergmann S, Barkai N: Defining transcription modules using largescale gene expression data.
Bioinformatics 2004, 20:19932003. PubMed Abstract  Publisher Full Text

Murali TM, Kasif S: Extracting conserved gene expression motifs from gene expression data.
Pac Symp Biocomput 2003, 7788. PubMed Abstract

Yoon S, Nardini C, Benini L, Micheli GD: Enhanced pClustering and its applications to gene expression data.
Proceedings of the Fourth IEEE Symposium on Bioinformatics and Bioengineering 2004, 275282.