Department of Computer Science, King's College London, UK

School of Information & Communication Technology, Griffith University, Queensland, Australia

Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China

School of Electrical & Information Engineering, University of Sydney, NSW 2006, Australia

Abstract

Background

In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. In many situations, however, a subset of genes exhibits a consistent pattern only over a subset of conditions. Conventional clustering algorithms, which operate on entire rows or columns of an expression matrix, would therefore fail to detect these useful patterns in the data. Biclustering has recently been proposed to detect a subset of genes exhibiting a consistent pattern over a subset of conditions. However, most existing biclustering algorithms search for sub-matrices within a data matrix by optimizing heuristically defined merit functions, and most of them can detect only a restricted set of bicluster patterns.

Results

In this paper, we present a novel geometric perspective for the biclustering problem. The biclustering process is interpreted as the detection of linear geometries in a high dimensional data space. This perspective views biclusters with different patterns as hyperplanes in a high dimensional space, and allows us to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. The geometric viewpoint also inspires us to propose a generic bicluster pattern, i.e. the linear coherent model, which unifies the seemingly incompatible additive and multiplicative bicluster models. As a particular realization of our framework, we have implemented a Hough transform-based hyperplane detection algorithm. Experimental results on a human lymphoma gene expression dataset show that our algorithm can find biologically significant subsets of genes.

Conclusion

We have proposed a novel geometric interpretation of the biclustering problem. We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high dimensional data space. An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis.

Background

In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification

When a subset of genes shares similar transcriptional characteristics only across a subset of measures, the conventional algorithm may fail to uncover useful information between them. In Fig.

**An illustrative example where conventional clustering fails but biclustering works:** (a) A data matrix, which appears random visually even after hierarchical clustering. (b) A hidden pattern embedded in the data would be uncovered if we permute the rows or columns appropriately.

The hidden pattern in Fig.

**Examples of different bicluster patterns:** (a) constant values, (b) constant rows, (c) constant columns, (d) additive coherent values, (e) multiplicative coherent values, and (f) linear coherent values.
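The six patterns can be illustrated with a small numerical sketch. The helper below is hypothetical and not part of the paper: the names `mu`, `alpha`, `beta` and `offset` are assumptions, noise is omitted, and the linear coherent pattern is generated under the assumption that each column is an affine function of a shared row profile.

```python
import numpy as np

def bicluster_patterns(n_rows=4, n_cols=3, seed=0):
    """Return noise-free examples of the six bicluster patterns (a) to (f)."""
    rng = np.random.default_rng(seed)
    mu = 1.0                                            # typical value of the bicluster
    alpha = rng.uniform(0.5, 2.0, size=(n_rows, 1))     # one effect per row (gene)
    beta = rng.uniform(0.5, 2.0, size=(1, n_cols))      # one effect per column (condition)
    offset = rng.uniform(-1.0, 1.0, size=(1, n_cols))   # one intercept per column
    ones = np.ones((n_rows, n_cols))
    return {
        "constant": mu * ones,                  # (a) every entry equal
        "constant_rows": mu + alpha * ones,     # (b) each row is constant
        "constant_columns": mu + beta * ones,   # (c) each column is constant
        "additive": mu + alpha + beta,          # (d) columns differ by a shift
        "multiplicative": mu * alpha * beta,    # (e) columns differ by a scale
        "linear": alpha * beta + offset,        # (f) columns differ by scale and shift
    }
```

In every pattern, any two columns of the returned matrix are related by a linear equation, which is the property the geometric view relies on.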

In this work, we deal with numerical biclusters only. There are also works

Previous work on biclustering

Throughout the paper, we use ^{N × M }to denote a gene expression data matrix with _{i }∈ ℜ^{1 × M }represents the expression of the gene

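A bicluster of constant values is the simplest type. It can be modeled as

$$a_{ij} = \mu_{IJ} + \varepsilon_{ij}$$

where $a_{ij}$ is the expression value of gene $i$ under condition $j$ within the bicluster, $\mu_{IJ}$ is the typical value of the bicluster and $\varepsilon_{ij}$ is the noise term.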

If the noise is additive, a bicluster of constant rows can be modeled as

$$a_{ij} = \mu_{IJ} + \alpha_i + \varepsilon_{ij}$$

where _{i }is the _{i }for each row of a bicluster. However, for an accurate estimate of _{i}, we need to know the location of a bicluster, which is exactly the problem we need to solve. The noise _{i}. Instead of relying on data normalization, Califano

and then add additional rows or columns into it to produce a bicluster that is as large as possible. Sheng

A bicluster of additive coherent values with additive noise can be modeled as

$$a_{ij} = \mu_{IJ} + \alpha_i + \beta_j + \varepsilon_{ij}$$

Cheng and Church

A bicluster of multiplicative coherent values with additive noise can be modeled as

$$a_{ij} = \mu_{IJ} \times \alpha_i \times \beta_j + \varepsilon_{ij}$$

Kluger

Madeira and Oliveira

Although the classification into additive or multiplicative patterns is not perfect, it is nevertheless applicable to many existing biclustering algorithms, which can all be formulated using the general linear model proposed in this paper. In fact, in most biclustering algorithms that deal with expression values only, the underlying theme is the coherency in expression values within the biclusters. Our general linear model of Fig.
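A short derivation may help to show why a linear model can cover both cases. It is a sketch under stated assumptions: noise is ignored, and the symbols $a_{ij}$ (value of gene $i$ under condition $j$), $\mu$, $\alpha_i$, $\beta_j$, $k_{jj'}$ and $c_{jj'}$ are notational choices made here for illustration. For any two conditions $j$ and $j'$ of a bicluster:

```latex
\begin{align*}
\text{additive:} \quad
  & a_{ij} = \mu + \alpha_i + \beta_j
  \;\Rightarrow\; a_{ij} = a_{ij'} + (\beta_j - \beta_{j'}), \\
\text{multiplicative:} \quad
  & a_{ij} = \mu\,\alpha_i\,\beta_j
  \;\Rightarrow\; a_{ij} = \frac{\beta_j}{\beta_{j'}}\,a_{ij'}
  \quad (\beta_{j'} \neq 0), \\
\text{linear coherent:} \quad
  & a_{ij} = k_{jj'}\,a_{ij'} + c_{jj'} .
\end{align*}
```

The additive case is the linear relation with $k_{jj'} = 1$, and the multiplicative case is the linear relation with $c_{jj'} = 0$.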

A high-dimensional geometric method for biclustering

As pointed out in

In this paper, we extend our previous work

Results

We tested our algorithm using a synthetic dataset and a human lymphoma dataset. For the synthetic dataset, we use a test model proposed in

Synthetic dataset

We generated a synthetic dataset containing four overlapping biclusters of constant columns, constant rows, and multiplicative coherent values, and tested the ability of our approach to detect these patterns simultaneously. To test noise resistance of our method, we embedded the biclusters into a noisy background generated by a uniform distribution

**A synthetic dataset with multiple overlapping biclusters of different patterns and the biclusters extracted using the proposed method.** (a) The data matrix before random row and column permutation, (b) bicluster 1 of constant rows, (c) bicluster 2 of constant columns, (d) bicluster 3 of constant columns, (e) bicluster 4 of multiplicative coherent values, (f) the extra bicluster extracted by the proposed method, and (g) the multiplicative coefficients of each row in bicluster 4.

In this experiment, the three biclusters contain additive coherent values, and both the Gibbs sampling method

Biological Data: Human Lymphoma Dataset

We apply our algorithm to the lymphoma dataset

We compare our algorithm with six existing algorithms, i.e., OPSM

The histogram in Fig.

**Proportion of biclusters significantly enriched by a GO Biological Process category for the six selected biclustering methods.** The columns are grouped method-wise, and different bars within a group represent the results obtained for five different significance levels

Our method is also capable of detecting biclusters with general linear coherent values. Fig. 5a shows one such bicluster. Denoting its conditions by $F_0, F_1, \ldots, F_{10}$, the pattern of this bicluster can be expressed as $F_0 = 0.57F_1 - 0.08 = 0.38F_2 - 0.24 = 0.27F_3 - 0.15 = 0.36F_4 - 0.26 = 0.36F_5 - 0.27 = 0.30F_6 - 0.25 = 0.37F_7 - 0.22 = 0.28F_8 - 0.27 = 0.27F_9 - 0.28 = 0.22F_{10} - 0.29$. The detailed results from the GOTermFinder at a significance level of 5% are provided in Fig.

**Biclusters detected in the lymphoma dataset.** (a) A bicluster of linear coherent values detected by our algorithm (for the full size image, please see Additional File

**The GO-based evaluation for the bicluster of Fig. 5a using the GOTermFinder.** The upper table is from the biological process ontology; the middle table is from the molecular function ontology; and the lower table is from the cellular component ontology.

In Additional File

Information for additive biclusters detection on the Human Lymphoma Dataset. The parameters used in the proposed biclustering algorithm for the Human Lymphoma Dataset are given.

All detected biclusters. A list of all biclusters, in which 1 marks a gene or array covered by the bicluster and 0 marks one that is not covered.

GO annotation of six selected biclusters. The expression heat map and GO annotation table of six biclusters are given here.

All detected biclusters with full data. All the detected biclusters with full data are given here.

A bicluster of linear coherent values in the lymphoma dataset. A full size image showing the linear coherent bicluster detected.

Conclusion

We analyzed the different types of numerical biclusters and proposed a general linear coherent bicluster model that captures the zero- and first-order coherent relationships within a bicluster. We then presented a novel interpretation of the biclustering problem in terms of the geometric distribution of data points in a high dimensional data space. From this perspective, the biclustering problem becomes that of detecting structures of known linear geometries, i.e., hyperplanes, in the high dimensional data space. We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in this space. This perspective allows us to perform biclustering geometrically using a hyperplane detection algorithm. Experimental results on both synthetic and real gene expression datasets demonstrate that our algorithm can detect several bicluster patterns simultaneously and can find biologically significant subsets of genes.

Method

Although the six patterns in Fig.

When a pattern is embedded in a larger data matrix with extra measurements, i.e., a bicluster that covers only part of the measurements in the data, the points or lines defined by the bicluster would sweep out a hyperplane in a high dimensional data space. Assume that we have a three-measurement experiment with the measurements denoted by

_{0 }+ _{1}_{3}

where _{i}, (_{2}_{2 }= 0. The coordinates that appeared in Eq. (4) denote the measurements the bicluster covers, and the points on the plane denote the objects or genes in that bicluster. In Fig.

**If we visualize the data in Fig. 1a in a high-dimensional space, the hidden pattern stands out.** Due to the difficulties in visualizing data beyond 3D, we only select columns 32, 41 and 45 in Fig. 1a to form a new data matrix with a 2-column bicluster embedded inside. In this figure, there exists an obvious plane, which provides clues about the hidden bicluster in the data.

In general, different bicluster patterns discussed above can be uniquely defined by specific geometric structures (lines, planes or hyperplanes) in a high dimensional data space. In a 3D space, if we denote the three measurements as

**Different geometries (lines or planes) in the 3D data space for corresponding bicluster patterns.** In each table, the shaded columns are covered by a bicluster. (a) A bicluster with constant values: represented by one of the lines that are parallel to the

Based on the geometric perspective discussed above, we propose a geometric gene expression biclustering framework that involves the following two steps. First, we detect the hyperplanes that exist in the gene expression data. Then we analyze whether a required pattern exists for the genes that lie in these hyperplanes.

A powerful technique for line detection in noisy 2-D images and for plane detection in noisy 3-D data called the Hough transform (HT)

However, it may be difficult to use the standard HT for more than 3 dimensions because of the large computational complexity and storage requirement. In this work, we use the Fast Hough transform (FHT)

Plane detection using the fast Hough transform

We use {_{0}, _{1}... _{M-1}} to denote the coordinates of _{0}(_{1}(_{M-1}(

In a 2-D space, a line can be described by

where (

is used for lines with |

Suppose that among all the observed data [_{0}(_{1}(_{M-1}(

where {_{0}, _{1}, ..., _{M-1}} are coordinates of points in observed data space and {_{1}, _{2}, ..., _{M}} are

We find that the parameters {_{1}, _{2}, ..., _{M}} are given by the intersection of many hyperplanes given by Eq. (8).

Suppose that we know the initial ranges of value {_{1}, _{2}, ..., _{M}} are centered at {_{1}, _{2}, ..., _{M}} and with half-length {_{1}, _{2}, ..., _{M}}. We can divide these ranges into very small "array accumulators" so that each array accumulator can determine a unique array of values {_{1}, _{2}, ..., _{M}} within the acceptable tolerance. According to Eq. (8), one feature point in the observed signal space is mapped into many points (e.g., hyperplanes) in the parameter space. An accumulator in the parameter space containing many mapped points (e.g., the intersection of many hyperplanes) reveals the potential feature of interest.

According to the above analysis, the FHT-based plane detection method comprises three parts. First, we need a hyperplane formulation as in Eq. (8). Second, we divide the parameter space into accumulators that are small enough to satisfy the desired resolution. Third, every point in the observed data votes for the accumulators it supports. If the number of votes an accumulator receives exceeds a selected threshold, we detect a hyperplane in the observed data space as given by Eq. (7), whose parameters are given by the accumulator. We now describe each part of the algorithm in detail.
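The three parts can be illustrated in the simplest setting, a flat (non-hierarchical) Hough transform for lines in 2-D. This is a minimal sketch and not the paper's implementation: it assumes the slope-intercept form y = m*x + c, a regular grid of accumulators, and a hypothetical function name `hough_lines`.

```python
import numpy as np

def hough_lines(points, slopes, intercepts, min_votes):
    """Detect lines y = m*x + c by voting on a regular (m, c) accumulator grid."""
    points = np.asarray(points, dtype=float)
    slopes = np.asarray(slopes, dtype=float)
    intercepts = np.asarray(intercepts, dtype=float)
    step = intercepts[1] - intercepts[0]            # accumulator width along c
    votes = np.zeros((len(slopes), len(intercepts)), dtype=int)
    for x, y in points:
        # A data point maps to the line c = y - m*x in the parameter space.
        c = y - slopes * x
        col = np.rint((c - intercepts[0]) / step).astype(int)
        inside = (col >= 0) & (col < len(intercepts))
        # The point casts one vote per slope bin whose intercept bin is in range.
        votes[np.nonzero(inside)[0], col[inside]] += 1
    hits = np.argwhere(votes >= min_votes)          # accumulators above the threshold
    return [(slopes[i], intercepts[j], votes[i, j]) for i, j in hits]
```

The memory needed by such a flat grid grows exponentially with the number of parameters, which is the motivation for the hierarchical subdivision used by the FHT.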

Hyperplane formulation

The FHT does not use Eq. (8) directly. Suppose that we know the initial ranges of values {_{1}, _{2}, ..., _{M}} are centered at {_{1}, _{2}, ..., _{M}} and with half-length {_{1}, _{2}, ..., _{M}}. According to Eq. (8), we have

where

In fact, it is not necessary for the dimension of the parameter space

where _{i }is the _{i}(_{i }is an interval of length 2, with center at _{i}/_{i}. All these ranges comprise a _{1}, ...., _{k}).

Vote counting scheme

As mentioned before, every point in the observed data votes for supporting accumulators. We know that each accumulator corresponds to a group of range values of (_{1}, _{2}, ..., _{M}). For each point _{1}, _{2}, ..., _{M}) lie in this accumulator, and it will give a vote to this accumulator. An accumulator receiving votes more than a threshold reveals a corresponding hyperplane in the observed data space.

So, to determine whether an accumulator received a vote from a point _{1}, ..., _{k }] and

If Eq. (12) is satisfied, gene

K-tree representation

For simplicity, we have assumed above that the parameter space is divided directly into very small accumulators. This is not actually necessary. The FHT algorithm recursively divides the parameter space into hypercubes from low to high resolution, and performs the subdivision and the subsequent "vote counting" only in hypercubes whose votes exceed a selected threshold. This hierarchical approach significantly reduces both computation time and storage space compared with the conventional HT.

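For the FHT, we represent the parameter space as a nested hierarchy of hypercubes. We can associate a $k$-tree with this representation, whose root node corresponds to a hypercube centered at $\mathbf{C}_0$ with side-length $S_0$. Each node of the tree has $2^k$ children arising when that node's hypercube is halved along each of its $k$ dimensions. Each child has a child index $\mathbf{b} = [b_1, \ldots, b_k]$, where each $b_i$ is $-1$ or $1$. The child index is interpreted as follows: if a node at level $l$ has center $\mathbf{C}_l$, then the center of its child node with index $[b_1, \ldots, b_k]$ is

$$\mathbf{C}_{l+1} = \mathbf{C}_l + \frac{S_{l+1}}{2}\,\mathbf{b}$$

where $S_{l+1}$ is the side length of the child at level $l+1$, with $S_{l+1} = S_l/2$.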
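The subdivision rule can be sketched in a few lines. The function name `child_cells` is hypothetical; the code only encodes the geometry stated above, namely that halving a hypercube along every dimension yields 2^k children whose centers are offset from the parent center by half the child side length in each coordinate.

```python
import numpy as np
from itertools import product

def child_cells(center, side):
    """Return the 2**k children of a hypercube as (center, side) pairs."""
    center = np.asarray(center, dtype=float)
    child_side = side / 2.0                          # child side is half the parent side
    return [
        (center + (child_side / 2.0) * np.array(b), child_side)
        for b in product((-1.0, 1.0), repeat=len(center))   # child index b in {-1, 1}^k
    ]
```

For example, a square of side 4 centered at the origin splits into four squares of side 2 centered at (±1, ±1).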

Since we use a coarse-to-fine mechanism, for each accumulator at different levels we need to make a test using Eq. (12). For an accumulator of level _{l}, the normalized distance can be computed incrementally for a child node at level _{1}, ..., _{k}] as follows,

Test of Eq. (12) can now be expressed as: for the gene _{1}, ..., _{k}] at level

gene

According to the above analysis, the FHT is a mapping from an observed

The proposed geometric biclustering algorithm and parameter selection

To summarize, when given a set of genes expression data [_{0}(_{1}(_{M-1}(

Parameters that need to be predetermined:

(1) The minimum votes count "

(2) A transformation that maps gene expression data [_{0}(_{1}(_{M-1}(_{i }and the root hypercube.

(1) Map gene expression data onto the parameter space.

(2) Compute the initial normalized distance from the hyperplane to the root node and perform the voting procedure for the root node. For each gene, if Eq. (16) is satisfied, add one to the vote count of the root node. If the vote count for root node is larger than the threshold T and the resolution is coarser than

(3) Vote for each child node and subdivide them if needed. A similar vote-and-subdivide mechanism is performed for each new node until no new node appears.

(4) When there is no node with resolution equal to

(5) For each bundle of hyperplanes, check the common conditions (variables) and compare the hyperplanes with the models corresponding to different types of biclusters. A bundle of hyperplanes that are not consistent with any patterns in Fig.
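The vote-and-subdivide loop of steps (2) to (4) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the hyperplane is assumed to have the slope-intercept form x_0 = a_1*x_1 + ... + a_{k-1}*x_{k-1} + a_k, a hypercube is tested through its circumscribed hypersphere, and the names `fht_detect`, `min_votes` and `min_side` are hypothetical.

```python
import numpy as np
from itertools import product

def fht_detect(points, root_center, root_side, min_votes, min_side):
    """Coarse-to-fine search for parameter cells supported by at least min_votes points."""
    points = np.asarray(points, dtype=float)
    n, k = points.shape                              # k parameters: k-1 slopes and 1 intercept
    # Point x maps to the parameter-space hyperplane  normal . a - x_0 = 0.
    normals = np.hstack([points[:, 1:], np.ones((n, 1))])
    offsets = points[:, 0]
    norms = np.linalg.norm(normals, axis=1)
    detected = []
    stack = [(np.asarray(root_center, dtype=float), float(root_side), np.arange(n))]
    while stack:
        center, side, idx = stack.pop()
        # Distance from the cell center to each candidate point's hyperplane.
        dist = np.abs(normals[idx] @ center - offsets[idx]) / norms[idx]
        # A point votes if its hyperplane cuts the sphere circumscribing the cell.
        voters = idx[dist <= side * np.sqrt(k) / 2.0]
        if len(voters) < min_votes:
            continue                                 # prune: too few votes
        if side <= min_side:
            detected.append((center, voters))        # finest resolution reached
            continue
        for b in product((-1.0, 1.0), repeat=k):     # subdivide into 2**k children
            stack.append((center + (side / 4.0) * np.array(b), side / 2.0, voters))
    return detected
```

Only the voters of a parent are passed to its children, because a child's circumscribed sphere lies inside that of its parent. Each returned center gives approximate hyperplane parameters, and the associated indices are the genes that support it; neighboring cells may report the same hyperplane and would need to be merged in a later step.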

In the procedure above, there are two parameters: minimum vote count "T" and the desired finest resolution "

In many situations, one has no knowledge about the noise in the data. An appropriate range of

Computational complexity

For FHT, the following theorem from

The FHT algorithm is highly parallelizable. As shown above, each hypercube (accumulator) is processed independently of the others, and the intersection test for one hyperplane does not depend on that of any other hyperplane. In our implementation, simple multi-processing with libraries such as OpenMP or Open MPI achieves a high level of speedup.

In the above discussion, we assume that all the possible linear hyperplanes are to be detected using the FHT. In practice, detecting a small portion of hyperplanes is already enough for our biclustering algorithm. For example, in a dataset [_{0}(_{1}(_{M-1}(_{0}. However, using Equation _{i }= 0 _{0}. The second equation can significantly lower the computational burden^{1}. Another optimization direction is to take advantage of the properties of the gene expression data. Since the gene expression data values are distributed in the range of [-5, 5], the hyperplanes

In certain special cases, we can simplify the problem according to the bicluster model. For example, to extract biclusters of constant rows, we only need to detect hyperplanes whose coefficients are 0, 1 or -1; to extract multiplicative biclusters, we only need to detect hyperplanes without an intercept.
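The link between hyperplane coefficients and bicluster type can be sketched for a single pair of conditions. The function below is a hypothetical illustration, not the paper's procedure: it assumes the detected relation between two columns has the form column_j = slope * column_i + intercept, and the tolerance `tol` is an assumed parameter that would need to reflect the noise level in real data.

```python
def classify_relation(slope, intercept, tol=1e-6):
    """Name the bicluster pattern implied by column_j = slope * column_i + intercept."""
    if abs(slope) <= tol:
        return "constant column"      # column_j takes the same value for every gene
    if abs(slope - 1.0) <= tol:
        # Unit slope: equal columns (constant rows) or columns shifted by a constant.
        return "constant rows" if abs(intercept) <= tol else "additive"
    if abs(intercept) <= tol:
        return "multiplicative"       # pure scaling, hyperplane without intercept
    return "linear"                   # general linear coherent relation
```

For instance, a relation with slope 0.57 and intercept -0.08 is classified as a general linear coherent relation, since it is neither a pure shift nor a pure scaling.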

In terms of CPU time, our algorithm is computationally intensive in its un-optimized general form. Based on the complexity of the FHT, the computational demands of the proposed biclustering algorithm depend on how many biclusters exist in the dataset. To give an indication of the computational cost, we ran the un-optimized algorithm on a small test dataset on a personal computer (Linux OS with a 2.0 GHz Intel Core 2 Duo processor and 1 GB memory) and recorded the CPU time.

We randomly selected 16 conditions in the Human Lymphoma Dataset to produce a 4026 × 16 matrix. The CPU time for over 800 biclusters is 1953 seconds (32.55 minutes). We can adjust the parameters to exclude small and noisy biclusters and reduce the computing time. For example, the CPU time reduces to 397 seconds (6.62 minutes) if we discard biclusters with fewer than 8 conditions.

For larger datasets, we need to run our algorithm on a computer cluster. For the entire 4026 × 96 Human Lymphoma Dataset, we ran our algorithm on a computer cluster of 8 nodes with 2 processors each, and it took about 22 hours. Hence, the proposed algorithm is very time-consuming for large datasets if we search through the entire high-dimensional Hough space to obtain the optimal solution and detect all possible additive and multiplicative coherent patterns in the data.

The computing time can be substantially reduced if we allow the solution to be sub-optimal. For example, we can divide the 96 conditions into 6 sets with 16 conditions in each set. Then, only 39.7 (6 × 6.62) minutes are needed on the Linux computer described above for the biclustering process. The biclusters from the 6 sets can then be combined. Such a strategy has already been used in
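The partitioning and the time estimate can be written out as a small sketch. The helper `split_conditions` is hypothetical, the split into consecutive column blocks is an assumption (the text does not say how the sets are chosen), and the per-set time of 6.62 minutes is the figure quoted above for one 16-condition subset.

```python
import numpy as np

def split_conditions(n_conditions, set_size):
    """Partition column indices 0..n_conditions-1 into consecutive sets."""
    return [
        np.arange(start, min(start + set_size, n_conditions))
        for start in range(0, n_conditions, set_size)
    ]

subsets = split_conditions(96, 16)           # 6 sets of 16 conditions each
minutes_per_set = 6.62                       # CPU time reported for one 16-condition run
estimated_minutes = len(subsets) * minutes_per_set
```

Each index set would be used to select the corresponding columns of the expression matrix before running the biclustering step on that block.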

Abbreviations

GO: Gene Ontology, 2D: Two dimensional, 3D: Three dimensional, NP: Non-deterministic polynomial time, HT: Hough transform.

Authors' contributions

XG worked on the hyperplane modeling, implementation and experimental analysis when he was a Ph.D. student at City University of Hong Kong. AWCL proposed the geometric perspective for biclustering, problem formulation, and algorithm design. XG and AWCL contributed equally to this work and should be considered joint first authors. HY initiated the project and worked on the Hough transform. All authors read and approved the final manuscript.

Note

^{1 }This method is easy to implement by only testing the hyperplane/accumulator with equal non-zero gradients. Assume there are _{i }. If we do not consider the coarse-to-fine optimization of FHT, the first equation need to process ^{M }accumulator while the second equation only need to process about ^{2}*2^{M-1}. In the case of ^{-15 }times that of the first scheme.

Acknowledgements

This work is supported by a grant from the Hong Kong Research Grant Council (project CityU122506). X.Gan is now supported by EPSRC grant EP/D062012/1.