Abstract
Background
Newly microarray technologies yield largescale datasets. The microarray datasets are usually presented in 2D matrices, where rows represent genes and columns represent experimental conditions. Systematic analysis of those datasets provides the increasing amount of information, which is urgently needed in the postgenomic era. Biclustering, which is a technique developed to allow simultaneous clustering of rows and columns of a dataset, might be useful to extract more accurate information from those datasets. Biclustering requires the optimization of two conflicting objectives (residue and volume), and a multiobjective artificial immune system capable of performing a multipopulation search. As a heuristic search technique, artificial immune systems (AISs) can be considered a new computational paradigm inspired by the immunological system of vertebrates and designed to solve a wide range of optimization problems. During biclustering several objectives in conflict with each other have to be optimized simultaneously, so multiobjective optimization model is suitable for solving biclustering problem.
Results
Based on dynamic population, this paper proposes a novel dynamic multiobjective immune optimization biclustering (DMOIOB) algorithm to mine coherent patterns from microarray data. Experimental results on two common and public datasets of gene expression profiles show that our approach can effectively find significant localized structures related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The mined patterns present a significant biological relevance in terms of related biological processes, components and molecular functions in a speciesindependent manner.
Conclusions
The proposed DMOIOB algorithm is an efficient tool to analyze large microarray datasets. It achieves a good diversity and rapid convergence.
Background
Rapid development of the DNA microarray technology makes it very possible to study the transcriptional response of a complete genome to different experimental conditions. The rapid increasing of microarray datasets provides unique opportunities to perform systematic functional analysis in genome research. A subset of genes showing correlated coexpression patterns across a subset of conditions are expected to be functionally related. One important research area in bioinformatics and clinical research is finding patterns which relate to disease diagnosis, drug discovery and the function prediction.
Biclustering is proposed for grouping simultaneously genes set and condition set over which the gene subset exhibit similar expression patterns. Cheng and Church [1] introduce first biclustering to mine genes clusters with respect to a subset of the conditions from microarray data. Up to date, a number of biclustering algorithms for microarray data analysis have been developed such as δbiclustering [1], FLOC [2], pClustering [3], statisticalalgorithmic method for biclustering analysis(SAMBA) [4], spectral biclustering [5].
During the last three decades, inspired by biology views, some heuristic approachs such as evolutionary algorithms [6] have been proposed to discover global optimal solutions in gene expression data. For multiobjective optimization (MOO) problem, multiobjective evolutionary algorithms (MOEAs) [7,8] are proposed to discover efficiently global optimal solution.
Recently an artificial immune system is introduced to deal with MOO problem. Jiao [9] proposes immune genetic algorithm(IGA) which improves the searching ability and adaptability, greatly increase the converging speed. Yoo and Hajela [10] first extends the immune system to solve multiobjective optimization problems. Coello [11,12] propose an algorithm based on the immune response principle to solve MOO problem and effectively improve the diversity of Pareto optimal solutions. BICaiNet (Artificial immune Network for Biclustering) [13] being an immuneinspired biclustering algorithm is used to group similar texts efficiently and extract implicit useful information from groups of texts. Coelho [14] combines the multipopulation of aiNet and the biclustering techniques, and proposes MOMaiNet (MultiObjective Multipopulation Artificial Immune Network) algorithm to mining biclusters. Liu[15] proposes a novel multiobjective immune biclustering (MOIB) algorithm to find more significant biclusters from gene expression data.
Most MOPs use a fixed population size to find nondominated solutions for obtaining the Paterto front. The computational cost is the greatest influence of population size on these populationbased metaheuristic algorithms. Hence dynamically adjusting the population size need consider the balance between computational cost and the algorithm performance. Some methods using dynamic size are proposed. Tan [16] proposed an incrementing MOEA(IMOEA) that adaptively computes am appropriate population size according to the online discovered tradeoff surface and its desired population size that corresponds to the distribution density. Yen and Lu [17] proposed dynamic population size MOEA(DMOEA) that includes a populationgrowing strategy based on the converted fitness and a populationdeclining strategy that resorts to the following age, health and crowdedness. Leong and Yen [18] introduced dynamic population size and a fixed number of multiple swarms into multiobjective optimization algorithm that improved diversity and convergence of optimization algorithm.
Methods
Based on the immune response principle and εdominance strategy [19], this paper incorporating dynamic population size [18] into MOIB [15] algorithm, and proposes a novel dynamic multiobjective immune optimization biclustering(DMOIOB) algorithm to find one or more significant biclusters of maximum size in microarray data. In the proposed algorithm, the feasible solutions are regarded as antibodies and Pareto optimal solutions are preserved in an antigen population updated by εdominance relation and computation of crowding distance. Many Pareto optimal solutions can be effectively obtained and distributed onto the Pareto front in this way. Three objectives, the size, homogeneity and row variance of biclusters, are satisfied simultaneously by applying three fitness function in optimization framework. A low mean squared residue (MSR) score of bicluster denotes that the expression level of each gene within the bicluster is similar over the range of conditions. Therefore, we focus on finding biclusters of maximum size, with mean squared residue lower than a given δ, with a relatively high row variance.
Biclusters
Given a gene expression data matrix D=G×C={d_{ij}} (here i∊[1, n] , j∊[1, m]) is a realvalued n×m matrix, here G is a set of n genes {g_{1}, g_{2}, ⋯, g_{n}}, C a set of m biological conditions {c_{1}, c_{2}, ⋯, c_{n}}. Entry d_{ij} means the expression level of gene g_{i} under condition c_{j}. If there is a submatrix B=g×c, where g⊂G, c⊂C, to satisfy certain homogeneity and minimal size of the cluster, we say that B is a bicluster.
Bicluster encoding
Each bicluster is encoded as an individual of the population. Each individual is represented by a binary string of fixed length n+m, where n, m is the number of genes, conditions of the microarray dataset, respectively. The first n bits are responding to n genes, the following m bits to m conditions. If a bit is set to 1, it means that the responding gene or condition belongs to the encoded bicluster; otherwise it does not. This encoding presents the advantage of having a fixed size, thus using simply of standard variation operations. Therefore, the string “0110100010#0110100110” presents the individual encoding a bicluster with 4 genes and 5 conditions, and its size is 4×5=20. Where # is a symbol used to delimit the bits of the rows from the columns.
Fitness function
Our hope is mining biclusters with low mean squared residue, with high volume and genedimensional variance, and those three objectives in conflict with each other are well suited for multiobjective to model. To achieve these aims, this paper uses the same fitness functions as [20].
Update of ϵPareto set of the population
In order to guarantee the convergence and maintain diversity in the population at the same time, we implement updating of ϵPareto set of the population during clonal selection operation. A general scheme of the updating algorithm is given in [19].
Immune response principle
An immune system can collect biological processes of an organism that protects against disease by identifying and killing pathogens and tumour cells. It can detect a wide variety of viruses and parasitic worms, and distinguish them from the organism's own healthy cells and tissues to protect an organism. It is highly distributed, highly adaptive, selforganization in nature [21]. Artificial Immune System (AIS) is a new computational approach for the computational intelligence community. It has widely such as pattern recognition, data analysis, function approximation and optimization.
The immune selection principle [22] is used to describe the basic properties of an adaptive immune response to an antigenic stimulus [21]. When applying the immune selection principle to solve multiobjective problem, it can generate several elements from the Pareto optimal set at one run. Clonal selection operation is used to implement local search in many different directions along the Pareto front. Mutation operator is applied to explore through the whole search space, thus attain the exact Pareto front of the problem.
DMOIO biclustering algorithm
Multipleobjective optimization aim at the following two competing objectives: 1) to quickly obtain a nondominated front that is close to the true Pareto front and 2) to maintain the diversity of the solutions along the resulting Pareto front. These two objectives are in conflict each other because maintaining the diversity will slow down the convergence speed and may degrade the quality of the resulting Pareto front. On one hand, MOIO algorithms tend to the optimal regions. On the other hand, the clonal selection behaviour may lead to premature convergence in the search space and produce a uniformly distributed Pareto front. The influence of population size on the performance of MOIO is the computational cost. It is difficult to deal with this conflict issues for a MOIO with a fixed population size because a predetermined computation resource has to be allocated and properly distributed between two competing objectives. Hence, inspired by [18], during biclustering of the microarray datasets, dynamically adjusting the population size to explore the search space in balance between two competing objectives is applied in this paper.
Initial population
In most multiobjective optimization methods the initial archive is set to empty. The first archive contains the nondominated solutions of the initial population. Each antigen selects best local guide from the archive members using Sigma method [23]. Selecting the first local guides from the archive has a great impact on the diversity of solutions in the next generations. Hence the diversity of solutions depends on the first nondominated solutions. But if the initial archive is not empty and contains some welldistributed nondominated solutions, the solutions converge faster than before, while keeping a good diversity. There are two methods to find a good initial archive. The first possibility is to run the MOIO with an empty archive for a large population and a few generations. The large population gives us a good diversity and a few generations (e.g., 10 generations) are used to develop the population to a little convergence. On another hand, MOEA can produce some good solutions with a very good diversity after a few generations. So another possibility is to use the results of a small MOEA method. Here, small means a MOEA with a few individuals and a few generations (e.g., 10 individuals and 10 generations). This paper first runs stateofart MOEA(NSGAII [24] ) with 30 individuals and 10 generations to produce the initial archive of DMOIOB.
Fining the global best solution
To order to find the global best solutions, this paper uses the basic idea of Sigma method [23] and by considering the objective space, finding the best local guide p_{g} among the archive members for the antigen iof population as follows. In the first step, we assign the value σ_{j}, to each antigen j in the archive. In the second step, σ_{i}for antigen i of the population is calculated, and then calculates the distance between the σ_{i} and σ_{j}, ∀_{j}j=1,⋯,A. Finally, the antigen kin the archive A which its σ_{k}has the minimum distance to σ_{i} is selected as the best local guide for the antigen i. Therefore, antigen p_{g} = x_{k}is the best local guide for antigen i.In other words, each antigen that has a closer sigma value to the sigma value of the archive member, must select that archive member as the best local guide. In the case of two dimensional objective spaces, closer means the difference between the sigma values and in the case of m dimensional objective space, it means the mdimensional Euclidean distance between the sigma values. Algorithm 1 shows the algorithm of the Sigma method for finding the best local p_{g} for the antigen i of the population [23]. Here, the function Sigma calculates the σ value and dist computes the Euclidian distance. y_{i} denotes the objective value of the jth element of the antigen population P.
Population adding method
Population adding strategy mainly consist in increasing the population size to ensure sufficient number of individuals to contribute to the search process and to place those new individuals in unexplored areas to discover new possible solutions. Based on the strategies of dynamic population size [18], the following procedures is proposed to facilitate exploration and exploitation capabilities for DMOIOB.
Step 1: Selecting candidate antibodies added
The nondominated set considered as candidate antibodies must have the highest probability of generating new antibodies that will improve the convergence toward the Pareto front. Therefore the number of potential antibodies determined via ns = INT(r1× (total no. of antibodies in nondominated set)) is randomly selected from the nondominated set. Where r_{1} denotes a random number obtained from a uniform distribution within [0, 1].
Step 2: Defining the number of mutation
The number of mutation of the selected antibody is adaptively determined every iteration. Each selected antibody’s responsibility is to generate a certain number of new antibodies from the selected antibody. A probability value is used to determine the number of perturbations adaptively in which the number of mutation (number of new antibodies to be generated) is bound by the minimum and maximum number of mutation.
Step 3: Limiting the range of new antibodies
In proposed algorithm, to balance the exploitation and exploration capabilities and to avoid generating too many new antibodies from being too far away from the selected antibodies, it is necessary to generate a higher number of new antibodies within the neighbourhood than outside of the neighbourhood which similar to [16].
Population decreasing method
To prevent the excessive growth in population, a population decreasing strategy which similar to [16] is proposed to adaptively control the population size. In DMOIOB, the condition to remove a antibody depends upon Sigma values. Sigma value is utilized to select potential antibodies to be deleted. After computing all the distance between Sigma value of each antibody and Sigma value of its corresponding best local guide, the rank of the distance of each antibody can be attained. If the removal of antibodies is only based upon the distance rank of each antibody, then there is a possibility of eliminating an excessively large quantity of antibodies in which some may carry unique schema to contribute in the search process. A selection ratio is implemented to regulate the number of antibodies to be removed and to provide some degrees of diversity preservation at the same time. A selection ratio that is inspired by Coello and Montes [25] is used to stochastically allocate a small percentage of antibodies in the population for removal. Hence, given a selection ratio S ∊ [0, 1], at iteration t, the number of antibodies to be eliminated is S×Pt. Note that the choice of the selection ratio is dependent upon the user’s preference, where it can be a function of the swarm population size or it can be a fixed ratio. For this paper, the selection ratio is a fixed number, which is set to be a small number, i.e., S ≤ 0.2. With a small selection ratio, there is a possibility that those selected antibodies in Pt may not be eliminated. In other words, some of the selected antibodies in Pt whose rank indicators are low may remain in the next iteration. In addition, a small selection ratio can prevent the removal of an uncontrollable large number of antibodies while providing some degree of diversity preservation. This paper set S =0.02.
DMOIOB algorithm
We propose a dynamic MOIO biclustering algorithm (DMOIOB) to mine biclusters from the microarray datasets to attain the global optimum solutions. We incorporates the following three strategies: 1) ϵdominance to quicken convergence speed; 2) Sigma method to find good local guides; 3) populationgrowing strategy to increase the population size to promote exploration capability; and 4) population declining strategy to prevent the population size from growing excessively.
The pseudocode of the proposed DMOIOB algorithm is given in Algorithm 2.
DMOIOB algorithm iteratively updates the antigens population until userdefined number of generation are generated and last converges to the optimal solution.
Results
This paper applies the proposed DMOIOB algorithm to mine biclusters from two well known datasets and compare the diversity and convergence of the DMOIOB algorithm with MOIB algorithm. Lastly, the biological relevance of the biclusters found by DMOIOB is given.
Datasets and data preprocessing
The first dataset is the yeast Saccharomyces cerevisiae cell cycle expression data [26], and the second dataset is the human Bcells expression data [27].
The yeast dataset collects expression level of 2,884 genes under 17 conditions. All entries are integers lying in the range of 0600. Out of the yeast dataset there are 34 missing values. The 34 missing values are replaced by random number between 0 and 800 [1].
The human Bcells expression dataset is collection of 4,026 genes and 96 conditions, with 12.3% missing values, lying in the range of integers 750650.The missing values are replaced by random numbers between 800800 [1]. However, those random values affect the discovery of biclusters[28]. For providing a fair comparison with existing methods here set the same parameter for δ as [1], i.e., for the yeast data δ=300, for the human Bcells expression data δ=1200. The two gene expression dataset are taken from [1].
Testing
DMOIOB algorithm is implemented in JAVA programming language and is performed on a 1.7GHz Pentium 4PC with 512M of RAM running Windows XP. To evaluate its performance, the proposed algorithm is compared to MOIB [15] algorithm on two well known datasets, the yeast cell cycle expression data [26] and the human Bcells expression data [27].
Yeast dataset
Table 1 shows the information of six biclusters out of the one hundred biclusters found on the yeast dataset. The first hundred biclusters found by the proposed algorithm cover 75.8% of the genes, 100% of the conditions and in total 56.7% cells of the expression matrix. The biclusters found by MOIB [15] cover 62.4% of the genes, 100% of the conditions and in total 54.8% cells of the expression matrix. While an average coverage of 51.34% cells is reported in MOEB [7].
Table 1. Information of biclusters found on yeast dataset
Figure 1 depicts sample gene expression profiles for small biclusters (bicluster 22) for the yeast dataset. They present a similar behaviour, and there are two genes with lower expression level than that those of the main group of genes on all the conditions.
Human Bcells expression dataset
Table 2 shows the information of six biclusters out of the one hundred found on the human dataset. Table 2 shows that the first hundred biclusters found by the proposed algorithm cover 39.1% cells of microarray dataset (51.2%of the genes and 100% of the conditions). The one hundred biclusters found by MOIB [15] on the human dataset cover 33.6% cells of dataset(43.7%of the genes and 100% of the conditions), whereas an average of 20.96% cells are covered in MOEB [7].
Table 2. Biclusters found on human dataset
Comparative analysis
In this section, this paper compares the proposed algorithm with MOIB algorithm on the yeast dataset and the human dataset and the results are showed in Table 3.
Table 3. Comparative study of three algorithms
From Table 3, the biclusters found by DMOIOB has a slightly higher squared residue and a higher bicluster size than those by MOIB on both yeast dataset and human dataset. It is clear from the above results that the proposed MOIB algorithm performs best in maintaining the diversity of solutions.
For computation cost we find that the computation time of MOIOB is 88.02s on yeast dataset and 258.48s on human dataset, is superior to that of MOIB.
In total it is clear from the above results that the proposed DMOIOB algorithm performs best in maintaining diversity, achieving convergence.
Biological analysis of biclusters
We determine the biological relevance of the biclusters found by DMOIOB on the yeast dataset in terms of the statistically significant GO annotation database. The gene ontology (GO) project (http://www.geneontology.org webcite) provides three structured, controlled vocabularies that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a speciesindependent manner. To better understand the mining results, we feed genes in each bicluster to OntoExpress(http://vortex.cs.wayne.edu/Projects.html webcite) and obtain a hierarchy of functional annotations in terms of Gene Ontology for each bicluster.
The degree of enrichment is measured by pvalues which use a cumulative hypergeometric distribution to compute the probability of observing the number of genes from a particular GO category (function, process and component)within each bicluster. The pvalues are calculated for each functional category in each bicluster to denote how well those genes match with the corresponding GO category given in Table 4.
Table 4. Significant GO terms of genes in three biclusters
Conclusions
This paper has provided a novel dynamic multiobjective immune optimization biclustering framework for mining biclusters from microarray datasets. We focus on finding maximum biclusters with lower mean squared residue and higher row variance. Those three objectives are incorporated into the framework with three fitness functions. We apply immune clonal selection principle and Sigma method to find better local guide in objective space and combine εdominance and crowding distance strategy to improve the diversity of the solutions and to quicken convergence of the algorithm; a population adding method that dynamically grows new individuals with enhanced exploration and exploitation capabilities; a population decreasing strategy to balance and control the dynamic population size. The results on the yeast microarray dataset and the human Bcells expression dataset verify the good quality of the found biclusters, and comparative analysis show that the proposed MOIB is superior to MOIB algorithm in terms of the diversity of solutions and the convergence of the algorithm.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
JL design DMOBIO to mine biclusters from gene expression data and drafted the manuscript. ZL, XH and EP were involved in study design and coordination and revised the manuscript. YC conducted the algorithm design.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (60973105), the fund of the State Key Laboratory of Software Development Environment, Scientific Research Fund of Hunan Provincial Education Department (09A105), Talents Import Fund of Central South University of Forestry and Technology (1040177), NSF CCF 0905291 and NSF CCF 1049864
This article has been published as part of BMC Genomics Volume 12 Supplement 2, 2011: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2010. The full contents of the supplement are available online at http://www.biomedcentral.com/14712164/12?issue=S2.
References

Cheng Y, Church GM: Biclustering of expression data.
Proc Int Conf Intell Syst Mol Biol 2000, 8(1):93103. PubMed Abstract

Jiong Yang, Wang Wei, Wang Haixun, Yu P: Enhanced biclustering on expression data. In Proceedings of Third IEEE Symposium on Bioinformatics and Bioengineering: 1012 March, 2003 Bethesda, MD. Edited by H.J.a.V. Magalooikonomou. IEEE computer society; 2003:321327.

Wang H, Wang W, Yang J, Yu PS: Clustering by pattern similarity in large data sets. In Proceedings of the 2002 ACM SIGMOD international conference on Management of data: 0206 June, 2002 Madison, WI. Edited by M.J. Franklin, B. Moon, and A. Ailamaki. ACM Press; 2002:394405.

Tanay A, Sharan R, Shamir R: Discovering statistically significant biclusters in gene expression data.
Bioinformatics 2002, 18(S1):S13644. PubMed Abstract  Publisher Full Text

Dhillon IS: Coclustering documents and words using bipartite spectral graph partitioning. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining: 2629 August, 2001; San Francisco, CA. ACM; 2001:269274.

Bleuler S, Prelic A, Zitzler E: An EA framework for biclustering of gene expression data. In Proceedings of 2004 Congress on Evolutionary Computation (CEC2004) 1923 June, 2004; Portland, Or. IEEE Computer Society; 2004.

Mitra S, Banka H: Multiobjective evolutionary biclustering of gene expression data.
Pattern Recognition 2006, 39(12):24642477. Publisher Full Text

Divina F, AguilarRuiz JS: A multiobjective approach to discover biclusters in microarray data. In Proceedings of the 9th annual conference on Genetic and evolutionary computation: 711 July, 2007 London, England UK. ACM; 2007:385392.

Jiao L, Wang L: A novel genetic algorithm based on immunity.
IEEE Transactions on Systems, Man and Cybernetics, Part A 2000, 30(5):552561. Publisher Full Text

Yoo J, Hajela P: Immune network simulations in multicriterion design.
Structural and Multidisciplinary Optimization 1999, 18(2):8594.

Coello CAC, Cortes NC: An approach to solve multiobjective optimization problems based on an artificial immune system. In Proceedings of First International Conference on Artificial Immune Systems 2002; Canterbury, UK. Edited by J. Timmis and P.J. Bentley. Unversity of Kent at Canterbury Printing Unit; 2002:212221.

Coello CAC, Cortés NC: Solving Multiobjective Optimization Problems Using an Artificial Immune System.
Genetic Programming and Evolvable Machines 2005, 6(2):163190. Publisher Full Text

de Castro PAD, de Franca FO, Ferreira HM, Von Zuben FJ: Applying Biclustering to Text Mining: An ImmuneInspired Approach. In Proceedings of 6th International Conference, ICARIS 2007: 2629 August, 2007; Santos, Brazil. Edited by L.N.d. Castro, F.J.V. Zuben, and H. Knidel. Springer, Heidelberg; 2007:8394.

Coelho GP, Franca FO, Zuben FJV: A MultiObjective Multipopulation Approach for Biclustering. In Proceedings of 7th International Conference, ICARIS 2008: 1013 August, 2008; Phuket, Thailand. Edited by P.J. Bentley, D. Lee, and S. Jung. Springer, Heidelberg; 2008:7182.

Junwan Liu, Li Zhoujun, Chen Yiming: Microarray Data Biclustering with MultiObjective Immune Algorithm. In Proceedings of The 5th International Conference on Natural Computation: 1416 August, 2009; Tianjin, China. Edited by H. Wang, et al.. IEEE Computer Society; 2009:200204.

Tan KC, Lee TH, Khor EF: Evolutionary algorithms with dynamic population size and localexploration for multiobjective optimization.
Evolutionary Computation, IEEE Transactions 2001, 5(6):565588. Publisher Full Text

Yen GG, Lu H: Dynamic multiobjective evolutionary algorithm: adaptive cellbased rank and density estimation.
IEEE Transactions on Evolutionary Computation 2003, 7(3):253274. Publisher Full Text

Leong WF, Yen GG: PSObased multiobjective optimization with dynamic population size and adaptive local archives.
IEEE Transactions on Systems Man and Cybernetics Part BCybernetics 2008, 38(5):12701293.

Laumanns M, Thiele L, Deb K, Zitzler E.: Combining Convergence and Diversity in Evolutionary Multiobjective Optimization.
Evolutionary Computation 2002, 10(3):263282. PubMed Abstract  Publisher Full Text

Junwan Liu, Li Zhoujun, Hu Xiaohua, Chen Yiming: Biclustering of microarray data with MOSPO based on crowding distance.

Timmis J, Knight T: Artificial Immune Systems: Using The Immune System as Inspiration for Data Mining.

Mostaghim S, Teich J: Strategies for finding good local guides in multiobjective particle swarm optimization (MOPSO). In Proceedings of The 2003 IEEE Swarm Intelligence Symposium: 2426 April, 2003; Indianapolis, IN. IEEE Computer Society; 2003:2633.

Deb K, Pratap A, Agarwal S, Meyarivan T: A fast and elitist multiobjective genetic algorithm: NSGAII.
IEEE Transactions on Evolutionary Computation 2002, 6(2):182197. Publisher Full Text

Coello Coello CA, Mezura E: Constrainthandling in genetic algorithms through the use of dominancebased tournament selection.
Advanced Engineering Informatics 2002, 16(3):193203. Publisher Full Text

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ: A GenomeWide Transcriptional Analysis of the Mitotic Cell Cycle.
Molecular Cell 1998, 2(1):6573. PubMed Abstract  Publisher Full Text

Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X: Distinct types of diffuse large Bcell lymphoma identified by gene expression profiling.
Nature 2000, 403(6769):503511. PubMed Abstract  Publisher Full Text

Jiong Yang, Wang Wei, Wang Haixun, Yu P: δclusters: capturing subspace correlation in a large dataset. In Proceedings of 18th International Conference on Data Engineering: 26 February  1 March 2002; San Jose CA. IEEE Computer Society; 2002:517528.