This article is part of the supplement: Tenth International Conference on Bioinformatics. First ISCB Asia Joint Conference 2011 (InCoB/ISCB-Asia 2011): Computational Biology
Extracting regulatory modules from gene expression data by sequential pattern mining
- Equal contributors
1 Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul 110799, Korea
2 Dept. of Industrial and Information Systems Engineering, Ajou University, Suwon 443749, Korea
3 Systems Biomedical Informatics National Core Research Center, Seoul National University College of Medicine, Seoul 110799, Korea
4 Institute of Endemic Diseases, Seoul National University College of Medicine, Seoul 110799, Korea
5 Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea
BMC Genomics 2011, 12(Suppl 3):S5 doi:10.1186/1471-2164-12-S3-S5
Published: 30 November 2011
Identifying a regulatory module (RM), a bi-set of co-regulated genes and co-regulating conditions (or samples), has been an important challenge in functional genomics and bioinformatics. Given a microarray gene-expression matrix, biclustering has been the most common method for extracting RMs. Among biclustering methods, order-preserving biclustering by sequential pattern mining has an inherent advantage over conventional biclustering approaches, since it preserves the order of genes (or conditions) according to the magnitude of the expression value. However, previous sequential pattern mining-based biclustering methods have notable weaknesses: they easily become computationally intractable on microarray data of realistic size, and they are sensitive to the inherent noise in expression values.
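To make the notion of an order-preserving pattern concrete, the following minimal Python sketch ranks the conditions of each gene by expression magnitude and then lists the genes whose ranking contains a given condition sequence as a (not necessarily contiguous) subsequence. It is an illustration only, not the algorithm proposed in this paper: the function names (`condition_order`, `supports`, `supporting_genes`) and the toy expression matrix are hypothetical, and the brute-force support check stands in for an actual sequential pattern mining procedure.

```python
def condition_order(row):
    """Return condition indices sorted by increasing expression value."""
    return tuple(sorted(range(len(row)), key=lambda i: row[i]))


def supports(order, pattern):
    """True if `pattern` occurs in `order` as a subsequence (gaps allowed)."""
    it = iter(order)
    # `c in it` advances the iterator, so matches must appear in order.
    return all(c in it for c in pattern)


def supporting_genes(matrix, pattern):
    """Indices of genes (rows) whose condition ordering contains `pattern`."""
    return [g for g, row in enumerate(matrix)
            if supports(condition_order(row), pattern)]


# Toy expression matrix (hypothetical values): rows = genes, columns = conditions.
expr = [
    [0.1, 0.9, 0.5, 1.4],
    [2.0, 3.5, 2.2, 5.0],
    [1.0, 0.2, 0.8, 0.5],
    [0.3, 0.7, 0.4, 0.6],
]

# Candidate pattern: expression rises from condition 0 to 2 to 1.
pattern = (0, 2, 1)
genes = supporting_genes(expr, pattern)
print(genes)
```

In this sketch, the genes returned together with the conditions in `pattern` form one candidate bi-set. Because only the rank order of the values is used, two genes with very different absolute expression levels can still support the same pattern.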
In this paper, we propose a novel sequential pattern mining algorithm that scales with the size of microarray data and is robust to noise. When applied to yeast microarray data, the proposed algorithm successfully found long order-preserving patterns, which are biologically significant but cannot be found in randomly shuffled data. The resulting patterns are well enriched for known annotations and are consistent with existing biological knowledge. Furthermore, RMs as well as inter-module relations were inferred from the biologically significant patterns.
Our approach for identifying RMs could be valuable for systematically revealing the mechanism of gene regulation at a genome-wide level.