This article is part of the supplement: Tenth International Conference on Bioinformatics. First ISCB Asia Joint Conference 2011 (InCoB/ISCB-Asia 2011): Computational Biology

Open Access Proceedings

Extracting regulatory modules from gene expression data by sequential pattern mining

Mingoo Kim (1), Hyunjung Shin (2), Tae Su Chung (1), Je-Gun Joung (1, 3, 4) and Ju Han Kim (1, 3, 5)*

  • * Corresponding author: Ju Han Kim juhan@snu.ac.kr

  • † Equal contributors

Author affiliations

1 Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul 110799, Korea

2 Dept. of Industrial and Information Systems Engineering, Ajou University, Suwon 443749, Korea

3 Systems Biomedical Informatics National Core Research Center, Seoul National University College of Medicine, Seoul 110799, Korea

4 Institute of Endemic Diseases, Seoul National University College of Medicine, Seoul 110799, Korea

5 Div. of Biomedical Informatics, Seoul National University College of Medicine, Seoul 110799, Korea

Citation and License

BMC Genomics 2011, 12(Suppl 3):S5  doi:10.1186/1471-2164-12-S3-S5

Published: 30 November 2011

Abstract

Background

Identifying a regulatory module (RM), a bi-set of co-regulated genes and co-regulating conditions (or samples), has been an important challenge in functional genomics and bioinformatics. Given a microarray gene-expression matrix, biclustering has been the most common method for extracting RMs. Among biclustering methods, order-preserving biclustering by sequential pattern mining has a native advantage over conventional biclustering approaches because it preserves the order of genes (or conditions) according to the magnitude of their expression values. However, previous sequential pattern mining-based biclustering methods have several weaknesses: they can easily become computationally intractable on real-size microarray data, and they are sensitive to inherent noise in the expression values.
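To make the order-preserving idea concrete, here is a minimal Python sketch, not the authors' algorithm. Each gene's profile is turned into the sequence of condition indices sorted by increasing expression, and a candidate pattern (an ordered list of conditions) is supported by a gene when it occurs as a subsequence of that gene's sequence. The supporting genes, together with the pattern's conditions, form one order-preserving bicluster. The helper names (`condition_order`, `supports`, `pattern_support`), the toy matrix, and the tie-breaking rule (by condition index) are illustrative assumptions.

```python
from typing import Sequence


def condition_order(profile: Sequence[float]) -> list[int]:
    """Condition indices sorted by increasing expression value.

    Ties are broken by condition index (an assumption; real data need an
    explicit tie policy).
    """
    return sorted(range(len(profile)), key=lambda j: (profile[j], j))


def supports(order: Sequence[int], pattern: Sequence[int]) -> bool:
    """True if `pattern` occurs as a (not necessarily contiguous) subsequence of `order`."""
    it = iter(order)
    # `c in it` consumes the iterator up to the first match, so the
    # pattern's conditions must appear in the same relative order.
    return all(c in it for c in pattern)


def pattern_support(matrix: Sequence[Sequence[float]], pattern: Sequence[int]) -> list[int]:
    """Row (gene) indices whose condition ordering contains `pattern`."""
    return [i for i, row in enumerate(matrix) if supports(condition_order(row), pattern)]


# Hypothetical toy matrix: rows are genes, columns are conditions.
matrix = [
    [0.1, 2.0, 1.5, 3.2],  # order [0, 2, 1, 3]
    [1.0, 4.1, 2.2, 5.0],  # order [0, 2, 1, 3]
    [3.0, 0.5, 2.0, 1.0],  # order [1, 3, 2, 0]
]

# Pattern "c0 < c2 < c3": genes 0 and 1 preserve it, gene 2 does not.
print(pattern_support(matrix, [0, 2, 3]))  # → [0, 1]
```

A real miner would enumerate candidate patterns and keep those whose support exceeds a threshold; this brute-force check only illustrates what support means and does not address the scalability or noise problems described above.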

Results

In this paper, we propose a novel sequential pattern mining algorithm that scales to the size of real microarray data and is robust to noise. When applied to yeast microarray data, the proposed algorithm successfully found long order-preserving patterns that are biologically significant and cannot be found in randomly shuffled data. The resulting patterns are well enriched for known annotations and are consistent with existing biological knowledge. Furthermore, RMs as well as inter-module relations were inferred from these biologically significant patterns.
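One common way to check that a pattern is unlikely to arise by chance is a permutation test against shuffled data. The sketch below assumes, as a hypothetical null model, that expression values are shuffled independently within each gene's row; this keeps each gene's value distribution but destroys any ordering shared across genes. It compares the support of a fixed pattern in the real matrix with its support in shuffled copies and reports an empirical p-value. The function names, the within-row null model, and the `n_perm` and `seed` parameters are assumptions for illustration, not the paper's stated procedure.

```python
import random
from typing import Sequence


def condition_order(profile: Sequence[float]) -> list[int]:
    """Condition indices sorted by increasing expression (ties by index)."""
    return sorted(range(len(profile)), key=lambda j: (profile[j], j))


def supports(order: Sequence[int], pattern: Sequence[int]) -> bool:
    """True if `pattern` is a subsequence of `order`."""
    it = iter(order)
    return all(c in it for c in pattern)


def support_count(matrix: Sequence[Sequence[float]], pattern: Sequence[int]) -> int:
    """Number of genes whose condition ordering contains `pattern`."""
    return sum(supports(condition_order(row), pattern) for row in matrix)


def shuffled(matrix: Sequence[Sequence[float]], rng: random.Random) -> list[list[float]]:
    """Hypothetical null model: permute values within each row independently."""
    out = []
    for row in matrix:
        row = list(row)
        rng.shuffle(row)
        out.append(row)
    return out


def empirical_p_value(matrix, pattern, n_perm: int = 1000, seed: int = 0):
    """Observed support and the fraction of shuffles with support >= observed."""
    rng = random.Random(seed)
    observed = support_count(matrix, pattern)
    hits = sum(
        support_count(shuffled(matrix, rng), pattern) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: 20 genes whose values all increase across 6 conditions.
matrix = [[g + j for j in range(6)] for g in range(20)]
observed, p = empirical_p_value(matrix, list(range(6)), n_perm=200)
print(observed, p)
```

On this toy matrix every gene supports the full-length pattern, while a shuffled row preserves it with probability only 1/720, so the shuffled matrices essentially never reach the observed support.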

Conclusions

Our approach for identifying RMs could be valuable for systematically revealing the mechanism of gene regulation at a genome-wide level.