Open Access Methodology article

Towards precise classification of cancers based on robust gene functional expression profiles

Zheng Guo1,2,3*, Tianwen Zhang1, Xia Li1,2,3, Qi Wang2, Jianzhen Xu2, Hui Yu2, Jing Zhu2, Haiyun Wang3, Chenguang Wang2, Eric J Topol4, Qing Wang4 and Shaoqi Rao2,4*

Author Affiliations

1 Department of Computer Science, Harbin Institute of Technology, Harbin 150001, China

2 Department of Bioinformatics, Harbin Medical University, Harbin 150086, China

3 School of Biological Science and Technology, Tongji University, Shanghai, 200092, China

4 Department of Molecular Cardiology and Department of Cardiovascular Medicine, the Cleveland Clinic Foundation, 9500 Euclid Avenue, Cleveland, Ohio 44195, USA

BMC Bioinformatics 2005, 6:58  doi:10.1186/1471-2105-6-58

Published: 17 March 2005



Development of robust and efficient methods for analyzing and interpreting high-dimensional gene expression profiles continues to be a focus in computational biology. Accumulated experimental evidence supports the assumption that genes are expressed and perform their functions in a modular fashion in cells. There is therefore an opportunity to develop computational algorithms that use robust functional expression profiles to classify complex human diseases precisely at the modular level.


Inspired by the insight that genes act as a module to carry out a highly integrated cellular function, we define a low-dimensional functional expression profile for data reduction. After annotating each individual gene to functional categories defined in a suitable gene function classification system (Gene Ontology in this study), we identify the functional categories enriched with differentially expressed genes. For each functional category, or functional module, we compute one or more summary measures of the raw expression values of the annotated genes to capture the overall activity level of the module. In this way, the gene expression values within a functional module are treated as a single integrative data point that replaces the multiple values of individual genes. Using four publicly available datasets, we compare the classification performance of decision trees built on functional expression profiles with that of decision trees built on conventional gene expression profiles. The comparison indicates that precise classification of tumour types and improved interpretation can be achieved with the reduced functional expression profiles.
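
The data-reduction steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the enrichment test or the summary measure, so the one-sided hypergeometric test, the 0.05 threshold, the use of the mean (or median) as the summary, and the choice to summarise all annotated genes in an enriched category are assumptions made here for illustration. All function names and the toy data are hypothetical.

```python
from math import comb
from statistics import mean, median


def hypergeom_tail(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n belong to the category."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def enriched_categories(annotations, de_genes, all_genes, alpha=0.05):
    """Return {category: sorted annotated genes} for categories enriched with DE genes.

    Assumption: enrichment is judged by a one-sided hypergeometric test at level alpha.
    """
    M, N = len(all_genes), len(de_genes)
    selected = {}
    for category, genes in annotations.items():
        members = genes & all_genes          # annotated genes present on the array
        k = len(members & de_genes)          # differentially expressed members
        if k == 0:
            continue
        if hypergeom_tail(k, M, len(members), N) <= alpha:
            selected[category] = sorted(members)
    return selected


def functional_profile(expression, modules, summary=mean):
    """Collapse gene-level values into one value per module and sample.

    expression: {gene: [value for each sample]}
    modules:    {category: [genes]}
    summary:    assumed summary measure (mean by default; median also works)
    """
    profile = {}
    for category, genes in modules.items():
        rows = [expression[g] for g in genes if g in expression]
        # zip(*rows) iterates over samples; each tuple holds the module's gene values
        profile[category] = [summary(values) for values in zip(*rows)]
    return profile


# Toy data (hypothetical): 10 genes, 3 of them differentially expressed.
all_genes = {f"g{i}" for i in range(1, 11)}
de_genes = {"g1", "g2", "g3"}
annotations = {
    "apoptosis": {"g1", "g2", "g3"},
    "transport": {"g3", "g4", "g5", "g6", "g7"},
}
expression = {"g1": [1.0, 2.0], "g2": [3.0, 4.0], "g3": [5.0, 6.0]}

modules = enriched_categories(annotations, de_genes, all_genes)
profile = functional_profile(expression, modules)
median_profile = functional_profile(expression, modules, summary=median)
```

In this toy case only the hypothetical "apoptosis" category passes the enrichment filter, so each sample is described by a single module-level value instead of three gene-level values. The resulting per-sample module values could then be passed to any classifier, such as the decision trees used in the study.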


This modular approach is demonstrated to be a powerful alternative for analyzing high-dimensional microarray data, and it is robust to the high measurement noise and intrinsic biological variance inherent in such data. Furthermore, efficient integration with current biological knowledge facilitates interpretation of the underlying molecular mechanisms of complex human diseases at the modular level.