This article is part of the supplement: Eleventh International Conference on Bioinformatics (InCoB2012): Computational Biology
Meta-analytical biomarker search of EST expression data reveals three differentially expressed candidates
1 Institute of Biomedical Informatics, National Yang Ming University, Taipei, Taiwan, R.O.C
2 Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan, R.O.C
3 Department of Biotechnology and Laboratory Science in Medicine and Institute of Biotechnology in Medicine, National Yang Ming University, Taipei, Taiwan, R.O.C
4 Bioinformatics Center, Chang Gung University, Taoyuan, Taiwan, R.O.C
5 Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan, R.O.C
6 Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan, R.O.C
7 Center for Systems and Synthetic Biology, National Yang Ming University, Taipei, Taiwan, R.O.C
BMC Genomics 2012, 13(Suppl 7):S12 doi:10.1186/1471-2164-13-S7-S12
Published: 13 December 2012
Research on identifying differentially expressed genes (DEGs) by generating and mining cDNA expressed sequence tags (ESTs) has been conducted for more than a decade. Although the availability of public databases makes possible the comprehensive mining of DEGs among the ESTs from multiple tissue types, existing studies have usually employed statistics suitable only for two categories. A multi-class test has been developed to enable the finding of tissue-specific genes, but the subsequent search for cancer genes involves a separate two-category test on only the ESTs of the tissue of interest. This restricts the amount of data used. On the other hand, simple pooling of cancer and normal genes from multiple tissue types runs the risk of Simpson's paradox. Here we present a different approach that searches for multi-cancer DEG candidates by analyzing all pertinent ESTs in all categories, and then narrows down the cancer biomarker candidates through integrative analysis with microarray data, selection of secretory and membrane protein genes, and network analysis. Finally, the differential expression patterns of three selected cancer biomarker candidates were confirmed by real-time qPCR analysis.
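The risk of Simpson's paradox under simple pooling can be illustrated with a small sketch. The EST counts below are invented for illustration only and are not taken from the study; the tissue names and the `odds_ratio` helper are likewise hypothetical. Each tissue contributes one 2x2 table of (gene ESTs in cancer, other ESTs in cancer, gene ESTs in normal, other ESTs in normal).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table: (a / b) / (c / d)."""
    return (a * d) / (b * c)


# Hypothetical counts per tissue (not from the paper):
# (gene ESTs in cancer, other ESTs in cancer,
#  gene ESTs in normal, other ESTs in normal)
tables = {
    "tissue_A": (100, 9900, 5, 995),     # low baseline, mostly cancer ESTs
    "tissue_B": (180, 820, 1000, 9000),  # high baseline, mostly normal ESTs
}

# Odds ratio computed separately within each tissue.
per_tissue = {tissue: odds_ratio(*counts) for tissue, counts in tables.items()}

# Naive pooling: add the counts cell by cell, ignoring tissue of origin.
pooled_counts = tuple(sum(cell) for cell in zip(*tables.values()))
pooled = odds_ratio(*pooled_counts)

for tissue, value in per_tissue.items():
    print(f"{tissue}: OR = {value:.2f}")
print(f"pooled:   OR = {pooled:.2f}")
```

With these made-up counts the gene is roughly twice as abundant in cancer within each tissue, yet the pooled odds ratio falls below 1, because the tissue with the higher baseline expression contributes mostly normal ESTs. A stratified statistic avoids this by comparing cancer and normal within each tissue before combining the evidence.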
Seven hundred and twenty-three primary DEG candidates (p-value < 0.05 and lower bound of the confidence interval of the odds ratio ≥ 1.65) were selected from a curated EST database by applying the Cochran-Mantel-Haenszel (CMH) statistic. GeneGO analysis indicated that this set is enriched for neoplasm-related genes. Cross-examination with microarray data further narrowed the list down to 235 genes, among which 96 had membrane or secretory annotations. After examining the candidates in a protein interaction network, public tissue expression databases, and the literature, we selected three genes for further evaluation by real-time qPCR with eight major normal and cancer tissues. The higher-than-normal expression of COL3A1, DLG3, and RNF43 in some of the cancer tissues is in agreement with our in silico predictions.
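The selection rule quoted above (CMH p-value < 0.05 and lower confidence bound of the odds ratio ≥ 1.65) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes one 2x2 table per tissue, a CMH chi-square without continuity correction, the Mantel-Haenszel common odds ratio, a 95% interval (the abstract does not state the confidence level), and the Robins-Breslow-Greenland variance for the log odds ratio. All function names are hypothetical.

```python
import math


def cmh_test(tables):
    """CMH chi-square (1 df, no continuity correction) and its p-value.

    tables: iterable of (a, b, c, d) per stratum, where
      a = gene ESTs in cancer,  b = other ESTs in cancer,
      c = gene ESTs in normal,  d = other ESTs in normal.
    """
    diff = 0.0
    var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        diff += a - (a + b) * (a + c) / n  # observed minus expected
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    stat = diff * diff / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def mh_odds_ratio_ci(tables, z=1.96):
    """Mantel-Haenszel common odds ratio with a Robins-Breslow-Greenland CI.

    z = 1.96 gives a 95% interval (an assumption of this sketch).
    """
    r = s = pr = ps_qr = qs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        p_i, q_i = (a + d) / n, (b + c) / n
        r_i, s_i = a * d / n, b * c / n
        r += r_i
        s += s_i
        pr += p_i * r_i
        ps_qr += p_i * s_i + q_i * r_i
        qs += q_i * s_i
    or_mh = r / s
    var_log = pr / (2 * r * r) + ps_qr / (2 * r * s) + qs / (2 * s * s)
    half_width = z * math.sqrt(var_log)
    return or_mh, or_mh * math.exp(-half_width), or_mh * math.exp(half_width)


def is_candidate(tables, alpha=0.05, min_lower=1.65):
    """Apply the two thresholds quoted in the text to one gene."""
    _, p_value = cmh_test(tables)
    _, lower, _ = mh_odds_ratio_ci(tables)
    return p_value < alpha and lower >= min_lower


# Hypothetical counts for one gene across two tissues (not from the paper).
example = [(100, 9900, 5, 995), (180, 820, 1000, 9000)]
print(cmh_test(example))
print(mh_odds_ratio_ci(example))
```

Requiring the lower confidence bound, rather than the point estimate, to exceed 1.65 makes the filter conservative: a gene supported by few ESTs has a wide interval and is excluded even when its estimated odds ratio is large.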
Searching digitized transcriptomes with the CMH statistic enabled us to identify multi-cancer differentially expressed gene candidates. Our methodology demonstrates the simultaneous analysis of EST data for cancer biomarkers of multiple tissue types. With the revived interest in digitizing transcriptomes by NGS, cancer biomarkers could be detected more precisely from ESTs. The three candidates identified in this study, COL3A1, DLG3, and RNF43, are valuable targets for further evaluation with a larger sample of normal and cancer tissue or serum samples.