
This article is part of the supplement: Second International Workshop on Data and Text Mining in Bioinformatics (DTMBio) 2008

Open Access Proceedings

Mining metastasis related genes by primary-secondary tumor comparisons from large-scale databases

Sangwoo Kim and Doheon Lee*

Author Affiliations

Department of Bio and Brain Engineering, KAIST, 373-1 Guseong-Dong, Yu-seong Gu, Daejeon, 305-701, Republic of Korea


BMC Bioinformatics 2009, 10(Suppl 3):S2  doi:10.1186/1471-2105-10-S3-S2

Published: 19 March 2009



Metastasis is the most dangerous step in cancer progression and causes more than 90% of cancer deaths. Although many researchers have studied the biological features and characteristics of metastasis, most of its genetic-level processes remain uncertain. Some studies have succeeded in elucidating metastasis-related genes and pathways and then in predicting the prognosis of cancer patients, but it remains an open question whether the resulting genes or pathways contain enough information and whether noise features have been controlled appropriately.


We defined four tumor type classes that differ in tumor characteristics such as tissue origin, cellular environment, and metastatic ability. We then conducted a set of comparisons among the four classes and searched for genes that are consistently up- or down-regulated across all of the comparisons.
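The selection step described above can be illustrated with a minimal sketch. It assumes each gene already has one log-ratio per pairwise class comparison; the function name `consistent_genes`, the threshold of 1.0, and the gene labels are hypothetical and are not taken from the article, which does not state its exact criteria here.

```python
from typing import Dict, List


def consistent_genes(
    log_ratios: Dict[str, List[float]], threshold: float = 1.0
) -> Dict[str, str]:
    """Label genes that change in the same direction in every comparison.

    log_ratios maps a gene name to its log-ratios, one per comparison.
    The threshold is an illustrative assumption, not the article's value.
    """
    result: Dict[str, str] = {}
    for gene, values in log_ratios.items():
        if not values:
            continue  # no comparisons available for this gene
        if all(v >= threshold for v in values):
            result[gene] = "up"
        elif all(v <= -threshold for v in values):
            result[gene] = "down"
        # genes with mixed or weak changes are left out
    return result


# Hypothetical example data: three comparisons per gene.
example = {
    "GENE_A": [1.8, 2.1, 1.2],     # up in every comparison
    "GENE_B": [-1.5, -2.0, -1.1],  # down in every comparison
    "GENE_C": [1.6, -0.2, 1.9],    # inconsistent, so excluded
}
selected = consistent_genes(example)
```

Requiring agreement across every comparison is a deliberately strict rule: a gene that responds in only some comparisons is dropped rather than ranked.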


We identified four sets of genes that are consistently differentially expressed in the comparisons, each of which represents one of four cellular characteristics: liver tissue, colon tissue, liver viability, and metastasis. We found that our candidate genes for tissue specificity are consistent with the TiGER database. We also found that the metastasis candidate genes from our method were more consistent with known biological background and more independent of other noise features.


We have proposed a new method for identifying metastasis-related genes from a large-scale database. The proposed method attempts to minimize the influence of factors other than metastatic ability, including tissue origin and tissue viability, by constraining the result with metastasis-unrelated test combinations.
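One way to picture the constraint described above is as a set difference: genes that also appear as differentially expressed in comparisons unrelated to metastasis are dropped from the candidate list. This is only a sketch under that assumption; the article's actual procedure may differ, and the function name `remove_confounded` and the gene labels are hypothetical.

```python
from typing import Iterable, Set


def remove_confounded(
    candidates: Iterable[str], unrelated_hits: Iterable[Iterable[str]]
) -> Set[str]:
    """Drop candidates that also respond in metastasis-unrelated comparisons.

    unrelated_hits holds one collection of gene names per comparison that
    does not involve metastatic ability (for example, tissue origin only).
    """
    confounded: Set[str] = set().union(*unrelated_hits)
    return set(candidates) - confounded


# Hypothetical example: GENE_B also varies with tissue origin, so it is removed.
metastasis_candidates = {"GENE_A", "GENE_B", "GENE_D"}
tissue_or_viability_hits = [{"GENE_B"}, {"GENE_E"}]
filtered = remove_confounded(metastasis_candidates, tissue_or_viability_hits)
```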