A multi-class predictor based on a probabilistic model: application to gene expression profiling-based diagnosis of thyroid tumors
1 Laboratory of Theoretical Life Science, Graduate School of Information Sciences, Nara Institute of Science and Technology, 8916-5 Takayama, Nara, 630-0101, Japan
2 Osaka Medical Center for Cancer and Cardiovascular Diseases, 1-3-2, Nakamichi, Higashinari-ku, Osaka, 537-8511, Japan
3 Department of Surgical Oncology, Osaka University Medical School, 2-2 Yamadaoka, Suita-ku, Osaka, 565-0871, Japan
BMC Genomics 2006, 7:190 doi:10.1186/1471-2164-7-190
Published: 27 July 2006
Although microscopic diagnosis plays the decisive role in cancer diagnostics, there are cases in which it does not satisfy the clinical need. Differential diagnosis of malignant and benign thyroid tissues is one such case, and a supplementary diagnostic method, such as one based on gene expression profiles, is needed.
Using four thyroid tissue types (papillary carcinoma, follicular carcinoma, follicular adenoma, and normal thyroid), we performed gene expression profiling with adaptor-tagged competitive PCR, a high-throughput RT-PCR technique. For differential diagnosis, we applied a novel multi-class predictor that produces probabilistic outputs. Multi-class predictors were constructed from various combinations of binary classifiers. The learning set included 119 samples, and the predictors were evaluated by strict leave-one-out cross-validation. The trials included the classical combinations, i.e., one-to-one and one-to-the-rest, but the predictors using more combinations exhibited better prediction accuracy. This trend also held for other gene expression data sets. The performance of the selected predictor was then tested on an independent set of 49 samples, giving a test prediction accuracy of 85.7%.
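As an illustration of what combining binary classifiers can mean for the four tissue types, the following minimal Python sketch enumerates the one-to-one and one-to-the-rest splits. It also enumerates, as one possible reading of an exhaustive combination, every split of the classes into two disjoint non-empty groups, with some classes allowed to be left out of a split. The function names and this reading of "exhaustive" are assumptions made for illustration; they are not taken from the authors' implementation.

```python
from itertools import combinations, product

# The four tissue types named in the abstract.
CLASSES = (
    "papillary carcinoma",
    "follicular carcinoma",
    "follicular adenoma",
    "normal thyroid",
)


def one_vs_one(classes):
    """One binary problem per unordered pair of classes."""
    return [((a,), (b,)) for a, b in combinations(classes, 2)]


def one_vs_rest(classes):
    """One binary problem per class: that class against all the others."""
    return [((c,), tuple(x for x in classes if x != c)) for c in classes]


def all_binary_splits(classes):
    """Every unordered pair of disjoint, non-empty class groups.

    Each class is coded +1 (first group), -1 (second group) or 0 (left out).
    This is an assumed interpretation of an exhaustive combination.
    """
    splits = []
    for code in product((1, -1, 0), repeat=len(classes)):
        pos = tuple(c for c, s in zip(classes, code) if s == 1)
        neg = tuple(c for c, s in zip(classes, code) if s == -1)
        if not pos or not neg:
            continue  # a binary problem needs both sides
        # A split and its mirror image (groups swapped) are the same problem;
        # keep only the one whose first non-zero code is +1.
        if next(s for s in code if s != 0) == 1:
            splits.append((pos, neg))
    return splits


if __name__ == "__main__":
    for name, builder in (
        ("one-to-one", one_vs_one),
        ("one-to-the-rest", one_vs_rest),
        ("all disjoint splits", all_binary_splits),
    ):
        print(name, len(builder(CLASSES)))
```

Under this reading, the classical schemes are subsets of the larger family: every one-to-one and one-to-the-rest split also appears among the disjoint splits, which is why a predictor built on the larger family has more binary decisions to combine.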
Molecular diagnosis of thyroid tissues by gene expression profiling is feasible, and the current level of accuracy is promising for an automatic diagnostic tool that complements present medical procedures. A multi-class predictor with an exhaustive combination of binary classifiers could achieve higher prediction accuracy than predictors with classical combinations and other predictors such as multi-class SVM. The probabilistic outputs of the predictor offer more detailed information for each sample, which enables visualization of each sample in low-dimensional classification spaces. These new concepts should help to improve multi-class classification, including that of cancer tissues.