Open Access Methodology article

A novel Mixture Model Method for identification of differentially expressed genes from DNA microarray data

Kayvan Najarian1*, Maryam Zaheri1, Ali A Rad2, Siamak Najarian3 and Javad Dargahi3

Author Affiliations

1 Computer Science Department, University of North Carolina Charlotte, University City Blvd, Charlotte, NC, USA

2 Computer Engineering and IT Department, Amirkabir University of Technology, Tehran, Iran

3 Mechanical and Industrial Engineering Department, Concordia University, CONCAVE Research Centre, CR-200, Concordia University, Quebec, Canada

BMC Bioinformatics 2004, 5:201 doi:10.1186/1471-2105-5-201

Published: 16 December 2004

Abstract

Background

The main goal in analyzing microarray data is to identify the genes that are differentially expressed across two types of tissue samples or across samples obtained under two experimental conditions. The mixture model method (MMM hereafter) is a nonparametric statistical method often used for microarray processing, but it is known to over-fit the data when the number of replicates is small. In addition, the results of the MMM may not be repeatable when dealing with a small number of replicates. In this paper, we propose a new version of the MMM that ensures the repeatability of the results across different runs and reduces the sensitivity of the results to the choice of parameters.

Results

The proposed technique is applied to two different data sets: a leukaemia data set and a data set that examines the effects of a low-phosphate diet on regular and Hyp mice. In each study, the proposed algorithm successfully selects genes closely related to the disease state, as verified by biological information.

Conclusion

The results show 100% repeatability across all runs and very little sensitivity to the choice of parameters. In addition, evaluation on the leukaemia data set shows a 12% improvement over the MMM in detecting the 50 biologically identified expressed genes reported by Thomas et al. These results attest to the successful performance of the proposed algorithm in the quantitative pathogenesis of diseases and the comparative evaluation of treatment methods.
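The abstract does not say how repeatability or detection performance were computed. As an illustration only, the sketch below shows one plausible way to quantify both: repeatability as the mean pairwise Jaccard overlap of the gene sets selected in different runs, and detection as the fraction of a reference gene list that is recovered. The function names, the choice of Jaccard overlap, and the gene identifiers are all assumptions made for this example, not the authors' definitions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two gene sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def repeatability(runs):
    """Mean pairwise Jaccard overlap of the gene sets selected in each run.

    Assumed metric: 1.0 means every run selected exactly the same genes.
    """
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 1.0  # a single run is trivially consistent with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def detection_rate(selected, reference):
    """Fraction of a reference gene list that appears in the selected set."""
    reference = set(reference)
    if not reference:
        raise ValueError("reference gene list must not be empty")
    return len(set(selected) & reference) / len(reference)


# Hypothetical gene identifiers, purely for illustration.
stable_runs = [{"g1", "g2", "g3"}, {"g1", "g2", "g3"}, {"g1", "g2", "g3"}]
unstable_runs = [{"g1", "g2", "g3"}, {"g1", "g2", "g4"}, {"g2", "g3", "g5"}]

if __name__ == "__main__":
    print("stable:", repeatability(stable_runs))
    print("unstable:", round(repeatability(unstable_runs), 3))
    print("detection:", detection_rate({"g1", "g2", "g9"}, ["g1", "g2", "g3", "g4"]))
```

Under this assumed metric, a method whose selected gene set is identical in every run scores 1.0, which is one way to read "100% repeatability"; a run-dependent method scores lower.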