Open Access Research article

A comparative study of discriminating human heart failure etiology using gene expression profiles

Xiaohong Huang1, Wei Pan1*, Suzanne Grindle2, Xinqiang Han2, Yingjie Chen2, Soon J Park2, Leslie W Miller2 and Jennifer Hall2

Author Affiliations

1 Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA

2 Cardiovascular Division, Department of Medicine, Medical School, University of Minnesota, Minneapolis, MN 55455, USA

BMC Bioinformatics 2005, 6:205 doi:10.1186/1471-2105-6-205

Published: 24 August 2005

Background

Human heart failure is a complex disease that arises from multiple genetic and environmental factors. Although ischemic and non-ischemic heart disease present clinically with similar decreases in ventricular function, emerging work suggests that they are distinct diseases with different responses to therapy. The ability to distinguish between ischemic and non-ischemic heart failure may be essential to guide appropriate therapy and determine prognosis for successful treatment. In this paper we consider discriminating between the etiologies of heart failure using gene expression libraries from two separate institutions.

Results

We apply five new statistical methods (partial least squares, penalized partial least squares, LASSO, nearest shrunken centroids and random forest) to two real datasets and compare their performance for multiclass classification. The five methods perform similarly on each of the two datasets: the etiologies of heart failure are difficult to distinguish correctly in one dataset but easy to distinguish in the other. A simulation study confirms that the five methods tend to perform comparably, though random forest seems to have a slight edge.
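To make this kind of evaluation concrete, the following is a minimal Python sketch of one of the five methods, nearest shrunken centroids, assessed by leave-one-out cross-validation. It is not the authors' code or data: the simulated expression values, the function names (`simulate`, `fit_shrunken_centroids`, `predict`, `loo_error`) and every parameter value are assumptions chosen for illustration, and the classifier is a simplified rendering of the method (pooled within-class standard deviations, soft-thresholded standardized centroid differences).

```python
import math
import random
from statistics import median


def simulate(n_per_class, n_genes, n_informative, shift, seed):
    """Simulate two classes; only the first n_informative genes differ in mean."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([
                rng.gauss(shift if (label == 1 and j < n_informative) else 0.0, 1.0)
                for j in range(n_genes)
            ])
            y.append(label)
    return X, y


def fit_shrunken_centroids(X, y, delta):
    """Fit class centroids shrunken toward the overall centroid by threshold delta."""
    n, p = len(X), len(X[0])
    members = {k: [X[i] for i in range(n) if y[i] == k] for k in sorted(set(y))}
    overall = [sum(row[j] for row in X) / n for j in range(p)]
    centroids = {k: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
                 for k, rows in members.items()}
    # Pooled within-class standard deviation for each gene.
    s = []
    for j in range(p):
        ss = sum((r[j] - centroids[k][j]) ** 2
                 for k, rows in members.items() for r in rows)
        s.append(math.sqrt(ss / (n - len(members))))
    s0 = median(s)  # offset that guards against genes with tiny variance
    scale = [sj + s0 for sj in s]
    shrunken = {}
    for k, rows in members.items():
        m_k = math.sqrt(1.0 / len(rows) - 1.0 / n)
        shrunken[k] = []
        for j in range(p):
            d = (centroids[k][j] - overall[j]) / (m_k * scale[j])
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)  # soft threshold
            shrunken[k].append(overall[j] + m_k * scale[j] * d_shrunk)
    priors = {k: len(rows) / n for k, rows in members.items()}
    return shrunken, scale, priors


def predict(model, x):
    """Assign x to the class with the smallest prior-adjusted standardized distance."""
    shrunken, scale, priors = model

    def score(k):
        dist = sum(((xj - cj) / sj) ** 2
                   for xj, cj, sj in zip(x, shrunken[k], scale))
        return dist - 2.0 * math.log(priors[k])

    return min(shrunken, key=score)


def loo_error(X, y, delta):
    """Leave-one-out misclassification rate for a given shrinkage threshold."""
    errors = 0
    for i in range(len(X)):
        model = fit_shrunken_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:], delta)
        errors += predict(model, X[i]) != y[i]
    return errors / len(X)


if __name__ == "__main__":
    # Hypothetical settings: a weakly separated and a strongly separated dataset.
    for shift in (0.3, 3.0):
        X, y = simulate(n_per_class=10, n_genes=50, n_informative=10,
                        shift=shift, seed=0)
        for delta in (0.0, 0.5, 1.0):
            print(f"shift={shift} delta={delta} LOO error={loo_error(X, y, delta):.2f}")
```

With only 20 samples, each misclassified sample moves the leave-one-out error estimate by 5 percentage points, so an estimate from a single small dataset can be quite unstable. In a fuller comparison the threshold `delta` would itself be chosen inside the cross-validation loop rather than fixed in advance.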

Conclusion

For some gene expression data, several recently developed discriminant methods may perform similarly. More importantly, one must remain cautious when assessing discriminating performance from gene expression profiles in a small dataset; our analysis suggests the importance of using multiple or larger datasets.