Linear combination test for gene set analysis of a continuous phenotype
1 School of Public Health, University of Alberta, Edmonton, Alberta T6G 1C9, Canada
2 Department of Medicine, University of Alberta, Edmonton, Alberta T6G 2E1, Canada
3 Department of Population Health Research, Alberta Health Services-Cancer Care, Calgary, Alberta, Canada
4 Departments of Medical Genetics and Oncology, University of Calgary, Calgary, Alberta, Canada
5 CR Rao Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad, India
6 Public Health Foundation of India, Delhi, India
BMC Bioinformatics 2013, 14:212. doi:10.1186/1471-2105-14-212. Published: 1 July 2013
Gene set analysis (GSA) methods test the association of sets of genes with a phenotype in gene expression microarray studies. Many GSA methods have been proposed, especially for use with a binary phenotype. Equally important, if not more so, is the ability to test the enrichment of a gene signature or pathway against continuous phenotypes, which are routinely observed in, for example, clinicopathological measurements. It is not always easy or meaningful to dichotomize a continuous phenotype into two classes, and attempting to do so may misclassify samples, which would affect the downstream enrichment analysis. In the present study, we build on recent efforts to incorporate the correlation structure within gene sets and pathways into the GSA test statistic. New GSA methods that incorporate a covariance matrix estimator for a continuous phenotype may offer an effective approach: they address continuous phenotypes directly, without artificial discretization, and could thereby increase the power of the test while preserving computational efficiency and rigor.
We have designed a new method by extending the GSA approach called Linear Combination Test (LCT) from a binary to a continuous phenotype. Simulation studies and a real microarray dataset were used to compare the proposed LCT for a continuous phenotype, a modification of LCT (referred to as LCT2), and two publicly available GSA methods for continuous phenotypes.
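To make the general idea concrete, the following is a minimal Python sketch of a permutation-based gene set test for a continuous phenotype that uses a shrunken covariance matrix. It is not the authors' R implementation of LCT or LCT2: the function name `linear_combination_test`, the quadratic-form statistic, the fixed shrinkage weight `shrink`, and the choice to permute the phenotype are all assumptions made for illustration.

```python
import numpy as np

def linear_combination_test(X, y, shrink=0.1, n_perm=1000, seed=0):
    """Illustrative gene set test for a continuous phenotype (hypothetical helper).

    X: (n_samples, n_genes) expression matrix for one gene set.
    y: (n_samples,) continuous phenotype.
    Returns the observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize genes and phenotype so covariances are correlations.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yc = (y - y.mean()) / y.std(ddof=1)

    # Assumption: shrink the gene-gene correlation matrix toward the identity
    # with a fixed weight, so it stays invertible when n_genes > n_samples.
    S = Xc.T @ Xc / (n - 1)
    S_inv = np.linalg.inv((1.0 - shrink) * S + shrink * np.eye(p))

    def stat(y_vec):
        # Quadratic form in the gene-phenotype correlations; it is large when
        # some linear combination of genes tracks the phenotype closely.
        c = Xc.T @ y_vec / (n - 1)
        return float(c @ S_inv @ c)

    observed = stat(yc)
    # Permuting the phenotype breaks the gene-phenotype association while
    # leaving the gene-gene correlation structure (and S_inv) unchanged.
    perm = np.array([stat(rng.permutation(yc)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm >= observed)) / (n_perm + 1)
    return observed, float(p_value)
```

Because only the phenotype is permuted, the shrunken matrix is inverted once per gene set, which keeps the permutation loop inexpensive. A data-driven shrinkage weight could replace the fixed `shrink` value used here.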
We found that the LCT methods performed better than the other two GSA methods; however, this finding should be understood in the context of the specific simulation studies and the real microarray dataset used to compare the methods. Free R code to perform LCT for binary and continuous phenotypes is available at http://www.ualberta.ca/~yyasui/homepage.html. The R code to perform LCT for a continuous phenotype is available as Additional file 1.