Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Highly Accessed Software

svapls: an R package to correct for hidden factors of variability in gene expression studies

Sutirtha Chakraborty*, Somnath Datta and Susmita Datta

Author Affiliations

Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY-40202, USA

For all author emails, please log on.

BMC Bioinformatics 2013, 14:236  doi:10.1186/1471-2105-14-236

Published: 24 July 2013



Hidden variability is a fundamentally important issue in the context of gene expression studies. Collected tissue samples may have a wide variety of hidden effects that may alter their transcriptional landscape significantly. As a result their actual differential expression pattern can be potentially distorted, leading to inaccurate results from a genome-wide testing for the important transcripts.


We present an R package svapls that can be used to identify several types of unknown sample-specific sources of heterogeneity in a gene expression study and adjust for them in order to provide a more accurate inference on the original expression pattern of the genes over different varieties of samples. The proposed method implements Partial Least Squares regression to extract the hidden signals of sample-specific heterogeneity in the data and uses them to find the genes that are actually correlated with the phenotype of interest. We also compare our package with three other popular softwares for testing differential gene expression along with a detailed illustration on the widely popular Golub dataset. Results from the sensitivity analyes on simulated data with widely different hidden variation patterns reveal the improved detection power of our R package compared to the other softwares along with reasonably smaller error rates. Application on the real-life dataset exhibits the efficacy of the R package in detecting potential batch effects from the dataset.


Overall, Our R package provides the user with a simplified framework for analyzing gene expression data with a wide range of hidden variation patterns and delivering a differential gene expression analysis with substantially improved power and accuracy.

The R package svapls is freely available at webcite.