Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Highly Accessed Software

Hierarchical Parallelization of Gene Differential Association Analysis

Mark Needham1, Rui Hu2*, Sandhya Dwarkadas1 and Xing Qiu2

Author Affiliations

1 Department of Computer Science, University of Rochester, PO Box 270226, Rochester, New York 14627, USA

2 Department of Biostatistics and Computational Biology, University of Rochester, 601 Elmwood Avenue Box 630, Rochester, New York 14642, USA

For all author emails, please log on.

BMC Bioinformatics 2011, 12:374  doi:10.1186/1471-2105-12-374

Published: 21 September 2011



Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today.


Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from webcite


The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels.