Open Access Research article

Efficient α, β-motif finder for identification of phenotype-related functional modules

Matthew C Schmidt1,2, Andrea M Rocha3, Kanchana Padmanabhan1,2, Zhengzhang Chen1,2, Kathleen Scott4, James R Mihelcic3 and Nagiza F Samatova1,2*

Author Affiliations

1 Department of Computer Science, North Carolina State University, Raleigh, 27695, USA

2 Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, 37831, USA

3 Department of Civil and Environmental Engineering, University of South Florida, Tampa, 33620, USA

4 Department of Integrative Biology, University of South Florida, Tampa, 33620, USA


BMC Bioinformatics 2011, 12:440  doi:10.1186/1471-2105-12-440

Published: 11 November 2011



Microbial communities in their natural environments exhibit phenotypes that can directly cause particular diseases, convert biomass or wastewater to energy, or degrade various environmental contaminants. Understanding how these communities realize specific phenotypic traits (e.g., carbon fixation, hydrogen production) is critical for addressing health, bioremediation, or bioenergy problems.


In this paper, we describe a graph-theoretical method for in silico prediction of the cellular subsystems related to the expression of a target phenotype. The proposed (α, β)-motif finder identifies phenotype-related subsystems that, in addition to metabolic subsystems, can include their regulators, sensors, transporters, and even uncharacterized proteins. By comparing dozens of genome-scale networks of functionally associated proteins, our method efficiently identifies statistically significant functional modules that appear in at least α networks of phenotype-expressing organisms but in no more than β networks of organisms that do not exhibit the target phenotype. Experiments with several target phenotypes, namely hydrogen production, motility, aerobic respiration, and acid tolerance, show that the enumerated modules are indeed related to phenotype expression.
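The α and β criteria can be made concrete with a small sketch. It assumes, as a simplification, that each organism's network is an undirected graph over shared protein identifiers (e.g. protein-family labels), and that a module "appears" in a network when its members form a clique there, following the (α, β)-clique model named in the conclusions. The function names are hypothetical, and the statistical-significance test mentioned above is not modeled.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

# An undirected edge is stored as an unordered pair of protein identifiers.
Edge = FrozenSet[str]


def make_network(edges: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Build a hypothetical network representation: a set of unordered pairs."""
    return {frozenset(e) for e in edges}


def is_clique(module: Set[str], network: Set[Edge]) -> bool:
    """True if every pair of module members is linked in the network.

    Assumes the module has at least two members.
    """
    return all(frozenset(pair) in network for pair in combinations(sorted(module), 2))


def is_alpha_beta_module(
    module: Set[str],
    positive: List[Set[Edge]],
    negative: List[Set[Edge]],
    alpha: int,
    beta: int,
) -> bool:
    """Check the (alpha, beta) criteria for one candidate module.

    positive: networks of phenotype-expressing organisms
    negative: networks of organisms lacking the target phenotype
    """
    pos_hits = sum(is_clique(module, net) for net in positive)
    neg_hits = sum(is_clique(module, net) for net in negative)
    # Present in at least alpha positive networks, at most beta negative ones.
    return pos_hits >= alpha and neg_hits <= beta
```

For example, with three toy positive networks and two negative ones, the module {"A", "B", "C"} below forms a clique in two positive networks and in no negative network, so it satisfies α = 2, β = 0 but not α = 3.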


Thus, we have proposed a methodology that can identify statistically significant functional modules that are potentially related to a target phenotype. Each functional module is modeled as an (α, β)-clique, where α and β are the two criteria introduced in this work. We also propose a novel network model, called the two-typed, divided network. Together, the new network model and the criteria make the problem tractable even when very large networks are compared. The code is available for download online.
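The abstract does not describe the enumeration algorithm or the two-typed, divided network, so the following is only a hedged illustration of why the α criterion helps keep the search manageable. It is a hypothetical level-wise enumeration on toy networks, not the authors' method. It relies on one property: every subset of a clique is itself a clique, so a module that fails the α threshold can never be extended into one that passes it and may be pruned. The β threshold works the other way (adding members can only lower the count of negative networks), so here it is checked per candidate rather than used for pruning. Networks are assumed to be sets of unordered protein-identifier pairs; all names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

Edge = FrozenSet[str]


def is_clique(module: FrozenSet[str], network: Set[Edge]) -> bool:
    """True if every pair of module members is linked in the network."""
    return all(frozenset(p) in network for p in combinations(sorted(module), 2))


def support(module: FrozenSet[str], networks: List[Set[Edge]]) -> int:
    """Number of networks in which the module forms a clique."""
    return sum(is_clique(module, net) for net in networks)


def enumerate_alpha_beta_cliques(
    vertices: Iterable[str],
    positive: List[Set[Edge]],
    negative: List[Set[Edge]],
    alpha: int,
    beta: int,
) -> List[FrozenSet[str]]:
    """Enumerate all (alpha, beta)-cliques with at least two members.

    Level-wise growth: level k holds every k-member module whose positive
    support is at least alpha; anti-monotonicity of clique support makes
    this pruning safe (no valid module is lost).
    """
    verts = sorted(set(vertices))
    results: List[FrozenSet[str]] = []
    level = [frozenset(p) for p in combinations(verts, 2)]
    level = [m for m in level if support(m, positive) >= alpha]
    while level:
        # Report modules that also meet the beta criterion.
        results.extend(m for m in level if support(m, negative) <= beta)
        # Extend each surviving module by one vertex; the set removes duplicates.
        grown = {m | {v} for m in level for v in verts if v not in m}
        level = sorted(
            (m for m in grown if support(m, positive) >= alpha),
            key=lambda m: sorted(m),
        )
    return results
```

On small inputs this terminates because each level adds one member. The search space still grows combinatorially with the number of proteins, which is why a specialized network model such as the one proposed above matters for genome-scale comparisons.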