Institute for Environmental Genomics and Department of Botany and Microbiology, University of Oklahoma, Norman, OK 73019, USA

Glomics Inc, Norman, OK 73072, USA

State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing, 100084, China

Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

School of Computing, Clemson University, Clemson, SC 29634, USA

Abstract

Background

Understanding the interactions among different species within a community and their responses to environmental changes is a central goal in ecology. However, defining the network structure of a microbial community is very challenging because of its extremely high diversity and as-yet uncultivated status. Although recent advances in metagenomic technologies, such as high-throughput sequencing and functional gene arrays, provide revolutionary tools for analyzing microbial community structure, it is still difficult to examine network interactions in a microbial community based on high-throughput metagenomics data.

Results

Here, we describe a novel mathematical and bioinformatics framework to construct ecological association networks, named molecular ecological networks (MENs), through Random Matrix Theory (RMT)-based methods. Compared to other network construction methods, this approach is remarkable in that the network is automatically defined and robust to noise, thus providing excellent solutions to several common issues associated with high-throughput metagenomics data. We applied it to determine the network structure of microbial communities subjected to long-term experimental warming based on pyrosequencing data of 16S rRNA genes. We showed that the constructed MENs under both warming and unwarming conditions exhibited the topological features of being scale-free, small-world and modular, which were consistent with previously described molecular ecological networks. Eigengene analysis indicated that the eigengenes represented the module profiles relatively well. Consistent with many other studies, several major environmental traits, including temperature and soil pH, were found to be important in determining network interactions in the microbial communities examined. To facilitate its application by the scientific community, all these methods and statistical tools have been integrated into a comprehensive Molecular Ecological Network Analysis Pipeline (MENAP), which is now openly accessible (

Conclusions

The RMT-based molecular ecological network analysis provides powerful tools to elucidate network interactions in microbial communities and their responses to environmental changes, which are fundamentally important for research in microbial ecology and environmental microbiology.

Background

In an ecosystem, different species/populations interact with each other to form complicated networks through various types of interactions such as predation, competition and mutualism. On the basis of ecological interactions, ecological networks can be grouped as antagonistic, competitive and mutualistic networks

Various network approaches have been developed and widely applied in genomic biology

High-throughput technologies such as microarrays and high-throughput sequencing have generated massive amounts of data on microbial community diversity and dynamics across various spatial and temporal scales

Results

Overview of MENA

An ecological network is a representation of various biological interactions (e.g., predation, competition, mutualism) in an ecosystem, in which species (nodes) are connected by pairwise interactions (links)

The whole process of MENA can be divided into two phases, each comprising several major steps (Figure

**Overview of the Random Matrix Theory (RMT)-based molecular ecological network analysis**. Two major parts are included, network construction and network analyses. In each of them, several key steps are outlined.

**Process of random matrix theory-based approach for automatically detecting threshold to construct molecular ecological networks.**

**Indexes**

**Formula**

**Explanation**

**Note**

**Ref**

**Part I: network indexes for individual nodes**

Connectivity

It is also called node degree. It is the most commonly used concept for describing the topological property of a node in a network.

Stress centrality

It is used to describe the number of geodesic paths that pass through the i^{th} node. A node with high stress centrality can serve as a broker.

Betweenness

It is used to describe the ratio of paths that pass through the i^{th} node. A node with high betweenness can serve as a broker, similar to a node with high stress centrality.

Eigenvector centrality

^{th} node and

It is used to describe the degree to which a central node is connected to other central nodes.

Clustering coefficient

_{i} is the number of links between neighbors of node _{i}

It describes how well a node is connected with its neighbors. If it is fully connected to its neighbors, the clustering coefficient is 1. A value close to 0 means that there are hardly any connections with its neighbors. It was used to describe hierarchical properties of networks.

Vulnerability

_{i} is the global efficiency after the removal of the node

It measures the decrease in system performance when node i and all associated links are removed.

**Part II: The overall network topological indexes**

Average connectivity

_{i} is degree of node

Higher

Average geodesic distance

_{ij} is the shortest path between node

A smaller

Geodesic efficiency

all parameters shown above.

It is the opposite of

Harmonic geodesic distance

The reciprocal of

Centralization of degree

max(_{i} represents the connectivity of ^{th} node. Finally this value is normalized by the theoretical maximum centralization score.

It is close to 1 for a network with star topology and in contrast close to 0 for a network where each node has the same connectivity.

Centralization of betweenness

max(_{i} represents the betweenness of ^{th} node. Finally this value is normalized by the theoretical maximum centralization score.

It is close to 0 for a network where each node has the same betweenness; larger values indicate greater differences among the betweenness values.

Centralization of stress centrality

max(_{i} represents the stress centrality of ^{th} node. Finally this value is normalized by the theoretical maximum centralization score.

It is close to 0 for a network where each node has the same stress centrality; larger values indicate greater differences among the stress centrality values.

Centralization of eigenvector centrality

max(_{i} represents the eigenvector centrality of ^{th} node. Finally this value is normalized by the theoretical maximum centralization score.

It is close to 0 for a network where each node has the same eigenvector centrality; larger values indicate greater differences among the eigenvector centrality values.

Density

_{exp} is the number of possible links.

It is closely related to the average connectivity.

Average clustering coefficient

It is used to measure the extent of module structure present in a network.

Transitivity

_{i} is the number of links between neighbors of node _{i}

Sometimes it is also called the entire clustering coefficient. It has been shown to be a key structural property in social networks.

Connectedness

It is one of the most important measurements for summarizing hierarchical structures.

**Terminology**

**Explanation**

**Scale-free**

It is one of the most notable characteristics of complex systems. It describes the finding that most nodes in a network have few neighbors while a few nodes have a large number of neighbors. In most cases, the connectivity distribution asymptotically follows a power law

**Small-world**

It is a term in network analysis indicating that the average distance between nodes in a network is short, usually scaling logarithmically with the total number of nodes

**Modularity**

It describes a network that can be naturally divided into communities or modules

**Hierarchy**

It describes networks that can be arranged into a hierarchy of groups represented as a tree structure. Several studies demonstrated that metabolic networks are usually accompanied by a hierarchical modularity ^{−γ} (scaling law), in which

Molecular network under experimental warming

Here we used 16S rRNA gene-based pyrosequencing data from a long-term experimental warming site

| Habitats of communities^{b} | Similarity threshold (s_{t}) | Network size | R^{2} of power law | R^{2} of scaling law | Average path (GD) | Average clustering coefficient | Modularity (number of modules) | Random networks: average path (GD) | Random networks: average clustering coefficient | Random networks: modularity |
|---|---|---|---|---|---|---|---|---|---|---|
| **Functional MENs** | | | | | | | | | | |
| Grassland soils under elevated CO_{2}, MN ^{(i)} | 0.80 | 254 | 0.79 | 0.25 | 3.09 | 0.22 | 0.44 (18) | 3.00 ± 0.03 | 0.099 ± 0.009 | 0.31 ± 0.01 |
| Grassland soils under ambient CO_{2}, MN ^{(i)} | 0.80 | 184 | 0.88 | 0.11 | 4.21 | 0.10 | 0.65 (16) | 3.84 ± 0.06 | 0.028 ± 0.007 | 0.52 ± 0.01 |
| Lake sediment, Lake DePue, WI ^{(ii)} | 0.92 | 151 | 0.85 | 0.73 | 3.47 | 0.09 | 0.48 (8) | 3.46 ± 0.05 | 0.046 ± 0.010 | 0.45 ± 0.01 |
| Groundwater, Well 101–2, Oak Ridge, TN ^{(iii)} | 0.95 | 107 | 0.74 | 0.44 | 3.12 | 0.29 | 0.52 (11) | 3.13 ± 0.07 | 0.081 ± 0.017 | 0.40 ± 0.01 |
| Groundwater Well 102–2, Oak Ridge, TN ^{(iii)} | 0.89 | 140 | 0.79 | 0.21 | 4.22 | 0.17 | 0.67 (12) | 3.89 ± 0.08 | 0.033 ± 0.012 | 0.53 ± 0.01 |
| Groundwater Well 102–3, Oak Ridge, TN ^{(iii)} | 0.87 | 117 | 0.85 | 0.19 | 3.57 | 0.25 | 0.64 (13) | 3.54 ± 0.09 | 0.049 ± 0.013 | 0.48 ± 0.01 |
| **Phylogenetic MENs (454 pyrosequencing)** | | | | | | | | | | |
| Grassland soils under warming, Norman, OK ^{(iv)} | 0.76 | 177 | 0.83 | 0.48 | 3.91 | 0.13 | 0.67 (18) | 3.94 ± 0.20 | 0.020 ± 0.008 | 0.44 ± 0.01 |
| Grassland soils under unwarming, Norman, OK ^{(iv)} | 0.76 | 152 | 0.88 | 0.10 | 2.71 | 0.09 | 0.61 (20) | 3.39 ± 0.23 | 0.038 ± 0.010 | 0.47 ± 0.01 |
| Grassland soils under elevated CO_{2}, MN ^{(i)} | 0.78 | 263 | 0.89 | 0.26 | 3.95 | 0.25 | 0.81 (34) | 3.98 ± 0.22 | 0.015 ± 0.006 | 0.61 ± 0.02 |
| Grassland soils under ambient CO_{2}, MN ^{(i)} | 0.77 | 292 | 0.87 | 0.22 | 4.26 | 0.27 | 0.85 (36) | 4.10 ± 0.20 | 0.017 ± 0.005 | 0.59 ± 0.01 |
| Agricultural soil, Africa ^{(v)} | 0.77 | 384 | 0.86 | 0.20 | 4.99 | 0.34 | 0.86 (32) | 3.99 ± 0.04 | 0.020 ± 0.004 | 0.48 ± 0.01 |
| Human intestine, Stanford, CA ^{(vi)} | 0.86 | 215 | 0.92 | 0.18 | 3.55 | 0.13 | 0.69 (27) | 4.23 ± 0.10 | 0.025 ± 0.009 | 0.58 ± 0.01 |

^{a}Various parameters of the empirical networks and generation of random networks are explained in the Table

^{b}Sample sources: (i) the grassland soils under elevated and ambient CO_{2} were collected from a free-air CO_{2} enrichment field in Minnesota which were analyzed with both GeoChip3.0 and 16S pyrosequencing

The robustness of MENs to noise

In order to examine the robustness of MEN approach to noise, different levels (1 to 100 % of original standard deviation) of Gaussian noise were added to the warming dataset. Once various levels of noise were added, new correlation matrices based on these noise-added datasets were calculated. The same similarity threshold used for the original datasets was used for defining adjacency matrices in the new datasets. When less than 40 % noise was added, roughly 90 % of the original OTUs were still detected in the perturbed networks (Figure

**The robustness to noise of RMT-based MEN construction.** Increasing levels of Gaussian noise were added to the pyrosequencing datasets under experimental warming. The mean of the noise was zero and its standard deviation (σ_{noise}) was set to 5, 10, 20, 30 to 100 % of the average relative abundance of the whole dataset. The thresholds (s_{t}) of all perturbed datasets were set to 0.76, consistent with the original dataset.

The overall MENs topology

Scale-free, small-world, modularity and hierarchy are common network properties in many complex systems (Table ^{2} values from 0.74 to 0.92), indicative of scale-free networks. Also, the average path lengths (^{2} values of the linear relationship between logarithms of clustering coefficients and the logarithms of connectivity ranged from 0.10 to 0.73, indicating the hierarchical behavior was quite variable. MENs from certain habitats may have highly hierarchical structures like sediment samples from Lake DePue (0.73), but others may not (Table

Modular structure

Modularity is a very important concept in ecology. It could originate from specificity of interactions (e.g. predation, pollination), habitat heterogeneity, resource partition, ecological niche overlap, natural selection, convergent evolution, and phylogenetic relatedness, and it could be important for system stability and resilience

We used several methods, including short random walks

**The submodules of the warming pMEN.** (**A**) The network graph with submodule structure obtained by the fast greedy modularity optimization method. Each node signifies an OTU, which could correspond to a microbial population. Colors of the nodes indicate different major phyla. A blue edge indicates a positive interaction between two individual nodes, while a red edge indicates a negative interaction. (**B**) The correlations and heatmap showing module eigengenes of the warming pMEN. The upper part is the hierarchical clustering based on the Pearson correlations among module eigengenes, and the heatmap below shows the coefficient values. (**C**) ZP-plot showing the distribution of OTUs based on their module-based topological roles. Each dot represents an OTU in the warming (red) or unwarming (green) dataset. The topological role of each OTU was determined according to the scatter plot of within-module connectivity (

Eigengene network analysis and the modular topological roles

After modules and submodules are determined, the eigengene analysis is used to reveal higher order organizations in the network structure

Different nodes play distinct topological roles in the network _{i}) and among-module connectivity (_{i}). The topological roles of nodes in warming and unwarming pMENs were illustrated in ZP-plot (Figure _{i} and _{i}, the roles of nodes were classified into four categories: peripherals, connectors, module hubs and network hubs. From ecological perspectives, peripherals might represent specialists whereas module hubs and connectors were close to generalists and network hubs as super-generalists

The correlations between network topologies with environmental traits

The relationships between microbial network topology and environmental characteristics can be examined in both direct and indirect ways. Indirectly, as a first step, the OTU significance (^{2}) of OTU abundance profile with environmental traits. Then the correlation between ^{-5}), indicating that the nodes with higher connectivity were inclined to have closer relationships with temperature. If multiple _{3}-nitrogen and soil carbon contents when the effect of temperature was controlled (r_{M} = 0.104, P = 0.018). Meanwhile, the _{M} = 0.159, P = 0.003) (Table _{M} = 0.59 and 0.926 respectively, both P = 0.013). These results suggested that the OTUs topology in warming pMEN was significantly associated with both temperature and the selected soil variables. In addition, OTUs from

| **Phylogeny** | **# nodes** | **of soil geochemistry**^{a} **partial of temperature: r**_{M}^{b} | **of soil geochemistry**^{a} **partial of temperature: P**^{c} | **of temperature partial of soil geochemistry: r**_{M} | **of temperature partial of soil geochemistry: P** |
|---|---|---|---|---|---|
| All detected OTUs | 177 | 0.104 | **0.018** | 0.159 | **0.003** |
| | 35 | 0.059 | 0.234 | −0.054 | 0.800 |
| | 63 | −0.033 | 0.650 | 0.077 | 0.135 |
| | 5 | −0.339 | 0.663 | 0.367 | 0.108 |
| | 6 | −0.082 | 0.521 | −0.202 | 0.788 |
| | 26 | −0.057 | 0.721 | 0.096 | 0.155 |
| | 12 | 0.590 | **0.013** | −0.001 | 0.430 |
| | 6 | 0.338 | 0.088 | −0.298 | 0.877 |
| | 4 | 0.030 | 0.772 | 0.796 | 0.243 |
| | 5 | 0.926 | **0.013** | −0.755 | 1.000 |

^{a}Soil variables used for OTU significance calculations: pH values, NO_{3}-Nitrogen and soil carbon contents.

^{b}Correlation coefficient based on Mantel test.

^{c}The significance (probability) of Mantel test.

The correlations between module-based eigengenes and environmental factors can be used to detect the modules’ response to environmental changes. In warming pMEN, the coefficients (_{3}^{-} concentration (

**The correlations between module eigengenes and environmental traits in the warming pMEN.** The color of each plot indicates the correlation between corresponding module eigengene and environmental trait. Red color means highly positive correlation and green color means highly negative correlation. The numbers in each plot are the correlation coefficient (_{3}-nitrogen content (NO_{3}N), soil carbon content (SC) and average soil temperature (avgT).

Open-access pipeline

To facilitate the application of MENA in the scientific community, an open-access pipeline for MEN construction and analysis (MENAP) was implemented (

**An overview of molecular ecological network analysis pipeline (MENAP).**

The network analysis component is further divided into three major parts:

(a) Network characterization. Various network properties are calculated and evaluated, such as connectivity, betweenness, clustering coefficient, and geodesic distance. The module/submodule detection and modularity analyses are performed using fast greedy modularity optimization

(b) Network visualization. An automatic pipeline is constructed to visualize the constructed network. Moreover, the file format for software Cytoscape 2.6.0

(c) Network comparison. Various randomization methods like the Maslov-Sneppen method

Discussion and conclusions

Most previous studies on the biodiversity of microbial communities have been focused on the number of species and the abundance of species, but not interactions among species. However, species interactions could be more important to ecosystem functioning than species richness and abundance, especially in complex ecosystems

The network approach described is based on the transition between two universal distributions from random matrix theory. A major advantage of the RMT method is that the threshold used to construct the network is determined automatically. In contrast, most other methods use arbitrary thresholds, which are usually based on limited biological knowledge

Nevertheless, characterizing ecological network of microbial communities poses major challenges. MENs are constructed by the adjacency matrix originated from the pair-wise correlations of relative OTU abundance across different samples. Therefore, a network interaction between two OTUs or genes describes the co-occurrence of these two OTUs or genes across different samples. The co-occurrence might be caused by species or genes performing similar or complementary functions, or shared environmental conditions that microbial species coexist in

A long-held tenet is that the structure of ecological networks has significant influence on the dynamics

In addition to interactions among microbes within a community, MENs allow for analyses of interactions with their environment through correlations with abiotic environmental measurements, which might provide insights into the conditions that have a significant impact on the co-occurring organisms. It is also possible to link groups of organisms with biogeochemical measurements to reveal the functional roles of organisms in biogeochemical processes. These kinds of data are important for generating hypotheses to help explain the natural environments in which microbial communities reside, which might lead to forecasting the responses of microbial communities when the environment changes

In summary, our study provides a mathematical/bioinformatic framework for network construction based on metagenomics data such as sequencing

Methods

Data standardization

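The network construction begins with a data table of n OTUs measured across m samples. Let y_{ik} represent the abundance or relative abundance of the i^{th} OTU in the k^{th} sample, so that Y^{n×m} = [y_{ik}] is the abundance matrix. Usually, the abundance profile of OTU i across all samples is standardized as x_{ik} = (y_{ik} − ȳ_{i}) / σ_{i}, where ȳ_{i} and σ_{i} are the mean and standard deviation of the abundance of OTU i across all samples, so that x_{ik} has a mean value of 0 and a variance of 1. X^{n×m} = [x_{ik}] is the standardized data matrix and is used for subsequent correlation analysis.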
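The standardization step can be illustrated with a minimal sketch. It assumes the abundance table is held as a list of rows (one row per OTU, one column per sample) and that the sample standard deviation is used; the function name `standardize_rows` is hypothetical and not part of MENAP.

```python
import statistics


def standardize_rows(abundance):
    """Scale each OTU profile (row) to mean 0 and sample variance 1."""
    standardized = []
    for row in abundance:
        mean = statistics.mean(row)
        sd = statistics.stdev(row)  # sample standard deviation (divides by m - 1)
        if sd == 0:
            # A constant profile carries no correlation information.
            raise ValueError("OTU with constant abundance cannot be standardized")
        standardized.append([(y - mean) / sd for y in row])
    return standardized


X = standardize_rows([[1, 2, 3], [10, 10, 40]])
```

OTUs with constant abundance across samples are rejected here because their correlation with any other profile is undefined; filtering them beforehand would be an equally reasonable choice.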

Defining adjacency matrix

Molecular ecological networks can be built on the basis of the measurements of relative OTU abundance in microbial communities. In MENs, each OTU corresponds to a node. Each network corresponds to an adjacency matrix (or interaction matrix), A^{n×n} = [a_{ij}], which encodes the connection strength between each pair of nodes. In an unweighted network, a_{ij} = 1 if nodes i and j are connected and a_{ij} = 0 otherwise. In a weighted network, the adjacency takes values between 0 and 1, i.e., 0 ≤ a_{ij} ≤ 1. The adjacency matrix is the foundation of all subsequent steps in network analysis.

To define the adjacency matrix, the similarity of OTU abundance across all samples should be measured first. Such similarity measures the degree of concordance between the abundance profiles of OTUs across different samples. Similar to widely used gene co-expression analyses, Pearson correlation coefficients (r_{ij}) are used to measure the similarity between the abundance profiles of the i^{th} and j^{th} OTUs. Let R^{n×n} = [r_{ij}] be the Pearson correlation matrix, then

r_{ij} = Σ_{k=1}^{m} x_{ik} x_{jk} / sqrt( Σ_{k=1}^{m} x_{ik}^{2} · Σ_{k=1}^{m} x_{jk}^{2} )

where x_{ik} and x_{jk} are the standardized abundances of the i^{th} and j^{th} OTUs in the k^{th} sample. The absolute value of the correlation coefficient (r_{ij}) is used to define the abundance similarity between the i^{th} and j^{th} OTUs (s_{ij}), that is

s_{ij} = |r_{ij}|

Let S^{n×n} = [s_{ij}], which is the similarity matrix of the OTU abundance. In molecular ecological network analysis, the adjacency matrix is derived from the OTU abundance similarity matrix by applying a threshold. Similar to relevant gene co-expression network analysis, using a threshold value (s_{tb}), the OTU abundance similarity matrix, S^{n×n} = [s_{ij}], is converted into the adjacency matrix, A^{p×p} = [a_{ij}], where p ≤ n. The adjacency a_{ij} between the i^{th} and j^{th} OTUs is defined by thresholding the abundance similarity:

a_{ij} = s_{ij} if s_{ij} ≥ s_{tb}, and a_{ij} = 0 if s_{ij} < s_{tb}

where s_{tb} is the threshold parameter. The resulting adjacency matrix, A^{p×p}, is generally smaller than the similarity matrix because the rows or columns are removed if all of their elements are less than the threshold value.
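As an illustration of this construction, the following sketch computes absolute Pearson correlations between OTU profiles, applies a threshold, and drops OTUs with no similarity reaching it. It is a simplified sketch under stated assumptions: the adjacency is made binary directly (as in the final unweighted network), self-similarity on the diagonal is ignored, and the names `pearson` and `adjacency_from_abundance` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long abundance profiles."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def adjacency_from_abundance(profiles, threshold):
    """Return (indices of retained OTUs, binary adjacency matrix among them)."""
    n = len(profiles)
    # Similarity is the absolute correlation; the diagonal is set to 0 (assumption).
    sim = [[abs(pearson(profiles[i], profiles[j])) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    # Keep only OTUs with at least one similarity reaching the threshold.
    keep = [i for i in range(n) if any(s >= threshold for s in sim[i])]
    adjacency = [[1 if sim[i][j] >= threshold else 0 for j in keep] for i in keep]
    return keep, adjacency


profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [1, -1, 1, -1]]
keep, A = adjacency_from_abundance(profiles, threshold=0.8)
```

In this toy input the fourth profile is only weakly correlated with the others, so it is removed and the retained matrix is 3 × 3. Note that the third profile is linked to the first through a strong negative correlation, since the similarity uses the absolute value.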

Determining the threshold by random matrix theory-based approach

The structure of relevance network strongly depends on the threshold value, _{t}. In some network analysis, the threshold value is chosen arbitrarily based on known biological information or set by the empirical study _{t}.

Basic concept of RMT

Initially proposed by Wigner and Dyson in the 1960s for studying the spectrum of complex nuclei

RMT predicts two universal extreme distributions of the nearest neighbor spacing distribution (NNSD) of eigenvalues: Gaussian orthogonal ensemble (GOE) statistics, which corresponds to random properties of complex system, and Poisson distribution, which corresponds to system-specific, nonrandom properties of complex systems

The key concept of RMT is to mainly concern with the _{i} with_{av} is the continuous density of eigenvalues obtained by fitting and smoothing the original integrated density of eigenvalues to a cubic spline or by local density average.

After unfolding the eigenvalues, three statistical quantities can be used to extract information from a sequence of eigenvalues, namely, eigenvalue spacing distribution

On the other hand, for the correlated eigenvalues,

We use the χ^{2} goodness-of-fit test to assess whether the NNSD follows the Wigner-Dyson distribution or the Poisson distribution. We assume that the NNSD of any biological system obeys these two extreme distributions
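The comparison of a spacing distribution with the two reference distributions can be sketched as follows. The sketch assumes the standard forms of the two distributions for unfolded spacings with unit mean, P(d) = exp(−d) for Poisson and P(d) = (π/2) d exp(−π d²/4) for the Wigner-Dyson (GOE) surmise, and it uses a binned χ² statistic, which is one common variant and not necessarily the exact statistic used in MENAP. The bin edges and function names are illustrative assumptions.

```python
import math


def spacings(unfolded_eigenvalues):
    """Nearest neighbor spacings of a sequence of unfolded eigenvalues."""
    ordered = sorted(unfolded_eigenvalues)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def poisson_cdf(d):
    return 1.0 - math.exp(-d)


def goe_cdf(d):
    # Cumulative form of the Wigner surmise (pi/2) d exp(-pi d^2 / 4).
    return 1.0 - math.exp(-math.pi * d * d / 4.0)


def chi_square_stat(d_values, cdf, edges):
    """Binned chi-square statistic of observed spacings against a reference CDF."""
    n = len(d_values)
    stat = 0.0
    for lo, hi in zip(edges, edges[1:]):
        observed = sum(1 for d in d_values if lo <= d < hi)
        expected = n * (cdf(hi) - cdf(lo))
        stat += (observed - expected) ** 2 / expected
    return stat


# Synthetic spacings placed at exponential quantiles, i.e. Poisson-like by design.
N = 200
sample = [-math.log(1.0 - (k + 0.5) / N) for k in range(N)]
edges = [0.0, 0.5, 1.0, 1.5, 2.0, math.inf]
poisson_stat = chi_square_stat(sample, poisson_cdf, edges)
goe_stat = chi_square_stat(sample, goe_cdf, edges)
```

For these synthetic spacings the statistic against the Poisson reference is much smaller than against the GOE reference, which is the pattern expected once the threshold has removed the random part of the correlations.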

Algorithms of detecting the threshold value

The following major steps are used to define the threshold (_{t}) based on the standardized relative abundance of OTUs across different samples (Figure

(a) Calculate the Pearson correlation matrix, R^{n×n}, based on the standardized relative abundance of OTUs, X^{n×m}, with n OTUs and m samples.

(b) Obtain similarity data, S^{n×n}, by taking the absolute value of the correlation matrix R^{n×n}.

(c) Set an initial threshold value, s_{tb} (e.g., 0.3 based on our experience).

(d) Calculate the adjacency matrix, A^{p×p} = [a_{ij}], according to s_{tb}, where p ≤ n is the number of OTUs retained.

(e) Calculate the eigenvalues λ_{i} of the adjacency matrix based on the characteristic equation |A − λI| = 0, where I is the identity matrix.

(f) To get unfolded eigenvalues, replace λ_{i} with e_{i} = N_{av}(λ_{i}), where N_{av} is the continuous density of eigenvalues and can be obtained by fitting the original integrated density to a cubic spline or by local average.

(g) Calculate the nearest neighbor spacing distribution of eigenvalues, P(d), where d is the spacing between adjacent unfolded eigenvalues.

(h) Use the χ^{2} goodness-of-fit test to determine whether the probability density function P(d) follows the Poisson distribution. The hypotheses are H_{0}: P(d) follows the Poisson distribution, and H_{1}: P(d) does not follow the Poisson distribution. The χ^{2} goodness-of-fit test has the test statistic X^{2} = Σ_{i} (d_{i} − E(d_{i}))^{2} / E(d_{i}), where d_{i} is the observed nearest neighbor spacing and E(d_{i}) is an expected (theoretical) nearest neighbor spacing from the Poisson distribution. The resulting X^{2} value is compared with the critical value of the χ^{2} distribution.

(i) If X^{2} does not exceed the critical value, H_{0} is not rejected. Then go to step (j). If X^{2} exceeds the critical value, H_{0} is rejected. Then increase the threshold by 0.1, s_{tb} + 0.1, and repeat the steps from (e) to (h).

(j) Find a finer-scale threshold value by increasing the threshold in steps of 0.01 within the range [s_{tb} − 0.1, s_{tb}]. Then repeat the steps from (e) to (h).

(k) If H_{0} is accepted, i.e., the NNSD follows the Poisson distribution, the finer-scale threshold is taken as the final threshold, s_{t}.
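The control flow of the coarse and fine scans in the steps above can be sketched as follows. The predicate `follows_poisson` is a hypothetical placeholder standing for the adjacency, eigenvalue, unfolding and goodness-of-fit steps at a given threshold; the default start of 0.3 and the step sizes 0.1 and 0.01 are taken from the text, while the upper bound of 1.0 and the integer bookkeeping (used to avoid floating-point drift) are assumptions of this sketch.

```python
def scan_threshold(follows_poisson, start=0.30, coarse=0.10, fine=0.01, upper=1.0):
    """Return the first fine-scale threshold accepted by follows_poisson, or None."""
    # Work in integer hundredths so that repeated additions stay exact.
    s = round(start * 100)
    coarse_step = round(coarse * 100)
    fine_step = round(fine * 100)
    top = round(upper * 100)

    # Coarse scan: raise the threshold until the Poisson hypothesis is not rejected.
    while s <= top and not follows_poisson(s / 100):
        s += coarse_step
    if s > top:
        return None

    # Fine scan inside [s - coarse, s] for the smallest accepted threshold.
    for candidate in range(max(s - coarse_step, 0), s + 1, fine_step):
        if follows_poisson(candidate / 100):
            return candidate / 100
    return s / 100


# Toy predicate: pretend the spacing distribution becomes Poisson at 0.76.
chosen = scan_threshold(lambda t: t >= 0.76)
```

With the toy predicate, the coarse scan stops at 0.80 and the fine scan then settles on 0.76. In practice the predicate would be considerably more expensive than the scan itself, since each call requires an eigenvalue decomposition.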

Once the final threshold value s_{t} is determined at the finer scale, an adjacency matrix is obtained by retaining all the OTUs whose abundance similarity values reach the determined threshold. Currently we have only adopted the unweighted network in the following network topological analysis. Hence, the final adjacency a_{ij} is:

a_{ij} = 1 if s_{ij} ≥ s_{t}, and a_{ij} = 0 if s_{ij} < s_{t}

where s_{t} is the final threshold parameter. Two nodes are linked (a_{ij} = 1) if the similarity between their abundance profiles across all samples is at least s_{t}.

Calculation of MEN topological indices and general features

Once MENs are determined, various network topology indices can be calculated based on the adjacency matrix (Table
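A few of the indices can be illustrated with a minimal sketch for an undirected, unweighted network stored as a dictionary of neighbor sets. The definitions used are the standard ones (degree, local clustering coefficient, density, and average shortest-path length over connected pairs); the function name and the data layout are assumptions, and at least two nodes are assumed.

```python
from collections import deque


def network_indices(adj):
    """adj maps each node to the set of its neighbours (undirected, unweighted)."""
    n = len(adj)
    degree = {v: len(nb) for v, nb in adj.items()}
    links = sum(degree.values()) // 2

    # Clustering coefficient: realised links among neighbours / possible links.
    clustering = {}
    for v, neighbours in adj.items():
        nb = list(neighbours)
        k = len(nb)
        between = sum(1 for a in range(k) for b in range(a + 1, k)
                      if nb[b] in adj[nb[a]])
        clustering[v] = 2.0 * between / (k * (k - 1)) if k > 1 else 0.0

    # Average geodesic distance by breadth-first search (connected pairs only).
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1

    return {
        "degree": degree,
        "average_connectivity": sum(degree.values()) / n,
        "density": links / (n * (n - 1) / 2),
        "clustering": clustering,
        "average_clustering": sum(clustering.values()) / n,
        "average_geodesic_distance": total / pairs if pairs else float("inf"),
    }


# A triangle (a, b, c) with one pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
indices = network_indices(toy)
```

Pairs of nodes in different connected components are simply left out of the geodesic average here; other conventions exist, which is one reason the harmonic geodesic distance is often reported as well.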

Scale-free, small world, modularity and hierarchy are most common network characteristics of interest ^{−γ}, in which

Module detection

Modularity is a fundamental characteristic of biological networks as well as many engineering systems

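The modularity (M) of a network partitioned into modules is

M = Σ_{b=1}^{N_{M}} [ l_{b} / L − (K_{b} / (2L))^{2} ]

where N_{M} is the number of modules in the network, l_{b} is the number of links among all nodes within the b^{th} module, L is the number of all links in the network, and K_{b} is the sum of the degrees of the nodes in the b^{th} module.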
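To make the quantity concrete, the sketch below evaluates the modularity of a given partition using the widely used form M = Σ_b [l_b / L − (K_b / 2L)²], with l_b the number of links inside module b, K_b the summed degree of its nodes and L the total number of links. This form is assumed here to match the definition in the text; the function and variable names are hypothetical.

```python
def modularity(edges, module_of):
    """Modularity of a partition; edges is a list of (u, v), module_of maps node -> module."""
    total_links = len(edges)
    links_within = {}
    degree_sum = {}
    for u, v in edges:
        mu, mv = module_of[u], module_of[v]
        # Each link adds one to the degree of both end nodes.
        degree_sum[mu] = degree_sum.get(mu, 0) + 1
        degree_sum[mv] = degree_sum.get(mv, 0) + 1
        if mu == mv:
            links_within[mu] = links_within.get(mu, 0) + 1
    return sum(links_within.get(b, 0) / total_links - (k / (2 * total_links)) ** 2
               for b, k in degree_sum.items())


# Two triangles joined by a single bridge between c and d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_modules = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_module = {node: 1 for node in two_modules}
m_split = modularity(edges, two_modules)
m_single = modularity(edges, one_module)
```

Splitting the toy network along the bridge gives a positive modularity, whereas placing every node in a single module gives zero, which is why module detection algorithms search for the partition that maximizes this value.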

Several different algorithms can be used to separate modules, including short random walks, leading eigenvector of the community matrix, simulated annealing approach, and fast greedy modularity optimization

Once the network modularity value (^{nxn} that can be obtained by the adjacent matrix A^{nxn} subtracting an expected edges matrix P^{nxn} from a null model. Then the network can be split into two groups by finding the leading eigenvector that was corresponding to the largest positive eigenvalue of modularity matrix. This splitting process can be looped until any further divisions will not increase the

The algorithm of simulated annealing approach usually produces the best separation of the modules by direct maximization of

The algorithm of fast greedy modularity optimization is to isolate modules via directly optimizing the

Identification of key module members

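After all modules are separated, each node can be assigned a role based on its topological properties, and the role of node i is characterized by its within-module connectivity (z_{i}) and among-module connectivity (P_{i}) as follows

z_{i} = (k_{ib} − k̄_{b}) / σ_{k_{b}}

and

P_{i} = 1 − Σ_{c=1}^{N_{M}} (k_{ic} / k_{i})^{2}

where k_{ib} is the number of links of node i to other nodes in its own module b, k̄_{b} and σ_{k_{b}} are the average and standard deviation of within-module connectivity over all nodes in module b, k_{i} is the number of links of node i in the whole network, k_{ic} is the number of links from node i to nodes in module c, and N_{M} is the number of modules in the network.

The within-module connectivity, z_{i}, describes how well node i is connected to other nodes in the same module, and the participation coefficient, P_{i}, reflects the degree to which node i connects to different modules. P_{i} is also referred to as the among-module connectivity. If all links of node i belong to its own module, P_{i} = 0. If the links of node i are distributed evenly among modules, P_{i} → 1. The topological roles of individual nodes can be assigned by their position in the z-P parameter space. Nodes are classified into four categories: (i) Peripheral nodes (z_{i} ≤ 2.5, P_{i} ≤ 0.62), which have only a few links and almost always to the nodes within their modules, (ii) Connectors (z_{i} ≤ 2.5, P_{i} > 0.62), which are highly linked to several modules, (iii) Module hubs (z_{i} > 2.5, P_{i} ≤ 0.62), which are highly connected to many nodes in their own modules, and (iv) Network hubs (z_{i} > 2.5, P_{i} > 0.62), which act as both module hubs and connectors. From an ecological perspective, peripheral nodes represent specialists whereas the other three are generalists.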
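The role assignment can be sketched for a network with known module membership. The sketch assumes the standard definitions of the within-module degree z-score and the participation coefficient, uses the population standard deviation within each module, and sets z to 0 when all nodes of a module have the same within-module degree; these choices and the function name `node_roles` are assumptions. The cutoffs 2.5 and 0.62 are those given in the text.

```python
import math


def node_roles(adj, module_of, z_cut=2.5, p_cut=0.62):
    """Return {node: (z, P, role)} for an undirected network with given modules."""
    # Number of links each node has inside its own module.
    k_within = {v: sum(1 for w in adj[v] if module_of[w] == module_of[v]) for v in adj}

    # Mean and standard deviation of within-module degree, per module.
    by_module = {}
    for v in adj:
        by_module.setdefault(module_of[v], []).append(k_within[v])
    stats = {}
    for b, values in by_module.items():
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((k - mean) ** 2 for k in values) / len(values))
        stats[b] = (mean, sd)

    roles = {}
    for v in adj:
        mean, sd = stats[module_of[v]]
        z = (k_within[v] - mean) / sd if sd > 0 else 0.0
        # Participation coefficient: 1 - sum over modules of (k_ic / k_i)^2.
        k = len(adj[v])
        per_module = {}
        for w in adj[v]:
            per_module[module_of[w]] = per_module.get(module_of[w], 0) + 1
        p = 1.0 - sum((c / k) ** 2 for c in per_module.values()) if k else 0.0
        if z > z_cut:
            role = "network hub" if p > p_cut else "module hub"
        else:
            role = "connector" if p > p_cut else "peripheral"
        roles[v] = (z, p, role)
    return roles


# A node h whose three links go to three different modules.
star = {"h": {"x", "y", "w"}, "x": {"h"}, "y": {"h"}, "w": {"h"}}
star_modules = {"h": 1, "x": 1, "y": 2, "w": 3}
star_roles = node_roles(star, star_modules)
```

In the toy network node h has P = 2/3, just above the 0.62 cutoff, and is therefore classified as a connector, whereas its neighbors, each with a single link, are peripheral nodes.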

Eigen-gene analysis

One of the grand challenges in dealing with high throughput metagenomics data is the high dimensionality. Various statistical approaches are used to reduce dimensions and extract major features, including principal component analysis (PCA), detrended correspondence analysis (DCA), and singular value decomposition (SVD). SVD is an orthogonal linear transformation of data (e.g., microbial data) from the complexity to the comprehensibility

Suppose there are ^{b} OTUs in the ^{b} can be decomposed as follows:

where both ^{b} and ^{b} are denoted as

Assuming that the singular values are arranged in decreasing order, the first column of ^{b} is referred as the Module Eigen-gene, ^{b}, for the _{.}

The relative abundance profile of the OTUs within a module is represented by the eigen-gene. In addition, the sum of variance of OTU abundances equals to the sum of the diagonal matrix in SVD. Therefore, the percentage of the variance explained by the eigen-gene is given by

Generally, the module eigen-gene can explain approximately 50 % or more of the variance of the OTU abundances in the module ^{b} is the first principal component based on PCA analysis
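The idea of a module eigen-gene can be illustrated without a linear algebra library. The sketch below assumes the eigen-gene is the first right singular vector of the standardized abundance matrix of a module (rows are OTUs, columns are samples) and obtains it by power iteration on the matrix product X^T X, which is a swapped-in technique chosen for self-containment and not the SVD routine used by the pipeline. The fraction of variance explained is computed as the largest squared singular value divided by the sum of all squared entries. The function name and the start vector are assumptions.

```python
import math


def module_eigengene(X, iterations=200):
    """Return (eigengene across samples, fraction of variance it explains)."""
    m = len(X[0])
    # Gram matrix X^T X (samples x samples); its leading eigenvector is the
    # first right singular vector of X.
    gram = [[sum(row[a] * row[b] for row in X) for b in range(m)] for a in range(m)]
    # Non-constant start vector: a constant vector is annihilated because
    # standardized rows sum to zero.
    v = [1.0 + 0.1 * k for k in range(m)]
    for _ in range(iterations):
        w = [sum(gram[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector has no component along the data")
        v = [x / norm for x in w]
    # Rayleigh quotient gives the largest squared singular value.
    top = sum(v[a] * sum(gram[a][b] * v[b] for b in range(m)) for a in range(m))
    total = sum(x * x for row in X for x in row)
    return v, top / total


# Two perfectly anti-correlated standardized profiles: one component explains all.
eigengene, explained = module_eigengene([[-1.0, 0.0, 1.0], [1.0, 0.0, -1.0]])
```

The sign of the returned vector is arbitrary, as is always the case for singular vectors, so correlations with environmental traits should be interpreted together with the sign of the module membership of individual OTUs.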

Module membership

The module eigen-gene provides the best summary of variation in the relative abundance of OTUs within a module, but it is a centroid of a module rather than a real OTU. In practice, it is important to understand how close a given OTU is to its module eigen-gene. The correlation of the eigen-gene in module

If

Random network construction and network comparison

Since only a single data point is available for each network parameter, we are not able to perform standard statistical analyses to assess statistical significances. Similar to the concept of hypothesis testing, the null model is generated to assess the performance of the alternative model. Thus, the random networks are generated to compare different complex networks using the Maslov-Sneppen procedure
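The rewiring idea behind such random networks can be sketched as a degree-preserving edge swap: two links a-b and c-d are replaced by a-d and c-b, provided that no self-loop or duplicate link is created. This is a minimal sketch of the general Maslov-Sneppen scheme, not the pipeline's implementation; the function name, the attempt limit and the fixed seed are assumptions.

```python
import random


def maslov_sneppen_rewire(edges, n_swaps, seed=0):
    """Return a rewired copy of an undirected edge list with all node degrees preserved."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # the swap would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # the swap would duplicate an existing link
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


def degree_table(edges):
    table = {}
    for u, v in edges:
        table[u] = table.get(u, 0) + 1
        table[v] = table.get(v, 0) + 1
    return table


# An 8-node ring with two chords.
original = [(k, (k + 1) % 8) for k in range(8)] + [(0, 4), (2, 6)]
rewired = maslov_sneppen_rewire(original, n_swaps=50)
```

Repeating the rewiring with different seeds yields an ensemble of random networks with the same size and degree sequence, from which the mean and standard deviation of indices such as modularity or average clustering coefficient can be estimated and compared with the empirical values.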

Trait-based gene significance measure

In gene expression network analyses, the gene significance (_{i,h}) is the correlation between the expression profile of the _{h}_{i,h}, the more biologically significant gene

where _{i} is the relative abundance of the _{h} is the ^{nxg}, is obtained.

Relationships of microbial interaction networks with soil variables

To discern the relationships between molecular ecological networks and soil properties, Mantel tests can be performed
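A simple (non-partial) Mantel test can be sketched as follows: the correlation between the upper triangles of two distance matrices is compared with the correlations obtained after jointly permuting the rows and columns of one matrix. The sketch is one-sided, uses 999 permutations by default and a fixed seed, and does not implement the partial Mantel test used for the results above; all names are hypothetical.

```python
import math
import random


def _pearson(a, b):
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mantel(dist_1, dist_2, permutations=999, seed=0):
    """Return (r_M, one-sided permutation P value) for two square distance matrices."""
    n = len(dist_1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [dist_1[i][j] for i, j in pairs]
    r_obs = _pearson(x, [dist_2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    as_extreme = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of the second matrix together.
        r_perm = _pearson(x, [dist_2[order[i]][order[j]] for i, j in pairs])
        if r_perm >= r_obs:
            as_extreme += 1
    return r_obs, as_extreme / (permutations + 1)


# Seven objects on a line; the second matrix is the first one doubled.
d1 = [[abs(i - j) for j in range(7)] for i in range(7)]
d2 = [[2 * abs(i - j) for j in range(7)] for i in range(7)]
r_m, p_value = mantel(d1, d2)
```

In the network setting the two matrices would be, for example, pairwise differences in node connectivity and pairwise differences in OTU significance; the construction of those matrices is not shown here.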

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

YD carried out the analysis, constructed the pipeline and wrote the article. YJ carried out part of the method development and method writing. YY provided the RMT method and contributed part of the discussion. ZH contributed to writing the paper and oversaw the work. FL provided the RMT method. JZ provided oversight of the work and helped finalize the article. All authors read and approved the final manuscript.

Acknowledgements

This work has been supported, through contract DE-SC0004613 and contract DE-AC02-05CH11231 (as part of ENIGMA, a Scientific Focus Area) and contract DE-SC0004601, by the US Department of Energy, Office of Science, Office of Biological and Environmental Research, Genomics: GTL Foundational Science, the United States Department of Agriculture (Project 2007-35319-18305) through NSF-USDA Microbial Observatories Program, the Oklahoma Bioenergy Center (OBC) of State of Oklahoma, and State Key Joint Laboratory of Environment Simulation and Pollution Control (Grant 11Z03ESPCT) at Tsinghua University.