Computational Biology Center, T.J. Watson IBM Research Center, Yorktown Heights, New York, USA

Dept. of Physiology, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA

Abstract

Background

Biological experiments increasingly yield data representing large ensembles of interacting variables, making the application of advanced analytical tools a forbidding task. We present a method to extract networks of correlated activity, specifically from functional MRI data, such that: (a) network nodes represent voxels, and (b) the network links can be directed or undirected, representing temporal relationships between the nodes. The method provides a snapshot of the ongoing dynamics of the brain without sacrificing resolution, as the analysis is tractable even for very large numbers of voxels.

Results

We find that, based on topological properties of the networks, the method provides enough information about the dynamics to discriminate between subtly different brain states. Moreover, the statistical regularities previously reported are qualitatively preserved, i.e. the resulting networks display scale-free and small-world topologies.

Conclusion

Our method extends previous approaches to rendering large-scale functional networks, and creates the basis for an extensive and (owing to the presence of mixtures of directed and undirected links) richer motif analysis of functional relationships.

Background

A growing number of biological experiments are producing datasets consisting of large numbers of interacting variables, from genomics to neural networks to ecosystems, giving rise to the nascent field of systems biology. The central challenge of this discipline is how to simplify the analysis of these high-dimensional dynamical systems while retaining their relevant features. In particular, despite the rich, complex dynamics of the brain, and the highly interconnected and non-linear nature of its information processing capabilities, the bulk of the literature on brain imaging involves slight variations on one main theme: identifying the degree of correlation between the activation of a local brain area and external markers. A variety of attempts have been made to go beyond this paradigm, including ICA, Volterra kernels and supervised classification techniques.

Recently, a different approach was introduced

Another approach to capturing the dynamics of complex systems is the causality analysis pioneered by Granger. The essence of this approach is to identify possible causal relationships between the variables of a system by analyzing how much the time course of one variable contributes to that of another, based on an auto-regressive model. The method has been applied with success to neural data in the context of a few electrophysiological recordings, and to small numbers of brain areas represented by aggregates of several voxels ^{2}, where

In the present work, we try to circumvent both the limitations of covariance-based analysis and auto-regression-based causality analysis. We extend our previous findings by attempting to capture a larger signature of the dynamics of the brain by including directionality in the edges of the network, based on the concept of delayed covariance.

In our previous work, we defined a functional network by considering all functional voxels $\{x_i\}$ as possible nodes; their covariance determines whether a binary functional link (or edge) exists between them: $c_{ij} = \langle (x_i(t) - \mu_i)(x_j(t) - \mu_j) \rangle / (\sigma_i \sigma_j)$, where $\mu_i = \langle x_i(t) \rangle_t$ and $\sigma_i^2 = \langle (x_i(t) - \mu_i)^2 \rangle$, such that a link is present if the correlation between the two voxels exceeds a threshold $c_T$: **if** $c_{ij} > c_T$ **then** $d_{ij} = 1$, **else** $d_{ij} = 0$. We extend here this approach by considering the delayed or lagged covariance: $c_{ij}(\tau) = \langle (x_i(t) - \mu_i)(x_j(t + \tau) - \mu_j) \rangle / (\sigma_i \sigma_j)$, so that **if** $c_{ij}(\tau = 0) > c_T$ **then** $d_{ij} = d_{ji} = 1$; **else if** $c_{ij}(\tau > 0) > c_T$ **then** $d_{ij} = 1$ **and** $d_{ji} = 0$; **else** $d_{ij} = 0$.
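The link rule described here can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function names, the separate thresholds `c_zero` and `c_lag`, the lag range `max_lag`, and the tie-breaking rule when both lag directions exceed the threshold are all assumptions made for the example. All series are assumed to have the same length and non-zero variance.

```python
from math import sqrt


def lagged_correlation(x, y, lag):
    """Correlation between x(t) and y(t + lag), using full-series means and deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    std_x = sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    std_y = sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    products = [(x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag)]
    return sum(products) / len(products) / (std_x * std_y)


def build_links(series, c_zero, c_lag, max_lag):
    """Classify each pair of time series as undirected, directed, or unlinked.

    Returns (undirected, directed): `undirected` holds pairs (i, j) with i < j,
    `directed` holds ordered pairs (source, sink).
    """
    undirected, directed = set(), set()
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            # Strong simultaneous correlation: symmetric link.
            if lagged_correlation(series[i], series[j], 0) > c_zero:
                undirected.add((i, j))
                continue
            lags = range(1, max_lag + 1)
            # i_leads: how well series i correlates with the future of series j.
            i_leads = max(lagged_correlation(series[i], series[j], k) for k in lags)
            j_leads = max(lagged_correlation(series[j], series[i], k) for k in lags)
            # Tie-breaking (assumption): the stronger lagged correlation wins.
            if i_leads > c_lag and i_leads >= j_leads:
                directed.add((i, j))
            elif j_leads > c_lag:
                directed.add((j, i))
    return undirected, directed
```

Calling `build_links(series, 0.7, 0.65, 5)` on a list of equal-length time series returns the two link sets; the threshold values here are placeholders, not the ones used in the study.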

In other words, two voxels whose activity is highly correlated and simultaneous are considered to be symmetrically linked; a voxel that is highly correlated with the future of another one will be considered as a "source", and the latter as a "sink". This approach clearly can break the symmetry of the covariance, but as described cannot deal with the problem of the transitivity described above. Taking into consideration the relatively poor temporal resolution of fMRI, we reasoned that for the time being we could only in earnest tackle one of the confounding sources of undirected links, namely the explanation of a zero-lag covariance (i.e. undirected link) between two voxels by the presence of a common source, of which they are both targets. That is, after identifying all sources and sinks, every potential undirected link that can be explained by a common source is removed. We also considered possible reductions of triangulations of directed links (as in
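The common-source rule can be illustrated with a short sketch. It assumes links are stored as Python sets of node pairs, with directed links written as `(source, sink)`; the function name and data layout are hypothetical and chosen only for this example.

```python
def prune_common_source(undirected, directed):
    """Drop undirected links whose two endpoints are both sinks of one shared source."""
    # Map each sink to the set of sources pointing at it.
    sources_of = {}
    for source, sink in directed:
        sources_of.setdefault(sink, set()).add(source)

    kept = set()
    for i, j in undirected:
        shared = sources_of.get(i, set()) & sources_of.get(j, set())
        if not shared:  # no common source explains this zero-lag link
            kept.add((i, j))
    return kept
```

For instance, with `undirected = {(1, 2), (2, 3)}` and `directed = {(0, 1), (0, 2)}`, the link `(1, 2)` is removed because node 0 is a source for both of its endpoints, while `(2, 3)` is kept.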

Results

We analyzed a dataset discussed in the literature

After standard functional pre-processing (see Methods), and delayed covariance analysis as explained above, the resulting networks were studied using the tools of statistical network theory. The first observation is that most links are undirected, comprising on average 50% to 60% of all links; this is compatible with the notion that most neural interactions result from fast, local and presumably symmetric connections, whose subtle dynamics are for the most part beyond the present reach of functional MRI. However, directed links account for a significant number of the observed correlations between voxels, suggesting that our approach can indeed be fruitful in terms of capturing ongoing dynamics.

Although directed links have less statistical power than undirected ones, their degree distribution still shows a power-law behavior. This is exemplified in Fig.

(A) Degree distribution of links. The black trace corresponds to the degree for all the links (directed and undirected), the blue one to the degree of sources (out-directed links) and the red trace to sinks (in-directed links), averaged over subjects. Observe that (a) both directed link degree distributions are very similar, and (b) there is a scale-free trend, more marked when all links are considered. The dotted line corresponds to a power-law of 3/2. (B) Small-world topology of the networks. Clustering and average minimal path for functional networks (open circles), equivalent Erdös random networks (red crosses) and equivalent regular lattices (blue x's). Data points correspond to different subjects and tasks. (C) Assortative mixing of directed networks. The horizontal axis represents the total degree of each node, and the vertical axis the average degree of the first neighbors; all the data points correspond to one single subject.

Interestingly, the networks hold enough information about the dynamics of brain states so that even a global measure of their properties can discriminate between tasks. Figure [...] ($L_a > L_{sv}$ and $L_a > L_{lv}$) yields a p-value of 0.017. This indicates that the subtle differences in activation elicited by the tasks have a measurable effect on the overall structure of correlations and flow of information of the networks. This tendency was probably amplified by the fact that only the

Discrimination of tasks based on global topological properties. The first column of data points corresponds to the small visual cue task, the second to the auditory cue, and the third to the large visual cue in all panels. The upper row corresponds to female subjects, and the lower one to males. (A) Total number of nodes. (B) Mean degree of the networks. (C) Normalized mean path for each network. Observe the inverted V shape in 5 out of the 6 subjects, such that the normalized mean path of the auditory cue task is consistently larger than that of both visual cue tasks.
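The p-value of 0.017 quoted for this comparison is consistent with a simple binomial argument, sketched below. The sketch assumes that, under the null hypothesis, the auditory task has the largest normalized mean path of the three tasks with probability 1/3 for each subject (the value given in Methods), and that subjects are independent; the function name is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Probability of at least k successes in n independent trials with success rate p."""
    return sum(comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(k, n + 1))


# 5 or more of the 6 subjects show the inverted V shape by chance.
p_value = binomial_tail(5, 6, 1 / 3)
print(round(p_value, 4))  # 13/729, approximately 0.0178
```

Written out, the tail is $6p^{5}(1 - p) + p^{6}$ with $p = 1/3$, which equals 13/729.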

A remarkable regularity displayed by these networks is the tendency for nodes to be mostly "sources" (i.e. heavy out-hubs) or "sinks" (heavy in-hubs). That is, nodes with a large number of out-links tend to have relatively few in-links, and vice versa, although, interestingly, this is not a strictly enforced rule. Moreover, in-hubs tend to have relatively few undirected links, whereas out-hubs tend to be undirected hubs as well. This seems counter-intuitive at face value, as one may naïvely think that hubs should be balanced, as one would expect of, for instance, traffic hubs; however, they need not be so. In other words, there are no conserved quantities at the hub level to be balanced. These results are summarized in Fig.

Panel A: relationship between the in-degree and the undirected-degree for each node; observe that there is a negative correlation, i.e. in-hubs tend to have very few undirected links and vice versa (insets correspond to covariance between the plotted variables). Panel B: same as Panel A, but for the out-degree of the nodes; in this case, there is a strong correlation between out-hubs and undirected-hubs. Panel C: relationship between out- and in-degree for each node. The plot makes evident that nodes tend to be either in-hubs or out-hubs, as the correlation between the degrees is basically insignificant; moreover, large hubs tend to lie on the axes, i.e. they have a bias to be pure "sources" (horizontal) or "sinks" (vertical). Another way to see this phenomenon is presented in Panel D, where the maximum of the in- and out-degrees is plotted against the absolute value of their difference. As the plot shows, nodes tend to cluster near the identity line, which corresponds to pure sources and sinks.
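The per-node quantities plotted in these panels can be computed directly from the link sets. The following sketch assumes undirected links are stored as unordered pairs and directed links as `(source, sink)` pairs; the function names are hypothetical.

```python
def node_degrees(n_nodes, undirected, directed):
    """Return per-node (undirected, out, in) degree lists."""
    und_deg = [0] * n_nodes
    out_deg = [0] * n_nodes
    in_deg = [0] * n_nodes
    for i, j in undirected:
        und_deg[i] += 1
        und_deg[j] += 1
    for source, sink in directed:
        out_deg[source] += 1
        in_deg[sink] += 1
    return und_deg, out_deg, in_deg


def source_sink_coordinates(out_deg, in_deg):
    """Panel D style coordinates: (max of in/out degree, absolute difference) per node."""
    return [(max(o, i), abs(o - i)) for o, i in zip(out_deg, in_deg)]
```

A pure source or pure sink has one of its two directed degrees equal to zero, so its two coordinates coincide and the node falls on the identity line.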

Discussion

Topological regularities of functional brain networks have been described before in the literature, including our own work, but they were based on a narrower window on the properties of the underlying dynamical system (i.e. zero-lag correlation), and could not provide for discernibility of subtly different brain states. The finding that hybrid networks with directed and undirected links can be discriminated based on global topological measures is very relevant to theoretical approaches to brain function, as it is a formalization of the collective properties of complex systems.

However, to move beyond a simply phenomenological description of such systems, we are compelled to bridge the gap between emergent global behavior and local functional properties. One possibility is to interpret the results depicted in Fig.

Density maps of undirected and directed links. The local degree of neutral or undirected connectivity is color-coded, such that high brightness signifies high degree. (A) Undirected links, small visual cue. (B) Undirected links, auditory cue. (C) Directed links, small visual cue. (D) Directed links, auditory cue. Observe the similarity between the undirected links maps for the two different tasks (A and B), while the directed links maps are remarkably different (C and D).

The other novel, and certainly unexpected, result is the one summarized in Figure

Conclusion

Building on our previous work on graph theoretic analysis of functional magnetic resonance imaging data, we have introduced a novel method to capture more of the complex dynamical interactions of brain networks. We find that this approach yields networks with topological properties similar to those described previously, i.e. they display scale-free connectivity, non-hierarchical architecture and small-world topology, even when considering only the giant component for the analysis and strictly enforcing directionality in the computation of the average mean path. Moreover, the topological information contained in directed and undirected links is enough to reveal subtle brain state differences, here elicited by a very short auditory or visual cue that triggers a much longer finger-tapping motor sequence. Initial results suggest that these topological differences are concurrent with distinct patterns in the spatial distribution of hubs of directed links, i.e. the location of in- and out-hubs. These findings point in the direction of a functional dissection based on the density and architecture of directed connections, and more specifically on the spatial patterns of topological motifs, similar to approaches already advanced in the study of genetic networks

Finally, we conclude that the computational feasibility of the approach (see Methods), even when dealing with 10,000 to 20,000 independent variables, renders it applicable to other similarly large biological networks, like gene-expression patterns generated by cDNA microarrays

Methods

The implementation of the delayed covariance calculation is relatively straightforward, although it can be challenging in terms of computational resources. In a typical functional task, there are on the order of 20,000 voxels with significant zero-lag covariance, for functional scans with a resolution of 64 × 64 × 36; for each of them we need to compute a delayed covariance. This was implemented on a 24-way shared-memory machine in an MPI environment, taking approximately 2 hours for scans of 400 volumes and a time window of 11 time points; the same algorithm, also in MPI, takes 8 minutes on a 1024-way IBM Blue Gene rack [...] $H_{GC} = -s \log_2 s - (1 - s) \log_2 (1 - s)$ [...] $H_{GC}$ is close to the maximum of 1 for a range of thresholds that includes the one we selected. In all cases, the threshold for non-zero-lag was set 0.05 below that for zero-lag, for the reasons explained above.
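The threshold criterion mentioned here appears to rest on the relative size of the giant connected component. A minimal sketch of that check follows, under two assumptions that are not confirmed by the surviving text: that the quantity with a maximum of 1 is the binary entropy of the giant component's relative size, and that link direction is ignored when finding components. The function names are hypothetical.

```python
from collections import deque
from math import log2


def giant_component_fraction(n_nodes, links):
    """Relative size of the largest connected component, ignoring link direction."""
    neighbors = {node: set() for node in range(n_nodes)}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, largest = set(), 0
    for start in range(n_nodes):
        if start in seen:
            continue
        # Breadth-first search over the component containing `start`.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        largest = max(largest, size)
    return largest / n_nodes


def binary_entropy(s):
    """Entropy in bits of a two-way split with fractions s and 1 - s."""
    if s <= 0.0 or s >= 1.0:
        return 0.0
    return -s * log2(s) - (1 - s) * log2(1 - s)
```

Under this reading, a threshold for which the giant component holds about half of the nodes gives an entropy close to 1, and one could scan candidate thresholds and keep those near that maximum.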

Upper left: relative size of the giant connected component (

Left: distance of the relative size of the giant connected component to 50%, averaged over different networks. This emphasizes that the giant component tends to be near 50% for the chosen threshold values. Right: change of the clustering for the giant component (blue) and the entire network (red).

The weeding of confounding undirected or neutral links was implemented as follows: if two nodes have an undirected link, [...] $6p^{5}(1 - p) + p^{6}$, where $p = P(L_a > L_{sv}, L_a > L_{lv}) = 1/3$. The clustering is defined as the average of the ratio, for each node, between how many of its neighbors share connections amongst them, and the total possible number of connections: $C = \langle e_i / \binom{k_i}{2} \rangle$, where $k_i$ is the number of neighbors of node $i$ and $e_i$ is the number of connections between the neighbors. This topological measure can also be interpreted as the density of triangulations of the network, reflecting the presence of local structures. The theory developed by Erdös establishes that the geodesic path and clustering of a random network with [...] _{xy }= ⟨(
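The clustering definition given in words above can be sketched as follows. The sketch treats all links as undirected and assigns a ratio of zero to nodes with fewer than two neighbors; both choices are assumptions for illustration, as is the function name.

```python
from itertools import combinations


def average_clustering(n_nodes, links):
    """Mean over nodes of (links among a node's neighbors) / (possible neighbor pairs)."""
    neighbors = {node: set() for node in range(n_nodes)}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    ratios = []
    for node in range(n_nodes):
        k = len(neighbors[node])
        if k < 2:
            ratios.append(0.0)  # convention (assumption): no neighbor pairs, no clustering
            continue
        # Count neighbor pairs that are themselves connected.
        e = sum(1 for a, b in combinations(neighbors[node], 2) if b in neighbors[a])
        ratios.append(e / (k * (k - 1) / 2))
    return sum(ratios) / n_nodes
```

For a triangle on nodes 0, 1, 2 with a fourth node attached only to node 0, the per-node ratios are 1/3, 1, 1 and 0, giving an average clustering of 7/12.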

Competing interests

It must be noted that G.A.C. and A.R.R. are employees of IBM Corp., whose Blue Gene supercomputer was utilized to claim computational tractability of the method.

Authors' contributions

GAC worked on problem formulation, solution design and implementation; ARR worked on solution design and implementation; MVC worked on data collection; MB worked on data collection and experimental design; AVA worked on experimental design and problem formulation; DRC worked on experimental design, problem formulation and solution design. All authors read and approved the final manuscript.

Acknowledgements

G.A.C. would like to acknowledge useful discussions with G. Stolovitzky and J. Kozloski. Experimental work funded through NIH-NINDS grants 42660 and 35115.

This article has been published as part of