The Broad Institute of MIT & Harvard University, 320 Bent Street, Cambridge, MA 02141, USA

Howard Hughes Medical Institute, Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA 02138, USA

Abstract

Background

Graph theory provides a computational framework for modeling a variety of datasets including those emerging from genomics, proteomics, and chemical genetics. Networks of genes, proteins, small molecules, or other objects of study can be represented as graphs of nodes (vertices) and interactions (edges) that can carry different weights. SpectralNET is a flexible application for analyzing and visualizing these biological and chemical networks.

Results

Available both as a standalone .NET executable and as an ASP.NET web application, SpectralNET was designed specifically with the analysis of graph-theoretic metrics in mind, a computational task not easily accessible using currently available applications. Users can choose either to upload a network for analysis using a variety of input formats, or to have SpectralNET generate an idealized random network for comparison to a real-world dataset. Whichever graph-generation method is used, SpectralNET displays detailed information about each connected component of the graph, including graphs of degree distribution, clustering coefficient by degree, and average distance by degree. In addition, extensive information about the selected vertex is shown, including degree, clustering coefficient, various distance metrics, and the corresponding components of the adjacency, Laplacian, and normalized Laplacian eigenvectors. SpectralNET also displays several graph visualizations, including a linear dimensionality reduction for uploaded datasets (Principal Components Analysis) and a non-linear dimensionality reduction that provides an elegant view of global graph structure (Laplacian eigenvectors).

Conclusion

SpectralNET provides an easily accessible means of analyzing graph-theoretic metrics for data modeling and dimensionality reduction. SpectralNET is publicly available as both a .NET application and an ASP.NET web application from

Background

The field of graph theory concerns itself with the formal study of graphs – structures containing vertices and edges linking these vertices. Scientifically, graphs can be used to represent networks embodying many different relationships among data, including those emerging from genomics, proteomics, and chemical genetics. Networks of genes, proteins, small molecules, or other objects of study can be represented as nodes (vertices) and interactions (edges) that can carry different weights.

Graph-theoretic metrics, including eigenspectra, have been used to analyze diverse sets of data in the fields of computational chemistry and bioinformatics. Protein-protein interaction networks in

Despite the widespread use of graph theory in these fields, however, there are few user-friendly tools for analyzing network properties. SpectralNET is a graphical application that calculates a wide variety of graph-theoretic metrics, including eigenvalues and eigenvectors of the adjacency matrix (a simple matrix representation of the nodes and edges of a graph)

Implementation

SpectralNET was originally written as an ASP.NET application in C#, and has subsequently been ported to a standalone .NET executable version (also written in C#). ASP.NET was originally chosen because it provided a fast, easy way to offer users a thin client, obviating the need for large amounts of computational power on the client machine, which large matrix calculations often require. A standalone version was created for three primary reasons: it avoids the time-outs inherent in a web interface (a potential issue when performing long-duration calculations), it is more easily distributable, and porting from ASP.NET to a .NET executable is a relatively simple matter.

Many computations are performed directly in C#, such as graph instantiation and metric calculation. Matrix computations (including eigendecomposition) are performed using the NMath Suite (CenterSpace Software, Corvallis, Oregon). Because the NMath Suite is a commercially licensed library, those receiving source code from the authors must supply their own means of performing matrix eigendecomposition in order to modify and redeploy the application. The implementation of the Fermi-Dirac integral, used in the calculation of spectral density, is ported from Michele Goano's implementation in FORTRAN (Goano, 1995). Because SpectralNET uses a third-party library for matrix calculations that is partially implemented using Managed Extensions for C++, SpectralNET will not be portable to Linux until the Mono implementation of this C++ language feature is complete.

Results and discussion

Graph creation

Idealized random networks can be automatically generated by the application, or networks can be uploaded by the user for analysis. SpectralNET can automatically generate random Erdős-Rényi graphs

Networks can be uploaded by the user in the form of a Pajek file
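To illustrate the kind of input involved, the following Python sketch reads a minimal Pajek-style network definition. It is not SpectralNET's own parser (which is written in C#): the function name `parse_pajek` is hypothetical, only `*Vertices`, `*Edges`, and `*Arcs` sections are handled, arcs are treated as undirected edges, and a missing edge weight is assumed to be 1.

```python
import shlex

def parse_pajek(text):
    """Parse a minimal Pajek-style network definition (illustrative only).

    Returns (labels, edges): labels maps vertex id -> label, and edges is
    a list of (u, v, weight) tuples.
    """
    labels = {}
    edges = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section header such as "*Vertices 3" or "*Edges"
            section = line.split()[0].lower()
            continue
        parts = shlex.split(line)  # keeps quoted labels together
        if section == "*vertices":
            vid = int(parts[0])
            labels[vid] = parts[1] if len(parts) > 1 else str(vid)
        elif section in ("*edges", "*arcs"):
            u, v = int(parts[0]), int(parts[1])
            # Assumption: an omitted weight means an unweighted edge.
            weight = float(parts[2]) if len(parts) > 2 else 1.0
            edges.append((u, v, weight))
    return labels, edges

# Hypothetical three-vertex example file.
sample = '*Vertices 3\n1 "a"\n2 "b c"\n3 "d"\n*Edges\n1 2 2.5\n2 3\n'
labels, edges = parse_pajek(sample)
```

Vertex attributes beyond the label (coordinates, shapes) are ignored in this sketch.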

Human PPI network definition file. Network definition file representing a network of human protein-protein interactions. Data for this network was parsed from the MIPS Mammalian Protein-Protein Interaction Database. The numbers contained in this file correspond to the "shortLabel" annotation of proteins in the XML representation of the MIPS database.

Graph analysis

After processing the input network, SpectralNET displays a wide variety of graph-analytic metrics. For example, the degree and clustering coefficient are displayed for each vertex. The degree of a vertex is the number of edges incident upon that vertex; for weighted graphs, SpectralNET calculates this as the sum of these edges' weights. The clustering coefficient of a node represents the proportion of its neighbors that are connected to each other, and is calculated for a node $i$ as

$$C_i = \frac{2 e_i}{k_i (k_i - 1)}$$

where $e_i$ denotes the number of edges connecting neighbors of node $i$ and $k_i$ denotes the number of neighbors of node $i$.
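The two vertex metrics described above can be sketched in a few lines of Python. This is an illustration rather than SpectralNET's C# implementation: the function names and the toy graph are hypothetical, the graph is stored as a dictionary of neighbor-to-weight maps, and a vertex with fewer than two neighbors is assigned a clustering coefficient of 0 by assumption.

```python
from itertools import combinations

def weighted_degree(adj, v):
    """Sum of the weights of the edges incident on v.

    For an unweighted graph (all weights 1) this is the edge count.
    """
    return sum(adj[v].values())

def clustering_coefficient(adj, v):
    """Proportion of pairs of v's neighbors that are connected to each other."""
    neighbors = list(adj[v])
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumption: the undefined case is reported as 0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

# Hypothetical toy graph: a triangle 0-1-2 plus a pendant vertex 3 attached to 0.
adj = {
    0: {1: 1.0, 2: 1.0, 3: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0},
    3: {0: 1.0},
}
```

In the toy graph, only one of the three pairs of neighbors of vertex 0 is connected, so its clustering coefficient is one third, while vertices 1 and 2 sit in a closed triangle and score 1.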

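The Laplacian matrix is defined as the matrix $L$ with the following elements:

$$L_{ij} = \begin{cases} d_i & \text{if } i = j \\ -a_{ij} & \text{if } i \neq j \end{cases}$$

where $a_{ij}$ is the weight of the edge connecting nodes $i$ and $j$ (zero if they are not connected) and $d_i$ denotes the degree of node $i$.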
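As a concrete illustration of these matrices, the sketch below builds the Laplacian and the normalized Laplacian from an adjacency matrix in plain Python. The function names are hypothetical, the graph is assumed to have no self-loops, and the normalized Laplacian is taken in its standard symmetric form (each Laplacian element divided by the square root of the product of the two node degrees), which is an assumption about the convention rather than a statement of what SpectralNET computes internally.

```python
import math

def laplacian(A):
    """Laplacian L = D - A for a symmetric adjacency matrix without self-loops."""
    n = len(A)
    deg = [sum(row) for row in A]
    return [[(deg[i] if i == j else 0.0) - A[i][j] for j in range(n)]
            for i in range(n)]

def normalized_laplacian(A):
    """Symmetric normalized Laplacian: L[i][j] / sqrt(d_i * d_j).

    Entries involving an isolated node (degree 0) are set to 0.
    """
    n = len(A)
    deg = [sum(row) for row in A]
    L = laplacian(A)
    return [[L[i][j] / math.sqrt(deg[i] * deg[j])
             if deg[i] > 0 and deg[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]

# Hypothetical example: the three-node path 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L = laplacian(A)
N = normalized_laplacian(A)
```

Each row of `L` sums to zero, which is why the Laplacian always has a zero eigenvalue with a constant eigenvector on a connected graph.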

Many large networks derived from biological data are composed of multiple subgraphs that are not always connected together. SpectralNET computes many properties based on the selected or "active" connected component. For the active connected component, its size and average diameter are displayed in addition to graphs of degree distribution
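The notion of an active connected component can be made concrete with a short Python sketch. It is illustrative only: the function names are hypothetical, the graph is unweighted, and the largest component is chosen as the active one purely for the example (in SpectralNET the user selects it).

```python
from collections import Counter, deque

def connected_components(adj):
    """Return the connected components of an undirected graph as vertex lists."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:  # breadth-first search from the start vertex
            v = queue.popleft()
            component.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        components.append(component)
    return components

def degree_distribution(adj, component):
    """Fraction of the component's vertices having each degree."""
    counts = Counter(len(adj[v]) for v in component)
    return {k: c / len(component) for k, c in sorted(counts.items())}

# Hypothetical graph with two components: the path 0-1-2 and the edge 3-4.
adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
components = connected_components(adj)
active = max(components, key=len)
dist = degree_distribution(adj, active)
```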

where _{j }represents the eigenvector. Spectral density, or the density of the eigenvalues, is plotted for each eigenvalue as

where

Visualization and dimensionality reduction

The main graph display window of SpectralNET offers two interactive graphical network displays that support zooming and allow vertex selection by mouse-click. The default view is the graph as laid out by the Fruchterman-Reingold algorithm

In conjunction with uploaded raw data, Laplacian embedding allows the user to see a reduced-dimensionality view of high-dimensionality input, once this input is converted into a network. If the user chooses to process input data using the Eigenmap algorithm, Laplacian embedding shows the reduced-dimensionality result
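A minimal sketch of a Laplacian embedding is given below. It assumes that each node is plotted at its components in the Laplacian eigenvectors with the smallest non-zero eigenvalues, which is the usual convention but is not confirmed here as SpectralNET's exact choice. SpectralNET itself relies on a commercial library for eigendecomposition; this sketch substitutes a simple cyclic Jacobi solver (the hypothetical `jacobi_eigh`) that is adequate only for small symmetric matrices.

```python
import math

def jacobi_eigh(S, sweeps=100):
    """Eigenvalues and eigenvectors of a small symmetric matrix (cyclic Jacobi).

    Returns (values, vectors) sorted by ascending eigenvalue, where
    vectors[m] lists the components of the m-th eigenvector.
    """
    n = len(S)
    a = [list(map(float, row)) for row in S]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < 1e-22:  # off-diagonal part is numerically zero
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that a[p][q] becomes zero.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda m: a[m][m])
    return ([a[m][m] for m in order],
            [[v[k][m] for k in range(n)] for m in order])

def laplacian_embedding(A, dims=2):
    """Coordinates of each node from the low-order Laplacian eigenvectors."""
    n = len(A)
    deg = [sum(row) for row in A]
    L = [[(deg[i] if i == j else 0.0) - A[i][j] for j in range(n)]
         for i in range(n)]
    values, vectors = jacobi_eigh(L)
    # Skip the first eigenvector: eigenvalue 0, constant on a connected graph.
    chosen = vectors[1:1 + dims]
    return values, [tuple(vec[i] for vec in chosen) for i in range(n)]

# Hypothetical example: a ring of six nodes.
n = 6
ring = [[1.0 if (i - j) % n in (1, n - 1) else 0.0 for j in range(n)]
        for i in range(n)]
values, coords = laplacian_embedding(ring)
```

For the ring, the embedded nodes all lie at the same distance from the origin, so the two-dimensional embedding recovers the circular structure of the graph.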

Example analysis of a randomly-generated small-world network and a biological scale-free network

SpectralNET provides an easy-to-use interface for creating a randomly generated small-world network. All that is required is to supply the desired number of nodes, the desired number of neighbors to which to connect each node, and the desired random probability that an edge is re-wired. For this example we create a network with 300 nodes in which each node is connected to four neighbors, and edges are rewired with 4% probability.
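The three parameters above map directly onto the standard Watts-Strogatz construction, sketched below in Python. This follows the textbook procedure (a ring lattice followed by random rewiring) and is assumed, not confirmed, to match SpectralNET's generator in its details; the function name and the `seed` parameter are hypothetical.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Small-world graph: ring lattice of n nodes, k neighbors each, rewiring probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Ring lattice: connect each node to k/2 neighbors on either side.
    for v in range(n):
        for offset in range(1, k // 2 + 1):
            u = (v + offset) % n
            adj[v].add(u)
            adj[u].add(v)
    # Rewire each lattice edge (v, v + offset) with probability p.
    for offset in range(1, k // 2 + 1):
        for v in range(n):
            u = (v + offset) % n
            if u in adj[v] and rng.random() < p:
                # New endpoint: any node that is neither v nor already a neighbor.
                candidates = [w for w in range(n) if w != v and w not in adj[v]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[v].discard(u)
                    adj[u].discard(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj

# The example network from the text: 300 nodes, four neighbors, 4% rewiring.
graph = watts_strogatz(300, 4, 0.04, seed=1)
lattice = watts_strogatz(10, 4, 0.0)
```

Rewiring moves one endpoint of an edge without adding or removing edges, so the total number of edges stays at n·k/2 (600 for the example network).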

The default view of the graph is its Fruchterman-Reingold display, which, as noted above, uses force-directed placement to draw graph nodes (Figure

**Fruchterman-Reingold display of a small-world network**. Fruchterman-Reingold display of a randomly generated small-world graph. The node selection panel and node information panel are visible to the left of the display.

**Laplacian embedding of a small-world network**. Laplacian embedding of the randomly generated small-world network depicted in Figure 2, as drawn by SpectralNET.

Real-world biological networks are also amenable to topological analysis using Laplacian embeddings. In order to generate a suitable biological network to analyze, the MIPS Mammalian Protein-Protein Interaction Database

**Virtual Reality Modeling Language (VRML) diagram of a human protein interaction network**. Laplacian embedding of a scale-free biological network generated from a curated online database of protein interactions in humans (MIPS Mammalian Protein-Protein Interaction Database). For data see

In addition to the graphical display of networks, SpectralNET enables analysis of spectral properties of input networks, which can shed light on graph topology. One way to do this is to compare the spectra of idealized random networks with those of a small-world network that is similar, but not identical, to the randomly generated small-world network described above. This network is created by attaching complete subgraphs, varying in size from three to six nodes, to nodes arrayed in a ring (see

Small-world network definition file. Network definition file for a 33-node small-world network with attached complete subgraphs.

**Laplacian embedding of an uploaded small-world network**. Laplacian embedding of a small-world network (n = 33) created by attaching complete subgraphs to nodes arrayed in a ring. The subgraphs each appear as a single point because their constituent nodes have identical connectivity profiles, yielding identical Laplacian eigenvector components.

**Comparison of spectral properties of two small-world networks**. Plots of spectral density of the adjacency and Laplacian eigenvalues for a randomly generated Erdős-Rényi graph, a randomly generated Watts-Strogatz graph, a randomly generated Barabási-Albert graph, and the small-world network depicted in Figure 4 consisting of complete subgraphs attached to nodes arrayed in a ring. The spectra of the input small-world network most closely resemble those of the randomly generated Watts-Strogatz network, since the two have the most similar topologies.

Dimensionality reduction of a real-world chemical dataset to analyze QSAR

In addition to performing spectral analysis of networks, SpectralNET can also perform dimensionality reduction on chemical datasets to analyze quantitative structure-activity relationships (QSAR). In this example, we upload a set of chemical descriptor data into SpectralNET and analyze it using the Laplacian Eigenmap algorithm originally developed by Belkin and Niyogi

The Laplacian Eigenmap algorithm in SpectralNET connects these small molecules to their

where $w_{ij}$ represents the weight of the edge connecting nodes $i$ and $j$.
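The graph-construction step of the Laplacian Eigenmap algorithm can be sketched as follows. The weighting used by SpectralNET is not reproduced in this text, so the sketch assumes heat-kernel weights, one of the options described by Belkin and Niyogi; the function name `knn_graph`, the parameter `t`, and the toy descriptor vectors are all hypothetical.

```python
import math

def knn_graph(points, k, t=1.0):
    """Weighted nearest-neighbor graph over descriptor vectors.

    Each point is connected to its k nearest neighbors (Euclidean distance),
    and the graph is made symmetric. Weights use the heat kernel
    exp(-||x_i - x_j||^2 / t), which is an assumption of this sketch.
    """
    n = len(points)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: sqdist(points[i], points[j]))[:k]
        for j in nearest:
            w = math.exp(-sqdist(points[i], points[j]) / t)
            W[i][j] = W[j][i] = w  # connect i and j in both directions
    return W

# Hypothetical two-descriptor dataset forming two well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
W = knn_graph(points, k=1)
```

The resulting weight matrix plays the role of the adjacency matrix from which the Laplacian, and hence the reduced-dimensionality coordinates, are computed.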

The resultant Laplacian embedding of the graph, which can be viewed by selecting the "Laplacian Embedding" radio button underneath the graph view pane, is the reduced dimensionality result of the Laplacian Eigenmap algorithm (Figure

**Laplacian Eigenmap result for a molecular descriptor dataset**. A network of small molecules encoded as molecular descriptors, connected by similarity and displayed using the Laplacian Eigenmap algorithm, which plots each small molecule according to its corresponding Laplacian eigenvector components. Small molecules are colored according to the value of their minimized energy, one of the molecular descriptors of the original dataset.

Because Laplacian Eigenmaps is a local, rather than global, algorithm, it seeks to preserve local topological features of the data in its reduced-dimensionality space

**Comparison of chemical structures from Laplacian Eigenmap clusters**. Comparison of chemical structures from the example real-world dataset of molecular descriptors depicted in Figure 5, taken either (A) from the group labeled "A", (B) from the group labeled "B", or (C) at random from the entire set.

**Principal Components Analysis result for a molecular descriptor dataset**. The network of small molecules depicted in Figure 5, displayed using the first two principal components of the data as derived from PCA. Small molecules are colored according to the values of their minimized energies.

Conclusion

SpectralNET provides an easily accessible means of analyzing graph-theoretic metrics for data modeling and dimensionality reduction. The software allows users to analyze idealized random networks or uploaded real-world datasets, and exposes metrics like the clustering coefficient, average distance, and degree distribution in an easy-to-use graphical manner. In addition, SpectralNET calculates and plots eigenspectra for three important matrices related to the network and provides several powerful graph visualizations.

SpectralNET is available as both a standalone .NET executable and an ASP.NET web application. Source code is available by request from the author.

Availability and requirements

**Project name:** SpectralNET

**Project home page:**

**Operating system(s):** Windows

**Programming language:** C#

**Other requirements:** The .NET framework v1.1 or higher

**License:** The SpectralNET software is provided "as is" with no guarantee or warranty of any kind. SpectralNET is freely redistributable in binary format for all non-commercial use. Source code is available to non-commercial users on request from the primary author. Any other use of the software requires special permission from the primary author.

**Any restriction to use by non-academics:** Contact authors

Authors' contributions

JF developed and tested the software, wrote the initial version of the manuscript, and co-designed the software; PC provided feedback and data for molecular descriptor analysis, assisted with design of the software, and edited the manuscript; SS provided project guidance and edited the manuscript; SH initially conceived of and co-designed the software and edited the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We gratefully acknowledge the Broad Institute of Harvard University and MIT, the National Cancer Institute (Initiative for Chemical Genetics), and the National Institute of General Medical Sciences (Center of Excellence for Chemical Methodology and Library Development) for support of this research. S.L.S. is an Investigator at the Howard Hughes Medical Institute.