Abstract
Background
Structural measures for networks have been extensively developed, but many of them have not yet demonstrated their sustainability. That is, it often remains unclear whether a particular measure is useful and feasible for solving a particular problem in network biology. One example is the classification of complex biological networks, where structural measures should be chosen so that the classification error is minimal. Hence, there is a strong need for freely available software packages that calculate structural graph measures and demonstrate their appropriate usage in network biology.
Results
Here, we discuss topological network descriptors that are implemented in the R package QuACN and demonstrate their behavior and characteristics by applying them to a set of example graphs. Moreover, we present a representative application to illustrate their capabilities for classifying biological networks. In particular, we infer gene regulatory networks from microarray data and classify them using methods provided by QuACN. Note that QuACN is the first freely available software written in R that contains a large number of structural graph measures.
Conclusion
The R package QuACN is under ongoing development, and we continuously add promising groups of topological network descriptors. The package can be used to answer intriguing research questions in network biology by analyzing the topology of biological networks, e.g., to classify biological data or to identify meaningful biological features.
Background
Understanding the structure and dynamics of biological systems has been a major task in systems biology [1]. In the early years of computational biology, the main task was to investigate the individual properties of intracellular components and to collect this information in large databases [2]. Palsson defines biological systems in terms of the interactions of their components [3]. Furthermore, the development of high-throughput technologies made it possible to study these complex systems quantitatively [4]. Moreover, gene networks, whose nodes represent gene products and whose edges correspond to molecular interactions, serve as a means to study biological function by representing and analyzing high-throughput data [2].
Network inference plays a major role in network biology, and various methods exist to infer networks from high-throughput data [5-9]. The WGCNA package [10] can be used to create correlation networks, and the minet package [6] to infer networks based on mutual information. Other packages [11-13] offer methods to infer networks using different kinds of graphical models. Moreover, Altay and Emmert-Streib introduced the C3NET algorithm to infer the conservative causal core of gene networks and compared it to other approaches [5]. Their study shows the importance of inferring robust and valid networks from biological data. Note that it is crucial to choose inference methods that take the nature and constraints of the underlying problem into account [5]. After inferring gene networks, it is often important to analyze them structurally to draw conclusions about the underlying topology [14,15]. Moreover, the structural analysis of biological networks can be useful to extract biological knowledge that may not be revealed by studying the raw data [16]. Typical problems aim at identifying topologically interesting nodes or at characterizing networks by means of their structure. Therefore, we provide an R package called QuACN [17] that offers a selection of new topological network descriptors. Such descriptors are numerical graph invariants that quantitatively characterize the structure of the underlying network. Note that we use the terms descriptor, measure, and index synonymously for topological network descriptors.
Quantifying the complexity of networks is a problem in different scientific disciplines and has been a challenging research topic during the last decades [15]. Importantly, little is known about the structural interpretation of topological network descriptors [14,15]. This also applies to information-theoretic measures [14,18-21], which have been used to determine the entropy of the graph topology. Other topological network descriptors have also been used in mathematical and medicinal chemistry, including drug design, to analyze and characterize the structure of chemical compounds (QSAR/QSPR) [15,22-24].
In more biologically motivated work, Xia et al. [25] used the vertex degrees of protein-protein interaction (PPI) networks to correlate the structural complexity of proteins and the organismal complexity with the complexity of the underlying PPI network. They showed that the PPI domain coverage correlates significantly with the vertex degrees of the PPI networks [25]. In another study, Mazurie et al. [26] used different network measures to link the structure and complexity of metabolic reactions (interacting pathways) to the phylogeny of species. Their results show that a small set of descriptors accurately reproduces the phylogenetic distances [26].
Numerous network measures have been developed, and explaining them in detail is beyond the scope of this paper. For further details, see the recent review by Dehmer and Mowshowitz [27]. Beyond information-theoretic measures, Todeschini et al. [24] provide a comprehensive overview of available network descriptors. However, the feasibility and properties of a large number of the descriptors listed in [24] have not yet been investigated.
QuACN provides a selection of topological network descriptors and allows them to be applied in a standardized and intuitive manner. Thus, it can support the scientific community in investigating these methods in different kinds of biological applications. A typical setup for a study that analyzes biological networks structurally is illustrated in Figure 1, which shows a general workflow for analyzing microarray studies using a network approach with topological network measures.
Figure 1. Illustrative figure of a structural network analysis of microarray data. This figure illustrates a typical workflow in network biology for analyzing microarray data. After a network has been inferred from microarray data, it is often important to analyze it structurally to draw conclusions about the underlying topology [14,15]. To underpin statements about the topology, it can be necessary to validate them biologically. This workflow can also be adapted to different kinds of biological data.
Of course, there also exist freely available tools to calculate network descriptors, e.g., PowerMV [28] or JOELib [29]. However, these tools are designed for quantitative structure-activity relationship (QSAR) studies and do not support common exchange standards for biological data. In contrast to commercial software tools such as Dragon [30] or PreADMET [31], QuACN is published under an open source license (LGPL) and is freely available. It is therefore possible to adjust and further develop the existing indices or even to add descriptors to the package. Compared to the R packages igraph [32] and RBGL [33], which contain a few basic descriptors, QuACN contains a selection of more sophisticated network descriptors (e.g., the group of entropy-based descriptors). To the best of our knowledge, it is the only available software package that contains sophisticated measures such as the parametric graph entropies (Dehmer entropy) [34]. We recommend QuACN for investigating large-scale complex networks. Further, we expect that the package will be helpful for exploring questions concerning the structure of biological networks in the context of systems biology.
Generally, quantitative network analysis [35] is a non-trivial task, since the methods must be understood in detail to interpret the results correctly. This manuscript addresses readers who want to analyze networks structurally. Its aim is to guide the reader in correctly applying the methods provided by QuACN [17]. This manuscript does not deal with the issue of inferring robust and valid networks. Nor does it explain the network measures in detail or how to interpret the results of the topological network descriptors, as this would go beyond the scope of this paper; Dehmer et al. have dealt with these questions extensively [15,27]. This paper is structured as follows: The section Implementation gives an overview of the topological network descriptors implemented in the R package QuACN. The section Results and Discussion illustrates how to apply the topological descriptors to concrete networks. We show the behavior of selected measures using small example graphs and demonstrate their performance by applying them to biological networks. Further, we illustrate possible use cases of topological network descriptors for a quantitative analysis of biological networks. The section Summary and Outlook summarizes the paper and outlines future developments.
Implementation
We implemented a selection of the topological network descriptors discussed in [15,27]. Table 1 gives an overview of all implemented network measures, together with the names of the R functions that compute them. For a detailed description of all descriptors implemented in QuACN, see the package vignette or additional literature [24,27].
Table 1. Overview of the implemented topological network descriptors
The measures can be categorized into the following groups:
Descriptors based on distances in a graph
This class contains measures that use distances between nodes to capture the structural complexity of the underlying network. A famous and classical representative of this group is the Wiener index [36], defined as the sum of the distances between all pairs of vertices in the network. We also integrated a group of basic distance-based descriptors introduced by Skorobogatov and Dobrynin [37].
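To make the definition concrete, here is a minimal base R sketch (not QuACN's implementation) that computes W(G) for a connected, unweighted, undirected graph given as a symmetric 0/1 adjacency matrix. All-pairs distances are obtained with the Floyd-Warshall algorithm, and each unordered vertex pair is counted once. The helper names floyd_distances, wiener_index, path_adj and star_adj are our own, chosen for illustration.

```r
# Minimal sketch: Wiener index of a connected, unweighted, undirected graph
# given as a symmetric 0/1 adjacency matrix (base R only, not QuACN code).

# All-pairs shortest-path lengths via Floyd-Warshall.
floyd_distances <- function(adj) {
  d <- ifelse(adj > 0, 1, Inf)  # direct neighbours at distance 1
  diag(d) <- 0
  for (k in seq_len(nrow(adj))) {
    # allow paths through intermediate vertex k
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d
}

# W(G): sum of d(u, v) over all unordered vertex pairs {u, v}
wiener_index <- function(adj) {
  d <- floyd_distances(adj)
  sum(d[upper.tri(d)])
}

# Hypothetical helpers that build small test graphs
path_adj <- function(n) {
  adj <- matrix(0, n, n)
  adj[cbind(1:(n - 1), 2:n)] <- 1
  adj + t(adj)
}

star_adj <- function(n) {  # vertex 1 joined to vertices 2..n
  adj <- matrix(0, n, n)
  adj[1, -1] <- 1
  adj + t(adj)
}

wiener_index(path_adj(7))  # 56
wiener_index(star_adj(7))  # 36
```

For a path on n vertices W = n(n^2 - 1)/6, i.e. 56 for n = 7, which is the value reported for the linear example graph (a) in the Results section if that graph is a 7-vertex path. The star on the same number of vertices gives the smaller value 36, in line with the observation that the Wiener index decreases with branching.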
Descriptors based on other graph invariants
The descriptors in this class use graph invariants other than distances (e.g., vertex degrees, the number of vertices, the number of edges) to characterize the structural complexity of complex biological networks. For example, the Zagreb group indices [38] are based on the vertex degrees. The normalized edge complexity [39] is calculated from the adjacency matrix and the number of vertices.
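As an illustration of a degree-based descriptor, the following base R sketch computes the first and second Zagreb indices with their standard definitions, M1(G) = sum over vertices of deg(v)^2 and M2(G) = sum over edges {u, v} of deg(u) deg(v). It is not QuACN code; the function name zagreb_indices is hypothetical, and QuACN's own functions for this group are the ones listed in Table 1.

```r
# Sketch: first and second Zagreb indices from a symmetric 0/1 adjacency matrix.
zagreb_indices <- function(adj) {
  deg <- rowSums(adj)                                       # vertex degrees
  edges <- which(upper.tri(adj) & adj > 0, arr.ind = TRUE)  # one row per edge
  c(M1 = sum(deg^2),                                        # sum of squared degrees
    M2 = sum(deg[edges[, 1]] * deg[edges[, 2]]))            # degree products per edge
}

# Path on 7 vertices: degrees 1, 2, 2, 2, 2, 2, 1
adj <- matrix(0, 7, 7)
adj[cbind(1:6, 2:7)] <- 1
adj <- adj + t(adj)

zagreb_indices(adj)  # M1 = 22, M2 = 20
```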
Information measures
For an extensive overview of measures of this class, see [16,20,27].
• Partition-based graph entropy descriptors
These measures use an arbitrary graph invariant and an equivalence criterion to induce a partition. A probability value is calculated for each partition class, and the entropy is then determined using the entropy formula due to Shannon [19]. The topological information content introduced by Rashevsky [14] and reformulated by Trucco [21] calculates the entropy of a graph from the partition of its vertices into vertex orbits. Additionally, Mowshowitz [19] investigated mathematical properties of this index, e.g., to characterize product graphs, as well as other sophisticated measures such as the chromatic information content of a graph.
• Parametric graph entropy measures
Measures of this class [27,34] assign a probability value to each vertex of a graph using so-called information functionals (IFs), which capture structural information of the network. One particular information functional quantifies the structural information by using the cardinalities of the corresponding j-spheres [34], i.e., the sets of vertices at distance j from a given vertex. The derived probability distribution is used to calculate the entropy, which has been called Dehmer entropy [34].
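For the partition-based measures, the entropy computation itself is short once the classes are known. The base R sketch below assumes the class cardinalities |X_1|, ..., |X_k| have already been obtained (for the topological information content these are the vertex orbit sizes, whose computation requires the automorphism group and is not shown) and uses p_i = |X_i|/|V| with a base-2 logarithm. partition_entropy is a hypothetical helper name, not a QuACN function.

```r
# Sketch: Shannon entropy of a partition of the vertex set.
# class_sizes: cardinalities |X_1|, ..., |X_k| of the equivalence classes
# (e.g. vertex orbits); each class gets probability p_i = |X_i| / |V|.
partition_entropy <- function(class_sizes) {
  p <- class_sizes / sum(class_sizes)
  -sum(p * log2(p))
}

partition_entropy(c(4, 2, 1))  # about 1.378783
partition_entropy(rep(1, 7))   # log2(7): every vertex is its own class
partition_entropy(7)           # 0: a single class
```

With the orbit sizes 4, 2, 1 reported for example graph (c) in the Results section, the sketch gives 1.378783, the same value as the $entropy returned there by topologicalInfoContent. A partition into singletons attains the maximum log2(|V|).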
As mentioned above, it is not the aim of this manuscript to describe all descriptors in detail. For a better understanding of the descriptors used, see the vignette of QuACN and the extensive work of Dehmer and Mowshowitz [27] on information measures for networks.
QuACN is written entirely in R, and detailed help is available following the R documentation standards.
Results
The examples below show the functionality of QuACN using a selection of small example graphs, shown in Figure 2. Our goal is to show how the methods work and how to apply the measures to a multitude of complex networks, which may lead to novel applications in the field.
Figure 2. Small example graphs. This figure shows six small example graphs used to illustrate the correct application of the topological network descriptors implemented in QuACN.
Example Graphs
To demonstrate the usefulness of topological network descriptors, we consider the six undirected example graphs shown in Figure 2. An undirected graph or network G = (V, E) consists of a nonempty vertex set V and an edge set E, which is a set of unordered pairs of elements of V. As an example, we calculate a set of descriptors consisting of the Wiener index W(G) [36], the Balaban-like index X(G) [40], the topological information content I_{orb}(G) [14,21], and the Dehmer entropy I_{f^V}(G) [34]. The results are shown in Table 2.
Table 2. Selected descriptors for the small example graphs
The corresponding methods can be called in R in different ways. The following example shows how to calculate the Wiener index of the graphNEL object g, which represents example graph (a) in Figure 2.
> wiener(g)
[1] 56
As all descriptors are implemented as R functions, they can easily be calculated for a set of graphs using the functions of the apply family.
> sapply(glist, balabanlike2)
      (a)       (b)       (c)
0.5978703 0.6932045 0.8190124
      (d)       (e)       (f)
1.0491707 1.1451745 1.8204321
Note that each descriptor has at least two parameters, as listed in Table 3. However, passing the distance matrix to the corresponding function is optional: if the parameter is omitted or set to NULL, the distance matrix is calculated within each function. When calculating more than one descriptor for the same graph, we recommend calculating the distance matrix once and passing it to each method instead of recalculating it every time. Particularly for large networks, this can save a considerable amount of computation time. We demonstrate the precalculation of the distance matrix in the next example, where we calculate four descriptors for the example graphs in Figure 2. The results of the following function call are listed in Table 2.
Table 3. Common parameters for each function in QuACN
> descriptors <- sapply(glist, function(g) {
+   # compute the distance matrix once per graph and reuse it
+   dm <- distanceMatrix(g)
+   result <- list()
+   result[["Wiener"]] <- wiener(g, dist = dm)
+   result[["BalabanLike2"]] <- balabanlike2(g, dist = dm)
+   result[["topologicalInfoContent"]] <-
+     topologicalInfoContent(g, dist = dm)$entropy
+   result[["Dehmer_jsphere"]] <-
+     infoTheoreticGCM(g,
+       dist = dm,
+       coeff = "exp",
+       infofunct = "sphere",
+       lambda = 1000)$entropy
+   return(result)
+ })
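The pattern described above, where each descriptor accepts an optional precomputed distance matrix and only computes one itself when the argument is NULL, can be sketched in base R as follows. The functions bfs_distances, wiener_sketch and diameter_sketch are hypothetical stand-ins, not QuACN functions; distances are computed by breadth-first search from every vertex, assuming an unweighted, undirected graph given as a 0/1 adjacency matrix.

```r
# All-pairs distances by breadth-first search from every vertex.
bfs_distances <- function(adj) {
  n <- nrow(adj)
  d <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    d[s, s] <- 0
    frontier <- s
    while (length(frontier) > 0) {
      # unvisited vertices adjacent to the current frontier
      nb <- which(colSums(adj[frontier, , drop = FALSE]) > 0 & is.infinite(d[s, ]))
      d[s, nb] <- d[s, frontier[1]] + 1  # all frontier vertices share one BFS level
      frontier <- nb
    }
  }
  d
}

# Each descriptor takes an optional distance matrix and computes one if absent.
wiener_sketch <- function(adj, dist = NULL) {
  if (is.null(dist)) dist <- bfs_distances(adj)
  sum(dist[upper.tri(dist)])
}

diameter_sketch <- function(adj, dist = NULL) {
  if (is.null(dist)) dist <- bfs_distances(adj)
  max(dist)
}

adj <- matrix(0, 7, 7)  # path on 7 vertices
adj[cbind(1:6, 2:7)] <- 1
adj <- adj + t(adj)

dm <- bfs_distances(adj)                       # computed once ...
c(wiener   = wiener_sketch(adj, dist = dm),    # ... reused here
  diameter = diameter_sketch(adj, dist = dm))  # ... and here: 56 and 6
```

Calling wiener_sketch(adj) without dist gives the same result, only with an extra distance computation; the saving grows with the number of descriptors evaluated per graph.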
The functions for the topological information content [14,19,21] and the Dehmer entropy [34] are called like all other methods, but they return a list of values rather than a single number; in the call that computes the four descriptors, only the entropy value is used. To explain the result of topologicalInfoContent, we apply it to graph (c) in Figure 2:
> topologicalInfoContent(glist[[3]])
$entropy
[1] 1.378783
$orbits
[1] 4 2 1
The implementation of the topological information content returns a list containing the entropy ($entropy) and the number of vertices in each vertex orbit ($orbits). This information can be used in other applications, e.g., to determine a graph prototype [41].
The numerical results of the foregoing example are given in Table 2. The visual representation of the normalized results in Figure 3 shows the different behavior of the topological network descriptors on the example graphs. The example graphs start with a linear graph (a), and their branching increases towards (f). In this context, branching correlates with the number of terminal vertices (end vertices) [42]. The Wiener index is known to detect molecular branching [24], and one can see that it reflects increasing branching with decreasing values. Furthermore, this example shows that the Balaban-like index X(G) also detects branching well; its values simply increase rather than decrease with branching. The topological information content is based on partitions of vertices that are in the same vertex orbit. However, the values of I_{orb} show that this quantity does not reflect branching properly; as is known, I_{orb} is a symmetry-based measure rather than an index of structural complexity [27]. In this example, the Dehmer entropy with a monotonically decreasing weighting parameter c_{i} and the information functional based on j-spheres does not reflect branching appropriately either. The j-sphere information functional [34] itself has been used to investigate information spread in networks [43,44]. However, with a different parameter setting, the Dehmer entropy reflects the branching of certain networks meaningfully [45].
Figure 3. Visualization of normalized values for selected descriptors for the small example graphs. This figure illustrates the behavior of selected topological network descriptors applied to the small example graphs listed in Figure 2.
However, this simple but instructive example indicates that not every topological information index is suitable for a particular problem. It is a challenging task to derive general statements about the structural complexity captured by such measures [15], and even harder to connect biological properties with topological network descriptors. Although we often do not know the exact biological interpretation of topological network measures, they can be helpful for a broad range of biological questions, for example, classifying biological data or identifying meaningful biological features by analyzing the topology of biological networks.
To conclude this section, we want to emphasize that one has to understand the selected descriptors and measures in detail to interpret the results correctly. Topological network analysis is a non-trivial task, and one has to know specific properties of the descriptors to solve a particular problem involving networks. One example is the group of Balaban-like indices X(G) and U(G): for a graph with two vertices connected by one edge, the index is defined as infinite. The QuACN method returns this value, accompanied by a warning:
> g = new("graphNEL")
> # add nodes
> g = addNode("1",g)
> g = addNode("2",g)
> g = addEdge("1","2",g,1)
> balabanlike1(g)
[1] Inf
Warning message:
In balabanlike1(g): Graphs with
V < 3 result in: Inf!
When processing and interpreting the results, it is important to know how the different descriptors are defined. Note that not every combination of networks and descriptors could be tested and covered by the exception handling. Keep in mind that applying QuACN to concrete networks can result in special values (e.g., infinite (Inf), not available (NA), or not a number (NaN)).
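Because descriptors may return Inf, NA or NaN for some graphs, it can help to check the feature matrix before using it downstream, for example in a classifier. The base R sketch below (drop_nonfinite is a hypothetical helper, not part of QuACN) removes every descriptor column that contains a non-finite value for any graph and reports which columns were dropped; the input values are toy numbers, not QuACN results.

```r
# Sketch: drop descriptor columns containing Inf, NA or NaN for any graph.
# Rows are graphs, columns are descriptors.
drop_nonfinite <- function(features) {
  keep <- apply(features, 2, function(col) all(is.finite(col)))
  if (any(!keep)) {
    warning("dropping descriptors with non-finite values: ",
            paste(colnames(features)[!keep], collapse = ", "))
  }
  features[, keep, drop = FALSE]
}

# Toy feature matrix (illustrative values only)
features <- cbind(wiener  = c(56, 36, 1),
                  balaban = c(0.6, 1.2, Inf),  # e.g. a graph with two vertices
                  other   = c(1, NaN, NA))
clean <- drop_nonfinite(features)  # warns about "balaban, other"
colnames(clean)                    # "wiener"
```

Dropping columns is only one option; depending on the study, removing the affected graphs or imputing values may be more appropriate.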
The next section shows an example of a possible application of QuACN to biological networks. We also use this section to explain the usage of more complex descriptors implemented in QuACN.
Supervised Machine Learning for Prostate Cancer Networks
In this section, we present an application of topological network descriptors to classify gene networks inferred from gene expression data. Note that we do not aim to justify network-based approaches themselves or to compare them to alternative approaches; a large body of literature dealing with networks already exists, e.g., see [2,4,16].
This example was chosen to explain a possible application of topological network descriptors to biological data. Therefore, we focus on the methodological usage of the network measures and not on the biological interpretation of the results.
To perform our analysis, we selected seven publicly available prostate cancer studies from NCBI GEO and EBI ArrayExpress and inferred networks using the C3NET inference method [5]. This resulted in seven networks representing benign tissue (from the control group) and seven networks representing cancer tissue. We then extracted subgraphs from these networks based on the Gene Ontology (GO) database [46]: for each network and each GO term, we extracted one subgraph containing the genes associated with this specific GO term. This resulted in a total of 159 networks representing benign tissue and 108 networks representing cancer tissue. The numbers differ because the structures of the benign and cancer networks differ and, hence, not all pathways are captured by these networks. Whenever a subnetwork contained fewer than 10 genes associated with a GO term, we excluded this pathway from the analysis. The obtained network sets can be seen as approximations of two populations: one representing benign and the other representing cancerous molecular interactions.
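The subgraph extraction step can be sketched in base R as follows, assuming the inferred network is stored as a symmetric 0/1 adjacency matrix with gene identifiers as row and column names, and the GO annotation as a named list of gene vectors. The function extract_subnetworks and the toy data are hypothetical; the study itself used C3NET networks and GO annotations, which are not reproduced here. The threshold of 10 genes mirrors the exclusion rule described above.

```r
# Sketch: one induced subnetwork per gene set (e.g. GO term), skipping sets
# with fewer than min_genes genes present in the network.
# adj: symmetric 0/1 adjacency matrix with gene names as dimnames
# gene_sets: named list of character vectors (hypothetical input format)
extract_subnetworks <- function(adj, gene_sets, min_genes = 10) {
  subnets <- lapply(gene_sets, function(set) {
    present <- intersect(set, rownames(adj))
    if (length(present) < min_genes) return(NULL)  # exclude small gene sets
    adj[present, present, drop = FALSE]            # induced subgraph
  })
  Filter(Negate(is.null), subnets)
}

# Toy network of 12 genes with random edges (illustration only)
set.seed(1)
genes <- paste0("g", 1:12)
adj <- matrix(0, 12, 12, dimnames = list(genes, genes))
adj[upper.tri(adj)] <- rbinom(66, 1, 0.3)
adj <- adj + t(adj)

gene_sets <- list(termA = genes[1:10],                 # 10 genes present
                  termB = genes[1:5],                  # too small
                  termC = c(genes[4:12], "x1", "x2"))  # only 9 present
subs <- extract_subnetworks(adj, gene_sets)
names(subs)      # "termA"
dim(subs$termA)  # 10 10
```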
Additionally, we calculated all topological network descriptors available in QuACN as feature vectors for each of these networks. Afterwards, we performed feature selection and classification using random forests with 10-fold cross-validation (CV). To correct for selection bias, the selection process was included in an external cross-validation [47]; in particular, we performed the selection within each CV loop [48]. We trained the classifier to distinguish cancer networks from benign networks, which led to a mean classification performance with an F-score of 0.80 and an accuracy of 0.74. This demonstrates that the topological network descriptors integrated in QuACN capture group-specific structural features that are meaningful for distinguishing between networks representing prostate cancer and benign tissue. Importantly, this result is not trivial: one can easily show that using other measures, or only a particular subset of them, results in a random classification, which would not be useful in practice.
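The key point of the external cross-validation is that feature selection is repeated inside every training fold, so that the held-out fold never influences which descriptors are chosen. The base R sketch below illustrates only this structure: to stay dependency-free, it replaces random forests with a nearest-centroid classifier and ranks features by Welch t-statistics computed on the training fold. All function names and the simulated data are hypothetical, and this is not the pipeline used in the study.

```r
# Rank features by |t| on the training data only and keep the top n_keep.
select_features <- function(x, y, n_keep) {
  lv <- levels(y)
  tstat <- apply(x, 2, function(col)
    t.test(col[y == lv[1]], col[y == lv[2]])$statistic)
  order(abs(tstat), decreasing = TRUE)[seq_len(n_keep)]
}

# Assign each test row to the class with the closest mean (squared Euclidean).
nearest_centroid <- function(x_train, y_train, x_test) {
  lv <- levels(y_train)
  centroids <- do.call(rbind, lapply(lv, function(l)
    colMeans(x_train[y_train == l, , drop = FALSE])))
  pred <- apply(x_test, 1, function(row) {
    diffs <- centroids - matrix(row, nrow(centroids), length(row), byrow = TRUE)
    lv[which.min(rowSums(diffs^2))]
  })
  factor(pred, levels = lv)
}

cv_accuracy <- function(x, y, k = 10, n_keep = 5, seed = 1) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(x)))
  pred <- factor(rep(NA, nrow(x)), levels = levels(y))
  for (f in seq_len(k)) {
    test <- folds == f
    # selection sees the training part of this fold only
    keep <- select_features(x[!test, , drop = FALSE], y[!test], n_keep)
    pred[test] <- nearest_centroid(x[!test, keep, drop = FALSE], y[!test],
                                   x[test, keep, drop = FALSE])
  }
  mean(pred == y)
}

# Simulated data: 60 "networks", 30 descriptors, 3 of them informative
set.seed(42)
y <- factor(rep(c("benign", "cancer"), each = 30))
x <- matrix(rnorm(60 * 30), 60, 30)
x[y == "cancer", 1:3] <- x[y == "cancer", 1:3] + 3
cv_accuracy(x, y)
```

Selecting features once on the full data set and only then cross-validating the classifier would let the test folds leak into the selection, which is the bias the external cross-validation avoids.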
As already mentioned, we do not focus on a biological interpretation of the results, as the aim of this publication is to discuss the methodological perspective of the presented R package.
One of the measures that showed a significant group effect was the Dehmer entropy [43]. The Dehmer entropy is a complex measure with several parameters: the information functional f(v_{i}), the weighting parameter c_{i}, and the scaling constant λ can all be chosen [49]. The meaning of these parameters has been discussed in [43]. The user can specify four different information functionals, based on j-spheres, path lengths, vertex centrality, or degree-degree associations [43,49]. We implemented different presets for the weighting parameter c_{i}: constant, linear, quadratic, or exponential. A customized setting for c_{i} can also be declared. The following example shows how to calculate a Dehmer entropy using the information functional based on j-spheres, an exponential setting for c_{i}, and a scaling constant λ = 2500.
> infoTheoreticGCM(glist[[3]], infofunct = "sphere",
+   coeff = "exp", lambda = 2500)
$entropy
[1] 2.743221
$distance
[1] 160.3339
$pis
        1         2         3         4
0.1057720 0.1952924 0.1863273 0.1952924
        5         6         7
0.1057720 0.1057720 0.1057720
$fvis
        1         2         3         4
 7.882673 14.554200 13.886071 14.554200
        5         6         7
 7.882673  7.882673  7.882673
This function returns a more comprehensive result than the other measures. In addition to the Dehmer entropy ($entropy), the list contains the distance of the entropy from the maximum entropy ($distance) [43], the values of the information functional ($fvis), and the corresponding probability distribution ($pis). The probability distribution can later be used for further analysis, e.g., to estimate the graph prototype of a set of networks [41].
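To show how the returned components relate to each other, here is a base R sketch under the following assumptions: the j-sphere functional is taken as f(v) = sum_j c_j |S_j(v)|, where S_j(v) is the set of vertices at distance j from v; p(v) = f(v) / sum_u f(u); the entropy uses log2; and the distance from maximum entropy is lambda (log2|V| - I_f). The coefficients c_j are passed in explicitly rather than through QuACN's named presets, jsphere_entropy is a hypothetical name, and the graph must be connected.

```r
# Sketch of a j-sphere-based parametric graph entropy (assumptions above).
# adj:   symmetric 0/1 adjacency matrix of a connected graph
# coeff: weights c_1, c_2, ... (at least as many as the graph's diameter)
jsphere_entropy <- function(adj, coeff, lambda = 1) {
  n <- nrow(adj)
  d <- ifelse(adj > 0, 1, Inf)
  diag(d) <- 0
  for (k in seq_len(n)) d <- pmin(d, outer(d[, k], d[k, ], "+"))  # Floyd-Warshall
  D <- max(d)                # diameter
  spheres <- matrix(0, n, D) # spheres[v, j] = |S_j(v)|; distance 0 is ignored
  for (v in seq_len(n)) spheres[v, ] <- tabulate(d[v, ], nbins = D)
  fvis <- as.vector(spheres %*% coeff[seq_len(D)])
  pis <- fvis / sum(fvis)
  entropy <- -sum(pis * log2(pis))
  list(entropy = entropy,
       distance = lambda * (log2(n) - entropy),  # gap to maximum entropy (assumed)
       pis = pis, fvis = fvis)
}

# Hypothetical 7-vertex tree: 3 is the centre, 2 and 4 each carry two leaves
edges <- rbind(c(1, 2), c(2, 5), c(2, 3), c(3, 4), c(4, 6), c(4, 7))
adj <- matrix(0, 7, 7)
adj[edges] <- 1
adj <- adj + t(adj)

res <- jsphere_entropy(adj, coeff = 4 * exp(-(0:3)), lambda = 2500)
res$entropy   # about 2.743221
res$distance  # about 160.3339
```

For this tree (orbit sizes 4, 2, 1) and the illustrative choice c_j = 4e^{-(j-1)}, the sketch yields the $fvis, $pis, $entropy and $distance values printed above, up to the printed precision. We use this only as a consistency check on the formulas; we have not verified that this is how QuACN defines its "exp" preset. On a vertex-transitive graph such as a cycle, all f(v) are equal, so the entropy reaches log2|V| and the distance is 0.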
Conclusion
The freely available open source R package QuACN contains a selection of topological network descriptors. The aim of this manuscript was to explain how to apply the implemented descriptors correctly to complex biological networks using R. To provide a basic understanding, we demonstrated the behavior of the indices by applying them to small example networks. Moreover, we presented an application of supervised machine learning to biological networks using topological network descriptors. Within these examples, we demonstrated the correct usage of the methods included in QuACN. Machine learning is not the only application of topological network descriptors; they can also be used to compare networks. In this sense, Kugler et al. [41] calculated the Kullback-Leibler divergence to perform an integrative network analysis.
Topological network descriptors have been standard methods in the field of quantitative structure-activity and structure-property relationships (QSAR/QSPR) [22,34]. The methods implemented in QuACN have already been used for QSAR/QSPR applications, see [22,34]. Further applications of information-theoretic measures have been discussed by Dehmer and Mowshowitz [27].
The indices integrated in QuACN can also be applied efficiently to large networks, as they can be computed in polynomial time. There also exist indices whose computation is NP-hard (e.g., descriptors based on the subgraph isomorphism problem [50] or the Hosoya index [51]), but these have not been integrated in the package. Importantly, not every index is suitable for every application in network biology, and which measures are appropriate strongly depends on the underlying research question.
Using advanced network descriptors is a relatively new concept in systems biology. Such descriptors are able to quantify specific topological characteristics of the underlying network, but the interpretation of the structural properties captured by the applied measures is still an ongoing task [15]. At the same time, modeling biological systems as networks has become an important task in recent systems biology research and has created a need for methods to analyze them structurally. Therefore, the topological network measures provided by QuACN can stimulate research in this field. A thorough analysis of the behavior of topological information indices on biological networks is planned.
As future work, we plan to apply the integrated measures to various biological research questions and to extend the range of functions with new promising descriptors in coming versions of QuACN. The next step is to integrate a group of already existing polynomial-based descriptors [22,52]. Finally, we are convinced that this package will prove useful for the network biology community [16].
Availability and requirements
Project name: QuACN - Quantitative Analysis of Complex Networks
Project home page: http://cran.r-project.org/web/packages/QuACN/
Operating system(s): Platform independent
Programming language: R (http://www.r-project.org)
License: LGPL
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
LAJM and KGK implemented and tested the R package, performed the analysis, and interpreted the results. LAJM, KGK, AG, FES, and MD wrote the manuscript. MD supervised the study. All authors read and approved the final manuscript.
Acknowledgements
Matthias Dehmer thanks the Austrian Science Fund for supporting this work (project P22029-N13). This work was also partly supported by the Tiroler Wissenschaftsfonds and the Standortagentur Tirol (Tiroler Zukunftsstiftung). We thank Matthias Wieser and Andreas Dander, who helped to develop the R package.
References

Kitano H: Systems Biology: A Brief Overview.
Science (New York, NY) 2002, 295:1662-4.

Emmert-Streib F, Glazko GV: Network Biology: A Direct Approach to Study Biological Function.
Wiley Interdisciplinary Reviews. Systems biology and medicine 2010, 127.

Palsson B: Systems Biology: Properties of Reconstructed Networks. Cambridge University Press; 2006.

Barabási AL, Oltvai ZN: Network Biology: Understanding the Cell's Functional Organization.
Nature Reviews Genetics 2004, 5(2):101-13.

Altay G, Emmert-Streib F: Inferring the Conservative Causal Core of Gene Regulatory Networks.
BMC Systems Biology 2010, 4:132.

Meyer PE, Lafitte F, Bontempi G: minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information.
BMC Bioinformatics 2008, 9:461.

Adourian A, Jennings E, Balasubramanian R, Hines WM, Damian D, Plasterer TN, Clish CB, Stroobant P, McBurney R, Verheij ER, Bobeldijk I, van der Greef J, Lindberg J, Kenne K, Andersson U, Hellmold H, Nilsson K, Salter H, Schuppe-Koistinen I: Correlation Network Analysis for Data Integration and Biomarker Selection.
Molecular BioSystems 2008, 4(3):249-259.

Meyer PE, Marbach D, Roy S, Kellis M: Information-Theoretic Inference of Gene Networks Using Backward Elimination.
Conference on Bioinformatics & Computational Biology (BIOCOMP'10), Las Vegas/USA 2010, II:700-705.

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A: ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context.

Langfelder P, Horvath S: WGCNA: An R Package for Weighted Correlation Network Analysis.
BMC Bioinformatics 2008, 9:559.

Opgen-Rhein R, Strimmer K: From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data.
BMC Systems Biology 2007, 1:37.

Scutari M: Learning Bayesian Networks with the bnlearn R Package.
Journal of Statistical Software 2010, 35(3):1-22.

Rashevsky N: Life, Information Theory, and Topology.
Bulletin of Mathematical Biophysics 1955, 17:229-235.

Dehmer M, Barbarini N, Varmuza K, Graber A: A Large Scale Analysis of Information-Theoretic Network Complexity Measures Using Chemical Structures.

Emmert-Streib F, Dehmer M: Networks for Systems Biology: Conceptual Connection of Data and Function.
IET Systems Biology 2011, 5(3):185-207.

Mueller LA, Kugler KG, Dander A, Graber A, Dehmer M: QuACN: An R Package for Analyzing Complex Biological Networks Quantitatively.
Bioinformatics 2011, 27:140-141.

Dehmer M, Borgert S, Emmert-Streib F: Entropy Bounds for Hierarchical Molecular Networks.
PLoS ONE 2008, 3(8):e3079.

Mowshowitz A: Entropy and the Complexity of the Graphs I: An Index of the Relative Complexity of a Graph.
Bulletin of Mathematical Biophysics 1968, 30:175-204.

Bonchev D: Information Theoretic Indices for Characterization of Chemical Structures. Research Studies Press, Chichester; 1983.

Dehmer M, Sivakumar L, Varmuza K: Uniquely Discriminating Molecular Structures Using Novel Eigenvalue Based Descriptors.
MATCH Communications in Mathematical and in Computer Chemistry 2012, 67:147172.

Bonchev D, Mekenyan O, Trinajstić N: Isomer Discrimination by Topological Information Approach.
Journal of Computational Chemistry 1981, 2(2):127148. Publisher Full Text

Todeschini R, Consonni V, Mannhold R: Handbook of Molecular Descriptors. WileyVCH [Weinheim, Germany]; 2002.

Xia K, Fu Z, Hou L, Han J: Impacts of ProteinProtein Interaction Domains on Organism and Network Complexity.
Genome Research 2008, 18(9):1500. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Mazurie A, Bonchev D, Schwikowski B, Buck G: Phylogenetic Distances are Encoded in Networks of Interacting Pathways.
Bioinformatics 2008, 24(22):2579.

Dehmer M, Mowshowitz A: A History of Graph Entropy Measures.
Information Sciences 2011, 181:57-78.

Liu K, Feng J, Young S: PowerMV: A Software Environment for Molecular Viewing, Descriptor Generation, Data Analysis and Hit Evaluation.
Journal of Chemical Information and Modeling 2005, 45(2):515-522.

Wegner J, Zell A: JOELib: A Java Based Computational Chemistry Package.
6th Darmstädter Molecular-Modelling Workshop 2002.

Todeschini R, Consonni V, Mauri A, Pavan M: Software Dragon: Calculation of Molecular Descriptors, Department of Environmental Sciences.

Lee SK, Lee IH, Kim HJ, Chang GS, Chung JE, No KT: The PreADME Approach: Web-Based Program for Rapid Prediction of Physico-Chemical, Drug Absorption and Drug-Like Properties.
EuroQSAR 2002: Designing Drugs and Crop Protectants: Processes, Problems and Solutions; 2002.

Csardi G, Nepusz T: The igraph Software Package for Complex Network Research, Complex Systems:1695.

RBGL: An Interface to the BOOST Graph Library.
[R package version 1.2]

Dehmer M, Varmuza K, Borgert S, Emmert-Streib F: On Entropy-based Molecular Descriptors: Statistical Analysis of Real and Synthetic Chemical Structures.
Journal of Chemical Information and Modeling 2009, 49:1655-1663.

Dehmer M, Emmert-Streib F (Eds): Analysis of Complex Networks: From Biology to Linguistics. Wiley-VCH Publishing; 2009.

Wiener H: Structural Determination of Paraffin Boiling Points.
Journal of the American Chemical Society 1947, 69:17-20.

Skorobogatov VA, Dobrynin AA: Metrical Analysis of Graphs.
MATCH Communications in Mathematical and in Computer Chemistry 1988, 23:105-155.

Diudea MV, Gutman I, Jäntschi L: Molecular Topology. Nova Publishing [New York, NY, USA]; 2001.

Bonchev D, Rouvray DH: Complexity in Chemistry, Biology, and Ecology. Mathematical and Computational Chemistry, Springer [New York, NY, USA]; 2005.

Balaban AT, Balaban TS: New Vertex Invariants and Topological Indices of Chemical Graphs Based on Information on Distances.
Journal of Mathematical Chemistry 1991, 8:383-397.

Kugler K, Mueller L, Graber A, Dehmer M: Integrative Network Biology: Graph Prototyping for Co-Expression Cancer Networks.
PLoS ONE 2011, 6(7):e22843.

Bonchev D, Trinajstić N: Information Theory, Distance Matrix and Molecular Branching.
Journal of Chemical Physics 1977, 67:4517-4533.

Dehmer M: Information Processing in Complex Networks: Graph Entropy and Information Functionals.
Applied Mathematics and Computation 2008, 201:82-94.

Dehmer M: Information-Theoretic Concepts for the Analysis of Complex Networks.
Applied Artificial Intelligence 2008, 22(7):684-706.

Dehmer M, Emmert-Streib F: The Structural Information Content of Chemical Networks.
Zeitschrift für Naturforschung A 2008, 63:155-158.

Harris M, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al.: The Gene Ontology (GO) database and informatics resource.

Ambroise C, McLachlan G: Selection Bias in Gene Extraction on the Basis of Microarray Gene-Expression Data.
Proceedings of the National Academy of Sciences of the United States of America 2002, 99(10):6562.

Varma S, Simon R: Bias in Error Estimation When Using Cross-Validation for Model Selection.
BMC Bioinformatics 2006, 7:91.

Dehmer M, Emmert-Streib F, Tsoy Y, Varmuza K: Quantifying Structural Complexity of Graphs: Information Measures in Mathematical Chemistry. In Quantum Frontiers of Atoms and Molecules. Edited by Putz M. Nova Publishing; 2011:479-498.

Eppstein D: Subgraph Isomorphism in Planar Graphs and Related Problems.
Journal of Graph Algorithms and Applications 1999, 3(3):1-27.

Hosoya H: Topological index. A newly proposed quantity characterizing the topological nature of structural isomers of saturated hydrocarbons.
Bulletin of the Chemical Society of Japan 1971, 44(9):2332-2339.

Ellis-Monaghan J, Merino C: Graph polynomials and their applications I: The Tutte polynomial.

Balaban AT, Ivanciuc O: Historical Development of Topological Indices. In Topological Indices and Related Descriptors in QSAR and QSPR. Edited by Devillers J, Balaban AT. Gordon and Breach Science Publishers [Amsterdam, The Netherlands]; 1999:21-57.

Balaban AT: Highly Discriminating Distance-based Topological Index.
Chemical Physics Letters 1982, 89:399-404.

Doyle JK, Graver JE: Mean Distance in a Graph.
Discrete Mathematics 1977, 17:147-154.

Schultz HP, Schultz EB, Schultz TP: Topological organic chemistry. 4. Graph theory, matrix permanents, and topological indices of alkanes.
Journal of Chemical Information and Computer Sciences 1992, 32:69-72.

Li X, Gutman I: Mathematical Aspects of Randić-Type Molecular Structure Descriptors. Mathematical Chemistry Monographs, University of Kragujevac and Faculty of Science Kragujevac; 2006.

Bertz SH: The First General Index of Molecular Complexity.
Journal of the American Chemical Society 1981, 103:3241-3243.

Raychaudhury C, Ray SK, Ghosh JJ, Roy AB, Basak SC: Discrimination of Isomeric Structures Using Information Theoretic Topological Indices.
Journal of Computational Chemistry 1984, 5:581-588.